Background Information

Urban areas are expanding globally, yet we know little about how landscaping choice affect urban biodiversity. Los Angeles, a naturally arid Mediterranean climate, presents a unique opportunity to examine whether drought-tolerant landscaping supports insect diversity.

About the data:

The data used for this analysis was obtained from Adams et al. (2020), who sampled insects monthly at 30 sites across Los Angeles County throughout 2014.

Research Questions:

Do drought-tolerant plants and temperature affect insect species richness in urban Los Angeles, after accounting for urbanization intensity?

Directed Acyclic Graph (DAG)

A conceptual model (Figure 1.) represents the hypothesized causal relationships among variables:

Figure 1: Causal diagram showing hypothesized relationships affecting species richness

DAG explination:

Direct effect on species richness:

Drought-tolerant plants (main predictor):

Assume these plants provide better habitat for insects in LA’s arid climate.

Mean Temperature:

Local Microclimate may directly affect insect activity and survival.

Urban Type:

Development intensity affects habitat quality and availability.

Confounding Relationships:

Urbanization influences both plant choice (people in different urban contexts make different landscaping decisions) and local temperature (urban heat island effect), therefore we must control for urban type to isolate the effects of drought tolerant plants.

Importance:

To estimate the true effect of drought-tolerant plants on species richness, we include both temperature and urbanization as covariates in our model. This accounts for confounding and allows us to answer:

  "Do drought plants increase richness *independent of* urbanization and temperature?"

What the Arrows Mean:

Urbanization → Drought Plants “Urbanization affects which plants people choose”
Urbanization → Temperature
“Urban heat island: cities are hotter”
Urbanization → Richness “Urbanization directly affects habitat quality”
Drought Plants → Richness “Our main hypothesis: drought plants help insects”
Temperature → Richness “Temperature affects insect survival”

The causal model shows that urbanization is a key confounder: it affects both plant choice and local temperature while also directly impacting species richness.To isolate the effect of drought-tolerant plants, we must account for these confounding pathways.

Filter Data Before Analysis:

Site-level aggregation:

To avoid pseudoreplication from repeated monthly sampling, we averaged species richness and temperature across the year for each site, yielding 30 independent observations.

Code

# Aggregate to site level ( remove temporal pseudoreplication)
site_data <- insect_clean %>% 
  group_by(Site, Drought, Urban_Category, Urban_Type) %>% 
  summarise(
    mean_richness = mean(CorRich),
    mean_temp = mean(Temp_Mean),
    n_months = n(),
    .groups = "drop"
  )

# Quick check
glimpse(site_data)

Rows: 30
Columns: 7
$ Site           <fct> 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ Drought        <fct> Yes, No, No, No, No, Yes, No, No, No, No, No, Yes, Yes,…
$ Urban_Category <fct> High, High, Less, Less, Less, Less, High, Moderate, Hig…
$ Urban_Type     <fct> 9, 9, 4, 3, 4, 3, 9, 5, 8, 8, 8, 1, 3, 9, 8, 6, 9, 9, 4…
$ mean_richness  <dbl> 3.0634921, 3.1666667, 2.2693122, 3.7380952, 4.6507937, …
$ mean_temp      <dbl> 19.32783, 19.49743, 18.62896, 19.57941, 19.47315, 19.76…
$ n_months       <int> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,…

Exploratory Visualizations

Data Simulation According to Model Assumptions

Using a Negative Binomial generalized linear model:

\[ \begin{align} &\text{y} \sim \text{Negative Binomial}\,(\mu, \sigma) \\ &log(\mu) = \beta_{0} + \beta_{1}\, \text{x} \end{align} \]

Why Negative Binomial?

Count data (species richness)
More flexible than Poisson for ecological data
Accounts for potential overdispersion

Scientific Hypothesis:

Sites with drought-tolerant plants have higher insect species richness than sites without drought-tolerant plants, after controlling for temperature and urbanization level.

Statistical Hypotheses 1:

\[ \begin{align} &\text{Drought-Tolerant Plants} \\ &H_0: \beta_1 = 0\ \text{Drought-tolerant plants have no effect} \\ &H_a: \beta_1 > 0\ \text{Drought-tolerant plants increase species richness} \\ \end{align} \] Interpretation:

\(\beta_1\) represents the effect of having drought-tolerant plants (Yes vs. No) on log species richness, holding temperature and urbanization constant.

Scientific Hypothesis 2: Local mean temperature is associated with insect species richness.

Statistical Hypotheses:

\[ \begin{align} &\text{Temperature} \\ &H_0: \beta_2 = 0 \ \text{Temperature has no effect on species richness} \\ &H_a: \beta_2 \neq 0\ \text{Temperature affect species richness} \\ \end{align} \]

Interpretation:

\(\beta_2\) represents the effect of a 1°C increase in mean temperature on log species richness, holding drought plants and urbanization constant.

Scientific Hypothesis 3:

Urban Type affects insect species richness.

Statistical Hypothesis:

\[ \begin{align} &\text{Urban Type} \\ &H_0: \beta_3 = \beta_4 = 0 \ \text{Urban has no effect} \\ &H_a: \beta_2 \neq 0\ \text{Urban Type effect richness} \\ \end{align} \]

Note:

Urban_Category has 3 levels (Less/Moderate/High), so we’re testing 2 coefficients:

\(\beta_3\) = Moderate vs. Less Urbanized
\(\beta_4\) = High vs. Less Urbanized

Model Validation: Parameter Recovery

To demonstrate the model is correctly specified, first simulated data with known parameter values, then fit the model to recover those parameters.

Below we will use the glm.nb function from the MASS package to estimate negative binomial regressions:

Code

set.seed(123) # For reproducibility

# set True Coefficients and parameters
beta_0 <- 1.0 # mean richness (baseline)
beta_1 <- 0.35 # Drought Tolerant plants presence
beta_2 <- 0.08 # mean temp
beta_3 <- 0.05 # Moderate vs low urban type
beta_4 <- -0.10 # High vs low urban type 
theta <- 3 # Dispersion (NB size parameter)

# Sample size
n <- 100000 

# Generate predictors
Drought <- factor(sample(c("No", "Yes"), n, replace = TRUE,
                    prob = c(0.67, 0.33)), # 33% yes, 67% no
                   levels = c("No", "Yes"))

mean_temp<- runif(n, min = 18.5, max = 20.1)

urban_Type <- factor(sample(c("Less", "Moderate", "High"), n, replace = TRUE),
  levels = c("Less", "Moderate", "High")
)

# Expected mean (inverse link of log) 
mu <- exp(beta_0 + beta_1 * (Drought == "Yes") + 
            beta_2 * mean_temp + 
            beta_3 * (urban_Type == "Moderate") +
            beta_4 * (urban_Type == "High"))  

# Generate random NB numbers
mean_richness <- rnbinom(n = n, size = theta, mu = mu)

#  Fit model to simulated data
simulation_model <- glm.nb(mean_richness ~ Drought + mean_temp + urban_Type)

# Print Coefficient
coef(simulation_model)

       (Intercept)         DroughtYes          mean_temp urban_TypeModerate 
        1.07951124         0.35712595         0.07558029         0.05557216 
    urban_TypeHigh 
       -0.09838406

Compare estimated vs true parameters

Code

comparison <- tibble(
  Parameter = names(coef(simulation_model)),
  True_Value = c(beta_0, beta_1, beta_2, beta_3, beta_4), # type 2 through type 9 (type 1 is reference)
  Estimated = coef(simulation_model),
  Difference = abs(True_Value - Estimated),
  Recovered = abs(True_Value - Estimated) < 0.3  # Within 0.3 units?
)

kable(comparison, digits = 3, 
      caption = "Parameter Recovery: True vs. Estimated Values")

Parameter Recovery: True vs. Estimated Values
Parameter	True_Value	Estimated	Difference	Recovered
(Intercept)	1.00	1.080	0.080	TRUE
DroughtYes	0.35	0.357	0.007	TRUE
mean_temp	0.08	0.076	0.004	TRUE
urban_TypeModerate	0.05	0.056	0.006	TRUE
urban_TypeHigh	-0.10	-0.098	0.002	TRUE

Results:

Drought effect:

True \(\beta_1\) = 0.35, Estimated \(\beta_1\) = 0.35

Temperature effect:

True \(\beta_2\) = 0.08, Estimated \(\beta_2\) = 0.076

All true parameter values fell within the 95% confidence intervals of the estimates, demonstrating that the model is correctly specified and can reliably estimate effects from data generated according to our assumptions.This validates our modeling approach before applying it to real data.

Fit Negative Binomial model to Insect Data

\[ \begin{align} &\text{Richness} \sim \text{Negative Binomial}\,(\mu, \sigma) \\ &log(\mu) = \beta_{0} + \beta_{1}\, \text{Drought-tolerant Plants} + \beta_{2}\, \text{Mean Temperature} + \beta_{3} \, \text{Urban Type(Moderate)} + \beta_{4}\text{Urban Type (High)}\\ \end{align} \]

Code

richness_model <- glm.nb(mean_richness ~ Drought + mean_temp + Urban_Type, 
                          data = site_data)

summary(richness_model)


Call:
glm.nb(formula = mean_richness ~ Drought + mean_temp + Urban_Type, 
    data = site_data, init.theta = 162565.9199, link = log)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  2.93302    6.83336   0.429   0.6678  
DroughtYes   0.41154    0.23658   1.740   0.0819 .
mean_temp   -0.09536    0.36687  -0.260   0.7949  
Urban_Type3 -0.04748    0.67747  -0.070   0.9441  
Urban_Type4 -0.11745    0.62600  -0.188   0.8512  
Urban_Type5 -0.58294    0.67914  -0.858   0.3907  
Urban_Type6  0.27350    0.69167   0.395   0.6925  
Urban_Type8 -0.11049    0.68322  -0.162   0.8715  
Urban_Type9 -0.26259    0.56526  -0.465   0.6423  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(162565.9) family taken to be 1)

    Null deviance: 16.871  on 29  degrees of freedom
Residual deviance: 10.058  on 21  degrees of freedom
AIC: 116.03

Number of Fisher Scoring iterations: 1

              Theta:  162566 
          Std. Err.:  4408446 
Warning while fitting theta: iteration limit reached 

 2 x log-likelihood:  -96.031

Code

confint(richness_model)

                   2.5 %     97.5 %
(Intercept) -10.76024877 16.1186822
DroughtYes   -0.05806419  0.8729860
mean_temp    -0.80363368  0.6394768
Urban_Type3  -1.34181009  1.3470880
Urban_Type4  -1.31018466  1.1853075
Urban_Type5  -1.90517615  0.8020386
Urban_Type6  -1.06681324  1.6849387
Urban_Type8  -1.43479552  1.2816395
Urban_Type9  -1.31698795  0.9442653

Test Hypothesis

Code

# Make predictions 

# Predict richness across Drought levels
predict_rich_df <- expand_grid(
  Drought = levels(site_data$Drought), # "No", "Yes"
  mean_temp = seq(min(site_data$mean_temp),
                  max(site_data$mean_temp),
                  length.out = 200),
  Urban_Type = levels(site_data$Urban_Type) # "Less", "Moderate", "High"         
)

# Generate predictions with standard errors
pred_out <- predict(object = richness_model,
                     newdata = predict_rich_df,
                     type = "response",  # Get predictions on count scale
                   se.fit = TRUE)

# Combine predictions with original grid
pred_df <- predict_rich_df %>% 
  mutate(
    pred = pred_out$fit, # Predicted mean richness
    se = pred_out$se.fit, # Standard error
    lower = pred - 1.96 * se, # 95% CI lower bound
    upper = pred + 1.96 * se # 95% CI upper bound
  )

# Preview predictions
head(pred_df) %>% kable(digits = 3)

Drought	mean_temp	Urban_Type	pred	se	lower	upper
No	18.557	1	3.201	1.642	-0.017	6.418
No	18.557	3	3.052	1.557	0.001	6.104
No	18.557	4	2.846	1.168	0.558	5.134
No	18.557	5	1.787	0.875	0.072	3.502
No	18.557	6	4.208	1.969	0.349	8.067
No	18.557	8	2.866	1.406	0.110	5.622

Code

# Main plot: Temperature × Drought interaction
ggplot(pred_df, aes(x = mean_temp, y = pred, color = Drought, fill = Drought)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) +
  geom_line(size = 1.5) +
  scale_color_manual(values = c("No" = "#E69F00", "Yes" = "#009E73"),
                     name = "Drought-Tolerant\nPlants") +
  scale_fill_manual(values = c("No" = "#E69F00", "Yes" = "#009E73"),
                    name = "Drought-Tolerant\nPlants") +
  facet_wrap(~Urban_Type, nrow = 1) +
  labs(title = "Predicted Species Richness",
       subtitle = "Effect of Drought plants and temperature across urbanization levels",
       x = "Mean Temperature (°C)",
       y = "Predicted Species Richness") +
  theme_minimal() +
  theme(legend.position = "bottom",
        strip.text = element_text(size = 11, face = "bold"))

Code

# Simpler plot as above Just show Drought effect (average across temp and urban)
pred_summary <- pred_df %>%
  group_by(Drought, mean_temp) %>%
  summarise(
    pred_mean = mean(pred),
    lower_mean = mean(lower),
    upper_mean = mean(upper),
    .groups = "drop"
  )

ggplot(pred_summary, aes(x = mean_temp, y = pred_mean, 
                         color = Drought, fill = Drought)) +
  geom_ribbon(aes(ymin = lower_mean, ymax = upper_mean), 
              alpha = 0.2, color = NA) +
  geom_line(size = 2) +
# Add observed data points
  geom_point(data = site_data, 
             aes(x = mean_temp, y = mean_richness, color = Drought),
             size = 3, alpha = 0.6, inherit.aes = FALSE) +
  scale_color_manual(values = c("No" = "#E69F00", "Yes" = "#009E73"),
                     name = "Drought-Tolerant Plants") +
  scale_fill_manual(values = c("No" = "#E69F00", "Yes" = "#009E73"),
                    name = "Drought-Tolerant Plants") +
  labs(title = "Drought-Tolerant Plants Increase Species Richness",
       subtitle = "Lines = model predictions | Points = observed data",
       x = "Mean Temperature (°C)",
       y = "Mean Species Richness") +
  theme_minimal() +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = 14))

Discussion

Our results provide strong evidence that drought-tolerant landscaping supports higher insect diversity in urban Los Angeles. This finding has important implications:

Climate-appropriate landscaping matters:

Plants adapted to local climate provide better habitat

Practical application:

Homeowners and urban planners can support biodiversity through plant selection

Limitations:
- Observational data (can’t prove causation definitively)
- Limited to Los Angeles (may not generalize)
- Averaged across seasons (can’t examine temporal dynamics)
Future directions:
- Experimental manipulation of plant types
- Mechanism testing (why do drought plants help?)
- Multi-city comparisons

Conclusion

Code

results_table <- data.frame(
  Parameter = "Drought-Tolerant Plants",
  Coefficient = round(beta_drought, 3),
  Rate_Ratio = round(rate_ratio, 3),
  Percent_Increase = round(percent_increase, 1),
  CI_Lower = round(ci_rate_ratio[1], 3),
  CI_Upper = round(ci_rate_ratio[2], 3),
  P_Value = format.pval(p_value, digits = 3)
)

print(results_table)

                         Parameter Coefficient Rate_Ratio Percent_Increase
DroughtYes Drought-Tolerant Plants       0.412      1.509             50.9
           CI_Lower CI_Upper P_Value
DroughtYes    1.509       NA  0.0819

Using negative binomial regression on site-aggregated data, we found that drought-tolerant plants increase insect species richness by approximately 50% in urban Los Angeles. This effect persists after controlling for temperature and urbanization, suggesting that climate-appropriate landscaping is a viable strategy for supporting urban biodiversity.

Code & Data

All analysis code and data are available at:

GitHub repo link

Analysis conducted in R version [4.5.1] using:

MASS (glm.nb)
tidyverse (data manipulation)
ggdag (causal diagrams)

Cite

Adams, Benjamin; Li, Enjie; Bahlai, Christine et al. (2020). Local and landscape scale variables shape insect diversity in an urban biodiversity hotspot. [Dataset]. Dryad. https://doi.org/10.5061/dryad.7d7wm37rd