library(tidyverse)
library(here)
library(sf)
library(tmap)
library(spatstat)
Olivia Hemond
February 16, 2024
This analysis looks at inland oil spills across the state of California in 2008, as documented by the California Department of Fish and Wildlife Office of Spill Prevention and Response (OSPR).
Data source: California Department of Fish and Wildlife. Oil Spill Incident Tracking. Published Jul 29 2009. Last updated Oct 24 2023. Data download available here.
This analysis had three main goals:
Visualize the locations of 2008 oil spills across the state of California
Identify which counties in the state had the highest number of oil spills that year
Assess whether oil spills are spatially clustered or randomly spaced across the state
Read in California counties shapefile
Read in CSV file containing oil spill data
Convert oil spill dataframe to simple features object
Check the CRS of the counties file; set oil spill sf to same CRS
Create map of California with points denoting oil spills
Make map interactive so the user can zoom and click on points
Spatial join the counties with the oil spill points
Calculate the number of oil spills in each county
Visualize on a static choropleth map to identify counties with highest oil spill incidences
### Read in California counties
ca_counties_sf <- read_sf(here('posts', '2024-02-16-spatial-data', 'data', 'ca_counties'), layer = 'CA_Counties_TIGER2016') %>%
  janitor::clean_names() %>%
  select(name)
### Read in oil spill csv
oil_df <- read_csv(here('posts', '2024-02-16-spatial-data', 'data', 'oil_spill.csv')) %>%
  janitor::clean_names()
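The steps that convert the spill table to a simple features object, match its CRS to the counties layer, and draw the interactive map are listed in the outline but their code is not shown here, even though `oil_sf` is used in the chunks that follow. A minimal sketch of what those steps could look like is below. The column names (`longitude`, `latitude`), the toy coordinates, and the EPSG codes are all assumptions for illustration, not values from the post.

```r
library(sf)
library(tmap)

# Hypothetical stand-in for oil_df; the column names here are assumed, not taken from the post
oil_df_demo <- data.frame(
  spill_id = 1:2,
  longitude = c(-118.25, -122.42),
  latitude = c(34.05, 37.77)
)

# Build point geometries, declaring the coordinates as WGS 84 longitude/latitude
oil_sf_demo <- st_as_sf(oil_df_demo, coords = c("longitude", "latitude"), crs = 4326)

# Reproject to the target CRS; in the post this would be st_crs(ca_counties_sf)
target_crs <- st_crs(3857)  # placeholder CRS, assumed for illustration
oil_sf_demo <- st_transform(oil_sf_demo, target_crs)

# Interactive map: "view" mode renders a zoomable, clickable map when the object is printed
tmap_mode("view")
oil_map <- tm_shape(oil_sf_demo) +
  tm_dots()
```

If the coordinates in the CSV are already stored in the counties' CRS, passing `crs = st_crs(ca_counties_sf)` to `st_as_sf()` would be enough and no `st_transform()` call would be needed. Which case applies depends on the data, so it is worth checking `st_crs()` on both layers first.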
### Spatial join counties and oil spills
counties_oil_sf <- st_join(ca_counties_sf, oil_sf)
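### Count the number of oil spills in each county
### Note: st_join() defaults to a left join, so a county with no spills still contributes one row and is counted as 1 by n()
oil_counts_sf <- counties_oil_sf %>%
  group_by(name) %>%
  summarize(oil_count = n())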
### Plot
ggplot(oil_counts_sf) +
  geom_sf(aes(fill = oil_count)) +
  labs(fill = "Number of Oil Spills") +
  scale_fill_gradientn(colors = c("white", "lightblue", "blue", "darkblue")) +
  theme_void()
The counties with the greatest number of oil spills, in order, are Los Angeles and San Diego in Southern California, and San Mateo, Alameda, and Contra Costa in Northern California.
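This ranking rests on the per-county counts. Because `st_join()` performs a left join by default, a county containing no spill points still yields one row, so counting rows with `n()` reports 1 rather than 0 for it. The top of the ranking is unaffected, but the low end of the map is. The sketch below uses made-up toy polygons and points (not the post's data) to show the difference, and one possible alternative that counts intersecting points directly.

```r
library(sf)
library(dplyr)

# Toy "counties": three unit squares side by side (hypothetical data)
make_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
toy_counties <- st_sf(
  name = c("A", "B", "C"),
  geometry = st_sfc(make_square(0), make_square(1), make_square(2))
)

# Toy "spills": two points in A, one in B, none in C
toy_spills <- st_sf(
  spill_id = 1:3,
  geometry = st_sfc(st_point(c(0.2, 0.5)), st_point(c(0.7, 0.5)), st_point(c(1.5, 0.5)))
)

# Left join followed by n(): county C has no spills but still yields one row
joined_counts <- st_join(toy_counties, toy_spills) %>%
  st_drop_geometry() %>%
  group_by(name) %>%
  summarize(oil_count = n())

# Counting the points that intersect each polygon keeps true zeros
direct_counts <- toy_counties %>%
  mutate(oil_count = lengths(st_intersects(toy_counties, toy_spills))) %>%
  st_drop_geometry() %>%
  arrange(desc(oil_count))
```

Sorting with `arrange(desc(oil_count))` gives the ranking directly, which is one way to read off the counties with the most spills without relying on the map colors alone.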
### Convert oil spill observations to spatial point pattern (to use with spatstat package)
oil_ppp <- as.ppp(oil_sf)
### Set our observation window to be the extent of California
ca_counties_win <- as.owin(ca_counties_sf)
### Create point pattern dataset
oil_full <- ppp(oil_ppp$x, oil_ppp$y, window = ca_counties_win)
### Make a sequence of distances over which you'll calculate G(r)
r_vec <- seq(0, 20000, by = 200)
### Calculate the observed G(r), the theoretical G(r) under CSR, and an envelope from 100 simulations of CSR
gfunction_out <- envelope(oil_full, fun = Gest, r = r_vec, nsim = 100, verbose = FALSE)
### Convert output to dataframe, and pivot to tidy form
gfunction_long <- gfunction_out %>%
  as.data.frame() %>%
  pivot_longer(cols = obs:hi, names_to = "model", values_to = "g_val")
### Then make a graph in ggplot:
ggplot(data = gfunction_long, aes(x = r, y = g_val, group = model)) +
  geom_line(aes(color = model)) +
  ### With the default nrank = 1, "hi" and "lo" are the maximum and minimum of the simulated values
  scale_color_manual(values = c("hi" = "red", "lo" = "red", "obs" = "blue", "theo" = "black"),
                     name = "",
                     labels = c("hi" = "Upper Simulation Envelope",
                                "lo" = "Lower Simulation Envelope",
                                "theo" = "Theoretical Complete Spatial Randomness",
                                "obs" = "Observed")) +
  labs(x = 'Distance (m)', y = 'G(r)') +
  theme_minimal() +
  theme(legend.position = "bottom")
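To read a plot like this, it helps to recall what G(r) measures: the share of points whose nearest neighbor lies within distance r. Under complete spatial randomness with intensity λ, the theoretical value is G(r) = 1 - exp(-λπr²). An observed curve that rises above the simulation envelope at short distances points toward clustering, while one that falls below it points toward more regular spacing. The sketch below illustrates this on a small, hand-built clustered pattern. The points and window are hypothetical, and the empirical estimate shown is a raw one with no edge correction, unlike `Gest()`.

```r
library(spatstat)

# Hypothetical clustered pattern: 4 tight groups of 5 points in the unit square
centers <- expand.grid(cx = c(0.25, 0.75), cy = c(0.25, 0.75))
offsets <- data.frame(dx = c(0, 0.01, -0.01, 0, 0), dy = c(0, 0, 0, 0.01, -0.01))
pts <- merge(centers, offsets)  # no shared columns, so this is a cross join (20 rows)
demo_ppp <- ppp(pts$cx + pts$dx, pts$cy + pts$dy, window = owin(c(0, 1), c(0, 1)))

# Raw empirical G(r): share of points whose nearest neighbor is within r
nn_dist <- nndist(demo_ppp)
g_observed <- function(r) mean(nn_dist <= r)

# Theoretical G(r) under CSR with the same intensity (points per unit area)
lambda <- intensity(demo_ppp)
g_csr <- function(r) 1 - exp(-lambda * pi * r^2)

g_observed(0.02)  # every point has a neighbor within 0.02 in this toy pattern
g_csr(0.02)       # far smaller under CSR at the same intensity
```

In this toy pattern the observed value sits far above the CSR value at short distances, which is the signature of clustering. The same comparison, with edge correction and a simulation envelope, is what the `envelope()` call above computes for the oil spill points.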