
Basics of Data analysis and visualisations


Visualizing complex data

In conservation science, our data is often complex, multi-dimensional, and filled with uncertainty. Raw data—tables of coordinates, catch numbers, or model outputs—rarely inspires action. The bridge between complex data and meaningful policy change is effective data visualization. A single, powerful graphic can achieve what pages of text cannot: it can create understanding, evoke an emotional response, and provide clear evidence for decision-makers.



Common Pitfalls: How to make a bad visualization

Just as important as knowing what makes a good visualization is recognizing what makes a bad one. Poor visualizations don’t just fail to communicate; they actively mislead, eroding trust and potentially leading to flawed conclusions. By understanding these common traps, you can learn to create clearer, more honest graphics and critically evaluate visualizations made by others.



3D Charts

While they may look flashy, 3D bar charts are a cardinal sin of data visualization. The third dimension serves no purpose and the perspective effect distorts the data, making it impossible for the human eye to accurately compare the heights of the bars. A simple 2D bar chart is always clearer and more honest.





Pie Charts

Pie charts are only effective for showing parts of a single whole with a very small number of categories (ideally, no more than three). As soon as you need to compare the sizes of different slices, or if you have many categories, the chart becomes useless. The human brain is not good at comparing angles; it is excellent at comparing lengths. For this reason, a bar chart is almost always a better choice for showing proportions.
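To see the difference for yourself, here is a minimal sketch (using made-up catch counts, not the course data) that draws the same four near-equal values as a pie chart and as a sorted bar chart:

```r
library(tidyverse)

# Made-up counts for four near-equal categories
catch <- tibble(species = c("Blue Shark", "Silky Shark", "Devil Ray", "Cowtail Ray"),
                count   = c(23, 19, 21, 18))

# As a pie chart, the near-equal slices are hard to rank by eye
ggplot(catch, aes(x = "", y = count, fill = species)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()

# As a sorted bar chart, the ordering is obvious at a glance
ggplot(catch, aes(x = reorder(species, count), y = count)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Count")
```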





Misleading Line Charts

A line chart’s story is easily manipulated. Truncating the y-axis (starting it at a value other than zero) can dramatically exaggerate small fluctuations, making them seem significant. Conversely, a poorly chosen scale can flatten a steep trend into obscurity. Another common mistake is adding a secondary y-axis: by rescaling one axis until two unrelated series appear to move together, it can create the illusion of a relationship where none exists.
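A quick sketch of the truncated-axis effect, again using made-up numbers rather than the course data:

```r
library(tidyverse)

# Made-up counts with only small year-to-year variation
trend <- tibble(year  = 2014:2023,
                catch = c(502, 498, 505, 501, 497, 503, 500, 499, 504, 502))

p <- ggplot(trend, aes(x = year, y = catch)) +
  geom_line() +
  theme_bw()

# Truncated y-axis: tiny fluctuations fill the panel and look dramatic
p + coord_cartesian(ylim = c(495, 506))

# Zero-based y-axis: the same data correctly reads as essentially flat
p + coord_cartesian(ylim = c(0, 600))
```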





Overly Complex Plots

The goal of a visualization is to simplify, not to complicate. A plot cluttered with “chartjunk”—unnecessary gridlines, distracting background images, garish colors, and redundant labels—buries the key message. Every element in a chart should have a purpose. If it doesn’t help the viewer understand the data, it should be removed.
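In ggplot2, most decluttering comes down to theme choices. A minimal sketch, with made-up numbers:

```r
library(tidyverse)

dat <- tibble(site = c("Palawan", "Iloilo", "Visayas"), catch = c(40, 25, 35))

ggplot(dat, aes(x = site, y = catch)) +
  geom_col(fill = "steelblue") +
  theme_minimal() +                                # drop the grey panel and box
  theme(panel.grid.major.x = element_blank(),      # vertical gridlines add nothing here
        panel.grid.minor   = element_blank()) +
  labs(x = NULL, y = "Catch (count)")
```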





Misleading Maps

In spatial analysis, a common error is creating a choropleth map (where areas are shaded by a value) using raw counts instead of rates or densities. For example, a map showing the total number of shark sightings per state will simply highlight the biggest states, not the states with the highest density of sharks. To tell an accurate story, you must normalize your data, for instance, by calculating sightings per square kilometer or per unit of coastline.
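The normalisation itself is a one-line `mutate()`. A sketch using hypothetical regions and counts:

```r
library(tidyverse)

# Hypothetical raw counts: the largest region dominates until we normalise
sightings <- tibble(region       = c("Region A", "Region B", "Region C"),
                    n_sightings  = c(120, 45, 60),
                    coastline_km = c(2400, 300, 500))

sightings %>%
  mutate(sightings_per_km = n_sightings / coastline_km) %>%
  arrange(desc(sightings_per_km))
# Region A has the most raw sightings, but the lowest density per km of coastline
```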






Key Principles for Impactful Visualizations


  • Know Your Audience, Know Your Goal: A map for a scientific publication will be different from a chart for a policy brief or a graphic for a community meeting.

    • Example: For a scientific paper, you might create a complex map showing individual shark tracks colored by behavior (e.g., red for feeding, blue for migrating). For a public outreach campaign, you would simplify this into a single, compelling map showing only the key feeding “hotspots” to argue for their protection.


  • Tell a Story: A good visualization has a narrative. It guides the viewer to a conclusion. Don’t just show data; present an argument.

    • Example: A simple line chart showing a steep decline in the population of a specific shark species over 20 years tells a clear and alarming story of endangerment. If you then add a vertical line indicating when a new fishing regulation was introduced and the decline subsequently slows, your story becomes one of effective intervention.


  • Simplify and Declutter: The most important elements should be the most visually prominent. Remove anything that doesn’t support the core message (unnecessary gridlines, redundant labels, distracting colors).

    • Example: When plotting shark hotspots on a map, use a simple, light-colored basemap (like a grayscale political map) instead of a busy satellite image. This makes your data (the hotspots) the “hero” of the graphic and ensures your message is not lost in visual noise.


  • Show the Uncertainty: Conservation data is rarely perfect. Ignoring uncertainty is dishonest; visualizing it builds trust and leads to more robust decisions.

    • Example: A satellite tag doesn’t give a perfect location. Instead of just a single point on a map, a more honest visualization shows a transparent circle or “error bars” around that point. This clearly communicates to a policymaker that a proposed protected area might need a larger buffer to account for the full potential range of the animal’s location.


  • Combine Data for Deeper Insight: The most powerful stories often come from layering different types of data.

    • Example: Imagine creating a map that layers two key datasets: 1) Red dots showing where satellite-tagged Tiger Sharks spend most of their time, and 2) Blue polygons showing the boundaries of existing marine protected areas (MPAs). If the red dots are mostly outside the blue polygons, you’ve created a powerful, undeniable visual argument that the current MPAs are not protecting Tiger Sharks effectively. This synthesis is what drives policy change.
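The layering idea in the last point can be checked programmatically as well as visually. A minimal sketch with `sf`, using hypothetical shark positions and a hypothetical MPA polygon:

```r
library(sf)
library(tidyverse)

# Hypothetical shark positions
shark_pts <- st_as_sf(tibble(lon = c(120.1, 120.4, 121.0),
                             lat = c(10.2, 10.3, 10.8)),
                      coords = c("lon", "lat"), crs = 4326)

# Hypothetical MPA polygon (a simple box)
mpa <- st_sf(geometry = st_sfc(st_polygon(list(rbind(
  c(120.0, 10.0), c(120.5, 10.0), c(120.5, 10.5),
  c(120.0, 10.5), c(120.0, 10.0)))), crs = 4326))

# Proportion of positions inside the MPA: the numeric backbone of the
# "red dots outside blue polygons" argument
inside <- lengths(st_intersects(shark_pts, mpa)) > 0
mean(inside)
```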



In the next two exercises, we will analyze and visualize mock data to highlight the kinds of information often used in shark and ray conservation. Before we start, let’s install the R packages we will use in these exercises. If you haven’t already installed these packages, use the button to find the instructions to do so.

For this course, make sure you have downloaded and installed the latest versions of the following software:

1. Download R for your relevant operating system from the CRAN website

2. Download RStudio for your relevant operating system from the RStudio website

3. Once you’ve installed the above software, make sure you install the following packages prior to the start of the course

## Packages that are on CRAN

install.packages(c("tidyverse",
                   "sf",
                   "terra",
                   "ggspatial",
                   "gganimate",
                   "ggrepel",
                   "ggforce",
                   "gifski",
                   "scatterpie",
                   "leaflet",
                   "remotes",
                   "patchwork"), 
                 dependencies = TRUE)

## Install packages from GitHub and other external repositories

remotes::install_github("r-spatial/mapview", build_vignettes = TRUE)

# rnaturalearthhires is too large for CRAN, so install it from the rOpenSci r-universe
install.packages("rnaturalearthhires",
                 repos = "https://ropensci.r-universe.dev")

install.packages("aniMotum", 
                 repos = c("https://cloud.r-project.org",
                           "https://ianjonsen.r-universe.dev"),
                 dependencies = TRUE)

When downloading from GitHub, the R console will sometimes ask you if it should also update other packages. In most cases, you can skip updating other packages (option [3]) as this often stalls the whole process of downloading the package you want. If there are package dependencies, these will often be downloaded automatically (but not always!). If you want to update the other packages you have, you can do it separately.






Example 1: Fish Market Surveys in the Philippines

Fisheries data from markets provides crucial insight into which species are being caught and where. In this example, we’ll analyze mock data from a 10-year survey of elasmobranchs (sharks and rays) across three major landing sites in the Philippines.


Our Goals:

  • Map the survey locations and visualize the species composition at each site using pie charts.

  • Map the fishing effort at each location.

  • Analyze how the overall catch composition and catch per unit effort (CPUE) have changed over the 10-year period.




Step 1.1: Load Libraries and Create Mock Market Data

# Load necessary libraries for this analysis
library(tidyverse)
library(ggrepel)
library(sf)
library(ggspatial)
library(patchwork)
library(scatterpie) # For pie charts on maps
# Let's import some mock market data
market_data <- read_csv("https://raw.githubusercontent.com/vinayudyawer/2025_Palawan_BCC/refs/heads/main/Data/market_data.csv")

head(market_data)
## # A tibble: 6 × 13
##    year location     lon   lat `Blue Shark` `Silky Shark` `Cowtail Ray`
##   <dbl> <chr>      <dbl> <dbl>        <dbl>         <dbl>         <dbl>
## 1  2014 Luzon Port  121.  14.5           21            15            69
## 2  2015 Luzon Port  121.  14.5           32            22            70
## 3  2016 Luzon Port  121.  14.5           32            34            70
## 4  2017 Luzon Port  121.  14.5           36            33            64
## 5  2018 Luzon Port  121.  14.5           40            41            68
## 6  2019 Luzon Port  121.  14.5           36            51            62
## # ℹ 6 more variables: `Mako Shark` <dbl>, `Hammerhead Shark` <dbl>,
## #   `Thresher Shark` <dbl>, `Devil Ray` <dbl>, `Whitespotted Guitarfish` <dbl>,
## #   number_of_boats <dbl>
# Prepare the data for the pie chart map
market_summary <- 
  market_data %>%
  group_by(location, lon, lat) %>%
  summarise(across("Blue Shark":number_of_boats, sum))

head(market_summary)
## # A tibble: 6 × 12
## # Groups:   location, lon [6]
##   location       lon   lat `Blue Shark` `Silky Shark` `Cowtail Ray` `Mako Shark`
##   <chr>        <dbl> <dbl>        <dbl>         <dbl>         <dbl>        <dbl>
## 1 Iloilo        122.  10.7         1391           467           698          667
## 2 Luzon Port    121.  14.5          411           426           749          705
## 3 Mindanao Do…  126.   7.1          428           350           725          754
## 4 Palawan       119.   9.7          438           365           762          741
## 5 Sorsogon Bay  124   12.9          430           338           758          739
## 6 Visayas Hub   124.  10.3          411           342           793          720
## # ℹ 5 more variables: `Hammerhead Shark` <dbl>, `Thresher Shark` <dbl>,
## #   `Devil Ray` <dbl>, `Whitespotted Guitarfish` <dbl>, number_of_boats <dbl>
## Let's set up some non-standard basemap tile sources for plotting on maps
esri_sat <- paste0('https://services.arcgisonline.com/arcgis/rest/services/',
                   'World_Imagery/MapServer/tile/${z}/${y}/${x}.jpeg')

esri_topo <- paste0('https://services.arcgisonline.com/arcgis/rest/services/',
                    'World_Street_Map/MapServer/tile/${z}/${y}/${x}.jpeg')

esri_terrain <- paste0('https://services.arcgisonline.com/arcgis/rest/services/',
                       'World_Terrain_Base/MapServer/tile/${z}/${y}/${x}.jpeg')




Step 1.2: Map of Survey Sites with Species Composition

Here, let's map the fishing effort and use scatterpie to embed pie charts of species composition onto our ggplot2 map.

# Define a bounding box for the Philippines
ph_bbox <- c(xmin = 117, xmax = 127, ymin = 5, ymax = 19)

# Create a bubble plot map of fishing effort
effort_map <-
  ggplot() +
  annotation_map_tile(type = esri_sat, zoom = 6) +
  geom_spatial_label_repel(data = market_summary, box.padding = 1, size = 3,
                     aes(x = lon, y = lat, label = location)) +
  geom_spatial_point(data = market_summary, color = "red",
                     aes(x = lon, y = lat, size = number_of_boats)) +
  coord_sf(xlim = ph_bbox[1:2], ylim = ph_bbox[3:4], crs = 4326) +
  annotation_scale(location = "bl", line_col = "white", text_col = "white") +
  annotation_north_arrow(location = "tr", 
                         style = north_arrow_fancy_orienteering(line_col = "white",
                                                                text_col = "white")) +
  labs(title = "Fishing effort", size = "Number of boats",
       subtitle = "Number of boats from 2014-2023",
       x = "Longitude", y = "Latitude") +
  theme_minimal() +
  theme(legend.position = "bottom")


# Create a scatterpie map of catch composition
market_map <-
  ggplot() +
  annotation_map_tile(type = esri_sat, zoom = 6) +
  geom_spatial_label_repel(data = market_summary, box.padding = 1, size = 3,
                     aes(x = lon, y = lat, label = location)) +
  geom_scatterpie(data = market_summary, lwd = 0.2, 
                  aes(x = lon, y = lat, r = 0.4), # r controls pie size
                  cols = names(market_summary)[-c(1:3,12)]) +
  coord_sf(xlim = ph_bbox[1:2], ylim = ph_bbox[3:4], crs = 4326) +
  scale_fill_brewer(palette = "Set2", name = "Species") +
  annotation_scale(location = "bl", line_col = "white", text_col = "white") +
  annotation_north_arrow(location = "tr", 
                         style = north_arrow_fancy_orienteering(line_col = "white",
                                                                text_col = "white")) +
  labs(title = "Elasmobranch catch composition",
       subtitle = "Total catch from 2014-2023",
       x = "Longitude", y = "Latitude") +
  guides(fill = guide_legend(nrow = 3, byrow = TRUE)) +
  theme_minimal() +
  theme(legend.position = "bottom")


# Let's put them together
final_map <- wrap_plots(effort_map, market_map)

final_map




Step 1.3: Temporal Analysis of Catch Composition

Now, let’s visualize how the proportion of each species in the catch has changed over time, both summarised across all ports and for each port separately.

## Summarised area chart across all ports
# Prepare data for temporal plot
temporal_data_summary <-
  market_data %>%
  group_by(year) %>%
  summarise(across("Blue Shark":"Whitespotted Guitarfish", sum)) %>%
  pivot_longer(cols = -year, names_to = "species", values_to = "count")

# Create a stacked area chart
temporal_plot_summary <- 
  temporal_data_summary %>% 
  ggplot(aes(x = year, y = count, fill = species)) +
  geom_area(position = 'fill', color="white", linewidth=0.2) +
  scale_y_continuous(labels = scales::percent_format(), expand = c(0,0)) +
  scale_x_continuous(breaks = seq(2014, 2023, by = 1), expand = c(0,0)) +
  scale_fill_brewer(palette = "Set2", name = "Species") +
  labs(title = "Overall change in market catch composition", 
       subtitle = "Across all surveyed sites in the Philippines", 
       x = "Year", y = "Proportion of Total Catch") +
  theme_bw() +
  theme(legend.position = "top") +
  guides(fill = guide_legend(nrow = 2))


temporal_plot_summary



# Create a stacked area chart
temporal_plot_ports <-
  market_data %>%
  pivot_longer(cols = -c(1:4, 13), names_to = "species", values_to = "count") %>% 
  ggplot(aes(x = year, y = count, fill = species)) +
  geom_area(position = 'fill', color="white") +
  scale_y_continuous(labels = scales::percent_format(), name = "Proportion of Total Catch", expand = c(0,0)) +
  scale_x_continuous(breaks = seq(2014, 2023, by = 2), expand = c(0,0)) +
  scale_fill_brewer(palette = "Set2", name = "Species") +
  facet_wrap(~location) +
  labs(title = "Port-specific change in market catch composition",
       x = "Year") +
  theme_bw() +
  theme(legend.position = c(0.68, 0.15), legend.direction = "horizontal")

temporal_plot_ports



Let's use the data to identify trends in per-species total catch and in fishing effort.

# Create a line chart showing changes in fishing effort over time
fishing_effort_plot <-
  market_data %>% 
  group_by(year, location) %>% 
  summarise(number_of_boats = sum(number_of_boats, na.rm = T)) %>% 
  ggplot(aes(x = year, y = number_of_boats, col = location)) +
  geom_point() +
  geom_path() +
  scale_x_continuous(breaks = 2014:2023, expand = c(0,0)) +
  scale_color_brewer(palette = "Reds", name = "Location") +
  labs(title = "Fishing effort over time", 
       x = "Year", y = "Number of boats per year") +
  theme_bw() +
  theme(legend.position = "top")

# Create a line chart showing change in catch over time
species_plot <-
  market_data %>%
  pivot_longer(cols = -c(1:4, 13), names_to = "species", values_to = "count") %>% 
  ggplot(aes(x = year, y = count, color = species, fill = species)) +
  stat_summary(fun = "mean") + 
  stat_summary(fun = "mean", geom = "line") + 
  stat_summary(fun.data = "mean_cl_boot", geom = "ribbon", alpha = 0.2, color = NA) +
  scale_x_continuous(breaks = 2014:2023, expand = c(0,0)) +
  scale_fill_brewer(palette = "Set2", name = "Species") +
  scale_color_brewer(palette = "Set2", name = "Species") +
  labs(title = "Species-specific total catch over time", 
       x = "Year", y = "Total Catch (count)") +
  theme_bw() +
  theme(legend.position = "top")

temporal_plots <- wrap_plots(fishing_effort_plot, species_plot)

temporal_plots



# Let's integrate the fishing effort into this plot
# calculate catch per unit effort (numbers per boat)
cpue_summary <-
  market_data %>%
  group_by(year) %>%
  summarise(across("Blue Shark":"Whitespotted Guitarfish", ~.x/number_of_boats)) %>%
  pivot_longer(cols = -year, names_to = "species", values_to = "cpue")
  
cpue_plot_species <-
  cpue_summary %>% 
  ggplot(aes(x = year, y = cpue, color = species, fill = species)) +
  stat_summary(fun = "mean") +
  stat_summary(fun = "mean", geom = "line") +
  stat_summary(fun.data = "mean_cl_boot", geom = "ribbon", alpha = 0.2, color = NA) +
  scale_y_continuous(name = "Catch per unit effort (count per boat)") +
  scale_x_continuous(breaks = 2014:2023, expand = c(0,0)) +
  scale_fill_brewer(palette = "Set2", name = "Species") +
  scale_color_brewer(palette = "Set2", name = "Species") +
  labs(title = "Catch per unit effort per species over time",
       x = "Year") +
  theme_bw() +
  theme(legend.position = "top")

cpue_plot_species



This analysis reveals consistently high catch rates for Whitespotted Guitarfish over time, while catch rates for Hammerhead Sharks decline, a trend that could inform conservation priorities.






Example 2: Satellite Tracking in the Philippines

Satellite tags allow us to follow animals in near real-time. Here, we’ll map and animate the movements of three different sharks tagged and tracked in the Philippines.


Our Goals:

  • Create a static map showing the full tracks of all three sharks.

  • Create static maps for fishing effort and Marine Protected Areas.

  • Create an animation of their movements over time.




Step 2.1: Load Libraries

# Load libraries to visualise and make animations of maps
library(gganimate)
library(gifski) 
library(ggforce)
library(terra)
library(mapview)
library(leaflet)
library(aniMotum)
library(rnaturalearthhires)




Step 2.2: Static Map of All Shark Tracks

We will use some mock tracking data, together with information on fishing effort and Marine Protected Areas. The fishing effort data was derived from the Global Fishing Watch dataset.

# Let's import some mock tracking data
all_tracks_sf <- st_read("https://raw.githubusercontent.com/vinayudyawer/2025_Palawan_BCC/refs/heads/main/Data/track_data.geojson")
## Reading layer `track_data' from data source 
##   `https://raw.githubusercontent.com/vinayudyawer/2025_Palawan_BCC/refs/heads/main/Data/track_data.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 539 features and 10 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 118.1425 ymin: 8.063707 xmax: 127.9466 ymax: 15.42223
## Geodetic CRS:  WGS 84
head(all_tracks_sf)
## Simple feature collection with 6 features and 10 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 118.6513 ymin: 9.701957 xmax: 118.904 ymax: 9.927064
## Geodetic CRS:  WGS 84
##            id       date      lon      lat lc smaj smin eor lonerr laterr
## 1 Whale Shark 2023-01-01 118.9040 9.755130  G   NA   NA  NA     NA     NA
## 2 Whale Shark 2023-01-02 118.7128 9.927064  G   NA   NA  NA     NA     NA
## 3 Whale Shark 2023-01-03 118.6513 9.701957  G   NA   NA  NA     NA     NA
## 4 Whale Shark 2023-01-04 118.7589 9.755544  G   NA   NA  NA     NA     NA
## 5 Whale Shark 2023-01-05 118.7590 9.865780  G   NA   NA  NA     NA     NA
## 6 Whale Shark 2023-01-06 118.8158 9.882642  G   NA   NA  NA     NA     NA
##                    geometry
## 1   POINT (118.904 9.75513)
## 2 POINT (118.7128 9.927064)
## 3 POINT (118.6513 9.701957)
## 4 POINT (118.7589 9.755544)
## 5   POINT (118.759 9.86578)
## 6 POINT (118.8158 9.882642)
# Now let's import spatial data on marine parks and fishing pressure
mpa_sf <- st_read("https://raw.githubusercontent.com/vinayudyawer/2025_Palawan_BCC/refs/heads/main/Data/MPA_data.geojson")
## Reading layer `MPA_data' from data source 
##   `https://raw.githubusercontent.com/vinayudyawer/2025_Palawan_BCC/refs/heads/main/Data/MPA_data.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 86 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 116.8727 ymin: 5.498214 xmax: 126.7402 ymax: 21.18099
## Geodetic CRS:  WGS 84
head(mpa_sf)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 119.1222 ymin: 8.530928 xmax: 124.3418 ymax: 16.3624
## Geodetic CRS:  WGS 84
##                                               NAME MARINE     IUCN_CAT
## 1                     Hundred Island National Park      1 Not Assigned
## 2     Agoo-Damortis Protected Landscape & Seascape      2            V
## 3                            Apo Reef Natural Park      2           II
## 4 Initao-Libertad Protected Landscape and Seascape      2            V
## 5              Island of Alibijaban Wildeness Area      1 Not Assigned
## 6              Malampaya Sound Protected Landscape      1            V
##                          DESIG_ENG     STATUS      NO_TAKE
## 1                    National Park Designated Not Reported
## 2 Protected Landscape and Seascape Designated Not Reported
## 3                     Natural Park Designated Not Reported
## 4 Protected Landscape and Seascape Designated Not Reported
## 5                  Wilderness Area Designated Not Reported
## 6              Protected Landscape Designated Not Reported
##                         geometry
## 1 MULTIPOLYGON (((120.0598 16...
## 2 MULTIPOLYGON (((120.3403 16...
## 3 MULTIPOLYGON (((120.5636 12...
## 4 MULTIPOLYGON (((124.3376 8....
## 5 MULTIPOLYGON (((122.7239 13...
## 6 MULTIPOLYGON (((119.3251 11...
# Let's get some fishing effort data from Global Fishing Watch (already processed)
fishing_effort <- read_csv("https://raw.githubusercontent.com/vinayudyawer/2025_Palawan_BCC/refs/heads/main/Data/GFW_FishingEffort.csv")

head(fishing_effort)
## # A tibble: 6 × 4
##     lon   lat  year fishing_hours
##   <dbl> <dbl> <dbl>         <dbl>
## 1  114.   9.2  2024          8.32
## 2  114.   9.5  2024          2.52
## 3  114    9.2  2024          0.51
## 4  114    9.4  2024          2.1 
## 5  114.   8.8  2023          0.74
## 6  114.   9.5  2024          3.69



Let's plot a static map of the tracks, showing the position error for each tag.

# Create a non-spatial version for plotting geoms that need x/y aesthetics
all_tracks_df <- 
  all_tracks_sf %>%
  st_drop_geometry()

# Create the static map
static_track_map_with_error <-
  ggplot() +
  annotation_map_tile(type = esri_topo, zoom = 6) +
  # --- Add Core Track Data ---
  geom_spatial_path(data = all_tracks_sf, size = 1, crs = 4326,
                     aes(x = lon, y = lat, color = id)) +
  # --- Add Error Visualizations ---
  # 1. Tiger Shark: Error Ellipses
  # Note: converting meters to degrees is an approximation and varies with latitude.
  # 1 degree of latitude is ~111.32 km.
  geom_ellipse(data = filter(all_tracks_df, id %in% "Tiger Shark"),
               aes(x0 = lon, y0 = lat,
                   a = smaj / 111320,
                   b = smin / 111320,
                   angle = eor),
               fill = "firebrick", alpha = 0.2, color = NA) +
  # 2. Thresher Shark: Error Ellipses
  geom_ellipse(data = filter(all_tracks_df, id %in% "Thresher Shark"),
               aes(x0 = lon, y0 = lat, 
                   a = abs(lonerr), 
                   b = abs(laterr), 
                   angle = 0),
               fill = "forestgreen", alpha = 0.2, color = NA) +
  # Use facet_wrap to give each shark its own mini-plot with its track
  facet_wrap(~id, nrow = 1) +
  coord_sf(crs = 4326) +
  annotation_scale() +
  labs(title = "Satellite Tracks of Three Shark Species with Location Error",
       x = "Longitude", y = "Latitude") +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal() +
  theme(legend.position = "none")


static_track_map_with_error



Let's visualize the fishing effort and MPA data.

# process the fishing effort data to a summary raster
fishing_effort_summary <-
  fishing_effort %>% 
  group_by(lon, lat) %>% 
  summarise(fishing_hours = sum(fishing_hours, na.rm = T)) %>% 
  terra::rast()

# we have to add the spatial reference
crs(fishing_effort_summary) <- "epsg:4326"

# Let's plot the overall pattern in fishing effort
ggplot() +
  annotation_map_tile(type = esri_terrain, zoom = 5) +
  layer_spatial(fishing_effort_summary) +
  layer_spatial(mpa_sf, fill = "darkgreen", col = NA, alpha = 0.7) +
  coord_sf(xlim = range(fishing_effort$lon),
           ylim = range(fishing_effort$lat),
           crs = 4326) +
  scale_fill_gradientn(
    trans = 'log10',
    colors = viridisLite::plasma(100), 
    na.value = NA,
    labels = scales::comma) +
  labs(x = NULL, y = NULL, fill = "Fishing hours\n(2021-2024)", 
       caption = "Source: Global Fishing Watch") +
  theme_bw() +
  theme(legend.key.height = unit(2, "cm"))



# Let's plot fishing effort over time
fishing_over_time <-
  ggplot() +
  annotation_map_tile(type = esri_terrain, zoom = 5) +
  geom_tile(data = fishing_effort,
            aes(x = lon, y = lat, fill = fishing_hours)) +
  coord_sf(xlim = range(fishing_effort$lon),
           ylim = range(fishing_effort$lat),
           crs = 4326) +
  scale_fill_gradientn(
    trans = 'log10',
    colors = viridisLite::plasma(100), 
    na.value = NA,
    labels = scales::comma) +
  facet_wrap(~year, nrow = 1) +
  labs(x = NULL, y = NULL, fill = "Fishing hours", 
       caption = "Source: Global Fishing Watch") +
  theme_bw() +
  theme(legend.position = "bottom", legend.key.width = unit(2, "cm"))

fishing_over_time



# Interactive (but static) plot of the raw data with MPA and fishing effort data

# First, let's convert the points into trajectories
all_tracks_traj <-
  all_tracks_sf %>% 
  group_by(id) %>% 
  summarise(do_union = FALSE) %>% 
  st_cast("LINESTRING")

m <- 
  mapview(all_tracks_traj, zcol = "id", map.type = "Esri.WorldTerrain",
        burst = TRUE, legend = FALSE, homebutton = FALSE) +
  mapview(all_tracks_sf, zcol = "id", 
          burst = TRUE, cex = 3, legend = FALSE, homebutton = FALSE) +
  mapview(log10(fishing_effort_summary + 1), layer.name = "Fishing Effort",
          legend = FALSE, homebutton = FALSE, na.color = "transparent", alpha = 0.8) +
  mapview(mpa_sf, color = NA, col.regions = "darkgreen", col.alpha = 0.8, alpha = 0,
          layer.name = "MPA", legend = FALSE, homebutton = FALSE)

m@map %>% 
  addLayersControl(baseGroups = unique(all_tracks_df$id), 
                   overlayGroups = c("MPA", "Fishing Effort"),
                   options = layersControlOptions(collapsed = FALSE)) %>% 
  hideGroup(group = c("MPA", "Fishing Effort"))




Step 2.3: Using State-space models to refine tracking data

Let's use the location error information to clean up the tracks before we visualize them. We will use the aniMotum package developed by Dr. Ian Jonsen. This package provides a quick and easy way to refine positions by accounting for the error associated with each location, and to model the track so that positions can be predicted at fixed time intervals. For more details, you can explore the vignettes provided with the package, and also explore its applications in this paper.

There are three main movement processes that aniMotum provides:

  1. Random Walk (RW) models - where movements between positions are modeled to be random in direction and magnitude.

  2. Correlated Random Walk (CRW) model - where movements between positions are modeled as random and correlated in direction and magnitude.

  3. Continuous-time Move Persistence (MP) model - where movements between positions are modeled as random with correlations in direction and magnitude that vary in time.

You can explore the specifics of this package from the vignette here.


For our data, we will use the Correlated Random Walk model (the ‘crw’ option in the code). We can fit this model using the fit_ssm() function in aniMotum. The function can take some basic information on the dispersal ability of our study species (i.e., maximum swimming speed); if this isn't supplied it uses defaults, but the fit is more accurate if you provide it. We can also use this function to predict positions at fixed time intervals to standardize the data. Our mock data is already standardized (one position per day), so this isn't required here, but with real data this standardization step is usually necessary. The fitted model can then be piped into the route_path() function, which re-routes the paths around landmasses to make the tracks more realistic for marine animals.


# Let's fit a Correlated Random Walk model to the data and re-route the paths around land
fit <- 
  fit_ssm(all_tracks_df, 
          model = "crw") %>% 
  route_path(map_scale = 10)

## Note that map_scale = 10 is only available if you have the `rnaturalearthhires` package


Let's visualize and explore the outputs of the model:

## Let's have a look at the rerouted component of the model 

plot(fit, 
     what = "rerouted", ## what component of the model to plot ('fitted', 'predicted' or 'rerouted')
     type = 2, ## type of plot to make
     pages = 1, 
     ncol = 3)


In the above plots, the blue points are the original positions, the orange points are the model’s ‘refined’ positions, and the black ‘x’s are outliers identified by the model.


We can now use the grab() function to extract specific components of the model output for a closer look, or to produce our own maps using previously covered packages (e.g., ggspatial or mapview). The grab() function can also return the positional data as an sf object, which makes it easy to plot with other spatial packages.


## Let's plot our own version of the rerouted component using mapview
re_data <- grab(fit, what = "rerouted")

## Let's convert the dataset into points and paths using the `sf` package
re_point <-
  re_data %>% 
  st_as_sf(coords = c("lon", "lat"), crs = 4326, remove = F)

re_path <- 
  re_point %>% 
  group_by(id) %>% 
  summarise(do_union = FALSE) %>% 
  st_cast("LINESTRING")

## Now let's make a nice interactive plot of the rerouted data using `mapview` and `leaflet`

mm <- 
  mapview(re_path, zcol = "id", map.type = "Esri.WorldTerrain",
        burst = TRUE, legend = FALSE, homebutton = FALSE) +
  mapview(re_point, zcol = "id", 
          burst = TRUE, cex = 3, legend = FALSE, homebutton = FALSE) +
  mapview(log10(fishing_effort_summary + 1), layer.name = "Fishing Effort",
          legend = FALSE, homebutton = FALSE, na.color = "transparent", alpha = 0.8) +
  mapview(mpa_sf, color = NA, col.regions = "darkgreen", col.alpha = 0.8, alpha = 0,
          layer.name = "MPA", legend = FALSE, homebutton = FALSE)

mm@map %>% 
  addLayersControl(baseGroups = unique(re_path$id), 
                   overlayGroups = c("MPA", "Fishing Effort"),
                   options = layersControlOptions(collapsed = FALSE)) %>% 
  hideGroup(group = c("MPA", "Fishing Effort"))




Step 2.4: Animating the Shark Movements

Now for the fun part: we can use gganimate to bring the static map of the shark movements to life.

# Note: Rendering animations can be slow.
# Create the animation object
anim <-
  ggplot() +
  annotation_map_tile(type = esri_terrain, zoom = 7, progress = "none") +
  geom_spatial_point(data = re_data, size = 2, crs = 4326,
                     aes(x = lon, y = lat, color = id, group = id)) +
  geom_spatial_path(data = re_data, size = 1, crs = 4326, alpha = 0.5,
                     aes(x = lon, y = lat, color = id, group = id)) +
  coord_sf(crs = 4326) +
  labs(x = "Longitude", y = "Latitude", color = NULL) +
  scale_color_brewer(palette = "Dark2") +
  theme(legend.position = "top") +
  # the gganimate magic happens here!
  transition_reveal(date)



# Animate and save the gif
# This will save a file named 'shark_animation.gif' in your working directory
## ---------------------- CAUTION ---------------------- ##
# The rendering process may take a bit of time...
anim_save("shark_animation.gif", animation = anim, 
          width = 6, height = 5, units = "in", res = 150,
          fps = 20, duration = 15, end_pause = 10,
          renderer = gifski_renderer(loop = FALSE))
## ----------------------------------------------------- ##




Signoff!

This is where we end our analysis workshop! There may have been a few bits of code that you had trouble with or need more time to work through. I encourage you to discuss these with me as well as others at the workshop to help get a handle on the R code.


If you have any comments or queries regarding this workshop, feel free to contact me:

