Data Sources & Codebook

Detailed Script.

Tarik Benmarhnia (https://profiles.ucsd.edu/tarik.benmarhnia), UCSD & Scripps Institute (https://benmarhniaresearch.ucsd.edu/)
Marie-Abèle Bind (https://scholar.harvard.edu/marie-abele), Biostatistics Center, Massachusetts General Hospital (https://biostatistics.massgeneral.org/faculty/marie-abele-bind-phd/)
Léo Zabrocki (https://lzabrocki.github.io/), RFF-CMCC EIEE (https://www.eiee.org/)
2022-11-22

In this document, we describe the data sources and provide the codebook of the variables.

Should you have any questions, please do not hesitate to contact us at

Required Packages and Data Loading

To reproduce exactly the data_sources_codebook.html document, we first need to have installed:

Once everything is set up, we load the following packages:

# load required packages
library(knitr) # for creating the R Markdown document
library(here) # for files paths organization
library(tidyverse) # for data manipulation and visualization
library(DT) # for displaying the data as tables

We finally load the data:

# load the data
data <-
  readRDS(here::here("inputs", "1.data", "environmental_data.rds"))

Data Sources

The dataset used in our tutorial was originally gathered for previous work by Benmarhnia et al. (2015).

Health Data

All non-accidental deaths that occurred in the summers (June, July and August) of 1990-2007 were retrieved for the island of Montreal, Canada. The Quebec life table for Montreal for the years 2000 to 2002 was used to compute the total number of years of life lost (YLL).
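To make the YLL computation concrete, here is a minimal sketch in base R. It assumes that each death contributes the remaining life expectancy at the age of death and that these contributions are summed by day. The life table values, the death records, and the column names (`age`, `life_expectancy`, `years_lost`) are invented for illustration and are not taken from the Quebec life table or from the original study.

```r
# Hypothetical life table: remaining life expectancy (years) by age at death.
# These values are made up for illustration; they are NOT the Quebec life table.
life_table <- data.frame(
  age = c(60, 70, 80, 90),
  life_expectancy = c(24, 16, 9, 4)
)

# Hypothetical individual death records (date and age at death)
deaths <- data.frame(
  date = as.Date(c("1990-06-01", "1990-06-01", "1990-06-02")),
  age = c(70, 90, 60)
)

# Attach the remaining life expectancy to each death by matching on age
deaths$years_lost <-
  life_table$life_expectancy[match(deaths$age, life_table$age)]

# Daily YLL: sum of the years lost over all deaths occurring on the same day
daily_yll <- aggregate(years_lost ~ date, data = deaths, FUN = sum)
names(daily_yll)[2] <- "yll"
daily_yll
```

With these toy records, the first day sums the contributions of two deaths and the second day holds a single contribution.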

Weather Data

Daily mean outdoor temperatures (°C) and daily relative humidity (%) were obtained for the period 1981–2010 from the Environment Canada meteorological observation station at the Montreal Pierre Elliott Trudeau International Airport. We defined a heat wave day as any day with a daily maximum temperature exceeding 30°C, which is the threshold that triggers the “active watch” level in the Montreal Heat Action Plan.
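The heat wave definition can be sketched as a simple indicator in base R. The toy temperatures below are illustrative only, and `heat_wave` is a hypothetical column name (see the codebook for the name used in the actual dataset). We read "exceeding" as strictly greater than 30°C.

```r
# Toy daily maximum temperatures (°C); values are illustrative only
weather <- data.frame(
  date = as.Date("2007-07-01") + 0:4,
  temperature_maximum = c(27.5, 30.0, 31.2, 33.8, 29.9)
)

# Threshold (°C) stated in the text for the "active watch" level
heat_wave_threshold <- 30

# `heat_wave` is a hypothetical column name:
# 1 if the daily maximum is strictly above the threshold, 0 otherwise
weather$heat_wave <-
  as.integer(weather$temperature_maximum > heat_wave_threshold)

weather
```

Note that a day at exactly 30.0°C is not flagged under this strict reading of the definition.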

Air Pollution Data

We retrieved air pollution concentrations from the National Air Pollution Surveillance network of fixed-site monitors in Montreal (https://www.ec.gc.ca/rnspa-naps/). We averaged hourly concentrations over all stations and calculated daily (and lagged) mean concentrations for ozone (O3) and nitrogen dioxide (NO2).
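The aggregation steps described above can be sketched as follows in base R. The station labels, readings, and the column name `o3_lag_1` are assumptions made for illustration; the actual NAPS files and the original processing code may be organized differently.

```r
# Toy hourly O3 readings from two hypothetical stations (A and B),
# two hours per day over three days; values are illustrative only
hourly <- data.frame(
  date = as.Date(rep(c("2007-07-01", "2007-07-02", "2007-07-03"), each = 4)),
  hour = rep(c(0, 0, 1, 1), times = 3),
  station = rep(c("A", "B"), times = 6),
  o3 = c(20, 30, 40, 50, 10, 20, 30, 40, 25, 35, 45, 55)
)

# Step 1: average the hourly concentrations over all stations
city_hourly <- aggregate(o3 ~ date + hour, data = hourly, FUN = mean)

# Step 2: average the city-wide hourly values within each day
daily <- aggregate(o3 ~ date, data = city_hourly, FUN = mean)
daily <- daily[order(daily$date), ]

# Step 3: one-day lag (hypothetical column name `o3_lag_1`).
# Shifting by position assumes consecutive days with no gaps.
daily$o3_lag_1 <- c(NA, head(daily$o3, -1))

daily
```

Because the lag is computed by position, a dataset restricted to summer months would need the lag computed before filtering, or within each year, so that the first day of June does not inherit a value from the previous August.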

Data Used in our Tutorial

The final dataset contains 1376 daily observations for the summers of the 1990-2007 period and 23 variables. Over that period, 122 heat waves occurred. Below are summary statistics for the variables:

# summary statistics (mean, SD, min, max) for the main variables
data %>%
  dplyr::select(yll, temperature_average, temperature_maximum, humidity_relative:o3, no2) %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "value") %>%
  group_by(Variable) %>%
  summarise(Mean = mean(value),
            SD = sd(value),
            Min = min(value),
            Max = max(value)) %>%
  # round all statistics to one decimal place
  mutate(across(Mean:Max, ~ round(.x, 1))) %>%
  kable(align = c("l", "c", "c", "c", "c"))
Variable               Mean      SD      Min      Max
humidity_relative      68.7    10.4     38.5     95.8
no2                    25.3     8.6      2.3     62.4
o3                     25.5    11.7      0.8     76.1
temperature_average    20.4     3.3      9.6     29.2
temperature_maximum    24.9     3.8     12.0     35.4
yll                  2661.1   503.7   1075.7   5208.3

Codebook

We load below the codebook of the data:

# load the codebook
read.csv(here::here("inputs", "1.data", "codebook.csv"), sep = ";") %>%
  datatable(.)

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.