CoronaNet External Data Harmonization

The CoronaNet Research Project aims to make its data on COVID-19 government policies as comprehensive as possible. To that end, since early 2021, we have been taking steps to harmonize data from other COVID-19 trackers, including ACAPS, COVIDAMP, CIHI, Johns Hopkins HIT-COVID, OxCGRT, WHO EURO, and CDC.

In total, we have identified around 150,000 observations from other datasets which could potentially be harmonized. At the end of this harmonization process, we will produce a single dataset that amalgamates existing information on government COVID-19 policies made until September 2021. Amalgamating here means, for example, removing duplicate information, cleaning observations, and harmonizing data to the same taxonomy. For more information, please see our working paper here.

Our methodology can be summarized as follows:

  1. Step 1: Create taxonomy maps between each of the external datasets and CoronaNet, which we make publicly available here. Based on these maps, map the data available in each external dataset into the CoronaNet taxonomy.

  2. Step 2: Perform basic cleaning and subsetting of external data to only those observations clearly relevant to existing CoronaNet data collection efforts.

  3. Step 3: Remove a portion of duplicated policies using customized automated algorithms, covering both duplication within each external dataset and duplication across the different external datasets.

  4. Step 4: Pilot our data harmonization efforts for a select few countries (over the summer of 2021).

  5. Step 5: Release the resulting curated external data to our community of volunteer research assistants, who (a) manually assess the overlap between the PHSM data found in the CoronaNet dataset and that found in the ACAPS, COVIDAMP, CIHI, Johns Hopkins HIT-COVID, OxCGRT, WHO EURO, and CDC datasets respectively, and (b) manually recode data found in the external datasets that were not already in the CoronaNet dataset into the CoronaNet taxonomy.
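
As an illustration of Steps 1 to 3, the following is a minimal Python sketch, not the project's actual code. The taxonomy map entries, the record fields (source, id, country, category, date_start), and the exact-match duplicate rule are all assumptions made for this example; the project's customized algorithms are not described here and are likely more involved.

```python
from typing import Dict, List, Tuple

# Hypothetical taxonomy map: (external dataset, external category) -> CoronaNet type.
# Illustrative entries only; the real maps are the ones published by the project.
TAXONOMY_MAP: Dict[Tuple[str, str], str] = {
    ("OxCGRT", "C1_School closing"): "Closure and Regulation of Schools",
    ("ACAPS", "Schools closure"): "Closure and Regulation of Schools",
    ("ACAPS", "Curfews"): "Curfew",
}


def map_to_coronanet(records: List[dict]) -> List[dict]:
    """Steps 1 and 2: attach a CoronaNet type and drop records with no mapping."""
    mapped = []
    for rec in records:
        key = (rec["source"], rec["category"])
        if key in TAXONOMY_MAP:  # unmapped categories are treated as out of scope
            mapped.append({**rec, "type": TAXONOMY_MAP[key]})
    return mapped


def deduplicate(records: List[dict]) -> List[dict]:
    """Step 3 (assumed rule): keep the first record per (country, type, date_start).

    The key ignores the source dataset, so it removes duplicates both within
    one external dataset and across different external datasets.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["country"], rec["type"], rec["date_start"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up example records, not real observations.
records = [
    {"source": "OxCGRT", "id": "ox-1", "country": "Germany",
     "category": "C1_School closing", "date_start": "2020-03-16"},
    {"source": "ACAPS", "id": "ac-7", "country": "Germany",
     "category": "Schools closure", "date_start": "2020-03-16"},
    {"source": "ACAPS", "id": "ac-9", "country": "Kenya",
     "category": "Curfews", "date_start": "2020-03-27"},
    {"source": "ACAPS", "id": "ac-12", "country": "Kenya",
     "category": "Economic measures", "date_start": "2020-04-01"},
]

harmonized = deduplicate(map_to_coronanet(records))
for rec in harmonized:
    print(rec["source"], rec["id"], rec["type"])
```

An exact-match key like this only catches a portion of duplicates, since the same policy may be recorded with different dates or wording across trackers. That is consistent with the manual assessment of overlap in Step 5.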

Where we are now:

With the help of hundreds of volunteers around the world, we are still working through Step 5 of the data harmonization process. Interested users can identify which policies were originally sourced from external datasets using the collab and collab_id columns in the raw event dataset available here.

  1. collab: records which external dataset, if any, the source information for the policy was found in.

  2. collab_id: records the original ID found in the external dataset, which users can use to match a given observation back to that dataset, assuming the policy identified by policy_id was sourced from an external dataset.
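
The following Python sketch shows one way these two columns could be used to pick out externally sourced policies and link them back to their original records. The sample rows are made up, and the way missing values are encoded (empty string or "NA") and the values stored in collab are assumptions; check the raw event dataset for the actual conventions.

```python
import csv
import io
from collections import Counter

# Made-up sample in the shape of the raw event dataset (only three columns shown).
SAMPLE_CSV = """policy_id,collab,collab_id
1001,,
1002,OxCGRT,ox-1
1003,ACAPS,ac-9
1004,NA,NA
"""

# Assumed encodings for "not sourced from an external dataset".
MISSING = {"", "NA"}


def externally_sourced(rows):
    """Return the rows whose collab column names an external dataset."""
    return [row for row in rows if row["collab"].strip() not in MISSING]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
external = externally_sourced(rows)

# Number of policies per external dataset.
counts = Counter(row["collab"] for row in external)

# Lookup from (external dataset, original ID) to the CoronaNet policy_id.
lookup = {(row["collab"], row["collab_id"]): row["policy_id"] for row in external}

print(counts)
print(lookup)
```

To run this against the real file, replace the io.StringIO(SAMPLE_CSV) argument with an opened file handle for the downloaded dataset.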

How we have done it:

Below, we make publicly available the taxonomy maps that we created during Step 1 of the harmonization process.

ACAPS - CoronaNet Taxonomy Map
CIHI - CoronaNet Taxonomy Map
COVIDAMP - CoronaNet Taxonomy Map
JHU HIT-COVID - CoronaNet Taxonomy Map
OxCGRT - CoronaNet Taxonomy Map
WHO PHSMs - CoronaNet Taxonomy Map