JHU HIT-COVID - CoronaNet Taxonomy Map
0 Introduction
This document maps the taxonomy that the Johns Hopkins Health Intervention Tracking for COVID-19 (HIT-COVID) dataset uses to document government policies made in response to COVID-19 onto the CoronaNet Research Project taxonomy. Each section covers one general area of the mapping, and each sub-section provides further detail as necessary. Each explanation of how a mapping is conceptualized is followed by the R code that operationalizes it. Please refer to the HIT-COVID Data Dictionary and Codebook, accessible through their GitHub, and the CoronaNet Codebook for more information on the respective taxonomies.
You can access (i) the original version of the HIT-COVID dataset, “hit-covid-longdata.csv”, and (ii) the version transformed into the CoronaNet taxonomy, “hit_coronanet_map_2b.csv”, from the CoronaNet public git repo. The rest of this document details how this transformation was implemented.
1 Setup
To replicate this taxonomy mapping exercise, users will need to load the following R packages and read in the original HIT-COVID data.
library(readr)
library(dplyr)
library(magrittr)
library(tidyr)
library(here)
'%!in%' <- function(x, y)
! ('%in%'(x, y))
hit = read_csv(here("data", "collaboration", "jhu", "hit-covid-longdata.csv"))
2 Data Preparation
- To save some time later on in the mapping process, a new column cat is added to the HIT-COVID dataset. This column is a cleaned version of the unique_id column and represents the policy type of an observation. It is used throughout the mapping to filter for each type of policy.
- The column saves time because it carries the information held in intervention_group and intervention_name, which are essentially the subcategories of the dataset; e.g. the unique id ‘3_screening_air’ tells us that the intervention_group is ‘symp_screening’ and the intervention name is ‘Symptom screening when entering by air’.
- A record_id is also extracted to group certain policies together later on.
- The HIT-COVID dataset captures information about compliance with the policies in the variable required. This is changed to match the CoronaNet taxonomy, with ‘required’ becoming ‘Mandatory (Unspecified/Implied)’ and ‘recommended’ becoming ‘Voluntary/Recommended but No Penalties’, and added to the map.
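As a quick check of the cleaning steps in this section, the following sketch applies the same gsub calls to a single ID, the ‘3_screening_air’ example mentioned above. The variable names toy_id, cat_step and toy_record_id are illustrative only and are not part of the mapping code.

```r
# Toy ID taken from the example above; not read from the real dataset
toy_id <- "3_screening_air"

cat_step <- gsub("[0-9]", "", toy_id)  # drop digits           -> "_screening_air"
cat_step <- gsub("^.", "", cat_step)   # drop first character   -> "screening_air"
cat_step <- gsub("_", " ", cat_step)   # underscores to spaces  -> "screening air"

# Keep only the text before the first underscore
toy_record_id <- gsub("\\_.*", "", toy_id)

print(cat_step)
print(toy_record_id)
```

The resulting cat value (‘screening air’) is the form that later filters such as cat == "screening air" rely on.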
hit$cat = gsub('[0-9]','', hit$unique_id)
hit$cat = gsub('^.','', hit$cat)
hit$cat = gsub('_',' ', hit$cat)
hit$record_id = gsub('\\_.*','', hit$unique_id)
3 Map Creation
The following code creates a map to translate the HIT-COVID data to the CoronaNet taxonomy. Where there is a straightforward one-to-one relationship between the two taxonomies, the variables are mapped directly:
- The HIT-COVID unique_id variable allows each unique observation to be identified. This is conceptually the same as CoronaNet’s record_id variable.
- The HIT-COVID details variable is a close approximation of CoronaNet’s description variable. The main difference is that (at least in theory) CoronaNet’s description variable must always contain certain information (the policy initiator, the type of policy, the date the policy started, and if applicable: the geographic target of the policy, the demographic target of the policy and the end date of the policy), while the HIT-COVID details variable does not appear to capture the same amount of information consistently. As such, it will likely be necessary to back-code this information for observations in the HIT-COVID dataset that are not in the CoronaNet dataset.
- The HIT-COVID date_of_update variable is a good match for the date_start variable in the CoronaNet taxonomy, which captures when a policy was implemented.
- The HIT-COVID country_name and country variables, which document the name and ISO code of the initiating country of a policy respectively, are direct matches for the country and ISO_A3 variables in the CoronaNet taxonomy.
- The HIT-COVID admin1_name variable, which documents the province that a policy initiates from, is a direct match for the province variable in the CoronaNet taxonomy.
- The HIT-COVID url variable, which captures the URL link for the raw source of information on which the policy is based, is a direct match for the link variable in the CoronaNet taxonomy.
- The HIT-COVID source_document_url variable, which captures the PDF link for the raw source of information on which the policy is based, is a direct match for the pdf_link variable in the CoronaNet taxonomy.
- The HIT-COVID entry_time variable, which captures when a policy was recorded, is a direct match for the recorded_date variable in the CoronaNet taxonomy.
hit_coronanet_map = data.frame(unique_id = hit$unique_id,
entry_type = NA,
correct_type= NA,
update_type= NA,
update_level= NA,
description= hit$details,
date_announced= NA,
date_start= hit$date_of_update,
date_end= NA,
country = hit$country_name,
ISO_A3 = hit$country,
ISO_A2 = NA,
init_country_level= NA,
domestic_policy= NA,
province = NA,
city= NA,
type= NA,
type_sub_cat= NA,
type_2 = NA,
type_text= NA,
institution_status= NA,
target_country= NA,
target_geog_level= NA,
target_region= NA,
target_province= NA,
target_city= NA,
target_other= NA,
target_who_what= NA,
target_who_gen = NA,
target_direction= NA,
travel_mechanism= NA,
type_mass_gathering= NA,
institution_cat= NA,
compliance= NA,
enforcer= NA,
index_high_est= NA,
index_med_est= NA,
index_low_est= NA,
index_country_rank= NA,
pdf_link = hit$source_document_url,
link = hit$url,
date_updated = NA,
recorded_date = hit$entry_time)
3.1 Countries
The following code adjusts for the different ways each tracker documents policies originating from certain regions of the world. In particular:
- HIT-COVID considers Puerto Rico as a country while CoronaNet considers it to be a province of the United States. The following code adjusts this data accordingly.
country = hit %>%
mutate(
ISO_A3 =
case_when(
country_name == 'Puerto Rico' ~ 'USA',
TRUE~ country
),
country =
case_when(
country_name == 'Puerto Rico' ~ 'United States of America',
TRUE ~ country_name
)
) %>%
select(country, ISO_A3, unique_id)
hit_coronanet_map = rows_update(hit_coronanet_map, country, by = 'unique_id')
3.1 Provinces
The following code adjusts for the different ways each tracker documents policies originating from certain regions of the world. In particular:
- HIT-COVID considers Puerto Rico as a country while CoronaNet considers it to be a province of the United States. The following code adjusts this data accordingly.
- HIT-COVID considers Taiwan as a province while CoronaNet considers it to be a country. The following code adjusts this data accordingly.
prov = hit %>%
mutate(
province =
case_when(
admin1_name == 'Taiwan' ~ as.character(NA),
country_name == 'Puerto Rico' ~ 'Puerto Rico',
TRUE ~ admin1_name
)
) %>%
select(province, unique_id)
hit_coronanet_map = rows_update(hit_coronanet_map, prov, by = 'unique_id')
2.5 National Entry
The init_country_level variable in the CoronaNet taxonomy captures which level of government a COVID-19 policy originates from, which the HIT-COVID taxonomy does not directly document. However, the HIT-COVID variable national_entry does record whether a policy was initiated at the national level or not. In the following code:
- If the national_entry variable in the HIT-COVID taxonomy takes a value of Yes, we map init_country_level in the CoronaNet taxonomy to take the value of National.
- If the national_entry variable takes a value of No and the policy is documented as applying to a province, as noted by having a value for the admin1_name variable, we map init_country_level to take the value of Provincial.
- If the national_entry variable takes a value of No, the policy is not documented as applying to a province (it has no value for the admin1_name variable) and it is documented as applying to a US county, as noted by having a value for the usa_county_code variable, we map init_country_level to take the value of Other (e.g., county).
- If an observation in the HIT-COVID data has no value for the national_entry, admin1_name and usa_county_code variables, we map init_country_level to take the value of National.
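To make the four rules above concrete, here is a minimal sketch that applies the same case_when logic to four hypothetical rows, one per rule. The toy values (‘Province A’, ‘00000’) are invented for illustration and do not come from the HIT-COVID data.

```r
library(dplyr)

# One hypothetical observation per rule described above
toy <- tibble(
  national_entry  = c("Yes", "No", "No", NA),
  admin1_name     = c(NA, "Province A", NA, NA),
  usa_county_code = c(NA, NA, "00000", NA)
)

toy_init <- toy %>%
  mutate(
    init_country_level = case_when(
      national_entry == "Yes" ~ "National",
      national_entry == "No" & !is.na(admin1_name) ~ "Provincial",
      national_entry == "No" & is.na(admin1_name) & !is.na(usa_county_code) ~ "Other (e.g., county)",
      is.na(national_entry) & is.na(admin1_name) & is.na(usa_county_code) ~ "National",
      TRUE ~ NA_character_
    )
  )

print(toy_init$init_country_level)
```

Any combination not covered by the four rules (for example, a No entry with neither a province nor a county code) falls through to NA and would need to be resolved manually.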
init_gov = hit %>%
mutate(
init_country_level = case_when(
national_entry == 'Yes' ~ 'National',
national_entry == 'No' & !is.na(admin1_name) ~ 'Provincial',
national_entry == 'No' & is.na(admin1_name) & !is.na(usa_county_code) ~ "Other (e.g., county)",
is.na(national_entry) & is.na(admin1_name) & is.na(usa_county_code) ~ 'National',
TRUE ~ as.character(NA)
)
) %>% select(init_country_level, unique_id)
hit_coronanet_map = rows_patch(hit_coronanet_map, init_gov, by = 'unique_id')
2.6 Date of Update
HIT-COVID’s date_of_update variable, together with its update variable, documents whether there has been an update to a policy within a set of policies grouped together by the record_id variable. This information allows us to map whether a policy should be considered a New Entry or an Update within a given group of policies, which is documented in the entry_type variable in the CoronaNet taxonomy.
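The sketch below illustrates this rule on three hypothetical rows: within a group, the earliest row flagged ‘Update’ is treated as the New Entry and later ones as Updates, while ‘No Update’ rows are always New Entries. The IDs and dates are invented for illustration.

```r
library(dplyr)

# Hypothetical observations: record 7 has two dated updates, record 9 has none
toy <- tibble(
  unique_id          = c("7_mask", "7_mask_b", "9_office"),
  record_id          = c("7", "7", "9"),
  intervention_group = c("mask", "mask", "office"),
  update             = c("Update", "Update", "No Update"),
  date_of_update     = as.Date(c("2020-04-01", "2020-05-01", "2020-04-15"))
)

toy_entry <- toy %>%
  arrange(date_of_update) %>%
  group_by(record_id, intervention_group) %>%
  mutate(
    entry_type = case_when(
      update == "Update" & !is.na(date_of_update) & row_number() == 1 ~ "New Entry",
      update == "Update" & !is.na(date_of_update) & row_number() != 1 ~ "Update",
      update == "No Update" ~ "New Entry",
      TRUE ~ NA_character_
    )
  ) %>%
  ungroup()

print(toy_entry[, c("unique_id", "entry_type")])
```

Because row_number() is evaluated after arrange() and within each group, the ordering by date is what decides which row counts as the first entry.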
hit_entry = hit %>%
arrange(date_of_update) %>%
dplyr::group_by(record_id, intervention_group) %>%
dplyr::mutate(
entry_type = case_when(
update == 'Update' & !is.na(date_of_update) & row_number()==1 ~ 'New Entry',
update == 'Update'& !is.na(date_of_update) & row_number()!=1 ~ 'Update',
update == 'No Update'~ 'New Entry',
TRUE ~ as.character(NA)
)) %>% ungroup %>%
select(entry_type, unique_id)
hit_coronanet_map = rows_patch(hit_coronanet_map, hit_entry, by = 'unique_id')
2.7 Required/Compliance
The HIT-COVID taxonomy documents whether a policy is mandatory or recommended in its required variable. In the following code:
- If the required variable in the HIT-COVID taxonomy takes a value of required, we map compliance in the CoronaNet taxonomy to take the value of Mandatory (Unspecified/Implied). There may be some mis-mappings here that will need to be adjusted downstream in the manual harmonization process.
- If the required variable in the HIT-COVID taxonomy takes a value of recommended, we map compliance in the CoronaNet taxonomy to take the value of Voluntary/Recommended but No Penalties.
hit_compliance = hit %>%
mutate(compliance = case_when(
required == 'required' ~ 'Mandatory (Unspecified/Implied)',
required == 'recommended' ~ 'Voluntary/Recommended but No Penalties',
)
)%>%
select(compliance, unique_id)
hit_coronanet_map = rows_patch(hit_coronanet_map, hit_compliance, by = 'unique_id')
4 Policy Type
The following mapping exercise is implemented by creating a data frame for each of the HIT-COVID categories. These categories have been extracted from HIT-COVID’s unique_ids and stored in the cat column. Each data frame is populated with as many values as possible, either by adding them manually (based on the HIT-COVID codebook and the knowledge that a given type of policy always shares a common value in the CoronaNet taxonomy) or by extracting them from existing HIT-COVID variables. After each data frame is populated, it is added to the overall map.
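As a rough illustration of how a per-category data frame might be added to the overall map, the sketch below contrasts dplyr’s rows_patch, which only fills values that are currently NA, with rows_update, which overwrites existing values. The objects toy_map and toy_cat are hypothetical stand-ins, and the columns are created as character (NA_character_) so that the types of both data frames match.

```r
library(dplyr)

# Hypothetical miniature of the overall map
toy_map <- data.frame(
  unique_id = c("1_mask", "2_office", "3_store"),
  type = c(NA_character_, "Existing value", NA_character_),
  stringsAsFactors = FALSE
)

# Hypothetical per-category data frame
toy_cat <- data.frame(
  unique_id = c("1_mask", "2_office"),
  type = c("Social Distancing", "Restriction and Regulation of Businesses"),
  stringsAsFactors = FALSE
)

# rows_patch: fills only the NA in row 1, keeps "Existing value" in row 2
patched <- rows_patch(toy_map, toy_cat, by = "unique_id")

# rows_update: overwrites both matched rows
updated <- rows_update(toy_map, toy_cat, by = "unique_id")

print(patched)
print(updated)
```

In both cases, rows of the map without a match in the category data frame (here ‘3_store’) are left unchanged.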
4.1 Closed Border
The following code maps HIT-COVID’s data on border closure policies to the CoronaNet taxonomy.
- Border policies are subset from the hit object into their own object called border.
- Two new variables are created, travel_mechanism and target_direction, to mirror the same variables in the CoronaNet data. travel_mechanism is populated by pulling information from the unique_id, e.g. ‘43_border_in_air’ first becomes ‘air’ and is later mutated to ‘Flights’ to match the CoronaNet taxonomy. target_direction is populated by pulling information from the intervention_name, e.g. ‘Border closures for entering by air’ contains the word ‘entering’; by filtering for ‘entering’ and ‘leaving’, either ‘Inbound’ or ‘Outbound’ is assigned as target_direction.
- The data in the border object is then transformed such that there is a unique observation for every border restriction implemented by a given country on a given day, regardless of the travel mechanism or target direction it applies to.
- The data is further processed in the border_match object to map as many options as possible to the CoronaNet taxonomy.
- Duplicate entries are removed from the raw hit object.
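The collapsing step described above can be sketched on a few hypothetical rows. Rows that share a record_id and required value are merged by pasting their distinct values together, after which distinct() leaves a single row per group. The toy IDs below are invented for illustration.

```r
library(dplyr)

# Hypothetical border rows: record 43 has two travel mechanisms, record 50 has one
toy_border <- tibble(
  record_id        = c("43", "43", "50"),
  required         = c("required", "required", "required"),
  unique_id        = c("43_border_in_air", "43_border_in_land", "50_border_out_sea"),
  travel_mechanism = c("air", "land", "sea")
)

collapsed <- toy_border %>%
  group_by(record_id, required) %>%
  mutate(
    # Every row in a group receives the same comma-separated string
    unique_id        = paste(unique(unique_id), collapse = ","),
    travel_mechanism = paste(unique(travel_mechanism), collapse = ",")
  ) %>%
  ungroup() %>%
  distinct()  # identical rows within a group collapse to one

print(collapsed)
```

Note that the collapsed values are joined with a comma, so any later matching on these columns needs to use the comma-separated form (e.g. ‘air,land’).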
border = hit %>%
filter(intervention_group == 'closed_border')
border = border %>%
mutate(
travel_mechanism= sub('.*_', '', unique_id),
target_direction =
case_when(
grepl("leaving", intervention_name) ~ "Outbound",
grepl("entering", intervention_name) ~ "Inbound",
TRUE ~ as.character(NA)
)
) %>%
arrange(intervention_name) %>%
group_by(record_id, required) %>%
mutate(
unique_id = paste(unique(unique_id), collapse = ','),
intervention_name = paste(unique(intervention_name), collapse = ','),
cat = paste(unique(cat), collapse = ','),
travel_mechanism = paste(unique(gsub('\\d','', travel_mechanism)), collapse = ','),
target_direction = paste(unique(target_direction), collapse = ','),
url = paste(unique(url), collapse = ','),
source_document_url = paste(unique(source_document_url), collapse = ',')
) %>%
ungroup() %>%
distinct %>%
group_by(
unique_id
) %>%
mutate(
count = 1:n(),
count = ifelse(count == 1, '', count),
unique_id = ifelse(count != '', paste0(unique_id,count), unique_id)
) %>%
ungroup
border_match = border %>%
select(unique_id, travel_mechanism, target_direction, status) %>%
mutate(type = 'External Border Restrictions',
type_2 = 'Quarantine',
travel_mechanism = case_when(
travel_mechanism == 'air' ~ 'Flights',
travel_mechanism == 'land' ~ 'Land Border,Trains,Buses',
travel_mechanism == 'sea' ~ 'Seaports,Cruises,Ferries',
travel_mechanism %in% c('air,land', 'land,air') ~ 'Flights,Land Border,Trains,Buses',
travel_mechanism %in% c('air,sea', 'sea,air') ~ 'Flights,Seaports,Cruises,Ferries',
travel_mechanism %in% c('air,land,sea', 'land,sea,air') ~ 'All kinds of transport',
TRUE ~ as.character(NA)
),
target_who_what = case_when(
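# target_direction was collapsed with ',' above, so match the comma-separated forms
target_direction %in% c("Inbound,Outbound", "Outbound,Inbound") ~ "All (Travelers + Residents)",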
target_direction == 'Inbound' ~ 'All Travelers (Citizen Travelers + Foreign Travelers)',
target_direction == 'Outbound' ~ 'All Residents (Citizen Residents + Foreign Residents)'
),
type_sub_cat = ifelse(status == 'closed', "Total border crossing ban", NA)
) %>%
select(-status)
hit = rbind(hit %>% filter(intervention_group != 'closed_border'), border %>% select(-travel_mechanism, -target_direction, -count))
4.2 Screenings
The following code maps how HIT-COVID captures screening policies to the CoronaNet taxonomy. The HIT-COVID screening policies concerning the screening of people within the borders of a country are too diverse to properly map to the CoronaNet taxonomy, so only screenings at the border are mapped here. The following code approximates this mapping with the understanding that downstream manual data harmonization will be able to provide more targeted mappings.
- The HIT-COVID taxonomy aims to capture such policies by coding the intervention_group as ‘symp_screening’ and the cat as anything other than ‘screening within’.
- The CoronaNet taxonomy aims to capture such policies by coding the type as External Border Restrictions, the type_sub_cat as Health Screenings (e.g. temperature checks), the target_who_what as All Travelers (Citizen Travelers + Foreign Travelers) and the target_who_gen as No special population targeted.
screening_border <- hit %>%
filter(intervention_group == 'symp_screening' &
cat != 'screening within') %>%
mutate( travel_mechanism = case_when(
cat == "screening air" ~ "Flights",
cat == "screening land" ~'Land Border,Trains,Buses',
cat == "screening sea" ~ 'Seaports,Cruises,Ferries')) %>%
group_by(record_id, required) %>%
mutate(
unique_id = paste(unique(unique_id), collapse = ','),
intervention_name = paste(unique(intervention_name), collapse = ','),
cat = paste(unique(cat), collapse = ','),
travel_mechanism = paste(unique(travel_mechanism), collapse = ','),
url = paste(unique(url), collapse = ','),
details = paste(unique(details), collapse = ','),
source_document_url = paste(unique(source_document_url), collapse = ',')
) %>%
ungroup() %>%
distinct %>%
group_by(
unique_id
) %>%
mutate(
count = 1:n(),
count = ifelse(count == 1, '', count),
unique_id = ifelse(count != '', paste0(unique_id,count), unique_id)
) %>%
ungroup
screening_match = screening_border %>%
mutate(type = 'External Border Restrictions',
type_2 = 'Quarantine',
type_sub_cat = 'Health Screenings (e.g. temperature checks)',
target_who_what = 'All Travelers (Citizen Travelers + Foreign Travelers)') %>%
select(unique_id, type, type_2, type_sub_cat, target_who_what, travel_mechanism)
hit = rbind(hit %>% filter(!c(intervention_group == 'symp_screening' &
cat != 'screening within')),
screening_border %>% select(-travel_mechanism, -count))
4.3 Contact Tracing
The following code maps how HIT-COVID captures contact tracing policies that apply to the entire population to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as contact tracing.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Health Monitoring, the type_sub_cat as Who a person has come into contact with over time, the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
contact <- hit %>%
filter(cat == 'contact tracing') %>%
select(unique_id) %>%
mutate(type = 'Health Monitoring',
type_2 = 'Quarantine',
type_sub_cat = 'Who a person has come into contact with over time',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)')
4.4 Emergency
The following code maps how HIT-COVID captures emergencies that apply to the entire population to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as emergency.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Declaration of Emergency, the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
emergency <- hit %>%
filter(cat == 'emergency') %>%
select(unique_id) %>%
mutate(type = 'Declaration of Emergency',
type_2 = 'External Border Restrictions',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
4.5 Enforcement
Unfortunately, it is not possible to map these policies to CoronaNet’s taxonomy as there is no close match to any of CoronaNet’s policy types. This bullet point is merely for completeness’ sake. Downstream manual harmonization will be necessary to properly harmonize these policies.
4.6 Entertainment
The following code maps how HIT-COVID captures closures of the entertainment industry which applies to the entire population to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as entertainment.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Restriction and Regulation of Businesses. We assume that the entertainment industry is not classified as essential in any country and as such we map the institution_cat as Non-Essential Businesses. We further map the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
entertainment <- hit %>%
filter(cat == 'entertainment') %>%
select(unique_id) %>%
mutate(type = 'Restriction and Regulation of Businesses',
type_2 = "Restrictions of Mass Gatherings",
institution_cat = 'Non-Essential Businesses',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
4.7 Isolation
The following code maps how HIT-COVID captures isolation and quarantine policies to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the intervention_group as ‘quar_iso’.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Quarantine. To map the different target populations, the HIT-COVID variable cat is used to distinguish between All Travelers (Citizen Travelers + Foreign Travelers) and All Residents (Citizen Residents + Foreign Residents).
isolation = subset(hit, hit$intervention_group == 'quar_iso')
names(isolation)[names(isolation) == 'cat'] <- 'target_who_what'
isolation <- isolation %>%
select(unique_id, target_who_what) %>%
mutate(type = 'Quarantine',
type_2 = 'External Border Restrictions',
target_who_what = case_when(
target_who_what == 'quar travel' ~ 'All Travelers (Citizen Travelers + Foreign Travelers)',
target_who_what != 'quar travel' ~ 'All Residents (Citizen Residents + Foreign Residents)',
TRUE ~ as.character(NA)
))
4.8 Limited Movement
The following code maps how HIT-COVID captures closures of internal borders which apply to the entire population to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as limit mvt.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Internal Border Restrictions, the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
limit_mvt <- hit %>%
filter(cat == 'limit mvt') %>%
select(unique_id) %>%
mutate(type = 'Internal Border Restrictions',
type_2 = 'Lockdown',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
4.9 Masks
The following code maps how HIT-COVID captures mask-wearing policies that apply to the entire population to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as mask.
- The CoronaNet taxonomy aims to capture such policies by coding the type as ‘Social Distancing’, the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
mask <- hit %>%
filter(cat == 'mask') %>%
select(unique_id) %>%
mutate(type = 'Social Distancing',
type_2 = "Restriction and Regulation of Businesses" ,
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
4.10 School
The following code maps how HIT-COVID captures school closure policies to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the intervention_group as school_closed.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Closure and Regulation of Schools.
- Depending on what type of school is closed, the policies need to be mapped to a different type_sub_cat (the subcategory) in the CoronaNet taxonomy. These can be preschools, primary schools, secondary schools or higher education institutions.
- Depending on the status of the schools (closed / partially closed / open), the policies are mapped to the variable which captures such information in the CoronaNet taxonomy: institution_status.
- The target_who_what variable in the CoronaNet taxonomy is set to All Residents (Citizen Residents + Foreign Residents), as we assume that school policies affect all residents.
- The school data frame needs to be cleaned up before joining it with the hit_coronanet_map. Additional code for this cleaning is included below.
school = hit %>% filter(intervention_group == 'school_closed')
school = school %>%
mutate(type = 'Closure and Regulation of Schools',
type_2 = NA,
type_sub_cat = case_when(
intervention_name == 'Nursery school closures' ~ 'Preschool or childcare facilities (generally for children ages 5 and below)',
intervention_name == 'Primary school closures' ~ 'Primary Schools (generally for children ages 10 and below)',
intervention_name == 'Secondary school closures' ~ 'Secondary Schools (generally for children ages 10 to 18)',
intervention_name == 'Post-secondary school closures' ~ 'Higher education institutions (i.e. degree granting institutions)'
),
institution_status = case_when(
intervention_name == 'Nursery school closures' & status == 'open' ~ 'Preschool or childcare facilities allowed to open with no conditions',
intervention_name == 'Nursery school closures' & status == 'partially closed' ~ 'Preschool or childcare facilities allowed to open with conditions',
intervention_name == 'Nursery school closures' & status == 'closed' ~ 'Preschool or childcare facilities closed/locked down',
intervention_name == 'Primary school closures' & status == 'open' ~ 'Primary Schools allowed to open with no conditions',
intervention_name == 'Primary school closures' & status == 'partially closed' ~ 'Primary Schools allowed to open with conditions',
intervention_name == 'Primary school closures' & status == 'closed' ~ 'Primary Schools closed/locked down',
intervention_name == 'Secondary school closures' & status == 'open' ~ 'Secondary Schools allowed to open with no conditions',
intervention_name == 'Secondary school closures' & status == 'partially closed' ~ 'Secondary Schools allowed to open with conditions',
intervention_name == 'Secondary school closures' & status == 'closed' ~ 'Secondary Schools closed/locked down',
intervention_name == 'Post-secondary school closures' & status == 'open' ~ 'Higher education institutions allowed to open with no conditions',
intervention_name == 'Post-secondary school closures' & status == 'partially closed' ~ 'Higher education institutions allowed to open with conditions',
intervention_name == 'Post-secondary school closures' & status == 'closed' ~ 'Higher education institutions closed/locked down',
),
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)' ) %>%
group_by(record_id, required) %>%
mutate(
unique_id = paste(unique(unique_id), collapse = ','),
intervention_name = paste(unique(intervention_name), collapse = ','),
cat = paste(unique(cat), collapse = ','),
type_sub_cat = paste(unique(type_sub_cat), collapse = ','),
institution_status = paste(unique(institution_status), collapse = ','),
url = paste(unique(url), collapse = ','),
source_document_url = paste(unique(source_document_url), collapse = ','),
details = paste(unique(na.omit(details)), collapse = ','),
status = paste(unique(status), collapse = ','),
status_simp = paste(unique(status_simp), collapse = ','),
subpopulation = paste(unique( subpopulation), collapse = ',')
) %>%
ungroup() %>%
# select(unique_id, type, type_2, type_sub_cat, institution_status, target_who_what, pdf_link, link) %>%
distinct %>%
group_by(
unique_id
) %>%
mutate(
count = 1:n(),
count = ifelse(count == 1, '', count),
unique_id = ifelse(count != '', paste0(unique_id,count), unique_id)
) %>%
ungroup
school_match = school %>%
select(unique_id, type, type_2, type_sub_cat, institution_status, target_who_what)
hit = rbind(hit %>% filter(intervention_group != 'school_closed'),
school %>% select(-type, -type_2, -type_sub_cat, -institution_status, -target_who_what, -count))
4.11 Nursing Homes
The following code maps how HIT-COVID captures policies regarding restrictions of nursing homes to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as nursing home.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Social Distancing, the type_sub_cat as Restrictions on visiting nursing homes/long term care facilities, the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
nursing_homes <- hit %>%
filter(cat == 'nursing home') %>%
select(unique_id) %>%
mutate(type = 'Social Distancing',
type_2 = 'Health Resources',
type_sub_cat = 'Restrictions on visiting nursing homes/long term care facilities',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
)
4.12 Offices
The following code maps how HIT-COVID captures office closure policies to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as office.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Restriction and Regulation of Businesses, the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
office <- hit %>%
filter(cat == 'office') %>%
select(unique_id) %>%
mutate(type = 'Restriction and Regulation of Businesses',
type_2 = 'Restriction and Regulation of Government Services',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
4.13 Public Space
The following code maps how HIT-COVID captures public space closure policies to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as public space.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Restriction and Regulation of Government Services, the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
public_space <- hit %>%
filter(cat == 'public space') %>%
select(unique_id) %>%
mutate(type = 'Restriction and Regulation of Government Services',
type_2 = 'Restrictions of Mass Gatherings',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
4.14 Public Transport
The following code maps how HIT-COVID captures restrictions of public transport to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as public transport.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Social Distancing, the type_sub_cat as Restrictions ridership of other forms of public transportation (please include details in the text entry), the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
public_transport <- hit %>%
filter(cat == 'public transport') %>%
select(unique_id) %>%
mutate(type = 'Social Distancing',
type_2 = NA,
type_sub_cat = 'Restrictions ridership of other forms of public transportation (please include details in the text entry)',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
4.15 Religion
The following code maps how HIT-COVID captures restrictions of religious gatherings to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as religion.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Restrictions of Mass Gatherings, the type_sub_cat as Attendance at religious services restricted (e.g. mosque/church closings), the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
- This category was combined with Leisure and Entertainment in the HIT-COVID dataset until 06/02/2020; older entries may therefore be missing in this mapping.
religion <- hit %>%
filter(cat == 'religion') %>%
select(unique_id) %>%
mutate(type = 'Restrictions of Mass Gatherings',
type_2 = NA,
type_sub_cat = 'Attendance at religious services prohibited (e.g. mosque/church closings)',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
4.17 Store
The following code maps how HIT-COVID captures restrictions and regulations of stores to the CoronaNet taxonomy.
- The HIT-COVID taxonomy aims to capture such policies by coding the cat as store.
- The CoronaNet taxonomy aims to capture such policies by coding the type as Restriction and Regulation of Businesses, the target_who_what as All Residents (Citizen Residents + Foreign Residents) and the target_who_gen as No special population targeted.
- To specify whether a store is essential or non-essential, two filters are applied to the details text. The first one filters for the word ‘essential’ and the second one for all variants of writing ‘non-essential’. This information is then saved in the institution_cat variable, which the CoronaNet taxonomy uses to make these distinctions.
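The two-filter logic can be sketched on a few invented detail strings. Because the word ‘essential’ is also contained in ‘non-essential’, both flags are needed: a store is only treated as essential when the first filter matches and the second does not. The example sentences are hypothetical and are not taken from the HIT-COVID data.

```r
# Hypothetical policy descriptions, lower-cased as in the mapping code
toy_details <- tolower(c(
  "Essential shops such as pharmacies remain open",
  "All non-essential stores must close",
  "Shops reopen with capacity limits"
))

essential_yes     <- grepl("essential", toy_details)
non_essential_yes <- grepl("non essential|non-essential| not essential", toy_details)

# Essential only when 'essential' appears without any 'non-essential' variant
institution_cat <- ifelse(
  essential_yes & !non_essential_yes,
  "Essential Businesses",
  "Non-Essential Businesses"
)

print(data.frame(toy_details, essential_yes, non_essential_yes, institution_cat))
```

One consequence worth keeping in mind is that descriptions mentioning neither term (the third toy string) default to Non-Essential Businesses, so such cases may need review during manual harmonization.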
store_closures <- hit %>%
  filter(cat == 'store') %>%
  # lowercase the free-text details so that the keyword filters ignore case
  mutate(details = tolower(details),
         essential_yes = grepl("essential", details),
         non_essential_yes = grepl("non essential|non-essential| not essential", details)) %>%
  select(unique_id, essential_yes, non_essential_yes) %>%
  mutate(type = 'Restriction and Regulation of Businesses',
         type_2 = NA,
         # essential only if 'essential' appears without a 'non-essential' variant
         institution_cat = case_when(
           essential_yes & !non_essential_yes ~ "Essential Businesses",
           TRUE ~ "Non-Essential Businesses"
         ),
         target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
         target_who_gen = 'No special population targeted') %>%
  # drop the helper flags once institution_cat has been derived
  select(-essential_yes, -non_essential_yes)
4.18 Testing
The following code maps how HIT-COVID captures testing policies to the CoronaNet taxonomy.
The HIT-COVID taxonomy captures such policies when cat is either testing asymp or testing symp.
The CoronaNet taxonomy captures such policies by coding type as Health Testing, target_who_what as All Residents (Citizen Residents + Foreign Residents) and target_who_gen as Asymptomatic people or Symptomatic people, depending on whether cat takes the value testing asymp or testing symp respectively.
testing <- hit %>%
filter(cat == 'testing asymp' |
cat == 'testing symp') %>%
mutate(type = 'Health Testing',
type_2 = 'Health Monitoring',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = ifelse(cat == 'testing asymp', 'Asymptomatic people', ifelse( cat == 'testing symp', 'Symptomatic people', NA ))) %>%
group_by(record_id, required) %>%
mutate(
unique_id = paste(unique(unique_id), collapse = ','),
intervention_name = paste(unique(intervention_name), collapse = ','),
intervention_group = paste(unique(intervention_group), collapse = ','),
cat = paste(unique(cat), collapse = ','),
url= paste(unique(url), collapse = ','),
source_document_url = paste(unique(source_document_url), collapse = ','),
testing_population = paste(unique(testing_population ), collapse = ','),
target_who_gen = paste(unique(target_who_gen ), collapse = ','),
status = paste(unique(status ), collapse = ','),
status_simp = paste(unique( status_simp), collapse = ','),
subpopulation = paste(unique(subpopulation), collapse = ','),
entry_time = max(entry_time)
) %>%
ungroup() %>%
distinct %>%
group_by(
unique_id
) %>%
mutate(
count = 1:n(),
count = ifelse(count == 1, '', count),
unique_id = ifelse(count != '', paste0(unique_id,count), unique_id)
) %>%
ungroup
testing_match = testing %>%
select(type, type_2, target_who_what, target_who_gen, unique_id)
hit = rbind(hit %>% filter(!(cat == 'testing asymp' | cat == 'testing symp')),
testing %>% select(-type, -type_2, -target_who_what, -target_who_gen, -count))
4.19 Restaurant
The following code maps how HIT-COVID captures restrictions and regulations of restaurants to the CoronaNet taxonomy.
In the HIT-COVID taxonomy, the restaurant_closed category can include restaurants, cafes, coffee shops, bars, and food vendors. These need to be coded separately in the CoronaNet taxonomy. Since it is not possible to filter out all of the different options systematically, the code focuses on distinguishing the two most commonly targeted types of establishment: restaurants and bars.
To separate restaurants from bars, the following steps are taken:
- A subset of the data is saved in the data frame rest_closures.
- The details of this data frame are converted to lowercase and stripped of all characters other than letters, digits and spaces to make the filtering process smoother.
- Four filters are implemented: one to detect restaurants, one to detect bars, and two to detect whether the details say anything about closing or opening these establishments. (The latter two are not used in this code but might be useful in the future.)
- A data frame restaurants is created using the restaurant filter (type_sub_cat is ‘Restaurants’).
- A data frame bars is created using the bars filter (type_sub_cat is ‘Bars’).
- The two data frames are combined into one data frame, restaurant_closures.
- A data frame other_business_closures is created by keeping all the unique_ids that do not appear in the restaurant_closures data frame. No type_sub_cat is added, to avoid false mappings. This data frame is added to the map.
The main difficulty with filtering for bars and restaurants is that some unique_ids appear twice, because some policies target both types of establishment and both filters are therefore ‘TRUE’. Adding the restaurant_closures data frame to the map, however, requires each unique_id to appear only once.
- To solve this issue the data frame is split into two: restaurant_closures_non_dup, containing the first occurrence of each unique_id, and restaurant_closures_dup, containing the repeated occurrences.
- The restaurant_closures_non_dup data frame is added to the map.
- The remaining data frame contains only repeated entries, which differ from the rows already added only in the establishment type (type_sub_cat and institution_status). The unique_ids of restaurant_closures_dup are therefore used to extract the matching policies from hit_coronanet_map, and these are stored in the dup_fill data frame.
- The dup_fill rows are updated with the restaurant_closures_dup rows (changing only the establishment-type fields).
- The resulting dup_fill data frame is added to the map.
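The splitting step can be illustrated on a toy example. The ids and data frames below are hypothetical; the sketch only shows how duplicated() separates the first and the repeated occurrences of an id, and how the two labels can later be collapsed into a single value per id.

```r
# Hypothetical ids; not taken from the HIT-COVID data.
bars <- data.frame(unique_id = c("a1", "a2"), type_sub_cat = "Bars",
                   stringsAsFactors = FALSE)
restaurants <- data.frame(unique_id = c("a2", "a3"), type_sub_cat = "Restaurants",
                          stringsAsFactors = FALSE)

# "a2" targets both bars and restaurants, so it appears twice after binding.
both <- rbind(bars, restaurants)

# duplicated() flags every occurrence of an id after the first one.
non_dup <- both[!duplicated(both$unique_id), ]
dup     <- both[duplicated(both$unique_id), ]

# Collapse the labels per id, as the final mapping step does with paste(unique(...)).
collapsed <- tapply(both$type_sub_cat, both$unique_id,
                    function(x) paste(unique(x), collapse = ","))

print(non_dup)
print(dup)
print(collapsed)
```

Which label ends up in the non-duplicated part depends only on the order in which the two data frames are bound; the collapsed value contains both labels either way.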
rest_closures = subset(hit, hit$cat %in% c('restaurant closed', 'restaurant reduced'))
rest_closures$details <- tolower(rest_closures$details)
rest_closures$details = gsub("[^A-Za-z0-9 ]","", rest_closures$details)
rest_closures = rest_closures %>%
mutate( restaurants_yes = grepl( c("restaurant"), rest_closures$details) ,
bars_yes = grepl(c("bar|pub |pubs"), rest_closures$details),
open_yes = grepl(c("open"), rest_closures$details),
close_yes = grepl(c("close|suspend"), rest_closures$details))
restaurants <- rest_closures %>%
filter(restaurants_yes==T) %>%
select(unique_id, cat) %>%
mutate(type = 'Restriction and Regulation of Businesses',
type_2 = NA,
type_sub_cat = 'Restaurants',
institution_cat = 'Non-Essential Businesses',
institution_status = ifelse(cat == 'restaurant closed', 'This type of business ("Restaurants") is closed/locked down',
'This type of business ("Restaurants") is allowed to open with conditions'),
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
bars = rest_closures %>%
filter(bars_yes==T) %>%
select(unique_id, cat) %>%
mutate(type = 'Restriction and Regulation of Businesses',
type_sub_cat = 'Bars',
type_2 = NA,
institution_cat = 'Non-Essential Businesses',
institution_status = ifelse(cat == 'restaurant closed', 'This type of business ("Bars") is closed/locked down',
'This type of business ("Bars") is allowed to open with conditions'),
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
restaurant_closures = rbind(bars, restaurants) %>% select(-cat)
other_business_closures = anti_join(rest_closures, restaurant_closures, "unique_id")
other_business_closures = other_business_closures %>%
select(unique_id) %>%
mutate(type = 'Restriction and Regulation of Businesses',
institution_cat = 'Non-Essential Businesses',
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
target_who_gen = 'No special population targeted')
4.20 Confinement
The following code maps how HIT-COVID captures a lockdown that applies to the entire population to the CoronaNet taxonomy.
The HIT-COVID taxonomy captures such policies when cat is confinement.
Since a partially restricted status could mean a curfew or that a special population or geographic area is targeted, only the policies that fully restrict the entire population are mapped as lockdown policies. In addition, the filter in the code keeps entries whose details mention a lockdown or a stay-at-home order, and drops entries that belong to the school_closed intervention group.
The CoronaNet taxonomy captures such policies by coding type as Lockdown, target_who_what as All Residents (Citizen Residents + Foreign Residents) and target_who_gen as No special population targeted.
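A note on how the filter used for this mapping is evaluated: R evaluates & before |, so a condition of the form A & B | K is read as (A & B) | K. The following sketch, on made-up rows, shows how this differs from the narrower reading A & (B | K). The toy data and object names are assumptions for illustration.

```r
# Made-up rows for illustration only.
toy <- data.frame(
  cat     = c("confinement", "confinement", "school"),
  status  = c("fully restricted", "partially restricted", "fully restricted"),
  details = c("national order", "stay at home after 8pm", "lockdown of schools"),
  stringsAsFactors = FALSE
)

a <- toy$cat == "confinement"
b <- toy$status == "fully restricted"
k <- grepl("ockdown|tay at home", toy$details)

implicit <- a & b | k     # how R parses the condition without parentheses
explicit <- (a & b) | k   # the same condition with the grouping made visible
narrow   <- a & (b | k)   # an alternative reading, restricted to confinement rows

print(data.frame(toy$cat, implicit, explicit, narrow))
```

In this toy data the third row is kept by the first two forms because of the keyword match alone, even though its cat is not confinement.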
confinement_all <- hit %>%
  mutate(details = tolower(details)) %>%
  # `&` binds more tightly than `|` in R, so the keyword match applies
  # regardless of cat, status and subpopulation; the parentheses make this explicit
  filter((cat == 'confinement' &
            status == 'fully restricted' &
            subpopulation == 'entire population') |
           grepl("ockdown|tay at home|tay-at-home", details)) %>%
  filter(intervention_group != 'school_closed') %>%
  select(unique_id) %>%
  mutate(type = 'Lockdown',
         type_2 = 'Social Distancing',
         target_who_what = 'All Residents (Citizen Residents + Foreign Residents)',
         target_who_gen = 'No special population targeted')
4.21 Curfew
The following code maps how HIT-COVID captures a curfew to the CoronaNet taxonomy.
The HIT-COVID taxonomy captures such policies when the cat variable takes the value confinement and the details field references a curfew.
The CoronaNet taxonomy captures such policies by coding type as Curfew, target_who_what as All Residents (Citizen Residents + Foreign Residents) and target_who_gen as No special population targeted.
confinement_curfew <- hit %>%
filter(cat == 'confinement' &
grepl('urfew', details)) %>%
filter( intervention_group != 'school_closed') %>%
select(unique_id) %>%
mutate(type = 'Curfew',
type_2 = NA,
target_who_what = 'All Residents (Citizen Residents + Foreign Residents)')
4.22 Other Confinement
The following code maps how HIT-COVID captures policies that are likely curfew, quarantine, or lockdown policies to the CoronaNet taxonomy.
The HIT-COVID taxonomy captures such policies when cat is confinement and the entry has not already been mapped in the sections above.
Because there was no systematic way to map such policies one to one, they are mapped as Lockdown or Curfew or Quarantine in the CoronaNet type variable to provide guidance to researchers manually harmonizing this data downstream.
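A minimal sketch of the ‘not already mapped’ check, using hypothetical ids. It assumes the ids of the previously mapped data frames are first gathered into a plain character vector; calling c() directly on data frames would instead return a list of their columns.

```r
# Same helper as defined in the Setup section.
`%!in%` <- function(x, y) !(x %in% y)

# Hypothetical ids and data frames for illustration.
all_ids <- c("c1", "c2", "c3", "c4")
mapped_lockdown <- data.frame(unique_id = "c1", type = "Lockdown",
                              stringsAsFactors = FALSE)
mapped_curfew <- data.frame(unique_id = "c3", type = "Curfew",
                            stringsAsFactors = FALSE)

# Gather the ids that were already mapped into one character vector.
already_mapped <- c(mapped_lockdown$unique_id, mapped_curfew$unique_id)

# Keep only the ids that have not been mapped yet.
remaining <- all_ids[all_ids %!in% already_mapped]
print(remaining)
```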
# ids already mapped as lockdown or curfew; the unique_id columns are combined
# explicitly because c() on two data frames returns a list of columns, not ids
mapped_confinement = c(confinement_all$unique_id, confinement_curfew$unique_id)
remaining_confinment = hit %>%
  filter(unique_id %!in% mapped_confinement) %>%
  pull(unique_id)
confinement_other <- hit %>%
  filter(cat == 'confinement' &
           unique_id %in% remaining_confinment) %>%
  select(unique_id) %>%
  mutate(type = 'Lockdown or Curfew or Quarantine',
         type_2 = 'Closure and Regulation of Schools',
         target_who_what = 'All Residents (Citizen Residents + Foreign Residents)')
4.23 Screening within
When cat is screening within, there is no clear one-to-one match with the CoronaNet taxonomy, so a best guess is given.
Because there was no systematic way to map such policies one to one, they are mapped as External Border Restriction or Internal Border Restriction or Health Monitoring in the CoronaNet type variable to provide guidance to researchers manually harmonizing this data downstream.
screening_within = hit %>%
filter(cat == 'screening within') %>%
mutate(
type = 'External Border Restriction or Internal Border Restriction or Health Monitoring'
) %>%
select(unique_id, type)
4.24 Enforcement
The CoronaNet taxonomy does not systematically capture policies about enforcement. As such, these policies are mapped so that the type variable takes the value Other Policy Not Listed Above.
enforcement = hit %>%
filter(cat == 'enforcement') %>%
mutate(
type = 'Other Policy Not Listed Above'
) %>%
select(unique_id, type)
5 Final Mapping
All the previously created data frames are merged into the map while taking care to not overwrite existing data (thus the use of rows_patch). After a few extra steps to implement the more detailed mappings of restaurant closures, the map is complete. The results are then exported in an .rds and .csv format, to be consolidated together with the other external databases to be harmonized. The final consolidated dataset is then processed for manual harmonisation into the CoronaNet Research Project dataset.
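The difference between patching and updating can be seen in a small sketch with made-up rows (the ids and values below are hypothetical). rows_patch() fills only values that are NA in the map, whereas rows_update() overwrites existing values; both functions require dplyr 1.0.0 or later and expect the key values in the second data frame to be unique.

```r
library(dplyr)

# Made-up map with one already-filled row and one empty row.
map <- tibble(unique_id = c("x1", "x2"), type = c("Lockdown", NA))
new <- tibble(unique_id = c("x1", "x2"), type = c("Curfew", "Curfew"))

# rows_patch() keeps "Lockdown" for x1 and only fills the NA for x2.
patched <- rows_patch(map, new, by = "unique_id")

# rows_update() overwrites both rows with the new values.
updated <- rows_update(map, new, by = "unique_id")

print(patched)
print(updated)
```

Because existing values are never overwritten by rows_patch(), the order of the calls determines which mapping wins when two data frames contain the same unique_id.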
hit_coronanet_map = rows_patch(hit_coronanet_map, border_match, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, confinement_all, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, confinement_curfew, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, confinement_other, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, contact, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, emergency, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, entertainment, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, isolation, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, limit_mvt, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, mask, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, nursing_homes, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, office, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, public_space, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, public_transport, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, religion, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, screening_match, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, social_limits, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, store_closures, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, testing_match, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, school_match, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, other_business_closures, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, screening_within, by = 'unique_id')
hit_coronanet_map = rows_patch(hit_coronanet_map, enforcement, by = 'unique_id')
restaurant_closures_non_dup = restaurant_closures[!duplicated(restaurant_closures$unique_id),]
restaurant_closures_dup = restaurant_closures[duplicated(restaurant_closures$unique_id),]
hit_coronanet_map = rows_patch(hit_coronanet_map, restaurant_closures_non_dup, by = 'unique_id')
dup_fill = subset(hit_coronanet_map, hit_coronanet_map$unique_id %in% restaurant_closures_dup$unique_id)
dup_fill = rows_update(dup_fill, restaurant_closures_dup, by = 'unique_id')
hit_coronanet_map = rbind(hit_coronanet_map, dup_fill)
hit_coronanet_map = hit_coronanet_map %>%
group_by(unique_id) %>%
mutate(
unique_id = paste(unique(unique_id), collapse = ','),
type_sub_cat = paste(unique(type_sub_cat), collapse = ',') ,
institution_status = paste(unique( institution_status), collapse = ',')
) %>%
ungroup() %>%
distinct %>%
group_by(
unique_id
) %>%
mutate(
count = 1:n(),
count = ifelse(count == 1, '', count),
unique_id = ifelse(count != '', paste0(unique_id,count), unique_id)
) %>%
select(-count) %>%
ungroup
saveRDS(hit_coronanet_map, here("data", "collaboration", "jhu", "hit_coronanet_map_2b.rds"))
write.csv(hit_coronanet_map, here("data", "collaboration", "jhu", "hit_coronanet_map_2b.csv"))
4.16 Social Limits
The following code maps how HIT-COVID captures restrictions on the number of people allowed to gather to the CoronaNet taxonomy.
The HIT-COVID taxonomy captures such policies when cat is social limits.
The CoronaNet taxonomy captures such policies by coding type as Restrictions of Mass Gatherings, target_who_what as All Residents (Citizen Residents + Foreign Residents) and target_who_gen as No special population targeted.