Ethnic differences in COVID-19 mortality in the second and third waves of the pandemic in England during the vaccine rollout: a retrospective, population-based cohort study – BMC Medicine

Data sources
We conducted a retrospective, population-based cohort study using data from the Office for National Statistics (ONS) Public Health Data Asset (PHDA). The ONS PHDA is a linked dataset combining the 2011 Census, mortality records, the General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR), Hospital Episode Statistics (HES), and vaccination data from the National Immunisation Management System (NIMS).
Person-level datasets were created from the HES and GDPPR record-level datasets by stacking and deduplicating on NHS number and date of birth. Records with blank or invalid NHS numbers or dates of birth were dropped as these could not be linked to the 2011 Census.
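The stacking and deduplication step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the field names (`nhs_number`, `dob`) and the validity rule (a 10-digit string) are assumptions made for the example.

```python
from datetime import date

def is_valid_nhs_number(value):
    # Hypothetical validity rule for illustration: a string of exactly 10 digits.
    return isinstance(value, str) and len(value) == 10 and value.isdigit()

def build_person_level(*record_sets):
    """Stack record-level datasets and keep one row per (NHS number, date of birth)."""
    persons = {}
    for records in record_sets:
        for rec in records:
            nhs, dob = rec.get("nhs_number"), rec.get("dob")
            # Drop records with blank or invalid identifiers; they cannot be linked.
            if not is_valid_nhs_number(nhs) or dob is None:
                continue
            persons.setdefault((nhs, dob), {"nhs_number": nhs, "dob": dob})
    return list(persons.values())

# Toy records (invented) standing in for HES and GDPPR extracts.
hes = [
    {"nhs_number": "1234567890", "dob": date(1960, 1, 1)},
    {"nhs_number": "1234567890", "dob": date(1960, 1, 1)},  # duplicate episode
    {"nhs_number": "", "dob": date(1970, 5, 5)},            # blank NHS number
]
gdppr = [
    {"nhs_number": "1234567890", "dob": date(1960, 1, 1)},  # same person as in HES
    {"nhs_number": "9876543210", "dob": date(1950, 2, 2)},
    {"nhs_number": "123", "dob": date(1980, 3, 3)},         # invalid NHS number
]
people = build_person_level(hes, gdppr)
```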
Data linkage
To obtain NHS numbers for the 2011 Census, we linked the 2011 Census to the 2011–2013 NHS Patient Registers using deterministic and probabilistic matching, with an overall linkage rate of 94.6% (see [13] for a detailed description of the linkage methodology and quality evaluation). Rates of linkage failure were higher for most ethnic minority groups than for the White British group (Additional file 1: Table S1). Compared with the White British group, the unadjusted odds of linkage failure were highest for the ‘other’ ethnic group (odds ratio = 5.81, 95% confidence interval = 5.78 to 5.84), followed by the mixed ethnic group (4.44, 4.40 to 4.47) and the Chinese group (4.11, 4.07 to 4.16). Rates of linkage failure were also higher in men, younger age groups, and people living in more deprived areas, and varied by region; the highest rate of linkage failure was observed for people living in London (9.2%). Once these factors were adjusted for, the odds of linkage failure in ethnic minority groups were substantially reduced. The adjusted odds ratio of linkage failure was below one for the Indian, Bangladeshi, and Pakistani ethnic groups but remained above one for the Black African (1.76, 1.74 to 1.77) and the Black Caribbean (1.30, 1.28 to 1.31) ethnic groups.
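To make the unadjusted comparison concrete, the sketch below computes an odds ratio and a 95% confidence interval from a 2×2 table. The Wald interval on the log scale is an assumption (the text does not state how the intervals were obtained), and the counts are invented for illustration.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a, b: linkage failures and successes in the comparison group
    c, d: linkage failures and successes in the reference group
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Invented counts, not taken from the study.
or_est, or_lower, or_upper = odds_ratio_wald(a=50, b=950, c=10, d=990)
```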
Further linkage to deaths registrations data, GDPPR, HES, and NIMS data was performed deterministically based on NHS number. 86.2% of deaths recorded in deaths registrations data that occurred in England between 8 December 2020 and 1 December 2021 among people aged 30 to 100 years were included in the analysis. 79.0% of people aged 30 to 100 years who had received at least one dose of a COVID-19 vaccine during the study period according to NIMS records as of 4 January 2022 were linked to the PHDA. The unlinked deaths and vaccination records reflect people not included in our study population (see the ‘Generation of the study cohort’ section).
Generation of the study cohort
Of the 41,880,933 people enumerated at the 2011 Census in England and Wales who would be aged 30–100 in 2020, we excluded 354,036 people (0.9%) who were short-term residents (i.e. people who were enumerated at the 2011 Census but did not intend to stay in the country for at least 12 months), 2,257,221 people (5.4%) who could not be linked deterministically or probabilistically to the NHS Patient Register, and 4,360,949 individuals (10.4%) who had died between the Census and 8 December 2020 (the start of the vaccination campaign in the UK). An additional 6,092,707 people (14.5%) were not linked to English primary care records because they either did not live in England in 2019 (the Census included people living in England and Wales) or were not registered with a GP practice in England that was participating in GDPPR (which collects data from 6535 GP practices, covering all GP system suppliers and 97.5% of open and active practices in England [14]). The final study population included 28,816,020 people aged 30–100 years in 2020 (representing 80% of the mid-year 2020 population estimate for England) (Fig. 1) [15].
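The exclusion counts above can be checked with simple arithmetic. The sketch below uses only the figures reported in the text; the variable names are ours.

```python
enumerated = 41_880_933  # people enumerated at the 2011 Census, aged 30-100 in 2020

exclusions = {
    "short-term residents": 354_036,
    "not linked to the NHS Patient Register": 2_257_221,
    "died before 8 December 2020": 4_360_949,
    "not linked to English primary care records": 6_092_707,
}

remaining = enumerated
for reason, count in exclusions.items():
    remaining -= count
    print(f"after excluding {reason}: {remaining:,}")

final_population = remaining  # matches the reported study population
```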
We restricted our analysis to people aged 30 to 100 in 2020 because most sociodemographic factors were drawn from the 2011 Census, which may not represent people’s circumstances at the beginning of the pandemic; younger people were thought particularly likely to have changed their circumstances, as evidenced by a greater proportion of younger people having different postcodes at the 2011 Census to the most recent postcode recorded in their GP records (Additional file 1: Fig. S1). In addition, very few deaths occurred in people aged below 30 years; official figures show that out of the 84,449 deaths involving COVID-19 in England and Wales in 2020, only 127 (0.2%) were among people less than 30 years old [16].
Exposure
The exposure was self-reported ethnic group, retrieved from the 2011 Census. We used a 10-category ethnic group classification (White British [White English/Welsh/Scottish/Northern Irish/British], Bangladeshi, Black African, Black Caribbean, Chinese, Indian, mixed [White and Asian, White and Black African, White and Black Caribbean, and other mixed], Pakistani, White other [White Irish, White Gypsy or Irish Traveller, and other White], and other [other Asian, Arab, other Black, and any other ethnic groups]).
Covariates
Ethnic differences in the risk of death involving COVID-19 could be mediated by factors linked to the risk of infection (such as communal living, residing in an area with high infection rates, socioeconomic and demographic factors, and occupation) and factors associated with the risk of death if infected (health status and vaccination status). These factors may fall on the causal pathway between ethnicity and COVID-19 mortality (Additional file 1: Fig. S2).
The following covariates were included from the 2011 Census data: age, residence type (private household, care home, or other communal establishment), household tenure, National Statistics Socio-economic Classification (NS-SEC), highest qualification, household size, household deprivation, family status, household composition, key worker in household, and key worker type (education and childcare, food and necessity goods, health and social care, key public services, national and local Government, public safety and national security, transport, utilities and communication; derived according to Census returns based on the 2010 Standard Occupational Classification (SOC) and the 2007 Standard Industrial Classification of Economic Activities [17]) (Additional file 1: Table S2). Body mass index (BMI; classified as underweight, normal weight, overweight, obese, unknown) and pre-existing health conditions (using the same Systematized Nomenclature of Medicine Clinical Terms [SNOMED-CT] codes as the QCovid2 risk prediction model [18]) were included as covariates from the GDPPR data. The QCovid risk prediction model was used in the UK to identify clinically extremely vulnerable individuals who should shield during the pandemic. The model has been previously shown to predict risk of COVID-19 hospitalisation and mortality in three independent datasets [19,20,21]. The number of admissions to, and number of days spent in, admitted patient care during the three years prior to the pandemic were included as covariates from the HES Admitted Patient Care data.
The following covariates were included from other data sources: vaccination status (from NIMS data); region and Rural Urban classification (from the National Statistics Postcode Lookup), population density of the Lower layer Super Output Area (from mid-2019 population estimates), and Index of Multiple Deprivation (IMD) (from the English Indices of Deprivation, 2019), all derived from postcodes in GDPPR data; occupational exposure to disease and proximity to others, for individuals and as the maximum score among all individuals in each household (from the Occupational Information Network database, which collects a range of information about individuals’ working conditions and the day-to-day tasks of their job); and care-home residence status (from the 2019 NHS Patient Register). The proximity and exposure measures were based on two questions: (i) How physically close to other people are you when you perform your current job? (ii) How often does your current job require that you be exposed to diseases or infection? Scores ranging from 0 to 100 were calculated for these questions based on 2011 Census data on occupation [22].
Outcome
The outcome was death involving COVID-19, i.e. COVID-19 International Classification of Diseases 10 code of U07.1 (COVID-19, virus identified), U07.2 (COVID-19, virus not identified), or U09.9 (post-COVID condition, unspecified) in part I or II of the death certificate, occurring between 8 December 2020 and 1 December 2021.
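The outcome definition can be expressed as a small predicate. This is a sketch only: representing a death certificate as two lists of ICD-10 codes (part I and part II) is an assumption for the example, and the study window is treated as inclusive at both ends.

```python
from datetime import date

# ICD-10 codes and study window as stated in the outcome definition.
COVID_CODES = {"U07.1", "U07.2", "U09.9"}
STUDY_START = date(2020, 12, 8)
STUDY_END = date(2021, 12, 1)

def is_covid_death(part1_codes, part2_codes, death_date):
    """True if a COVID-19 code appears in part I or II and the death falls in the window."""
    in_window = STUDY_START <= death_date <= STUDY_END
    mentioned = bool(COVID_CODES.intersection(part1_codes)
                     or COVID_CODES.intersection(part2_codes))
    return in_window and mentioned
```

For instance, a certificate with pneumonia in part I and U07.1 in part II, dated 15 January 2021, would count as a death involving COVID-19 under this rule.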
Statistical analysis
Age-standardised vaccination rates by ethnic group were calculated for each wave based on the number of vaccine doses received by the end of the period (12 June 2021 for wave 2 and 1 December 2021 for wave 3). Crude rates (%) of the number of people in each ethnic group who were unvaccinated, single-vaccinated, double-vaccinated, or triple-vaccinated were age-standardised using the 2013 European Standardised Population [23].
We calculated age-standardised mortality rates (ASMRs) by ethnic group as deaths per 100,000 person-years at risk to examine the absolute risk of death involving COVID-19, standardised to the 2013 European Standardised Population [23]. ASMRs were calculated separately for each of the waves of the pandemic that occurred during the vaccine rollout (wave 2: 8 December 2020 to 12 June 2021; wave 3: 13 June 2021 to 1 December 2021). This analysis therefore excludes any deaths occurring early in the second wave, which is estimated to have started in September 2020 [24].
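Direct age standardisation, as used for both the vaccination rates and the ASMRs, can be sketched as a weighted average of age-specific rates. The age bands, standard-population weights, deaths, and person-years below are invented toy values, not the 2013 European Standardised Population or study data.

```python
def age_standardised_rate(deaths, person_years, standard_pop, per=100_000):
    """Directly standardised rate: age-specific rates weighted by a standard population."""
    weighted_sum = 0.0
    for band, weight in standard_pop.items():
        age_specific_rate = deaths[band] / person_years[band]
        weighted_sum += weight * age_specific_rate
    return per * weighted_sum / sum(standard_pop.values())

# Toy inputs for illustration only.
standard_pop = {"30-49": 3, "50-69": 2, "70-100": 1}
deaths = {"30-49": 1, "50-69": 4, "70-100": 10}
person_years = {"30-49": 10_000, "50-69": 8_000, "70-100": 2_000}

asmr = age_standardised_rate(deaths, person_years, standard_pop)  # deaths per 100,000 person-years
```

Because every ethnic group is weighted to the same standard age structure, differences in the resulting rates are not driven by differences in the groups' age distributions.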
As the pandemic was ongoing at the end of the study period, the data were subject to right-censoring. We therefore used Cox proportional hazards models to assess whether differences in the risk of mortality involving COVID-19 by ethnic group could be accounted for by location, sociodemographic factors, pre-pandemic health, and vaccination status. Separate models were fitted for the second and third waves. The index date for start of follow-up time was 8 December 2020 for wave two (the start of the vaccination programme in the UK) and 13 June 2021 for wave three. End of follow-up was date of death for those who died or the end of the wave period for those who were still alive at the end of the period: 12 June 2021 for wave two and 1 December 2021 for wave three (see Additional file 1: Table S3 for mean follow-up times by ethnic group). For computational efficiency, we included all individuals who died of any cause during the analysis period and a random sample (selected by simple random sampling without replacement) of those who did not, with sampling rates of 1% for the White British ethnic group and 10% for every other ethnic group; case weights equal to the inverse probability of selection were included in the analysis, following previously published methods [6, 13]. The White British group was used as the reference category in all models.
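The sampling scheme and case weights described above can be sketched as follows. The record layout (`ethnic_group`, `died`) is hypothetical, and fixing the sample size per group at the rounded sampling fraction is a simplification for the example; the weight is the inverse of the realised selection probability.

```python
import random

SAMPLING_RATES = {"White British": 0.01}  # 1% of survivors
DEFAULT_RATE = 0.10                       # 10% of survivors in every other group

def sample_for_analysis(people, seed=0):
    """Keep everyone who died (weight 1) plus a weighted random sample of survivors."""
    rng = random.Random(seed)
    sampled = []
    survivors_by_group = {}
    for person in people:
        if person["died"]:
            sampled.append({**person, "weight": 1.0})
        else:
            survivors_by_group.setdefault(person["ethnic_group"], []).append(person)
    for group, members in survivors_by_group.items():
        rate = SAMPLING_RATES.get(group, DEFAULT_RATE)
        n = max(1, round(rate * len(members)))
        # random.sample draws without replacement.
        for person in rng.sample(members, n):
            sampled.append({**person, "weight": len(members) / n})
    return sampled

# Toy population: 1000 + 100 survivors and 5 deaths (invented numbers).
population = (
    [{"ethnic_group": "White British", "died": False} for _ in range(1000)]
    + [{"ethnic_group": "Indian", "died": False} for _ in range(100)]
    + [{"ethnic_group": "White British", "died": True} for _ in range(5)]
)
analysis_sample = sample_for_analysis(population)
```

The weights sum back to the size of the full population, which is what allows the weighted sample to stand in for the whole cohort in the model.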
The baseline model (model 1) only included adjustment for single year of age as a confounding variable, included as a second-order polynomial. We then introduced potential mediating factors sequentially, starting with factors associated with the risk of exposure to SARS-CoV-2 and then factors associated with the risk of death if infected. Model 2 included additional adjustment for type of residence (private household, care home, or other communal establishments). In model 3, we included additional adjustment for geographical factors (region, Rural Urban classification and local population density). In model 4, we adjusted for socioeconomic and demographic factors that are likely to be linked to risk of infection (NS-SEC, highest qualification, IMD decile, household characteristics [tenure of the household, household deprivation, household size, family status, household composition, and key worker in the household], key worker type, individual and household exposure to disease, and individual and household proximity to others). We then adjusted for factors associated with the risk of death if infected. In model 5, additional adjustment was made for health status (pre-existing health conditions, BMI, and number of admissions to hospital and days spent in hospital over the previous 3 years). For all health variables, a binary interaction indicator was included, allowing the effects to vary depending on whether the individual was aged 70 years and older or younger than 70 years. In model 6, vaccination status (unvaccinated, one dose, or two doses for wave two, with the addition of three doses for wave three) was included as a time-varying covariate, based on the date of vaccination plus 14 days. Therefore, an individual was classified as single-vaccinated 14 days after they received their first vaccine dose, double-vaccinated 14 days after they received their second dose, and triple-vaccinated 14 days after they received their third dose.
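One common way to encode a time-varying covariate for a Cox model is to split each person's follow-up into intervals over which the covariate is constant. The sketch below does this for vaccination status with the 14-day lag described above; the function name and the (start day, stop day, doses) output layout are assumptions for illustration, not the authors' code.

```python
from datetime import date, timedelta

def vaccination_intervals(start, end, dose_dates, lag_days=14):
    """Split follow-up into (start_day, stop_day, doses_in_effect) intervals.

    Days are counted from the index date; a dose counts from lag_days after it was given.
    """
    effective = sorted(d + timedelta(days=lag_days) for d in dose_dates)
    # Doses already in effect at the index date set the starting status.
    status = sum(1 for e in effective if e <= start)
    cuts = [e for e in effective if start < e < end]
    intervals = []
    t0 = start
    for cut in cuts:
        if cut > t0:  # skip zero-length intervals if two doses take effect on the same day
            intervals.append(((t0 - start).days, (cut - start).days, status))
        status += 1
        t0 = cut
    intervals.append(((t0 - start).days, (end - start).days, status))
    return intervals

# Wave two follow-up for a hypothetical person with doses on 10 January and 4 April 2021.
rows = vaccination_intervals(
    start=date(2020, 12, 8),
    end=date(2021, 6, 12),
    dose_dates=[date(2021, 1, 10), date(2021, 4, 4)],
)
```

Each row would then enter the model as a separate record with its own start and stop time, with the death indicator set only on the person's final interval.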
Missing Census data were imputed using nearest-neighbour donor imputation, the methodology employed by the Office for National Statistics across all 2011 Census variables [25]. Ethnicity was imputed in 3.0% of 2011 Census records due to item non-response. Individuals with missing data for BMI were placed into an unknown category. Health conditions were derived based on prescription and diagnosis codes, with the sample restricted to people who were registered with a GP practice in England that was participating in GDPPR. Therefore, there were no missing values for these variables.
All statistical analyses were stratified by sex and conducted using R, version 3.5. Cox proportional hazards models were implemented using the survival package (version 2.41-3) [26].