Covid-19: Predicted U.S. State Mortality Rates from Age Demographics

It's been known since the early days of the Covid-19 pandemic that its Infection Fatality Rate depends strongly on the patient's Age, with the chance of death increasing exponentially as Age increases. This dependence means that two populations (e.g. - countries, states, cities, etc.) that are identical in all ways other than Age Demographics, and that take exactly the same legislative steps and preventative measures, should still expect different Mortality Rates. Debates about which measures succeeded in controlling the spread of Covid-19 and keeping Mortality Rates down must take Age Demographics into account to avoid reaching the wrong conclusions. While Medical Professionals and Statisticians know to correct for such factors when making comparisons, the typical person likely doesn't know how big an effect their State's Age Distribution has on overall Covid-19 outcomes.

Here we examine what the Mortality Rate in each of the 50 U.S. States would be expected to be based solely on its population's Age Distribution, using publicly available data. These values can serve as a potentially more accurate "baseline" for comparing different States' responses to the pandemic: e.g. - Hawaii has one of the country's lowest Mortality Rates and Arizona one of the highest, but are those outcomes due to how their governments and people responded to Covid-19, or primarily a function of the Age of those States' populations?

Disclaimer

Nothing here should be construed as passing judgment on any State's or People's responses to Covid-19. This is purely a statistical analysis of objective data (State Age Demographics and Covid-19 Infection Fatality Rates), with the goals of: (a) learning the importance of Age Demographics on U.S. State Mortality Rates, and (b) obtaining a better Baseline for other analyses to determine which responses were more effective in combating Covid-19. In addition, we are not epidemiologists: this analysis is only producing back-of-the-envelope estimates.

Covid-19 Virus
Image by The Tribune : India

Conclusions

The Age Distribution of each State's population has a large impact on Mortality Rate (Deaths per 1M Population): the highest Predicted Mortality Rate for a State (Florida: 15,852) is 85% higher than the lowest (Utah: 8,553). This reinforces the idea that Age Demographics must be taken into account when evaluating State Response Performance. Many other factors could affect our choice of baseline, most notably Obesity Prevalence, a known Covid-19 risk factor that varies from 24% to 40% between States. We leave this for later research.
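As a quick arithmetic check of the 85% figure, using the two Predicted Mortality Rates quoted above:

```latex
\frac{15{,}852}{8{,}553} \approx 1.853
\quad\Longrightarrow\quad
\text{Florida's Predicted Mortality Rate is about } 85\% \text{ higher than Utah's.}
```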

Our model is based on Age Demographics inferred from the 2020 U.S. Census, combined with Infection Fatality Rates determined from the original Covid-19 strain spreading from 4/1/2020 - 1/1/2021 (before Vaccines were available and before widespread Variants). It assumes the entire population gets infected without protection, resulting in much higher Mortality Rates than were observed in reality. These values were compared to the Mortality Rates from Worldometer, and Predicted Mortality Rates were scaled to have the same Mean as the Realized Worldometer values to make comparisons easier.

Here is a table summarizing our results:

State - Names of the 50 U.S. States
Population: Number of People - From the By-State 2020 U.S. Census, "Table S0101: AGE AND SEX", summed across all ages
Population: Percent Age >= 75 - From the By-State 2020 U.S. Census, "Table S0101: AGE AND SEX", summing rows for Age >= 75
Population: Vaccination Rate - Fully vaccinated percentage for each State, from news.google.com
Predicted: Deaths - Results from integrating Covid-19 Infection Fatality Rate against Inferred Age Distribution.
Predicted: Deaths / 1M Pop - 1,000,000 * (Predicted: Deaths) / (Population)
Re-Meaned Predicted: Deaths / 1M Pop - Predicted Mortality Rates multiplicatively scaled to have the same average as the Worldometer Mortality Rates: (Predicted: Deaths / 1M Pop) * Average(Worldometer: Deaths / 1M Pop) / Average(Predicted: Deaths / 1M Pop)
Worldometer (2022-12): Deaths - From Worldometer for the U.S., sampled in 2022-12
Worldometer (2022-12): Deaths / 1M Pop - 1,000,000 * (Worldometer: Deaths) / (Population)
Performance: Deaths / 1M Pop - How each State performed relative to the Predicted Mortality Rate from our Age Demographics model: (Worldometer: Deaths / 1M Pop) - (Re-Meaned Predicted: Deaths / 1M Pop). A positive number means there were more deaths than the model predicts; it is shown in red because that State did worse than expected based on Age Demographics alone. A negative number means there were fewer deaths than the model predicts; it is shown in green because that State did better than expected based on Age Demographics alone.
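The Re-Meaned and Performance columns are simple arithmetic on the per-State rates. Here is a minimal Python sketch of that arithmetic, assuming the rates are held in plain dictionaries keyed by State; the function names and the two-State example values are hypothetical, not the author's code or real data.

```python
# Minimal sketch of the Deaths / 1M Pop, Re-Meaned and Performance columns.
# Function names and example values are hypothetical, not real State data.


def per_million(deaths: float, population: float) -> float:
    """Deaths per 1M population: 1,000,000 * deaths / population."""
    return 1_000_000 * deaths / population


def re_mean(predicted: dict[str, float], realized: dict[str, float]) -> dict[str, float]:
    """Scale predicted rates so their average equals the realized average."""
    mean_predicted = sum(predicted.values()) / len(predicted)
    mean_realized = sum(realized.values()) / len(realized)
    return {state: rate * mean_realized / mean_predicted for state, rate in predicted.items()}


def performance(predicted: dict[str, float], realized: dict[str, float]) -> dict[str, float]:
    """Realized minus Re-Meaned Predicted; positive means worse than expected (red)."""
    adjusted = re_mean(predicted, realized)
    return {state: realized[state] - adjusted[state] for state in realized}


predicted = {"State A": 10_000.0, "State B": 20_000.0}
realized = {"State A": 3_000.0, "State B": 3_000.0}
print(re_mean(predicted, realized))      # {'State A': 2000.0, 'State B': 4000.0}
print(performance(predicted, realized))  # {'State A': 1000.0, 'State B': -1000.0}
```

Because the scaling is multiplicative, it preserves the ratios between States' Predicted rates; it only shifts the overall level, so the unweighted average of the Performance column is zero.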

U.S. State Predicted Covid-19 Mortality Rate - Age Model

A Scatter Plot is another way of viewing this that makes outliers stand out. Here we plot the Realized Mortality Rate (y-axis) versus the Re-Meaned Predicted Mortality Rate (x-axis). Points above the diagonal (Realized = Predicted), toward the top left, had more deaths than predicted (red), and points below it, toward the bottom right, had fewer deaths than predicted (green):

U.S. State Realized vs Predicted Covid-19 Mortality Rate - Age Model

Obviously there is a lot of noise in this model, which is unsurprising since it takes so little into account and makes huge assumptions: that everyone gets infected, with the original strain, and without vaccines. However, it does make clear how important each State's Age Demographics can be.

One thing we could compare the Excess Mortality Rate (i.e. - Realized - Predicted) against is each State's Vaccination Rate.

U.S. State Vaccination Rate vs Excess Mortality Rate

While there are many confounding factors we didn't take into account (e.g. - where outbreaks occurred before vaccines were available, Obesity, Population Density), it certainly looks like there is a relationship: the lower the Vaccination Rate, the higher the Excess Mortality Rate.
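One way to put a number on the apparent relationship is a Pearson correlation coefficient between each State's Vaccination Rate and its Excess Mortality Rate. This is a hedged illustration, not part of the original analysis: the `pearson` helper and the five data points are hypothetical, and a correlation does nothing to control for the confounding factors listed above.

```python
# Sketch: Pearson correlation between Vaccination Rate and Excess Mortality.
# The helper name and all data values below are hypothetical.
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values: percent fully vaccinated, and Realized - Predicted deaths / 1M.
vaccination_rate = [55.0, 60.0, 65.0, 70.0, 75.0]
excess_mortality = [900.0, 500.0, 200.0, -100.0, -600.0]
print(pearson(vaccination_rate, excess_mortality))  # negative: lower rate, higher excess
```

A value near -1 would match the visual impression from the plot; a value near 0 would suggest the apparent trend is weak.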

Methodology

We needed two pieces of information to make predictions on the Covid-19 Mortality Rate for each U.S. State:

Infection Mortality Rate for Covid-19 as a function of Age of the patient.
Age Distributions for each of the 50 U.S. States

Once we have both, we can:

Integrate the Infection Mortality Rate against the Age Distribution

And we'll arrive at the expected number of deaths for each State.

(1) Infection Mortality Rate for Covid-19 as a function of Age

We found an article in The Lancet that estimates the Infection Mortality Rate in one-year age increments and provides its estimates in an easily downloadable format. The authors examined Covid-19 fatalities in multiple countries, using case data from 4/1/2020 - 1/1/2021 (the original Covid-19 strain, before vaccines were available and before the spread of variants). They combined the fatality data with seroprevalence surveys to arrive at Infection Fatality Rates that account for asymptomatic and untested cases. Their estimates cover Ages 1-100, and we plot their results here:

Covid-19 Infection Fatality Rate (IFR) vs Age

The only change we had to make to their results concerned "Age 0". It was unclear from their description whether "Age 1" referred to the interval [0,1], [0.5,1.5], or [1,2]. We decided to associate it with the [1,2] bin and filled in the [0,1] value by extending the line through the [2,3] and [1,2] values back one year in log-space (the orange line in the plot above).
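Extending a straight line in log-space from the [1,2] and [2,3] values back one year amounts to log f0 = 2 log f1 - log f2, i.e. f0 = f1^2 / f2. A minimal Python sketch, assuming the IFR values are positive fractions; the function name and example numbers are hypothetical, not the Lancet estimates:

```python
import math


def extrapolate_age0(ifr_age1: float, ifr_age2: float) -> float:
    """Extend the IFR curve back one year along a straight line in log-space.

    ifr_age1 is the [1,2] value and ifr_age2 the [2,3] value; the result is
    the [0,1] value, log f0 = 2*log f1 - log f2 (equivalently f1**2 / f2).
    """
    if ifr_age1 <= 0 or ifr_age2 <= 0:
        raise ValueError("IFR values must be positive to take logarithms")
    return math.exp(2 * math.log(ifr_age1) - math.log(ifr_age2))


# Hypothetical example: the IFR halves from age 1 to age 2, so it doubles going back to age 0.
print(extrapolate_age0(2e-5, 1e-5))  # approximately 4e-05
```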

(2) Age Distributions for each of the 50 U.S. States

We started with the 2020 U.S. Census data, which breaks each State's population into 5-year bins up to Age 85+. Because Covid-19 mortality is so sensitive to Age, especially at older Ages, we need to address two issues:

Break up the 85+ bin into 85-89, 90-94, 95-99 bins
Break up all 5-year bins into 1-year bins

These two requirements were handled in different ways.

(a) Break up the 85+ bin into 85-89, 90-94, 95-99 bins

Since we completely lack finer-grained information, we turned to the Social Security Actuarial Tables to see how likely people are to survive an additional year based on their Age. This isn't an apples-to-apples comparison: the Actuarial Table gives the likelihood that a person of a given Age survives one additional year, while a State's population at a given moment doesn't need to follow that distribution at all. e.g. - a huge influx of retirees moving to Florida at age 65 would create an upward spike in Florida's Age Distribution that is decidedly different from the Actuarial Tables. Lacking simple alternatives, we did the following:

1. Look up the percentage of people in the 85+ bin for each State.
2. Temporarily assign the entire percentage from step (1) to "Age 85".
3. Fill in the population for Ages 86-100 by multiplying the previous Age's value by the probability of surviving one additional year from the Actuarial Tables.
4. Sum the percentages we just filled in for Ages 85-100.
5. Multiply each Age's percentage by the ratio of the value from step (1) to the sum from step (4).
6. Sum the percentage-by-Age from step (5) into 5-year bins: 85-89, 90-94, 95-99, and leave 100 as a 1-year bin.

At the end of step (6) we have a percentage of the population in three 5-year bins and a single 1-year bin, and the sum of those four bins equals the original value from step (1), which is the 2020 Census value for Age 85+. This guarantees "internal consistency" with our own data.
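The six steps above can be written compactly. This Python sketch assumes `survival[age]` is the one-year survival probability for someone of that Age (taken from an actuarial table such as the Social Security one); the function name and the flat 85% survival rate in the example are placeholders, not the actual SSA values.

```python
# Sketch of splitting a State's Census "85+" share into 85-89, 90-94, 95-99 and 100.
# Survival values are placeholders; real ones would come from actuarial tables.


def split_85_plus(share_85_plus: float, survival: dict[int, float]) -> dict[str, float]:
    """Spread the 85+ share over ages 85-100, then re-bin it.

    survival[age] is the assumed probability that a person aged `age` lives to
    `age + 1`; it must be given for every age from 85 to 99.
    """
    # Steps 1-3: put the whole share at age 85, then chain survival odds forward.
    by_age = {85: share_85_plus}
    for age in range(86, 101):
        by_age[age] = by_age[age - 1] * survival[age - 1]
    # Steps 4-5: rescale so ages 85-100 add back up to the Census 85+ share.
    total = sum(by_age.values())
    by_age = {age: value * share_85_plus / total for age, value in by_age.items()}
    # Step 6: three 5-year bins plus a single 1-year bin for age 100.
    return {
        "85-89": sum(by_age[a] for a in range(85, 90)),
        "90-94": sum(by_age[a] for a in range(90, 95)),
        "95-99": sum(by_age[a] for a in range(95, 100)),
        "100": by_age[100],
    }


# Hypothetical example: 2.0% of the population is 85+, flat 85% one-year survival.
bins = split_85_plus(2.0, {age: 0.85 for age in range(85, 100)})
print(bins)
print(sum(bins.values()))  # 2.0, up to floating-point rounding
```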

(b) Break up all 5-year bins into 1-year bins

Once again we lack finer-grained information, so we used a spline to interpolate between our 2020 Census points (plus the split 85+ bins) and produce a full data set of 1-year values. We performed the following procedure for all 50 States:

1. Produce (x,y) coordinates for our spline:
   1. Divide the percentage in each bin by the number of years in the bin to get an average percentage-per-year. These are the y-values for our spline. e.g. - Age 0-4 for Alabama had 5.8% of the population. That bin spans 5 years of ages, (0 - 0.999...), (1 - 1.999...), (2 - 2.999...), (3 - 3.999...), (4 - 4.999...), so the y-value for this point becomes 5.8/5 = 1.16%.
   2. Take the center age of each bin. These are the x-values for our spline. e.g. - Age 0-4 for Alabama gets an x-value of 2.5, because the ages in that bin range over [0.0, 4.999...], making the center age 2.5.
2. Create two "Book Ends" for the spline:
   1. Add a point at exactly x-value = 0 years, using the y-value from step (1) for the Age 0-4 bin.
   2. Add a point at exactly x-value = 100 years, using the Age 100 value from step (6) of part (a), where we broke up the 85+ bin.
3. Fit a cubic spline through the points produced in steps (1-2). We used the spline formulas from Wolfram MathWorld, but any standard cubic spline method should produce similar results.
4. Renormalize. Splining these points can make the percentages sum to something other than the 100% we got from the 2020 Census Data (in fact, the raw Census Data didn't always sum to exactly 100%). This happens because a spline constrains the points the fitted function must pass through and the smoothness of the function, but not its integral. To fix this we:
   1. Verified that none of the sums deviated from 100% by more than 1%: as a sanity check, we made sure the sum of values produced by the spline was always in (99%, 101%).
   2. Divided all of the percentage values from step (3) by their sum, forcing them to sum to exactly 100%.
5. Perform some final sanity checks:
   1. Verify that the spline passes through all of the desired points.
   2. Sum the splined values into 5-year bins and compare them to the original data set.
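The spline procedure above can be sketched in plain Python. This is an illustration, not the author's code: it uses a natural cubic spline (second derivative zero at both ends), which is an assumption about the boundary conditions, and it samples the spline at the center of each 1-year bin (age + 0.5) for Ages 0-99 and at exactly 100 for the Age 100 bin, which is also an assumption. All function names are hypothetical.

```python
from bisect import bisect_right
from typing import Callable


def natural_cubic_spline(xs: list[float], ys: list[float]) -> Callable[[float], float]:
    """Return f(x) for the natural cubic spline through (xs, ys); xs strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends keep m[0] = m[n] = 0
    if n > 1:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):  # forward elimination
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):  # back substitution
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def f(x: float) -> float:
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # interval containing x
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a**3 + m[i + 1] * b**3) / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * b)

    return f


def one_year_distribution(five_year_pct: list[float], pct_age_100: float) -> list[float]:
    """Turn 20 bin percentages (0-4, ..., 95-99) plus the Age 100 value into 1-year percentages."""
    per_year = [p / 5 for p in five_year_pct]                   # step 1.1: y-values
    centers = [5 * k + 2.5 for k in range(len(five_year_pct))]  # step 1.2: x-values
    xs = [0.0] + centers + [100.0]                              # step 2: book ends
    ys = [per_year[0]] + per_year + [pct_age_100]
    spline = natural_cubic_spline(xs, ys)                       # step 3
    raw = [spline(age + 0.5) for age in range(100)] + [spline(100.0)]
    total = sum(raw)
    if not 99.0 < total < 101.0:                                # step 4.1: sanity check
        raise ValueError(f"spline total {total:.2f}% is outside (99%, 101%)")
    return [value * 100.0 / total for value in raw]             # step 4.2: renormalize


def rebin_5yr(one_year: list[float]) -> list[float]:
    """Step 5.2: sum Ages 0-99 back into 5-year bins to compare with the Census."""
    return [sum(one_year[5 * k:5 * k + 5]) for k in range(20)]
```

Comparing `rebin_5yr(one_year_distribution(bins, age_100))` against the input bins is the kind of check shown in the Alabama plots below.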

Here we provide plots for Alabama to demonstrate the procedure and the quality of the fitted spline. The fits for the other 49 States were quite similar, and we consider the procedure to be of acceptable quality.

We start with the initial 2020 Census Data:

Alabama 2020 Census Data

We split the 85+ bin into 85-89, 90-94, 95-99, and 100 bins, divide and re-center the data, and book-end it with points at Age=0 and Age=100. Then we perform our spline:

Alabama Re-Centered 2020 Census Data vs Spline

Compared to the first plot, the blue points have shifted to the right, there are new points at Age=0, Age=87.5, Age=92.5, Age=97.5, and Age=100, and the spline runs through all of them. Finally, we verified that our spline is a good fit by summing its values into 5-year bins and comparing them to the original data:

Alabama 2020 Census Data vs Binned Spline

The binned spline agrees closely with the original data, so our attempt to break the data into 1-year bins covering the full range from Age=0 to Age=100 was successful.

(3) Integrate the Infection Mortality Rate against the Age Distribution

This step is trivial: we take the dot product of our Infection Mortality Rate, which has a value for each Age from 0 to 100, with the Splined Age Distribution, which also has a value for each Age from 0 to 100. This gives the fraction of each State's population expected to die; multiplying by the State's Population gives the Predicted Deaths shown in the Conclusions section of this post.
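The dot product is a one-liner. A minimal Python sketch, assuming the IFR is stored as a fraction per Age (divide by 100 first if the source lists percentages) and the Age Distribution as percentages that sum to 100; the function names and example values are hypothetical:

```python
def death_fraction(ifr_by_age: list[float], pct_by_age: list[float]) -> float:
    """Dot product of IFR (fraction) with the Age Distribution (percent), as a fraction."""
    if len(ifr_by_age) != len(pct_by_age):
        raise ValueError("IFR and Age Distribution must cover the same ages (0-100)")
    return sum(ifr * pct / 100.0 for ifr, pct in zip(ifr_by_age, pct_by_age))


def predicted_deaths(population: int, ifr_by_age: list[float], pct_by_age: list[float]) -> float:
    """Predicted: Deaths, assuming the entire population is infected."""
    return population * death_fraction(ifr_by_age, pct_by_age)


# Hypothetical example: a flat 1% IFR and a uniform Age Distribution over Ages 0-100.
ifr = [0.01] * 101
ages = [100.0 / 101] * 101
deaths = predicted_deaths(5_000_000, ifr, ages)
print(deaths)                           # about 50,000
print(1_000_000 * deaths / 5_000_000)   # Predicted: Deaths / 1M Pop, about 10,000
```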

Helpful Data Hacks