It's been known since the early days of the Covid-19 pandemic that its Infection Fatality Rate
has a strong dependence on the patient's Age, with the chances of death
increasing exponentially as the Age increases. This dependence means
that two populations (e.g. - countries, states, cities, etc.) that are
identical in all ways other than Age Demographics, and take the exact same
legislative steps and preventative measures, should expect different Mortality Rates. Debates about which measures were successful in controlling the
spread of Covid-19 and keeping the Mortality Rates down must take Age
Demographics into account to avoid reaching the wrong conclusions. While
Medical Professionals and Statisticians know to correct for such factors when
making comparisons, the typical person likely doesn't know how big an effect
their State's Age Distribution has on its overall Covid-19 outcomes.
Here we examine what one would expect the Mortality Rate to be in the 50 U.S.
States based solely on their population's Age distributions, using publicly
available data. These values can be used as a potentially more appropriate
"baseline" for comparisons between different states' responses to the
pandemic: e.g. - Hawaii has one of the country's lowest Mortality Rates and
Arizona one of the highest, but are those outcomes due to how their
governments and people behaved in response to Covid-19, or primarily a
function of the Age of those States' populations?
Disclaimer
Nothing here should be construed as passing judgment on any State's or
People's responses to Covid-19. This is purely a statistical analysis
of objective data (State Age Demographics and Covid-19 Infection Fatality
Rates), with the goals of: (a) learning the importance of Age Demographics
on U.S. State Mortality Rates, and (b) obtaining a better Baseline for other
analyses to determine which responses were more effective in combating
Covid-19. In addition, we are not epidemiologists: this analysis is
only producing back-of-the-envelope estimates.
Conclusions
The Age Distribution of each State's population has a large impact on
Mortality Rate (Deaths per 1M Population), with the highest Predicted
Mortality Rate for a State (Florida: 15,852) being 85% higher than the lowest
Predicted Mortality Rate for a State (Utah: 8,553). This reinforces the
idea that Age Demographics must be taken into account when determining State
Response Performance. There are many other factors that could impact our
choice of baseline, most notably Obesity Prevalence, a known Covid-19 risk
factor that varies from 24% to 40% between States. Accounting for these
additional factors is left for later research.
Our model is based on Age Demographics inferred from the 2020 U.S. Census,
combined with Infection Fatality Rates determined from the original Covid-19
strain spreading from 4/1/2020 - 1/1/2021 (before Vaccines were available and
before widespread Variants). It assumes the entire population gets
infected without protection, resulting in much higher Mortality Rates than
were observed in reality. These values were compared to the Mortality
Rates from Worldometer, and Predicted Mortality Rates were scaled to have the
same Mean as the Realized Worldometer values to make comparisons easier.
The table below summarizes our results. Its columns are:
- State - Names of the 50 U.S. States
- Population: Number of People - From the By-State 2020 U.S. Census, "Table S0101: AGE AND SEX", summed across all ages
- Population: Percent Age >= 75 - From the By-State 2020 U.S. Census, "Table S0101: AGE AND SEX", summing rows for Age >= 75
- Population: Vaccination Rate - Fully vaccinated percentage, from the By-State data at news.google.com
- Predicted: Deaths - Results from integrating the Covid-19 Infection Fatality Rate against the Inferred Age Distribution
- Predicted: Deaths / 1M Pop - 1,000,000 * (Predicted: Deaths) / (Population)
- Re-Meaned Predicted: Deaths / 1M Pop - Predicted Mortality Rates multiplicatively scaled to have the same average as the Worldometer Mortality Rates: (Predicted: Deaths / 1M Pop) * (Average(Worldometer: Deaths / 1M Pop) / Average(Predicted: Deaths / 1M Pop))
- Worldometer (2022-12): Deaths - From Worldometer for the U.S., sampled in 2022-12
- Worldometer (2022-12): Deaths / 1M Pop - 1,000,000 * (Worldometer: Deaths) / (Population)
- Performance: Deaths / 1M Pop - How each State performed relative to the Mortality Rate predicted by our Age Demographics model: (Worldometer: Deaths / 1M Pop) - (Re-Meaned Predicted: Deaths / 1M Pop). A positive number means there were more deaths than the model predicts, and so is in red because that State did worse than expected based on Age Demographics alone. A negative number means there were fewer deaths than the model predicts, and so is in green because that State did better than expected based on Age Demographics alone.
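The Re-Meaned and Performance columns can be sketched in a few lines of Python. This is only an illustration of the formulas listed above: the function names, the two hypothetical States, and their numbers are placeholders we made up, not values from our table, and we assume "Average" means a simple unweighted average across States.

```python
def re_mean(predicted, realized):
    """Scale predictions so their unweighted State average matches the realized average."""
    mean_predicted = sum(predicted.values()) / len(predicted)
    mean_realized = sum(realized.values()) / len(realized)
    scale = mean_realized / mean_predicted
    return {state: value * scale for state, value in predicted.items()}


def performance(predicted, realized):
    """Realized minus re-meaned predicted; positive means more deaths than the model expects."""
    re_meaned = re_mean(predicted, realized)
    return {state: realized[state] - re_meaned[state] for state in realized}


# Placeholder Deaths / 1M Pop for two hypothetical States (NOT values from the table).
predicted_per_million = {"State A": 10_000.0, "State B": 14_000.0}
realized_per_million = {"State A": 3_000.0, "State B": 3_000.0}

re_meaned = re_mean(predicted_per_million, realized_per_million)
perf = performance(predicted_per_million, realized_per_million)
```

In this toy case both States realized the same Mortality Rate, but the older "State B" was expected to do worse, so it ends up with a negative (green) Performance value while "State A" ends up positive (red).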
|
U.S. State Predicted Covid-19 Mortality Rate - Age Model
|
Another way of viewing this that can make outliers jump out is a Scatter
Plot. Here we plot the Realized Mortality Rate (y-axis) versus the
Re-Meaned Predicted Mortality Rate (x-axis). Anything in the top left
had more deaths than predicted (red) and anything on the bottom right had
fewer deaths than predicted (green):
|
U.S. State Realized vs Predicted Covid-19 Mortality Rate - Age
Model
|
Obviously there is a lot of noise in this model, which is unsurprising since
it takes so little into account and makes huge assumptions about everyone
getting sick with the original strain and no vaccines. However, it
does make clear how important Age Demographics of each State can be.
One thing we can compare the Excess Mortality Rate (i.e. - Realized -
Predicted) against is the Vaccination Rate for each State.
|
U.S. State Vaccination Rate vs Excess Mortality Rate
|
While there are many confounding factors we didn't take into account (e.g. -
where outbreaks occurred before vaccines were available, Obesity, Population
Density), it certainly looks like there is a relationship: the lower the
Vaccination Rate, the higher the Excess Mortality Rate.
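One way to put a number on a relationship like this is a correlation coefficient. The sketch below computes a Pearson correlation in plain Python. It is our own illustrative choice, not something done for the plot above, and the four data points are made-up placeholders rather than real State values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Placeholder values for four hypothetical States (NOT real data).
vaccination_rate_pct = [50.0, 60.0, 70.0, 80.0]
excess_deaths_per_million = [400.0, 100.0, -100.0, -400.0]

r = pearson_r(vaccination_rate_pct, excess_deaths_per_million)
```

A value of r close to -1 would match what the plot suggests, but a correlation on its own cannot separate the effect of vaccination from the other factors listed above.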
Methodology
We needed two pieces of information to make predictions on the Covid-19
Mortality Rate for each U.S. State:
- Infection Fatality Rate for Covid-19 as a function of the Age of the patient
- Age Distributions for each of the 50 U.S. States
Once that is done we can:
- Integrate the Infection Fatality Rate against the Age Distribution
And we'll arrive at the expected number of deaths for each State.
(1) Infection Fatality Rate for Covid-19 as a function of Age
We found an article in The Lancet that provided the Infection Fatality Rate
in one-year age increments, with their estimates provided in an easily
downloadable format. They examined fatalities caused by Covid-19 in multiple
countries, using case data from 4/1/2020 - 1/1/2021 (the original Covid-19
Strain before vaccines were available and before the spread of
variants). They combined the fatality data with seroprevalence surveys
to arrive at Infection Fatality Rates that take asymptomatic and untested
cases into account. Their estimates go from Age 1-100, and we plot
their results here:
|
Covid-19 Infection Fatality Rate (IFR) vs Age
|
The only changes that had to be made to their results related to "Age
0". It was unclear from their description if "Age 1" referred to the
interval [0,1], [0.5, 1.5], or [1,2]. We decided to associate it with
the [1,2] bin, and fill in the [0,1] value by linearly extending the data
from [2,3] and [1,2] (in log-space, the orange line plotted above).
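The log-space extension can be sketched as follows. Because the [0,1], [1,2], and [2,3] bins are equally spaced, extending the straight line through the two known points back by one step is just arithmetic on the logarithms. The function name and the example rates are placeholders for illustration, not values from the Lancet data.

```python
import math


def extrapolate_age0_ifr(ifr_bin_1_2, ifr_bin_2_3):
    """Extend the log-space line through the [1,2] and [2,3] bins back to the [0,1] bin."""
    # A straight line through equally spaced points: log(y0) = 2*log(y1) - log(y2)
    log_ifr0 = 2.0 * math.log(ifr_bin_1_2) - math.log(ifr_bin_2_3)
    return math.exp(log_ifr0)


# Placeholder rates (NOT from the Lancet data): if the rate halves from the
# [1,2] bin to the [2,3] bin, the extension doubles it going back to [0,1].
ifr_age0 = extrapolate_age0_ifr(0.002, 0.001)
```

Working in log-space is equivalent to assuming the ratio between neighboring ages stays constant, which also keeps the extrapolated rate positive.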
(2) Age Distributions for each of the 50 U.S. States
We started with the 2020 U.S. Census data, which provides data about each
State's population broken up into 5-year bins up to Age 85+. Unfortunately,
given Covid-19's strong sensitivity to higher Ages, we need to correct for
two issues:
- Break up the 85+ bin into 85-89, 90-94, 95-99 bins
- Break up all 5-year bins into 1-year bins
These two requirements were handled in different ways.
(a) Break up the 85+ bin into 85-89, 90-94, 95-99 bins
Since we're completely lacking finer-grained information, we went to the
Social Security Actuarial Tables to see how likely people were to survive an
additional year based on their Age. We aren't actually comparing apples to
apples here, since the Actuarial table is the likelihood of a given person
at a given age surviving for one additional year, while the population at a
given moment in a State doesn't need to follow that distribution at all:
e.g. - there might be a huge influx of retirees who move to Florida at age
65, resulting in an upward spike in Florida's Age Distribution that is
decidedly different from the actuarial tables. Lacking simple
alternatives, we did the following:
1. Look up the percentage of people in the 85+ bin for each State.
2. Associate the entire percentage from step (1) with "Age 85". (This is only temporary)
3. Fill in the population for ages 86-100 by multiplying the previous Age by the likelihood of surviving one additional year from the Actuarial Tables.
4. Sum up the percentage of population that we just filled in from Ages 85-100.
5. Multiply each Age's percentage by the ratio of values: (1) / (4).
6. Sum up the percentage-by-Age from step (5) into 5-year bins: 85-89, 90-94, 95-99, and leave 100 as a 1-year bin.
At the end of step (6) we have a percentage of the population in three
5-year bins and a single 1-year bin, and the sum of those four bins
equals the original value from step (1), which is equal to the value in
the 2020 Census for Age 85+. Thus we are ensured "internal
consistency" with our own data.
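The six steps above can be sketched as below. The function name is ours, and both the survival probabilities and the 1.8% figure are made-up placeholders, not values from the Social Security Actuarial Tables or the Census. Only the shape of the calculation is meant to match.

```python
def split_85_plus(pct_85_plus, survival):
    """Split the 85+ percentage into ages 85..100 and then into the four bins.

    survival[a] is the probability that someone aged a survives to a + 1,
    for a in 85..99.
    """
    # Steps 2-3: temporarily put everything at Age 85, then chain survival forward.
    raw = {85: pct_85_plus}
    for age in range(86, 101):
        raw[age] = raw[age - 1] * survival[age - 1]
    # Steps 4-5: rescale so Ages 85-100 again sum to the original 85+ percentage.
    scale = pct_85_plus / sum(raw.values())
    by_age = {age: value * scale for age, value in raw.items()}
    # Step 6: regroup into three 5-year bins plus a 1-year bin for Age 100.
    bins = {
        "85-89": sum(by_age[a] for a in range(85, 90)),
        "90-94": sum(by_age[a] for a in range(90, 95)),
        "95-99": sum(by_age[a] for a in range(95, 100)),
        "100": by_age[100],
    }
    return by_age, bins


# Placeholder survival probabilities that fall with age (NOT actuarial values).
placeholder_survival = {age: 0.90 - 0.02 * (age - 85) for age in range(85, 100)}
by_age, bins = split_85_plus(1.8, placeholder_survival)  # 1.8% is a placeholder
```

The rescaling in steps 4-5 is what guarantees the "internal consistency" mentioned above: whatever survival curve is used, the four bins always add back up to the original 85+ percentage.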
(b) Break up all 5-year bins into 1-year bins
Once again we're lacking access to finer-grained information, so we
fit a spline to interpolate between our 2020 Census points (plus the
split 85+ bins) and produce a full data set of 1-year values. To do
this we performed the following procedure for all 50 States:
1. Produce (x,y) coordinates for our spline:
   - Divide the percentage in each bin by the number of years in the bin, to get an average percentage-per-year. These will be the y-values for our spline. e.g. - Age 0-4 for Alabama had 5.8% of the population. There are 5 years of ages in that bin (0 - 0.999...), (1 - 1.999...), (2 - 2.999...), (3 - 3.999...), (4 - 4.999...), so the y-value for this point becomes 5.8/5 = 1.16%.
   - Take the center age of each bin. These will be the x-values for our spline. e.g. - Age 0-4 for Alabama gets an x-value of 2.5, because the ages we're considering range from [0.0, 4.99999...], making the center age 2.5.
2. We create two "Book Ends" for the spline:
   - We add a point at exactly x-value = 0 years, using the y-value we got from step (1) for the Age (0-4) bin.
   - We add a point at exactly x-value = 100 years, using the Age 100 value we got from step (6) of part (a), when we broke up the 85+ bin.
3. We fit a cubic spline through the points produced from steps (1-2). We used the spline formulas from Wolfram MathWorld, but any spline method should produce similar results.
4. Splining these points together can make the sum of percentages deviate from the 100% we got from the 2020 Census Data (actually, the raw Census Data didn't always sum up to 100%). This happens because splining enforces requirements on the values our fitting function must go through, as well as continuity of the function, but not on the integral. To fix this we:
   - Verified that none of the sums deviated from 100% by more than 1%. i.e. - as a sanity check, we made sure that the sum of values produced by the spline was always in (99%, 101%).
   - Divided all of the percentage values from step (3) by the sum of percentages to force them to sum to exactly 100%.
5. We performed some final sanity checks:
   - We verified that the spline correctly went through all desired points.
   - We summed up the splined values in 5-year bins and compared them to the original data-set.
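The procedure above can be sketched in plain Python as follows. Several things here are assumptions made for illustration rather than a record of what we ran: the spline uses "natural" boundary conditions (zero second derivative at both ends), the 1-year values are sampled at integer ages 0 to 100, and apart from the 5.8% quoted above, every percentage in the example is a made-up placeholder.

```python
from bisect import bisect_right


def build_knots(pct_5yr_bins, pct_age_100):
    """Steps 1-2: bin centers, percent-per-year values, and the two book ends."""
    ys = [pct / 5.0 for pct in pct_5yr_bins]                 # average percent per year
    xs = [5.0 * k + 2.5 for k in range(len(pct_5yr_bins))]   # centers 2.5, 7.5, ...
    return [0.0] + xs + [100.0], [ys[0]] + ys + [pct_age_100]


def natural_cubic_spline(xs, ys):
    """Step 3: return a function evaluating the natural cubic spline through the knots."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the interior second derivatives (m[0] = m[n] = 0).
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    for k in range(1, n - 1):                    # forward elimination
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    m = [0.0] * (n + 1)
    for j in range(n - 2, -1, -1):               # back substitution
        m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def evaluate(t):
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)   # interval containing t
        a, b = xs[i + 1] - t, t - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


def one_year_distribution(pct_5yr_bins, pct_age_100):
    """Steps 3-4: sample the spline at each age, then force the sum to 100%."""
    xs, ys = build_knots(pct_5yr_bins, pct_age_100)
    spline = natural_cubic_spline(xs, ys)
    raw = [spline(float(age)) for age in range(101)]   # assumption: integer ages
    raw_total = sum(raw)                               # compare against (99, 101)
    return [100.0 * value / raw_total for value in raw], raw_total


# Placeholder percentages for bins 0-4, 5-9, ..., 95-99 (only 5.8 is quoted above).
placeholder_bins = [5.8, 6.1, 6.4, 6.6, 6.7, 6.6, 6.4, 6.3, 6.2, 6.3,
                    6.6, 6.7, 6.3, 5.6, 4.5, 3.3, 2.0, 1.1, 0.4, 0.09]
knot_x, knot_y = build_knots(placeholder_bins, 0.01)
spline = natural_cubic_spline(knot_x, knot_y)
distribution, raw_total = one_year_distribution(placeholder_bins, 0.01)
```

The `raw_total` returned alongside the distribution is the pre-normalization sum, so it can be compared against the (99%, 101%) window from step 4 before trusting the rescaled values.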
Here we provide plots for Alabama to demonstrate the procedure, as well
as the quality of the fitting spline. The other 49 states were
quite similar in their fits, and we feel the procedure is of acceptable
quality.
We start with the initial 2020 Census Data:
|
Alabama 2020 Census Data
|
We split the 85+ bin into 85-89, 90-94, 95-99, and 100 bins, divide and
re-center the data, and book-end it with points at Age=0 and
Age=100. Then we perform our spline:
|
Alabama Re-Centered 2020 Census Data vs Spline
|
As you can see, the blue points have been shifted to the right relative
to the first plot, there are new points at Age=0, Age=87.5, Age=92.5,
Age=97.5, and Age=100, and we splined the data. Last, we verified that
our spline is a good fit by summing up the values in 5-year bins and
comparing to the original data:
|
Alabama 2020 Census Data vs Binned Spline
|
There is quite good agreement with the original data, so our attempts at
breaking the data up into 1-year bins and getting values for the full
range from Age=0 to Age=100 were successful.
(3) Integrate the Infection Fatality Rate against the Age Distribution
This step is trivial: we just take the dot-product of our Infection
Fatality Rate, which has values for each Age from 0 to 100, with the
Splined Age Distribution, which also has values for each Age from 0 to
100. Scaling the result by each State's population produces the data in
the Conclusions section of this post.
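As a minimal sketch of this step, the dot-product and the conversion to Deaths / 1M Pop look like this. We assume here that the Age Distribution is stored in percent and the Infection Fatality Rate as a fraction (0.01 meaning 1%). The three-age example uses made-up placeholder numbers instead of the full 101 ages.

```python
def predicted_deaths(population, pct_by_age, ifr_by_age):
    """Expected deaths if everyone is infected: population times sum of share * IFR."""
    if len(pct_by_age) != len(ifr_by_age):
        raise ValueError("Age Distribution and IFR must cover the same ages")
    # Dot-product of the age shares (percent -> fraction) with the fatality rates.
    fraction_dying = sum((pct / 100.0) * ifr for pct, ifr in zip(pct_by_age, ifr_by_age))
    return population * fraction_dying


def deaths_per_million(population, deaths):
    """Convert a death count into Deaths / 1M Pop."""
    return 1_000_000 * deaths / population


# Placeholder example with three age groups (NOT real data).
placeholder_pct = [50.0, 30.0, 20.0]        # percent of population per age
placeholder_ifr = [0.0001, 0.001, 0.01]     # fatality rate per age, as a fraction
deaths = predicted_deaths(1_000_000, placeholder_pct, placeholder_ifr)
rate = deaths_per_million(1_000_000, deaths)
```

Even in this toy case the oldest group, with only 20% of the population, accounts for most of the predicted deaths, which is the effect that drives the differences between States.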