Introduction – or why the net is flooded with the same types of descriptive statistics
I’ve been pushing this moment for quite a while now. Yes, you know that moment when you feel that everybody (and I mean EVERYBODY) has written something about Covid-19 and you ask yourself if you really should partake in the endeavor of flooding the internet with posts on this pandemic. For one, even though my main interest is healthcare, I know not enough to join the congregation of self-elected epidemiologists that recently popped up around the world and on social media. They also are self-elected virologists but seem unable to make the difference between the two. Secondly, just repeating variants of the descriptive statistics on the virus from the John-Hopkins and WHO data feels, if I may say, rather tedious and boring. We’ve all seen the figures and the myriads of graphs….na, won’t do that; It kind of lacks originality. Note that I do see the importance of statistics, just that we do not need 15 million copies of the same thing.
So, why am here today? First of all, I AM BORED!!! I am on my first week of self-confinement since I have a number of symptoms which make me seriously wonder whether I’ve gone Covid. Secondly, and as I wrote above, statistics and curves have been done for all the countries and I want to offer something else to read.
As most of us have noticed, the focus of an overwhelming majority of the countries affected by the novel corona virus has been to “flatten the curve” describing the number of infected individuals. The reason for this, as you might have understood, is that one wishes to make sure as few people as people get seriously ill and put a strain on healthcare service. Indeed, strain on scares resources imply that the demand of highly technical and specialized services exceed their supply. The consequence is thus the need to estimate the possible outcomes of dispensing care for different patients. This is something that most of us agree to be an undesirable situation to be put in, especially if we know how to avoid it. So, what is the most effective way to do so? What is common to all the strategies states have chosen? Yes, that’s right! To minimize the number of interactions between people. That’s a good start, and this is what we are going to dedicate some time to show how efficient it is.
The SIR model
As I pointed out above, I am neither an epidemiologist nor a virologist. I will therefore not claim to make an accurate model of the Covid-19 virus spread over time (nor that of any other illness for that matter). I will, however, need to make some realistic assumption and also make some over-simplification to make the whole thing somewhat understandable.
Indeed, to design a truly realistic model, one would need to fully understand how individuals interact in a society. Different groups move in different manners and in different areas of a city and those patterns are determined by age, occupation, interests and so forth. One notable thing is that individuals are actually far more predictable in their overall behaviour than one might think. Think about how your own randomly chosen day looks like. I’d bet you that it is pretty much like the day after or the previous one, especially if you are a well settled individual over 35. So, to design a sufficiently realistic model you would need to have some solid data on demographics, OD (Origin Destination) matrices for (at least) groups of individuals and on how they might interact with other members of society.
So, to keep things simple, I will not divided the society in different groups and assume that they move randomly within a given area. Although this seems to be a gross oversimplification, a well-distributed population is in appearance a randomly moving group. This simplification also assumes that all individuals are equally prone to be contaminated as to be contagious, although I suspect this cannot be true. As I said, I am not an epidemiologist, and the point of this blog is not to put forth a novel theory of the actual spread of Covid-19. Instead it is more a pedagogical excerise in python. Another assumption is that a given individual’s probability of contaminating another individual is equal for all contaminated individuals. This of course not true. As has been pointed out by most epidemiologists in the past week, most contaminations occur within the family. Also, different individuals have different social networks and therefore smaller or greater potentials to be viral agents.
So, what is the SIR model. It one of a long series of mathematical models to show spread of a disease in a closed population. By closed, we mean that we do not take into consideration the influx (birth, immigration) of individuals nor the outflux (deaths or emigration). The S, I and R are actually functions of time that describe three groups:
- S(t) consists of the individuals that are Susceptible to be infected (but haven’t yet).
- I(t) is the number of Infected individuals and
- R(t) are the individuals that have Recovered from the disease.
As you might have guessed, the size of each group changes (hopefully) over time. At the beginning, few will be infected while many will be susceptible to be infected. the number of infected will gradually increase to finally reach a peak. That, at a given time, the curve over the number of infected reaches a maximum comes from the fact that 1) The number of individuals susceptible to be infected diminishes (which make to probability for an uninfected to meet an infected lower) and 2) The number of people that were infected but recovered increases.
Now, what we wrote about suggests there are two parameters to take into account. As I pointed out above, the rate at which people get infected depends on the potential number of individuals an infected person comes into contact with. Or equivalently how many people an infected individual comes into contact with healthy people that are not immune. This rate is known as the contact rate and is denoted . The number of people a generic individual come into contact with another person in a given time frame is thus . As you may have guessed, not all have the same propensity to meet people and it highly depends on a wide range of factors. But ponder this: From the time you leave your home, get on a buss or commuter train, meetings at work, lines in shops and so on, how many people have you had more or less close contact with? The second parameter I indirectly mentioned was how many people have recovered after a given time. This measure depends on how long an infected person is contagious. This rate is called the mean recovery rate and denoted . It depends on the virus and is determined empirically by virologists; One method is to estimate when a given individual was first showing symptom (the start of the contagious period) and when the patient is considered to have recovered. This is the reason for which the period given for Covid-19 is between 5 and 14 days but that it is average is approximately 10 days (i.e. ).
The model is mathematically rather simple. It was first given by Kermack and McKendrick [Proc. R. Soc. A, 115, 772 (1927)] as the following system of differential equations:
So….we have now all the important information to understand the model. Let’s do some code. Here, I choose Python, but it is as easily done in R. I suggest this as an excercise. I have actually plans to do this with a more complicated type of model with several age classes using R:s epiDynamics package.
Preparing the model
As for any model, you need to initiate the model with the above described parameters and the model equations:
import numpy as np from scipy.integrate import odeint import matplotlib.pyplot as plt
We set the population at 1000 individuals. (it doesn’t really matter, just as long as you keep proportions reasonable.
N = 1000
I0 and R0 are the number of initially infected individuals and recovered. At first, let’s assume that 5 people have just entered the country from some infected place. All other individuals are suscpetible to be infected.
I0, R0 = 5, 0 S0 = N - I0 - R0
Here comes the crucial part. Lets first assume that the average contact rate of an individual is . that is that the individual comes in contact with 600 people. In some contexts, that might not be impossible. Think of a night club, a school, a concert or some shops. Let’s also assume that the recovery rate i 10 days. We also choose a time frame of 100 days to observe the development od S, I and R.
beta, gamma = 0.6, 1/10 t = np.linspace(0, 100, 100)
The equation system
We define a function for the derivatives of the equation system and the initial conditions (I0, R0 and S0) of the system as well as the integral of the equations over time (from the initial moment to the time limit we set above).
def derivative(y, t, N, beta, gamma): S, I, R = y dSdt = -beta * S * I / N dIdt = beta * S * I / N - gamma * I dRdt = gamma * I return dSdt, dIdt, dRdt y0 = S0, I0, R0 ret = odeint(deriv, y0, t, args=(N, beta, gamma)) S, I, R = ret.T
Let see how the spread of the virus plays itself out when the average individual meets as many as 600 people. To do so, let’s plot he curves for S(t), R(t) and I(t).
fig = plt.figure(facecolor='w') ax = fig.add_subplot(111, facecolor='#dddddd', axisbelow=True) ax.plot(t, S/1000, 'b', alpha=0.5, lw=2, label='Susceptible') ax.plot(t, I/1000, 'r', alpha=0.5, lw=2, label='Infected') ax.plot(t, R/1000, 'g', alpha=0.5, lw=2, label='Recovered with immunity') ax.set_xlabel('Day (beta = 0.2 , gamma = 0.1 )') ax.set_ylabel('Number (1000s)') ax.set_ylim(0,1.5) ax.yaxis.set_tick_params(length=0) ax.xaxis.set_tick_params(length=0) ax.grid(b=True, which='major', c='w', lw=2, ls='-') legend = ax.legend() legend.get_frame().set_alpha(0.5) for spine in ('top', 'right', 'bottom', 'left'): ax.spines[spine].set_visible(False) plt.show()
The result being
As you can see, a little bit more that half the population (520) is infected around 18 days after the outbreak. At that point, a quarter of the population will have recovered and be immune while a quarter will still be susceptible to be infected. The problem in a situation like this depends on how fatal the infection is. If half the population is in need of ICU-care, this is serious. If only certain age groups are at risk, it can be serious is the demographic structure of the population is such that there are many older individuals (e.g. Italy, Spain, France or Sweden). The situation can be even worse if there is a compound of factors that make the virus dangerous (e.g. comorbidities). The problem is also dependent on the ICU’s ability to take care of so many patients.
Let’s see now what happens if we reduce the number of potential contact by, say 3. I assume that the average Joe has 200 contacts in the a population of 1000 individuals. The curve now looks like this:
Some really interesting observations can be made here!!! First of all, the peak of the curve showing the number of infected individual never even reaches 200 individuals out of the 1000 in the population. That, in itself is a very good news, at least for the patients because it implies that there is a chance for healthcare professionals to handle the influx of patients and not end up in a situation in which unpleasant choices need to be made. The price? Well, in the previous case the time spent with infected individuals in society was around 50 days while that number became approximately 100 in the second case. There are reason to believe that it is a rather high price to pay for the society as a collective. But for infected individuals in risk groups, it is unquestionably preferable.
The next, even more surprising (or is it?) observation is that in the first case, the entire population has recovered AND is immune after 70 days. This means that they are (at least for some time) not going to be infected (at least with this virus). In the second case, the figures are slightly different. After 150 days, 80% of the population is immune, but the virus has not had the possibility to survive to infect the remaining 20% of the population that was susceptible to be infected.
Lessons learned from this exercise
The strategies adopted by a wide range of countries to handle the spread of COVID-19 all boil down to one very simple set of goals. For one, there is a strong need not to get a too rapid rise in the number of infected individuals and in particular individuals that may die of the virus. That doesn’t mean that fewer will be ill but rather that they’ll spread out over time, thus given healthcare services a fighting chance to give patients all the care they can get, the result being to save people that in a chaotic situation would have died. The second goal in to make sure that the proportion of individuals with immunity is sufficiently high that the virus doesn’t get the necessary amount of contagious agents to survive. This is called flock immunity and is something that has been known for a long time.
The only mean of managing this, as we saw above, is to reduce the number of possible contacts. It can be achieve in many different ways: Full lock-down (France), partial lock-down (Korea), self-isolation (Sweden) and by asking all citizens to keep a safe distance from one another.
These strategies are sensible and take into account the immediate healthcare needs of the population. They do, however, not take care of the long-term effect on the public health, the economy and other factors. My opinion on the matter is my own and have no place in this blog, so I will not indulge you with what I think.
This has not been an attempt from my side to claim to know something that others do not know, but rather to show intuitively and how little needs to be done to explain the concept of flattening the curve as has been spoken about the past few weeks. I also wanted to show the impact of actually reducing social contacts and that every individual actually can make an enormous difference in outcome. Not everybody is a healthcare professional. Not everybody has unlimited economic resources to produce masks or other necessary medical equipment. However, there is something very easy and efficient that everyone can do, namely ensuring that the average contact rate falls sufficiently as to flatten the curve.
As I mentioned above, I will in the coming weeks develop this idea and try to refine the experiment to include groups with different risk factor such as comorbidities and age, or even better age groups with subgroups having risk factors.
Untill then, stay safe and keep your distance!
You have a really nice blog; I found it on a job advertisement for Sopra Steria on Linkedin. I really like this post and will read the follow up post.
In your plots I found sth else also interesting. Let’s keep gamma fixed, as you did. By decreasing beta R(t->inf) decreases and S(t -> inf) increases, which make sense. Now we can ask how much should be beta for flock/herd immunity? Or the other way around, knowing the values of R or S at infinity what can we say about beta?
Unfortunately I(t -> inf) =0 is the fixed point of ODEs which does not give more information about R and S. But one can find S and I as a function of R using the differential equations. For instance the first and the last one gives:
dS/dR = – (beta / gamma) S (I am working with the densities, so already everything is divided by N).
And this gives:
S = S0 exp(- beta * R / gamma) .
And we know that at t=0, R0 = 0 so
S0= 1- I0 ( 0.995 in your example)
and at t->inf, I(inf) = 0, hence,
S(inf) + R(inf) =1.
Combining this and the relation between S and R we get beta/gamma (this is a dimensionless parameter and meaningful at infinity)
beta/gamma = – 1/R(inf) * ln ( ( 1- R(inf) ) / (1- I0) ) .
If we consider R (inf) =2/3 we get beta/gamma ~1.6 and with gamma =0.1 (1/ time unit), beta ~ 0.16 (1/ time unit) . Which is quite close to what you have in the plot for beta=0.2. Actually the above formula gives beta ~ 0.2 for R(inf) =0.8 as well. For R(inf) = 0.1, which is the limit of very low recovered/immune we get beta ~ 0.1 . In comparison with the above case, this is a lot of change in R(inf).