Data scientists take on corona data to predict growth of new infections

By applying population growth models to worldwide corona data, TU/e data scientists can calculate the expected number of new infections and deaths from the virus in the near future. For the four countries with the highest number of infections (China, South Korea, Italy and Iran), they can make accurate forecasts one to three days in advance. For South Korea and Iran, they can already estimate the maximum number of infections. Soon, this will also be possible for other countries, such as the Netherlands and the US.


"Researchers around the world are analyzing the growth of corona infections, but actually making accurate predictions is quite difficult," says Edwin van den Heuvel, professor of statistics at TU/e. Together with two colleagues, he uses his knowledge of statistics and growth curves on the corona data to make these highly desirable calculations. "We are data scientists, we had to do something with this."

It may seem trivial to extrapolate the curve of infections, but doing so takes a lot of effort, according to Van den Heuvel. "The question is always when the increase will level off, where is the maximum?"

For the four most affected countries (China, Italy, South Korea and Iran), the researchers can now estimate the number of new infections and deaths from the virus one to four days in advance with an accuracy of 81%. Van den Heuvel expects that within a week he will have enough reliable data to do the same for countries such as the United States and the Netherlands.

"We continue to improve our model, so that we can also predict several days ahead," says Van den Heuvel. Moreover, they will look at the effect of measures and the structure of the population in China to explain the current maxima. “With this we hope to be able to predict where the maximum for other countries will be, so that we know how many people in total will be infected or die as a result of the virus”, says Van den Heuvel.

Verhulst's population model

Van den Heuvel relied on the famous logistic function developed by the Belgian mathematician Pierre François Verhulst around 1845. This function describes how a population grows over time in an S-shape: at first the population grows slowly, then it rises ever faster, and finally growth flattens out as the population approaches a maximum.
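For readers who want to see the S-shape in concrete terms, here is a minimal Python sketch of the logistic function. The parameter names (maximum, rate, midpoint) and the example values are illustrative assumptions, not figures from the TU/e model.

```python
import math


def logistic(t, maximum, rate, midpoint):
    """Verhulst's logistic curve: population size at time t.

    maximum  -- level at which growth flattens out (hypothetical name)
    rate     -- how steeply the curve rises (hypothetical name)
    midpoint -- time at which half of the maximum is reached
    """
    return maximum / (1.0 + math.exp(-rate * (t - midpoint)))


# Illustrative values only, not taken from any real outbreak data.
for day in range(0, 61, 10):
    print(day, round(logistic(day, maximum=80_000, rate=0.2, midpoint=30)))
```

Printed for days 0 to 60, the values start small, climb fastest around the midpoint and then level off just below the chosen maximum.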

The researchers first applied this population model to data from China. "The total number of infections in the provinces in China turned out to follow that logistic growth very precisely," says Van den Heuvel. With the Chinese data, they were able to calibrate their prediction model and then correct underestimates of the maximum number of new infections and deaths in other countries. "The predictions turned out to be right for Iran, Italy and South Korea."
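To give a rough idea of what fitting a logistic curve to cumulative counts can look like, the sketch below uses a deliberately simple approach: it tries a range of candidate maxima, linearises the curve for each one, and keeps the best fit. This is not the researchers' calibration procedure, which the article does not describe. The function fit_logistic, the candidate grid and the noise-free synthetic data are all assumptions made for illustration.

```python
import math


def logistic(t, maximum, rate, midpoint):
    return maximum / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_logistic(days, totals, candidate_maxima):
    """Crude fit (hypothetical helper): for each candidate maximum,
    linearise the curve, solve for rate and midpoint by ordinary least
    squares, and keep the candidate with the smallest squared error."""
    best = None
    n = len(days)
    mean_t = sum(days) / n
    for maximum in candidate_maxima:
        if maximum <= max(totals):
            continue  # the ceiling must lie above every observation
        # ln(y / (K - y)) = rate * (t - midpoint) is a straight line in t
        z = [math.log(y / (maximum - y)) for y in totals]
        mean_z = sum(z) / n
        slope = (sum((t - mean_t) * (v - mean_z) for t, v in zip(days, z))
                 / sum((t - mean_t) ** 2 for t in days))
        intercept = mean_z - slope * mean_t
        rate, midpoint = slope, -intercept / slope
        error = sum((logistic(t, maximum, rate, midpoint) - y) ** 2
                    for t, y in zip(days, totals))
        if best is None or error < best[0]:
            best = (error, maximum, rate, midpoint)
    return best[1:]


# Synthetic, noise-free totals generated from a known curve (not real data).
days = list(range(25))
totals = [logistic(t, maximum=10_000, rate=0.25, midpoint=20) for t in days]

maximum, rate, midpoint = fit_logistic(days, totals, range(5_000, 20_001, 500))

# Short-term forecast: extend the fitted curve three days past the data.
forecast = logistic(days[-1] + 3, maximum, rate, midpoint)
print(maximum, round(rate, 3), round(midpoint, 1), round(forecast))
```

Note that the synthetic data stop before the curve has flattened, yet the fit still has to commit to a maximum. With real, noisy counts that estimate is far less stable, which is one reason the question of where the maximum lies is so hard to answer early in an outbreak.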

Interested in keeping track of the latest calculations by the TU/e data scientists? This page gives frequent updates.
