COVID-19 in Switzerland: real-time epidemiological analyses powered by EpiGraphHub

Information source

For this work, we used data from the Swiss Federal Office of Public Health (OFSP). We used the reported numbers of daily cases (shown in Fig. 1), hospitalizations, tests, test positivity and deaths in the 26 cantons. The cantons are administrative regions of Switzerland, similar to states in the United States.

Fig. 1

Daily case count for all cantons. The incidence curves of the cantons are delayed relative to one another. Cantons are identified by their official two-letter codes.

The datasets used here are listed in Table 1, which gives their names, brief descriptions, and links to where they can be viewed on the EpiGraphHub platform [4]. For each dataset, a corresponding metadata table is also available. Metadata tables have the same name as the tables they refer to, with the suffix “_meta” appended. Other COVID-19 datasets obtained from the Swiss Federal Office of Public Health are also available for download and viewing in EpiGraphHub. Datasets are stored and re-shared unchanged.

Table 1. List of datasets used for this article.

For a summary visualization of the datasets used here, see

Visualizing hospitalization rates

Given the succession of viral variants and the effects of vaccination since the beginning of 2021, a simple but effective way to follow the evolution of the risk of hospitalization visually is to look at the day-to-day relationship between the number of new hospitalizations and the number of new COVID-19 cases reported daily. This visualization can be achieved with a simple scatter plot and a temporal color mapping. We then split the analysis into blocks of 3 months to show the evolution of the average severity of the cases (Fig. 2). We can also look at these rates by canton, to see how they differ from the national rates (Fig. 3).
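As a rough illustration of the quarterly split described above, the following sketch computes one hospitalization-per-case ratio for each 3-month window. The column names (date, cases, hosp) are hypothetical, and the ratio of quarterly totals is only one possible reading of the "average ratio"; the source does not state how the trend lines were fitted.

```python
import pandas as pd

def quarterly_hosp_per_case(df):
    """Ratio of total hospitalizations to total cases per calendar quarter.

    Assumes hypothetical columns: 'date' (datetime), 'cases', 'hosp'.
    """
    quarter = df["date"].dt.to_period("Q")  # e.g. 2021Q1, 2021Q2, ...
    totals = df.groupby(quarter)[["cases", "hosp"]].sum()
    totals["hosp_per_case"] = totals["hosp"] / totals["cases"]
    return totals

# Tiny synthetic example spanning the Q1/Q2 boundary of 2021
example = pd.DataFrame(
    {
        "date": pd.date_range("2021-03-30", periods=4, freq="D"),
        "cases": [100, 100, 200, 200],
        "hosp": [10, 10, 4, 4],
    }
)
rates = quarterly_hosp_per_case(example)
print(rates)
```

The same quarter labels could be used to drive the color mapping of the scatter plot.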

Figure 2

Daily hospitalizations per case in Switzerland, colored by quarter (3-month windows). Q1: January-March, Q2: April-June, Q3: July-September and Q4: October-December. The trend lines represent the average ratio of hospitalizations per case.

Figure 3

Daily hospitalizations per case in Zurich, Geneva, Aargau and Bern. Blue circles are from Q4 of 2021, green from Q1 of 2022 and cyan from Q2 of 2022.

Spatio-temporal analysis

From the time series of daily reported cases for each canton, we applied pairwise correlation analysis to unravel the spatial dynamics of the virus.

As the virus spreads across geographic regions (e.g. cantons), delays in the incidence of reported cases between regions can be estimated (Fig. 1). We used the cross-correlation between the series of daily reported new cases not only to estimate this spatial trajectory over time, but also to analyze the magnitude of the pairwise correlations between all cantons. To obtain the lag τ between two series, we searched for the lag that maximizes the cross-correlation coefficient between each pair of cantons.

The normalized cross-correlation function for two time series, \(X_t\) and \(Y_t\), is given by:

$${\rho }_{XY}(\tau )=\frac{{\mathbb{E}}\left[\left({X}_{t}-{\mu }_{X}\right)\left({Y}_{t+\tau }-{\mu }_{Y}\right)\right]}{{\sigma }_{X}{\sigma }_{Y}}.$$


The sign of the τ that maximizes the cross-correlation function approximates the direction of predictability: if the maximum of ρXY(τ) occurs at τ > 0, canton X anticipates canton Y in incidence trends and may therefore be a good predictor of Y [6]. To find the value of τ that maximizes the correlation for each pair of cantons, we calculated ρXY(τ) for values of τ ranging from −30 to 30 days. Here, µ and σ are the mean and standard deviation of each time series. We used this information to build forecast models for each canton, as shown below. It should be noted that this measure is not proof of causality between pairs of geographic regions. However, it allows the selection of regions that can contribute to short-term forecasts of trends in other regions.
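The lag search described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, both series are assumed to have the same length, and the expectation is estimated by summing over the overlapping days and dividing by the full series length (the usual biased estimator), which is an assumption.

```python
import numpy as np

def cross_correlation(x, y, tau):
    """Estimate rho_XY(tau), pairing X_t with Y_{t+tau}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if tau >= 0:
        a, b = x[: n - tau], y[tau:]
    else:
        a, b = x[-tau:], y[: n + tau]
    # Global means and standard deviations, as in the formula for rho_XY.
    cov = np.sum((a - x.mean()) * (b - y.mean())) / n
    return cov / (x.std() * y.std())

def best_lag(x, y, max_lag=30):
    """Return the lag in [-max_lag, max_lag] with the largest rho_XY."""
    lags = np.arange(-max_lag, max_lag + 1)
    rhos = np.array([cross_correlation(x, y, int(tau)) for tau in lags])
    i = int(np.argmax(rhos))
    return int(lags[i]), float(rhos[i])

# Synthetic check: y repeats x five days later, so X should lead Y.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.roll(x, 5)
lag, rho = best_lag(x, y)
print(lag, rho)
```

A positive returned lag means the first series leads the second. Applying best_lag to every pair of cantons would fill the correlation and lag matrices.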

Spatial clustering

Using 1 − max(ρXY) as the distance between cantons X and Y, we can perform an agglomerative clustering of the cantons [6], taking into account the optimal lag obtained as described above. Maximum correlations and optimal lags are stored as correlation and lag matrices, respectively (Figs. 4, 5).
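A possible way to turn the maximum-correlation matrix into clusters is sketched below with SciPy. The linkage method (average) and the number of clusters are assumptions chosen for illustration; the source does not specify them, and the function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_cantons(max_corr, n_clusters):
    """Agglomerative clustering with distance 1 - max(rho_XY)."""
    max_corr = np.asarray(max_corr, dtype=float)
    # Guard against small asymmetries from numerical estimation.
    sym = np.maximum(max_corr, max_corr.T)
    dist = np.clip(1.0 - sym, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")  # assumed linkage method
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Toy matrix for four hypothetical cantons forming two obvious groups
max_corr = np.array(
    [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.8],
        [0.1, 0.2, 0.8, 1.0],
    ]
)
labels = cluster_cantons(max_corr, n_clusters=2)
print(labels)
```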

Figure 4

Cross-correlation matrix between canton incidence time series.

Figure 5

This matrix shows the lag that maximizes the correlation between each pair of cantons.

Estimating prevalence and hospitalization rate

If we view epidemics as stochastic processes, we can use the available data to make inferences about their rates. Here, we used case, test, and hospitalization series to estimate prevalence and hospitalization rates.

Estimating prevalence from the number of reported cases is not trivial, as the frequency of testing varies considerably over time, influencing the number of cases detected. Thus, we constructed a simple hierarchical Bayesian model to estimate the prevalence of infection \(P{v}_{t}\) and the hospitalization rate \(P{h}_{t}\) from the number of tests \({T}_{t}\) and the number of positive tests (\(Case{s}_{t}\)).

We start by modeling the reported cases (\(Case{s}_{t}\)) as a binomial process (Eq. 2) with parameters n and p corresponding respectively to the number of daily tests and the fraction of positive tests [8].

If we assume that the number of tests and the testing coverage are large enough for the tested population to approximate a representative sample of the general population, the number of positive tests lets us estimate the probability that a test will be positive. This probability can be used to approximate the proportion of infected individuals in the general population, i.e. the prevalence, \(P{v}_{t}\). To get a proper representation of prevalence, we treat it as a random variable with a Beta prior: \(P{v}_{t} \sim Beta\left({\alpha }_{p},{\beta }_{p}\right)\).

$$Case{s}_{t} \sim Bin\left(n={T}_{t},p=P{v}_{t}\right).$$


Similarly, we can model the probability of hospitalization as \(P{h}_{t} \sim Beta\left({\alpha }_{h},{\beta }_{h}\right)\) and hospitalizations as a binomial,

$$Hospitalization{s}_{t} \sim Bin\left(n=Case{s}_{t},p=P{h}_{t}\right).$$


The complete Bayesian model then becomes:

$$\begin{array}{lll}Hospitalization{s}_{t}| P{h}_{t} & \sim & {\rm{Bin}}(n=Case{s}_{t},p=P{h}_{t}),\\ Case{s}_{t}| P{v}_{t} & \sim & {\rm{Bin}}(n={T}_{t},p=P{v}_{t}),\\ P{h}_{t} & \sim & {\rm{Beta}}({\alpha }_{h},{\beta }_{h}),\\ P{v}_{t} & \sim & {\rm{Beta}}({\alpha }_{p},{\beta }_{p}).\end{array}$$

The choice of non-informative Beta priors, \({\alpha }_{h}={\beta }_{h}={\alpha }_{p}={\beta }_{p}=0.5\), was made to start the inference from a neutral a priori perspective.

These simple probabilistic models have a closed-form expression for the posterior distribution of the binomial probability parameters because they are based on conjugate (beta-binomial) distributions. Inference based on the models described here was performed with the PyMC Python package or using the closed-form expressions for the posterior Beta distributions.
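For reference, the conjugate update behind the closed-form route is standard: a Beta(α, β) prior combined with k successes in n binomial trials gives a Beta(α + k, β + n − k) posterior. The sketch below applies it day by day with SciPy. The function name and the daily counts are made up for illustration, and treating the daily case count as a fixed, observed number of trials when estimating the hospitalization rate is a simplifying assumption.

```python
import numpy as np
from scipy import stats

def beta_posterior(successes, trials, alpha=0.5, beta=0.5):
    """Posterior Beta(alpha + k, beta + n - k) for a binomial probability."""
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    return stats.beta(alpha + successes, beta + trials - successes)

# Made-up daily counts for two days (illustration only)
tests = np.array([1000, 2000])
cases = np.array([100, 150])
hosp = np.array([5, 6])

prevalence = beta_posterior(cases, tests)  # posterior of Pv_t
hosp_rate = beta_posterior(hosp, cases)    # posterior of Ph_t

print(prevalence.mean(), prevalence.ppf(0.025), prevalence.ppf(0.975))
print(hosp_rate.mean(), hosp_rate.ppf(0.025), hosp_rate.ppf(0.975))
```

The posterior mean and the 2.5% and 97.5% quantiles give a point estimate and a 95% credible interval for each day.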

The advantage of having a probabilistic representation of incidence is that it can be plugged into the binomial model of hospitalizations (Eq. 3).

Forecasting models

Ensemble models have been applied successfully to forecasting epidemiological time series several times in recent years [6,9].

To predict the hospitalization curves of the cantons, we used a probabilistic gradient boosting machine model [10], because such models can capture complex nonlinear relationships in multiple time series regression models.

The model was defined as

$$\begin{array}{lll}\ln {H}_{k,t} & = & {\beta }_{0,k}+{\beta }_{1,k}{C}_{k,t-{\tau }_{i}}+{\beta }_{2,k}{H}_{k,t-{\tau }_{i}}\\ & & +{\beta }_{3,k}{T}_{k,t-{\tau }_{i}}+{\beta }_{4,k}IC{U}_{k,t-{\tau }_{i}}+\varepsilon ,\end{array}$$


where \({H}_{k,t}\), modeled as a log-normal random variable, is the number of new hospitalizations in canton k on day t, C is the incidence, T is the number of tests performed, and ICU is the number of intensive care patients. Each of these predictors enters the model 14 times, with lags τ = 1…14 (we use the last 14 days of each series as predictors). Similarly, the other cantons in the same cluster as k are also added to the model with the same lags.
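The lagged predictors described above can be assembled as a design matrix before fitting any regression model. The sketch below covers only that step for a single canton. The function name and column names (C, H, T, ICU) are hypothetical and simply mirror the symbols in the equation; the gradient boosting fit itself is not shown, and the handling of days with zero hospitalizations (where the logarithm is undefined) is left open.

```python
import numpy as np
import pandas as pd

def build_design_matrix(df, predictors=("C", "H", "T", "ICU"),
                        target="H", max_lag=14):
    """Lagged predictors (lags 1..max_lag) and the log-target ln H_t."""
    lagged = {
        f"{col}_lag{lag}": df[col].shift(lag)
        for col in predictors
        for lag in range(1, max_lag + 1)
    }
    # The first max_lag rows have incomplete history and are dropped.
    X = pd.DataFrame(lagged, index=df.index).dropna()
    # Assumes strictly positive counts; zeros would give -inf here.
    y = np.log(df.loc[X.index, target])
    return X, y

# Synthetic daily series for one hypothetical canton (20 days)
days = np.arange(1, 21)
canton = pd.DataFrame(
    {"C": days * 10, "H": days, "T": days * 100, "ICU": days + 2}
)
X, y = build_design_matrix(canton)
print(X.shape, y.shape)
```

Series from other cantons in the same cluster could be added by passing them as extra predictor columns.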

The model of Eq. (4) can be trained to predict hospitalizations for any day ≥ t. Here we used it to predict the number of hospitalizations up to 14 days in advance (Fig. 9).

The forecast models are run daily, right after the data is updated in the EpiGraphHub database. The forecasts are then also saved in EpiGraphHub. The URLs of the updated datasets used and the result tables are shown in Table 1.
