# COVID-19 in Switzerland real-time epidemiological analyses powered by EpiGraphHub

### Information source

For this work, we used data from the Federal Office of Public Health (FOPH – opendata.swiss/en/dataset/covid-19-schweiz). We used the reported daily numbers of cases (shown in Fig. 1), hospitalizations, tests, test positivity, and deaths in the 26 cantons. The cantons are administrative regions of Switzerland, similar to states in the United States.

The datasets used here are listed in Table 1, which gives their names, brief descriptions, and links to where they can be viewed on the EpiGraphHub platform^{4}. For each dataset, a corresponding metadata table is also available (https://epigraphhub.org/tablemodelview/list/). Metadata tables have the same name as the tables they refer to, with the appended suffix “*_meta*”. Other COVID-19 datasets obtained from the Swiss Federal Office of Public Health are also available for download and viewing in EpiGraphHub. Datasets are stored and re-shared unchanged.

For a summary visualization of the datasets used here, see epigraphhub.org/superset/dashboard/p/yorXv7eBJAQ/.

### Visualizing hospitalization rates

Given the succession of viral variants and the effects of vaccination since the beginning of 2021, a simple but effective way to visually follow the evolution of the risk of hospitalization over time is to look at the daily relationship between the number of new hospitalizations and the number of new COVID-19 cases reported. This visualization can be achieved with a simple scatter plot and a temporal color mapping. Finally, we split the analysis into blocks of 3 months to show the evolution of the average severity of the cases (Fig. 2). We can also look at these rates by canton to see how they differ from the national rates (Fig. 3).
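As a numeric companion to the scatter plots, the 3-month blocks can be summarized by the ratio of hospitalizations to reported cases in each block. The sketch below is only an illustration: it assumes the blocks coincide with calendar quarters and that severity is summarized as a ratio of totals, neither of which is stated in the text. The function name and the toy records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def quarterly_hospitalization_ratio(records):
    """Ratio of new hospitalizations to new cases per calendar quarter.

    records: iterable of (day, new_cases, new_hospitalizations) tuples.
    Returns a dict keyed by (year, quarter).
    """
    totals = defaultdict(lambda: [0, 0])  # (year, quarter) -> [cases, hospitalizations]
    for day, cases, hosp in records:
        quarter = (day.year, (day.month - 1) // 3 + 1)
        totals[quarter][0] += cases
        totals[quarter][1] += hosp
    return {
        q: (hosp / cases if cases else float("nan"))
        for q, (cases, hosp) in sorted(totals.items())
    }


# Hypothetical toy records, not real FOPH data.
records = [
    (date(2021, 1, 15), 1000, 50),
    (date(2021, 2, 15), 3000, 150),
    (date(2021, 4, 10), 2000, 40),
]
ratios = quarterly_hospitalization_ratio(records)
```

Applied per canton and compared with the national series, the same summary would show where cantonal rates depart from the national ones.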

### Spatio-temporal analysis

From the time series of daily reported cases for each canton, we applied pairwise correlation analysis to unravel the spatial dynamics of the virus.

As the virus spreads across geographic regions (e.g., cantons), delays in the incidence of reported cases between regions can be estimated (Fig. 1). We used the cross-correlation between the series of daily reported new cases not only to estimate this spatial trajectory over time, but also to analyze the magnitude of the pairwise correlations between all cantons. To obtain the lag *τ* between two series, we evaluated the lag that maximizes the cross-correlation coefficient between each pair of cantons.

The normalized cross-correlation function for two time series, *X*_{t} and *Y*_{t}, is given by:

$$\rho_{XY}(\tau)=\frac{\mathbb{E}\left[\left(X_{t}-\mu_{X}\right)\left(Y_{t+\tau}-\mu_{Y}\right)\right]}{\sigma_{X}\sigma_{Y}}. \tag{1}$$

The sign of the *τ* that maximizes the cross-correlation function approximates the direction of predictability: if *ρ*_{XY}(*τ*) is maximized at *τ* > 0, canton *X* anticipates *Y* in incidence trends and may therefore be a good predictor of *Y*^{6}. To find the value of *τ* that maximizes the correlation for each pair of cantons, we calculated *ρ*_{XY}(*τ*) for values of *τ* ranging from −30 to 30 days. Here, *µ* and *σ* are the mean and standard deviation of each time series. We used this information to build forecast models for each canton, as shown below. It should be noted that this measure is not proof of causality between pairs of geographic regions^{7}. However, it allows the selection of regions that can contribute to short-term forecasts of trends in other regions.
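The lag search can be sketched in a few lines of plain Python. This is a minimal illustration of Eq. 1, not the authors' code: the function names are hypothetical, and the expectation is estimated by summing over the overlapping days and dividing by the full series length, which is one common choice among several possible normalizations. The two synthetic waves stand in for two cantons' case series.

```python
import math
from statistics import fmean, pstdev


def cross_correlation(x, y, tau):
    """Sample estimate of rho_XY(tau) for two equal-length series (Eq. 1).

    Uses the means and standard deviations of the full series and divides
    the lagged sum by n (assumed normalization), so the result stays in [-1, 1].
    """
    n = len(x)
    mu_x, mu_y = fmean(x), fmean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)
    # Pair X_t with Y_{t+tau}, keeping only days present in both series.
    lagged_sum = sum(
        (x[t] - mu_x) * (y[t + tau] - mu_y)
        for t in range(n)
        if 0 <= t + tau < n
    )
    return lagged_sum / (n * sd_x * sd_y)


def best_lag(x, y, max_lag=30):
    """Return (tau, rho) maximizing rho_XY(tau) over -max_lag..max_lag."""
    rho = {tau: cross_correlation(x, y, tau) for tau in range(-max_lag, max_lag + 1)}
    tau_star = max(rho, key=rho.get)
    return tau_star, rho[tau_star]


# Synthetic example: series y is the same wave as x, delayed by 5 days.
x = [math.exp(-((t - 40) / 10) ** 2) for t in range(120)]
y = [math.exp(-((t - 45) / 10) ** 2) for t in range(120)]
tau_star, rho_max = best_lag(x, y)
```

In this example the maximizing lag is positive, which under the convention above means that the first series anticipates the second.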

### Spatial clustering

Using 1 − max(*ρ*_{XY}) as the distance between cantons *X* and *Y*, we can perform an agglomerative clustering of the cantons^{6}, taking into account the optimal lag obtained as described above. Maximum correlations and optimal lags are stored as correlation and lag matrices, respectively (Figs. 4, 5).
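To make the clustering step concrete, here is a small pure-Python sketch of agglomerative clustering on the distance 1 − max(ρ). The linkage criterion and the number of clusters are not given in the text, so average linkage and a fixed target of two clusters are assumptions here. The correlation values are invented for illustration, and the function name is hypothetical.

```python
def agglomerative_clusters(names, max_corr, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - max_corr.

    names: list of region labels.
    max_corr: symmetric matrix (list of lists) of maximum cross-correlations.
    n_clusters: number of clusters at which merging stops (assumed known).
    """
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two candidate clusters.
                total = sum(
                    1.0 - max_corr[a][b] for a in clusters[i] for b in clusters[j]
                )
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is still valid
        clusters[i].extend(merged)
    return [sorted(names[k] for k in c) for c in clusters]


# Invented maximum-correlation matrix for four cantons (illustrative only).
names = ["GE", "VD", "ZH", "AG"]
max_corr = [
    [1.00, 0.95, 0.40, 0.45],
    [0.95, 1.00, 0.50, 0.40],
    [0.40, 0.50, 1.00, 0.90],
    [0.45, 0.40, 0.90, 1.00],
]
groups = agglomerative_clusters(names, max_corr, n_clusters=2)
```

For real data, a library routine such as a hierarchical clustering function operating on the same distance matrix would be the more usual choice; the loop above only shows the logic.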

### Estimated prevalence and hospitalization rate

If we view epidemics as stochastic processes, we can use the available data to make inferences about their rates. Here, we used case, test, and hospitalization series to estimate prevalence and hospitalization rates.

Estimating prevalence from the number of reported cases is not trivial, as the frequency of testing varies considerably over time, influencing the number of cases detected. Thus, we constructed a simple hierarchical Bayesian model to estimate the prevalence of infection *Pv*_{t} and the hospitalization rate *Ph*_{t} from the number of tests *T*_{t} and the number of positive tests (*Cases*_{t}).

We start by modeling the reported cases (*Cases*_{t}) as a binomial process (Eq. 2) with parameters *n* and *p* corresponding respectively to the number of daily tests and the fraction of positive tests^{8}.

Suppose we assume that the number of tests and the coverage of testing are large enough that the tested population approximates a representative sample of the general population. In this case, the number of positive tests allows us to estimate the probability that a test will be positive. This can be used to approximate the proportion of infected individuals in the general population, i.e., the prevalence, *Pv*_{t}. To get a proper representation of the prevalence, we treat it as a random variable and model it with a Beta prior: \(Pv_{t} \sim \mathrm{Beta}(\alpha_{p},\beta_{p})\).

$$Cases_{t} \sim \mathrm{Bin}\left(n=T_{t},\, p=Pv_{t}\right). \tag{2}$$

Similarly, we can model the probability of hospitalization as \(Ph_{t} \sim \mathrm{Beta}(\alpha_{h},\beta_{h})\) and the hospitalizations as a binomial,

$$Hospitalizations_{t} \sim \mathrm{Bin}\left(n=Cases_{t},\, p=Ph_{t}\right). \tag{3}$$

The complete Bayesian model then becomes:

$$\begin{array}{lll}Hospitalizations_{t} \mid Ph_{t} & \sim & \mathrm{Bin}(n=Cases_{t},\, p=Ph_{t}),\\ Cases_{t} \mid Pv_{t} & \sim & \mathrm{Bin}(n=T_{t},\, p=Pv_{t}),\\ Ph_{t} & \sim & \mathrm{Beta}(\alpha_{h},\beta_{h}),\\ Pv_{t} & \sim & \mathrm{Beta}(\alpha_{p},\beta_{p}).\end{array}$$

The choice of non-informative Beta priors, \(\alpha_{h}=\beta_{h}=\alpha_{p}=\beta_{p}=0.5\), was made to start the inference from a neutral *a priori* perspective.

These simple probabilistic models have a closed-form expression for the posterior distribution of the binomial probability parameters because they are based on conjugate (beta-binomial) distributions. Inference based on the models described here was performed with the PyMC Python package (www.pymc.io) or using the closed-form expressions for the posterior Beta distributions.
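The closed-form route relies on the standard conjugacy result: with a Beta(α, β) prior and *k* successes in *n* binomial trials, the posterior is Beta(α + *k*, β + *n* − *k*). The sketch below applies it to one day of counts for both the prevalence (Eq. 2) and the hospitalization rate (Eq. 3). It treats each day independently and uses invented counts; the function names are hypothetical and this is not the PyMC implementation mentioned above.

```python
def beta_posterior(successes, trials, alpha=0.5, beta=0.5):
    """Posterior Beta parameters for a binomial probability.

    Prior: Beta(alpha, beta); the default 0.5, 0.5 matches the priors in the text.
    Likelihood: successes ~ Bin(trials, p).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return alpha + successes, beta + trials - successes


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Invented single-day counts (illustrative only).
tests, cases, hospitalizations = 20000, 1500, 45

# Prevalence: positive tests among all tests (Eq. 2).
pv_a, pv_b = beta_posterior(cases, tests)
# Hospitalization rate: hospitalizations among reported cases (Eq. 3).
ph_a, ph_b = beta_posterior(hospitalizations, cases)

prevalence_mean = beta_mean(pv_a, pv_b)
hospitalization_mean = beta_mean(ph_a, ph_b)
```

Credible intervals follow from the quantiles of the same Beta distributions, which any statistics library that provides the Beta quantile function can supply.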

The advantage of having a probabilistic representation of incidence is that it can be plugged into the binomial model of hospitalizations (Eq. 3).

### Forecasting models

Ensemble models have been successfully applied to forecasting epidemiological time series several times in recent years^{6,9}.

To predict the hospitalization curves of the cantons, we used a probabilistic gradient boosting machine model^{10}, because such models can capture complex nonlinear relationships in multiple time series regression.

The model was defined as

$$\begin{array}{lll}\ln H_{k,t} & = & \beta_{0,k}+\beta_{1,k}C_{k,t-\tau_{i}}+\beta_{2,k}H_{k,t-\tau_{i}}\\ & & +\beta_{3,k}T_{k,t-\tau_{i}}+\beta_{4,k}ICU_{k,t-\tau_{i}}+\varepsilon ,\end{array} \tag{4}$$

where *H*_{k,t}, modeled as a log-normal random variable, is the number of new hospitalizations in canton *k* on day *t*, *C* is the incidence, *T* is the number of tests performed, and *ICU* is the number of intensive care patients. Each of these predictors enters the model 14 times, with lags *τ* = 1…14 (we use the last 14 days of each series as predictors). Similarly, the series of the other cantons in the same cluster as *k* are also added to the model with the same lags.
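The lagged design described here can be illustrated with a short sketch that builds, for each day, the values of every predictor over the previous 14 days. Only the feature construction is shown; the gradient boosting model itself is not reproduced. The function name, the feature naming scheme, and the toy series are hypothetical, and the series of other cantons in the cluster would be added as further entries of the same dictionary.

```python
import math


def lagged_features(series, max_lag=14):
    """Build lagged predictors for each day with a full history.

    series: dict mapping a predictor name to a list of daily values
            (all lists of equal length).
    Returns a dict: day index t -> {"<name>_lag<k>": value at day t - k}.
    """
    names = sorted(series)
    n = len(series[names[0]])
    rows = {}
    for t in range(max_lag, n):
        rows[t] = {
            f"{name}_lag{lag}": series[name][t - lag]
            for name in names
            for lag in range(1, max_lag + 1)
        }
    return rows


# Toy series for one canton (illustrative values only).
days = range(20)
series = {
    "C": [100 + d for d in days],    # incidence
    "H": [10 + d for d in days],     # new hospitalizations
    "T": [1000 + d for d in days],   # tests performed
    "ICU": [5 + d for d in days],    # intensive care patients
}
features = lagged_features(series)
# Target on the log scale, as on the left-hand side of Eq. 4 (requires H > 0).
targets = {t: math.log(series["H"][t]) for t in features}
```

With four predictors and 14 lags, each row holds 56 features for a single canton; days with zero hospitalizations would need separate handling before taking the logarithm.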

The model of Eq. (4) can be trained to predict hospitalizations for any day ≥ *t*. Here we used it to predict the number of hospitalizations up to 14 days in advance (Fig. 9).

The forecast models are run daily, right after the data is updated in the EpiGraphHub database. The forecasts are then also saved in EpiGraphHub. The URLs of the updated datasets used and the result tables are shown in Table 1.
