-Electronic Supplementary MaterialOptimizing surveillance for livestock disease spreading through animal movements Paolo Bajardi1,2 , Alain Barrat2,3 , Lara Savini4 , Vittoria Colizza5,6,7

1

Network aggregation and simulation procedure

The data describing bovines’ displacements are recorded on a daily basis, making it possible to construct different network representations according to different assumptions or aggregating on different time window lengths [1] [2]. In particular, we simulated the spread of an infectious disease on networks aggregated on a daily, weekly, monthly and yearly scale. The choice of the aggregating time window length ∆t affects the underlying mobility structure, leading to denser displacement networks for longer time windows, while the time step used in the numerical simulations of the spreading dynamics is kept fixed to 1 day. In this perspective, when the spreading process takes place on the daily dynamical networks, at every time step of the spreading the snapshot of the static network is different, while for a longer aggregating window length the network topology remains unchanged for exactly ∆t time steps. In Figure 2 of the main paper, the unfolding of the spreading for different aggregating time window lengths is followed by plotting the temporal evolution of the number of infected premises for every spreading time step (=1 day). In the following, if not otherwise specified, we present results corresponding to spreading phenomena simulated on networks aggregated on time windows of length ∆t = 1 day, starting at t0 =Jan 1, and with an infectious period µ−1 = 7 days.

2

Overlap values and clusters

The initial conditions similarity network (ICSN) is a fully connected network where all the information resides in the links weight corresponding to the overlap value of the invasion 1

Computational Epidemiology Laboratory, Institute for Scientific Interchange (ISI), Turin, Italy Centre de Physique Th´eorique, Aix-Marseille Univ, CNRS UMR 6207, Univ Sud Toulon Var, 13288 Marseille cedex 9, France 3 Data Science Laboratory, Institute for Scientific Interchange (ISI), Turin, Italy 4 Istituto Zooprofilattico Sperimentale Abruzzo-Molise G. Caporale, Teramo, Italy 5 INSERM, U707, Paris F-75012, France 6 UPMC Universit´e Paris 06, Facult´e de M´edecine Pierre et Marie Curie, UMR S 707, Paris F75012, France 7 Institute for Scientific Interchange (ISI), Turin, Italy 2

1

1

p(overlap)

0.1

0.01

0.001

0.0001 0

0.2

0.4

0.6

0.8

1

overlap

Figure 1: Distribution of the overlaps between invasion paths for the initial conditions similarity network resulting from the simulations of a deterministic spreading phenomenon starting at t0 =Jan 1, with an infectious period µ−1 = 7 days, and performed on a dynamic network corresponding to an aggregating time window length of ∆t = 1 day. paths generated by the different seeding nodes. Figure 1 shows the corresponding overlap distribution. Most nodes, when taken as seed of a spreading phenomenon, yield very short infection paths and very few infected nodes, so that the resulting overlap with most other paths is zero. Moreover, some particular sub-structures lead to an abundance of overlap values equal to 0.33: some nodes, such as the slaughterhouses, have a large in-degree and no out-links; two seeding nodes that are only linked (during their infectious period) to a slaughterhouse yield therefore two infection paths having each two nodes, among which one is common (the slaughterhouse), leading to an overlap of 1/3. In order to avoid taking into account such structures, we focus in our study on larger overlap values, and often restrict the computations to spreading paths containing at least 10 nodes. As described in the main text, considering in the ICSN only the links corresponding to an overlap larger than a given threshold Θth separates the ICSN into several connected components, each of whom defines a cluster of nodes. The schematic illustration of the clustering procedure is shown in Figure 3 of the main text. In each cluster, all nodes are however not a priori connected with each other: the fact that two nodes i and j belong to the same cluster simply means that there exists a set of other nodes i1 , ...ip such that the overlaps Θii1 , Θi1 i2 , Θip j are all larger than the threshold, but it does not imply that the overlap between i and j is larger than Θth . It is therefore important to measure the distribution of overlaps of all pairs of nodes inside a cluster. Figure 2 shows these distributions for several clusters constructed for a deterministic spreading starting at t0 =Jan 1, with infectious period µ−1 = 7days, and a threshold value Θth = 0.8: even if the distributions extend to values lower than the threshold, no small overlap values are observed. The clusters therefore group nodes yielding pairwise similar invasion paths. The clustered structures emerge by pruning links with weights smaller than a certain 2

10 10 10

-3

10

P(overlap)

0

0.2

0.4

0.6

0.8

1

cluster 3 (159 nodes)

-4

10

10

0

0

0.2

0.4

0.6

0.8

1

10

cluster 6 (125 nodes)

-1

-3

0

0.2

0.4

0.6

0.8

1

10

0.2

0.4

0.6

0.8

1

cluster 4 (150 nodes)

-3

10 10

-3

0

10

0.2

0.4

0.6

0.8

1

10

cluster 7 (109 nodes)

10

-3

0

0.2

0.4

0.6

0.8

1

10

0.4

0.6

0.8

1

0.6

0.8

1

0.6

0.8

1

cluster 5 (128 nodes)

-4 0

0

0.2

0.4

cluster 8 (79 nodes)

-1

-2

10

-4

0.2

-3

10

-2

0

-1

10

-1

0

-2

10 0

cluster 2 (205 nodes)

-4

10

-4

10

-4

0

-2

10 10

0

-1

10

-2

10

-5

0

-1

-2

10

10

-4

10

-3

10

10

10 cluster 1 (251 nodes)

-2

10

-3

0

-1

10

-1

10

10

0

-2

10

10

10

-4

10

10

10

10

10

10

10 cluster 0 (324 nodes)

-2

10 10

0

-1

-3

-4

0

0.2

0.4

overlap

Figure 2: Distribution of the overlaps between invasion paths rooted in nodes belonging to a given cluster. The clusters correspond to the invasion paths of a deterministic spreading phenomenon starting at t0 =Jan 1, with an infectious period µ−1 = 7 days, performed on a dynamic network corresponding to an aggregating time window length of ∆t = 1 day, and obtained with a threshold value of 0.8. The large values of the overlap between all pairs of nodes belonging to the same cluster indicates that all the nodes of a cluster lead to a similar spreading behavior, even if some pairs of nodes have an overlap smaller than the threshold. threshold in the initial condition similarity network. As the choice of the threshold is arbitrary, it is important to check the robustness of the obtained cluster structure with respect to changes in the threshold value. We investigate this point in Figure 3 by measuring the intersection of clusters obtained with different threshold values. For large enough threshold values, the structure of resulting clusters is stable with respect to small variations in the thresholding criterion. The results described in the main text are obtained by fixing the threshold value to 0.8. Since, in principle, there could be as many clusters as there are seeding nodes, in Figure 4 the cluster size distribution is shown. Most of the clusters are isolated nodes, i.e. seeding nodes leading to invasion paths with overlap smaller than 0.8 with any other invasion path, but some non-trivial clusters with large sizes clearly emerge.

3

Initial Conditions Similarity Network from directed paths

As described in the Methods section of the main text, the measure we adopted to weight the links in the Initial Conditions Similarity Network (ICSN) described above is based on the Jaccard index evaluated as the number of common infected nodes normalized by the total number of nodes reached by the infection paths of the two seeds. 3

Figure 3: Jaccard indices between clusters constructed using different threshold values. For each couple of threshold values a and b, we show at row a and column b a color-coded matrix of the Jaccard indexes between the two sets of 20 largest clusters of the ICSN obtained for the thresholds values a and b. The Jaccard indices of two clusters is computed as the number of common nodes divided by the number of nodes in the union of the clusters. The cases a = b are not shown as they trivially have a diagonal equal to 1 and zero off-diagonal elements. The violet-to-yellow color scale indicate how much the clusters obtained with different thresholds have in common. For threshold values larger than 0.6, the cluster structure is rather stable with respect to small changes in the thresholding criterion. Note that for each threshold value, the 20 largest clusters are ranked by size; as this ranking may change from one cluster value to the next, the yellow dots are not all on the diagonal of the corresponding matrix.

4

10

P(cluster size)

10 10

-1

-2

10 10

0

-3

-4

10

-5

10

0

10

1

10

2

10

3

cluster size

Figure 4: Cluster size distribution. The plot refers to the partition obtained imposing a threshold value of the Jaccard index of the invasion paths equal to 0.8. 98% of the clusters are represented by isolated nodes, but the probability distribution is fat tailed showing that a non-negligible number of clusters includes a large number of seeding nodes. The 20 largest clusters correspnd to the sizes on the right of the red line. It is also possible to consider an ICSN that takes into account the directedness and thus the causality of the paths of infection. To this aim, we define the modified overlap Θl12 between two �paths Γ1 and Γ2 , composed by the sets of directed links �l1 and �l2 , as the � � Jaccard index ||�ll1 � �ll2 || , measuring the number of common directed links with respect to the 1 2 total number of links in the two paths. In Figure 5, we compare the clusters obtained by weighting the links of the ICSN with the overlap of the nodes and with the overlap Θl of the directed links of the spreading paths. The partitions of the largest 20 clusters are very similar, but it is worth to notice that the construction of the ICSN using an overlap measure between paths based only on the intersection and union of the set of nodes composing the paths (as explained in the main text) relies on much less information and is therefore easier to achieve in real settings. We therefore use, both in the main text and in the rest of the ESM, the clusters obtained through the definition of the weight of an ICSN link as the overlap between the sets of nodes of the spreading paths.

4

Clusters features

As explained in the main text, the construction of the clusters is based on a criterion of similarity of invasion paths. In the following we investigate several other characteristics of the clusters. Figures 6, 7 and 8 show some properties of the spreading starting from nodes belonging to the various clusters: the distribution of sizes are typically peaked around a welldefined characteristic value for each cluster. The maximal geographic distance distributions are also narrow or present only a small number of peaks with respect to the number of nodes in the cluster. Figure 8 moreover highlights the very strong similarity between the prevalence 5

1

0.9

Node-based clusters

0.8 0.7 0.6 0.5 0.4 0.3 0

Link-based clusters

Figure 5: Jaccard indices between clusters constructed using two different definitions of the overlap of two spreading paths Γ1 and Γ2 , either based only on the nodes composing these paths (Θ12 , defined in the main text), or taking into account the directed links (Θl12 , defined in the ESM text). The Jaccard index of two clusters is computed as the number of common nodes divided by the number of nodes in the union of the clusters, and is shown through a violet-to-yellow color scale. 10 10 10

P(final size)

10 cluster 0 (324 nodes)

-1

-2

10

10

0

0

25

50

75 100 125 150 cluster 3 (159 nodes)

-1

-3

10 10

0

0

25

50

75 100 125 150

cluster 6 (125 nodes)

cluster 1 (251 nodes)

10 10

-3

10

0

0

25

75 100 125 150

50

cluster 4 (150 nodes)

10

-3

10

0

10

10

0

25

75 100 125 150

50

-3

-2

10

10

-3

0

25

50

75 100 125 150

0

0

25

50

cluster 5 (128 nodes)

-1

10

-3

10

0

0

25

50

cluster 7 (109 nodes)

-1

75 100 125 150

-2

75 100 125 150 cluster 8 (79 nodes)

10 10

cluster 2 (205 nodes)

-2

10

-2

0

-1

10

-1

10

-1

10

-2

10

10

0

-1

10

-2

10

10 10

-3

10

10

0

-1

-2

10

-3

0

25

75 100 125 150

50

10

-2

0

25

50

75 100 125 150

final size

Figure 6: Final size distributions of epidemics rooted in nodes of various clusters. 6

10

10

P(max dist)

10

-3

200

10

400

600

800

cluster 3 (159 nodes)

-1

-3

200

10

400

600

800

-1

-2

10

-3

200

400

600

800

10

200

-1

400

600

800

cluster 4 (150 nodes)

200

400

600

800

cluster 7 (109 nodes)

-1

-2

10 1000

10 1000

10

-3

200

400

-3

200

600

800

400

600

800

1000

600

800

1000

cluster 5 (128 nodes)

-1

-2

10 1000

-3

200

10

10

cluster 2 (205 nodes)

-1

-2

10

-3

10

10

10

-2

10 1000

cluster 6 (125 nodes)

cluster 1 (251 nodes)

-3

10

10

-1

-2

10 1000

-2

10

10

10

-2

10

10

cluster 0 (324 nodes)

-1

400

cluster 8 (79 nodes)

-1

-2

10 1000

-3

200

400

600

800

1000

maximum distance

Figure 7: Distribution of the maximum geographical distance (in km) covered by the invasion paths rooted in nodes of various clusters. The long-range disease infections are a major concern in facing an emerging infectious disease. The existence of trade connections spanning several hundreds of kilometers makes possible a rapid and wide spread of the disease. curves (shape, peak time, duration) of spreading phenomena starting from different nodes of a given cluster.

7

# of infected premises

150

150 cluster 0 (324 nodes)

100

100

50 0

0

10

20

30

40

cluster 3 (159 nodes)

100

10

20

30

40

# of infected premises

150 cluster 6 (125 nodes)

100

0

10

20

30

40

cluster 9 (70 nodes)

cluster 4 (150 nodes)

10

20

30

40

cluster 7 (109 nodes)

100

0 50 0 150

50

50 0

10

20

30

cluster 12 (57 nodes)

100

0

10

20

30

40

150 cluster 15 (50 nodes)

100

0 50 0 150

10

20 30 time

10

20

30

40

cluster 10 (61 nodes)

10

20

30

40

cluster 13 (51 nodes)

50

30

40

50

10

20

30

40

50

cluster 11 (59 nodes)

0 50 0 150

50

0 50 0 150

0

20

100

50

50

40

cluster 8 (79 nodes)

0 50 0 150

100

10

20

30

40

cluster 16 (49 nodes)

100

40

10

100

10

20

30

40

50

30

40

50

cluster 14 (51 nodes)

0 50 0 150

10

20

cluster 17 (46 nodes)

100

50 0

30

cluster 5 (128 nodes)

100

100

50

20

50

0 40 0 150

50

10

50

0 50 0 150 100

150

# of infected premises

40

50

0 50 0 150

100

0

30

50

150

0

20

100

50

0

10

50 0

cluster 2 (205 nodes)

100 50

0 50 0 150

50

0

150

50

150

0

cluster 1 (251 nodes)

50 0

10

20 30 time

40

50

0

0

10

20 30 time

40

50

Figure 8: Prevalence curves of spreading phenomena rooted in nodes belonging to various clusters. This figure is similar to Figure 6(a) of the main text: here, the prevalence curves for nodes belonging to the 18 largest clusters are shown.

8

5

Longitudinal stability of the seeds’ clusters

In the main text, the longitudinal stability of the clusters obtained by considering different starting times t0 of the spreading has been evaluated using an entropy-like function. In the following, we first present additional results on the clusters’ stability measured through this entropy-like function, and then consider further methods to assess the temporal stability of clusters, in order to better characterize their evolution and to verify that such stability is not measure-dependent.

5.1

Entropy-like stability

To supplement the examples shown in Figure 6b of the main text, in Figure 9 we show the results of the cluster’s stability for the other clusters, grouped according to their temporal behavior.

5.2

Sensitivity analysis

The entropy function introduced in Equation 1 of the main text evaluates the level of fragmentation of the clusters when different starting dates for the epidemic spreading simulations are chosen. Since most of the clusters have small sizes (see Figure 4), we restricted our analysis to the largest C = 20 clusters, and we normalize therefore the entropy function accordingly. In Figure 10, we report a sensitivity analysis with respect to considering different values of C. The results are shown not to be sensitive to the precise value of C. When all the clusters are considered however, the entropy function is shifted towards lower values because of the large number of clusters composed by isolated nodes: the large value of C increases therefore strongly the normalizing factor log C. Using this normalization would therefore artificially enhance the apparent stability of the clusters.

5.3

Rand index

In order to compare two partitions of the seed nodes into clusters corresponding to two different seeding dates, we use the Rand index [3], which quantifies to which extent nodes that were clustered together in the first partition are still together in the second one, and, analogously, nodes that belonged to different clusters in the first partition are still classified separately in the second. More precisely, given two partitions P(t0 ) : {C10 , C20 , ..., Cr0 } and P(t1 ) : {C11 , C21 , ..., Cq1 }, the Rand index of P(t0 ) and P(t1 ) is defined by R = t+s where t is the number of pairs of (n2 ) nodes belonging to the same cluster in both P(t0 ) and P(t1 ), s is the number of pairs of nodes classified in different clusters both in P(t0 ) and P(t1 ) and n is the number of nodes present in both partitions. In Figure 11 the Rand index of the partition P(t0 ) with the partitions obtained for different starting times t1 are shown. If we consider all the clusters, we obtain extremely high values for this index. The reason lies in the very large number of clusters with size ≤ 2, shown in Figure 4: most isolated nodes in one partition remain isolated in the successive ones, leading to very high values of 9

1

0.8

0

84

70

168 252 168 252

84

cluster 12

t1

168 252

84

168 252

t1

70

56

28

42

0

14

84

70

28

56

t1

168 252

t1

42

0

0

0

14

0.2

0

168 252

0.4

0.2

84

0.4

0.2 70

0.4

0.2 56

0.4

42

0.6

28

0.6

14

0.6

84

cluster 8

0.6

1

70

1

0.8

70

cluster 15

56

0.8

28

1 cluster 10

42

1 0.8

56

168 252

84

70

t1

0

t1

56

28

0

0

14

168 252

84

70

56

28

0.2 42

0.4

0.2 14

0.4

56

42

28 28

168 252

84

70

56

28

42

H min{H} H-min{H}

cluster 9

0.8 0.6

42

cluster 11

0

42

168 252

84

70

56

28

42

0 0

168 252

84

70

t1

1

0.8

0

0

cluster 17

t1

0.6

0

56

28

42

0

14

168 252

84

70

0

56

0.2

0

28

0.4

0.2 42

0.4

0.2 0

0.4

14

0.6

1

t1

0.8

cluster 16

0.6

C

14

168 252

84

70

1

0.6

0

cluster 7

t1

14

0.8

cluster 14

56

28

42

0

14

168 252

0

84

0

70

0.2

0

56

0.4

0.2

28

0.4

0.2 42

0.4

0.2 0

0.4

14

0.6

1

0

0.8

0.6

t1

14

84

168 252

70

56

28

42

0

84

168 252

70

14

1 cluster 4

0.8

0.6

t1

cluster 6

t1

1 cluster 3

0.8

t1

entropy

0

0.6

0.8

entropy

t1

1 cluster 1

1

cluster 13

0.8

entropy

56

28

42

t1

1 0.8

entropy

0

0

14

0

84

0.2

0

168 252

0.4

0.2

70

0.4

0.2 56

0.4

0.2 28

0.4

42

0.6

0

0.6

14

entropy

0.6

B

D

1 0.8

cluster 5

0.6

t1

entropy

1 0.8

cluster 2

0

1 0.8

cluster 0

14

1 0.8

14

A

0.6 0.4

84

168 252

t1

70

56

42

28

0

0

14

0.2

Figure 9: Entropy H of the partition into clusters as a function of time, for the same clusters as in Figure 8. H measures the fragmentation of the largest C = 20 clusters obtained for the starting time t0 = January 1st in the partitions obtained in the following weeks. The difference H − minH (grey bars) represents the robustness of the cluster (the smaller the difference and the more robust is the cluster), given that only part of it may be present in the partition obtained for a later starting condition (as measured by minH ). Beyond the four typical behaviors described in the text, a hybrid evolution may also emerge: for instance, cluster 6 (the last in the first row of the figure) remains quite stable over the whole year.

10

H

H-minH

minH

Top C=10 clusters

Top C=20 clusters

Top C=30 clusters

All clusters

1

entropy

cluster 0

0.8 0.6 0.4 0.2 0 1

entropy

cluster 1

0.8 0.6 0.4 0.2 0 1

entropy

cluster 9

0.8 0.6 0.4 0.2 0 1

entropy

cluster 8

0.8 0.6 0.4

84

168 252

t1

70

56

42

28

14

84

168 252 0

t1

70

56

42

28

14

84

168 252 0

t1

70

56

42

28

14

84

168 252 0

t1

70

56

42

28

0

0

14

0.2

Figure 10: Longitudinal stability sensitivity analysis. The longitudinal stability of the clusters presented in Figure 6b of the main text is assessed by considering different values of C, representing the number of largest clusters considered for the evaluation of the normalized entropy. The results are robust and do not depend on the choice of the parameter C. When all the clusters are considered, the entropy is biased towards lower values because of the large number of small clusters. As in Figure 9 the entropy H, the minimum entropy minH and the difference H − minH are shown.

11

1 0.9 0.8

Rand Index

0.7 0.6 0.5

all clusters clusters’ size >1 clusters’ size >2 20 largest clusters

0.4 0.3 0.2 0.1 0

0

14

28

42

56

70

84

168 252

t1

Figure 11: Rand index of the partition obtained for the starting time t0 =Jan 1 with the partitions obtained for different starting times t1 of the spreading simulations. the Rand index. We therefore consider several modified versions of the Rand index, which take into account only a subset of clusters: respectively all clusters of size at least 2, all clusters of size at least 3, and the 20 largest clusters. In all cases, the large values of the Rand index indicate that the partitions obtained for different starting times remain fairly stable.

5.4

Best match

The Rand index is a global measure of the stability of the partition of the nodes into clusters, and does not give access to the individual stability of the different clusters. In order to go beyond this evaluation of the overall stability of the partitions and to obtain an alternative measure with respect to the entropy-like function studied above and in the main text, we can also investigate the longitudinal stability of the single clusters. To this aim, we compute the Jaccard index of each cluster of the partition obtained for a spreading starting time t0 , with all the other clusters corresponding to the partition obtained for starting time t1 . In Figure 12 the Jaccard indices of the largest C = 20 clusters obtained with spreading simulations starting at t0 = 0 with the largest 20 clusters obtained at subsequent starting times t1 are shown. For each value of t1 and for each cluster of the partition corresponding to t0 , it is then possible to find the best matching cluster in P(t1 ) by focusing on the largest Jaccard index. Figure 13 shows for each of the 18 largest clusters of P(t0 ) the evolution of this largest Jaccard index with t1 . We also note that the Jaccard index can either be evaluated as the crude ratio of the intersection over the union of the nodes in the two clusters, or can be computed by considering only the nodes present in both partitions (i.e. the ratio of the intersection and the union 12

1

0.8

0.6

0.4

0.2

0

...C20 C1 ...

t0=0

Jaccard index:

C1 ...

...C20

t1=0

C1 ...

t1=7

...C20

t1=14

C1 ...

t1=21

...C20

t1=28

C1 ...

t1=35

...C20

t1=42

t1=49

Figure 12: Jaccard indices between clusters obtained for different starting times of the spreading. The Jaccard index of two clusters is computed as the number of common nodes divided by the number of nodes in the union of the clusters, and is shown with a violet-toyellow color scale. of the nodes in the two clusters, given that the nodes are present in both P(t0 ) and P(t1 )). Both cases lead to similar results and are shown in Figure 13.

13

70

84

168 252

84

168 252

56

70

42

28

56

28

42

168 252

84

70

56

28

42

0

t1

t1

1

168 252

84

70

t1

1

1 0.8

t1

1

t1

0

cluster 12

168 252

84

70

28

42

0

56

t1

168 252

t1

14

168 252

84

70

56

28

0

42

0

0

0.2

0

14

0.4

0.2

168 252

0.4

0.2 84

0.4

0.2 70

0.4

56

0.6

42

0.6

28

0.6

14

0.6

84

cluster 8

70

0.8

56

cluster 15

28

0.8

42

1 cluster 10

0

1 0.8

14

t1

56

28

0

0

best match (conditional) best match (crude)

14

168 252

0.2 84

0.2 70

0.4

56

0.4

42

0.6

28

0.6

14

cluster 9

0.8

42

cluster 11

0.8

0

0

84

70

56

28

42

168 252

t1

cluster 17

14

0

168 252

84

70

56

28

42

0

14

168 252

84

0

70

0.2

0

56

0.4

0.2 42

0.4

0.2 28

0.4

14

0.6

0

0

0.8

cluster 16

0.6

0

0

cluster 7

1

0.8

cluster 14

14

t1

1

0.8

0

84

168 252

70

56

28

42

0

14

84

168 252 168 252

84

70

56

28

42

0

t1

0.6

0

Jaccard index

14

0

168 252

0

84

0.2

0

70

0.4

0.2

56

0.4

0.2 42

0.4

0.2 28

0.4

14

Jaccard index

0.6

0

0.6

1 Jaccard index

0.8

0.6

t1

t1

cluster 13

0.8 0.6 0.4

168 252

84

70

42

28

14

0

0

0.2 56

Jaccard index

1 cluster 4

0.8

0.6

1

Jaccard index

0

t1

1 cluster 3

0.8

cluster 6

t1

1 cluster 1

0.8

t1

D

70

t1

1

C

56

28

0

42

0

0

0.2

0

84

0.4

0.2

168 252

0.4

0.2 70

0.4

0.2 56

0.4

28

0.6

42

0.6

14

0.6

t1

B

1 0.8

cluster 5

0.6

0

Jaccard index

1 0.8

cluster 2

14

1 0.8

cluster 0

14

1 0.8

14

A

t1

Figure 13: Best matching cluster. The highest value of the Jaccard indices evaluated between the clusters obtained at t0 and those obtained for subsequent starting times are shown. The crude Jaccard index considers the ratio of the intersection and the union of the nodes of the two clusters, while in the conditional Jaccard index only the nodes that are present at both t0 and t1 are considered.

14

ξs

0.50 0.45 0.40 0.35

30 42 32 15 9

40 40 31 14 8

50 37 28 11 6

ns 60 70 34 31 25 22 10 9 6 5

80 29 20 7 4

90 25 18 5 4

100 25 18 5 4

Table 1: Deterministic simulations. Number of sentinels for different thresholds in the n − ξ space.

6

Sentinels

As described in the main text, the definition of sentinel nodes is flexible. In particular, depending on the surveillance system and on the resources availability it is possible to be more or less conservative in the choice of ξs and ns . It is also possible to tune these values in order to restrict to nodes that provide a better characterization of the epidemic origin (low ξs ) and that have a higher probability of being reached by an outbreak (high ns ). In table 1 the number of sentinel nodes that should be monitored for different choices of ξs and ns are shown.

6.1

Normalizing factor

In the main text we defined the seeder uncertainty as an entropy-like function normalized with nk . The presence of clusters of small sizes (isolated nodes or pairs of nodes) raises the total number of clusters, so that the condition nk < C is always satisfied in our simulations. In this context the most homogeneous infection patterns for the node k is represented by the probability 1/nk of being infected by nk different clusters. In the following we show the results by normalizing the seeder uncertainty with log(C). The new normalization leads to very small values of the seeder uncertainty as induced by the large number of clusters C, but results are not overall affected by such rescaling. In Figure 14b we consider the 15 sentinels identified in the main text with the normalization 1/nk , and plot their trajectories in the plane uncertainty vs. n, where the uncertainty is now calculated with the new normalization log(C) of the entropy function. The sentinels trajectories span a rather compact space demonstrating that changes in the normalizing factor do not alter our sentinel identification method, and just correspond to a rescaling of the threshold imposed for the new definition of the uncertainty.

7

Stochastic case

The high dimensionality of the phase space of the initial conditions has led us to prefer a deterministic computational approach for the presentation of the proposed methodology and main results. We show however in the following how our approach can be extended to stochastic simulations that take into account the intrinsic stochasticity of epidemic propagation phenomena. Results similar to the deterministic case are recovered. In the stochastic 15

(a) 1

(b)

all premises all premises with n > 10

0.00025

seeder uncertainty (j)

0.6

c

P (j)

0.8

0.4 0.2 0 1e-05

0.0001

0.001 0.01 seeder uncertainty (j)

0.1

t0=January 1st t0=January 1st & 7 following weeks

0.0002 0.00015 0.0001 5e-05 0 1 10

1

10

2

n

10

3

10

4

Figure 14: Sentinel premises identification by normalizing the seeder uncertainty with log(C). (a) Cumulative probability distribution of the uncertainty of a given premises in the identification of the seeding cluster. Slaughterhouses are discarded from the analysis, as they cannot spread the disease further to other farms. Note that the x-axis is in the log-scale and more than the 99% of the premises have a seeder uncertainty below 0.001, as induced by the new normalization factor. (b) For a set of initial conditions (x0,t0),with t0=January 1st, each infected holding is represented by a dot in the n-uncertainty phase space, with n being the number of times the farm is reached by an infection. Note that the y-axis is compressed within a small range of uncertainty values. The trajectories of the same 15 sentinels identified in Figure 7 of the main text are shown. simulations, each spreading event is a probabilistic process which occurs with a probability of infection βdt per time interval dt. We keep for simplicity a deterministic recovery process. Moreover, while the weights of the links are irrelevant in the case of a deterministic spreading, they need to be taken into account in a stochastic modeling. In this context, two H definitions of weights can be used. We denote by wij the number of herds of cattle and by B wij the number of bovines displaced from i to j in the time window ∆t. At most one herd H is displaced each day, so that wij is at most equal to ∆t. Each weight definition leads to a different definition of the probability of infection of a susceptible node by a neighboring infectious node, with different underlying assumptions. A first possibility is to define the H rate of infection as P (Si + Ij → Ii + Ij ) = β ∗ (wij )/∆t: for ∆t = 1, this means that an infectious node infects a neighboring node to whom it sends a herd with probability β. On the other hand, one can assume that the spreading power is proportional to the number of B displaced bovines during each time window. Since the weights wij are broadly distributed [2], we choose to model the probability of infection by a function that saturates to 1 at B large values of the weight, namely P � (Si + Ij → Ii + Ij ) = 1 − exp(−β � wij /∆t) . We use the finest time scale ∆t = 1 day for the stochastic simulations. In this case, the number of herds displaced between two nodes at each time step is either 0 or 1. In order to compare the two possible assumptions underlying the stochastic simulations, we use values of β and β � such that the probabilities P and P � are equal on average. This condition is satisfied if B β � = ln(1 − β)/ < wij >. We explore two values of β: high transmission rate (β = 0.9, � corresponding to β = 0.63) and intermediate transmission rate (β = 0.5, corresponding to

16

A transmission rate ` =0.9

prevalence

80

80

60

80

60

213 nodes

60

211 nodes

40

40

40

20

20

20

0

0

10

20 30 time (days)

0

40

0

10

20 30 time (days)

0

40

181 nodes

0

50

prevalence

50

40

213 nodes

40

125 nodes

30

30

20

20

20

10

10

10

0

0

10

20 30 time (days)

40

50

30

0

20 30 time (days)

50% confidence interval median

B transmission rate ` =0.5

40

10

40

0

10

20 30 time (days)

40

0

175 nodes

0

10

20 30 time (days)

40

Figure 15: Prevalence curves of stochastic spreading starting from nodes belonging to various clusters. For each seed 500 stochastic simulations have been performed and the medians (black curves) and 50% confidence intervals are shown. In order to give an estimate of the fluctuations we evaluated the 50% confidence interval for each curve and we plot the maximum of the upper bounds and the minimum of the lower bounds (brown shaded area). Higher transmission rate (A) leads to lower fluctuations, while for the intermediate transmission rate (B), the 50% confidence intervals are quite broad. It is worth to stress that hundreds of stochastic curves with different seeding nodes are compared, so that fluctuations are expected; nevertheless the clustering procedure is still able to capture the similarity in the overall behavior (shape, peak time, duration) of the prevalence curves. β � = 0.19). Once the transmission probabilities are defined and the transmission rate fixed, each stochastic simulation produces an invasion path. The union of all the paths yields a risk probability associated to each node, given by the fraction of runs in which the node has been infected. We generalize the construction of the ICSN and of the clusters defined in the main text for deterministic spreading as follows. For each seed x, we define a vector r whose element ri is given by the fraction of runs starting in x for which i was infected, and the set ν of nodes i such that ri > 0 (i.e., the set of nodes which were reached at least once by a spreading issued from x). For each pair of seeds x1 and x2 , we build in this way the two �vectors r1 and r2 and�the sets of nodes ν1 and ν2 , and we consider the similarity Ω12 = i (1 − |r1,i − r2,i |2 )/|ν1 ν2 |. In the case of a deterministic spreading, we recover the definition of the overlap Θ as the elements of the vectors ri are 0 or 1. The cluster detection method described in the main text can now be applied using this measure by retaining in the ICSN only the links with similarity Ω larger than a certain threshold. We show in Figure 15 the prevalence curves for seeds belonging to various clusters. As in the deterministic case, spreading phenomena originated in nodes of the same cluster show 17

stochastic B (mild infectivity)

1 stochastic H (mild infectivity)

0.8

stochastic B (high infectivity)

0.6

0.4

stochastic H (high infectivity)

0.2 deterministic

ter

de

tic nis mi

ic ic ic ic H ) ) ) ) ast tivity ast tivity ast tivity ast tivity ch ch ch ch sto infec sto infec sto infec sto infec h h d d il il g g (m (m (hi (hi B

H

0

B

Figure 16: Jaccard indices of the 20 largest clusters obtained with different simulation procedures. We refer with stochasticH and stochasticB to simulations with transmission probH B abilities respectively determined by wij and wij . Each point represents the intersection of clusters obtained with different transmission probability definitions and transmission rates and is color-coded, on a violet-to-yellow color scale, according to the value of the Jaccard index computed as the number of common nodes divided by the number of nodes in the union of the clusters.

18

ξs

0.45 0.40 0.35 0.30

30 39 27 15 4

40 34 24 13 4

50 32 22 12 4

ns 60 70 27 24 18 16 10 8 2 2

80 21 14 6 1

90 20 13 5 1

100 16 10 3 0

Table 2: Stochastic simulations. Number of sentinels for different thresholds in the n − ξ space. very similar temporal patterns. The cluster detection procedure leads thus to an efficient grouping of the potential seeds of an epidemics not only for deterministic spreading but also for more realistic stochastic simulations. We also evaluate and show in Figure 16 the overlap between the clusters obtained with the various types of stochastic simulations and the deterministic ones. Strikingly, many clusters are very stable when the type of simulation and the infectivity parameter are changed. Once the nodes are grouped in different clusters, the sentinel identification is a rather straightforward procedure. Using the same definition of seeder uncertainty described in the main paper, it is thus possible to identify sentinel nodes starting from stochastic disease spreading simulations. It is worth to notice that for each seeding node we simulate 100 stochastic runs, so that the number of times nk that a node k has been reached by the disease is potentially much larger than in the deterministic case. In order to make comparable the stochastic and the deterministic scenarios we rescale n in Figure 17 and table 2 of a factor 100. In Figure 17 we show the results obtained from stochastic simulations, similarly to Figure 7-8 of the main text. Surprisingly, the stochastic simulations lead to a lower uncertainty than the deterministic case. This unintuitive behavior can be linked to the fact that including some heterogeneities due to the links’ weight the less probable invasion paths contribute very little in the seeder uncertainty evaluation, naturally reducing the noise of the measure. Since the choice of ξs and ns are arbitrary, we report in table 2 the number of sentinels identified with different values of these parameters.

References [1] Vernon M C, Keeling M J (2009) Representing the UKs cattle herd as static and dynamic networks. Proc R Soc B 276: 469-476 [2] Bajardi P, Barrat A, Natale F, Savini L, Colizza V (2011) Dynamical patterns of cattle trade movements. PLoS ONE 6(5): e19869. [3] Rand W M (1971) Objective criteria for the evaluation of clustering methods. JASA 66 (336): 846850.

19

B 1

0.8

0.8

seeder  uncertainty  (j)

1

0.6 0.4 0.2 0

0

0.1 0.2 0.3 0.4

0.5 0.6 0.7 0.8 0.9

seeder  uncertainty  (j)

1

C

farm  reached  in  the  first  week   (slaugtherhouse  excluded) farm  trajectory  along  8  weeks

fraction  of  detected  outbreaks

all  farms  (slaugtherhouse  excluded)                    n  >  10 (slaugtherhouse  excluded)

c

P (j)

A

0.6 0.4 0.2 0 0 10

2

1

10 n

10

3

10

4

10

27sentinels  (n• j” 15  sentinels  (n•j”

1 0.8 0.6 0.4 0.2 0

0

10

20 30 40 minimum  outbreak  size

Figure 17: Identification of the initial conditions, and corresponding uncertainty, for the stochastic simulations. (A) Cumulative probability distribution of the uncertainty ξ in the identification of the seeding cluster, once a given node of the network has been detected as infected. Slaughterhouses are discarded from the analysis, as they cannot spread the disease further to other farms (they are the end points of the livestock movements and gather bovines from different sources). (B) For a set of initial conditions, each infected farm is represented by a dot in the n − ξ phase space, with n being the number of times the farm is reached by an infection, and ξ the uncertainty in the identification of the corresponding seeding cluster. Eight consecutive weeks starting from January 1st are considered as temporal initial conditions. Sentinel nodes are defined as the farms that are often reached by epidemics (i.e., n > ns ) and that have a low degree of uncertainty in the identification of the seeding cluster that led to the outbreak (i.e., ξ < ξs ). The plot shows the trajectories in the n − ξ phase space of the 15 sentinels obtained by imposing ns = 30 and ξs = 0.4. (C) Fraction of detected outbreaks as a function of the minimum outbreak size of the epidemic, where an outbreak is considered detected if at least one of the sentinels has been reached by the infection. Two sets of sentinel farms are considered, of 15 and 27 sentinels, corresponding respectively to (ns = 30, ξs = 0.35) and (ns = 30, ξs = 0.40).

20

50

Electronic Supplementary Material- Optimizing ...

3Data Science Laboratory, Institute for Scientific Interchange (ISI), Turin, Italy. 4Istituto ..... The entropy function introduced in Equation 1 of the main text evaluates the level of frag- ..... In the case of a deterministic spreading, we recover the.

5MB Sizes 1 Downloads 228 Views

Recommend Documents

Electronic supplementary material
Jun 22, 2009 - ... of two Hill functions (a-b) with sufficiently distant midpoints is equivalent to the Hill function with the largest midpoint (c). Namely: Hill(x, K 1) · Hill(x, K 2) ≈ Hill(x, K 2) where K 1

Supplementary Material
Jan Heufer ∗. *TU Dortmund University, Department of Economics and Social Science, 44221 Dortmund,. Germany. .... 3 More Details on the Data Analysis.

Electronic Supplementary Material S1 Food sharing in ...
Food sharing in vampire bats: reciprocal help predicts donations more than .... Kalinowski, S. T., Wagner, A. P. & Taper, M. L. 2006 ML-RELATE: a computer ...

Supplementary Material
gaze fixation point is detected by an eye tracker. ∗indicates equal contribution. (a) Random ... 1) confirms quanti- tatively our intuition, that the location of the hands are in- deed important for learning about hand-object interactions. Based on

Supplementary Material - Arkivoc
General Papers. ARKIVOC 2015 (vii) S1-S13. Page S3. ©ARKAT-USA, Inc. δ /ppm. 1. H Assignment. 8.60 (brs, 1H). 5.35 (t, J = 7.3 Hz, 1H). 5.08 (t, J = 6.1 Hz, ...

Supplementary Material
By the definition of conjunctive pattern, since S does not satisfy pc, then S ... Then there exists at least one instance of p o in S: {I1,I2,...,Im},. I1 ∈ I(p c. 1 ,S),...,Im ∈ I(p c m,S), such that ∀t1 ∈. I1,..., ∀tm ∈ Im,t1 < t2 < ...

Supplementary Material
and Business Cycles Conference, the Bank of Canada 2014 Fellowship ..... CGH exhibited an upward trend in the frequency of sales that could explain why they find ...... 12Fox and Sayed (2016) use the IRI scanner dataset to document the ...

Supplementary Material
The data are provided a for 10 subsectors, which I aggregate up to three sectors as follows. ... CAN Canada ...... Structural change in an open economy. Journal ...

Supplementary Material
SVK Slovakia. CYP Cyprus. ITA. Italy. SVN Slovenia. CZE Czech Republic JPN Japan. SWE Sweden. DEU Germany. KOR South Korea. TUR Turkey. DNK Denmark. LTU Lithuania. TWN Taiwan. ESP Spain. LUX Luxembourg. USA United States. EST Estonia. LVA Latvia. RoW

Efficient Repeated Implementation: Supplementary Material
strategy bi except that at (ht,θt) it reports z + 1. Note from the definition of mechanism g∗ and the transition rules of R∗ that such a deviation at (ht,θt) does not ...

Supplementary Material for
Aug 3, 2016 - alternatives are “autocracy and risk-sharing” and “democracy and investment.” In other words, the .... whether seizing power. A coup is .... Our main sources are Galasso (1976), Stearns (2001), and Ortu (2005). References.

Supplementary Online Material
Branstetter, M.G. (2009) The ant genus Stenamma Westwood (Hymenoptera: Formicidae) redefined, with a description of a new genus Propodilobus. Zootaxa,.

Supplementary Material for
Fujitsu Research & Development Center Co., Ltd, Beijing, China. {wangzhengxiang,rjliu}@cn.fujitsu.com. Abstract. This supplementary material provides the ...

Supplementary Material - HEC Montréal
... the ONS website: http://www.ons.gov.uk/ons/datasets-and-tables/index.html. .... sitivity of sales to the unemployment rate for independent stores and chains.

Supplementary Material for
Sep 25, 2015 - Using archived and field-collected worker bumble bees, we tested for temporal changes in the tongue ... (Niwot Ridge Long Term Ecological Research Site, 40°3.567'N, 105°37.000'W), and by. P. A. Byron (17) ... these transects did not

Supplementary Material for ``Observations on ...
Nov 19, 2017 - C.2 Steady State in a Perturbed Environment. In this subsection we formally adapt the definitions of a consistent signal profile and of a steady state to perturbed environments. Let f((1−ϵ)·σ+ϵ·λ) : OS → OS be the mapping bet

Supplementary Material Methods Tables
αF was added (25nM final) to exponentially growing MATa bar1Δ cells in YPD (OD600 ≈ 0.3) and the culture incubated for two hours at. 30oC. >95% G1 arrest, as indicated by cells with small buds (schmoo formation), was confirmed by microscopy. Cell

Supplementary Material: Adaptive Consensus ADMM ...
inequalities together. We conclude. (Bvk+1 − Bvk)T (λk+1 − λk) ≥ 0. (S3). 1.2. Proof of ..... matrix Di ∈ Rni×d with ni samples and d features using a standard ...

Price Selection – Supplementary Material
†Email: Departamento de Economia, Pontifıcia Universidade Católica do Rio de .... and-tables/index.html, the Statistics Canada's Consumer Price Research ...

Price Selection – Supplementary Material
trade and search frictions in the goods market. An equilibrium pins down a unique relative price distribution G(p−1 | S−1) but does not pin down price changes. This distribution is invariant to monetary shocks. Hence there is full monetary neutra

SUPPLEMENTARY MATERIAL FOR “WEAK MONOTONICITY ...
This representation is convenient for domains with complete orders. 1 .... check dominant-strategy implementability of many classical social choice rules. In.

SUPPLEMENTARY MATERIAL FOR “WEAK MONOTONICITY ...
This representation is convenient for domains with complete orders. 1 ... v = (0,v2,0), v2 > 0, would want to deviate and misreport their type so as to get 3.

1 Supplementary online material: additional results ...
weakly higher winning frequencies for A and C in comparison to D. The corresponding tests reject the .... tives over the course of the experiment for PV elections.

Supplementary material for “Complementary inputs and ...
Aug 2, 2017 - statement of Theorem 2, and then complete the proof. ...... Jones, 1984), spaces of square-integrable functions (Harrison and Kreps, 1979; ...