K-CORE TRACING APP 

A DIGITAL CONTACT TRACER FOR COVID-19

Our study

We developed an optimized Digital Contact Tracing (DCT) protocol that uses network theory to find the minimal number of infected contacts and their secondary contacts needed to halt the spread of the epidemic with minimal social disruption. The model is tested and calibrated during the ongoing COVID-19 pandemic in the city of Fortaleza, Brazil, using real-time data on individuals' geolocation provided by mobile applications and epidemiological information from government authorities.

We monitor the giant connected component (GCC) and find that the optimal strategy is to directly quarantine the maximum k-core of a two-layer contact network seeded by the infected people. We implement this optimized strategy by deploying a contact-tracing app in partnership with the Government of Ceará.
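
As an illustration of the central object above, the maximum k-core of a contact network can be extracted by repeatedly peeling off the node of lowest remaining degree. The following is a minimal sketch on a toy graph, not the project's pipeline: it assumes an undirected network stored as a dictionary of neighbour sets, and the names core_numbers and max_kcore are hypothetical.

```python
def core_numbers(adj):
    """Return the core number of every node.

    adj maps each node to the set of its neighbours (undirected graph,
    no self-loops). Nodes are peeled in order of lowest remaining degree.
    """
    degree = {v: len(nb) for v, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick the node with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def max_kcore(adj):
    """Return (k_max, nodes) for the innermost (maximum) k-core."""
    core = core_numbers(adj)
    if not core:
        return 0, set()
    k_max = max(core.values())
    return k_max, {v for v, c in core.items() if c == k_max}


# Toy contact network: a clique of four people plus a two-person tail.
toy = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a", "f"},
    "f": {"e"},
}
k_max, members = max_kcore(toy)
```

In this toy network the tail (e, f) is peeled away first and the four mutually connected nodes form the maximum k-core, which is the set the strategy above would quarantine. This simple version scans all remaining nodes at each step, so it suits small examples only.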

Collection and elimination of data

We collect GPS data for the past 14 days only and delete it after that period, so that it is used solely for the purposes of the app. The data is encrypted, and no link is made between the real person and the data source, which keeps the process anonymous. The novelty of the contact tracing algorithm lies in estimating the probability of infection from correlated time and space components.

Key Concepts

Contact Area

The database is labeled in two groups: sources (infected users) and targets (healthy users). A contact is an interaction between a source and a target that fulfils specific conditions. At each time stamp, the contact area is defined as a circle of radius R centred on the position of the source. For each time stamp, we gather all the targets that are inside the circle during the following T-minute time window. Once all the targets are detected, we compute, for each of them and for the infected source, the average position and the interval of time they have spent within the contact area.

Throughout, d[n] denotes the Euclidean distance between the average positions of the source and the target, computed from the data points that fall within the contact area during the T minutes following time stamp n, and R is the radius of the contact area.
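
The detection step described above can be sketched as follows. This is a minimal illustration rather than the app's implementation: it assumes positions already projected to planar coordinates in meters and times expressed in minutes, the names Point and contact_area are hypothetical, and the defaults R = 8 m and T = 30 min are illustrative values only.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Point:
    t: float  # time in minutes (assumed unit)
    x: float  # planar east coordinate in meters (assumed pre-projected)
    y: float  # planar north coordinate in meters (assumed pre-projected)


def contact_area(source_track, target_tracks, n, R=8.0, T=30.0):
    """Contact area for the source data point at index n.

    Returns (source_interval, contacts), where contacts maps each target id
    to (d, target_interval): d is the distance between the average positions
    of source and target inside the circle, and each interval is the
    (first, last) time observed inside the circle during the window.
    """
    center = source_track[n]
    t0, t1 = center.t, center.t + T

    def inside(track):
        # Points in the forward window that fall within radius R of the center.
        return [p for p in track
                if t0 <= p.t <= t1
                and hypot(p.x - center.x, p.y - center.y) <= R]

    def mean_pos(points):
        return (sum(p.x for p in points) / len(points),
                sum(p.y for p in points) / len(points))

    def interval(points):
        times = [p.t for p in points]
        return (min(times), max(times))

    src_pts = inside(source_track)  # always contains the center point itself
    sx, sy = mean_pos(src_pts)
    contacts = {}
    for target_id, track in target_tracks.items():
        pts = inside(track)
        if not pts:
            continue  # this target never entered the circle in the window
        tx, ty = mean_pos(pts)
        contacts[target_id] = (hypot(tx - sx, ty - sy), interval(pts))
    return interval(src_pts), contacts


source = [Point(0, 0, 0), Point(10, 2, 0), Point(50, 100, 0)]
targets = {"a": [Point(5, 3, 0), Point(20, 5, 0)], "b": [Point(5, 50, 0)]}
src_interval, contacts = contact_area(source, targets, 0)
```

In the example, target "a" has two points inside the circle and is reported with d = 3 m, while target "b" is 50 m away and is not a contact.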

 

Contact probability of infection

The time component depends on how long the source and the target overlap within the contact area. Its probability is proportional to the time that the target and the source coexist within the contact area during the T-minute time window.

The probability p_t tends to 1 when the overlap time τ equals the window size T and to 0 when there is no overlap.

Here τ(∆t_s, ∆t_t)[n] is the overlap time at time stamp n, which depends on the ∆t of the source (∆t_s) and of the target (∆t_t); t_sf and t_sl are the first and last data times for ∆t_s, t_tf and t_tl are the first and last data times for ∆t_t, and T is the time window size. Note that if the target has only one data point within the circle in the T-minute window, its ∆t is 0 and therefore so is p_t. At least two points within the circle are needed to have a probability of infection due to time. The same condition applies to the source: if the infected user has only one data point, there is no period during which they remained within the circle, so we omit that contact area because the overlap time with the targets is 0.
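
The time component can be illustrated with a short sketch. It assumes, from the description above, that p_t is the overlap time divided by the window size T, and that the overlap is the intersection of the intervals [t_sf, t_sl] and [t_tf, t_tl]; the function name time_probability and the default T = 30 min are hypothetical.

```python
def time_probability(source_interval, target_interval, T=30.0):
    """Time component p_t of the infection probability (assumed form).

    Each interval is (first, last) time inside the contact area, in the same
    unit as T. A single data point gives first == last, so the overlap is 0.
    """
    t_sf, t_sl = source_interval
    t_tf, t_tl = target_interval
    # Length of the intersection of the two intervals, never negative.
    overlap = max(0.0, min(t_sl, t_tl) - max(t_sf, t_tf))
    # Proportional to the overlap; equals 1 when it spans the whole window.
    return min(1.0, overlap / T)


full = time_probability((0, 30), (0, 30))      # both present all window
partial = time_probability((0, 10), (5, 20))   # 5 minutes in common
single = time_probability((5, 5), (0, 30))     # source has one data point
```

The single-point case returns 0, which matches the rule that at least two points within the circle are needed on each side.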

We obtain the probability of infection for each time stamp, p_i[n].

Finally, we use a recursive expression to obtain a single probability of infection for each contact between a source and a target.

Here P_i[n] is the total probability of infection for a contact between a specific target and source up to time stamp n, P_i[n−1] is the total probability of infection in the previous state, and p_i[n] is the probability of infection for this contact at the current time stamp n.
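
To make the recursion concrete, here is a minimal sketch. The exact expression is not reproduced in the text, so the update rule below is an assumption: it treats each time stamp as an independent exposure, giving P_i[n] = P_i[n−1] + (1 − P_i[n−1]) · p_i[n] with P_i starting at 0. The function name accumulate_infection is hypothetical.

```python
def accumulate_infection(per_stamp_probabilities):
    """Running total probability of infection for one source-target contact.

    ASSUMED update rule (independent exposures):
        P[n] = P[n-1] + (1 - P[n-1]) * p[n],  starting from P = 0.
    Returns the list of P[n] after each time stamp.
    """
    total = 0.0
    history = []
    for p in per_stamp_probabilities:
        # Add the chance of being infected now, given no infection so far.
        total = total + (1.0 - total) * p
        history.append(total)
    return history


history = accumulate_infection([0.5, 0.5, 0.0])
```

Under this assumed rule the total never decreases and never exceeds 1, and a time stamp with p_i[n] = 0 leaves it unchanged.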

Our study uses three complementary data sets from different sources:

  • The first data set is provided by GranData Inc., an IT company in the telecommunication service industry. The data set includes anonymized GPS geolocation data from application software, which makes it possible to track the trajectories of a set of citizens in LATAM. The data identifies each phone with a unique phone ID and specifies its latitude and longitude over time (UTC). As complementary geolocation data, it also records the country and the geohash with 12-digit precision.

  • The second data set is obtained from the Health Department of Fortaleza, which provided an anonymized list of COVID-19 patients and the geolocation of their residential addresses. The list specifies the gender and age of each patient and classifies them according to the evolution of the disease as healed, stable, hospitalised (but not in ITU), severe (in ITU) or deceased. As additional health information, it also records the date on which the patient took the blood test for SARS-CoV-2 detection, which is crucial to estimate the period during which the patient is contagious. Because the Health Department provides the latitude and longitude of the patients' residential addresses, we are able to correlate this data with the GranData data set, detecting the infected patients' phone IDs and tracing their location during the period in which they were highly contagious.

  • The third data set is built with a mobile application developed in collaboration with the Fortaleza government, referred to here as the 'App'. The App collects GPS geolocation and Bluetooth contacts from the users, specifying the latitude and longitude of each user over time. This data will be used as input to the contact tracing algorithm and to personalize the results for the users. The App also gathers information about the users' current health state, generating a list of users infected with COVID-19. This allows the users, or official institutions, to update the users' current condition. The Government and health services will ask the users to follow these indications so that the database is accurate and the results are reliable.

Project Timeline

  • March 26 | First talks between Hernan and Matias from Grandata about sharing mobile data for contact tracing across LATAM.

  • March 27 | First Skype meeting. Present: Hernan, Matteo, Higor, Saulo.

                                              

  • April 1 |  Matias confirms availability of GPS dataset for all LATAM. Shaojun joins the team.

  • April 3 |  Data from Grandata started to be collected and treated.

  • April 4 | We filtered the geolocated data with a bounding box around the State of Ceará using geohashes.

  • April 8 |  Matteo managed to generate movies using folium depicting the trajectories of infected people.

  • April 15 | Matteo presented results from the probabilistic model regarding the exponential strategy and the recursive strategy. Morning and night Zoom meetings are established.

  • April 17 | Shaojun incorporated Matteo's code into the pipeline from Elasticsearch.

  • April 18 | A final version of the HTML maps was produced by Matteo and Hernan. It will be used at the meetings with authorities from Ceará and posted on the KCore webpage.

  • April 19 | Fortaleza is going ahead with the App. Macedo will have a prototype by Wednesday.

  • April 20 |  Saulo and Carles will start to write the first draft (for the first abstract/science). Higor and Matteo will identify all infected guys using Fortaleza time.

  • April 22 | Changed model to continuous version (UPDATE).

  • April 28 | Shaojun presented his complete pipeline. Shaojun suggested some experienced acquaintances.

  • May 01 | Hernan showed a movie of the app from the government; Imanol showed the wireframe of his version of the app (Android); Hernan asked that the app show notifications and red zones.

  • May 03 | Start investigating the GCC versus time.

  • May 04 | Start study of the k-core to understand how the pandemic is sustained despite the near vanishing of the GCC.

  • May 14 | First breakthrough: Higor and Matteo updated results on the k-core and GCC analysis:
    - The GCC analysis and the k-core analysis show that a k-core of infected people keeps the epidemic alive.
    - We find convincing evidence that the max k-core is responsible for sustaining the spread of the epidemic despite the strong reduction observed in the GCC after the quarantine is implemented.
    - We now move on to writing the paper.

  • May 15 | Leitmotif of the group is coined: "we are almost there..."

  • May 16 | Chaos emerged: we found a truncation error in the raw data that generated artificial k-cores. The data glitch was clearly visible in the maps.

  • May 17 | Discussion on the geolocation of contacts for each k-core.
    - Hernan, Higor and Matteo detected strange regularities in them.
    - Shaojun did not detect any truncation error in the pipeline. The regularities are an artifact of the raw data.

  • May 17 | Matteo and Hernan late night work. Chaos resolved. Truncated data removed. K-core picture remains.

  • May 18 | Final model chosen: 4 layers, w = 1 week, r = 8 m, Delta_t = 30 min, pc = 0.9.
    Remaining studies: cluster plots on the map, k-shell attack to destroy the GCC, MSD vs k-shell, SIR model to verify the persistence of spreading in k-cores.

  • May 22 | Discussion on the GCC attack results. Identification of the nodes that connect the k-core clusters and their properties. Optimal strategy emerges = betweenness centrality. 'Weak links' ideas are discussed.
    - Weak-link removal is established as the best strategy to stop the pandemic.

  • May 23 |  'Matteo in the beach' concept is established. It refers to the weakest link of them all, who needs to be quarantined to dismantle the GCC with minimal disruptions, ergo, the name 'Matteo in the beach'.

  • May 29 | Carles and Hernan full time on the paper. Matteo starts with initial results on Puebla. Large decrease found in the GCC at quarantine, confirming the Ceará results. Ceará App finally out.

  • June 3 | Partnership with the State of Puebla is established. We move to work with Alex and Jesus on contact tracing and app development in the state.

  • June 4 | Milestone reached. Our advanced analytics have tracked over half a million unique users and over a billion data points over three months in the states of Puebla and Ceará.

  • June 5 | Version 1 of white paper is released.