For this challenge, three CDR (Call Detail Record) datasets will be made available by Türk Telekom, together with two files on cell tower locations.

The datasets will include one year of mobile CDR data, collected from January 2017 to December 2017.

All datasets will be stored in plain text format.


The geographical coordinates (longitude, latitude) of the mobile network antennae (BTSs, Base Transceiver Stations) are provided. Note that several BTSs may be co-located. Each line of this file contains the BTS ID, the ID of the district in which the antenna is located, and the antenna's longitude and latitude. Each district may contain several antennae.


For example:

BTS_ID, district_ID, longitude, latitude
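A minimal Python sketch of loading this file, assuming it is comma-separated with exactly the four columns shown above and an optional header row. IDs are kept as strings because the documentation does not say they are numeric. Co-located BTSs are detected only by exact coordinate equality, which is a simplifying assumption; a real check might need a distance tolerance. The function names `load_bts` and `colocated_groups` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_bts(text):
    """Parse BTS rows into {bts_id: (district_id, longitude, latitude)}.

    Assumes the column order BTS_ID, district_ID, longitude, latitude.
    """
    towers = {}
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row or row[0] == "BTS_ID":
            continue  # skip blank lines and the header row
        bts_id, district_id, lon, lat = row[:4]
        towers[bts_id] = (district_id, float(lon), float(lat))
    return towers


def colocated_groups(towers):
    """Return sorted lists of BTS IDs that share identical coordinates."""
    by_position = defaultdict(list)
    for bts_id, (_, lon, lat) in towers.items():
        by_position[(lon, lat)].append(bts_id)
    return [sorted(ids) for ids in by_position.values() if len(ids) > 1]
```

Keeping the district ID alongside each tower makes it straightforward to aggregate tower-level data to districts later.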






For coarse mobility data, we do not provide individual base stations, only district information. There are 971 districts (or prefectures) in Turkey; the base stations included in the dataset are located in approximately 481 of them. The rough geometric center of each district will be provided separately.


For example:

district_ID, district_name, city_name, longitude, latitude

1, Beşiktaş, İstanbul, -17.5251,14.74683

2, Sarıyer, İstanbul, -17.5164,14.74673
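Distances between district centers give a rough measure of how far apart two districts are, for example when studying mobility between them. The sketch below parses one line of this file (note that longitude comes before latitude) and computes a great-circle distance with the haversine formula. Haversine is our choice here, not something the dataset prescribes, and because the centers are only approximate, so are the resulting distances. The helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def parse_district_line(line):
    """Split 'district_ID, district_name, city_name, longitude, latitude'."""
    district_id, name, city, lon, lat = (f.strip() for f in line.split(","))
    return district_id, name, city, float(lon), float(lat)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One degree of latitude comes out at roughly 111.2 km with this radius, which is a quick sanity check for the implementation.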


Dataset 1: one year of site-to-site traffic on an hourly basis. This dataset contains the traffic between each pair of sites over the year. The file Veri_Seti1_201701 (Turkish for Dataset_1_2017_01; the name indicates dataset type, collection year and month) contains one month of voice traffic between sites and is structured as follows:


timestamp: day / hour formatted as YYYY-MM-DD HH (rounded up to hours, HH from 1 to 24)

outgoing_site_id: id of site the call originated from

incoming_site_id: id of site receiving the call

number_of_calls: the total number of calls between these two sites during this hour

refugee_calls: the number of calls originating from numbers with refugee status

total_call_duration: the total duration of all calls between these two sites during this hour

refugee_call_duration: the total duration of calls between these two sites during this hour originating from numbers with refugee status


Similarly, the file Veri_Seti1_SMS_201701 contains one month of text (SMS) traffic between sites and is structured as follows:


timestamp: day and hour formatted as YYYY-MM-DD HH (rounded up to hours, HH from 1 to 24)

outgoing_site_id: id of site the SMS originated from

incoming_site_id: id of site receiving the SMS

number_of_SMS: the total number of SMS messages between these two sites during this hour

number of SMS originated from refugees: the number of SMS messages originating from numbers with refugee status.

For example (voice traffic file):

timestamp, outgoing_site_id, incoming_site_id, number_of_calls, refugee_calls, total_call_duration, refugee_call_duration

2013-04-01 00,2,2,7,1,138,20

2013-04-01 00,2,3,4,0,136,0

2013-04-30 23,1659,608,0,1,0,3601
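A minimal sketch of reading these rows into typed records and summing calls per originating site. It assumes durations are integers (their unit is not stated). Because the field description says HH runs from 1 to 24 while the example rows use 00 and 23, the hypothetical `parse_hour` helper accepts both by adding the hour to midnight, so HH = 24 becomes 00:00 of the next day. The sketch performs no consistency checks between total and refugee counts.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import NamedTuple


class VoiceRecord(NamedTuple):
    hour: datetime
    outgoing_site_id: int
    incoming_site_id: int
    number_of_calls: int
    refugee_calls: int
    total_call_duration: int
    refugee_call_duration: int


def parse_hour(stamp):
    """Parse 'YYYY-MM-DD HH', accepting HH = 00-23 or 1-24.

    The hour is added to midnight, so HH = 24 maps to 00:00 of the next day.
    """
    day, hh = stamp.split()
    return datetime.strptime(day, "%Y-%m-%d") + timedelta(hours=int(hh))


def parse_voice_line(line):
    fields = [f.strip() for f in line.split(",")]
    return VoiceRecord(parse_hour(fields[0]), *(int(f) for f in fields[1:7]))


def calls_per_outgoing_site(records):
    """Total number_of_calls per originating site."""
    totals = Counter()
    for rec in records:
        totals[rec.outgoing_site_id] += rec.number_of_calls
    return totals
```

The same pattern applies to the SMS file, with the two duration fields dropped.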


Dataset 2 will provide the cell tower identifiers used by a group of randomly chosen active users to make phone calls and send text messages. The data will be timestamped, and each group of users will be observed for a period of two weeks. At the end of each two-week period, a fresh sample of active users will be drawn at random. Each sample contains 3% of the refugee user base plus an equal number of non-refugee users. To protect privacy, new random identifiers are assigned in every time period. Timestamps are rounded to the minute.
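Because identifiers are regenerated in every period, the same random ID appearing in two different periods should not be assumed to belong to the same person. A minimal sketch of keying records by (window, ID), assuming the windows are consecutive 14-day blocks starting on 1 January 2017. That start date is a hypothetical assumption, as the actual period boundaries are not stated here, and `window_index` and `user_key` are illustrative names.

```python
from datetime import date

WINDOW_DAYS = 14
# Hypothetical: the actual start date of the first two-week window is not stated.
ASSUMED_FIRST_DAY = date(2017, 1, 1)


def window_index(day):
    """0-based index of the two-week observation window containing `day`."""
    return (day - ASSUMED_FIRST_DAY).days // WINDOW_DAYS


def user_key(caller_id, day):
    """Random IDs are redrawn every window, so an ID only identifies a user
    together with the window it was observed in."""
    return (window_index(day), caller_id)
```

Grouping per-user statistics by this composite key, rather than by the raw ID, avoids silently merging unrelated users across windows.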


The phone numbers of these users are removed, and each user is assigned a unique random number instead. These numbers start with 1 for refugees, 2 for non-refugees and 3 for unknown status. However, this indicator should be considered somewhat noisy: among the users marked as refugees there may be customers who are not refugees, and vice versa. Consequently, it will not be possible to say with 100% certainty whether a given CDR belongs to a refugee. There is no identifying information about the other party of the call; only its prefix (1: refugee, 2: non-refugee, 3: unknown) is given.


Note that there are multiple mobile operators in each region. Therefore, the numbers of calls and messages do not represent the actual totals for a region, although they are indicative of its overall communication volume. The values -99 or 9999 are used when antenna information is missing, for instance if the other party uses a different operator.


Monthly records are stored in files named Veri_Seti2_201701W_In / Out for voice and Veri_Seti2_201701W_SMS_In / Out for SMS. They are structured as follows:



caller id: randomly assigned value, prefixed with a digit indicating refugee status (1: refugee, 2: non-refugee, 3: unknown)

timestamp: day and time formatted as YYYY-MM-DD HH:MM (rounded up to the minute)

callee prefix: digit indicating the status of the other party (1: refugee, 2: non-refugee, 3: unknown)

site_id: id of site recording the call 

call type: 1 for outgoing, 2 for incoming


If incoming SMS messages come from the 9333 service or from other SMS services and applications, the prefix of the other party is given as 3 (unknown).


For example:

caller id, timestamp, callee prefix, site id, call type

1138, 2013-04-01 12:32, 1, 52, 1

309095, 2013-04-01 12:33, 3, -1, 2
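A minimal sketch of parsing these records: it decodes the status digits (which, as noted above, are noisy labels), the call direction, and maps the documented missing-antenna codes -99 and 9999 to `None`. The second example record uses site ID -1, which is not among the documented codes, so the hypothetical `parse_cdr_line` lets the caller extend the set of missing codes rather than guessing their meaning.

```python
from datetime import datetime
from typing import NamedTuple, Optional

STATUS = {"1": "refugee", "2": "non-refugee", "3": "unknown"}
DIRECTION = {"1": "outgoing", "2": "incoming"}
MISSING_SITE_CODES = frozenset({-99, 9999})  # codes stated for missing antenna info


class CdrRecord(NamedTuple):
    caller_id: str
    timestamp: datetime
    caller_status: str        # decoded from the first digit of caller_id (noisy)
    callee_status: str        # decoded from the callee prefix
    site_id: Optional[int]    # None when the site code marks missing data
    direction: str


def parse_cdr_line(line, missing_sites=MISSING_SITE_CODES):
    caller_id, stamp, callee_prefix, site_id, call_type = (
        f.strip() for f in line.split(","))
    site = int(site_id)
    return CdrRecord(
        caller_id=caller_id,
        timestamp=datetime.strptime(stamp, "%Y-%m-%d %H:%M"),
        caller_status=STATUS.get(caller_id[:1], "invalid"),
        callee_status=STATUS.get(callee_prefix, "invalid"),
        site_id=None if site in missing_sites else site,
        direction=DIRECTION[call_type],
    )
```

Keeping missing sites as `None` instead of dropping the record preserves the call for volume statistics while excluding it from location-based analysis.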


In this dataset, the trajectories of 50,000 randomly selected refugees and 50,000 randomly selected non-refugees are provided for the entire observation period, but with reduced spatial resolution. 


The spatial resolution is reduced by replacing antenna identifiers with broader area identifiers, called districts or prefectures. Turkey is officially divided into 971 districts; our dataset contains data from 481 of them.


The dataset is split into 12 monthly files. Veri_Seti3_201701_In/Out will contain records of the form:


caller id: randomly assigned value, prefixed with digit indicating refugee status (1: refugee, 2: non-refugee)

timestamp: day and time formatted as YYYY-MM-DD HH:MM (rounded up to the minute)

prefecture_id: id of prefecture recording the call 


For example:

caller id, timestamp, prefecture id

1138, 2013-04-01 12:32, 167

209095, 2013-04-01 12:33, 23

176202, 2013-04-01 12:33, 75
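Since Dataset 3 follows the same users for the whole year, a natural first step is to rebuild a time-ordered trajectory per user. A minimal sketch, assuming the comma-separated layout above; `build_trajectories` and `count_moves` are hypothetical helpers, and counting changes of prefecture between consecutive records is a deliberately crude mobility measure that ignores how long a user stayed in each place.

```python
from collections import defaultdict
from datetime import datetime


def build_trajectories(lines):
    """Map caller id -> time-ordered list of (timestamp, prefecture_id)."""
    trajectories = defaultdict(list)
    for line in lines:
        # Drop empty fields, e.g. one produced by a stray trailing comma.
        fields = [f.strip() for f in line.split(",") if f.strip()]
        caller_id, stamp, prefecture_id = fields[:3]
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        trajectories[caller_id].append((when, prefecture_id))
    for points in trajectories.values():
        points.sort()
    return dict(trajectories)


def count_moves(points):
    """Count consecutive records whose prefecture differs from the previous one."""
    return sum(1 for (_, a), (_, b) in zip(points, points[1:]) if a != b)
```

Because the caller ID keeps its status prefix, the resulting trajectories can be split into refugee and non-refugee groups directly, subject to the label noise described below.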


Dataset 1 contains the number and duration of calls between pairs of cell towers (sites). There is little scope for a privacy breach from Dataset 1 alone, since it contains no personally identifiable information about users. It can be used to study traffic patterns over the entire period but reveals no information pertaining to individual users. This dataset enables analysis of activity levels in different areas, and makes it possible to establish communication links between areas.
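One way to establish such links is to aggregate site-to-site call counts into a district-to-district flow matrix. The sketch assumes a mapping from site ID to district ID is available, for instance derived from the BTS location file if site IDs correspond to BTS IDs, which is not stated here. Pairs involving an unmapped site are skipped. `district_flows` is a hypothetical name.

```python
from collections import Counter


def district_flows(site_pairs, site_to_district):
    """Aggregate site-to-site call counts into district-to-district counts.

    site_pairs: iterable of (outgoing_site_id, incoming_site_id, number_of_calls)
    site_to_district: mapping from site ID to district ID
    Pairs where either site is missing from the mapping are skipped.
    """
    flows = Counter()
    for src, dst, calls in site_pairs:
        a, b = site_to_district.get(src), site_to_district.get(dst)
        if a is None or b is None:
            continue
        flows[(a, b)] += calls
    return flows
```

The resulting counter can be read as a weighted, directed graph between districts, with diagonal entries holding calls that stay within a district.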

Dataset 2 contains detailed call records. To protect the privacy of users, phone numbers are replaced with random numbers, and only two weeks of data are recorded for any given user. Note also that this dataset only contains call records from one operator in each region. The exact physical location of each call is not shared; the dataset only records the ID of the cell tower that handled the call. Since calls are not always handled by the nearest cell tower (depending on how busy a tower is and on the physical terrain), this adds another layer of protection.

Dataset 3 contains records for an entire year, but the physical location is only coarsely indicated. Here, too, all personal information is excluded; there is only a refugee status indicator. However, this indicator is not perfect and contains some noise, which makes it impossible to say with certainty whether a record belongs to a refugee.


21 JANUARY 2019

Welcome to the homepage of the Data for Refugees (D4R) Challenge Workshop. This workshop will gather stakeholders seeking to use Big Data to improve the living conditions of refugees. The Challenge participants will present their findings at the Workshop, and Awards will be given in the following categories: health, education, social integration, unemployment, and safety & security.

The presentations will be in English.

Date and Location

This workshop will take place on 21 January 2019 at Boğaziçi University, Albert Long Hall.

Participation and Registration

The workshop is free to attend, but for security reasons the names of attendees must be known in advance for them to enter the university on the day of the workshop. Please register if you would like to attend.

09:00 - 09:30   Opening talks (Boğaziçi University, TUBITAK, Türk Telekom)
09:30 - 10:00   D4R Challenge Award Ceremony
10:00 - 11:00   Session 1 (Oral)
Reducing measles risk in Turkey through social integration of Syrian refugees. Paolo Bosetti, Piero Poletti, Massimo Stella, Bruno Lepri, Stefano Merler and Manlio De Domenico
Data Analytics without Borders: Multi-Layered Insights for Syrian Refugee Crisis. Ozgun Ozan Kılıç, Mehmet Ali Akyol, Oğuz Işık, Banu Günel Kılıç, Arsev Umur Aydınoğlu, Elif Sürer, Hafize Şebnem Düzgün, Sibel Kalaycıoğlu and Tuğba Taşkaya Temizel
UDMIT: An Urban Deep Map for Integration in Turkey. Sedef Turper Alışık, Damla Bayraktar Aksel, Asım Evren Yantaç, Lemi Baruh, Sibel Salman, İlker Kayı, Ahmet İçduygu and Ivon Bensason
11:00 - 12:00   Session 2 (Oral)
AROMA_CoDa: Assessing Refugees’ Onward Mobility through the Analysis of Communication Data. Harald Sterly, Benjamin Etzold, Lars Wirkus, Patrick Sakdapolrak, Jacob Schewe, Carl-Friedrich Schleussner and Benjamin Hennig
Measuring fine-grained multidimensional integration using mobile phone metadata: the case of Syrian refugees in Turkey. Michiel Bakker, Daoud Piracha, Patricia Lu, Keis Bejgo, Mohsen Bahrami, Yan Leng, Jose Balsa-Barreiro, Julie Ricard, Alfredo Morales, Vivek Singh, Burcin Bozkaya, Selim Balcisoy and Alex Pentland
Quantified Understanding of Syrian Refugee Integration in Turkey. Wangsu Hu, Ran He, Jin Cao, Lisa Zhang, Huseyin Uzunalioglu, Ahmet Akyamac and Chitra Phadke
12:00 - 14:00   Session 3 (Poster) and lunch
Refugees in undeclared employment - A case study in Turkey. Fabian Bruckschen, Till Koebe, Melina Ludolph, Maria Francesca Marino and Timo Schmid
Mobile Data for Mobility: Travel and Communication Patterns of Syrian Refugees. Eda Beyazıt, Ervin Sezgin, Kerem Arslanli and Mehmet Gencer
Segregation and Sentiment: Estimating Refugee Segregation and its Effects Using Digital Trace Data. Neal Marquez, Emilio Zagheni and Ingmar Weber
Integration of Syrian refugees: insights from D4R, media events and housing market data. Simone Bertoli, Paolo Cintia, Fosca Giannotti, Etienne Madinier, Caglar Ozden, Michael Packard, Dino Pedreschi, Hillel Rapoport, Alina Sirbu and Biagio Speciale
Exploring Refugee Mobility due to Large Scale Events using Mobile Phone Records. Fatima K. Abu Salem, Al-Abbas Khalil, Ahmad Dhaini, Joachim Diederich, Shady Elbassuoni and Wassim El Hajj
An Overview of Group Behavior on Turkey. Humberto T. M-Neto, Jussara M. Almeida, Artur Ziviani, Virgilio A. F. Almeida, Jaqueline Faria de Oliveira, Douglas C. Teixeira and Haron C. Fantecele
New Approaches to the Study of Spatial Mobility and Economic Integration of Refugees in Turkey. Steven Reece, Franck Duvell, Carlos Vargas-Silva and Zovanga Kone
Syrian Refugee Integration in Turkey: Evidence from Call Detail Records. Tugba Bozcaga, Fotini Christia, Elizabeth Harwood, Constantinos Daskalakis and Christos Papadimitriou
Optimizing the Access to Healthcare Services in Dense Refugee Hosting Urban Areas: A Case for Istanbul. Tarik Altuncu, Nur Sevencan and Ayse Seyyide Kaptaner
Social Integration of Syrian Refugees: Some Insights from Call Detail Record Datasets. Nuran Bayram-Arli, Fatih Cavdur, Mine Aydemir, Fadime Aksoy and Asli Sebatli
Refugee Integration in Turkey: A Study of Mobile Phone Data. Ismail Uluturk, Ismail Uysal and Onur Varol
Measuring Segregation of Syrian Refugees via Mobile Call Detail Records. Fatih Uludağ, Halit Eray Çelik, Serbest Ziyanak, Murat Canayaz and Fikriye Ataman
Reaching all children: A data-driven allocation strategy of educational resources for Syrian refugees. Suad Aldarra, Lorenzo Lucchini, Elisa Omodei and Laura Alessandretti
Developing Integration Policy for Refugees through Mobile Phone Data Analysis: A Study on Türk Telekom Customers. Ibrahim Zincir, Tohid Ahmed Rana, Ayselin Yıldız and Dilaver Arıkan Açar
14:00 - 15:00   Session 4 (Oral)
Measuring and mitigating behavioural segregation as an optimisation problem: the case of Syrian refugees in Turkey. Daniel Rhoads, Javier Borge-Holthoefer and Albert Solé-Ribalta
Refugee Mobility: Evidence from Phone Data in Turkey. Luisito Bertinelli, Rana Comertpay, Anastasia Litina, Jean-François Maystadt, Benteng Zou and Michel Beine
Mobility and Calling Behavior to Assess the Integration of Syrian Refugees in Turkey. Antonio Luca Alfeo, Mario Giovanni C.A. Cimino, Bruno Lepri and Gigliola Vaglini
15:00 - 16:00   Session 5 (Oral)
Characterizing the Mobile Phone Use Patterns of Refugee Hosting Provinces in Turkey. Ross Gore, Meltem Y. Sener, Christine Boshuijzen-van Burken, Erika Frydenlund, Engin Bozdag and Christa de Kock
Improve Education Opportunities for Better Integration of Syrian Refugees in Turkey. Marco Mamei, Seyit Cilasun, Marco Lippi, Francesca Pancotto and Semih Tumen
Towards an Understanding of Refugee Segregation, Isolation, Homophily and Ultimately Integration in Turkey Using Call Detail Records. Jeremy Boy, David Pastor, Marguerite Nyhan, Rebeca Moreno Jimenez, Daniel Macguire and Miguel Luengo Oroz

Targeted Outcomes

The aim of the Workshop is to disseminate the findings of the D4R Challenge to a wider public. Thirty-one research groups will present their results at the Workshop. Group meetings will bring the Challenge participants together to initiate a recommendations document (to be completed after the Workshop). These recommendations will be compiled in a jointly authored chapter in the forthcoming book:

Salah, A.A., A. Pentland, B. Lepri, E. Letouze, P. Vinck, Y.A. de Montjoye, X. Dong (eds.), Guide to Mobile Data Analytics in Refugee Scenarios, Springer International, forthcoming.

Furthermore, a white paper in Turkish will be prepared from the Challenge outcomes and shared with the authorities.


Scientific Committee
Albert Ali Salah, Boğaziçi University
Alex Pentland, MIT
Bruno Lepri, FBK
Emmanuel Letouze, Data-Pop Alliance
Patrick Vinck, Harvard University
Yves-Alexandre de Montjoye, Imperial College London
Xiaowen Dong, University of Oxford

Project Evaluation Committee
Senem Özyavuz, Turk Telekom
Iyad Rahwan, MIT
Anahi Ayala Iacucci, Internews
Bülent Sankur, Boğaziçi University
Yıldırım Bahadırlar, TUBITAK BILGEM
Alex Rutherford, MIT
Claire Melamed, Global Partnership for Sustainable Development Data
Jean-Marie Garelli, UNHCR
Ahmad Garibeh, Istanbul & I
Geoffrey Charles Fox, Indiana University
Josephine Goube, Techfugees
Fırat Yaman Er, Turk Telekom
Phuong Pham, Harvard Humanitarian Initiative
Mithat Büyükhan, TC Ministry of Education
Mazen AboulHosn, IOM
Ömer Hakan Şimşek, TC Ministry of Health
Nona Zicherman, UNICEF
Manuel Garcia-Herranz, UNICEF
Vedran Sekara, UNICEF


Albert Ali Salah, salah@boun.edu.tr


