For this challenge, three CDR (Call Detail Record) datasets will be made available by Türk Telekom, together with two files on cell tower locations.

The datasets will include one year of mobile CDR data, collected between January 2017 and December 2017.

All datasets will be stored in plain text format.


The geographical coordinates (longitude, latitude) of the mobile network antennae (BTSs, Base Transceiver Stations) are given. It should be noted that several BTSs may be co-located. Each line of this file contains the BTS ID, the ID of the district where the antenna is located, and the coordinates. Each district may contain several antennae.


For example:

BTS_ID, district_ID, longitude, latitude






For coarse mobility data, we do not provide individual base stations, but only district information. There are 971 districts (or prefectures) in Turkey. The base stations included in the dataset are located in approximately 481 districts across the country. The rough geometric center of each district will be provided separately.


For example:

district_ID, district_name, city_name, longitude, latitude

1, Beşiktaş, İstanbul, -17.5251,14.74683

2, Sarıyer, İstanbul, -17.5164,14.74673
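
The district file can be read with a few lines of Python. The following is a minimal sketch, assuming the file is comma-separated with the five columns listed above and may or may not start with a header row; the `District` type and the `read_districts` function are illustrative names, not part of the dataset.

```python
import csv
import io
from typing import NamedTuple


class District(NamedTuple):
    district_id: int
    district_name: str
    city_name: str
    longitude: float
    latitude: float


def read_districts(handle):
    """Parse rows of: district_ID, district_name, city_name, longitude, latitude."""
    districts = {}
    for row in csv.reader(handle, skipinitialspace=True):
        # Skip blank lines and an optional header row (first field not numeric).
        if not row or not row[0].strip().isdigit():
            continue
        fields = [field.strip() for field in row]
        district = District(
            int(fields[0]), fields[1], fields[2],
            float(fields[3]), float(fields[4]),
        )
        districts[district.district_id] = district
    return districts


# The two example rows from the description above.
SAMPLE = """district_ID, district_name, city_name, longitude, latitude
1, Beşiktaş, İstanbul, -17.5251,14.74683
2, Sarıyer, İstanbul, -17.5164,14.74673
"""

districts = read_districts(io.StringIO(SAMPLE))
print(districts[1].district_name, districts[1].longitude, districts[1].latitude)
```

For a real file, the same function could be given a handle opened with `open(path, encoding="utf-8")`, assuming UTF-8 text.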


One year of site-to-site traffic on an hourly basis. This dataset contains the traffic between each pair of sites for a year. The file Veri_Seti1_201701 (i.e. Dataset_1_2017_01, indicating dataset type, collection year and month) contains monthly voice traffic between sites and is structured as follows:


timestamp: day / hour formatted as YYYY-MM-DD HH (rounded up to hours, HH from 1 to 24)

outgoing_site_id: id of site the call originated from

incoming_site_id: id of site receiving the call

total number_of_calls: the total number of calls between these two sites during this hour

number of calls originated from refugees: the number of calls originating from numbers with refugee status.

total call duration: the total duration of all calls between these two sites during this hour.

total call duration originated from refugees: the total duration of calls between these two sites during this hour originating from refugee IDs.
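
Standard date parsers usually reject an hour value of 24, so the hourly timestamp needs a small amount of care. The sketch below assumes that HH = 24 denotes the hour ending at midnight and maps it to 00:00 of the following day; that interpretation is an assumption, not something stated in the description. Hour 0 is also accepted. `parse_hour_stamp` is an illustrative name.

```python
from datetime import datetime, timedelta


def parse_hour_stamp(stamp):
    """Parse 'YYYY-MM-DD HH' where HH may run from 0 to 24."""
    date_part, hour_part = stamp.split()
    hour = int(hour_part)
    if not 0 <= hour <= 24:
        raise ValueError(f"hour out of range in {stamp!r}")
    day = datetime.strptime(date_part, "%Y-%m-%d")
    # Adding the hour as a timedelta lets HH = 24 roll over to the next day
    # (assumed meaning: the hour that ends at midnight).
    return day + timedelta(hours=hour)


print(parse_hour_stamp("2017-01-31 24"))
print(parse_hour_stamp("2017-01-15 09"))
```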


Similarly, the file Veri_Seti1_SMS_201701 contains monthly text traffic between sites and is structured as follows:


timestamp: day and hour considered in format YYYY-MM-DD HH  (rounded up to hours, HH from 1 to 24)

outgoing_site_id: id of site the SMS originated from

incoming_site_id: id of site receiving the SMS

number_of_SMS: the total number of SMS messages between these two sites during this hour

number of SMS originated from refugees: the number of SMS messages originating from numbers with refugee IDs.


For example (voice traffic):

timestamp, outgoing_site_id, incoming_site_id, number_of_calls, refugee_calls, total_call_duration, refugee_call_duration

2013-04-01 00,2,2,7,1,138,20

2013-04-01 00,2,3,4,0,136,0

2013-04-30 23,1659,608,0,1,0,3601
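
Rows in this format can be aggregated to obtain activity levels per site. The following is a minimal sketch, assuming comma-separated rows with the seven voice-traffic columns in the order shown above; `totals_by_outgoing_site` is an illustrative name, and the duration unit is left as given in the file.

```python
import csv
import io
from collections import defaultdict

# The three example rows from the description above.
SAMPLE_ROWS = """2013-04-01 00,2,2,7,1,138,20
2013-04-01 00,2,3,4,0,136,0
2013-04-30 23,1659,608,0,1,0,3601
"""


def totals_by_outgoing_site(handle):
    """Sum calls and durations over all hours, keyed by outgoing_site_id."""
    totals = defaultdict(lambda: {
        "calls": 0, "refugee_calls": 0,
        "duration": 0, "refugee_duration": 0,
    })
    for row in csv.reader(handle, skipinitialspace=True):
        if len(row) != 7:
            continue  # blank or malformed line
        try:
            out_site, _in_site, calls, ref_calls, dur, ref_dur = (
                int(field) for field in row[1:]
            )
        except ValueError:
            continue  # header row or non-numeric field
        site = totals[out_site]
        site["calls"] += calls
        site["refugee_calls"] += ref_calls
        site["duration"] += dur
        site["refugee_duration"] += ref_dur
    return dict(totals)


totals = totals_by_outgoing_site(io.StringIO(SAMPLE_ROWS))
for site_id, values in sorted(totals.items()):
    print(site_id, values)
```

Keying on `incoming_site_id` instead, or on the pair of site ids, would give the receiving-side totals or the link-level traffic in the same way.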


This dataset will provide the cell tower identifiers used by a group of randomly chosen active users to make phone calls and send texts. The data will be timestamped, and each group of users will be observed for a period of two weeks. At the end of the two-week period, a fresh sample of active users will be drawn at random. Each sample contains 3% of the refugee base plus an equal number of non-refugee users. To protect privacy, new random identifiers are chosen in every time period. Timestamps are rounded to the minute.


The phone numbers of these users are removed, and each user is assigned a unique random number instead. These numbers will start with 1 for refugees, 2 for non-refugees, and 3 for unknown. However, this indicator should be considered somewhat noisy. Among the users who are marked as refugees, there may be customers who are not refugees, and vice versa. Consequently, it will not be possible to say with 100% certainty whether a given CDR belongs to a refugee or not. There is no identifying information about the other party of the call; only the area code (1: refugee, 2: not refugee, 3: unknown) is given.


It should be noted that there are multiple mobile operators in each region. Therefore, the numbers of phone calls and conversations do not represent actual totals, although they are indicators of the total volume of conversations in the region. Values of -99 or 9999 are given for missing antenna information, for instance if the other party uses a different operator.


Monthly traffic between the areas is stored in files named Veri_Seti2_201701W_In / Out for voice and Veri_Seti2_201701W_SMS_In / Out for SMS. These are structured as follows:



caller id: randomly assigned value, prefixed with digit indicating refugee status (1: refugee, 2: non-refugee, 3: unknown)

timestamp: day / hour considered in format YYYY-MM-DD HH:MM (rounded up to minute)

callee prefix: 1: refugee, 2: non-refugee, 3: unknown

site_id: id of site recording the call 

call type: 1 for outgoing, 2 for incoming


If incoming SMSs come from the 9333 service or from different SMS services and applications, the dialed area code is given as 3: unknown.


For example:

caller id, timestamp, callee prefix, site id, call type

1138, 2013-04-01 12:32, 1, 52, 1

309095, 2013-04-01 12:33, 3, -1, 2
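
A record of this form can be decoded as follows. This is a minimal sketch, assuming comma-separated fields in the order shown above. The description names -99 and 9999 as placeholders for missing antenna information; treating the -1 seen in the example row as missing as well is an assumption. `FineRecord` and `parse_fine_record` are illustrative names.

```python
from datetime import datetime
from typing import NamedTuple, Optional

STATUS_BY_PREFIX = {"1": "refugee", "2": "non-refugee", "3": "unknown"}
CALL_TYPES = {1: "outgoing", 2: "incoming"}
# -99 and 9999 come from the description; -1 is an assumption based on
# the example row.
MISSING_SITE_IDS = {-99, 9999, -1}


class FineRecord(NamedTuple):
    caller_id: str
    caller_status: str
    timestamp: datetime
    callee_status: str
    site_id: Optional[int]  # None when antenna information is missing
    call_type: str


def parse_fine_record(line):
    """Parse: caller id, timestamp, callee prefix, site id, call type."""
    caller_id, stamp, callee_prefix, site_id, call_type = (
        part.strip() for part in line.split(",")
    )
    site = int(site_id)
    return FineRecord(
        caller_id=caller_id,
        # The first digit of the caller id encodes the (noisy) status flag.
        caller_status=STATUS_BY_PREFIX.get(caller_id[0], "unknown"),
        timestamp=datetime.strptime(stamp, "%Y-%m-%d %H:%M"),
        callee_status=STATUS_BY_PREFIX.get(callee_prefix, "unknown"),
        site_id=None if site in MISSING_SITE_IDS else site,
        call_type=CALL_TYPES[int(call_type)],
    )


first = parse_fine_record("1138, 2013-04-01 12:32, 1, 52, 1")
second = parse_fine_record("309095, 2013-04-01 12:33, 3, -1, 2")
print(first)
print(second)
```

Because the status indicator is noisy, any analysis built on `caller_status` should treat it as a probabilistic label rather than ground truth.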


In this dataset, the trajectories of 50,000 randomly selected refugees and 50,000 randomly selected non-refugees are provided for the entire observation period, but with reduced spatial resolution. 


The spatial resolution is reduced by replacing antenna identifiers with broader area identifiers, called districts or prefectures. Turkey is officially divided into 971 districts; our dataset contains data from 481 of them.


The files of the dataset are split into 12 monthly accumulated files. Veri_Seti3_201701_In/Out will contain records of the form:


caller id: randomly assigned value, prefixed with digit indicating refugee status (1: refugee, 2: non-refugee)

timestamp: day / hour considered in format YYYY-MM-DD HH:MM (rounded up to minute)

prefecture_id: id of prefecture recording the call 


For example:

caller id, timestamp, prefecture id

1138, 2013-04-01 12:32, 167

209095, 2013-04-01 12:33, 23

176202, 2013-04-01 12:33, 75
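
Coarse trajectories can be reconstructed by grouping these records per caller id and ordering them in time. The following is a minimal sketch, assuming comma-separated rows in the order shown above; `build_trajectories` is an illustrative name, and the last two sample rows are invented for illustration and do not come from the dataset.

```python
from collections import defaultdict
from datetime import datetime


def build_trajectories(lines):
    """Return {caller_id: [prefecture ids in time order, repeats collapsed]}."""
    visits = defaultdict(list)
    for line in lines:
        # Drop empty fields so a stray trailing comma is tolerated.
        parts = [p.strip() for p in line.split(",") if p.strip()]
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # blank line, header row, or malformed record
        caller_id, stamp, prefecture_id = parts
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        visits[caller_id].append((when, int(prefecture_id)))

    trajectories = {}
    for caller_id, events in visits.items():
        events.sort()
        path = []
        for _, prefecture in events:
            # Keep a prefecture only when it differs from the previous one.
            if not path or path[-1] != prefecture:
                path.append(prefecture)
        trajectories[caller_id] = path
    return trajectories


SAMPLE_LINES = [
    "caller id, timestamp, prefecture id",
    "1138, 2013-04-01 12:32, 167",
    "209095, 2013-04-01 12:33, 23",
    "176202, 2013-04-01 12:33, 75",
    "1138, 2013-04-02 09:10, 23",   # invented row for illustration
    "1138, 2013-04-01 13:05, 167",  # invented row for illustration
]

trajectories = build_trajectories(SAMPLE_LINES)
print(trajectories)
```

Collapsing consecutive repeats keeps only the moves between prefectures; keeping the timestamps alongside each entry would be a natural extension when dwell times matter.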


Dataset 1 contains the number and duration of calls per cell tower. There is little scope for a privacy breach from Dataset 1 alone, since it contains no personally identifiable information about the users. It can be used to study traffic patterns during the entire period, but reveals no information pertaining to individual users. This dataset enables analysis of the activity levels of different areas and makes it possible to establish communication links between areas.

Dataset 2 contains detailed call records. To protect the privacy of users, phone numbers are replaced with random numbers, and only two weeks of data are recorded for any given user. It should not be forgotten that this dataset only makes available call records from one operator for each region. The exact physical location of each call is not shared; the dataset only records the id of the cell tower that handled the call. Since calls are not always handled by the nearest cell tower (depending on how busy a tower is and on the physical lay of the land), this adds another layer of protection.

Dataset 3 contains records for an entire year, but the physical location is very coarsely indicated. Also here, all personal information is excluded. There is only a refugee status indicator. However, this indicator is not perfect, and contains some noise. This makes it impossible to say with certainty whether a record belongs to a refugee or not.



Welcome to the homepage of the Data for Refugees (D4R) Challenge Workshop. This workshop will gather stakeholders seeking to use Big Data for improving the living conditions of refugees. The Challenge participants will present their findings at the Workshop, and awards will be given in the following categories: health, education, social integration, unemployment, and safety & security.

The presentations will be in English.

Date and Location

This workshop will take place on the 5th of November 2018 at Boğaziçi University, Albert Long Hall.

Participation and Registration

The workshop is free to attend, but university security requires the names of attendees in order to admit them on the day of the workshop. Please register if you would like to attend.

Tentative Program

09:00 - 09:30   Opening talks (Boğaziçi University, TUBITAK, Türk Telekom)
09:30 - 10:00   D4R Challenge Award Ceremony
10:00 - 12:00   Oral presentations from challenge participants
12:00 - 14:00   Lunch break and poster presentations from challenge participants
14:00 - 16:00   Oral presentations from challenge participants
16:00 - 17:00   Group meetings for challenge areas

Targeted Outcomes

The aim of the Workshop is to disseminate the findings of the D4R Challenge to a wider public. 31 research groups will present their results in the Workshop. The group meetings will bring the Challenge participants together to initiate a recommendations document (to be completed after the Workshop). These recommendations will be compiled in a jointly-authored chapter in the forthcoming book:

Salah, A.A., A. Pentland, B. Lepri, E. Letouze, P. Vinck, Y.A. de Montjoye, X. Dong (eds.), Guide to Mobile Data Analytics in Refugee Scenarios, Springer International, forthcoming.

Furthermore, a white paper in Turkish will be edited from the Challenge outcomes to be shared with the authorities.


Scientific Committee
Albert Ali Salah, Boğaziçi University
Alex Pentland, MIT
Bruno Lepri, FBK
Emmanuel Letouze, Data-Pop Alliance
Patrick Vinck, Harvard University
Yves-Alexandre de Montjoye, Imperial College London
Xiaowen Dong, University of Oxford

Project Evaluation Committee
Senem Özyavuz, Turk Telekom
Iyad Rahwan, MIT
Anahi Ayala Iacucci, Internews
Bülent Sankur, Boğaziçi University
Yıldırım Bahadırlar, TUBITAK BILGEM
Alex Rutherford, MIT
Claire Melamed, Global Partnership for Sustainable Development Data
Jean-Marie Garelli, UNHCR
Ahmad Garibeh, Istanbul & I
Geoffrey Charles Fox, Indiana University
Josephine Goube, Techfugees
Fırat Yaman Er, Turk Telekom
Phuong Pham, Harvard Humanitarian Initiative
Mithat Büyükhan, TC Ministry of Education
Mazen AboulHosn, IOM
Ömer Hakan Şimşek, TC Ministry of Health
Nona Zicherman, UNICEF
Manuel Garcia-Herranz, UNICEF
Vedran Sekara, UNICEF


Albert Ali Salah, salah@boun.edu.tr

Attention Please

The deadline for project groups to submit proposals has been extended until March 23, 2018.