Inferring journeys from image hosting services
The main challenge we faced was getting the data. Which data source would help us determine the places BRIC tourists visit? We found that image hosting services such as Flickr or Picasa are good proxies for inferring the most visited places: picture metadata contains valuable information, particularly location information.
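To illustrate how location metadata can point to the most visited places, here is a minimal sketch in Python. It is not the pipeline used in the study: the `(user_id, latitude, longitude)` record layout, the function name `most_visited_cells` and the grid-cell approach are all assumptions made for the example. It snaps each geotagged picture to a grid cell and ranks cells by the number of distinct users seen there, so that one prolific photographer does not dominate the ranking.

```python
from collections import defaultdict

def most_visited_cells(photos, cell_size=0.01, top=3):
    """Rank grid cells by the number of distinct users who took a photo there.

    photos: iterable of (user_id, latitude, longitude) tuples (hypothetical schema).
    cell_size: cell width in degrees (0.01 is roughly 1 km of latitude).
    """
    users_per_cell = defaultdict(set)
    for user_id, lat, lon in photos:
        # Snap each coordinate to the integer index of its grid cell.
        cell = (int(lat // cell_size), int(lon // cell_size))
        users_per_cell[cell].add(user_id)
    # Most distinct users first; ties broken by cell index for a stable order.
    ranked = sorted(users_per_cell.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(cell, len(users)) for cell, users in ranked[:top]]

# Made-up sample: three pictures near the Eiffel Tower, one near the Louvre.
sample = [
    ("a", 48.8584, 2.2945),
    ("b", 48.8589, 2.2950),
    ("a", 48.8586, 2.2948),
    ("c", 48.8606, 2.3376),
]
print(most_visited_cells(sample, top=2))
```

Counting distinct users rather than raw pictures is a design choice of this sketch; a real analysis would probably also map cells back to named cities or monuments.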
By analyzing 2.9M pictures from Flickr and other local image hosting services over a 2-year period, we were able to draw a detailed map of BRIC tourist flows in France, both in space (what are the most visited cities or monuments?) and in time (what is the seasonality of tourism from BRIC countries?). Multiple techniques were used to collect, clean and analyze the data. One of the most challenging tasks we had to perform was determining the nationality of the people posting pictures: machine learning techniques helped us solve this problem by inferring nationality from the temporal density of picture uploads.
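The idea of inferring nationality from the temporal density of uploads can be sketched as follows. This is only an illustration under stated assumptions, not the model used in the study, whose features and algorithms are not described here. The sketch assumes that upload times are available as Unix timestamps, summarises each user as a 24-bin histogram of upload hours in UTC, and swaps in a simple nearest-centroid classifier trained on users whose country is already known. All function names are hypothetical.

```python
from datetime import datetime, timezone
import math

def hourly_profile(timestamps):
    """Normalised 24-bin histogram of upload hours (UTC) for one user."""
    counts = [0] * 24
    for ts in timestamps:
        counts[datetime.fromtimestamp(ts, tz=timezone.utc).hour] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def fit_centroids(labelled_users):
    """Average the hourly profiles of users whose country is known.

    labelled_users: iterable of (country, [unix_timestamp, ...]) pairs.
    """
    sums, sizes = {}, {}
    for country, timestamps in labelled_users:
        profile = hourly_profile(timestamps)
        acc = sums.setdefault(country, [0.0] * 24)
        for hour in range(24):
            acc[hour] += profile[hour]
        sizes[country] = sizes.get(country, 0) + 1
    return {c: [v / sizes[c] for v in acc] for c, acc in sums.items()}

def predict_country(timestamps, centroids):
    """Return the country whose average profile is closest (Euclidean distance)."""
    profile = hourly_profile(timestamps)
    return min(centroids, key=lambda c: math.dist(profile, centroids[c]))
```

The intuition is that people tend to upload at similar local hours, so the same habit appears shifted in UTC depending on the home time zone. Countries that share a time zone cannot be separated by this signal alone, which is one reason a real system would need richer features.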
Uncovering previously unseen tourism patterns
We helped Atout France by delivering directly actionable insights. Among the many patterns we uncovered in the data, three were particularly insightful:
- BRIC tourists' journeys show different levels of maturity. Whereas Chinese and Indian tourists tend to stick to Paris and its world-famous monuments, Russian visitors are a bit more adventurous: they are keen to explore French ski resorts and are big fans of the French Riviera.
- French regions have different levels of affinity with BRIC tourists, which makes them more or less relevant to shortlist for welcoming foreign tourists and for promotion abroad.
- Chinese tourists have a noticeably unusual visit pattern. Because of low-cost flights between Beijing and German airports, these tourists travel from there to Paris by bus and often stop in Strasbourg, which brings large waves of Chinese visitors to the cities along the way.