2021-09-30, 10:00–10:30, Ushuaia
The Metropolitan Area of Buenos Aires is the third largest in Latin America, with millions of commuters using public transport every day. As government officials, we must make sense of the massive amounts of location data generated by our transport system. We used Apache Spark in combination with GeoMesa to process over 100 million monthly GPS records from buses, create individual trajectories between terminals, infer direction of travel and derive meaningful transport performance indicators. We conclude that this is a great combination of tools to tackle big-geo and mobility problems of the kind expected in a big city government.
The Buenos Aires Metropolitan Area is home to more than 15 million people and is the third largest in Latin America. In the last few years, the City Government has dedicated substantial resources to becoming an innovation leader in the region. As in most cities across the world, mobility and transportation are among the most important aspects of life in the city, with around 3.4 million daily users before the pandemic and around 2.4 million these days.
As government officials, we face the challenge of making sense of the massive amount of location data generated by the public transport system. Achieving this allows us to propose and prioritize new projects, perform program impact assessments, make network and infrastructure changes, and ultimately to improve the life of citizens.
To better understand mobility, we set out to identify individual trajectories from bus GPS records. Buses are the most widespread mode of transport, and they are used in more than 90% of all trips in the metropolitan area. We first approached this problem with a sample in a Postgres database, and quickly found that PostGIS would not be enough for our need to process ~100M monthly records. At that point, we turned to Apache Spark and GeoMesa, a free and open-source suite of tools that allows geospatial analyses in a distributed fashion.
We ingested GPS records into Spark and constructed point geometries from latitude and longitude attributes. We then partitioned the data using the bus line and individual vehicle, and calculated the differences in location and timestamp of each point relative to the previous one.
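The partition-and-difference step can be illustrated on a single machine. The following is a minimal sketch in plain Python, not the Spark and GeoMesa code used in the project: the field names (`line`, `vehicle`, `ts`, `lat`, `lon`), the use of haversine distance and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def add_deltas(records):
    """Partition by (line, vehicle), order by time, and attach the
    time and distance differences relative to the previous point."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["line"], rec["vehicle"])].append(rec)

    out = []
    for key in sorted(groups):
        prev = None
        for rec in sorted(groups[key], key=lambda r: r["ts"]):
            row = dict(rec)
            if prev is None:
                # First point of a partition has no predecessor.
                row["dt_s"] = None
                row["dist_m"] = None
            else:
                row["dt_s"] = rec["ts"] - prev["ts"]
                row["dist_m"] = haversine_m(
                    prev["lat"], prev["lon"], rec["lat"], rec["lon"]
                )
            out.append(row)
            prev = rec
    return out


# Hypothetical records; timestamps are seconds since an arbitrary origin.
records = [
    {"line": "12", "vehicle": "A", "ts": 60, "lat": -34.60, "lon": -58.40},
    {"line": "12", "vehicle": "A", "ts": 0, "lat": -34.60, "lon": -58.39},
    {"line": "12", "vehicle": "B", "ts": 30, "lat": -34.61, "lon": -58.40},
]
rows = add_deltas(records)
```

In a distributed setting the same idea would typically be expressed as a window partitioned by line and vehicle and ordered by timestamp, with each row compared against the previous row in its window.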
We then applied spatially aware rules to identify the moment when a bus leaves or enters its corresponding terminal's area of influence, and used those moments to define the beginning and end of individual bus trips. Using the terminal data, we were also able to infer the direction of travel. Finally, we created linestring geometries from all points belonging to the same trip. These trip geometries, with their associated attributes, were then exported into a spatial database in order to visually communicate and analyse our results.
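As an illustration of how such rules could work, the sketch below splits one vehicle's time-ordered points into terminal-to-terminal trips and serialises each trip as a WKT linestring. It is plain Python rather than the project's Spark and GeoMesa code, and it rests on assumptions not stated in the abstract: the area of influence is modelled as a circle around each terminal, the 200 m radius is arbitrary, and the terminal names and coordinates are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * asin(sqrt(a))


def terminal_at(point, terminals, radius_m):
    """Name of the terminal whose (assumed circular) area contains the point, else None."""
    for name, (lat, lon) in terminals.items():
        if haversine_m(point["lat"], point["lon"], lat, lon) <= radius_m:
            return name
    return None


def split_trips(points, terminals, radius_m=200.0):
    """Split time-ordered points of one vehicle into trips.

    A trip starts when the bus leaves a terminal area and ends when it
    enters one. Points seen before the first terminal, and a trailing
    unfinished trip, are discarded.
    """
    trips = []
    current = None        # trip currently being built, if any
    last_terminal = None  # terminal containing the previous point, if any
    for p in points:
        here = terminal_at(p, terminals, radius_m)
        if here is None:
            if current is None and last_terminal is not None:
                current = {"origin": last_terminal, "points": []}
            if current is not None:
                current["points"].append(p)
        elif current is not None:
            current["destination"] = here
            trips.append(current)
            current = None
        last_terminal = here
    return trips


def direction(trip):
    """Direction of travel, inferred from the origin and destination terminals."""
    return f"{trip['origin']} -> {trip['destination']}"


def to_wkt(trip):
    """WKT linestring in lon/lat order (a valid linestring needs at least two points)."""
    coords = ", ".join(f"{p['lon']} {p['lat']}" for p in trip["points"])
    return f"LINESTRING ({coords})"


# Hypothetical terminals and a vehicle driving T1 -> T2 -> T1 along one parallel.
terminals = {"T1": (-34.6, -58.40), "T2": (-34.6, -58.36)}
lons = [-58.40, -58.39, -58.38, -58.37, -58.36, -58.37, -58.40]
points = [{"lat": -34.6, "lon": lon} for lon in lons]
trips = split_trips(points, terminals)
```

Here `direction(trips[0])` names the origin and destination terminals of the first trip, and `to_wkt(trips[0])` gives a geometry string that a spatial database can load. Discarding points that never fall between two terminal visits is one possible reading of the criteria for belonging to a trajectory.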
Starting from over 100M monthly GPS records, we identified ~1.5M individual bus trips, completed by more than 130 bus lines. Around 89% of records ended up as part of a trajectory, and ~11% were discarded because of poor quality or because they did not meet the criteria set to belong to a trajectory. With these individual trajectories we were able to produce key performance indicators of the bus transportation system and to encourage evidence-based decision-making inside the government.
In conclusion, Spark SQL and GeoMesa are a great combination of tools to tackle the big-geo and mobility problems expected in a big city government. They enabled us to identify millions of bus trajectories and derive useful KPIs to support data-driven decisions.
Villarroel Torrez, Daniel (1)
Armesto Brosio, Andrés (1)
(1) Undersecretariat for Evidence-based Public Policy, Buenos Aires City Government.
Track – Use cases & applications
Topic – Government and Institutions
Level – 3 - Medium. Advanced knowledge is recommended.
Language of the Presentation –
Data Scientist, Engineer, Public Sector & Urban Analytics.
Currently at the Buenos Aires City Government, as part of the team leading Digital Transformation and Innovation initiatives. Our mission is to turn Buenos Aires into a data-driven city. We develop data exploitation projects and also encourage a cultural change within our government.
Geo-geek and Data Scientist at the Buenos Aires City Government.