Processing Massive-Scale Geospatial Data with Apache Sedona
2021-09-29, 15:30–16:00, Humahuaca

Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona provides a set of out-of-the-box Spatial Resilient Distributed Datasets and DataFrames that can efficiently load, process, and analyze large-scale spatial data across a cluster of machines. In the talk, we will give an overview of the Apache Sedona architecture. We will then demonstrate Sedona's Scala, SQL, and Python APIs for programming analytics pipelines on top of geospatial data. For more details about Apache Sedona, please visit the following links:

Apache Sedona website: http://sedona.apache.org/
Apache Sedona GitHub: https://github.com/apache/incubator-sedona/
Speaker Twitter: https://twitter.com/mosarwat


The talk presents the details of designing and developing Apache Sedona, which extends the core engine of Spark and SparkSQL to support spatial data types, indexes, and geometrical operations at scale. The talk also gives a detailed analysis of the technical challenges and opportunities of extending Spark to support state-of-the-art spatial data partitioning techniques: uniform grid, R-Tree, Quad-Tree, and KDB-Tree. The talk will explain how building a local spatial index, e.g., an R-Tree or Quad-Tree, on each Spark data partition can speed up the local computation and hence decrease the overall runtime of the spatial analytics program. Extensive experiments on real spatial datasets show that Sedona runs up to two orders of magnitude faster than existing Hadoop-based systems and up to an order of magnitude faster than Spark-based systems.
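
To make the partitioning idea concrete, here is a minimal single-machine sketch in plain Python. It is not Sedona code and does not use Spark: it assumes a uniform grid with a hypothetical `cell_size` parameter, stands in a dictionary of grid cells for the cluster's data partitions, and uses axis-aligned rectangles in place of general geometries. All function names are illustrative only. The point it shows is that a rectangle is compared only against the points in the cells it overlaps, rather than against every point.

```python
from collections import defaultdict
from itertools import product


def cell_of(x, y, cell_size):
    """Return the (column, row) grid cell that contains the point (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def cells_overlapping(rect, cell_size):
    """Yield every grid cell that the rectangle (xmin, ymin, xmax, ymax) touches."""
    xmin, ymin, xmax, ymax = rect
    cx0, cy0 = cell_of(xmin, ymin, cell_size)
    cx1, cy1 = cell_of(xmax, ymax, cell_size)
    return product(range(cx0, cx1 + 1), range(cy0, cy1 + 1))


def contains(rect, point):
    """Closed-interval point-in-rectangle test."""
    xmin, ymin, xmax, ymax = rect
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax


def partitioned_join(points, rects, cell_size):
    """Join rectangles with the points they contain, one grid cell at a time."""
    # Step 1: partition the points; each point lands in exactly one cell.
    partitions = defaultdict(list)
    for p in points:
        partitions[cell_of(p[0], p[1], cell_size)].append(p)

    # Step 2: each rectangle is checked only against the cells it overlaps.
    result = set()
    for rect_id, rect in enumerate(rects):
        for cell in cells_overlapping(rect, cell_size):
            for p in partitions.get(cell, ()):
                if contains(rect, p):
                    result.add((rect_id, p))
    return result


def brute_force_join(points, rects):
    """Reference join that compares every rectangle with every point."""
    return {(i, p) for i, r in enumerate(rects) for p in points if contains(r, p)}


points = [(1.0, 1.0), (12.5, 3.0), (25.0, 25.0), (9.9, 9.9)]
rects = [(0.0, 0.0, 10.0, 10.0), (10.0, 0.0, 30.0, 30.0)]
pairs = partitioned_join(points, rects, cell_size=10.0)
```

In this sketch the grid plays both roles, partitioner and local lookup structure. In a distributed setting each cell would live on a different machine, and a per-partition index such as an R-Tree could replace the linear scan over a cell's points.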


Authors and Affiliations

Mohamed (Mo) Sarwat
Computer Science Faculty at Arizona State University
CEO and Founder of Wherobots

Track

Software

Topic

Data collection, data sharing, data science, open data, big data, data exploitation platforms

Level

1 - Beginners. No specific prior knowledge is required.

Language of the Presentation

English

Mo is an assistant professor of computer science at Arizona State University and the CEO / Co-founder of Wherobots. He is the architect of Apache Sedona, a scalable system for processing big geospatial data that is used by major tech companies.