Where are all the Data Tiles? An Under-Appreciated Format for the Big Data Revolution
2021-10-01, 08:30–09:00, Buenos Aires

As big data and interactive visualizations gain popularity, there is an increasing need to quickly manipulate and represent massive datasets within the browser. The scale of these datasets challenges existing solutions, which require complex and unsatisfactory workarounds to deliver a smooth user experience.

Data tiles are images that contain data encoded in their pixels. Tiled using the standard web tile scheme, they can provide instant access to numerous spatial data layers at practically unlimited scale. Web-based “slippy” maps already take advantage of tiled imagery: we routinely load and navigate “datasets” composed of trillions of data points (pixels) in our browsers. By simply modifying the images to contain raw data instead of visual imagery, data tiles let us browse and use data effortlessly at any scale. Our “GeoPngDB” format builds on existing solutions to provide a browser-friendly way of encoding raw data for consumption by web-based tools.
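Because data tiles reuse ordinary web-map addressing, looking up a value at a location is a matter of picking one tile and one pixel. The sketch below assumes the common XYZ (“slippy map”) Web Mercator scheme with 256 × 256 pixel tiles; the function name is hypothetical and not part of GeoPngDB. Reading a single value would then mean fetching the tile at that z/x/y address and sampling one pixel.

```python
import math
from typing import Tuple

TILE_SIZE = 256  # assumption: 256 x 256 pixel tiles, the common web-map default


def lonlat_to_tile_pixel(lon: float, lat: float, zoom: int) -> Tuple[int, int, int, int]:
    """Return (tile_x, tile_y, pixel_x, pixel_y) for a point in the XYZ Web Mercator scheme."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat)
    # Fractional position across the whole map, measured in tiles.
    fx = max((lon + 180.0) / 360.0 * n, 0.0)
    fy = max((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n, 0.0)
    # Integer part selects the tile; clamp so lon = 180 or extreme latitudes stay on the map.
    tile_x = min(int(fx), n - 1)
    tile_y = min(int(fy), n - 1)
    # Remainder selects the pixel inside that tile.
    pixel_x = min(int((fx - tile_x) * TILE_SIZE), TILE_SIZE - 1)
    pixel_y = min(int((fy - tile_y) * TILE_SIZE), TILE_SIZE - 1)
    return tile_x, tile_y, pixel_x, pixel_y


# Example: the tile and pixel holding the value at (0, 0) on zoom level 1.
print(lonlat_to_tile_pixel(0.0, 0.0, 1))  # (1, 1, 0, 0)
```

The same lookup works at any zoom level, which is where the “practically unlimited scale” comes from: the browser only ever needs the handful of tiles covering the current view.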


This talk will discuss the potential of tiled raster formats to provide visualization tools with raw data at unlimited scale. We will focus on a browser-friendly solution we have discovered, cover how the format works and share tools for creating tiles.

Don’t we have enough geospatial data formats already? We thought so too, but when it came to finding a solution to provide instant access to census data such as population distribution and employment for every census block in the US, we couldn’t find an existing format that worked for us. We needed something we could host cheaply and load instantly, and that could scale indefinitely while maintaining pixel-perfect data fidelity.

Nothing we could find checked all the boxes. Vector data was not the right fit because we had to choose between visual fidelity and performance: loading block-level data at a national scale would require downloading gigabytes of data and would overwhelm the browser. Existing raster solutions, like Cloud Optimized GeoTIFFs (COGs), require a backend server to deliver data in response to queries, which can make them prohibitively expensive to host freely and openly.

This presentation will outline the solution we have developed. Our approach encodes raw data in PNG tiles. Because PNGs are lossless (unlike JPEGs), numeric values can be stored exactly in the pixel colors. We built on Mapzen’s FOSS solution for encoding terrain elevation data in map tiles and adapted it for other types of data. Additional features include aggregation/nesting of values, variable precision to account for different zoom levels and support for arrays.
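To make the pixel-color idea concrete, here is a minimal sketch of one of Mapzen’s terrain encodings, the “Terrarium” format, in which a value is recovered as (R × 256 + G + B / 256) − 32768. GeoPngDB adapts this idea for other data, and its actual channel layout may differ; the function names below are hypothetical.

```python
import math
from typing import Tuple

OFFSET = 32768  # Terrarium shifts values so negative numbers fit in unsigned bytes


def encode_terrarium(value: float) -> Tuple[int, int, int]:
    """Pack a number into (R, G, B) bytes using the Terrarium formula."""
    v = value + OFFSET
    if not 0 <= v < 65536:
        raise ValueError("value out of range for Terrarium encoding")
    whole = math.floor(v)
    red = whole // 256                    # high byte of the integer part
    green = whole % 256                   # low byte of the integer part
    blue = math.floor((v - whole) * 256)  # fractional part in steps of 1/256
    return red, green, blue


def decode_terrarium(red: int, green: int, blue: int) -> float:
    """Inverse of encode_terrarium: (R * 256 + G + B / 256) - 32768."""
    return (red * 256 + green + blue / 256) - OFFSET


rgb = encode_terrarium(-10.5)
print(rgb)                     # (127, 245, 128)
print(decode_terrarium(*rgb))  # -10.5
```

Because a single step in the red channel moves the decoded value by 256, even small compression artifacts would corrupt the data, which is why a lossless format such as PNG matters here.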

We will share use cases as well as references for creating GeoPngDB tiles in Node.js (e.g. from vector tiles), with a QGIS plugin, or with a Java library for use in raster-based analysis tools such as Conveyal’s R5.

This proposal offers a potential companion to both server-backed raster formats and vector tiles, one that borrows its scalability from proven web-map techniques. You’ll learn what makes PNG-based data tiles the simplest, cheapest and fastest way to host massive datasets online in a way that supports the mission of FOSS.

For more information see: https://github.com/sasakiassociates/geo-png-db


Authors and Affiliations

Ken Goulding
Eric Youngberg
Raj Adi Raman

Track

Open data

Topic

Data collection, data sharing, data science, open data, big data, data exploitation platforms

Level

2 - Basic. General basic knowledge is required.

Language of the Presentation

English

Ken is a Principal at Sasaki, an internationally recognized multi-disciplinary design firm. Trained in architecture and planning, Ken has dedicated his career to finding the most pertinent applications of technology to planning and design, and he remains actively involved in inventing, prototyping, and building new tools and approaches.
