Forecasting the Future of Weather Data with GOES-R and TileDB
2021-09-29, 11:30–12:00, Aconcagua

The Geostationary Operational Environmental Satellite-R (GOES-R) series provides continuous satellite imagery of the Earth’s Western Hemisphere. GOES-R series datasets are made available through multiple cloud service providers via NOAA’s Big Data Program. The datasets include Level 1b and Level 2 satellite data, stored as directories of NetCDF files covering consecutive time periods. This talk will show how to use TileDB Embedded, an open-source universal storage engine, to combine data from multiple GOES-R products into a single, easily accessible dataset.

In this talk, I will show how to ingest data from the GOES-R Advanced Baseline Imager (ABI) and Geostationary Lightning Mapper (GLM) into cloud-ready storage using TileDB Embedded. I will discuss the pros and cons of keeping the original NetCDF data model, and show how to combine datasets that consist of both dense and sparse arrays. With the arrays stored in TileDB Embedded, I will show how to efficiently slice weather data, both locally and remotely on cloud object storage; how to use data versioning to time travel across any changes to an array; and give an overview of some of the open-source tools that integrate directly with TileDB Embedded.
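The idea behind versioned "time travel" can be illustrated with a toy model in which every write is kept as an immutable, timestamped fragment, and a read at time T replays only the fragments written at or before T. The class below (`ToyVersionedArray`, with its `write` and `read` methods) is hypothetical and stdlib-only; it is a sketch of the concept under those assumptions, not the TileDB Embedded API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyVersionedArray:
    """Hypothetical 1-D array that keeps every write as a timestamped fragment.

    A toy illustration of fragment-based time travel, not the TileDB API.
    """

    size: int
    fill: float = 0.0
    fragments: List[Tuple[int, int, List[float]]] = field(default_factory=list)

    def write(self, timestamp: int, start: int, values: List[float]) -> None:
        # Reject writes that would fall outside the array's fixed domain.
        if start < 0 or start + len(values) > self.size:
            raise ValueError("write falls outside the array domain")
        # Append a new immutable fragment instead of overwriting cells in place.
        self.fragments.append((timestamp, start, list(values)))

    def read(self, timestamp: Optional[int] = None) -> List[float]:
        # Rebuild the array from the fill value, applying fragments oldest first
        # so that newer writes win wherever fragments overlap.
        cells = [self.fill] * self.size
        for ts, start, values in sorted(self.fragments, key=lambda frag: frag[0]):
            if timestamp is not None and ts > timestamp:
                continue  # time travel: skip fragments written after `timestamp`
            cells[start:start + len(values)] = values
        return cells
```

For example, after `arr.write(1, 0, [1.0, 2.0])` and `arr.write(2, 1, [9.0, 9.0, 9.0])` on a four-cell array, `arr.read(1)` still returns the state at timestamp 1, while `arr.read()` returns the latest view. Because old fragments are never modified, earlier states remain readable without keeping separate copies of the whole array.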


GOES-R offers a variety of useful datasets stored in NetCDF hierarchies, but NetCDF is not the only way to store them. In this talk, I’ll show you how to efficiently convert GOES-R products of interest into a single TileDB Embedded group, making the data easier to slice and adding features such as sparse storage, data versioning, and more.

I will begin the talk with an introduction to TileDB Embedded, which is, quite literally, an embedded C++ library built around a powerful array format. It is fully open source and can store diverse geospatial data in a unified way. Next, I will introduce TileDB-CF-Py, an open-source Python library for ingesting NetCDF data into TileDB Embedded. Together, they provide a cloud-ready alternative array engine that can conform to the NetCDF data model, speeding up queries, saving storage space, and providing many new indexing and labeling options.
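A TileDB array defines a set of dimensions shared by one or more attributes, whereas each NetCDF variable declares its own dimensions. One plausible way to bridge the two models is to group variables that share exactly the same dimensions into a single array, with each variable becoming an attribute. The helper `group_by_dimensions` below is hypothetical and only sketches that assumption; it is not necessarily the rule TileDB-CF-Py applies. The variable names in the usage example mirror ABI Level 1b files but are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_dimensions(variables: Dict[str, Tuple[str, ...]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group NetCDF-style variables into candidate arrays by their dimension tuple.

    `variables` maps each variable name to its ordered tuple of dimension names.
    Each key of the result is one candidate array's dimensions; each value lists
    the variables that would become that array's attributes. Scalar variables
    (empty tuple) have no dimensions, so a real converter would need to handle
    them differently, for example as metadata.
    """
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for name, dims in variables.items():
        groups[tuple(dims)].append(name)
    # Sort attribute names so the grouping is deterministic.
    return {dims: sorted(names) for dims, names in groups.items()}


# Illustrative input loosely modeled on an ABI L1b file.
layout = group_by_dimensions(
    {"Rad": ("y", "x"), "DQF": ("y", "x"), "x": ("x",), "y": ("y",), "t": ()}
)
```

With this input, `Rad` and `DQF` land together under the `("y", "x")` key, the coordinate variables `x` and `y` each form their own one-dimensional group, and the scalar `t` is set apart under the empty tuple.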

Using these tools, I will cover the mechanics of GOES-R-to-TileDB conversions, providing strategies you can apply to your own analyses. Finally, I will walk through a detailed example, where I will show how to combine, slice and visualize sparse data on lightning strikes from the Geostationary Lightning Mapper (GLM) dataset with dense data from the Advanced Baseline Imager (ABI).
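Combining sparse lightning observations with a dense image amounts to locating each sparse point on the dense grid. The stdlib-only sketch below bins hypothetical (lat, lon) flash points into per-cell counts and pairs them with the dense value in each cell. For simplicity it assumes a regular lat/lon grid with origin `(lat0, lon0)` and spacing `step`; real ABI data use the GOES fixed-grid projection, so flash coordinates would first need to be projected onto that grid. All function names and values are illustrative assumptions, not TileDB or TileDB-CF-Py calls.

```python
from collections import Counter
from typing import List, Tuple


def to_cell(lat: float, lon: float, lat0: float, lon0: float, step: float) -> Tuple[int, int]:
    """Map a point to the (row, col) of a regular grid anchored at (lat0, lon0)."""
    return int((lat - lat0) // step), int((lon - lon0) // step)


def flashes_per_cell(flashes, lat0, lon0, step, shape) -> List[List[int]]:
    """Bin sparse (lat, lon) flash points into a dense grid of per-cell counts."""
    rows, cols = shape
    counts = Counter()
    for lat, lon in flashes:
        r, c = to_cell(lat, lon, lat0, lon0, step)
        if 0 <= r < rows and 0 <= c < cols:  # drop flashes outside the grid
            counts[(r, c)] += 1
    # Densify: every cell gets a value, 0 where no flash was recorded.
    return [[counts[(r, c)] for c in range(cols)] for r in range(rows)]


def overlay(grid, counts) -> List[Tuple[int, int, float, int]]:
    """Pair each cell that has flashes with the dense grid value at that cell."""
    return [
        (r, c, grid[r][c], counts[r][c])
        for r in range(len(grid))
        for c in range(len(grid[0]))
        if counts[r][c] > 0
    ]
```

As a usage example with made-up values, a 2 × 2 grid `[[280.0, 275.0], [250.0, 220.0]]` anchored at `(20.0, -100.0)` with `step=1.0` and flashes at `(20.5, -99.5)`, `(21.2, -98.1)`, `(21.9, -98.7)` and `(30.0, -90.0)` yields counts `[[1, 0], [0, 2]]`; the last flash falls outside the grid and is dropped. `overlay` then returns `[(0, 0, 280.0, 1), (1, 1, 220.0, 2)]`.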


Authors and Affiliations

Julia Dark (TileDB)

Track

Use cases & applications

Topic

Data collection, data sharing, data science, open data, big data, data exploitation platforms

Level

1 - Beginners. No specific prior knowledge is required.

Language of the Presentation

English

Dr. Julia Dark is a Senior Software Engineer at TileDB working on geospatial applications.