Deep Dive with TileDB and SONAR
2021-10-01, 08:00–08:30, Bariloche

We should show our SONAR data the same love and attention as the rest of our point clouds. SONAR suffers from the same challenges: scale, datasets split across thousands of files, and the inability to manipulate data quickly. And yet SONAR is one of the most interesting point cloud data types to analyze at cloud scale, with applications from finding shipwrecks to knowing whether a vessel is going to block the Suez Canal. This is where TileDB Embedded, an open-source library and cloud-native data engine for working with large multi-dimensional arrays, can help.

TileDB Embedded can help accelerate SONAR analysis workflows in several ways. First, I will cover how analysis-ready TileDB arrays of many TBs can be sliced directly from cloud object storage in seconds, returning dataframes that are easily accessible from Pandas in Jupyter notebooks. Next, I will present TileDB integrations with familiar SONAR and point cloud tools like MB-System and PDAL, and how TileDB can help apply this information with modern data science techniques.

Finally, I will show real use-cases of TileDB with SONAR point clouds. With TileDB, you can avoid downloading full datasets and working across several domain-specific libraries. TileDB allows you to efficiently extract specific features and points of interest. The talk will conclude by addressing these challenges with a demo of subsea point cloud analysis.


TileDB Embedded is a powerful storage engine architected around dense and sparse multi-dimensional arrays. SONAR soundings are sparse 3D points, so they map naturally onto TileDB sparse arrays.

As SONAR data makes its way from desktops to the cloud, an open-source data science ecosystem will be crucial in making the leap. TileDB Embedded (https://github.com/TileDB-Inc/TileDB) is particularly well-positioned to close this gap with numerous open-source integrations and cloud-native performance that will make SONAR relevant in modern data science.

Integrations

TileDB Embedded brings together many integrations including open-source libraries for PDAL, GDAL, Rasterio, and SAR data.

The larger vision is to make TileDB Embedded interoperable with all popular SQL engines and data science tools. Slicing conforms to NumPy-like syntax. Existing connectors include Apache Spark, Dask, xarray, PrestoDB, and MariaDB. In the case of Python Jupyter notebooks, TileDB also has integrations for 3D visualization, including a notebook widget using Babylon.js. TileDB Embedded supports a variety of APIs built on top of its core C++ library, including Python, R, Java, and Go.

Performance

With TileDB Embedded you can ingest TBs of data, compress it at the network edge, and slice it quickly.

TileDB offers great compression because of its columnar format. It organizes data on disk in a way that makes existing compressors more effective. Users can apply arbitrary compression filters in combination on each array attribute, mixing and matching filters from one column to the next.
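To make the columnar argument concrete, here is a minimal sketch that uses only Python's standard `zlib` module, not TileDB's own filter pipeline. The soundings, field names (`depth`, `beam`, `flag`), and value ranges are invented for illustration. It packs the same records twice, once interleaved row by row and once attribute by attribute, and compares the compressed sizes.

```python
import random
import struct
import zlib

random.seed(0)
n = 20_000

# Synthetic soundings (made-up values, not real survey data).
depth = [random.uniform(10.0, 50.0) for _ in range(n)]  # float64, noisy
beam = [i % 256 for i in range(n)]                      # uint8, cycles through beam ids
flag = [1] * n                                          # uint8, constant quality flag

# Row-oriented layout: the fields of each record are interleaved.
row_bytes = b"".join(
    struct.pack("<dBB", d, b, f) for d, b, f in zip(depth, beam, flag)
)

# Columnar layout: each attribute is stored contiguously.
col_bytes = (
    struct.pack(f"<{n}d", *depth)
    + struct.pack(f"<{n}B", *beam)
    + struct.pack(f"<{n}B", *flag)
)

row_size = len(zlib.compress(row_bytes, 6))
col_size = len(zlib.compress(col_bytes, 6))
print(f"raw: {len(row_bytes)} bytes")
print(f"row layout compressed: {row_size} bytes")
print(f"columnar layout compressed: {col_size} bytes")
```

In the columnar layout the repetitive `beam` and `flag` values sit next to each other, so a general-purpose compressor can encode them as long repeats instead of isolated bytes wedged between noisy depth values. A per-attribute filter setup goes one step further by letting each column use the compressor that suits it.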

Slicing millions of points from a large TileDB Embedded array takes only seconds. Data is persisted in an analysis-ready format, either locally on a laptop or remotely on cloud object storage. Sparse datasets are traversed efficiently thanks to R-Tree indexing, and other cell layouts such as Hilbert curve ordering are also available.
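As a rough illustration of why Hilbert curve ordering helps, the sketch below uses the textbook 2D Hilbert mapping on a small grid. It is not TileDB's implementation, and the function name `hilbert_index` is hypothetical. Cells that are consecutive along the curve are always neighbours on the grid, so points stored in this order stay spatially clustered on disk.

```python
def hilbert_index(order, x, y):
    """Distance along the Hilbert curve of cell (x, y) on a 2**order square grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Sort the cells of a 4 x 4 grid into Hilbert order.
order = 2
side = 1 << order
cells = [(x, y) for x in range(side) for y in range(side)]
cells.sort(key=lambda c: hilbert_index(order, c[0], c[1]))
print(cells)
```

Real soundings would first be quantized onto such a grid, and a 3D variant of the curve would be needed for X, Y, and depth together. The 2D case is shown only because it is easy to check by hand.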

This talk will serve as your jumping-off point for TileDB and SONAR, and I look forward to diving right in!


Authors and Affiliations

Norman Barker (TileDB)

Track

Use cases & applications

Topic

Sensors, remote sensing, laser-scanning, structure from motion

Level

1 - Beginners. No specific knowledge is required.

Language of the Presentation

English

Norman is the VP of Geospatial at TileDB. Prior to joining TileDB, Norman focused on spatial indexing and image processing, and held engineering positions at Cloudant, IBM and Mapbox. He has a master's degree in Mathematics from the University of Durham, England.