Human-in-the-loop Machine Learning with GroundWork and STAC
2021-09-27, 09:00–13:00, María Remedios del Valle

Data management and interoperability are challenges when solving geospatial problems with machine learning. Existing ML tools and standards outside of GIS are inadequate for the idiosyncrasies of geospatial data. Additionally, data acquisition and labeling are expensive processes. Human-in-the-loop workflows, in which people correct a model's predictions and feed the corrections back as training data, can help machine learning models improve more quickly.

In this workshop attendees will learn how to use data compliant with the STAC specification to organize and create training data, models, and predictions to power applications. Workshop attendees will learn what STAC is and how to use tools like PySTAC, Raster Vision, Franklin, and GroundWork to create a human-in-the-loop AI pipeline from beginning to end.


This workshop will focus on the practical application of machine learning and artificial intelligence to geospatial problems.

Figuring out how to gather, organize, store, and visualize training data and results is a central problem when building geospatial machine learning models. This challenge is exacerbated by each framework having its own opinion (or no opinion) about the proper organization of data for each stage of the ML pipeline. The STAC specification is an open-source, community-driven, extensible specification for managing large geospatial datasets. ML engineers and GIS analysts can leverage the consistency that this specification provides for machine learning. With an ever-growing set of tools, extensions, and applications built with STAC in mind, it makes sense to center machine learning pipelines around the specification. At Azavea, we have been building machine learning pipelines, both internally and for clients, based on STAC to encourage interoperability between teams, organizations, and tools.
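
To make the idea of a shared data layout concrete, here is a minimal sketch of a single STAC Item built as a plain Python dictionary. It uses only the standard library rather than PySTAC, and the function name, item id, bounding box, and asset URL are hypothetical values chosen for illustration; only the top-level field names come from the STAC Item specification.

```python
import json
from datetime import datetime, timezone


def make_item(item_id, bbox, when, href):
    """Build a minimal STAC Item dict (illustrative helper, not part of any library)."""
    west, south, east, north = bbox
    # A GeoJSON polygon ring must close on its starting point.
    geometry = {
        "type": "Polygon",
        "coordinates": [[
            [west, south], [east, south], [east, north],
            [west, north], [west, south],
        ]],
    }
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": geometry,
        "bbox": list(bbox),
        # STAC expects an RFC 3339 timestamp in UTC.
        "properties": {"datetime": when.isoformat().replace("+00:00", "Z")},
        "links": [],
        # Hypothetical asset key and URL, for illustration only.
        "assets": {"image": {"href": href, "roles": ["data"]}},
    }


item = make_item(
    "scene-1",
    (-75.3, 39.9, -75.0, 40.1),
    datetime(2021, 9, 27, 9, 0, tzinfo=timezone.utc),
    "https://example.com/scene-1.tif",
)
print(json.dumps(item, indent=2))
```

Because every tool in the pipeline can read and write this same structure, imagery, labels, and predictions can move between tools without per-framework conversion scripts.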

Attendees at this workshop will learn what STAC is and how to use it in the context of domain adaptation in geospatial machine learning. During the workshop, attendees will accomplish several tasks.

First, attendees will create campaigns in GroundWork. Attendees will use the campaigns they create throughout the workshop to organize their machine learning work. Second, attendees will create some initial predicted labels in their campaigns using a provided machine learning model produced by Raster Vision. Third, attendees will learn to correct some of the model’s errors. Next, attendees will export their corrections as new training data to improve their models. These exports will be in the format of a STAC catalog. After that, attendees will use their exported catalogs, PySTAC, and Raster Vision to re-train and improve the provided machine learning model. Finally, after a few iterations, attendees will load their final labels into Franklin, an open-source, user-friendly STAC application that can be used to power visualizations and downstream applications.
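
As a rough sketch of what reading an exported catalog involves, the function below walks a static STAC catalog on disk and yields every item it finds. It is written with the standard library only, not PySTAC (which the workshop itself uses), and it assumes the export is a local directory whose `child` and `item` links use relative hrefs. The function name and the example path are hypothetical; the actual layout of a GroundWork export may differ.

```python
import json
from pathlib import Path


def iter_items(catalog_path):
    """Yield every STAC Item reachable from a catalog or collection file.

    Assumes a static catalog on local disk with relative link hrefs.
    """
    path = Path(catalog_path)
    doc = json.loads(path.read_text())
    for link in doc.get("links", []):
        rel = link.get("rel")
        if rel not in ("child", "item"):
            # Skip root, parent, and self links so the walk only descends.
            continue
        target = (path.parent / link["href"]).resolve()
        if rel == "item":
            yield json.loads(target.read_text())
        else:
            # A child is a nested catalog or collection: recurse into it.
            yield from iter_items(target)


# Usage, with a hypothetical export location:
# for item in iter_items("export/catalog.json"):
#     print(item["id"], sorted(item.get("assets", {})))
```

The ids and asset hrefs gathered this way are the inputs a training step needs, whichever framework performs the training.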

Attendees will leave the workshop with a web-addressable STAC API containing the labels and imagery used in the workshop. While this workflow will use Raster Vision and PySTAC for each iteration on the machine learning model, the workflow itself is transferable to whatever frameworks and languages attendees are comfortable with in their non-conference lives.

Relevant links:

STAC specification: https://stacspec.org/
PySTAC: https://pystac.readthedocs.io/en/latest/
Raster Vision: https://rastervision.io/
Franklin: https://azavea.github.io/franklin/
GroundWork: https://groundwork.azavea.com/


Authors and Affiliations

James Santucci (1)
Adeel Hassan (1)

(1) Azavea, Philadelphia, PA, US

Level

2 - Basic. General basic knowledge is required.

Requirements for the Attendees

This workshop requires using containers and running Raster Vision commands. The container images you need will be provided, but you should already have a working knowledge of Docker. Additionally, part of the workflow involves Azavea's GroundWork application, which requires WebGL. You can check whether your browser supports WebGL at https://get.webgl.org/.

James Santucci is a senior software developer at Azavea on the Raster Foundry and GroundWork team.