Hands-on workshop using open-source Python software for scalable analysis of geospatial data
2021-09-28, 14:00–18:00, Mercedes Sosa

Bring your data and laptop to this 4-hour virtual hands-on workshop sponsored by the Pangeo project! Pangeo is first and foremost a community promoting open, reproducible, and scalable science (http://pangeo.io/). Workshop participants will learn emerging best practices for interactive analysis of large volumes of multidimensional Earth observation data, such as high-resolution stacks of multispectral imagery and synthetic aperture radar (SAR) products.
We will focus on data discovery with SpatioTemporal Asset Catalogs (STAC), data loading with cloud-optimized formats (Cloud-Optimized GeoTIFF, Zarr), and scalable analysis with the Xarray and Dask libraries. These workflows are built entirely on free and open-source software and work anywhere, whether you’re using a laptop or connecting to public data hosted by major cloud providers.
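To make the scalable-analysis step concrete, here is a minimal sketch of the Xarray + Dask pattern. It is an illustration, not workshop material, and assumes `numpy`, `pandas`, `xarray`, and `dask` are installed. Random values stand in for a hypothetical 12-month stack of single-band images; in a real workflow the pixels would be read from Cloud-Optimized GeoTIFFs or Zarr. The key idea is that Xarray operations on Dask-backed arrays only build a lazy, chunked task graph, and `.compute()` then processes the chunks in parallel.

```python
import dask.array as da
import pandas as pd
import xarray as xr

# Hypothetical stand-in for a stack of 12 monthly single-band images,
# 1024 x 1024 pixels each. Real data would come from COGs or Zarr.
times = pd.date_range("2021-01-01", periods=12, freq="MS")
data = da.random.random((len(times), 1024, 1024), chunks=(1, 512, 512))

# Wrap the Dask array in a labeled Xarray DataArray.
stack = xr.DataArray(
    data,
    dims=("time", "y", "x"),
    coords={"time": times},
    name="reflectance",
)

# Nothing is computed here: this only extends the lazy task graph,
# which covers 12 x 2 x 2 = 48 chunks.
temporal_mean = stack.mean(dim="time")

# .compute() executes the graph, processing chunks in parallel,
# and returns an in-memory (NumPy-backed) DataArray.
result = temporal_mean.compute()
print(result.shape)  # (1024, 1024)
```

The same code runs unchanged whether Dask uses local threads on a laptop or a distributed cluster in the cloud; only the scheduler changes, which is what lets these workflows scale.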
This workshop will assume an intermediate-level understanding of the Python programming language. We will have tutorials on fundamental Python geospatial software and will focus on applying newly learned skills to datasets that participants use in their own work. Instructors will be on hand to help.

Learning Objectives

  1. Recognize the open-source (OSS) Python libraries and cloud-hosted infrastructure that make up the Pangeo ecosystem, and explain how they work together
  2. Analyse remote public datasets with Xarray
  3. Work efficiently with very large geospatial datasets using Dask
  4. Interact with the online Python scientific community

Draft Workshop Agenda

Part 1. Welcome, introductions
- Introduction to the Pangeo project, establishing a positive learning community
- Participants briefly describe their projects (depending on the number of attendees)
- Getting familiar with hosted infrastructure (JupyterHub, BinderHub, Microsoft Planetary Computer)

Coffee break!

Part 2. Essential Python
- PySTAC: metadata and cataloging of datasets (Cloud-Optimized GeoTIFF, Zarr)
- Xarray: multi-dimensional arrays
- Dask: tools for scaling and parallelizing Python workflows
- 30 mins to experiment and ask questions of instructors

Interactive workshop content is hosted at https://gallery.pangeo.io; please explore any of these examples before attending.


Authors and Affiliations

Scott Henderson, University of Washington eScience Institute, USA
Tom Augspurger, Microsoft, USA

Level

2 - Basic. General basic knowledge is required.

Requirements for the Attendees

It will be helpful to have a basic understanding of Python, and experience working with geospatial data.

Participants who are accustomed to working with small images by downloading files and working locally, and who are hitting bottlenecks as image sizes and inventories grow, will especially benefit from this workshop.

Please explore https://pangeo.io before the event. You can also try interactive examples from past workshops at https://gallery.pangeo.io.

Hi, I'm Scott Henderson, a research scientist at the University of Washington eScience Institute and Department of Earth and Space Sciences in Seattle, Washington!