Jan 14 - Designing Data Infrastructures for Multimodal Mobility Datasets
Details
This technical workshop focuses on the data infrastructure required to build and maintain production-grade mobility datasets at fleet scale.
Date, Time and Location
Jan 14, 2026
9:00-10:00 AM Pacific
Online. Register for the Zoom!
We will examine how to structure storage, metadata, access patterns, and quality controls so that mobility teams can treat perception datasets as first-class, versioned “infrastructure” assets. The session will walk through how to design a mobility data stack that connects object storage, labeling systems, simulation environments, and experiment tracking into a coherent, auditable pipeline.
What you’ll learn:
- Model the mobility data plane: Define schemas for camera, LiDAR, radar, and HD map data, and represent temporal windows, ego poses, and scenario groupings in a way that is queryable and stable under schema evolution.
- Build a versioned dataset catalog with FiftyOne: Use custom FiftyOne workspaces and views to represent canonical datasets and integrate with your raw data sources, all while preserving lineage between raw logs, curated data, and simulation inputs.
- Implement governance and access control on mobility data: Configure role-based access and auditable pipelines to enforce data residency constraints while encouraging multi-team collaboration across research, perception, and safety functions.
- Operationalize curation and scenario mining workflows: Use FiftyOne’s embeddings and labeling capabilities to surface rare events such as adverse weather and sensor anomalies. Assign review tasks, and codify “critical scenario” definitions as reproducible dataset views.
- Close the loop with evaluation and feedback signals: Connect FiftyOne to training and evaluation pipelines so that model failures feed back into dataset updates.
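
To make the first item in the list above a little more concrete, here is a minimal sketch of a per-frame record in plain Python. It is an illustration only, not workshop material and not FiftyOne's API: the names `EgoPose`, `FrameRecord`, and `frames_in_window`, the field names, the example URIs, and the `schema_version` convention are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EgoPose:
    """Vehicle pose in a map frame (assumed units: meters and radians)."""
    x: float
    y: float
    yaw: float


@dataclass
class FrameRecord:
    """One synchronized capture across sensors (hypothetical schema)."""
    timestamp_us: int
    scenario_id: str
    ego_pose: EgoPose
    # Sensor name -> URI of the raw payload in object storage.
    sensor_uris: Dict[str, str] = field(default_factory=dict)
    # Optional reference to an HD map tile; None when no map is attached.
    hd_map_tile: Optional[str] = None
    # Bumped when fields change, so readers can handle older records.
    schema_version: int = 1


def frames_in_window(
    frames: List[FrameRecord], start_us: int, end_us: int
) -> List[FrameRecord]:
    """Return frames with start_us <= timestamp_us < end_us, in time order."""
    selected = [f for f in frames if start_us <= f.timestamp_us < end_us]
    return sorted(selected, key=lambda f: f.timestamp_us)


# Three frames from one made-up scenario, deliberately out of order.
frames = [
    FrameRecord(100_000, "rain_merge_01", EgoPose(1.0, 0.0, 0.0),
                {"camera_front": "s3://example-bucket/cam/000001.jpg"}),
    FrameRecord(0, "rain_merge_01", EgoPose(0.0, 0.0, 0.0),
                {"lidar_top": "s3://example-bucket/lidar/000000.bin"}),
    FrameRecord(200_000, "rain_merge_01", EgoPose(2.0, 0.0, 0.0)),
]

window = frames_in_window(frames, 0, 200_000)
```

New optional fields such as `hd_map_tile` get defaults, so records written under an older schema still load. That is one simple way to keep queries stable as the schema evolves.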
By the end of the workshop, attendees will have a concrete mental model and reference architecture for treating mobility datasets as a governed, queryable, and continuously evolving layer in their stack.
