Project Aria: A New Tool for Egocentric Multi-Modal AI Research
Abstract
Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always-available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device, with the goal of fostering and accelerating research in this area. In this paper, we describe the Aria device hardware, including its sensor configuration, and the corresponding software tools that enable recording and processing of such data.
Community
Introduces Project Aria: a wearable device (glasses) for egocentric data collection and streaming (perception). Existing foundation models are large transformers trained on generic (common) data; personalization needs a way to collect personal, egocentric data.

Hardware: a tightly calibrated and time-aligned sensor stack with a proximity sensor (detects whether the glasses are being worn), IMUs (with barometer; the left and right IMUs differ so that their noise is uncorrelated), 480p mono scene cameras (stereo, left and right), an 8 MP PoV (point of view) camera, 240p low-res eye-tracking cameras (left and right), privacy shutter, microphone array (7 mics), capture button, WiFi, Bluetooth, GNSS/GPS, and a battery with charging circuit.

Time and calibration: time domains are aligned using the SMPTE LTC and WiFi TicSync protocols. Calibration comes from the manufacturer (offline, factory) and from the machine perception services (MPS) hosted by Meta (backend). VRS is the data container format for recording and playback.

MPS provides VIO (visual inertial odometry) and SLAM services that use all on-board sensors for robustness to lighting, motion, and dynamic scenes. Outputs: open-loop and closed-loop 6 DoF trajectories; time-varying intrinsic and extrinsic calibrations (to account for deformation of the glasses while worn); semi-dense point clouds of the scene; and eye-gaze tracking and annotations (where the wearer is looking) in the cyclopean eye frame.

Use cases include life-long mapping and re-localization, egocentric scene reconstruction and mapping (including with NeRF methods), understanding object interaction and manipulation, and activity and attention monitoring. From Meta.
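Because the sensor streams share a time base, a common first step when working with this kind of data is to match each camera frame to the trajectory sample closest in time. The following is a minimal sketch of that lookup using only the Python standard library. It is not the MPS or Project Aria tools API: the `nearest_index` helper, the tuple layout of `trajectory`, and all timestamps are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_left


def nearest_index(timestamps_ns, query_ns):
    """Return the index of the timestamp closest to query_ns.

    timestamps_ns must be a non-empty list sorted in ascending order.
    """
    if not timestamps_ns:
        raise ValueError("timestamps_ns must not be empty")
    i = bisect_left(timestamps_ns, query_ns)
    if i == 0:
        return 0  # query is at or before the first sample
    if i == len(timestamps_ns):
        return len(timestamps_ns) - 1  # query is after the last sample
    before, after = timestamps_ns[i - 1], timestamps_ns[i]
    # On a tie, prefer the earlier sample.
    return i - 1 if query_ns - before <= after - query_ns else i


# Hypothetical trajectory samples: (timestamp in ns, (x, y, z) position in metres).
# Real trajectory files carry more fields; this layout is an assumption.
trajectory = [
    (0, (0.00, 0.0, 0.0)),
    (1_000_000, (0.01, 0.0, 0.0)),
    (2_000_000, (0.02, 0.0, 0.0)),
]
pose_times = [t for t, _ in trajectory]

frame_time_ns = 1_400_000  # hypothetical camera frame timestamp
idx = nearest_index(pose_times, frame_time_ns)
matched_time, matched_position = trajectory[idx]
```

Nearest-sample matching is the simplest choice; when the trajectory rate is low relative to the motion, interpolating between the two neighbouring poses would likely be more accurate.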
Links: Webpage (tools, datasets, vrs), MPS docs, YouTube - CVPR 2023 Tutorial, GitHub