REACT investigates the application of Deep Reinforcement Learning for satellite attitude control, addressing critical challenges in space systems autonomy. This research developed robust control policies that handle both nominal operations and critical underactuated scenarios — where one or more reaction wheels have failed.
Conducted during my MSc thesis and continued at Argotec, this foundational work directly contributed to the autonomous navigation algorithms later deployed on embedded flight hardware aboard LICIACube (NASA DART mission) and ArgoMoon (NASA Artemis I mission). The ability to handle actuator uncertainty in simulation proved critical for building confidence in real flight systems.
Adapted Proximal Policy Optimization (PPO) to the unique challenges of satellite attitude control, including continuous action spaces and sparse reward signals in 3D rotational dynamics.
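A minimal sketch of the kind of environment such a policy would be trained in. This is an illustrative toy model, not the actual REACT simulator: rigid-body rotational dynamics driven by three reaction wheel torques, quaternion kinematics integrated with explicit Euler, and a sparse reward granted only near the target attitude. All constants (inertia, thresholds, step size) are assumed values for illustration.

```python
import numpy as np

class AttitudeEnv:
    """Toy attitude-control environment (hypothetical, NumPy-only sketch)."""

    def __init__(self, inertia=np.diag([0.04, 0.04, 0.05]), dt=0.1, seed=0):
        self.J = inertia                     # spacecraft inertia matrix [kg m^2]
        self.J_inv = np.linalg.inv(inertia)
        self.dt = dt                         # integration step [s]
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Identity quaternion (w, x, y, z) = aligned with target attitude
        self.q = np.array([1.0, 0.0, 0.0, 0.0])
        # Small random initial body rates [rad/s]
        self.w = self.rng.uniform(-0.05, 0.05, 3)
        return np.concatenate([self.q, self.w])

    def step(self, torque):
        # Euler's rotational equation: J w_dot = tau - w x (J w)
        w_dot = self.J_inv @ (torque - np.cross(self.w, self.J @ self.w))
        self.w = self.w + self.dt * w_dot
        # Quaternion kinematics: q_dot = 0.5 * q (quaternion-)times [0, w]
        qw, qx, qy, qz = self.q
        wx, wy, wz = self.w
        q_dot = 0.5 * np.array([
            -qx * wx - qy * wy - qz * wz,
             qw * wx + qy * wz - qz * wy,
             qw * wy - qx * wz + qz * wx,
             qw * wz + qx * wy - qy * wx,
        ])
        self.q = self.q + self.dt * q_dot
        self.q = self.q / np.linalg.norm(self.q)  # renormalize to unit quaternion
        # Sparse reward: +1 only when pointing error and body rates are both small
        pointing_err = 2.0 * np.arccos(np.clip(abs(self.q[0]), 0.0, 1.0))
        reward = 1.0 if (pointing_err < 0.05 and np.linalg.norm(self.w) < 0.01) else 0.0
        return np.concatenate([self.q, self.w]), reward
```

A PPO agent would observe the 7-dimensional state (quaternion plus body rates) and output a continuous 3-dimensional wheel-torque command; the sparsity of the reward is exactly what makes exploration hard in this setting.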
Created adaptive policies capable of automatically detecting and compensating for reaction wheel failures — enabling attitude control with only 2 of 3 wheels operational, without explicit failure mode programming.
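One common way to obtain this kind of fault tolerance without explicit failure-mode programming is to randomize actuator failures during training. The sketch below is an assumed setup for illustration, not the published method: per episode, one of the three wheels may be silently disabled, and the failure is never exposed in the observation, so the policy must infer it from the resulting dynamics.

```python
import numpy as np

def make_failure_mask(n_wheels=3, failure_prob=0.3, rng=None):
    """Sample a per-episode binary mask over wheel torques (hypothetical helper).

    With probability `failure_prob`, one randomly chosen wheel is marked
    failed (mask entry 0.0) and produces no torque for the whole episode.
    """
    rng = rng if rng is not None else np.random.default_rng()
    mask = np.ones(n_wheels)
    if rng.random() < failure_prob:
        mask[rng.integers(n_wheels)] = 0.0
    return mask

def apply_mask(torque_cmd, mask):
    """Commanded torques pass through the failure mask before the dynamics."""
    return torque_cmd * mask
```

Training across many episodes with and without an injected failure pushes the policy to exploit the coupling in Euler's equations, recovering three-axis control from only two healthy wheels.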
Ran extensive evaluations in high-fidelity simulation environments and in a Hardware-in-the-Loop (HIL) setup with a space-grade on-board computer, validating real-time feasibility.
This research underpinned the AI-based navigation stack deployed on LICIACube and ArgoMoon, two CubeSats that flew on NASA's DART and Artemis I missions respectively, marking some of the first deep-RL-informed navigation systems flown in deep space.
Matteo El Hariry, Andrea Cini, Giacomo Mellone, Alessandro Balossino
NeurIPS 2021 Workshop on Deployable Decision Making in Embodied Systems