Name and position of the project proposer: 
sb_p_2199299
Year: 
2020
Abstract: 

This project proposes to investigate the use of Reinforcement Learning for the robust design of low-thrust trajectories in the presence of severe state and control uncertainties, as in the case of micro-spacecraft interplanetary missions.
Recent developments in on-board component miniaturization are opening up the possibility of realizing deep-space exploration missions with small or micro-spacecraft, greatly reducing design cost and time. Unlike standard spacecraft, micro-spacecraft are characterized by reduced orbit control capability, larger uncertainties in state knowledge (due to limited radio links with ground stations on Earth) and in command execution (due to low-reliability components), as well as little room for propellant margins and system redundancy, because of the tight size and cost budgets. Therefore, the trajectory design for this kind of mission is mainly driven by its robustness to uncertainties.
Unlike traditional optimization methods, reinforcement learning provides a systematic framework for dealing with stochastic optimal control problems, where the system dynamics, or environment, can be characterized by any kind of uncertainty and dynamical model. In reinforcement learning, a deep neural network is used to map the spacecraft state to the optimal control policy and to the expected value function, which measures the actual trajectory performance with respect to mission objectives and requirements. The network is trained by repeatedly interacting with a number of realizations of the environment and progressively refining the control policy so as to maximize the expected cumulative reward.
At the end of the training process, besides a reference robust trajectory, the network outputs an optimal state-feedback control law. For this reason, the trained network can be deployed on board the spacecraft to provide it with autonomous guidance and control capabilities during actual orbital operations.
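
As a purely illustrative sketch of this architecture (all names are hypothetical, the state and control dimensions are placeholders, and PyTorch is used only as an example framework), the network mapping the spacecraft state to a control action and a value estimate might look as follows:

    import torch
    import torch.nn as nn

    class ActorCritic(nn.Module):
        """Maps the spacecraft state to a thrust command and a value estimate."""
        def __init__(self, state_dim=7, control_dim=3, hidden=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
            )
            self.policy_head = nn.Linear(hidden, control_dim)  # control policy
            self.value_head = nn.Linear(hidden, 1)             # expected return

        def forward(self, state):
            h = self.body(state)
            return self.policy_head(h), self.value_head(h)

    # Once trained, the policy head alone acts as the on-board
    # state-feedback control law: u = pi(x).
    net = ActorCritic()
    control, value = net(torch.randn(1, 7))  # placeholder spacecraft state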

ERC: 
PE8_1
PE1_19
PE6_7
Research group members: 
sb_cp_is_2786318
Innovativeness: 

Although in behavioral cloning (BC) the learning process is considerably faster than in RL, since the collected data are uncorrelated (that is, independent and identically distributed random variables), the BC approach has a number of downsides that make it unsuitable for robust trajectory design. Indeed, BC may not be effective when the guidance and control network (G&CNet) is asked to solve a problem that falls outside the expert demonstrations it was trained on. As a consequence, in stochastic optimal control problems (OCPs), performance drops (or even divergence) are expected when, because of uncertainty, the flight trajectory drifts away from the training-set domain, which is typically populated by solutions of deterministic OCPs.
More recently, an attempt has been made to feed the network with a training set encompassing optimal trajectories that account for a single random missed thrust event (MTE) [1], with promising results. However, the possibility of multiple MTEs over the course of a single mission, together with other types of state and control uncertainties, has not been addressed yet.
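
For contrast, here is a minimal sketch of the BC approach just described (hypothetical names, reusing the ActorCritic network sketched in the abstract; the expert data would come from pre-computed deterministic OCP solutions). The training step reduces to plain supervised regression, which is exactly why the resulting policy is only trustworthy inside the training-set domain:

    import torch
    import torch.nn as nn

    def behavioral_cloning(net, states, controls, epochs=100, lr=1e-3):
        """Fit the network to expert (state, control) pairs; the samples
        are i.i.d., so this is ordinary supervised learning."""
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            predicted, _ = net(states)          # regress the expert control
            loss = loss_fn(predicted, controls)
            loss.backward()
            opt.step()
        return net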

Conversely, RL has the clear advantage of not requiring the a priori generation of any optimal trajectory to populate the training set. Instead, new data are gathered on-policy, that is, by running the current best-found control policy and collecting new state-control-reward tuples during mission simulations. In this way, the agent is able to progressively and autonomously improve the performance and robustness of its control policy, so as to achieve the mission goals regardless of the uncertainties that may arise. This feature makes RL the ideal candidate for the problem at hand, that is, designing a low-thrust trajectory that is sufficiently robust to state and control uncertainties.
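
As an illustrative sketch of this on-policy data collection (the env object is a hypothetical mission simulator exposing reset() and step() methods, an example of which is sketched after the next paragraph):

    import torch

    def collect_rollout(env, net, horizon=500):
        """Run the current policy in one stochastic realization of the
        environment and gather (state, control, reward) tuples for the
        next policy update."""
        tuples = []
        state = env.reset()
        for _ in range(horizon):
            with torch.no_grad():
                control, _ = net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
            control = control.squeeze(0).numpy()
            next_state, reward, done = env.step(control)
            tuples.append((state, control, reward))
            state = next_state
            if done:
                break
        return tuples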

RL also has a great advantage over more traditional stochastic optimization methods, such as stochastic differential dynamic programming (DDP) or belief-based optimal control. Indeed, it can be applied, without substantial changes and in a straightforward way, to stochastic problems with arbitrary uncertainty distributions and any dynamical model, even one provided in the form of a black-box function, since no explicit mathematical formulation is required to derive an optimal robust trajectory. This feature makes RL the only feasible way to train a program to achieve high performance in many complex or stochastic domains [2].
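
As a toy sketch of this point (all names and noise models are hypothetical; propagate stands for any black-box dynamics routine), the environment only needs to expose the reset()/step() interface assumed in the previous sketch, while uncertainties enter as plain random draws:

    import numpy as np

    class NoisyTransferEnv:
        """Stand-in for a low-thrust transfer simulator with state-knowledge
        noise, control execution errors, and missed thrust events."""
        def __init__(self, propagate, nav_sigma=1e-3, exec_sigma=0.05, mte_prob=0.01):
            self.propagate = propagate    # black-box: (state, control) -> next state
            self.nav_sigma = nav_sigma    # navigation (state-knowledge) noise
            self.exec_sigma = exec_sigma  # control execution error
            self.mte_prob = mte_prob      # probability of a missed thrust event
            self.state = None

        def _observe(self):
            return self.state + self.nav_sigma * np.random.randn(*self.state.shape)

        def reset(self, x0=None):
            self.state = np.zeros(7) if x0 is None else np.asarray(x0, float)
            return self._observe()

        def step(self, control):
            u = control * (1.0 + self.exec_sigma * np.random.randn(*control.shape))
            if np.random.rand() < self.mte_prob:
                u = np.zeros_like(u)      # missed thrust event: no thrust delivered
            self.state = self.propagate(self.state, u)
            reward = -np.linalg.norm(u)   # placeholder: e.g., penalize propellant use
            done = False                  # a real simulator would check arrival conditions
            return self._observe(), reward, done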

At present, almost all research papers dealing with RL for spacecraft trajectory design address exclusively deterministic mission scenarios. The main goal of the present research project is to study the extension of RL applicability to stochastic problems, which are now of great interest to the aerospace community.

References:
[1] Ari Rubinsztejn, Rohan Sood, and Frank E. Laipert. "Neural network based optimal control: Resilience to missed thrust events for long duration transfers". In Astrodynamics Specialist Conference, Portland, Maine, 2019.

[2] Stuart J. Russell and Peter Norvig. "Artificial Intelligence: A Modern Approach". Pearson Education Limited, 2016.

Call code: 
2199299
