SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing

Aodi Wu1,2, Haodong Han1,2, Xubo Luo1,2, Ruisuo Wang2, Shan He2, Xue Wan2
1 University of Chinese Academy of Sciences    2 Technology and Engineering Center for Space Utilization, CAS
SpaceMind Overview
SpaceMind operates as a VLM-based decision-control hub that perceives through visual sensors, reasons about the current situation, and issues motion and sensor-control commands. It supports three task types across both a high-fidelity UE5 simulation (5 satellites, 174 runs) and a physical laboratory setup (2 satellites, 18 runs).

Abstract

Autonomous on-orbit servicing demands embodied agents that perceive through visual sensors, reason about 3D spatial situations, and execute multi-phase tasks over extended horizons. We present SpaceMind, a modular and self-evolving vision-language model (VLM) agent framework that decomposes knowledge, tools, and reasoning into three independently extensible dimensions: skill modules with dynamic routing, Model Context Protocol (MCP) tools with configurable profiles, and injectable reasoning-mode skills. An MCP-Redis interface layer enables the same codebase to operate across simulation and physical hardware without modification, and a Skill Self-Evolution mechanism distills operational experience into persistent skill files without model fine-tuning.
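To make the decoupling idea concrete, here is a minimal sketch of a backend-agnostic tool interface. All names here are hypothetical and not taken from the SpaceMind codebase, and an in-process queue stands in for the Redis message bus: the agent publishes abstract tool calls, an environment-specific worker consumes them, and switching from simulation to hardware changes only the worker, never the agent code.

```python
from queue import Queue

class MessageBus:
    """Stand-in for the Redis channel: the agent publishes tool-call
    requests and reads back responses, never touching the backend."""
    def __init__(self):
        self.requests = Queue()
        self.responses = Queue()

    def call_tool(self, name, **kwargs):
        self.requests.put({"tool": name, "args": kwargs})
        worker_step(self)  # in a real system the worker runs in its own process
        return self.responses.get()

# Hypothetical backends: same tool vocabulary, different execution targets.
def sim_execute(tool, args):
    return f"[UE5 sim] {tool}({args})"

def hw_execute(tool, args):
    return f"[robot] {tool}({args})"

BACKEND = sim_execute  # swap to hw_execute with zero agent-code changes

def worker_step(bus):
    # The worker pops one request and dispatches it to whichever backend is active.
    msg = bus.requests.get()
    bus.responses.put(BACKEND(msg["tool"], msg["args"]))

bus = MessageBus()
result = bus.call_tool("move_toward", target="satellite_3", speed=0.2)
print(result)
```

The point of the pattern is that `call_tool` and the tool names form the stable contract; only the function bound to `BACKEND` knows whether commands drive a simulator or real actuators.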

We validate SpaceMind through 192 closed-loop runs across five satellites, three task types, and two environments (a UE5 simulation and a physical laboratory), deliberately including degraded conditions to stress-test robustness. Under nominal conditions, all reasoning modes achieve 90–100% navigation success; under degradation, only the Prospective mode succeeds in search-and-approach tasks where the other modes fail. In a self-evolution study, the agent recovers in four of six groups after learning from a single failed episode, including a recovery from complete failure to 100% success and an inspection score improving from 12 to 59 out of 100. Real-world validation confirms zero-code-modification transfer to a physical robot with 100% rendezvous success.

Key Results

192
Closed-loop runs across 5 satellites, 3 task types, and 2 environments
90–100%
Navigation success under nominal conditions across all reasoning modes
4/6
Groups recover from failure through self-evolution after a single episode

Architecture

SpaceMind Architecture
SpaceMind architecture. The VLM decision core receives visual observations and context from the skill layer, reasons through one of three modes (Standard, ReAct, Prospective), and issues tool calls via MCP. The Redis message bus decouples the agent from environment-specific backends, enabling the same codebase to operate across UE5 simulation and physical hardware.
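The "injectable reasoning-mode" design described above can be sketched as a simple strategy registry. This is an illustrative sketch only, with hypothetical names; SpaceMind's actual prompts and control flow are not reproduced here. Each mode is a function with a common signature, so a new mode is added by registering it, leaving the decision loop untouched.

```python
# Hypothetical sketch of injectable reasoning modes (names invented for illustration).

def standard(obs):
    # Single-pass: map the observation directly to an action.
    return {"mode": "Standard", "action": f"act_on({obs})"}

def react(obs):
    # Interleave an explicit thought with the action, ReAct-style.
    return {"mode": "ReAct", "thought": f"reason about {obs}",
            "action": f"act_on({obs})"}

def prospective(obs):
    # Look one step ahead before committing, e.g. anticipating degraded sensing.
    return {"mode": "Prospective",
            "lookahead": f"predicted outcome of act_on({obs})",
            "action": f"act_on({obs})"}

REASONING_MODES = {"standard": standard, "react": react, "prospective": prospective}

def decide(mode_name, obs):
    # The core loop only knows the registry; modes are injected by registration.
    return REASONING_MODES[mode_name](obs)

print(decide("prospective", "target occluded")["mode"])
```

Under this pattern, swapping reasoning behavior is a configuration choice rather than a code change, which matches the paper's claim that reasoning modes are independently extensible.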

Video

Coming soon

A demo video showcasing SpaceMind operating in both UE5 simulation and the physical laboratory will be available here.

BibTeX

@article{wu2026spacemind,
  title={SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing},
  author={Wu, Aodi and Han, Haodong and Luo, Xubo and Wang, Ruisuo and He, Shan and Wan, Xue},
  journal={Acta Astronautica},
  year={2026},
  note={Under review}
}