Abstract: This article investigates optimal distributed formation control for heterogeneous air–ground vehicle systems using a data-efficient, off-policy reinforcement learning algorithm.