Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward

Giorgio Maria Cavallazzi; Miguel Pérez-Cuadrado; Alfredo Pinelli

doi:10.48550/arXiv.2606.06227

AG-2026.06-153·physics.flu-dyn·cross-listed: cs.LG

Abstract

A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation projection couples agents' outputs and erases the per-agent credit the policy gradient needs; a memoryless policy cannot resolve the slow near-wall cycle it acts on; and a pressure-gradient reward pays for nominal drag reduction by pumping power through the wall. Two degenerate controllers achieve large drag reductions while total dissipation rises, so the reported figure can mask a more wasteful flow. We trace each fault to its cause and fix it: a differentiable projection that restores credit, a recurrent policy with a widened sensing stencil, and a reward scored on the true wall power. The corrected controller acts on the flow within a closed energy budget, earning a conservative $17\%$ under honest accounting.

Submitted

4 June 2026

Version

v1

License

CC-BY-4.0
