there are some misconceptions about what's actually happening in different decentralised training runs

RL Swarm isn't just distributed rollout generation; it's gossip-based learning where the communication itself is a training objective

the models learn to reason AND talk