there are some misconceptions about what's actually happening in different decentralised training runs
RL Swarm isn't just distributed rollout generation; it's gossip-based learning where the communication itself is a training objective
the models learn to reason AND talk