One for All and All for One: Distributed Learning of Fair Allocations With Multi-Player Bandits
Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, represented as an N×M matrix. These utilities are unknown to the players. In each turn, players select an arm and receive a noisy observation of their utility for it. However, if any other players selected the same arm in that turn, all colliding players will receive zero utility due to the conflict. No communication between the players is possible.