RPS — Random peer sampling#
In literature, Random Peer Sampling (RPS) refers to the problem of reliably [1] drawing random samples from an unstructured p2p network.
Doing so in a reliable manner is not only hard because of inherent problems but also because of possible malicious peers that could try to bias the selection.
It is useful for all kind of gossip protocols that require the selection of random peers in the whole network like gathering statistics, spreading and aggregating information in the network, load balancing and overlay topology management.
The approach chosen in the RPS service implementation in GNUnet follows the Brahms design.
The current state is "work in progress". There are a lot of things that need to be done, primarily finishing the experimental evaluation and a re-design of the API.
The abstract idea is to subscribe to connect to/start the RPS service and request random peers that will be returned when they represent a random selection from the whole network with high probability.
An additional feature to the original Brahms-design is the selection of sub-groups: The GNUnet implementation of RPS enables clients to ask for random peers from a group that is defined by a common shared secret. (The secret could of course also be public, depending on the use-case.)
Another addition to the original protocol was made: The sampler mechanism that was introduced in Brahms was slightly adapted and used to actually sample the peers and returned to the client. This is necessary as the original design only keeps peers connected to random other peers in the network. In order to return random peers to client requests independently random, they cannot be drawn from the connected peers. The adapted sampler makes sure that each request for random peers is independent from the others.
Brahms#
The high-level concept of Brahms is two-fold: Combining push-pull gossip with locally fixing a assumed bias using cryptographic min-wise permutations. The central data structure is the view - a peer’s current local sample. This view is used to select peers to push to and pull from. This simple mechanism can be biased easily. For this reason Brahms ‘fixes’ the bias by using the so-called sampler. A data structure that takes a list of elements as input and outputs a random one of them independently of the frequency in the input set. Both an element that was put into the sampler a single time and an element that was put into it a million times have the same probability of being the output. This is achieved with exploiting min-wise independent permutations. In the RPS service we use HMACs: On the initialisation of a sampler element, a key is chosen at random. On each input the HMAC with the random key is computed. The sampler element keeps the element with the minimal HMAC.
In order to fix the bias in the view, a fraction of the elements in the view are sampled through the sampler from the random stream of peer IDs.
According to the theoretical analysis of Bortnikov et al. this suffices to keep the network connected and having random peers in the view.
