WGPS Confidentiality

  1. Setting and Goals
  2. Techniques Overview
  3. All the Details
  4. Security Model
    1. Threat Model
    2. Scope
    3. Goals
      1. Muriarty
      2. Epson
  5. References

                  In order to synchronise data, peers must inform each other about which data they are interested in. If done openly, this would let peers learn about details such as NamespaceIds, SubspaceIds, or Paths that they have no business knowing about. In this document, we describe a technique that does not leak this information, even in realistic adversarial settings.

                  Setting and Goals

We consider the setting where two peers wish to synchronise some data that is subject to read access control via capabilities. More precisely, they want to specify pairs of namespaces and AreasOfInterest, and then synchronise the intersections.

The simplest solution is for the peers to openly exchange read_capabilities and then specify their AreasOfInterest, each of which must be fully included in the granted_area of the corresponding read_capability. This works well for managing read access control and for determining which Entries to synchronise, but it leaks some potentially sensitive information. Two examples:

First, suppose that Alfie creates an Entry at Path gemma_stinks, and gives a read_capability for this Path to Betty. Later, Betty connects to Gemma's machine for syncing, and asks for gemma_stinks in Alfie's subspace. In sending her read_capability, she hands Gemma a signed proof that Alfie thinks she stinks. (Gemma does not, in fact, stink. Also, Alfie is really very nice and would never say such a thing outside of thought experiments for demonstrating the dangers of leaking Paths.) Not good.

Second, suppose a scenario where everyone encrypts their Paths end-to-end, with individual encryption keys per subspace. Alfie synchronises with Betty, asking her for random-looking Paths of the same structure in ten different subspaces. Betty has the decryption keys for all but one of the subspaces. All the Paths she can decrypt happen to decrypt to gemma_stinks. This gives Betty a strong idea about what the tenth person thinks of Gemma, despite the fact that Betty cannot decrypt that Path. Not good.

Ideally, we would like to employ a mechanism where peers cannot learn any information beyond the granted_area of the read_capabilities they hold at the start of the process. Unfortunately, such a mechanism would have to involve privacy-preserving verification of cryptographic signatures, and any suitable primitives are exceedingly complicated.

                  Instead, we design solutions which do not allow peers to learn about the existence of any NamespaceId, SubspaceId, or Path which they did not know about already. If, for example, both peers knew about a certain namespace, they should both get to know that the other peer also knows about that namespace. But for a namespace which only one of the peers knows about, the other peer should not learn its NamespaceId.

Such solutions cannot prevent peers from confirming guesses about data they shouldn't know about. Hence, it is important that NamespaceIds and SubspaceIds are sufficiently long and random-looking. Similarly, encrypting Components with different encryption keys for different subspaces can ensure that Paths are unguessable. Because valid Timestamps can easily be guessed, we do not try to hide information about them. (Finding efficient encryption schemes and privacy-preserving synchronisation techniques that work for Timestamps is an interesting research endeavour, but out of scope for us.)

                  Techniques Overview

At a high level, we employ three mechanisms for preserving Entry confidentiality:

• Peers must prove read access before any requested data is handed over.
• All communication is encrypted, and the handshake is tied to the receivers of the capabilities, locking out active eavesdroppers.
• Peers describe their interests via salted hashes rather than in the clear.

                  The first bullet point should be straightforward: no requested data is explicitly handed over, unless the sync partner demonstrates that it was granted read access. We describe mechanisms for enforcing read access control here.

The second bullet point requires a bit more elaboration; it serves to defend against active eavesdroppers. At the start of the sync session, the two peers perform a handshake in which they negotiate how to encrypt their communication. A typical choice would be a Diffie–Hellman key exchange, which results in a shared secret that can be used as the key for symmetric encryption of the communication session. Crucially, as part of the handshake, the peers prove to each other knowledge of the secret key corresponding to some public key they transmit. We require those to be public keys and secret keys for the signature scheme that denotes the access_receiver of read_capabilities. (All read_capabilities that a single peer presents in a sync session must have the same access_receiver. This is not a restriction in practice when capabilities can be delegated: peers can even create ephemeral keypairs per sync session and create valid capabilities by delegation, which they discard after the session.) The peers only accept read_capabilities whose access_receiver is the public key for which the other peer proved it has the corresponding secret key.

An active eavesdropper faces a dilemma: if they do not manipulate the handshake, they cannot derive the decryption secret and cannot listen in on the sync session. If they do manipulate the handshake by replacing the exchanged public keys with ones for which they have the corresponding secret keys, then they will later have to produce read_capabilities for those public keys. Since a good capability system makes forgery impossible, they will not be able to do so. The peers, then, will not transmit any sensitive data.
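To make the handshake concrete, here is a toy sketch in Python. It hypothetically assumes a Schnorr-style scheme as the capability signature scheme (so "proving knowledge of the secret key" amounts to signing the handshake transcript) and uses insecure illustrative parameters; a real implementation would use vetted primitives such as X25519 plus Ed25519.

```python
import hashlib
import secrets

# Toy parameters (NOT secure; illustration only).
P = 2**127 - 1   # a Mersenne prime
G = 3            # generator

def H(*parts: bytes) -> int:
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return int.from_bytes(h.digest(), "big")

def keygen():
    sk = secrets.randbelow(P - 1)
    return sk, pow(G, sk, P)

def sign(sk: int, msg: bytes):
    # Schnorr signature: proves knowledge of sk for pk = G^sk mod P.
    k = secrets.randbelow(P - 1)
    r = pow(G, k, P)
    e = H(r.to_bytes(16, "big"), msg) % (P - 1)
    s = (k + sk * e) % (P - 1)
    return r, s

def verify(pk: int, msg: bytes, sig) -> bool:
    r, s = sig
    e = H(r.to_bytes(16, "big"), msg) % (P - 1)
    return pow(G, s, P) == (r * pow(pk, e, P)) % P

# Diffie–Hellman: each peer contributes an ephemeral share.
x_a, dh_a = keygen()
x_b, dh_b = keygen()
shared_a = pow(dh_b, x_a, P)   # Alfie's view of the shared secret
shared_b = pow(dh_a, x_b, P)   # Betty's view; equal to Alfie's

# As part of the handshake, Betty signs the transcript with the
# secret key of the access_receiver of her read capabilities,
# proving knowledge of that key before any capability is accepted.
transcript = dh_a.to_bytes(16, "big") + dh_b.to_bytes(16, "big")
recv_sk, recv_pk = keygen()
proof = sign(recv_sk, transcript)
```

An eavesdropper who swaps out the DH shares changes the transcript, and cannot produce a valid proof for a public key that is the access_receiver of a genuine capability.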

Telling a peer directly which Areas in which namespaces you are interested in leaks NamespaceIds, SubspaceIds, and Paths. Instead, the peers merely transmit secure hashes of certain combinations of these. Intuitively, if both peers send the same hash, then they both know that they are interested in the same things. This is easily attackable, however: one peer can simply mirror back hashes sent by the other, tricking them into believing that they have shared knowledge. For this reason, each peer is assigned a random bitstring to use as a salt for the hash function. A peer transmits hashes salted with its own salt, but compares the hashes it receives against hashes that it computes locally with the other peer's salt.
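The salting scheme can be sketched as follows. This is an illustrative model (byte strings stand in for serialised interests; the concrete hash and serialisation are assumptions), not the normative encoding:

```python
import hashlib
import os

def salted_hash(salt: bytes, interest: bytes) -> bytes:
    # Hash a serialised interest under a given salt.
    return hashlib.sha256(salt + interest).digest()

# Each peer is assigned a random salt for the session.
salt_a, salt_b = os.urandom(32), os.urandom(32)

interests_a = {b"namespace1/subspaceX/path", b"namespace2/any/secret"}
interests_b = {b"namespace1/subspaceX/path", b"namespace3/any/other"}

# Each peer transmits hashes salted with its OWN salt...
sent_by_a = {salted_hash(salt_a, i) for i in interests_a}
sent_by_b = {salted_hash(salt_b, i) for i in interests_b}

# ...and compares what it receives against hashes computed
# locally with the OTHER peer's salt.
matches_at_b = {i for i in interests_b if salted_hash(salt_a, i) in sent_by_a}
matches_at_a = {i for i in interests_a if salted_hash(salt_b, i) in sent_by_b}

# A peer that simply mirrors back received hashes gains nothing:
# mirrored hashes carry the wrong salt and never match.
mirrored = sent_by_a
fooled = {i for i in interests_a if salted_hash(salt_b, i) in mirrored}
```

Both peers detect the common interest, while the mirrored hashes match nothing, because Alfie checks incoming hashes against salt_b-salted values that Muriarty cannot produce without knowing the underlying interest.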

Before we go into the details of which data precisely to hash, we want to point out that peers must use references to the common hashes instead of mentioning the underlying NamespaceIds, SubspaceIds, and Paths. In particular, when transmitting read_capabilities, peers must encode them in a special format that omits the confidential data.
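As a purely hypothetical illustration of such an encoding (the field names and table layout here are invented for the example, not part of the protocol), a transmitted capability might replace the confidential identifier with a handle, i.e. an index into the table of salted hashes the peers have already exchanged:

```python
# Hypothetical sketch: the confidential namespace identifier is
# replaced by a handle (an index into the shared table of exchanged
# salted hashes), so the capability encoding itself never mentions it.

def encode_capability(cap: dict, hash_table: list) -> dict:
    handle = hash_table.index(cap["namespace_hash"])
    return {
        "namespace_handle": handle,   # a reference, not the id itself
        "signature": cap["signature"],
    }

def decode_capability(encoded: dict, hash_table: list) -> bytes:
    # The receiver resolves the handle against its own copy of the
    # table; an unknown handle means the peers are out of sync.
    return hash_table[encoded["namespace_handle"]]

hash_table = [b"salted-hash-0", b"salted-hash-1"]
cap = {"namespace_hash": b"salted-hash-1", "signature": b"sig"}
encoded = encode_capability(cap, hash_table)
```

Only peers that already arrived at the same table entries, and hence already knew the underlying data, can resolve the reference.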

                  All the Details

                  We now switch from the preceding informational style to a more precise specification of how to sync Willow data with untrusted peers while keeping most metadata confidential.


Data synchronisation via the WGPS faces a dilemma: on the one hand, we want to opportunistically synchronise data with as many peers as we can; on the other hand, we want to preserve the confidentiality of all data that is guarded by access control. This document presents in detail how we balance those needs.

Data confidentiality goes much further than withholding Payloads. We need to protect NamespaceIds, SubspaceIds, and Paths. To make things more difficult, we want to keep these confidential even when an active eavesdropper listens in on a sync connection. We want to allow for synchronisation with anonymous peers, but we cannot protect against active eavesdropping in those sessions. Hence, we need to be careful which information we disclose even to a peer who has read access to certain data.

                  NamespaceIds, SubspaceIds, and Paths do not only occur as part of authenticated Entries, but they also inform which data two peers want to sync in the first place. While two peers discover their common interests, we do not want to leak any of these either. To simplify our presentation, we introduce some definitions around these concepts, starting with the notion of a PrivateInterest.

Confidential data that relates to determining the AreasOfInterest that peers might be interested in synchronising.

struct PrivateInterest {
  namespace_id: NamespaceId,
  subspace_id: SubspaceId or any (any denotes interest in all subspaces of the namespace),
  path: Path,
}

                  Let p1 and p2 be PrivateInterests.

We say p1 is more specific than p2 if

• their namespace_ids are equal,
• p2's subspace_id is any or equal to p1's subspace_id, and
• p2's path is a prefix of p1's path.

                  We say that p1 is strictly more specific than p2 if p1 is more specific than p2 and they are not equal.

                  We say that p1 is less specific than p2 if p2 is more specific than p1.

We say that p1 and p2 are comparable if p1 is more specific than p2 or p2 is more specific than p1.

We say that p1 includes an Entry e if

• p1's namespace_id is equal to e's namespace_id,
• p1's subspace_id is any or equal to e's subspace_id, and
• p1's path is a prefix of e's path.

We say that p1 and p2 are disjoint if there can be no Entry which is included in both p1 and p2.

                  We say that p1 and p2 are awkward if they are neither comparable nor disjoint. This is the case if and only if one of them has subspace_id any and a path p, and the other has a non-any subspace_id and a path which is a strict prefix of p.

We say that p1 includes an Area a if

• p1's subspace_id is any or equal to a's subspace_id, and
• p1's path is a prefix of a's path.
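The relations above can be sketched in Python. This is an illustrative model (string identifiers, tuples of Components for Paths, None for any), not a normative definition:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ANY = None  # subspace_id ANY denotes interest in all subspaces

@dataclass(frozen=True)
class PrivateInterest:
    namespace_id: str
    subspace_id: Optional[str]   # None means any
    path: Tuple[str, ...]        # a sequence of path Components

def is_prefix(p: tuple, q: tuple) -> bool:
    return q[:len(p)] == p

def more_specific(p1: PrivateInterest, p2: PrivateInterest) -> bool:
    # p1 is more specific than p2: same namespace, p2's subspace_id
    # is any or equal, and p2's path is a prefix of p1's path.
    return (p1.namespace_id == p2.namespace_id
            and (p2.subspace_id is ANY or p1.subspace_id == p2.subspace_id)
            and is_prefix(p2.path, p1.path))

def comparable(p1: PrivateInterest, p2: PrivateInterest) -> bool:
    return more_specific(p1, p2) or more_specific(p2, p1)

def awkward(p1: PrivateInterest, p2: PrivateInterest) -> bool:
    # Neither comparable nor disjoint: one has subspace_id any and
    # path p, the other a concrete subspace_id and a path which is
    # a strict prefix of p.
    def one_way(a, b):
        return (a.namespace_id == b.namespace_id
                and a.subspace_id is ANY and b.subspace_id is not ANY
                and is_prefix(b.path, a.path) and b.path != a.path)
    return one_way(p1, p2) or one_way(p2, p1)
```

For example, the interest (n, any, gemma/stinks) and the interest (n, alfie, gemma) are awkward: neither is more specific than the other, yet both include the Entry (n, alfie, gemma/stinks).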

                  Security Model

We can now lay out the security model of the WGPS: which data does the WGPS expose in which scenarios? We do not have formal proofs for any of these claims; these are merely our design goals (which we believe to have achieved).

                  Throughout the following, Alfie and Betty are honest peers, Muriarty is a malicious peer who may deviate arbitrarily from the WGPS, and Epson is an active eavesdropper on the networking layer who can read, modify, drop, or insert arbitrary bytes on a WGPS communication channel.

                  Threat Model

                  We consider two primary scenarios:

                  • Alfie and Muriarty sync, and Muriarty tries to glean as much information from/about Alfie as possible.
• Alfie and Betty sync without knowing any long-term secrets of each other, and Epson attacks the session and tries to glean as much information from/about Alfie and Betty as possible.

                  Note that Epson can simulate a Muriarty, or they could even be the same person. Epson is only interesting for cases where Alfie syncs with somebody who has more knowledge than Epson, so we do not consider the cases where Muriarty and Epson collaborate.

If Alfie and Betty know each other's long-term public keys, they can exclude an active attacker during the handshake. If only one of them knows a long-term secret of the other, Epson is less powerful than in the setting where neither knows a long-term secret of the other; hence we only analyse the scenario in which neither peer knows a long-term secret of the other.

                  Scope

We now list the information we wish to keep confidential. We group it in four levels, based on which kind of peer or attacker is allowed to glean which information.

L0
L1
L2
L3: fingerprintable session behaviour, such as resource control, error handling, etc. Virtually impossible to list exhaustively.

                  We consider fingerprinting, tracking, and deanonymisation based on information from L3 to be out of scope of this document.

                  Goals

                  We now describe which kind of information can be learned by which kind of attacker. A rough summary:

Consequently, NamespaceIds, SubspaceIds, and Paths that cannot be guessed are never leaked by using the WGPS. However, attackers can confirm their guesses about PrivateInterests to some degree. Hence, it is important to keep NamespaceIds, SubspaceIds, and Paths unguessable by introducing sufficient entropy.

                  Syncing with Muriarty

                  When Alfie syncs with a malicious peer Muriarty, Muriarty is able to glean the following information:

                  Syncing Attacked By Epson

                  If Alfie syncs with an honest peer Betty, but an active attacker Epson can read and manipulate the communication channel, that attacker can glean the following information:

                  TODO
                   
                  TODO
                   
                  TODO
                  }
A capability that certifies read access to arbitrary SubspaceIds at some unspecified Path. Its components:

• namespace_key — the namespace for which this grants access.
• user_key — the user to whom this initially grants access (the starting point for any further delegations).
• initial_authorisation — authorisation of the user_key by the namespace_key.
• delegations — successive authorisations of new UserPublicKeys.

                  References