Confidentiality in the Face of Pervasive Surveillance: A Threat Model and Problem Statement

Internet Architecture Board (IAB)                              R. Barnes
Request for Comments: 7624                                   B. Schneier
Category: Informational                                      C. Jennings
ISSN: 2070-1721                                                T. Hardie
                                                             B. Trammell
                                                              C. Huitema
                                                             D. Borkmann
                                                             August 2015

Abstract

Since the initial revelations of pervasive surveillance in 2013, several classes of attacks on Internet communications have been discovered. In this document, we develop a threat model that describes these attacks on Internet confidentiality. We assume an attacker that is interested in undetected, indiscriminate eavesdropping. The threat model is based on published, verified attacks.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Architecture Board (IAB) and represents information that the IAB has deemed valuable to provide for permanent record. It represents the consensus of the Internet Architecture Board (IAB). Documents approved for publication by the IAB are not a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7624.

Barnes, et al.                Informational                     [Page 1]

RFC 7624              Confidentiality Threat Model           August 2015

BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  An Idealized Passive Pervasive Attacker
     3.1.  Information Subject to Direct Observation
     3.2.  Information Useful for Inference
     3.3.  An Illustration of an Ideal Passive Pervasive Attack
       3.3.1.  Analysis of IP Headers
       3.3.2.  Correlation of IP Addresses to User Identities
       3.3.3.  Monitoring Messaging Clients for IP Address Correlation
       3.3.4.  Retrieving IP Addresses from Mail Headers
       3.3.5.  Tracking Address Usage with Web Cookies
       3.3.6.  Graph-Based Approaches to Address Correlation
       3.3.7.  Tracking of Link-Layer Identifiers
   4.  Reported Instances of Large-Scale Attacks
   5.  Threat Model
     5.1.  Attacker Capabilities
     5.2.  Attacker Costs
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   IAB Members at the Time of Approval
   Acknowledgements
   Authors' Addresses

1.  Introduction

[RFC7258]. The goal of this document is to describe more precisely the threats posed by these pervasive attacks, and based on those threats, lay out the problems that need to be solved in order to secure the Internet in the face of those threats.

The remainder of this document is structured as follows. In Section 3, we describe an idealized passive pervasive attacker, one which could completely undetectably compromise communications at Internet scale. In Section 4, we provide a brief summary of some attacks that have been disclosed, and use these to expand the assumed capabilities of our idealized attacker. Note that we do not attempt to describe all possible attacks, but focus on those that result in undetected eavesdropping. Section 5 describes a threat model based on these attacks, focusing on classes of attack that have not been a focus of Internet engineering to date.

2.  Terminology

[RFC4949] and [RFC6973]. Terms used from [RFC6973] include Eavesdropper, Observer, Initiator, Intermediary, Recipient, Attack (in a privacy context), Correlation, Fingerprint, Traffic Analysis, and Identifiability (and related terms).

In addition, we use a few terms that are specific to the attacks discussed in this document. Note especially that "passive" and "active" below do not refer to the effort used to mount the attack; a "passive attack" is any attack that accesses a flow but does not modify it, while an "active attack" is any attack that modifies a flow. Some passive attacks involve active interception and modifications of devices, rather than simple access to the medium. The introduced terms are:

[RFC7258].

Passive Pervasive Attack:  An eavesdropping attack undertaken by a pervasive attacker, in which the packets in a traffic stream between two endpoints are intercepted, but in which the attacker does not modify the packets in the traffic stream between two endpoints, modify the treatment of packets in the traffic stream (e.g., delay, routing), or add or remove packets in the traffic stream. Passive pervasive attacks are undetectable from the endpoints. Equivalent to passive wiretapping as defined in [RFC4949]; we use an alternate term here since the methods employed are wider than those implied by the word "wiretapping", including the active compromise of intermediate systems.

Active Pervasive Attack:  An attack that is undertaken by a pervasive attacker and, in addition to the elements of a passive pervasive attack, also includes modification, addition, or removal of packets in a traffic stream, or modification of treatment of packets in the traffic stream. Active pervasive attacks provide more capabilities to the attacker at the risk of possible detection at the endpoints. Equivalent to active wiretapping as defined in [RFC4949].

Observation:  Information collected directly from communications by an eavesdropper or observer. For example, the knowledge that <alice@example.com> sent a message to <bob@example.com> via SMTP taken from the headers of an observed SMTP message would be an observation.

Inference:  Information derived from analysis of information collected directly from communications by an eavesdropper or observer. For example, the knowledge that a given web page was accessed by a given IP address, by comparing the size in octets of measured network flow records to fingerprints derived from known sizes of linked resources on the web servers involved, would be an inference.

Collaborator:  An entity that is a legitimate participant in a communication, and provides information about that communication to an attacker. Collaborators may either deliberately or unwittingly cooperate with the attacker, in the latter case because the attacker has subverted the collaborator through technical, social, or other means.
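The size-based example given in the definition of Inference can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the FINGERPRINTS table, the page paths, the resource sizes, the infer_page function, and the 64-octet tolerance are hypothetical and do not come from any measured data or from a published attack tool.

```python
from typing import Dict, List, Optional

# Hypothetical fingerprints: page path -> sizes (in octets) of the resources
# a browser fetches when loading that page. Values are invented.
FINGERPRINTS: Dict[str, List[int]] = {
    "/index.html": [1200, 5400, 20480],
    "/contact.html": [1200, 3100],
}


def infer_page(flow_sizes: List[int], tolerance: int = 64) -> Optional[str]:
    """Guess which page was loaded from observed flow sizes alone.

    A page matches when it has the same number of resources as there are
    observed flows and every size agrees within `tolerance` octets.
    """
    observed = sorted(flow_sizes)
    for page, sizes in FINGERPRINTS.items():
        expected = sorted(sizes)
        if len(expected) != len(observed):
            continue
        if all(abs(o - e) <= tolerance for o, e in zip(observed, expected)):
            return page
    return None


# Flow sizes as they might appear in flow records for one client address.
print(infer_page([5410, 1190, 20500]))  # prints /index.html
```

The sketch never looks at payload content, which is why encrypting the payload alone would not defeat it; only the sizes recorded per flow are used.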

3.  An Idealized Passive Pervasive Attacker

Section 4, it does set a lower bound on the capabilities of an attacker interested in indiscriminate passive surveillance while interested in remaining undetectable. We note that, prior to the Snowden revelations in 2013, the assumptions of attacker capability presented here would be considered on the border of paranoia outside the network security community.

Our idealized attacker is an indiscriminate eavesdropper that is on an Internet-attached computer network and:

o  can observe every packet of all communications at any hop in any network path between an initiator and a recipient;

o  can observe data at rest in any intermediate system between the endpoints controlled by the initiator and recipient; and

o  can share information with other such attackers; but

o  takes no other action with respect to these communications (i.e., blocking, modification, injection, etc.).

The techniques available to our ideal attacker are direct observation and inference. Direct observation involves taking information directly from eavesdropped communications, such as URLs identifying content or email addresses identifying individuals from application-layer headers. Inference, on the other hand, involves analyzing observed information to derive new information, such as searching for application or behavioral fingerprints in observed traffic to derive information about the observed individual. The use of encryption is generally sufficient to provide confidentiality by preventing direct observation of content, assuming of course, uncompromised encryption implementations and cryptographic keying material. However, encryption provides less complete protection against inference,

[RFC5246].

3.1.  Information Subject to Direct Observation

[RFC3365], most such protocols have a secure variant that encrypts the payload for confidentiality, and these secure variants are seeing ever-wider deployment. A noteworthy exception is DNS [RFC1035], as DNSSEC [RFC4033] does not have confidentiality as a requirement.

This implies that, in the absence of changes to the protocol as presently under development in the IETF's DNS Private Exchange (DPRIVE) working group [DPRIVE], all DNS queries and answers generated by the activities of any protocol are available to the attacker.

When store-and-forward protocols are used (e.g., SMTP [RFC5321]), intermediaries leave this data subject to observation by an attacker that has compromised these intermediaries, unless the data is encrypted end-to-end by the application-layer protocol or the implementation uses an encrypted store for this data.

3.2.  Information Useful for Inference

[RFC4303] further encrypts the transport-layer headers but still leaves IP address information unencrypted; in tunnel mode, these addresses correspond to the tunnel endpoints. Features of the security protocols themselves, e.g., the TLS session identifier, may leak information that can be used for

3.3.  An Illustration of an Ideal Passive Pervasive Attack

3.3.1.  Analysis of IP Headers

[RFC7011] allow administrators to acquire statistics about sequences of packets with some common properties that pass through a network device. The most common set of properties used in flow measurement is the "five-tuple" of source and destination addresses, protocol type, and source and destination ports. These statistics are commonly used for network engineering but could certainly be used for other purposes.

Let's assume for a moment that IP addresses can be correlated to specific services or specific users. Analysis of the sequences of packets will quickly reveal which users use what services, and also which users engage in peer-to-peer connections with other users. Analysis of traffic variations over time can be used to detect increased activity by particular users or, in the case of peer-to-peer connections, increased activity within groups of users.

3.3.2.  Correlation of IP Addresses to User Identities
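Once flow records keyed by the five-tuple are available and addresses can be mapped to users and services, the analysis sketched in Section 3.3.1 reduces to a simple join. The following Python fragment is a minimal illustration under stated assumptions: the Flow record layout, the services_by_user function, and the two lookup tables passed to it are hypothetical, and the addresses used in any input are expected to be documentation addresses rather than real measurements.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Flow(NamedTuple):
    """One flow record: the five-tuple plus an octet counter (assumed layout)."""
    src: str
    dst: str
    proto: int
    sport: int
    dport: int
    octets: int


def services_by_user(
    flows: Iterable[Flow],
    users: Dict[str, str],
    services: Dict[Tuple[str, int, int], str],
) -> Dict[str, Dict[str, int]]:
    """Total octets per (user, service), using only header-level information.

    `users` maps a source address to a user label; `services` maps
    (destination address, protocol, destination port) to a service label.
    Flows whose endpoints are not in both tables are ignored.
    """
    usage: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for flow in flows:
        user = users.get(flow.src)
        service = services.get((flow.dst, flow.proto, flow.dport))
        if user is not None and service is not None:
            usage[user][service] += flow.octets
    return {user: dict(per_service) for user, per_service in usage.items()}
```

Calling services_by_user with a list of Flow records, a table such as {"192.0.2.10": "alice"}, and a table such as {("198.51.100.5", 6, 443): "webmail"} yields a per-user summary of service usage. Repeating the call over successive time windows would give the traffic variations over time mentioned above.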

3.3.3.  Monitoring Messaging Clients for IP Address Correlation

[RFC1939] and IMAP [RFC3501] are used to retrieve mail from mail servers, while a variant of SMTP is used to submit messages through mail servers.

IMAP connections originate from the client, and typically start with an authentication exchange in which the client proves its identity by answering a password challenge. The same holds for the SIP protocol [RFC3261] and many instant messaging services operating over the Internet using proprietary protocols.

The username is directly observable if any of these protocols operate in cleartext; the username can then be directly associated with the source address.

3.3.4.  Retrieving IP Addresses from Mail Headers

[RFC5321] requires that each successive SMTP relay adds a "Received" header to the mail headers. The purpose of these headers is to enable audit of mail transmission, and perhaps to distinguish between regular mail and spam. Here is an extract from the headers of a message recently received from the perpass mailing list:

   Received: from 192-000-002-044.zone13.example.org (HELO
   ?192.168.1.100?) (xxx.xxx.xxx.xxx)
     by lvps192-000-002-219.example.net with ESMTPSA
     (DHE-RSA-AES256-SHA encrypted, authenticated);
     27 Oct 2013 21:47:14 +0100
   Message-ID: <526D7BD2.7070908@example.org>
   Date: Sun, 27 Oct 2013 20:47:14 +0000
   From: Some One <some.one@example.org>

This is the first "Received" header attached to the message by the first SMTP relay; for privacy reasons, the field values have been anonymized. We learn here that the message was submitted by "Some One" on October 27, from a host behind a NAT (192.168.1.100) [RFC1918] that used the IP address 192.0.2.44. The information remained in the message and is accessible by all recipients of the perpass mailing list, or indeed by any attacker that sees at least one copy of the message.

An attacker that can observe sufficient email traffic can regularly update the mapping between public IP addresses and individual email identities. Even if the SMTP traffic was encrypted on submission and relaying, the attacker can still receive a copy of public mailing lists like perpass.
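The extraction described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a robust mail parser. The SAMPLE message and the sender_addresses function are hypothetical; in SAMPLE, the anonymized field (xxx.xxx.xxx.xxx) from the extract is replaced by the documentation address 192.0.2.44 so that there is something to parse. Only IPv4 literals are handled, and the sketch assumes that relays prepend their "Received" headers, so the earliest hop is the last one in the message.

```python
import email
import ipaddress
import re
from typing import Dict, List, Optional, Union

# Private address blocks from RFC 1918.
RFC1918 = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
IPV4_LITERAL = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical message modeled on the extract; the address is illustrative.
SAMPLE = (
    "Received: from 192-000-002-044.zone13.example.org (HELO ?192.168.1.100?)\n"
    "    (192.0.2.44) by lvps192-000-002-219.example.net with ESMTPSA;\n"
    "    27 Oct 2013 21:47:14 +0100\n"
    "Message-ID: <526D7BD2.7070908@example.org>\n"
    "Date: Sun, 27 Oct 2013 20:47:14 +0000\n"
    "From: Some One <some.one@example.org>\n"
    "\n"
    "body\n"
)

Result = Dict[str, Union[Optional[str], List[str]]]


def sender_addresses(raw_message: str) -> Optional[Result]:
    """Pair the From identity with addresses found in the earliest Received header."""
    message = email.message_from_string(raw_message)
    received = message.get_all("Received") or []
    if not received:
        return None
    first_hop = received[-1]  # relays prepend, so the last header is the earliest
    public: List[str] = []
    private: List[str] = []
    for literal in IPV4_LITERAL.findall(first_hop):
        try:
            address = ipaddress.ip_address(literal)
        except ValueError:
            continue  # looked like a dotted quad but is not a valid address
        if any(address in block for block in RFC1918):
            private.append(str(address))
        else:
            public.append(str(address))
    return {"from": message.get("From"), "public": public, "private": private}


print(sender_addresses(SAMPLE))
```

The RFC 1918 blocks are checked explicitly rather than through the `is_private` attribute of the ipaddress module, because that attribute also reports documentation ranges such as 192.0.2.0/24 as private, which would misclassify the illustrative address used here.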

3.3.5.  Tracking Address Usage with Web Cookies

3.3.6.  Graph-Based Approaches to Address Correlation

3.3.7.  Tracking of Link-Layer Identifiers

4.  Reported Instances of Large-Scale Attacks

[pass1], [pass2], [pass3], and [pass4]:

o  NSA's XKEYSCORE system accesses data from multiple access points and searches for "selectors" such as email addresses, at the scale of tens of terabytes of data per day.

o  GCHQ's Tempora system appears to have access to around 1,500 major cables passing through the UK.

[dec1] [dec2] [dec3]. For example, the NSA BULLRUN project worked to undermine encryption through multiple approaches, including covert modifications to cryptographic software on end systems.

Reported capabilities include the direct compromise of intermediate systems and arrangements with service providers for bulk data and metadata access [dir1] [dir2] [dir3], bypassing the need to capture traffic on the wire. For example, the NSA PRISM program provides the agency with access to many types of user data (e.g., email, chat, VoIP).

The reported capabilities also include elements of active pervasive attack, including:

o  Insertion of devices as a man-in-the-middle of Internet transactions [TOR1] [TOR2]. For example, NSA's QUANTUM system appears to use several different techniques to hijack HTTP connections, ranging from DNS response injection to HTTP 302 redirects.

o  Use of implants on end systems to undermine security and anonymity features [dec2] [TOR1] [TOR2]. For example, QUANTUM is used to direct users to a FOXACID server, which in turn delivers an implant to compromise browsers of Tor users.

o  Use of implants on network elements from many major equipment providers, including Cisco, Juniper, Huawei, Dell, and HP, as provided by the NSA's Advanced Network Technology group [spiegel1].

o  Use of botnet-scale collections of compromised hosts [spiegel2].

The scale of the compromise extends beyond the network to include subversion of the technical standards process itself. For example, there is suspicion that NSA modifications to the DUAL_EC_DRBG random number generator (RNG) were made to ensure that keys generated using that generator could be predicted by NSA. This RNG was made part of

[RFC7258] to collectively describe these operations. The term "pervasive" is used because the attacks are designed to indiscriminately gather as much data as possible and to apply selective analysis on targets after the fact. This means that all, or nearly all, Internet communications are targets for these attacks. To achieve this scale, the attacks are physically pervasive; they affect a large number of Internet communications. They are pervasive in content, consuming and exploiting any information revealed by the protocol. And they are pervasive in technology, exploiting many different vulnerabilities in many different protocols.

Again, it's important to note that, although the attacks mentioned above were executed by the NSA and GCHQ, there are many other organizations that can mount pervasive surveillance attacks. Because of the resources required to achieve pervasive scale, these attacks are most commonly undertaken by nation-state actors. For example, the Chinese Internet filtering system known as the "Great Firewall of China" uses several techniques that are similar to the QUANTUM program and that have a high degree of pervasiveness with regard to the Internet in China. Therefore, legal restrictions in any one jurisdiction on pervasive monitoring activities cannot eliminate the risk of pervasive attack to the Internet as a whole.

5.  Threat Model

5.1.  Attacker Capabilities


5.2.  Attacker Costs

[RFC6962].

In terms of raw implementation complexity, passive pervasive attacks require only enough processing to extract information from the network and store it. Active pervasive attacks, by contrast, often depend on winning race conditions to inject packets into active connections. So, active pervasive attacks in the core of the network require processing hardware that can operate at line speed (roughly 100 Gbps to 1 Tbps in the core) to identify opportunities for attack and insert attack traffic in high-volume traffic.

Key exfiltration attacks rely on passive pervasive attack for access to encrypted data, with the collaborator providing keys to decrypt the data. So, the attacker undertakes the cost and risk of a passive pervasive attack, as well as additional risk of discovery via the interactions that the attacker has with the collaborator.

Some active attacks are more expensive than others. For example, active man-in-the-middle (MITM) attacks require access to one or more points on a communication's network path that allow visibility of the entire session and the ability to modify or drop legitimate packets in favor of the attacker's packets. A similar but weaker form of attack, called an active man-on-the-side (MOTS), requires access to only part of the session. In an active MOTS attack, the attacker need only be able to inject or modify traffic on the network element the attacker has access to. While this may not allow for full control of a communication session (as in an MITM attack), the attacker can perform a number of powerful attacks, including but not limited to: injecting packets that could terminate the session (e.g., TCP RST packets), sending a fake DNS reply to redirect ensuing TCP connections to an address of the attacker's choice (i.e., winning a "DNS response race"), and mounting an HTTP redirect attack by observing a TCP/HTTP connection to a target address and injecting a TCP data packet containing an HTTP redirect. For example, the system dubbed by researchers as China's "Great Cannon" [great-cannon] can operate in full MITM mode to accomplish very complex attacks that can modify content in transit, while the well-known Great Firewall of China is a MOTS system that focuses on blocking access to certain kinds of traffic and destinations via TCP RST packet injection.

In this sense, static exfiltration has a lower risk profile than dynamic. In the static case, the attacker need only interact with the collaborator a small number of times, possibly only once -- say,

6.  Security Considerations

7.  References

7.1.  Normative References

[RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., Morris, J., Hansen, M., and R. Smith, "Privacy Considerations for Internet Protocols", RFC 6973, DOI 10.17487/RFC6973, July 2013, <http://www.rfc-editor.org/info/rfc6973>.

7.2.  Informative References

[dec1]  Perlroth, N., Larson, J., and S. Shane, "N.S.A. Able to Foil Basic Safeguards of Privacy on Web", The New York Times, September 2013, <http://www.nytimes.com/2013/09/06/us/nsa-foils-much-internet-encryption.html>.

[dec2]  The Guardian, "Project Bullrun -- classification guide to the NSA's decryption program", September 2013, <http://www.theguardian.com/world/interactive/2013/sep/05/nsa-project-bullrun-classification-guide>.

[dec3]  Ball, J., Borger, J., and G. Greenwald, "Revealed: how US and UK spy agencies defeat internet privacy and security", The Guardian, September 2013, <http://www.theguardian.com/world/2013/sep/05/nsa-gchq-encryption-codes-security>.

[dir1]  Greenwald, G., "NSA collecting phone records of millions of Verizon customers daily", The Guardian, June 2013, <http://www.theguardian.com/world/2013/jun/06/nsa-phone-records-verizon-court-order>.

[dir2]  Greenwald, G. and E. MacAskill, "NSA Prism program taps in to user data of Apple, Google and others", The Guardian, June 2013, <http://www.theguardian.com/world/2013/jun/06/us-tech-giants-nsa-data>.

[dir3]  The Guardian, "Sigint -- how the NSA collaborates with technology companies", September 2013, <http://www.theguardian.com/world/interactive/2013/sep/05/sigint-nsa-collaborates-technology-companies>.

[DPRIVE]  Bortzmeyer, S., "DNS privacy considerations", Work in Progress, draft-ietf-dprive-problem-statement-06, June 2015.

[spiegel1]  Appelbaum, J., Horchert, J., Reissmann, O., Rosenbach, M., Schindler, J., and C. Stocker, "NSA's Secret Toolbox: Unit Offers Spy Gadgets for Every Need", Spiegel Online, December 2013, <http://www.spiegel.de/international/world/nsa-secret-toolbox-ant-unit-offers-spy-gadgets-for-every-need-a-941006.html>.

[spiegel2]  Appelbaum, J., Gibson, A., Guarnieri, C., Muller-Maguhn, A., Poitras, L., Rosenbach, M., Schmundt, H., and M. Sontheimer, "The Digital Arms Race: NSA Preps America for Future Battle", Spiegel Online, January 2015, <http://www.spiegel.de/international/world/new-snowden-docs-indicate-scope-of-nsa-preparations-for-cyber-battle-a-1013409.html>.

[TOR1]  Schneier, B., "How the NSA Attacks Tor/Firefox Users With QUANTUM and FOXACID", Schneier on Security, October 2013, <https://www.schneier.com/blog/archives/2013/10/how_the_nsa_att.html>.

[TOR2]  The Guardian, "'Tor Stinks' presentation -- read the full document", October 2013, <http://www.theguardian.com/world/interactive/2013/oct/04/tor-stinks-nsa-presentation-document>.

IAB Members at the Time of Approval

   Jari Arkko (IETF Chair)
   Mary Barnes
   Marc Blanchet
   Ralph Droms
   Ted Hardie
   Joe Hildebrand
   Russ Housley
   Erik Nordmark
   Robert Sparks
   Andrew Sullivan
   Dave Thaler
   Brian Trammell
   Suzanne Woolf