Carnivore and Open Source Software

December 4, 2000: In October, I was part of a group of five security researchers invited by the Justice Department to identify technical issues with the Carnivore system that should be addressed by an outside review. We have just released our analysis of IITRI's draft report on Carnivore; our comments can be found here.

September 2, 2000 Update: Readers of this page will probably be interested in the short column Steve Bellovin and I wrote for Peter Neumann's Inside Risks page in the October 2000 CACM, which you can find here.

On Monday, July 24, 2000, the House Judiciary Committee's Subcommittee on the Constitution held hearings on "Fourth Amendment Issues Raised by the FBI's 'Carnivore' Program." There were witnesses from the FBI and Department of Justice as well as technical people, civil liberties advocates, and representatives from ISPs. I was invited to testify as an expert on the risks of Internet wiretapping generally and on the issues that would be raised by making the Carnivore software open-source in particular.

The "Carnivore" system runs on an FBI-owned PC that can be installed at an ISP to collect IP traffic as part of a court-ordered wiretap. Not much detail is publicly known about exactly how Carnivore works or what kinds of traffic it collects. This has contributed to an atmosphere of mistrust and confusion, which would be at least partly addressed if the FBI would provide more information, including making the Carnivore source code available for public scrutiny.

Internet wiretapping raises some very difficult technical, as well as legal, issues. The wide range of Internet access methods coupled with the fact that the Internet is based on a packet-switched, rather than connection-oriented, model, makes it much more difficult to reliably and faithfully collect exactly the Internet traffic associated with a particular source or destination than it is to perform traditional voice wiretaps. Furthermore, the security requirements on any device designed to collect Internet traffic are enormous; such a device would have access to a wide range of sensitive customer data and would be connected deep within the infrastructure of an ISP. The security benefits of making software open-source are well understood by the security community; open source could be expected to do much to strengthen a system as complex and security-critical as Carnivore.

Of course, making Carnivore open source is not a complete panacea for protecting against abuses or errors. First of all, it's likely rather complex, so simply scanning the source code probably won't tell us much about whether it is vulnerable to attack or misbehaves in the kinds of traffic it collects. That would require extensive, focused study. Open source code attracts several different kinds of reviewers. One kind consists of people who want to study a system for its own sake, but the most meaningful review usually comes from people who have to read and understand the code because they want to make useful modifications to it. Carnivore isn't likely to attract much of that latter (and I think more important) kind of review, at least from among the open community. On the other hand, groups of focused expert reviewers can (and often do) miss things. Any meaningful review, therefore, should include both independent expert review and release of the code to the public.

More seriously, I suspect that the meat (so to speak) of any meaningful analysis of Carnivore's security and behavior lies not in its core source code but rather in the parameters used when it is actually configured and installed. It is similar to asking whether Unix is secure - we can look at its source code and find (or not find) specific bugs, but the real issue in the security of any particular Unix installation is how it is managed and administered, not what version of the kernel it runs.

Still, releasing the source code is a critical first step in assuring the public that Carnivore can at least be configured to do what it is supposed to do, and I hope the FBI sees fit to take this step soon.

I've submitted as part of my written testimony a position paper I wrote with Steve Bellovin that makes the case that there is little harm, and much good, to be done by releasing the Carnivore code. It is attached below. We also wrote a guest column for Peter Neumann's Inside Risks page in the October 2000 CACM, which you can find here.

Matt Blaze
20 July 2000

Open Internet Wiretapping

Steve Bellovin
Matt Blaze

19 July 2000

I Introduction

Recent press reports have disclosed the existence of an FBI Internet wiretap device, known as "Carnivore". This is troubling for many reasons, not the least of which is that it is unclear just what the software and hardware do or how they work.

In the U.S., there are serious legal restrictions on the use of wiretaps by police agencies. The Supreme Court has consistently held that wiretaps qualify as searches under the Fourth Amendment.

Unrestricted wiretapping is clearly unconstitutional. Wiretap warrants must specify clearly whose material may be searched. A blanket search of all traffic on the Internet for, say, "any email messages containing the phrase 'weapons-grade plutonium'" would clearly be prohibited.

Federal rules on police wiretapping mandate special procedures designed to comply with Fourth Amendment protections against illegal searches. Each application for a wiretap warrant must supply copious detail on why a particular wiretap is needed, what lines are to be tapped, and why. The law also mandates "minimization" of the interception of communications not covered by the order, and requires that intercepts be recorded in a way that protects the contents from editing or alteration. Law enforcement agencies follow elaborate procedures for handling intercepted telephone calls. "Chains-of-evidence" help prevent tampering. Any intercepted traffic not covered by a warrant is discarded under supervision. When wiretap evidence is introduced at a criminal trial, the defense is entitled to examine the recordings and the processes used to create them and may challenge any discrepancies found.

Internet wiretapping, however, introduces several new technical problems. Unlike with tape recordings of the human voice, it is not self-evident who said (or typed) the contents of intercepted Internet traffic. Message headers can be forged to falsely identify the source or destination of traffic. Digital messages (especially electronic mail) can be modified along their routes to change meaning or eliminate contextual details. Software bugs often make it possible for a third party to relay traffic through a computer without its owner's knowledge or cooperation. This kind of malicious tampering might occur long before the traffic reaches the interception point and without any evidence that it has happened. An intercepting law enforcement agency might have no reason to believe that it had been duped.

Even more seriously, the shared nature of Internet connections means that data packets from one user are almost immediately mixed in with those of others. Unlike the telephone system, where a single line serves a single customer and identifying a call of interest allows one to monitor the entire conversation, every Internet packet -- and these are each just a very small piece of a conversation or email message -- is individually addressed. That is, traffic on the Internet is much more like a series of small telegrams passing back and forth. Furthermore, the sender and recipient of these telegrams are identified only by "IP addresses" -- random-looking numbers that can change over time -- instead of names or telephone numbers. Any equipment or software used to collect Internet traffic as part of a legal wiretap must be written very carefully to ensure that the traffic it collects is, in fact, precisely what was intended for collection, neither more nor less. Doing this correctly is far more difficult than it might at first seem.
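The "small telegrams" picture above can be illustrated with a minimal sketch. It is not a description of how Carnivore works; the Packet type, the select and reassemble helpers, and the addresses (taken from the ranges reserved for documentation) are all assumptions made for illustration. It shows only that a collector on a shared link sees interleaved, individually addressed fragments and must decide, packet by packet, which ones belong to the target.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    """Hypothetical, simplified packet: real IP packets carry far more fields."""
    src: str      # sender's IP address
    dst: str      # recipient's IP address
    seq: int      # position of this fragment within its conversation
    payload: str  # a small piece of the message

def select(packets, target_ip):
    """Keep only packets sent from or to the target address."""
    return [p for p in packets if target_ip in (p.src, p.dst)]

def reassemble(packets):
    """Put fragments back in order and join their payloads."""
    return "".join(p.payload for p in sorted(packets, key=lambda p: p.seq))

# Traffic as it might appear on a shared link: several users' fragments,
# interleaved and out of order.
LINK = [
    Packet("192.0.2.10", "198.51.100.5", 2, "lo "),
    Packet("192.0.2.99", "198.51.100.5", 1, "unrelated"),
    Packet("192.0.2.10", "198.51.100.5", 1, "hel"),
    Packet("198.51.100.5", "192.0.2.10", 1, "ack"),
    Packet("192.0.2.10", "198.51.100.5", 3, "world"),
]

TARGET = "192.0.2.10"
picked = select(LINK, TARGET)
outbound = [p for p in picked if p.src == TARGET]
message = reassemble(outbound)
```

Even in this toy setting, the filter is only as good as the address it is given: if the target's address changes, or another user acquires it, the same code silently collects the wrong traffic.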

II Carnivore and Eavesdropping

According to published reports, Carnivore operates by eavesdropping on all network traffic on some link or links, examining it, and deciding what pieces are relevant, i.e., covered by the wiretap order. It is not obvious how this is done. For email, one can identify the recipients by looking at the mail transmission protocol traffic; the sender, however, cannot be identified without looking at the body of the letter, and not even then if a very modest attempt is made at concealment or forgery of the return address. A considerable amount of traffic would need to be saved and analyzed for this to work; that alone is troubling.

A more reliable mechanism is to use the IP address. But IP addresses are often dynamically assigned. The only way for an eavesdropping box to learn which IP addresses are interesting is to spy on the messages that assign IP addresses to particular users. That is, it has to learn of all users who are signed on in order to decide whose traffic is of interest. Even this is not completely reliable; if the monitoring box misses the sign-off message -- and it is quite common for monitoring tools to miss some packets, especially on heavily-loaded networks -- another user's traffic could very easily be picked up.
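The failure mode described above can be sketched in a few lines. This is a deliberately naive model, not Carnivore's actual logic, which is not publicly known: the event format, the collect function, and the user names are assumptions for illustration. The monitor remembers the address assigned to its target at sign-on and forgets it at sign-off; if the sign-off packet is lost, it keeps collecting from that address after it has been reassigned.

```python
def collect(events, target, dropped=frozenset()):
    """Return (true_sender, payload) pairs a naive monitor records for `target`.

    Hypothetical event tuples:
      ("signon",  user, ip)
      ("signoff", user, ip)
      ("data",    ip, true_sender, payload)
    `dropped` holds the indexes of events the monitor failed to see.
    """
    target_ip = None
    captured = []
    for index, event in enumerate(events):
        if index in dropped:
            continue  # packet lost by the monitor, e.g. under heavy load
        kind = event[0]
        if kind == "signon" and event[1] == target:
            target_ip = event[2]
        elif kind == "signoff" and event[1] == target:
            target_ip = None
        elif kind == "data" and event[1] == target_ip:
            captured.append((event[2], event[3]))
    return captured

EVENTS = [
    ("signon", "alice", "10.0.0.7"),
    ("data", "10.0.0.7", "alice", "msg-1"),
    ("signoff", "alice", "10.0.0.7"),   # index 2: the packet that may be missed
    ("signon", "bob", "10.0.0.7"),      # the same address is reassigned
    ("data", "10.0.0.7", "bob", "msg-2"),
]

clean = collect(EVENTS, "alice")               # monitor sees everything
lossy = collect(EVENTS, "alice", dropped={2})  # monitor misses the sign-off
```

With every event observed, only alice's message is recorded; with the single sign-off lost, bob's message is recorded as well. A more careful monitor could also clear the mapping when it sees the address assigned to someone else, but that check depends on seeing yet another packet that may itself be lost.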

Even omissions of traffic that should have been monitored can be serious. An innocent email reply may appear to be incriminating if exculpatory context is missing.

Carnivore's job is made especially difficult by the fact that it must be at least somewhat general-purpose in its design. It must be able to be configured to operate reliably on a variety of Internet service provider (ISP) networks, under a wide range of operational conditions. A configuration that might result in correct operation at one ISP might result in erroneous or incomplete interception at another. There may be a significant risk that some Carnivore installations do not always collect all (or only) the traffic they are supposed to. Without knowing the details of how Carnivore is configured or its internal structure, however, it is impossible to be sure of the extent of this risk.

There are partial solutions to some of the problems outlined above. The question, though, is to what extent these protections are implemented. Does the system restrict the monitored data to just some selected users? Does it have to accumulate other data in order to do this? Is the filtering done properly? Is the recorded data protected against alteration?
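As one illustration of what "protected against alteration" could mean, the sketch below chains each recorded item to the one before it with a cryptographic hash, so that editing any record changes every later digest. This is a generic technique offered as an assumption; nothing in the published reports says whether Carnivore does anything like it, and the seal and verify names are hypothetical.

```python
import hashlib

GENESIS = b"\x00" * 32  # fixed starting value for the chain

def seal(records):
    """Pair each record (bytes) with a SHA-256 digest covering it and all earlier records."""
    chain, prev = [], GENESIS
    for record in records:
        digest = hashlib.sha256(prev + record).digest()
        chain.append((record, digest))
        prev = digest
    return chain

def verify(chain):
    """Recompute the chain and report whether every stored digest still matches."""
    prev = GENESIS
    for record, digest in chain:
        if hashlib.sha256(prev + record).digest() != digest:
            return False
        prev = digest
    return True

log = seal([b"packet-1", b"packet-2", b"packet-3"])
intact = verify(log)

# Simulate after-the-fact editing of the second record.
tampered = list(log)
tampered[1] = (b"packet-2-edited", log[1][1])
altered_detected = not verify(tampered)
```

A chain like this only detects tampering if the final digest is kept somewhere the editor cannot reach (or the digests are keyed or signed); otherwise the whole chain can simply be recomputed. That is exactly the kind of detail outside reviewers would want to check.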

III Open Source Wiretaps

The problem of knowing what software actually does is, of course, an old one. In fact, the question arises with respect to the privacy behavior of commercial software; there have been many reports of off-the-shelf products disclosing information without the knowledge of their systems' owners. One principle that is increasingly accepted in the software community is "open box" software -- software where the source code is open to inspection and modification by many different parties. (This concept is sometimes called "open source".) Among the popular open source systems are the Linux operating system and the Apache Web server. The latter is more widely used than commercial offerings from Netscape or Microsoft.

The basic premise is simple enough: the more eyes study a piece of software, the more likely it is that bugs will be found. In this case, a major question is design correctness: was the software designed to implement the legal strictures? Other notions of correctness are important as well. For example, can this software itself be attacked? Imagine the harm that a dedicated eavesdropping box can do if subverted! Open box software is not a panacea; it is still usually possible to configure secure software in an insecure manner, for example. But careful and wide scrutiny of the source code is the essential first step in developing confidence that any system behaves as it is supposed to.

It is difficult to overstate the value of the kind of widespread review that open source can provide for security-critical systems. Even intense review by small teams of experts often misses small but serious bugs that turn out to have severe security implications. For example, it was only review by the open research community that found several protocol failures in the National Security Agency's "Clipper" key escrow system, in spite of internal reviews by that Agency. Indeed, creating correctly operating security systems is considered to be such an extraordinarily difficult problem that there is little shame in having errors discovered once software is released for public scrutiny; it is an expected part of the quality assurance process.

In the case of wiretapping software, this issue even has legal ramifications. In any criminal trial involving wiretap evidence, the defense is sure to question the accuracy of the intercepts. Public scrutiny can only increase confidence in correct code, and hence in the correctness and completeness of the interception.

We are not impressed with the argument that it would be illegal to release the package under 18 USC 2512, which prohibits possession of devices whose primary purpose is surreptitious eavesdropping. Basic traffic interception tools are a common and essential part of every network administrator's toolkit. Carnivore is primarily a set of filtering tools, the possession of which is not (and should not be) illegal.

We are also unimpressed with the argument that knowledge of the toolset might make it easier for criminals to evade detection. The simplest defense against Carnivore (or any eavesdropping system) is use of strong encryption. This is perfectly legal, reasonably easy, and effective against any sort of filtering. The mere knowledge that Internet monitoring can be done at all is sufficient to induce some people to encrypt; precise knowledge of how it is actually accomplished is much less important.