Balkinization: Data Storage and the Fourth Amendment

This Boston Globe story contains a few quotes from a long discussion I had with Charlie Savage, a Globe reporter, about the Fourth Amendment implications of Echelon-style surveillance. One theory holds that if the government has a computer sift through messages and phone calls, there is no Fourth Amendment problem. (For the moment I put aside the rather important differences between phone calls and e-mails under current law. That's a big if, and I don't want the reader to overlook it.) The basic idea is that having a computer sift through messages raises no constitutional problems because no human being is listening in or reading anything. Rather, all the surveillance is performed by a computer program.

I think this argument is technologically naive. The question is not whether a computer program does the initial collection of data but what happens to the data after it is collected. As storage costs decrease to zero, it makes sense to keep a copy of everything you collect so that you can index and search through it later. If you think that the amount of traffic that goes through a system like Carnivore (or like Echelon) is simply too great to collect, you are using yesterday's assumptions. Given Moore's Law with respect to decreasing cost of computing power and its rough equivalent with respect to the decreasing costs of storage space, you should assume that if the government can invest in large server farms to store data (as Google already does) it will do so. Remember that Google already keeps a cached copy of almost everything it searches for on the Internet. And Google mail keeps a copy of all of your e-mail on its servers. Storage of enormous amounts of data is part of its business model. Why do we assume this capacity is beyond the United States government?

Once data in digital form (whether voice or text or video) is stored, it must be searched and analyzed to be of any use. Put another way, data mining requires both data collection and data storage that allows the data to be mined. At some point in the process, human beings will receive information from the system. Once they receive that information, they will want to know the context in which the data that the computer has spit out to them appeared. That is, they will want access to the database. If the data has been stored, they will have access to it. Thus the key issue is not whether the data collection was done by a human being or by a computer program. The key issue is whether the results of the data collection are stored somewhere on a computer (or, more likely, a server farm) to which government agents have access.

Unless there is a policy requiring automatic destruction of the data after a specified time, the data will remain on the computer because as storage costs decrease it is cheaper to keep data than to spend the time figuring out what to get rid of. (Once again, think about Google Mail, which assumes that you will keep all your e-mail messages, no matter how trivial, on its servers because it takes too long to sift through and delete the messages you don't need any more.) When storage costs approach zero, data collection increasingly means permanent data storage unless there is a specific policy to counteract it. (To put some perspective on this, the Defense Department appears to have adopted a 90-day retention policy for a different database of suspicious incidents collected about American citizens, but it also seems not to have followed its own data destruction policy.)
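
To make the idea of "automatic destruction of the data after a specified time" concrete, here is a minimal sketch of what such a policy might look like in code. Everything in it is an assumption for illustration: the `Record` type, the `purge_expired` function, and the 90-day window (borrowed from the figure mentioned above) are hypothetical and do not describe any actual government system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: a 90-day window, echoing the retention figure mentioned above.
RETENTION = timedelta(days=90)


@dataclass
class Record:
    """A hypothetical stored item; only its collection time matters here."""
    record_id: str
    collected_at: datetime


def purge_expired(records, now, retention=RETENTION):
    """Return only the records still inside the retention window.

    Anything collected before the cutoff is dropped. Unless something
    like this runs on a schedule, nothing is ever deleted.
    """
    cutoff = now - retention
    return [r for r in records if r.collected_at >= cutoff]


now = datetime(2006, 1, 1)
stored = [
    Record("recent", now - timedelta(days=10)),
    Record("stale", now - timedelta(days=120)),
]
kept = purge_expired(stored, now)
```

The point of the sketch is how little it takes: deletion is a deliberate step that someone has to write, schedule, and actually run. If no one does, the default is that the data stays.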

In our current imagination the paradigm case of an electronic Fourth Amendment violation is real time eavesdropping on a telephone conversation. But we all know that it should make no difference if the wiretap is recorded automatically and listened to later. In like fashion, it should make no difference if the government collects information for data mining purposes, stores it on a server farm somewhere, and then returns to search the collected information at its leisure. If the information is stored, then we have a potential Fourth Amendment problem, even if the data is not accessed immediately by any human being.

Indeed, if lack of sentience allows an end run around the Fourth Amendment, then why not have robots do all the government's searching? They can collect information, store it, and allow government agents to search what the bots have found at their leisure. Moreover, since wires go into every person's home, and wireless broadcasting emanates outside every person's home, even the home should lack any special Fourth Amendment status if bots are doing all the government's dirty work.

Again, the key issue is not who collects the data initially (human or robot) but whether the data is stored. None of the accounts I have read in the press tell me how long the data collected by computers is stored or who has access to the database. That is the question that everyone should be asking.

Posted 8:48 AM by JB [link]

Comments:

It seems to me that if Mustafa calls Abdullah and says "Drive the truck with the dynamite in the parking garage of the Sears Tower, set the timer for five minutes and get as far away as you can", and listening in on this is illegal, then we are goners. We do not have a constitution -- we have a suicide pact. We may as well invite Osama bin Laden to be our Caliph, convert to Islam and save everyone a lot of trouble. Although a better solution would be to slit our throats.

But seriously, is the President's Constitutional authority to defend our borders under Article II limited to assigning soldiers with keen hearing to listen for the drone of approaching planes? Telecommunications have changed the world. I propose this: If our laws prohibit listening in on any international telecommunication, then we do away with international communications. Knock down the satellites and cut the transocean cables. Let them do it all by mail. Do you argue that this is not within the power of the government to do? The sender and addressee are in plain view on a mailed envelope. If either is suspicious the government can apply for a search warrant to open the letter or package.

What trade-off will you give for the convenience of instant communication across the globe? A little loss of your privacy or 1,200 degrees Fahrenheit from the aviation fuel burning in your home or office?

# posted by

nk : 10:41 PM

I used to be interested in this topic. If memory serves, what happens is that it's stored electronically for two years, and then is deleted.

That's if memory serves, though. It's been a long while since I read that. Then again, nobody reports on Carnivore/Echelon anymore.

# posted by

Lost : 12:18 AM

Perhaps it's just me, but is there a specific reason that things change simply because information reaches human eyes, as opposed to only being checked by a computer program? Are human eyes somehow more sinister? Is it that the human may be corruptible, more likely to be irresponsible, may have an agenda not in keeping with the constitution, or that the human may be more likely to make a mistake, or might just have old-fashioned bad intentions? Is the presence of human intention the deciding criterion? If it is, then it's worth noting that the programs which do the scanning are processes designed according to criteria laid out by the same 'human eyes' that we don't want to be indiscriminately reading/listening in on the communications being scanned.

In other words, the scanning being done has the same intent as the human eyes that told the computer program what to scan for. The perhaps unwitting statement made by the WP article is that it's not good to scan communications manually, but it's just fine to automate scanning. Is it just me, or is the article saying that less harm is done by automating something as opposed to doing it by hand? If it's potentially harmful to do something by hand, is it somehow not harmful once you automate it? How is it that we believe that automating something is automatically better? Have we been duped into thinking that new is unfailingly better? It's perhaps an unreasonable statement to make, but I hope the same leap of logic isn't applied to the justice system.

Food for thought, and maybe even indigestion.

# posted by

John : 12:25 PM

Only email message headers are 'scanned' for routing purposes, typically by 3-6 (sometimes more) SMTP routers/servers.

However, as in the case of the postal worker, the destination (and in some cases the source) is what is examined - not the content. Howard's premise is that since I've written an address on the outside of a letter, I have given the post office permission to open and read the letter.

I do not think this is the case. Nor does it seem to make any sense that just because someone has implied permission to store a message they also have permission to read the message. Also, even at the inception of Internet email, it was rare for a message to stay on a server for more than four hours.

In instances where the actual content of email is scanned, which is increasingly the case in business and academia as Howard pointed out, it is disclosed to people that email is scanned. In fact, it's as common to disclose that such scanning takes place as it is to perform such scanning.

In fact, if email routers did try to read the content of messages, few of them would work at all. Email is always scanned by separate servers dedicated to scanning. Further, there is no logical association between the content and its destination - that's why there's a 'To' field in the program you use to read and send email, and why email routers don't read the content of email messages.

Email is actually successfully delivered to people on the Internet for exactly the opposite reason that Howard cites - because email routers do not read the contents of entire messages.

# posted by

John : 12:17 PM
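
For readers who want to see the header/content distinction from the comment above in concrete terms, here is a minimal sketch using Python's standard `email` package. It is a simplification and an assumption on several counts: the function name `destination_domain` and the sample message are hypothetical, and real SMTP relays route on the envelope recipient given during the SMTP conversation rather than on the 'To' header. The sketch only shows that a destination can be worked out without ever parsing the body.

```python
from email.parser import HeaderParser
from email.utils import parseaddr


def destination_domain(raw_message: str) -> str:
    """Work out where a message is headed using its headers alone.

    The text after the first blank line (the body) is split off and
    never inspected.
    """
    normalized = raw_message.replace("\r\n", "\n")
    header_text, _, _body = normalized.partition("\n\n")
    headers = HeaderParser().parsestr(header_text)
    _name, address = parseaddr(headers.get("To", ""))
    return address.rpartition("@")[2].lower()


# Hypothetical sample message, for illustration only.
sample = (
    "From: Alice <alice@example.com>\r\n"
    "To: Bob <bob@Example.ORG>\r\n"
    "Subject: lunch\r\n"
    "\r\n"
    "The body could say anything; it is not needed to deliver the message.\r\n"
)
domain = destination_domain(sample)
```

Whether a given server also stores or scans the body is a separate decision from routing, which is the distinction the comment draws.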

I find it curious that no one has mentioned the impact of message encryption on the utility of scanning e-mail, since my recollection (from 10-15 years ago) is that e-mail encryption tools offering at least a modest degree of protection were readily available even "way back then". Presumably serious terrorists would use such counterintelligence methods. Is there some reason this is not relevant to the discussion?

# posted by

Charles T. Wolverton : 10:25 AM

My original point was that what is relevant is why messages are scanned and by whom.

Also, to throw in a bit regarding encryption for ctw, there is good encryption for email: PGP and GPG, both public/private key encryption schemes. That encryption is at least theoretically breakable, but even with a vast amount of supercomputing available, a reasonably long key with a decent cipher would take long enough to crack that the information would very likely lose its timeliness. Cryptographic algorithms do get broken all the time - a Chinese cryptographer broke the MD5 hash last year, and another Chinese cryptographer demonstrated how to break SHA1 in much less time than previously thought - 2000 times faster than before. About 96 days and $36M of computing power in present day CPU cost/cycles was shown as sufficient to get collisions for SHA1. Now there's just SHA256 and SHA512 that are considered to be strong enough to be considered reasonably collision-free.

# posted by

John : 9:08 AM
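
The claim above that a reasonably long key takes too long to crack can be checked with some rough arithmetic. The sketch below is an assumption-laden illustration, not a statement about any real attacker: it models only exhaustive search of a symmetric key space (public-key schemes such as those in PGP are attacked by other means), the function name `expected_years` is hypothetical, and the rate of one trillion guesses per second is an arbitrary figure chosen for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def expected_years(key_bits: int, guesses_per_second: float) -> float:
    """Average time to find a key by trying every possibility.

    On average the key turns up after searching half the key space,
    that is, after 2**(key_bits - 1) guesses.
    """
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


# Assumption: a hypothetical attacker testing one trillion keys per second.
RATE = 1e12
short_key_years = expected_years(56, RATE)
long_key_years = expected_years(128, RATE)
```

Under these assumptions a 56-bit key falls in well under a year, while a 128-bit key takes far longer than any message could stay timely. Each added bit doubles the work.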

bitswapper:

tnx for the update. if I interpret it correctly, it confirms my suspicion that computationally viable scanning of massive amounts of e-mail text would be ineffective against even minimally sophisticated encryption. this would appear to be an argument against this version of data mining in the tradeoff between security and privacy interests.

# posted by

Charles T. Wolverton : 10:22 AM
