Balkinization  

Friday, December 23, 2005

Data Storage and the Fourth Amendment

JB

This Boston Globe story contains a few quotes from a long discussion I had with Charlie Savage, a Globe reporter, about the 4th Amendment implications of Echelon-style surveillance. One theory holds that if the government has a computer sift through messages and phone calls, there is no Fourth Amendment problem. (For the moment I put aside the rather important differences between phone calls and e-mails under current law. That's a big if, and I don't want the reader to overlook it.) The basic idea is that having a computer sift through messages raises no constitutional problems because no human being is listening in or reading anything. Rather, all the surveillance is performed by a computer program.

I think this argument is technologically naive. The question is not whether a computer program does the initial collection of data but what happens to the data after it is collected. As storage costs approach zero, it makes sense to keep a copy of everything you collect so that you can index and search through it later. If you think that the amount of traffic that goes through a system like Carnivore (or like Echelon) is simply too great to collect, you are using yesterday's assumptions. Given Moore's Law with respect to the decreasing cost of computing power, and its rough equivalent with respect to the decreasing cost of storage space, you should assume that if the government can invest in large server farms to store data (as Google already does), it will do so. Remember that Google already keeps a cached copy of almost every page it indexes on the Internet. And Google Mail keeps a copy of all of your e-mail on its servers. Storage of enormous amounts of data is part of its business model. Why do we assume this capacity is beyond the United States government?

Once data in digital form (whether voice or text or video) is stored, it must be searched and analyzed to be of any use. Put another way, data mining requires both data collection and data storage that allows the data to be mined. At some point in the process, human beings will receive information from the system. Once they receive that information, they will want to know the context in which the data that the computer has spit out to them appeared. That is, they will want access to the database. If the data has been stored, they will have access to it. Thus the key issue is not whether the data collection was done by a human being or by a computer program. The key issue is whether the results of the data collection are stored somewhere on a computer (or, more likely, a server farm) to which government agents have access.
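The point about context can be made concrete. What follows is a minimal Python sketch, not a description of any actual government system: the `Message` and `MessageStore` names and the word-based index are hypothetical. It shows that once a collector keeps full records alongside its index, the automated "flag" step and the human "show me the context" step draw on the same stored data.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: int
    sender: str
    recipient: str
    body: str


@dataclass
class MessageStore:
    """Hypothetical archive that keeps every collected message plus a keyword index."""
    messages: dict = field(default_factory=dict)  # msg_id -> Message
    index: dict = field(default_factory=dict)     # word -> set of msg_ids

    def collect(self, msg: Message) -> None:
        # Automated collection: keep the full message and index every word in it.
        self.messages[msg.msg_id] = msg
        for word in msg.body.lower().split():
            self.index.setdefault(word, set()).add(msg.msg_id)

    def flag(self, keyword: str) -> list:
        # The "computer sift": returns only message IDs, no content.
        return sorted(self.index.get(keyword.lower(), set()))

    def context(self, msg_id: int) -> Message:
        # What an analyst asks for next: the full stored record.
        return self.messages[msg_id]


store = MessageStore()
store.collect(Message(1, "a@example.com", "b@example.com", "Meet at noon"))
store.collect(Message(2, "c@example.com", "d@example.com", "Lunch tomorrow"))
hits = store.flag("noon")
print(hits)                          # [1]
print(store.context(hits[0]).body)   # Meet at noon
```

The flagging step never shows anyone a message body, but because `collect` kept the whole message, `context` can hand it over the moment someone asks.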

Unless there is a policy requiring automatic destruction of the data after a specified time, the data will remain on the computer because, as storage costs decrease, it is cheaper to keep data than to spend the time figuring out what to get rid of. (Once again, think about Google Mail, which assumes that you will keep all your e-mail messages, no matter how trivial, on its servers because it takes too long to sift through and delete the messages you don't need any more.) When storage costs approach zero, data collection increasingly means permanent data storage unless there is a specific policy to counteract it. (To put some perspective on this, the Defense Department appears to have adopted a 90-day retention policy for a different database of suspicious incidents collected about American citizens, but it also seems not to have followed its own data destruction policy.)
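A retention policy only matters if something actually enforces it. Here is a minimal Python sketch of a 90-day purge, assuming records are tracked by a hypothetical ID and a collection timestamp. If nobody runs the purge, every record simply stays where it is.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy window, mirroring the 90-day figure mentioned above.
RETENTION = timedelta(days=90)


def purge_expired(records: dict, now: datetime) -> dict:
    """Keep only records collected within the retention window.

    `records` maps a (hypothetical) record ID to its collection timestamp.
    Nothing is deleted unless this function is actually run.
    """
    cutoff = now - RETENTION
    return {rid: ts for rid, ts in records.items() if ts >= cutoff}


now = datetime(2005, 12, 23, tzinfo=timezone.utc)
records = {
    "r1": now - timedelta(days=10),   # inside the window
    "r2": now - timedelta(days=120),  # past the window
}
kept = purge_expired(records, now)
print(sorted(kept))  # ['r1']
```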

In our current imagination the paradigm case of an electronic Fourth Amendment violation is real-time eavesdropping on a telephone conversation. But we all know that it should make no difference if the wiretap is recorded automatically and listened to later. In like fashion, it should make no difference if the government collects information for data mining purposes, stores it on a server farm somewhere, and then returns to search the collected information at its leisure. If the information is stored, then we have a potential Fourth Amendment problem, even if the data is not accessed immediately by any human being.

Indeed, if lack of sentience allows an end run around the Fourth Amendment, then why not have robots do all the government's searching? They can collect information, store it, and allow government agents to search what the bots have found at their leisure. Moreover, since wires go into every person's home, and wireless broadcasting emanates outside every person's home, even the home should lack any special Fourth Amendment status if bots are doing all the government's dirty work.

Again, the key issue is not who collects the data initially (human or robot) but whether the data is stored. None of the accounts I have read in the press tell me how long the data collected by computers is stored or who has access to the database. That is the question that everyone should be asking.


Comments:

What is so interesting is not that the 4th Amendment is out of date or antiquated, but that it seems to have been perfectly written to apply and adapt to all times.
 

The cameras in the London subways take pictures all the time and store them on disk. It would be a violation of privacy if they were to develop a system to track the movement of ordinary citizens all the time, but this is a very useful system to track back terrorists after an attack.

If you transmit data through the public systems, some of that data will be stored electronically for a while. In transit it is stored in memory. At the destination, it is stored on the hard disk of your "Post Office" server until you decide to pick up your messages.

The Fourth Amendment protects "persons, houses, papers, and effects, against unreasonable searches and seizures." Some court decisions have suggested that it is a violation of this amendment to take aerial infra-red photographs of the roofs of houses for the purpose of identifying people who may be growing marijuana. This cannot then be used to say that it is illegal to take aerial photographs for any purpose because they might later be used by the government in a way that some court finds would violate the law.

This means that any policy has to be based on intent and effect rather than mechanics. The storage of data passing through the public communications infrastructure by itself should never be prohibited by law, either for the government or for those maintaining the infrastructure. What should be controlled is the access to that data and the way in which it is used.

There are certainly technical solutions that can be mandated to protect the stored data. Any protection mechanism can be bypassed, but generally it would be easier to illegally tap into the original data stream than to crack a properly maintained security system. Then there would be time to sift through questions of probable cause and get judicial warrants to access the data legally.

If it is possible for the Government to get a warrant to view data legally, then it should be permissible for the government to take steps to preserve the data long enough to obtain the warrant and longer if there may be a gap between the collection of the data and the circumstance leading to probable cause. Length of storage is not an issue as long as proper custody and security of the raw data is maintained.
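The proposal here (preserve the raw data, guard its custody, and release it only under a warrant) can be sketched in a few lines of Python. `Warrant` and `CustodialArchive` are hypothetical names that illustrate the access rule. They do not model any real legal process or secure storage system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Warrant:
    """Hypothetical judicial authorization naming the specific records it covers."""
    warrant_id: str
    record_ids: frozenset


class CustodialArchive:
    """Preserves raw data indefinitely but releases it only under a matching warrant."""

    def __init__(self) -> None:
        self._records = {}      # record_id -> raw bytes
        self.access_log = []    # (warrant_id, record_id) for every release

    def preserve(self, record_id: str, data: bytes) -> None:
        self._records[record_id] = data

    def read(self, record_id: str, warrant: Optional[Warrant] = None) -> bytes:
        # Storage is unrestricted; access is what the rule controls.
        if warrant is None or record_id not in warrant.record_ids:
            raise PermissionError(f"no warrant covers {record_id!r}")
        self.access_log.append((warrant.warrant_id, record_id))
        return self._records[record_id]


archive = CustodialArchive()
archive.preserve("call-17", b"raw audio bytes")
w = Warrant("W-1", frozenset({"call-17"}))
print(archive.read("call-17", w))  # b'raw audio bytes'
print(archive.access_log)          # [('W-1', 'call-17')]
```

This puts the control on reading rather than on storing, and the access log keeps a record of every time data is released.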

It is no more possible to claim privacy for your particular bits flowing through the Internet than it is to claim privacy for your particular infrared photons flying through the air. It is possible to legislate the end use of collected material, where intent and effect are obvious. Attempts to legislate based on intermediate processing will always produce silly results.
 

It seems to me that if Mustafa calls Abdullah and says "Drive the truck with the dynamite in the parking garage of the Sears Tower, set the timer for five minutes and get as far away as you can", and listening in on this is illegal, then we are goners. We do not have a constitution -- we have a suicide pact. We may as well invite Osama bin Laden to be our Caliph, convert to Islam and save everyone a lot of trouble. Although a better solution would be to slit our throats.

But seriously, is the President's Constitutional authority to defend our borders under Article II limited to assigning soldiers with keen hearing to listen for the drone of approaching planes? Telecommunications have changed the world. I propose this: If our laws prohibit listening in on any international telecommunication, then we do away with international communications. Knock down the satellites and cut the transocean cables. Let them do it all by mail. Do you argue that this is not within the power of the government to do? The sender and addressee are in plain view on a mailed envelope. If either is suspicious the government can apply for a search warrant to open the letter or package.

What trade-off will you give for the convenience of instant communication across the globe? A little loss of your privacy or 1,200 degrees Fahrenheit from the aviation fuel burning in your home or office?
 

I used to be interested in this topic. If memory serves, what happens is that it's stored electronically for two years, and then is deleted.

That's if memory serves, though. It's been a long while since I read that. Then again, nobody reports on Carnivore/Echelon anymore.
 

Perhaps it's just me, but is there a specific reason that things change simply because information reaches human eyes, as opposed to only being checked by a computer program? Are human eyes somehow more sinister? Is it that the human may be corruptible, more likely to be irresponsible, may have an agenda not in keeping with the constitution, may be more likely to make a mistake, or might just have old-fashioned bad intentions? Is the presence of human intention the deciding criterion? If it is, then it's worth noting that the programs which do the scanning are designed according to criteria laid out by the same 'human eyes' that we don't want to be indiscriminately reading or listening in on the communications being scanned.

In other words, the scanning being done has the same intent as the human eyes that told the computer program what to scan for. The perhaps unwitting statement made by the WP article is that it's not good to scan communications manually, but it's just fine to automate scanning. Is it just me, or is the article saying that less harm is done by automating something than by doing it by hand? If it's potentially harmful to do something by hand, is it somehow not harmful once you automate it? Why do we believe that automating something automatically makes it better? Have we been duped into thinking that new is unfailingly better? It's perhaps an unreasonable statement to make, but I hope the same leap of logic isn't applied to the justice system.

Food for thought, and maybe even indigestion.
 

Computer messages may be scanned and stored dozens of times just to get to their destination. The header must be read, then the destination must be used to find the next step in the routing. Like a postcard, every computer message carries its destination and message in the same place.

It is less common these days to find intermediate nodes that store mail for a long time, but twenty years ago that was the most common mechanism. Certainly mail is stored at the final mail server, where it is backed up to tape regularly to provide recovery from system crashes. As several administrations and white collar criminals have discovered, "deleted" mail can be restored months or even years later.

So the point is that you cannot declare that it is illegal to scan or store messages. You might try to make it illegal for the government to scan messages but OK for everyone else to, but that only works if the government is never part of the intermediate delivery mechanism.

Furthermore, the Internet now rests on the edge of a cliff where any group of compromised machines could be remotely triggered to launch a serious denial of service attack that would shut the entire thing down. In the event of attack, the only thing that could keep US data communications running would be some government intervention and a serious message scanning and discarding heuristic (or government approval for private entities to do the same thing).

Even universities, which are terribly reluctant to do anything that appears to control content, are increasingly scanning incoming mail for well-known virus payloads and disarming or discarding them. This is, however, trivial compared to what a determined enemy could do if they had people skilled enough to seriously attack the system.
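Scanning mail for known payloads is simple pattern matching. Here is a minimal Python sketch, assuming a hypothetical table of byte signatures. Real antivirus engines use much richer signature formats and heuristics. Unlike routing, this kind of scan does read the content of each attachment, but it checks that content only against a fixed list of patterns.

```python
# Hypothetical byte patterns standing in for real antivirus signatures.
KNOWN_SIGNATURES = {
    "demo-worm": b"\xde\xad\xbe\xef",
    "demo-macro": b"AutoOpen()",
}


def scan_attachment(data: bytes) -> list:
    """Return the names of known signatures found anywhere in an attachment."""
    return [name for name, sig in KNOWN_SIGNATURES.items() if sig in data]


def disarm(attachments: dict) -> dict:
    """Drop every attachment that matches at least one known signature."""
    return {name: data for name, data in attachments.items()
            if not scan_attachment(data)}


incoming = {"report.doc": b"Sub AutoOpen() ...", "notes.txt": b"see you at 3"}
print(sorted(disarm(incoming)))  # ['notes.txt']
```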

So in the end, the reason why it is OK for machines to scan messages is the same reason why it is OK for postal workers to read the address on a letter. If they don't, nothing works. Therefore, in the long run privacy can only be legislated by controlling access based on intent or effect and not by controlling raw access.
 

Only email message headers are 'scanned' for routing purposes, typically by 3-6 (sometimes more) SMTP routers/servers.

However, as in the case of the postal worker, the destination (and in some cases the source) is what is examined - not the content. Howard's premise is that since I've written an address on the outside of a letter, I have given the post office permission to open and read the letter.
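The header/body split can be shown with Python's standard `email` package. The sketch below reads only the header block to choose a destination domain and leaves the body unread. One caveat: real SMTP relays route on the envelope recipient (the `RCPT TO` command) rather than the `To:` header, so this is an analogy for the address-versus-letter distinction, not a model of an actual mail router. `route_domain` is a hypothetical helper.

```python
from email.parser import HeaderParser

RAW = (
    "From: alice@example.com\n"
    "To: bob@example.org\n"
    "Subject: lunch\n"
    "\n"
    "Meet me at noon.\n"
)


def route_domain(raw: str) -> str:
    """Pick the destination domain using only the header block."""
    # Headers end at the first blank line; the body after it is set aside unread.
    header_block, _, _unread_body = raw.partition("\n\n")
    headers = HeaderParser().parsestr(header_block)
    return headers["To"].rsplit("@", 1)[1]


print(route_domain(RAW))  # example.org
```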

I do not think this is the case. Nor does it seem to make any sense that just because someone has implied permission to store a message they also have permission to read the message. Also, even at the inception of Internet email, it was rare for a message to stay on a server for more than four hours.

In instances where the actual content of email is scanned, which is increasingly the case in business and academia, as Howard pointed out, people are told that their email is scanned. In fact, it's as common to disclose that such scanning takes place as it is to perform such scanning.

In fact, if email routers did try to read the content of messages, few of them would work at all. Email is always scanned by separate servers dedicated to scanning. Further, there is no logical association between the content and its destination; that's why there's a 'To' field in the program you use to read and send email, and why email routers don't read the content of email messages.

Email is actually successfully delivered to people on the Internet for exactly the opposite reason that Howard cites - because email routers do not read the contents of entire messages.
 

The Post Office operates efficiently because workers only look at the right (address) part of the postcard and don't bother to read the left (message) part. It is true that current Sendmail-based mail systems only bother to look at the headers (and remember that they rewrite them so that every message contains a list of intermediate processing steps).

You only need to record the From: and To: headers to do the type of traffic analysis that the newspapers are reporting the government is doing. There is, however, a problem when you assume that you can legislate a rule that intermediate routers can never semantically process the content of a message.
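To show how much traffic analysis can get from headers alone, here is a minimal Python sketch over hypothetical (From, To) pairs. It counts links between correspondents and lists who has been in contact with a given address, without looking at any message body.

```python
from collections import Counter

# Hypothetical (From, To) pairs: the only data this analysis needs.
header_pairs = [
    ("a@x.example", "b@y.example"),
    ("a@x.example", "b@y.example"),
    ("c@z.example", "a@x.example"),
]


def link_counts(pairs) -> Counter:
    """Count messages per (sender, recipient) link."""
    return Counter(pairs)


def contacts_of(address: str, pairs) -> set:
    """Everyone who exchanged mail with `address`, in either direction."""
    return {b if a == address else a for a, b in pairs if address in (a, b)}


print(link_counts(header_pairs).most_common(1))
# [(('a@x.example', 'b@y.example'), 2)]
print(sorted(contacts_of("a@x.example", header_pairs)))
# ['b@y.example', 'c@z.example']
```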

The Internet mail standards were never based on the assumption that everyone used the same language, character sets, file formats, and conventions. There is a set of RFC standards for the usual "Sendmail" Internet mail protocol. However, other protocols (IBM RSCS, X.400) still exist and are in use. Internet mail is Inter-Net because it is translated at network boundaries whenever the standards change.

Thus a few decades back, every message transmitted between the universities on Bitnet and those on the Internet had to be reformatted at the network boundary from EBCDIC to ASCII standards. Ten years from now we may be migrating from the current unworkable mess of SPAM, phishing, and other diseases to some new, safer, more reliable system, and messages will again have to be reformatted at the boundary.
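The boundary translation described here fits in a one-line function. This Python sketch assumes IBM code page 037 (Python's built-in `cp037` codec, EBCDIC US/Canada). Actual Bitnet gateways may have used other EBCDIC code pages and handled unmappable characters differently. `bitnet_to_internet` is a hypothetical name.

```python
def bitnet_to_internet(ebcdic: bytes) -> bytes:
    """Translate EBCDIC text to ASCII at a network boundary.

    Assumes code page 037 (Python codec "cp037", IBM EBCDIC US/Canada);
    actual gateways may have used other EBCDIC code pages.
    Characters with no ASCII equivalent raise UnicodeEncodeError.
    """
    return ebcdic.decode("cp037").encode("ascii")


wire_bytes = "HELLO".encode("cp037")
print(wire_bytes)                      # b'\xc8\xc5\xd3\xd3\xd6'
print(bitnet_to_internet(wire_bytes))  # b'HELLO'
```

The gateway can only translate the text correctly by decoding every byte of it, which is the commenter's point: at a boundary between networks, intermediate systems have to process message content.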

Or even better, suppose that Google does take over the world, and ten years from now I can send a mail message through the Google-enhanced Internet and have it automatically translated from English to the language preferred by the addressee. I certainly would not want some ideologically based law blocking such a service from developing.

It is important not to legislate or regulate based on the assumption that the thing you know best is the only way things have ever worked, the only way they work now, and will work the same forever.

If the Internet is ever to be ready to protect itself against a bot attack, it will use the same kind of traffic analysis and heuristic data mining that the government is currently alleged to have used to find terrorists. You cannot legislate about the technology, but only about the purpose for which it is used.
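A simple example of the kind of heuristic meant here is a sliding-window rate check. This Python sketch flags any source that sends more than a threshold number of messages within a time window. It uses only (source, timestamp) metadata, and the names and thresholds are hypothetical. Real defenses against distributed denial-of-service attacks combine many such signals.

```python
from collections import defaultdict, deque


def flag_flooders(events, window: float, threshold: int) -> set:
    """Flag sources that send more than `threshold` messages within `window` seconds.

    `events` is an iterable of (source, timestamp_seconds) pairs; only
    traffic metadata is used, never message content.
    """
    recent = defaultdict(deque)  # source -> timestamps inside the current window
    flagged = set()
    for source, t in sorted(events, key=lambda e: e[1]):
        q = recent[source]
        q.append(t)
        # Drop timestamps that have slid out of the window.
        while q[0] <= t - window:
            q.popleft()
        if len(q) > threshold:
            flagged.add(source)
    return flagged


events = [("bot-7", i * 0.1) for i in range(50)] + [("user", 0.0), ("user", 30.0)]
print(flag_flooders(events, window=10.0, threshold=20))  # {'bot-7'}
```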
 

I find it curious that no one has mentioned the impact of message encryption on the utility of scanning e-mail, since my recollection (from 10-15 years ago) is that e-mail encryption tools offering at least a modest degree of protection were readily available even "way back then". Presumably serious terrorists would use such counterintelligence methods. Is there some reason this is not relevant to the discussion?
 

My original point was that why messages are scanned and by whom is what is relevant.

Also, to throw in a bit regarding encryption for ctw: there is good encryption for email, PGP/GPG, both public/private key encryption schemes. That encryption is at least theoretically breakable, but even with vast amounts of supercomputing available, a reasonably long key with a decent cipher would take long enough to crack that the information would very likely lose its timeliness. Cryptographic algorithms do get broken all the time: a Chinese cryptographer broke MD5 last year, and another Chinese cryptographer demonstrated how to break SHA1 in much less time than previously thought, 2000 times faster than before. About 96 days and $36M of computing power at present-day CPU costs was shown to be sufficient to find collisions for SHA1. Now only SHA256 and SHA512 are considered strong enough to be reasonably collision-free.
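MD5 and SHA-1 are hash functions, not ciphers. Tools like PGP use them to sign and check the integrity of messages, not to encrypt them. This Python sketch uses the standard `hashlib` module to print each algorithm's output size along with the generic (birthday-bound) cost of finding a collision. That generic cost is the baseline the published attacks beat. The sketch does not implement those attacks, and `digest_bits` is a hypothetical helper.

```python
import hashlib


def digest_bits(name: str, data: bytes = b"example message") -> int:
    """Output size in bits of the named hashlib algorithm."""
    return len(hashlib.new(name, data).digest()) * 8


for name in ("md5", "sha1", "sha256", "sha512"):
    bits = digest_bits(name)
    # A generic birthday-style collision search needs roughly 2**(bits/2) hashes;
    # the published MD5 and SHA-1 attacks need far less work than that bound.
    print(f"{name:<7}{bits:>4} bits  generic collision cost ~2^{bits // 2}")
```

For SHA1 the generic bound is about 2^80 hash evaluations. An attack costing about 2^69 is roughly 2^11, or about 2000, times faster, which matches the figure above.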
 

bitswapper:

Thanks for the update. If I interpret it correctly, it confirms my suspicion that computationally viable scanning of massive amounts of e-mail text would be ineffective against even minimally sophisticated encryption. This would appear to be an argument against this version of data mining in the tradeoff between security and privacy interests.
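Here is a toy illustration of ctw's point. It uses a one-time pad (XOR with a random key of the same length) only because that fits in a few lines of standard-library Python. PGP/GPG use different public-key and symmetric algorithms. The same keyword scan that finds a word in the plaintext finds nothing in the ciphertext. Encrypting the body still leaves the From: and To: headers readable, though, so header-based traffic analysis works on encrypted mail too. `otp_xor` and `keyword_scan` are hypothetical names.

```python
import secrets


def otp_xor(data: bytes, key: bytes) -> bytes:
    """One-time pad: XOR with a random key of equal length (encrypts and decrypts)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must match the message length")
    return bytes(d ^ k for d, k in zip(data, key))


def keyword_scan(data: bytes, keywords) -> list:
    """Return the keywords that appear verbatim in `data`."""
    return [kw for kw in keywords if kw in data]


plaintext = b"the shipment arrives tuesday"
key = secrets.token_bytes(len(plaintext))  # must never be reused
ciphertext = otp_xor(plaintext, key)

print(keyword_scan(plaintext, [b"shipment"]))   # [b'shipment']
print(keyword_scan(ciphertext, [b"shipment"]))  # [] (with overwhelming probability)
print(otp_xor(ciphertext, key) == plaintext)    # True
```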
 


Absent any legislation, the inherent power of the President to employ warrantless surveillance has been established law and a precedent established by our three greatest presidents and every one of them who succeeded FDR. The House report of 1978 which supported the FISA bill adopted the principle that the Congress had the power reasonably to restrict that power. The platoon of established constitutional experts who supported FISA in their famous letter to the Congress relied on that principle. As Judge Posner has made clear in his several writings, FISA is not reasonable. Its exclusion of warrantless electronic surveillance, authorizing such surveillance only of a target reasonably believed to be a terrorist or agent thereof, is patently unreasonable. The mission of our intelligence is not to surveil identified terrorists but to find them. As the Judge wrote, FISA "might as well have been enacted in 1878 to regulate the interception of telegrams."

Try as we may, and we must try harder, we cannot rely on our physical defense of our airports, our ports, our infrastructure, our major cities, our government's offices. Our intelligence is our most valuable defense, and to exclude the use of the enormous facilities of the NSA, with one of the world's largest assemblies of computer power, its satellites, its foreign resources, its large staff, and its experience, is itself patently unreasonable. Unless you believe the Constitution is a suicide pact, you must hold that the NSA warrantless surveillance is legal and that it is FISA's exclusivity clause which is unconstitutional.

Interestingly, critics of the NSA program have argued that the Supreme Court in Hamdan demolished the President's claim of warrantless surveillance power. Read the Supreme Court opinion, and Justice Kennedy's concurring opinion joined in by Justice Stevens, and you will learn that the decision was that the military commission authorized by the President was illegal because it violated the Uniform Code of Military Justice and the Geneva Convention AND because the Government had not established that it was justified, that is, reasonable. The chances of that Court condemning warrantless surveillance are low indeed.

All of this is spelled out in detail (4500 WORDS), confronting the several arguments made for FISA, in a blog I recently established at www.aborden.typepad.com.
 
