Balkinization  

Wednesday, February 14, 2007

A Belgian Court Waffles on the Web

JB

A Belgian court has ruled that Google violated copyright laws by linking to Belgian newspapers on Google News without their permission. The decision is very unwise; it exemplifies the remarkable inability of courts and lawyers to understand how copyright law must be interpreted to deal with search engines, which provide an essential service in the digital age.

The result of the Belgian decision, for example, is that Google News will no longer link to the particular French language newspapers that complained. That means that people will no longer be directed to their sites, will not read their content and will not see their advertisements. In short, by demanding that Google take down the links, the newspapers seem to be cutting off their nose to spite their face. They are deliberately and willfully interfering with a service that drives readers to them.

Why are they doing this? The answer is simple. They hope to force Google to enter into agreements that will pay them to link to their sites, because, in fact, they do want Google to drive traffic to them. (Remember that people have sued Google for *not* including them in the database, or for not listing them higher in the search results.) The Belgian newspapers are betting that Google prefers comprehensiveness in its indexing coverage, and that if enough courts around Europe follow suit, Google will be forced to enter into a large-scale revenue sharing arrangement. The newspapers hope that not only will they get traffic driven to their sites, but that they will be handsomely paid for the privilege as well.

Newspapers, and indeed any website operator, have always been able to prevent Google from indexing their content by using a robots.txt file. The same file can also be used to force Google to remove cached copies and links to the cache. Search engines generally create cached copies of material for two reasons. The first is to assist with indexing. The second is so that people can find the pages more quickly if web traffic is slow, or can reach the pages if the website is temporarily down. Newspapers might want to block access to their content after a week so that they can make money from selling access to their archives. Hence they might want no caches, or they might want the caches to disappear after a specified period of time. In fact, the complaint about cached copies of newspaper articles is, in my view, far more important than the argument about links. But even here, the robots.txt file solves the problem. You can stop Google from caching whenever you want.
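To make the mechanism concrete, here is a minimal, illustrative robots.txt file (the site and directory names are hypothetical). Placed at the root of a newspaper's site, it tells Google's crawler to stay out of the archive section while leaving the rest of the site open:

    # Hypothetical file served at http://www.example-newspaper.be/robots.txt
    # Keep Google's crawler out of the paid archive; everything else stays open.
    User-agent: Googlebot
    Disallow: /archives/

    # To shut out all well-behaved crawlers entirely, the file would instead read:
    # User-agent: *
    # Disallow: /

A crawler that honors the convention, as Google's does, simply skips the disallowed paths.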

But why should that be a sufficient defense? Why must the burden be on website operators to opt out of indexing and linking (and caching, in the case of search engines) instead of requiring website operators formally to opt in before they can be indexed and linked to?

The reason is that on the World Wide Web, we must have a broader conception of fair use (or implied license) consistent with the nature of the medium. An opt-out rule for linking-- i.e., you must exclude yourself from search engines-- makes far more sense than an opt-in rule-- i.e., the search engine must get your permission before it may link to your site. (Because content owners can opt out, we might classify this as a default rule of implied license by the content owner rather than as a defense of fair use by the search engine.)

The medium of the Web is based on hyperlinks between documents. In fact, HTML, the basic language of Web pages, stands for Hypertext Markup Language-- or to put it another way, the very language of the Web is the language of links. These links are the basic conduits through which travel on the Web occurs. Indeed, links are not only the conduits, they are also the road signs that tell people where things are and how to get to them. Moreover, search engines, which generally try to index as many pages as they can and present links to them, are a necessary method of finding information and travelling to it. As soon as we had a World Wide Web, we had lots and lots of pages and lots and lots of links, and we were going to need search engines. Otherwise, much of the Web would be essentially inaccessible. If people had a general right to prevent links to their pages, travel across the Web would be greatly hindered and the medium rendered useless.

A second argument applies particularly to search engines. An opt-in rule (you can't link to us until you get our permission) imposes significant transaction costs on anyone placing links on a webpage, but especially on general purpose search engines, which may link to billions of pages. The search engine might have to spend more time processing requests to be linked to than on any other part of its business. It is far more efficient to start with the presumption that webpages can be indexed unless the operator includes a little bit of code (the robots.txt file) that says "don't index me" or "don't make a backup copy for your cache" or both. The cost to the web page owner is small, and the savings to society are enormous.
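To see how small the crawler-side cost of this opt-out convention is, consider the following rough sketch (the URLs are invented; it uses only Python's standard library). This is the kind of check a spider can run once per site before indexing its pages:

    # Rough sketch of how a crawler can honor robots.txt before indexing a page.
    # The site and page URLs below are hypothetical.
    import urllib.robotparser

    ROBOTS_URL = "http://www.example-newspaper.be/robots.txt"
    PAGE_URL = "http://www.example-newspaper.be/archives/story-123.html"

    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(ROBOTS_URL)
    parser.read()  # one small extra request per site, not per page

    if parser.can_fetch("Googlebot", PAGE_URL):
        print("Allowed: fetch, index, and link to the page.")
    else:
        print("Disallowed: skip the page entirely.")

One extra request per site buys the crawler a definitive answer for every page on that site, which is exactly the low-transaction-cost default the argument above relies on.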

For this reason, the law should always presume that it is legal to link to any site on the World Wide Web unless there are special reasons beyond copyright in the link itself. (There are a number of such reasons that I won't go into here, but they don't apply to the vast majority of links on the Web.) In addition, the degree of copying necessary for a search engine to index a page and produce a link should presumptively be regarded as fair use (or implied license) unless the page owner opts out. Otherwise we create incentives for toll roads everywhere on the Web, which defeats the purposes of the medium.

Finally, there is the matter of caching copies for search engines. Unlike linking, caching isn't necessary for travel across the Web. However, it benefits everyone on the Web, first because it helps search engines create their indexes, and second, because it makes it much easier to get to pages when traffic is slow or the site is temporarily inaccessible. The transaction costs argument for presumptively allowing caching is very much the same as the argument for linking. If people don't want their pages cached on a search engine, they can easily prevent it. Once again, opt-out is a far more logical presumption for copyright law on the Web than opt-in.
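For page owners who want to remain indexed and linkable but keep their pages out of the cache, there is an even more targeted signal: the "noarchive" robots meta tag, which Google honors. A hypothetical article page might carry something like this in its head:

    <!-- Hypothetical article markup: stay indexed and linkable, but ask
         search engines not to store or display a cached copy. -->
    <meta name="robots" content="noarchive">
    <!-- Or address Google's crawler specifically: -->
    <meta name="googlebot" content="noarchive">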



Comments:

Jack, I think your fair use analysis is probably spot-on as far as links are concerned. No real additional law would need to be generated to sustain that position, I would guess, although the current state of affairs would still leave silly decisions like the Belgian court's out there. Is there an appeal available? Or is the remedy to force the plaintiffs to live with the consequences of their victory and suffer the resulting loss of traffic?

On the issue of caching, I would guess there is a need for new law. There isn't any getting around the fact that it's wholesale copying of protected material, on terms not set by the owner. I can see some fair use arguments, but they strike me as something of a stretch.
 

Stylistic point: The last paragraph is almost entirely repetitive.

I agree with the argument, though.

I'm convinced that sooner or later, there will be billing for bandwidth (as opposed to, or in addition to, access). This might be either direct or indirect, but bandwidth does cost money. Technology has been pretty good at keeping the transaction costs for bandwidth low enough that it's usually not worthwhile to charge for it, but charging is creeping in (in HS links, maximum download rates, blog account bandwidth limitations, etc.), and it's usually just between one service provider and the principal account holder. Sooner or later, there will start to be billing for the costs of end-to-end transmission, and it's there that caching will start to make a financial difference.

Cheers, ku
 

The article makes a good argument for fair use insofar as it deals with linking. However, it is incomplete as it omits a very important aspect of Google News. Not only does Google News have a link to the article, it typically also has the first couple of sentences from the article itself (as well as the article title). The fair use argument is a bit harder to make when these are included as well, although my feeling is that Google should still win on this point.
 

I also agree that they will use the decision to try to get Google to pony up some cash for the links and article blurb. However, there is absolutely no chance that Google will do so. It is completely antithetical to their business model.
 

Maybe the sites should have learned to use the oldest 'trick' in the book: use /robots.txt to tell the googlebot not to index their content.

If that didn't work, they could have blocked google.

Obviously lawyers and judges can't help much if you don't employ competent technical staff.
 
