Balkinization  

Wednesday, February 14, 2007

A Belgian Court Waffles on the Web

JB

A Belgian court has ruled that Google violated copyright laws by linking to Belgian newspapers on Google News without their permission. The decision is very unwise; it exemplifies the remarkable inability of courts and lawyers to understand how copyright law must be interpreted to deal with search engines, which provide an essential service in the digital age.

The result of the Belgian decision, for example, is that Google News will no longer link to the particular French language newspapers that complained. That means that people will no longer be directed to their sites, will not read their content and will not see their advertisements. In short, by demanding that Google take down the links, the newspapers seem to be cutting off their nose to spite their face. They are deliberately and willfully interfering with a service that drives readers to them.

Why are they doing this? The answer is simple. They hope to force Google to enter into agreements that will pay them to link to their sites, because, in fact, they do want Google to drive traffic to them. (Remember that people have sued Google for *not* including them in the database, or for not listing them higher in the search results.) The Belgian newspapers are betting that Google prefers comprehensiveness in its indexing coverage, and that if enough courts around Europe follow suit, Google will be forced to enter into a large-scale revenue sharing arrangement. The newspapers hope that not only will they get traffic driven to their sites, but that they will be handsomely paid for the privilege as well.

Newspapers, and indeed any website operator, have always been able to prevent Google from indexing their content by using a robots.txt file. The same file can also be used to force Google to remove cached copies and links to the cache. Search engines generally create cached copies of material for two reasons. The first is to assist with indexing. The second is so that people can find the pages more quickly if web traffic is slow, or can reach the pages if the website is temporarily down. Newspapers might want to block access to their content after a week so that they can make money from selling access to their archives. Hence they might want no caches, or they might want the caches to disappear after a specified period of time. In fact, the complaint about cached copies of newspaper articles is, in my view, far more important than the argument about links. But even here, the robots.txt file solves the problem. You can stop Google from caching whenever you want.
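To make the mechanism concrete, here is a minimal, illustrative robots.txt file (the site and directory names are hypothetical). Placed at the root of a newspaper's site, it tells Google's crawler to stay out of the archive section while leaving the rest of the site open:

    # Hypothetical file served at http://www.example-newspaper.be/robots.txt
    # Keep Google's crawler out of the paid archive; everything else stays open.
    User-agent: Googlebot
    Disallow: /archives/

    # To shut out all well-behaved crawlers entirely, the file would instead read:
    # User-agent: *
    # Disallow: /

A crawler that honors the convention, as Google's does, simply skips the disallowed paths.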

But why should that be a sufficient defense? Why must the burden be on website operators to opt out of indexing and linking (and caching, in the case of search engines) instead of requiring website operators formally to opt in before they can be indexed and linked to?

The reason is that on the World Wide Web, we must have a broader conception of fair use (or implied license) consistent with the nature of the medium. An opt-out rule for linking-- i.e., you must exclude yourself from search engines-- makes far more sense than an opt-in rule-- i.e., the search engine must get your permission before it may link to your site. (Because content owners can opt out, we might classify this as a default rule of implied license by the content owner rather than as a defense of fair use by the search engine.)

The medium of the Web is based on hyperlinks between documents. In fact, HTML, the basic language of Web pages, stands for Hypertext Markup Language-- or to put it another way, the very language of the Web is the language of links. These links are the basic conduits through which travel on the Web occurs. Indeed, links are not only the conduits, they are also the road signs that tell people where things are and how to get to them. Moreover, search engines, which generally try to index as many pages as they can and present links to them, are a necessary method of finding information and travelling to it. As soon as we had a World Wide Web, we had lots and lots of pages and lots and lots of links, and we were going to need search engines. Otherwise, much of the Web would be essentially inaccessible. If people had a general right to prevent links to their pages, travel across the Web would be greatly hindered and the medium rendered useless.

A second argument applies particularly to search engines. An opt-in rule (you can't link to us until you get our permission) imposes significant transaction costs on anyone placing links on a webpage, but especially on general purpose search engines, which may link to billions of pages. The search engine might have to spend more time processing requests to be linked to than on any other part of its business. It is far more efficient to start with the presumption that webpages can be indexed unless the operator includes a little bit of code (the robots.txt file) that says "don't index me" or "don't make a backup copy for your cache" or both. The cost to the web page owner is small, and the savings to society are enormous.
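To see how small the crawler-side cost of this opt-out convention is, consider the following rough sketch (the URLs are invented; it uses only Python's standard library). This is the kind of check a spider can run once per site before indexing its pages:

    # Rough sketch of how a crawler can honor robots.txt before indexing a page.
    # The site and page URLs below are hypothetical.
    import urllib.robotparser

    ROBOTS_URL = "http://www.example-newspaper.be/robots.txt"
    PAGE_URL = "http://www.example-newspaper.be/archives/story-123.html"

    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(ROBOTS_URL)
    parser.read()  # one small extra request per site, not per page

    if parser.can_fetch("Googlebot", PAGE_URL):
        print("Allowed: fetch, index, and link to the page.")
    else:
        print("Disallowed: skip the page entirely.")

One extra request per site buys the crawler a definitive answer for every page on that site, which is exactly the low-transaction-cost default the argument above relies on.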

For this reason, the law should always presume that it is legal to link to any site on the World Wide Web unless there are special reasons beyond copyright in the link itself. (There are a number of such reasons that I won't go into here, but they don't apply to the vast majority of links on the Web.) In addition, the degree of copying necessary for a search engine to index a page and produce a link should presumptively be regarded as fair use (or implied license) unless the page owner opts out. Otherwise we create incentives for toll roads everywhere on the Web, which defeats the purposes of the medium.

Finally, there is the matter of caching copies for search engines. Unlike linking, caching isn't necessary for travel across the Web. However, it benefits everyone on the Web, first because it helps search engines create their indexes, and second, because it makes it much easier to get to pages when traffic is slow or the site is temporarily inaccessible. The transaction costs argument for presumptively allowing caching is very much the same as the argument for linking. If people don't want their pages cached on a search engine, they can easily prevent it. Once again, opt-out is a far more logical presumption for copyright law on the Web than opt-in.
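For page owners who want to remain indexed and linkable but keep their pages out of the cache, there is an even more targeted signal: the "noarchive" robots meta tag, which Google honors. A hypothetical article page might carry something like this in its head:

    <!-- Hypothetical article markup: stay indexed and linkable, but ask
         search engines not to store or display a cached copy. -->
    <meta name="robots" content="noarchive">
    <!-- Or address Google's crawler specifically: -->
    <meta name="googlebot" content="noarchive">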



Comments:

Jack, I think your fair use analysis is probably spot-on as far as links are concerned. No real additional law would need to be generated to sustain that position, I would guess, although the current state of affairs would still leave silly decisions like the Belgian court's out there. Is there an appeal available? Or is the remedy to force the plaintiffs to live with the consequences of their victory and suffer the resulting loss of traffic?

On the issue of caching, I would guess there is a need for new law. There isn't any getting around the fact that it's wholesale copying of protected material, on terms not set by the owner. I can see some fair use arguments, but they strike me as something of a stretch.
 

Stylistic point: The last paragraph is almost entirely repetitive.

I agree with the argument, though.

I'm convinced that sooner or later, there will be billing for bandwidth (as opposed to, or in addition to, access). This might be either direct or indirect, but bandwidth does cost money. Technology has been pretty good at keeping the transaction costs for bandwidth low enough that it's usually not worthwhile to charge for it, but charging is creeping in (in HS links, maximum download rates, blog account bandwidth limitations, etc.), and it's usually just between one service provider and the principal account holder. Sooner or later, there will start to be billing for the costs of end-to-end transmission, and it's there that caching will start to make a financial difference.

Cheers, ku
 

The article makes a good argument for fair use insofar as it deals with linking. However, it is incomplete as it omits a very important aspect of Google News. Not only does Google News have a link to the article, it typically also has the first couple of sentences from the article itself (as well as the article title). The fair use argument is a bit harder to make when these are included as well, although my feeling is that Google should still win on this point.
 

I also agree that they will use the decision to try to get Google to pony up some cash for the links and article blurb. However, there is absolutely no chance that Google will do so. It is completely antithetical to their business model.
 

Maybe the sites should have learned to use the oldest 'trick' in the book: use /robots.txt to tell the googlebot not to index their content.

If that didn't work, they could have blocked google.

Obviously lawyers and judges can't help much if you don't employ competent technical staff.
 
