What happened to Google’s book scanning project

The Atlantic has a wonderful article about the Google book scanning project and what became of it.

In 2002, Google began mass scanning every book it could possibly get its hands on, OCRing it and making it searchable. Authors and publishers soon began suing Google from here to the South Pole and back, but in the end realized that they did not actually want to win their lawsuits.

Suppose the Authors Guild won: they were unlikely to recoup anything more than the statutory minimum in damages; and what good would it do to stop Google from providing snippets of old books? If anything those snippets might drive demand. And suppose Google won: Authors and publishers would get nothing, and all readers would get for out-of-print books would be snippets—not access to full texts.

The plaintiffs, in other words, had gotten themselves into a pretty unusual situation. They didn’t want to lose their own lawsuit—but they didn’t want to win it either.

The solution was the “Google Book Search Amended Settlement Agreement” (GBS), which took more than two years to formulate. It would establish an agreement among authors, publishers, libraries and Google and, to top it off, define a regime for dealing with out-of-print books that are still technically under copyright.

But making these scanned books available also drew objections from many parties, including Amazon, because people feared that the scanning effort would turn Google into a giant bookstore. Looking back, most of these fears are nonsensical, but they led to the US Department of Justice putting the settlement on hold. Taking Google out of the equation was not possible:

In some ways, the parties to the settlement didn’t have a good way out: no matter how “non-exclusive” they tried to make the deal, it was in effect a deal that only Google could get—because Google was the only defendant in the case. For a settlement in a class action titled Authors Guild v. Google to include not just Google but, say, every company that wanted to become a digital bookseller, would be to stretch the class action mechanism past its breaking point.

To fix things, Congress would have to adjust copyright law, which it would not do. The end result: somewhere at Google, there is a database with 25 million books in it, and nobody is legally allowed to read them.

Sometimes, politics just sucks.

Published in Going Digital and the Copyright

One Comment

  1. AndreasLobinger

    I somehow fail to see a good reason why Google (a big data player and a search engine) cannot ‘just’ write 25 million letters to the assumed copyright holders to ask permission and then, as positive replies come in, open the corresponding parts of the database. In the case of negative answers, Google could negotiate compensation for the copyright holder from the advertising revenue generated by the book; in the case of no answer, Google could provide a link to the nearest (physical) copy of the book.
