[Share] Scan This Book! From NYT


Post by 密斯张三 » 2006-05-14 13:14

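It's very long... and there are several more sections after this, covering the copyright issue, the Google library project, and so on. The part I'm reposting is mostly a look ahead at how wonderful it will be (once books are digitized) to have a giant wiki-style library in which texts link to and cite one another. That way, whatever topic you want to learn about, you no longer have to haul home a big pile of books and flip back and forth through them to research, compare, and sift the wheat from the chaff. Earlier readers will already have done all that summarizing for you, and all that's left is to point and shoot! Hmm... so the next step is to link everyone's brains together into one big database, so that we no longer have to study and acquire knowledge one person after another... For example, as soon as Jun finishes a book, the reading notes could be synchronized to all of us right away :super: :whistling: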


May 14, 2006
Scan This Book!
By KEVIN KELLY




Correction Appended

In several dozen nondescript office buildings around the world, thousands of hourly workers bend over table-top scanners and haul dusty books into high-tech scanning booths. They are assembling the universal library page by page.

The dream is an old one: to have in one place all knowledge, past and present. All books, all documents, all conceptual works, in all languages. It is a familiar hope, in part because long ago we briefly built such a library. The great library at Alexandria, constructed around 300 B.C., was designed to hold all the scrolls circulating in the known world. At one time or another, the library held about half a million scrolls, estimated to have been between 30 and 70 percent of all books in existence then. But even before this great library was lost, the moment when all knowledge could be housed in a single building had passed. Since then, the constant expansion of information has overwhelmed our capacity to contain it. For 2,000 years, the universal library, together with other perennial longings like invisibility cloaks, antigravity shoes and paperless offices, has been a mythical dream that kept receding further into the infinite future.

Until now. When Google announced in December 2004 that it would digitally scan the books of five major research libraries to make their contents searchable, the promise of a universal library was resurrected. Indeed, the explosive rise of the Web, going from nothing to everything in one decade, has encouraged us to believe in the impossible again. Might the long-heralded great library of all knowledge really be within our grasp?

Brewster Kahle, an archivist overseeing another scanning project, says that the universal library is now within reach. "This is our chance to one-up the Greeks!" he shouts. "It is really possible with the technology of today, not tomorrow. We can provide all the works of humankind to all the people of the world. It will be an achievement remembered for all time, like putting a man on the moon." And unlike the libraries of old, which were restricted to the elite, this library would be truly democratic, offering every book to every person.

But the technology that will bring us a planetary source of all written material will also, in the same gesture, transform the nature of what we now call the book and the libraries that hold them. The universal library and its "books" will be unlike any library or books we have known. Pushing us rapidly toward that Eden of everything, and away from the paradigm of the physical paper tome, is the hot technology of the search engine.

1. Scanning the Library of Libraries

Scanning technology has been around for decades, but digitized books didn't make much sense until recently, when search engines like Google, Yahoo, Ask and MSN came along. When millions of books have been scanned and their texts are made available in a single database, search technology will enable us to grab and read any book ever written. Ideally, in such a complete library we should also be able to read any article ever written in any newspaper, magazine or journal. And why stop there? The universal library should include a copy of every painting, photograph, film and piece of music produced by all artists, present and past. Still more, it should include all radio and television broadcasts. Commercials too. And how can we forget the Web? The grand library naturally needs a copy of the billions of dead Web pages no longer online and the tens of millions of blog posts now gone ― the ephemeral literature of our time. In short, the entire works of humankind, from the beginning of recorded history, in all languages, available to all people, all the time.

This is a very big library. But because of digital technology, you'll be able to reach inside it from almost any device that sports a screen. From the days of Sumerian clay tablets till now, humans have "published" at least 32 million books, 750 million articles and essays, 25 million songs, 500 million images, 500,000 movies, 3 million videos, TV shows and short films and 100 billion public Web pages. All this material is currently contained in all the libraries and archives of the world. When fully digitized, the whole lot could be compressed (at current technological rates) onto 50 petabyte hard disks. Today you need a building about the size of a small-town library to house 50 petabytes. With tomorrow's technology, it will all fit onto your iPod. When that happens, the library of all libraries will ride in your purse or wallet ― if it doesn't plug directly into your brain with thin white cords. Some people alive today are surely hoping that they die before such things happen, and others, mostly the young, want to know what's taking so long. (Could we get it up and running by next week? They have a history project due.)

Technology accelerates the migration of all we know into the universal form of digital bits. Nikon will soon quit making film cameras for consumers, and Minolta already has: better think digital photos from now on. Nearly 100 percent of all contemporary recorded music has already been digitized, much of it by fans. About one-tenth of the 500,000 or so movies listed on the Internet Movie Database are now digitized on DVD. But because of copyright issues and the physical fact of the need to turn pages, the digitization of books has proceeded at a relative crawl. At most, one book in 20 has moved from analog to digital. So far, the universal library is a library without many books.

But that is changing very fast. Corporations and libraries around the world are now scanning about a million books per year. Amazon has digitized several hundred thousand contemporary books. In the heart of Silicon Valley, Stanford University (one of the five libraries collaborating with Google) is scanning its eight-million-book collection using a state-of-the-art robot from the Swiss company 4DigitalBooks. This machine, the size of a small S.U.V., automatically turns the pages of each book as it scans it, at the rate of 1,000 pages per hour. A human operator places a book in a flat carriage, and then pneumatic robot fingers flip the pages ― delicately enough to handle rare volumes ― under the scanning eyes of digital cameras.

Like many other functions in our global economy, however, the real work has been happening far away, while we sleep. We are outsourcing the scanning of the universal library. Superstar, an entrepreneurial company based in Beijing, has scanned every book from 900 university libraries in China. It has already digitized 1.3 million unique titles in Chinese, which it estimates is about half of all the books published in the Chinese language since 1949. It costs $30 to scan a book at Stanford but only $10 in China.

Raj Reddy, a professor at Carnegie Mellon University, decided to move a fair-size English-language library to where the cheap subsidized scanners were. In 2004, he borrowed 30,000 volumes from the storage rooms of the Carnegie Mellon library and the Carnegie Library and packed them off to China in a single shipping container to be scanned by an assembly line of workers paid by the Chinese. His project, which he calls the Million Book Project, is churning out 100,000 pages per day at 20 scanning stations in India and China. Reddy hopes to reach a million digitized books in two years.

The idea is to seed the bookless developing world with easily available texts. Superstar sells copies of books it scans back to the same university libraries it scans from. A university can expand a typical 60,000-volume library into a 1.3 million-volume one overnight. At about 50 cents per digital book acquired, it's a cheap way for a library to increase its collection. Bill McCoy, the general manager of Adobe's e-publishing business, says: "Some of us have thousands of books at home, can walk to wonderful big-box bookstores and well-stocked libraries and can get Amazon.com to deliver next day. The most dramatic effect of digital libraries will be not on us, the well-booked, but on the billions of people worldwide who are underserved by ordinary paper books." It is these underbooked ― students in Mali, scientists in Kazakhstan, elderly people in Peru ― whose lives will be transformed when even the simplest unadorned version of the universal library is placed in their hands.
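The cost figures quoted above can be put side by side with a little arithmetic. This is only a rough sketch: the dollar amounts are the article's approximate numbers, and the assumption that a library buys Superstar's whole 1.3 million-title catalog (and the variable names) are illustrative, not from the article.

```python
# Approximate figures quoted in the article; names are illustrative.
COST_PER_DIGITAL_COPY = 0.50    # dollars per digital book bought from Superstar
SCAN_COST_CHINA = 10.00         # dollars to scan one book in China
SCAN_COST_STANFORD = 30.00      # dollars to scan one book at Stanford
CATALOG_SIZE = 1_300_000        # titles Superstar has digitized


def total_cost(volumes, unit_cost):
    """Total cost in dollars of obtaining `volumes` books at `unit_cost` each."""
    return volumes * unit_cost


# Assumption: the library acquires the entire catalog.
buy_copies = total_cost(CATALOG_SIZE, COST_PER_DIGITAL_COPY)
scan_in_china = total_cost(CATALOG_SIZE, SCAN_COST_CHINA)
scan_at_stanford = total_cost(CATALOG_SIZE, SCAN_COST_STANFORD)

print(f"Buy digital copies: ${buy_copies:,.0f}")
print(f"Scan in China:      ${scan_in_china:,.0f}")
print(f"Scan at Stanford:   ${scan_at_stanford:,.0f}")
```

Under these assumptions, buying already-scanned copies costs a small fraction of scanning the same number of titles from scratch, which is the point the paragraph is making.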

2. What Happens When Books Connect

The least important, but most discussed, aspects of digital reading have been these contentious questions: Will we give up the highly evolved technology of ink on paper and instead read on cumbersome machines? Or will we keep reading our paperbacks on the beach? For now, the answer is yes to both. Yes, publishers have lost millions of dollars on the long-prophesied e-book revolution that never occurred, while the number of physical books sold in the world each year continues to grow. At the same time, there are already more than half a billion PDF documents on the Web that people happily read on computers without printing them out, and still more people now spend hours watching movies on microscopic cellphone screens. The arsenal of our current display technology ― from handheld gizmos to large flat screens ― is already good enough to move books to their next stage of evolution: a full digital scan.

Yet the common vision of the library's future (even the e-book future) assumes that books will remain isolated items, independent from one another, just as they are on shelves in your public library. There, each book is pretty much unaware of the ones next to it. When an author completes a work, it is fixed and finished. Its only movement comes when a reader picks it up to animate it with his or her imagination. In this vision, the main advantage of the coming digital library is portability ― the nifty translation of a book's full text into bits, which permits it to be read on a screen anywhere. But this vision misses the chief revolution birthed by scanning books: in the universal library, no book will be an island.

Turning inked letters into electronic dots that can be read on a screen is simply the first essential step in creating this new library. The real magic will come in the second act, as each word in each book is cross-linked, clustered, cited, extracted, indexed, analyzed, annotated, remixed, reassembled and woven deeper into the culture than ever before. In the new world of books, every bit informs another; every page reads all the other pages.

In recent years, hundreds of thousands of enthusiastic amateurs have written and cross-referenced an entire online encyclopedia called Wikipedia. Buoyed by this success, many nerds believe that a billion readers can reliably weave together the pages of old books, one hyperlink at a time. Those with a passion for a special subject, obscure author or favorite book will, over time, link up its important parts. Multiply that simple generous act by millions of readers, and the universal library can be integrated in full, by fans for fans.

In addition to a link, which explicitly connects one word or sentence or book to another, readers will also be able to add tags, a recent innovation on the Web but already a popular one. A tag is a public annotation, like a keyword or category name, that is hung on a file, page, picture or song, enabling anyone to search for that file. For instance, on the photo-sharing site Flickr, hundreds of viewers will "tag" a photo submitted by another user with their own simple classifications of what they think the picture is about: "goat," "Paris," "goofy," "beach party." Because tags are user-generated, when they move to the realm of books, they will be assigned faster, range wider and serve better than out-of-date schemes like the Dewey Decimal System, particularly in frontier or fringe areas like nanotechnology or body modification.
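As a rough illustration of how user-assigned tags make items searchable, here is a minimal sketch of a tag index. It is not how Flickr or any real service works internally; the `TagIndex` class and the item identifiers are hypothetical, made up for this example.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical index mapping each tag to the items carrying it."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def tag(self, item, *tags):
        """Attach one or more free-form tags to an item (case-insensitive)."""
        for t in tags:
            self._items_by_tag[t.lower()].add(item)

    def search(self, tag):
        """Return all items carrying the tag, in sorted order."""
        return sorted(self._items_by_tag.get(tag.lower(), set()))


index = TagIndex()
# Different users tag different items with their own words.
index.tag("photo-001", "goat", "Paris")
index.tag("photo-002", "beach party", "goofy")
index.tag("book-nanotech-primer", "nanotechnology", "goofy")

print(index.search("goofy"))
print(index.search("paris"))
```

Because anyone can add a tag at any time, new categories appear as soon as readers need them, with no central classification scheme to revise.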

The link and the tag may be two of the most important inventions of the last 50 years. They get their initial wave of power when we first code them into bits of text, but their real transformative energies fire up as ordinary users click on them in the course of everyday Web surfing, unaware that each humdrum click "votes" on a link, elevating its rank of relevance. You may think you are just browsing, casually inspecting this paragraph or that page, but in fact you are anonymously marking up the Web with bread crumbs of attention. These bits of interest are gathered and analyzed by search engines in order to strengthen the relationship between the end points of every link and the connections suggested by each tag. This is a type of intelligence common on the Web, but previously foreign to the world of books.

Once a book has been integrated into the new expanded library by means of this linking, its text will no longer be separate from the text in other books. For instance, today a serious nonfiction book will usually have a bibliography and some kind of footnotes. When books are deeply linked, you'll be able to click on the title in any bibliography or any footnote and find the actual book referred to in the footnote. The books referenced in that book's bibliography will themselves be available, and so you can hop through the library in the same way we hop through Web links, traveling from footnote to footnote to footnote until you reach the bottom of things.
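The footnote-to-footnote hopping described above amounts to walking a graph in which each book points to the books it cites. A minimal sketch, assuming a made-up citation table (`CITES`) and a hypothetical helper `hop_through`, might look like this:

```python
from collections import deque

# Hypothetical citation data: each title maps to the titles in its bibliography.
CITES = {
    "Book A": ["Book B", "Book C"],
    "Book B": ["Book D"],
    "Book C": ["Book D"],
    "Book D": [],
}


def hop_through(start, cites):
    """Breadth-first walk: visit every book reachable by following citations."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        title = queue.popleft()
        order.append(title)
        for cited in cites.get(title, []):
            if cited not in seen:   # skip books already queued or visited
                seen.add(cited)
                queue.append(cited)
    return order


reading_path = hop_through("Book A", CITES)
print(reading_path)
```

The walk stops when no unvisited citations remain, which is one way to read "until you reach the bottom of things."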

Next come the words. Just as a Web article on, say, aquariums, can have some of its words linked to definitions of fish terms, any and all words in a digitized book can be hyperlinked to other parts of other books. Books, including fiction, will become a web of names and a community of ideas.

Search engines are transforming our culture because they harness the power of relationships, which is all links really are. There are about 100 billion Web pages, and each page holds, on average, 10 links. That's a trillion electrified connections coursing through the Web. This tangle of relationships is precisely what gives the Web its immense force. The static world of book knowledge is about to be transformed by the same elevation of relationships, as each page in a book discovers other pages and other books. Once text is digital, books seep out of their bindings and weave themselves together. The collective intelligence of a library allows us to see things we can't see in a single, isolated book.

When books are digitized, reading becomes a community activity. Bookmarks can be shared with fellow readers. Marginalia can be broadcast. Bibliographies swapped. You might get an alert that your friend Carl has annotated a favorite book of yours. A moment later, his links are yours. In a curious way, the universal library becomes one very, very, very large single text: the world's only book.

3. Books: The Liquid Version

At the same time, once digitized, books can be unraveled into single pages or be reduced further, into snippets of a page. These snippets will be remixed into reordered books and virtual bookshelves. Just as the music audience now juggles and reorders songs into new albums (or "playlists," as they are called in iTunes), the universal library will encourage the creation of virtual "bookshelves" ― a collection of texts, some as short as a paragraph, others as long as entire books, that form a library shelf's worth of specialized information. And as with music playlists, once created, these "bookshelves" will be published and swapped in the public commons. Indeed, some authors will begin to write books to be read as snippets or to be remixed as pages. The ability to purchase, read and manipulate individual pages or sections is surely what will drive reference books (cookbooks, how-to manuals, travel guides) in the future. You might concoct your own "cookbook shelf" of Cajun recipes compiled from many different sources; it would include Web pages, magazine clippings and entire Cajun cookbooks. Amazon currently offers you a chance to publish your own bookshelves (Amazon calls them "listmanias") as annotated lists of books you want to recommend on a particular esoteric subject. And readers are already using Google Book Search to round up minilibraries on a certain topic ― all books about Sweden, for instance, or books on clocks. Once snippets, articles and pages of books become ubiquitous, shuffle-able and transferable, users will earn prestige and perhaps income for curating an excellent collection.

Libraries (as well as many individuals) aren't eager to relinquish ink-on-paper editions, because the printed book is by far the most durable and reliable backup technology we have. Printed books require no mediating device to read and thus are immune to technological obsolescence. Paper is also extremely stable, compared with, say, hard drives or even CD's. In this way, the stability and fixity of a bound book is a blessing. It sits there unchanging, true to its original creation. But it sits alone.

So what happens when all the books in the world become a single liquid fabric of interconnected words and ideas? Four things: First, works on the margins of popularity will find a small audience larger than the near-zero audience they usually have now. Far out in the "long tail" of the distribution curve ― that extended place of low-to-no sales where most of the books in the world live ― digital interlinking will lift the readership of almost any title, no matter how esoteric. Second, the universal library will deepen our grasp of history, as every original document in the course of civilization is scanned and cross-linked. Third, the universal library of all books will cultivate a new sense of authority. If you can truly incorporate all texts ― past and present, multilingual ― on a particular subject, then you can have a clearer sense of what we as a civilization, a species, do know and don't know. The white spaces of our collective ignorance are highlighted, while the golden peaks of our knowledge are drawn with completeness. This degree of authority is only rarely achieved in scholarship today, but it will become routine.

Finally, the full, complete universal library of all works becomes more than just a better Ask Jeeves. Search on the Web becomes a new infrastructure for entirely new functions and services. Right now, if you mash up Google Maps and Monster.com, you get maps of where jobs are located by salary. In the same way, it is easy to see that in the great library, everything that has ever been written about, for example, Trafalgar Square in London could be present on that spot via a screen. In the same way, every object, event or location on earth would "know" everything that has ever been written about it in any book, in any language, at any time. From this deep structuring of knowledge comes a new culture of interaction and participation.

The main drawback of this vision is a big one. So far, the universal library lacks books. Despite the best efforts of bloggers and the creators of the Wikipedia, most of the world's expertise still resides in books. And a universal library without the contents of books is no universal library at all.

There are dozens of excellent reasons that books should quickly be made part of the emerging Web. But so far they have not been, at least not in great numbers. And there is only one reason: the hegemony of the copy.
