The Toyo Bunko (Oriental Library) is a leading library in the field of Asian studies. Its collection amounts to 880,000 books of historical importance. An especially interesting part of it is the "Morrison Library," which consists of 24,000 books about China and Asia written in several European languages. Given its relevance, scale and coverage, we decided to start our digital archive project with the Morrison Library, and began digitizing its precious books in 2002.
We pursue two research directions. The first concerns the application of optical character recognition (OCR), machine translation and image processing to the automatic analysis of digitized documents. This direction is motivated by the need to manage a large number of books; that is, in the tradeoff between speed and precision we put more emphasis on speed, in order to increase the number of books in the digital archive. Although current OCR technology is imperfect, even imperfect results can support useful search. The second direction is a collaborative annotation environment for digital cultural resources. We begin with closed annotation by domain experts, but in the future we aim to establish a mechanism for soliciting collective annotation in a collaborative environment.
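To illustrate why imperfect OCR output can still support useful search, the following is a minimal sketch of approximate word matching. It is not the project's actual search engine: the function name `fuzzy_find`, the similarity threshold of 0.8, and the sample page text (with a typical OCR confusion of "m" as "rn") are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def fuzzy_find(query, ocr_text, threshold=0.8):
    """Return (word, score) pairs from ocr_text that approximately match query.

    Hypothetical helper: each word of the OCR text is compared with the
    query using difflib's similarity ratio (a value between 0 and 1).
    """
    query = query.lower()
    hits = []
    for word in re.findall(r"\w+", ocr_text.lower()):
        score = SequenceMatcher(None, query, word).ratio()
        if score >= threshold:
            hits.append((word, round(score, 2)))
    return hits


# Invented sample page: the OCR step has misread "m" as "rn".
page = "The caravan reached Sarnarkand after crossing the desert."

# An exact substring search misses the misrecognized word,
# while the approximate search still finds it.
print("samarkand" in page.lower())
print(fuzzy_find("samarkand", page))
```

An exact lookup fails on the misread word, whereas the approximate match recovers it, at the cost of possible false positives when the threshold is set too low.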
The user interface of the Toyo Bunko Portal is designed to suggest as many navigational links as possible, giving users more opportunities to browse deep into the archive. Using our proposed database engine, we also provide mechanisms for defining the various contexts in which the digital archives are to be viewed. Finally, this project is carried out within the framework of the Digital Silk Road Project.
We added 53 new books, bringing the collection to 203 books and 59,358 pages.