The Digital Paper Archive is a digital archiving system for printed material such as books and newspapers. The system processes and archives the material using an OCR engine and compresses the scanned images. The scanned text is marked up in XML and indexed in parallel. Search results use the OCR data to return the relevant portions of the scanned images rather than plain text. This format allows the relevant page snippet to be displayed in high resolution so that the relevance of a result can be verified. A low-resolution thumbnail can be displayed automatically, while automated links to a payment system are in place for downloading the high-resolution scanned images. The system was designed from the ground up with publishers in mind and has been developed with input from major publishers.
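
As an illustration of how OCR data can drive image-based search results, the sketch below assumes that the OCR engine records a pixel bounding box for every recognised word. It then computes the page region that encloses the words matching a query, which is the region that would be cropped and displayed as a snippet. The names `OcrWord` and `snippet_box`, the padding value and the sample coordinates are hypothetical and are not taken from the Digital Paper Archive itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One recognised word and its pixel bounding box (hypothetical layout)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def snippet_box(words, query, page_width, page_height, padding=20):
    """Return (left, top, right, bottom) enclosing all words that match a
    query term, padded and clamped to the page, or None if nothing matches."""
    terms = {term.lower() for term in query.split()}
    # Compare case-insensitively and ignore punctuation attached to a word.
    hits = [w for w in words if w.text.lower().strip(".,;:!?\"'()") in terms]
    if not hits:
        return None
    left = max(0, min(w.left for w in hits) - padding)
    top = max(0, min(w.top for w in hits) - padding)
    right = min(page_width, max(w.right for w in hits) + padding)
    bottom = min(page_height, max(w.bottom for w in hits) + padding)
    return (left, top, right, bottom)


# Made-up OCR output for a single line of a scanned page.
page_words = [
    OcrWord("Valletta", 100, 50, 220, 80),
    OcrWord("harbour", 230, 50, 330, 80),
    OcrWord("opened", 340, 50, 430, 80),
]

box = snippet_box(page_words, "valletta harbour", page_width=1000, page_height=1400)
print(box)
```

The returned box could be used to crop the high-resolution scan for verification, or scaled down to locate the same region on a low-resolution thumbnail.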

Our research on this topic focuses on achieving high-quality search results using our innovative patent-pending technology, processing and managing very large amounts of data using clusters of inexpensive computers, utilising data mining techniques to associate related results, and experimenting with new machine translation techniques. Information extraction techniques are also being developed to create networks of related persons, events and place names automatically. A multilingual query translation system is also being developed to allow multilingual material to be retrieved.
Digital Paper Archive
For further information on this product or a detailed quotation, kindly email us at info@maltalinks.com or refer to the Contact Us page.