The Digital Paper Archive is a digital archiving system for printed material such as books and newspapers. The system runs the scanned pages through an OCR engine and compresses the scanned images for archiving. The recognised text is marked up in XML and indexed in parallel.

The search results use the OCR data to return the relevant portions of the scanned images rather than plain text. This format lets the relevant page snippet be displayed in high resolution, so that the relevance of a result can be verified. A low-resolution thumbnail can be displayed automatically, and automated links to a payment system are in place for downloading the high-resolution scanned images. The system was designed from the ground up with publishers in mind and has been developed with input from major publishers.
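As a rough illustration of how OCR word positions can turn a text match into an image snippet, the following is a minimal sketch and not the product's actual implementation. It assumes the OCR output is available as a list of words with pixel bounding boxes. The names `OcrWord` and `snippet_box`, the margin and the page size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One recognised word and its bounding box in page pixels (assumed layout)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def snippet_box(words, query, margin=20, page_size=(2480, 3508)):
    """Return a crop box (left, top, right, bottom) covering all words that
    match a query term, padded by `margin` and clamped to the page.
    Returns None when no word matches."""
    terms = {term.lower() for term in query.split()}
    # Compare case-insensitively, ignoring punctuation attached to the word.
    hits = [w for w in words if w.text.lower().strip(".,;:!?\"'") in terms]
    if not hits:
        return None
    width, height = page_size
    return (
        max(0, min(w.left for w in hits) - margin),
        max(0, min(w.top for w in hits) - margin),
        min(width, max(w.right for w in hits) + margin),
        min(height, max(w.bottom for w in hits) + margin),
    )


# Hypothetical OCR output for part of one scanned line.
page = [
    OcrWord("Valletta", 100, 200, 260, 240),
    OcrWord("harbour", 270, 200, 400, 240),
    OcrWord("opens", 410, 200, 500, 240),
]
print(snippet_box(page, "valletta harbour"))
```

The returned box could then be passed to an image library to crop the scanned page, at low resolution for the thumbnail and at full resolution for the paid download.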

Our research on this topic focuses on achieving high-quality search results using our innovative patent-pending technology, processing and managing very large amounts of data on clusters of inexpensive computers, using data mining techniques to associate related results, and experimenting with new machine translation techniques. Information extraction techniques are also being developed to automatically create networks of related persons, events and place names. A multilingual query translation system is also being developed to allow multilingual material to be retrieved.

 
 

Digital Paper Archive

For further information on this product or a detailed quotation, kindly email us at info@maltalinks.com, or refer to the contact us page for more details.

 

 
 