The information extraction/screen scraper tool is an innovative application in a new and exciting area. Its design was influenced by real but as yet unaddressed needs in Internet-based applications such as spam filtering, automated content linking and enhancement, and keyword-based advertising, as well as by academic work on the semantic web and advanced search technology.

There are three main phases in the information extraction/screen scraper process:

  • Automated identification of suitable target phrases, keywords or classes of concepts.
  • Automated learning and refinement of the identification process, initially guided and monitored by a human supervisor and using a small document collection for testing and verification.
  • Automated information extraction over a large document collection, using search engine crawling components together with the results of the learning process.

The goal of the information extraction/screen scraper is to automatically identify personal names, company names, meeting locations, place names and similar items mentioned in large collections of diverse, unstructured text. The presentation and content of a particular text will also be learnt automatically, and the resulting identification mechanism is used to pick out particular phrases or keywords in a piece of text regardless of how the content is laid out.
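As a minimal illustration of layout-independent identification, the Python sketch below flattens an HTML page to plain text with the standard library's `html.parser`, so that a name split across table cells or inline tags becomes one run of text, and then applies hand-written regular expressions. The `identify` and `to_plain_text` functions, the courtesy titles and the company suffixes are hypothetical assumptions for illustration; the tool itself is described as learning its rules rather than using fixed patterns like these.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores tags, so page layout does not matter."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def to_plain_text(html: str) -> str:
    """Flatten HTML to one line of text with single spaces between text nodes."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return " ".join(" ".join(collector.parts).split())


# Hypothetical hand-written patterns, for illustration only.
CAP = r"[A-Z][a-z]+"
PATTERNS = {
    # A courtesy title followed by one or more capitalised words.
    "person": re.compile(rf"\b(?:Mr|Mrs|Ms|Dr)\.? ({CAP}(?: {CAP})*)"),
    # Capitalised words ending in a company suffix.
    "organisation": re.compile(rf"\b((?:{CAP} )+(?:Ltd|Inc|plc))\b"),
}


def identify(html: str) -> list[tuple[str, str]]:
    """Return (label, phrase) pairs found in the flattened text."""
    text = to_plain_text(html)
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(1)))
    return found
```

Hand-written patterns like these break easily (a capitalised word that happens to follow a name is swallowed into it, for example), which is the kind of gap the supervised learning phase described next is meant to address.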

Users will then indicate which terms and pieces of information are interesting and relevant, using a small sample collection of typical documents. From these examples, the system will automatically learn and construct rules that enable it to identify the same kinds of information, eventually matching the user's exact needs.
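One very simple way to turn user-marked examples into rules is to record the word that immediately precedes each marked phrase and keep the frequent ones as trigger words. The Python sketch below does this; the names `learn_triggers` and `apply_triggers`, the one-word left context and the `min_count` threshold are all assumptions chosen for illustration, not the rule-learning method the product actually uses.

```python
import re
from collections import Counter

# A run of capitalised words, e.g. "Valletta" or "St Julians".
CAP_RUN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def learn_triggers(samples: list[tuple[str, list[str]]], min_count: int = 2) -> set[str]:
    """Count the word immediately before each phrase the user marked.

    samples holds (text, marked_phrases) pairs; words seen as the left
    context at least min_count times become trigger rules.
    """
    counts = Counter()
    for text, phrases in samples:
        for phrase in phrases:
            for match in re.finditer(re.escape(phrase), text):
                before = text[: match.start()].split()
                if before:
                    counts[before[-1].lower()] += 1
    return {word for word, n in counts.items() if n >= min_count}


def apply_triggers(text: str, triggers: set[str]) -> list[str]:
    """Return capitalised phrases that directly follow a learned trigger word."""
    results = []
    for match in CAP_RUN.finditer(text):
        before = text[: match.start()].split()
        if before and before[-1].lower() in triggers:
            results.append(match.group())
    return results
```

For example, marking "Valletta" in "The meeting will be held in Valletta on Monday." and "Mdina" in "Delegates gathered in Mdina last year." yields the trigger "in", which then picks out "Sliema" from "The next meeting is in Sliema, near Gzira." Adding more marked samples lets rarely seen contexts fall below `min_count`, a crude form of the refinement step.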

Once users are satisfied with the system's performance, large document collections, whether stored internally on a corporate database system or collected from the Internet by the Maltalinks search engine's document crawling components, will be processed and the required information extracted automatically.
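The large-scale phase amounts to streaming every document through whatever extractor the learning phase produced and storing the results. The Python sketch below assumes the collection is a local directory of `.txt`/`.html` files and writes one JSON record per line; the directory layout, the file types, the `extract_collection` name and the demo extractor are hypothetical, and crawling or database access is left out.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Callable, Iterator

# Any function mapping document text to a list of extracted strings.
Extractor = Callable[[str], list[str]]

SUFFIXES = {".txt", ".htm", ".html"}


def iter_documents(root: Path) -> Iterator[tuple[str, str]]:
    """Yield (path, text) pairs, reading one document at a time."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUFFIXES:
            yield str(path), path.read_text(encoding="utf-8", errors="replace")


def extract_collection(root: Path, extractor: Extractor, out_path: Path) -> int:
    """Run extractor over every document and write one JSON record per line."""
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for source, text in iter_documents(root):
            record = {"source": source, "entities": extractor(text)}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical extractor: "Dr." followed by a capitalised surname.
    def find_doctors(text: str) -> list[str]:
        return re.findall(r"\bDr\. [A-Z][a-z]+", text)

    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "docs"
        docs.mkdir()
        (docs / "a.txt").write_text("Dr. Borg chaired the meeting.", encoding="utf-8")
        (docs / "b.html").write_text("<p>No names here.</p>", encoding="utf-8")
        print(extract_collection(docs, find_doctors, Path(tmp) / "results.jsonl"))
```

Writing JSON lines keeps each document's result independent, so a run over a very large collection can be inspected or resumed without loading everything into memory.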

 
 

Information Extraction

For further information on this product or a detailed quotation, kindly email us at info@maltalinks.com or refer to the Contact Us page.

 

 
 