The information extraction/screen scraper tool addresses
a new application area. The ideas behind it have been
shaped by real, yet unaddressed, needs in various
Internet-based applications such as spam filtering,
automated content linking and enhancement, and
keyword-based advertising, together with more academic
work on the semantic web and advanced search technology.
There are three main phases in the information
extraction/screen scraper process:
- Automated identification of suitable target
  phrases, keywords or classes of concepts.
- Automated learning and refinement of the
  identification process, initially guided and monitored
  through human supervision, using a small document
  collection for testing and verification.
- Automated information extraction over a large
  document collection, using search engine crawling
  components together with the results of the learning
  process.
The goal of the information extraction/screen
scraper is to automatically identify personal names,
corporation names, meeting locations, place names
and similar items mentioned in large collections of
diverse, unstructured text. The presentation and
content of a particular text will also be learnt
automatically. This identification mechanism is then
used to find particular phrases or keywords in a piece
of text, regardless of how the content is laid out.
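To illustrate what layout-independent identification can look like, here is a minimal Python sketch. It is not the tool's actual mechanism, which is not described here. It assumes the documents are HTML and uses a deliberately naive, hypothetical heuristic (two or more consecutive capitalised words) to pick out candidate names; the names `TextOnly`, `plain_text` and `candidate_names` are invented for this example.

```python
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects text nodes and discards all markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Note: text inside <script>/<style> is also delivered here;
        # a fuller version would skip those elements.
        self.parts.append(data)


def plain_text(html):
    """Reduce an HTML fragment to whitespace-normalised text."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapsing whitespace removes differences caused by layout.
    return " ".join(" ".join(parser.parts).split())


# Hypothetical heuristic: two or more consecutive capitalised words.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def candidate_names(html):
    """Return candidate name phrases found in the document text."""
    return CANDIDATE.findall(plain_text(html))


paragraph = "<p>Meeting with <b>John Smith</b> in Valletta.</p>"
table = ("<table><tr><td>Meeting with</td><td>John Smith</td></tr>"
         "<tr><td>in Valletta.</td></tr></table>")
# Both layouts reduce to the same text, so they yield the same candidates.
print(candidate_names(paragraph) == candidate_names(table))
```

Because the markup is discarded before matching, the same phrase is found whether it appears in a paragraph or in a table cell. A real system would replace the capitalisation heuristic with learnt rules.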
Users will then indicate which terms and pieces
of information are interesting and relevant, using
a small sample collection of typical documents. The
system will then automatically learn and construct
rules that enable it to identify these pieces of
information, eventually matching the needs
of the user.
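As a rough illustration of learning rules from user-marked examples, the Python sketch below learns which words tend to precede the marked terms and then uses those words to extract new terms. This is an assumption made for illustration only: the rule form, the `min_count` threshold and the function names `learn_contexts` and `extract` are hypothetical and do not describe the product's learning method.

```python
import re
from collections import Counter


def learn_contexts(examples, min_count=2):
    """Learn trigger words from (text, marked_term) pairs.

    A word becomes a rule if it immediately precedes a marked term
    in at least `min_count` examples.
    """
    counts = Counter()
    for text, term in examples:
        for match in re.finditer(re.escape(term), text):
            before = text[:match.start()].split()
            if before:
                counts[before[-1].lower()] += 1
    return {word for word, n in counts.items() if n >= min_count}


def extract(text, contexts):
    """Return capitalised phrases that follow a learnt trigger word."""
    tokens = text.split()
    found = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in contexts:
            j = i + 1
            # Consume the run of capitalised tokens after the trigger.
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
                if tokens[j - 1][-1] in ".,;:":
                    break  # punctuation ends the phrase
            if j > i + 1:
                found.append(" ".join(tokens[i + 1:j]).strip(".,;:"))
            i = j
        else:
            i += 1
    return found


# A small, made-up sample collection with user-marked terms.
examples = [
    ("Meeting with John Smith on Monday", "John Smith"),
    ("Lunch with Mary Borg tomorrow", "Mary Borg"),
    ("Email from Acme Ltd", "Acme Ltd"),
]
rules = learn_contexts(examples)
print(extract("Dinner with Paul Vella tonight.", rules))
```

Here "with" is seen twice before a marked term and becomes a rule, while "from" is seen only once and is ignored. Marking more examples would let the user refine the rule set until the output matches what they need.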
When users are satisfied with the system's performance,
large collections of documents (either stored
internally on a corporate database system or collected
from the Internet using the Maltalinks search engine
document crawling components) will be processed
and the required information extracted automatically.
Information Extraction
For further information on this product or a
detailed quotation, please email us at info@maltalinks.com
or refer to the Contact Us page.