Aims of this talk
- Demonstrate using GATE to automate Semantic Web (SW) tasks such as semantic annotation and ontology learning from text
- SARDINE: pattern-based relation extraction in the fisheries domain
  - Adding new concepts and instances to the ontology
  - Finding relations between existing concepts in the ontology
- SPRAT: generic version of SARDINE
Recap: IE for the Semantic Web
- Traditional IE is based on a flat structure, e.g. recognising Person, Location, Organisation, Date, Time etc.
- For the Semantic Web, we need information in a hierarchical structure
- The idea is that we attach annotations to the documents, pointing to concepts in an ontology
- Information can be exported as an ontology annotated with instances
The NeOn project
- NeOn (Networking Ontologies) is a 4-year, 14.7 million Euro EU project involving 14 European partners
- Focuses on using ontologies for large-scale semantic applications in distributed organizations
- Handles multiple networked ontologies that exist in a particular context, are created collaboratively, and might be highly dynamic and constantly evolving
ODd SOFAS
The Food and Agriculture Organization of the UN has odd sofas...
Wall climbing sofa
FAO Case Study
- Actually, it’s nothing to do with sofas, or any kind of seating
- They do, however, have an Ontology-Driven Stock Over-Fishing Alert System (ODd SOFAS)
- Focuses on the agricultural sector and information management for hunger prevention
- The case study aims at managing alerts to avoid over-fishing in already stretched global waters
- The role of GATE is to analyse textual resources to find new information, such as new fish names and relations between ontology elements, e.g. “Atlantic cod are fished in the Gulf of Maine”
SARDINE
SARDINE identifies mentions of fish species in text. It identifies:
- existing fish names listed in the ontology and their morphological variants
- potential new fish names not listed in the ontology
- potential relations between fish names
For new fish, it attempts to classify them in the ontology, based on linguistic information such as synonyms and hyponyms of existing fish.
It may also generate properties for existing fish in the ontology.
Synonyms:
- mummichogs (fundulus heteroclitus)
Names appearing in lists:
- “plankton, herring and clams....”
- “clams, herring and other types of fish”
More specific fish names:
- Japanese flounder
- Red salmon
- Suberites sponges
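The synonym and list cues above can be approximated with plain regular expressions. This is only a minimal sketch in Python, not SARDINE's implementation (SARDINE uses JAPE grammars over GATE annotations); the pattern names `SYNONYM` and `OTHER_TYPES` and both functions are hypothetical, and each name is assumed to be a single word.

```python
import re

# Hypothetical patterns for illustration; names are assumed to be single words.
SYNONYM = re.compile(r"(\w+) \(([^)]+)\)")
OTHER_TYPES = re.compile(r"((?:\w+, )*\w+) and other types of (\w+)")

def find_synonyms(text):
    """Return (name, synonym) pairs for 'name (synonym)' mentions."""
    return [(m.group(1), m.group(2)) for m in SYNONYM.finditer(text)]

def find_hyponyms(text):
    """Return (hyponym, hypernym) pairs for 'A, B and other types of C'."""
    pairs = []
    for m in OTHER_TYPES.finditer(text):
        for item in m.group(1).split(", "):
            pairs.append((item, m.group(2)))
    return pairs

print(find_synonyms("mummichogs (fundulus heteroclitus)"))
print(find_hyponyms("clams, herring and other types of fish"))
```

A real pipeline would match over noun phrases and part-of-speech tags rather than raw strings, which is what the JAPE rules do.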
Example of JAPE rule (1)
Example: “Suberites sponges” (where “sponge” is a known class)
Rule: AdjClass
(
  ({Token.category == JJ})
  ({Class}):super
):sub
-->
  :sub.SardineSubclass = {rule=AdjClass},
  :super.SardineSuperclass = {rule=AdjClass},
  … … …
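For readers unfamiliar with JAPE, the logic of the AdjClass rule can be mimicked outside GATE. The sketch below is a rough Python analogue under stated assumptions: tokens arrive as hypothetical (word, POS tag, root) triples, and `known_classes` stands in for the Class annotations that GATE would produce from the ontology.

```python
def adj_class(tokens, known_classes):
    """Find 'adjective + known class' pairs.

    tokens: list of (word, pos, root) triples (an assumed input format).
    Returns (subclass phrase, superclass root) pairs.
    """
    pairs = []
    # Walk over adjacent token pairs, like the two-element JAPE pattern.
    for (word, pos, _), (next_word, _, next_root) in zip(tokens, tokens[1:]):
        if pos == "JJ" and next_root in known_classes:
            pairs.append((f"{word} {next_word}", next_root))
    return pairs

tokens = [("Suberites", "JJ", "suberites"), ("sponges", "NNS", "sponge")]
print(adj_class(tokens, {"sponge"}))
```

The whole two-token phrase plays the role of the `:sub` binding and the second token the role of `:super`.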
Example of JAPE rule (2)
Example: “Frogs are a kind of amphibian.”
Rule: Subclass1
(
  ({NP}):sub
  (
    {Lookup.minorType == be}
    {Token.category == DT}
    {Lookup.majorType == kind}
  )
  ({NP}):super
)
-->
… …
Annotated text in GATE
Augmenting the Ontology
- The new classes found are linked to existing classes in the ontology
- For existing fish, and for new fish identified as a synonym or hyponym of an existing fish, the link is to an existing ontology instance
- When no link to an existing fish is identified, a new concept is created
- Changes to the ontology are stored and can be verified later by human experts
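The decision logic on this slide might be sketched as follows. This is an illustrative Python model only: the `Ontology` class, its `add_mention` method and the change-tuple format are hypothetical and do not correspond to any GATE or NeOn API.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    concepts: set = field(default_factory=set)
    changes: list = field(default_factory=list)  # change log for later expert review

    def add_mention(self, name, relation=None, target=None):
        """Link a mention to an existing concept, or create a new concept."""
        if name in self.concepts:
            change = ("link_existing", name, name)
        elif relation in ("synonym", "hyponym") and target in self.concepts:
            change = (f"link_{relation}", name, target)
        else:
            # No link to any existing fish was found: create a new concept.
            self.concepts.add(name)
            change = ("new_concept", name, None)
        self.changes.append(change)
        return change

onto = Ontology(concepts={"flounder"})
print(onto.add_mention("Japanese flounder", "hyponym", "flounder"))
print(onto.add_mention("mummichog"))
```

Keeping every change in a log rather than applying it silently is what lets human experts verify the additions afterwards.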
Generated “animal” ontology
Recognising components from the ontology
- In addition to the standard IE components, we use some special ontology components
- The OntoRootGazetteer enables us to match words or phrases in the text with classes, instances or properties in an ontology, in any morphological variant
- Morphological analysis is performed on both text and ontology, then matching is done between the two at the root level
- Text is annotated with features containing the root and original string(s)
- When new elements are added to the ontology, these features can be used to regenerate alternative forms
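Root-level matching can be illustrated with a toy example. The sketch below is not the OntoRootGazetteer: `root` is a deliberately crude, hypothetical plural stripper standing in for a real morphological analyser, and the annotation dictionaries are an assumed format.

```python
def root(word):
    """Toy stemmer: lower-case the word and strip a plural ending."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("es") and w[:-2].endswith(("s", "x", "ch", "sh")):
        return w[:-2]
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w

def annotate(text_words, ontology_labels):
    """Match text words to ontology labels by comparing roots on both sides."""
    index = {root(label): label for label in ontology_labels}
    annotations = []
    for word in text_words:
        r = root(word)
        if r in index:
            # Keep both the root and the original string, as on the slide.
            annotations.append({"string": word, "root": r, "resource": index[r]})
    return annotations

print(annotate(["Sponges", "eat", "plankton"], ["sponge", "Fish"]))
```

Because both sides are reduced to roots, "Sponges" in the text still matches the ontology label "sponge".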
Modifying the ontology
- We developed a special GATE plugin called NEBOnE (Named Entity Based ONtology Editor)
- This reuses technology taken from CLOnE (Controlled Language ONtology Editor)
- CLOnE is designed to create new classes, instances etc. from raw (controlled) text generated by the user
- NEBOnE enables changes to be made to the ontology based on information extraction from input texts (e.g. web pages) in natural language
- Morphological analysis enables both root forms and variants to be added to the ontology (as properties), along with other variants (e.g. capitalisation)
Finding relations between known elements
- In this case study, we use existing information from the ontology to find relations between known elements, e.g. fish species -- gear type
- We have already annotated all fish species, gear types, fishing areas and so on in the text, based on ontology lookup
- A JAPE grammar first finds the subject of the document (a gear type) and adds the information as a document feature
- When a species name is found, we create a new annotation for the relation “gear_used”, with one property denoting the species and another denoting the ID number of the gear
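The two-step procedure on this slide (find the document subject, then relate each species to it) could look roughly like this in Python. The annotation dictionaries, the `Gear` and `Species` type names and the choice of the first gear mention as the document subject are all assumptions for illustration; the slide does not say how the subject is chosen.

```python
def find_gear_relations(annotations):
    """Build 'gear_used' relations from annotations given in document order."""
    # Assumption: the first gear mention is the subject of the document.
    subject = next((a for a in annotations if a["type"] == "Gear"), None)
    if subject is None:
        return []
    relations = []
    for a in annotations:
        if a["type"] == "Species":
            relations.append({
                "type": "gear_used",
                "species": a["name"],
                "gear_id": subject["id"],
                "gear_type": subject["name"],
            })
    return relations

doc = [
    {"type": "Gear", "name": "trawl", "id": 7},
    {"type": "Species", "name": "Atlantic cod"},
]
print(find_gear_relations(doc))
```

Each resulting record carries the species, gear ID and gear type, so a relation can be inspected without going back to the ontology.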
Viewing relations
Using ANNIC to view results
- By running our application on a Lucene datastore, we can then use ANNIC to view the results
- Search for the pattern consisting of the name of the relation annotation (in this case “gear_used”)
- Show the relevant features (species, gear ID, gear type)
SPRAT
- This is a generic version of SARDINE that runs on all kinds of texts, not just fisheries
- Does not require a seed ontology
- Useful for building a domain ontology from scratch
- Tested on Wikipedia pages
How well can we do it?
- Traditional NE recognition on news texts: ~90% precision/recall
- Ontology-based information extraction on news texts: ~80% precision/recall
- Pattern-based relation extraction on Wikipedia texts: high precision but low recall (or vice versa, depending on setup)
- Relation finding between known entities: ~90% precision/recall
More information
- NeOn Project: http://www.neon-project.org
- The NeOn Toolkit is freely available: http://www.neon-toolkit.org
- The SARDINE application can be downloaded from the GATE website: http://gate.ac.uk/projects/neon/sardine