Introduction
- WordNet: a lexical database
- Supports searching the dictionary conceptually
- Uses a different organizing principle for each syntactic category
- Synsets (synonym sets) are the basic building blocks
- A lexical knowledge base is the heart of any intelligent information processing system
WordNet for Hindi
Hindi WordNet is an online lexical database for the Hindi language.
Unique features:
- Graded antonyms and meronymy relationships
- Efficient underlying database design
- Cross-part-of-speech linkage
Semantic relations in WordNet
- Synonymy
- Hypernymy / Hyponymy
- Antonymy
- Meronymy / Holonymy
- Gradation
- Entailment
- Troponymy
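Synonymy
- True synonyms are rare
- Synonymy is relative to a context
- {घर, कमरा} (house in the sense of a room)
- {घर, आवास} (house in the sense of a dwelling)
- {घर, जन्मकुंडलीय स्थान} (house in the sense of a position in a horoscope)
- {घर, स्वदेश} (home in the sense of one's own country)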
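The context-dependence of synonymy can be illustrated with a minimal sketch in which one word belongs to several synsets, one per sense. The synset ids and the romanized spellings below are hypothetical placeholders chosen for illustration; they are not taken from the Hindi WordNet database.

```python
from collections import defaultdict

# Hypothetical synset ids and romanized spellings (assumptions for illustration).
SYNSETS = {
    1: {"ghar", "kamra"},                 # room
    2: {"ghar", "aavaas"},                # dwelling
    3: {"ghar", "janmakundaliya sthan"},  # position in a horoscope
    4: {"ghar", "svadesh"},               # one's own country
}


def build_index(synsets):
    """Map each word to the set of synset ids it belongs to."""
    index = defaultdict(set)
    for synset_id, words in synsets.items():
        for word in words:
            index[word].add(synset_id)
    return index


INDEX = build_index(SYNSETS)


def senses(word):
    """Return the sorted synset ids (senses) of a word."""
    return sorted(INDEX.get(word, set()))


def are_synonyms(a, b):
    """Two words are synonyms in some context if they share a synset."""
    return bool(INDEX.get(a, set()) & INDEX.get(b, set()))
```

Under these assumptions, `are_synonyms("ghar", "kamra")` holds while `are_synonyms("kamra", "svadesh")` does not, since the two words never share a synset.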
Semantic Relations: Hypernymy and Hyponymy
- Relation between word meanings (synsets)
- X is a hyponym of Y if X is a kind of Y
- Hyponymy is transitive and asymmetric
- Hypernymy is the inverse of hyponymy
- Example: lion → animal → living entity → entity (शेर → पशु → सजीव → अस्तित्व)
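The transitivity and asymmetry of hyponymy can be sketched by walking the chain from the example above. The dictionary is a simplification assumed for illustration: it gives each entry a single hypernym, whereas a real synset may have several.

```python
# Direct hypernym of each entry, taken from the lion example.
# Single-parent mapping is an assumption made to keep the sketch short.
HYPERNYM = {
    "lion": "animal",
    "animal": "living entity",
    "living entity": "entity",
}


def hypernym_chain(word):
    """Follow hypernym links from word up to the root."""
    chain = [word]
    seen = {word}
    while chain[-1] in HYPERNYM:
        parent = HYPERNYM[chain[-1]]
        if parent in seen:
            raise ValueError("cycle in hypernym links at " + parent)
        chain.append(parent)
        seen.add(parent)
    return chain


def is_kind_of(x, y):
    """True if y is reachable from x through hypernym links (transitive)."""
    return x != y and y in hypernym_chain(x)
```

Here `is_kind_of("lion", "entity")` is true through two intermediate links, while `is_kind_of("entity", "lion")` is false, which reflects the asymmetry of the relation.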
Semantic Relations: Antonymy
- Oppositeness in meaning
- Relation between word forms
Meronymy and Holonymy
- Part-whole relation, e.g. a branch is a part of a tree
- X is a meronym of Y if X is a part of Y
- Meronymy is transitive and asymmetric
- Holonymy is the inverse of meronymy
Troponym and Entailment
Entailment
- {खर्राटा लेना – सोना} (snoring entails sleeping)
Troponym
- {लँगड़ाना, कदमताल करना – चलना} (limping and marching are manners of walking)
- {फुसफुसाना – बोलना} (whispering is a manner of speaking)
Antonymy Relation
Meronymy Relation
Gradation
Classification of verbs
- Simple verbs (सरल क्रिया): सोना, खाना (to sleep, to eat)
- Conjunct verbs (संयुक्त क्रिया)
- Compound verbs (सामासिक क्रिया): खाना-पीना (eating and drinking)
- Causative verbs (प्रेरणात्मक क्रिया): सुलवाना (to have someone put to sleep)
Basic relations, or lexical links, hold between synonym sets.
The lexical database is stored in MySQL.
Sub-tasks identified:
- Database design
- Data entry interface
- Implementation of the Organizer utility
- Application programs to access and display the information in the lexical database
Data Entry Interface
- GUI designed in Java/JFC
- Automatic generation of synset IDs
- Screen to view the entered data
Organizer Utility
- Designed to preprocess the data
- Reflexive pointers are generated automatically, e.g. if A is entered as a hypernym of B, then "B is a hyponym of A" is generated
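The generation of reflexive pointers can be sketched as follows. The slides state that the Organizer utility is written in Perl; this Python version is only an illustration, and the triple format and the `INVERSE` table are assumptions rather than the utility's actual data structures.

```python
# Assumed inverse pairs; only the relations named in the slides are listed.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
}


def add_reflexive_pointers(pointers):
    """Return the pointers plus their automatically generated inverses.

    Each pointer is a triple (a, relation, b), read as "a is a <relation> of b".
    """
    result = set(pointers)
    for source, relation, target in pointers:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            result.add((target, inverse, source))
    return result
```

For example, entering only `("animal", "hypernym", "lion")` yields the additional pointer `("lion", "hyponym", "animal")`, so a data entry operator needs to record each link in one direction only.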
- Each semantic relation is mapped to a separate (normalized) table
- Font conversion: Roman Hindi to DV-TTYogesh
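A one-table-per-relation layout with automatically generated synset IDs might look like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL back end, and every table and column name is hypothetical, since the slides do not show the actual schema.

```python
import sqlite3

# Hypothetical schema: one table for synsets, one for their words,
# and one separate table per semantic relation.
SCHEMA = """
CREATE TABLE synset (
    synset_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pos       TEXT NOT NULL
);
CREATE TABLE word (
    synset_id INTEGER NOT NULL REFERENCES synset(synset_id),
    lemma     TEXT NOT NULL,
    PRIMARY KEY (synset_id, lemma)
);
CREATE TABLE hypernymy (
    hyponym_id  INTEGER NOT NULL REFERENCES synset(synset_id),
    hypernym_id INTEGER NOT NULL REFERENCES synset(synset_id),
    PRIMARY KEY (hyponym_id, hypernym_id)
);
CREATE TABLE meronymy (
    part_id  INTEGER NOT NULL REFERENCES synset(synset_id),
    whole_id INTEGER NOT NULL REFERENCES synset(synset_id),
    PRIMARY KEY (part_id, whole_id)
);
"""


def create_schema(conn):
    """Create the synset, word, and per-relation tables."""
    conn.executescript(SCHEMA)


def add_synset(conn, pos, lemmas):
    """Insert a synset and its words; the database generates the id."""
    cursor = conn.execute("INSERT INTO synset (pos) VALUES (?)", (pos,))
    synset_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO word (synset_id, lemma) VALUES (?, ?)",
        [(synset_id, lemma) for lemma in lemmas],
    )
    return synset_id


def add_hypernym(conn, hyponym_id, hypernym_id):
    """Record that hypernym_id is a hypernym of hyponym_id."""
    conn.execute(
        "INSERT INTO hypernymy (hyponym_id, hypernym_id) VALUES (?, ?)",
        (hyponym_id, hypernym_id),
    )


def hypernyms_of(conn, synset_id):
    """Return the ids of the direct hypernyms of a synset."""
    rows = conn.execute(
        "SELECT hypernym_id FROM hypernymy WHERE hyponym_id = ? "
        "ORDER BY hypernym_id",
        (synset_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping each relation in its own table means a new relation type can be added without altering existing tables, at the cost of one query per relation when displaying everything known about a synset.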
Relation between Synsets
Relation between Word-forms
System Statistics
- Over 8500 synsets entered in the database
- MySQL used as the back-end database server
- Data entry interface designed in Java/JFC
- Organizer utility written in Perl
- Web-based data retrieval system developed in HTML and PHP
Applications of WordNet
- Word sense disambiguation
- Interface to Internet search engines
- Text classification
- Information retrieval systems
- Document similarity
Conclusion
- The structure of the Hindi language has been studied, and new features have been introduced in the Hindi WordNet
- The MySQL database has been found to be quite efficient
- The web interface for querying the lexical database is under continuous evolution