


Date: 02.01.2018


Indo WordNet A WordNet for Hindi


Introduction

  • WordNet – A lexical database

  • Searching the dictionary conceptually

  • A different organizing principle for each syntactic category

  • Synsets (synonym sets) are the basic building blocks

  • Lexical knowledge base is the heart of any intelligent information processing system



WordNet for Hindi

  • Hindi WordNet is an on-line lexical database for the Hindi language

  • Design has been inspired by the famous English WordNet

  • Unique features

    • Graded antonyms and meronymy relationships
    • Efficient underlying database design
    • Cross part of speech linkage


Semantic relations in WordNet

  • Synonymy

  • Hypernymy / Hyponymy

  • Antonymy

  • Meronymy / Holonymy

  • Gradation

  • Entailment

  • Troponymy



Semantic Relations

  • Synonymy

    • True synonyms are rare
    • Synonymy is relative to a context
      • {घर, कमरा} (room)
      • {घर, आवास} (residence)
      • {घर, जन्मकुंडलीय स्थान} (house in a horoscope)
      • {घर, स्वदेश} (homeland)
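
The context-dependence of synonymy can be illustrated with a small lookup structure in which one word form belongs to several synsets, one per sense. This is a minimal Python sketch, not the Hindi WordNet implementation: the names `SYNSETS`, `synsets_for` and `are_synonyms`, and the English glosses in the comments, are assumptions made for illustration.

```python
# Hypothetical in-memory synsets: each frozenset is one sense of घर (ghar).
SYNSETS = [
    frozenset({"घर", "कमरा"}),              # sense: room
    frozenset({"घर", "आवास"}),              # sense: residence
    frozenset({"घर", "जन्मकुंडलीय स्थान"}),  # sense: house in a horoscope
    frozenset({"घर", "स्वदेश"}),            # sense: homeland
]


def synsets_for(word):
    """Return every synset that contains the given word form."""
    return [s for s in SYNSETS if word in s]


def are_synonyms(a, b):
    """Two word forms are synonyms only if they share at least one synset."""
    return any(a in s and b in s for s in SYNSETS)
```

Under this sketch, घर is a synonym of both कमरा and आवास, but कमरा and आवास are not synonyms of each other, because no single synset contains both.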


Semantic Relations

  • Hypernymy and Hyponymy

    • Relation between word meanings (synsets)
    • X is a hyponym of Y if X is a kind of Y
    • Hyponymy is transitive and asymmetrical
    • Hypernymy is inverse of Hyponymy
  • lion → animal → living entity → entity

  • शेर → पशु → सजीव → अस्तित्व
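
The transitivity and asymmetry of hyponymy can be sketched by walking direct hypernym links upward. The Python below is only an illustration built on the lion example: the names `HYPERNYM`, `hypernym_chain` and `is_hyponym_of` are hypothetical, and it assumes each entry has at most one direct hypernym, which a real WordNet does not guarantee.

```python
# Hypothetical direct-hypernym links, taken from the lion example.
HYPERNYM = {
    "lion": "animal",
    "animal": "living entity",
    "living entity": "entity",
}


def hypernym_chain(word):
    """Follow direct hypernym links upward until the root is reached."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(x, y):
    """X is a hyponym of Y if Y appears anywhere above X (transitivity)."""
    return y in hypernym_chain(x)
```

Here `is_hyponym_of("lion", "entity")` holds through the intermediate links, while the reverse query fails, which reflects the asymmetry of the relation.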



Semantic Relations

  • Antonymy

    • Oppositeness in meaning
    • Relation between word forms
  • Meronymy and Holonymy

    • Part-whole relation, branch is a part of tree
    • X is a meronym of Y if X is a part of Y
    • Meronymy is transitive and asymmetrical
    • Holonymy is inverse relation of Meronymy


Troponym and Entailment

  • Entailment

    • { खर्राटा लेना – सोना } (to snore – to sleep)
  • Troponym

    • { लँगड़ाना, कदमताल करना – चलना } (to limp, to march – to walk)
    • { फुसफुसाना – बोलना } (to whisper – to speak)


Antonymy Relation



Meronymy Relation



Gradation



Classification of verbs

  • Simple verbs (सरल क्रिया): सोना, खाना (to sleep, to eat)

  • Conjunct verbs (संयुक्त क्रिया)

  • Compound verbs (सामासिक क्रिया): खाना-पीना (eating and drinking)

  • Causative verbs (प्रेरणात्मक क्रिया): सुलवाना (to have someone put to sleep)





Design and Implementation

  • Basic relations or lexical links are between synonym sets

  • Lexical database is stored in a MySQL database

  • Sub-tasks identified

    • Database design
    • Data entry interface
    • Implementation of Organizer Utility
    • Application programs to access and display the information in the lexical database




Data Entry Interface

  • GUI designed in Java/JFC

  • Separate screen for data entry of different categories

  • Automatic generation of synset IDs

  • Screen to view the entered data







Organizer Utility

  • Designed to preprocess the data

  • Reflexive pointers are generated

    • e.g. if A is a hypernym of B, then "B is a hyponym of A" is generated automatically
  • Each semantic relation is mapped to a separate table (normalized)

  • Font conversion

    • Roman Hindi → DV-TTYogesh
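
The reflexive-pointer step can be sketched as follows. The slides state that the Organizer Utility was written in Perl; this Python version only illustrates the idea, and the `INVERSE` table, the `(source, relation, target)` tuple layout, and the function name `add_reflexive_pointers` are assumptions rather than the tool's actual design.

```python
# Hypothetical table of inverse relations; antonymy is its own inverse.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
    "antonym": "antonym",
}


def add_reflexive_pointers(pointers):
    """Return the input pointers plus the inverse of each one.

    Each pointer is a tuple (source, relation, target), read as
    "source is a <relation> of target".
    """
    completed = set(pointers)
    for source, relation, target in pointers:
        completed.add((target, INVERSE[relation], source))
    return completed
```

For example, passing `{("animal", "hypernym", "lion")}` yields that pointer together with `("lion", "hyponym", "animal")`, so data entry only needs to record one direction of each relation.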


Storage Structure

  • Relation between Synsets

    • tblNounHypernyms
  • Relation between Word-forms

    • tblNounAntonyms
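
The split between synset-level and word-form-level relations can be sketched with two tables. Only the table names `tblNounHypernyms` and `tblNounAntonyms` come from the slides; the column names, the sample rows, and the use of Python's built-in `sqlite3` module as a stand-in for the MySQL server are assumptions made so the example runs on its own.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL back end.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Relation between synsets: a pair of synset IDs is enough.
    CREATE TABLE tblNounHypernyms (
        synset_id   INTEGER NOT NULL,
        hypernym_id INTEGER NOT NULL,
        PRIMARY KEY (synset_id, hypernym_id)
    );
    -- Relation between word forms: the specific words must be stored too.
    CREATE TABLE tblNounAntonyms (
        synset_id         INTEGER NOT NULL,
        word              TEXT    NOT NULL,
        antonym_synset_id INTEGER NOT NULL,
        antonym_word      TEXT    NOT NULL
    );
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO tblNounHypernyms VALUES (?, ?)", (1, 2))
conn.execute(
    "INSERT INTO tblNounAntonyms VALUES (?, ?, ?, ?)",
    (3, "day", 4, "night"),
)

# Look up the direct hypernyms of synset 1.
hypernyms = conn.execute(
    "SELECT hypernym_id FROM tblNounHypernyms WHERE synset_id = ?", (1,)
).fetchall()

# Look up the antonyms of one particular word form.
antonyms = conn.execute(
    "SELECT antonym_word FROM tblNounAntonyms WHERE synset_id = ? AND word = ?",
    (3, "day"),
).fetchall()
```

The antonym table carries word columns because antonymy holds between particular word forms, whereas hypernymy holds between whole synsets and needs only their IDs.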








System Statistics

  • Over 8500 synsets entered in the database

  • MySQL used as the back-end database server

  • Data entry interface designed in Java/JFC

  • Organizer utility written in Perl

  • Web based data retrieval system developed in HTML and PHP

  • DV-TTYogesh Font used to display Hindi Text



Application of WordNet

  • Word Sense Disambiguation

  • Interface to Internet Search Engines

  • Text classification

  • Information Retrieval system

  • Document Similarity



Conclusion

  • The structure of the Hindi language has been studied and new features have been introduced in the Hindi WordNet

  • Currently over 8500 synsets have been inserted into the database

  • The MySQL database has been found to be quite efficient

  • The web interface for querying the lexical database is under continuous evolution


