Evaluation of Hindi→English, Marathi→English and English→Hindi CLIR at FIRE 2008
Nilesh Padariya, Manoj Chinnakotla, Ajay Nagesh and Om P. Damani
CLIR System Architecture
System Flow Example
First Participation in CLEF 2007
- Developed a basic query translation system for Hindi to English and Marathi to English
Transliteration Algorithm
- Simple rule-based system
- Edit-distance based index lookup to retrieve index tokens
- Accuracy: ~65% at top 20
Translation Disambiguation
Performance at CLEF 2007
- Hindi to English: 67.06% of monolingual
- Marathi to English: 56.09% of monolingual
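The edit-distance based index lookup listed above can be illustrated with a minimal sketch. It assumes plain Levenshtein distance and a small in-memory list of index tokens; the function names `edit_distance` and `lookup_index_tokens` and the sample vocabulary are hypothetical, and the actual system may use a different distance or lookup structure.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lookup_index_tokens(candidate: str, index_tokens, top_k: int = 20):
    """Return the top_k index tokens nearest to a rule-based transliteration.

    Ties in distance are broken alphabetically by token.
    """
    scored = sorted((edit_distance(candidate.lower(), tok.lower()), tok)
                    for tok in index_tokens)
    return [tok for _, tok in scored[:top_k]]


# Hypothetical example: a rough transliteration matched against index vocabulary.
vocabulary = ["mumbai", "delhi", "bombay", "mumble"]
nearest = lookup_index_tokens("mumbay", vocabulary, top_k=2)
```

A linear scan like this is only practical for small vocabularies; a real index would normally need some pruning before computing distances.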
Failure Analysis for CLEF 2007
Collection of parallel list of names for evaluation
- Available datasets are too small
- They do not contain a good mix of native and loan words
- Our current dataset: around 25K words
Algorithmic Improvements
Current accuracy figures
- Hindi to English: 80% accuracy at rank 5
- English to Hindi: evaluation to be done
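One common way to read "accuracy at rank 5" is the fraction of test names whose reference transliteration appears among the top five system outputs. The sketch below assumes that definition; the function name `accuracy_at_rank` and the toy data are hypothetical and are not taken from the evaluation set described above.

```python
def accuracy_at_rank(ranked_outputs, references, k=5):
    """Fraction of source words whose reference appears in the top-k outputs.

    ranked_outputs: {source_word: [candidate, ...]} ordered best first
    references:     {source_word: reference_transliteration}
    """
    if not references:
        return 0.0
    hits = sum(1 for src, ref in references.items()
               if ref in ranked_outputs.get(src, [])[:k])
    return hits / len(references)


# Toy data (hypothetical): two of the three references are found within rank 2.
toy_outputs = {
    "w1": ["delhi", "dilli"],
    "w2": ["mumbay", "mumbai"],
    "w3": ["pune"],
}
toy_refs = {"w1": "delhi", "w2": "mumbai", "w3": "poona"}
score = accuracy_at_rank(toy_outputs, toy_refs, k=2)
```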
Translation Disambiguation
Empirical study on translation disambiguation strategies and parameter choices
Choice of disambiguation strategy
- Best pair
- Best cohesion
- Best sequence
- Iterative
Parameters of the iterative disambiguation algorithm
- Number of final candidates to choose
- Use of weights
- Similarity measure
Datasets used: TREC AP, CLEF 2007
Best choice: iterative strategy, Dice coefficient, 1 translation candidate; weights do not improve results much
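The best-performing configuration above (iterative strategy, Dice coefficient, one final candidate) can be sketched roughly as follows. This is a simplified illustration in the spirit of iterative translation disambiguation, not the system's actual code: the function names, the fixed iteration count, and the toy document-frequency and co-occurrence counts are all assumptions.

```python
def dice(a, b, df, cooc):
    """Dice coefficient: 2 * co-occurrence count / (df(a) + df(b))."""
    denom = df.get(a, 0) + df.get(b, 0)
    return 2.0 * cooc.get(frozenset((a, b)), 0) / denom if denom else 0.0


def iterative_disambiguation(candidates, df, cooc, iterations=10, top_n=1):
    """Pick translations that are mutually supported by the other query terms.

    candidates: {source_term: [translation, ...]}
    Each round adds to a translation's weight the Dice-weighted support it
    receives from the current translations of the other source terms, then
    renormalises the weights per source term.
    """
    weights = {s: {t: 1.0 / len(ts) for t in ts} for s, ts in candidates.items()}
    for _ in range(iterations):
        updated = {}
        for s, ts in candidates.items():
            scores = {}
            for t in ts:
                support = sum(dice(t, t2, df, cooc) * w2
                              for s2, ws in weights.items() if s2 != s
                              for t2, w2 in ws.items())
                scores[t] = weights[s][t] + support
            total = sum(scores.values())
            updated[s] = {t: v / total for t, v in scores.items()}
        weights = updated
    # Keep the top_n highest-weighted translations per source term.
    return {s: sorted(ws, key=ws.get, reverse=True)[:top_n]
            for s, ws in weights.items()}


# Toy statistics (hypothetical): "bank" co-occurs with "money", "shore" does not.
toy_candidates = {"src_bank": ["bank", "shore"], "src_money": ["money"]}
toy_df = {"bank": 10, "shore": 10, "money": 10}
toy_cooc = {frozenset(("bank", "money")): 5}
chosen = iterative_disambiguation(toy_candidates, toy_df, toy_cooc)
```

With `top_n=1` the sketch keeps a single translation per query term, matching the "1 translation candidate" setting; dictionary-provided translation weights are deliberately left out, since the slide reports that they do not help much.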
Only Transliteration on Query?
Motivation
- Hindi words are quite commonly used as-is in English documents in Indian domains
- Examples:
- NEs (named entities) are crucial for fetching relevant documents
Experiments
- Transliterate the whole query
- Transliterate only NEs, no translation
Overall Results (Title Only)
P-R Curves for English Target
P-R Curves for Hindi Target
Results of Transliteration Experiment
P-R Curves for Transliteration Expt.
Conclusion
- Improved the transliteration and translation disambiguation modules based on the CLEF 2007 analysis
- Hindi to English CLIR performance is 75% of monolingual; Marathi to English is 64% of monolingual
- Results need further investigation, especially the monolingual baselines (Hindi, Marathi and English)
- Transliteration alone achieves around 35% of monolingual performance in Hindi and 25% in Marathi
Acknowledgements
- The second author is supported by the Infosys Fellowship Award
- Project linguists at CFILT, IIT Bombay
Thanks!