Efforts in Language & Speech Technology Natural Language Processing Lab



Date: 02.01.2018


  • Efforts in Language & Speech Technology

  • Natural Language Processing Lab

  • Centre for Development of Advanced Computing

  • (Ministry of Communications & Information Technology)

  • ‘Anusandhan Bhawan’,

  • C 56/1 Sector 62, Noida – 201 307, India

  • karunesharora@cdacnoida.com




Translation Support System (English to Hindi)







Test suite for Translation Support Systems



  • Knowledge Management

  • Parallel Corpus & Tools



Gyan Nidhi: Parallel Corpus

  • ‘GyanNidhi’, which stands for ‘Knowledge Resource’, is a parallel corpus in 12 languages (English and 11 Indian languages), a project sponsored by TDIL, DIT, MC&IT, Govt. of India



Gyan Nidhi: Multi-Lingual Aligned Parallel Corpus

  • What is it? A multilingual parallel text corpus contains the same text translated into more than one language.

  • What does Gyan Nidhi contain? The GyanNidhi corpus consists of text in English and 11 Indian languages (Hindi, Punjabi, Marathi, Bengali, Oriya, Gujarati, Telugu, Tamil, Kannada, Malayalam, Assamese). It aims to digitize 1 million pages altogether, with at least 50,000 pages in each Indian language and in English.







Tools: Prabandhika: Corpus Manager

  • Categorisation of corpus data in various user-defined domains

  • Addition, deletion and modification of Indian-language data files in HTML, RTF, TXT or XML format

  • Selection of languages for viewing parallel corpus with data aligned up to paragraph level

  • Automatic selection and viewing of parallel paragraphs in multiple languages

    • Abstract and Metadata
    • Printing and saving parallel data in Unicode format
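
The paragraph-level alignment and language selection listed above can be pictured with a small data structure. The following is a minimal Python sketch, not Prabandhika's actual implementation: the class `AlignedDocument`, its fields, and the method `parallel_view` are hypothetical names, and the sample sentences are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AlignedDocument:
    """Hypothetical record for one document aligned at paragraph level."""
    title: str
    domain: str  # user-defined category, e.g. "science"
    # Language name -> list of paragraphs. Index i holds the same
    # paragraph in every language (assumed alignment convention).
    paragraphs: Dict[str, List[str]] = field(default_factory=dict)

    def parallel_view(self, languages: List[str]) -> List[Tuple[str, ...]]:
        """Return one tuple per paragraph with the text in each selected language."""
        missing = [lang for lang in languages if lang not in self.paragraphs]
        if missing:
            raise KeyError("no text for: " + ", ".join(missing))
        columns = [self.paragraphs[lang] for lang in languages]
        if len({len(col) for col in columns}) > 1:
            raise ValueError("paragraph counts differ; document is not aligned")
        return list(zip(*columns))


# Made-up sample data, for illustration only.
doc = AlignedDocument(
    title="Sample",
    domain="general",
    paragraphs={
        "English": ["The sun rises in the east.", "Water is essential for life."],
        "Hindi": ["सूरज पूर्व में उगता है।", "जल जीवन के लिए आवश्यक है।"],
    },
)

for english, hindi in doc.parallel_view(["English", "Hindi"]):
    print(english, "|", hindi)
```

Storing each language as a list with shared indices keeps the view for any subset of languages a simple `zip`; a real corpus manager would also need to handle documents whose translations are missing or only partly aligned.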


Sample Screenshot: Prabandhika



Tools: Vishleshika: Statistical Text Analyzer

  • Vishleshika is a tool for statistical text analysis of Hindi, extensible to text in other Indian languages

  • It examines input text and generates various statistics, e.g.:

      • Sentence statistics
      • Word statistics
      • Character statistics
  • The Text Analyzer presents its analysis in textual as well as graphical form.
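
The three kinds of statistics listed above can be illustrated with a short Python sketch. It is not Vishleshika's algorithm: the function name `text_statistics`, the choice of sentence terminators, and whitespace-based word splitting are all assumptions made for this example, and the sample sentence is invented.

```python
import re
from collections import Counter


def text_statistics(text):
    """Compute simple sentence, word and character statistics (illustrative only)."""
    # Assumption: sentences end at the Devanagari danda (।) or at . ? !
    sentences = [s.strip() for s in re.split(r"[।.?!]+", text) if s.strip()]
    # Assumption: words are separated by whitespace.
    words = [w for s in sentences for w in s.split()]
    # Counts Unicode code points, so Devanagari vowel signs (matras) are
    # counted separately from the consonants they attach to.
    chars = Counter(ch for ch in text if not ch.isspace())
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "word_frequencies": Counter(words),
        "char_frequencies": chars,
    }


# Invented sample text, for illustration only.
stats = text_statistics("राम घर गया। राम खुश है।")
print(stats["sentence_count"], stats["word_count"])
print(stats["word_frequencies"].most_common(3))
```

The frequency tables returned here are the kind of data that could feed a graphical view such as a bar chart of character counts; a production analyzer for Indian-language text would likely count syllable-level units rather than raw code points.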



Sample output: Character statistics





  • Speech Technology and tools









Other Areas of expertise



Areas for future work





