Efforts in Language & Speech Technology Natural Language Processing Lab
Date: 02.01.2018 | Size: 445 b. | #19264
|
Natural Language Processing Lab
Centre for Development of Advanced Computing (Ministry of Communications & Information Technology)
‘Anusandhan Bhawan’, C 56/1 Sector 62, Noida – 201 307, India
karunesharora@cdacnoida.com
Translation Support System (English to Hindi)
Knowledge Management Parallel Corpus & Tools
Gyan Nidhi: Parallel Corpus
‘GyanNidhi’, which stands for ‘Knowledge Resource’, is a parallel corpus in 12 languages (English and 11 Indian languages). The project is sponsored by TDIL, DIT, MC&IT, Govt. of India.
What is it? A multilingual parallel text corpus contains the same text translated into more than one language.
What does Gyan Nidhi contain? The GyanNidhi corpus consists of text in English and 11 Indian languages (Hindi, Punjabi, Marathi, Bengali, Oriya, Gujarati, Telugu, Tamil, Kannada, Malayalam, Assamese). It aims to digitize 1 million pages altogether, with at least 50,000 pages in each Indian language and in English.
- Categorisation of corpus data in various user-defined domains
- Addition, deletion, and modification of any Indian-language data files in HTML, RTF, TXT, or XML format
- Selection of languages for viewing the parallel corpus, with data aligned up to paragraph level
- Automatic selection and viewing of parallel paragraphs in multiple languages
- Abstract and metadata
- Printing and saving parallel data in Unicode format
Sample Screen Shot : Prabandhika
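The paragraph-level alignment listed above can be pictured with a small data structure. The following is only a minimal sketch, not the tool's actual implementation: the class name `ParallelDocument`, its fields, and the language codes are hypothetical, and it assumes that every language version of a document has the same number of paragraphs in the same order.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelDocument:
    """One document stored as paragraph lists keyed by language code (hypothetical model)."""
    title: str
    domain: str  # user-defined domain used for categorisation
    paragraphs: dict[str, list[str]] = field(default_factory=dict)

    def add_language(self, lang: str, paras: list[str]) -> None:
        # Paragraph-level alignment is assumed to need the same paragraph
        # count in every language version of the document.
        for other in self.paragraphs.values():
            if len(other) != len(paras):
                raise ValueError("paragraph counts differ; cannot align")
        self.paragraphs[lang] = list(paras)

    def aligned(self, langs: list[str]) -> list[tuple[str, ...]]:
        # One tuple per paragraph, with entries ordered as in `langs`.
        return list(zip(*(self.paragraphs[lang] for lang in langs)))


doc = ParallelDocument(title="Sample", domain="literature")
doc.add_language("en", ["First paragraph.", "Second paragraph."])
doc.add_language("hi", ["पहला अनुच्छेद।", "दूसरा अनुच्छेद।"])

# Select two languages and view their paragraphs side by side.
rows = doc.aligned(["en", "hi"])
for en_para, hi_para in rows:
    print(en_para, "|", hi_para)
```

Keeping paragraphs in index-aligned lists makes selecting any subset of languages a simple lookup. A real corpus would also need metadata and a way to handle versions whose paragraph counts differ.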
Vishleshika is a tool for statistical text analysis of Hindi, extendible to text in other Indian languages. It examines input text and generates various statistics, e.g.:
- Sentence statistics
- Word statistics
- Character statistics
The text analyzer presents its analysis in textual as well as graphical form.
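As a rough illustration of the kinds of statistics named above, here is a minimal sketch that counts sentences, words, and characters in Hindi text. It is not Vishleshika's method. The function name `text_statistics` and the splitting rules are assumptions: sentences end at a danda (।), double danda (॥), or `.`, `?`, `!`; words are whitespace-separated tokens; and characters are counted as Unicode code points, so vowel signs (matras) are counted separately from their consonants.

```python
import re
from collections import Counter


def text_statistics(text: str) -> dict:
    """Return simple sentence, word, and character statistics (illustrative only)."""
    # Split on danda, double danda, or Latin sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"[।॥.?!]+", text) if s.strip()]

    # Whitespace tokens with surrounding punctuation stripped.
    words = [w.strip("।॥.,;:?!\"'()") for w in text.split()]
    words = [w for w in words if w]

    # Frequency of every non-whitespace code point.
    char_frequencies = Counter(c for c in text if not c.isspace())

    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "word_frequencies": Counter(words),
        "char_frequencies": char_frequencies,
    }


sample = "यह एक वाक्य है। यह दूसरा वाक्य है।"
stats = text_statistics(sample)
print(stats["sentence_count"], stats["word_count"])
print(stats["word_frequencies"].most_common(3))
```

The `Counter` objects returned here could feed a charting library to produce the graphical view. Which plots the actual tool draws is not stated in the source.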
Speech Technology and tools
Other Areas of expertise