Efforts in Language & Speech Technology Natural Language Processing Lab


Date: 02.01.2018

Efforts in Language & Speech Technology
Natural Language Processing Lab
Centre for Development of Advanced Computing
(Ministry of Communications & Information Technology)
‘Anusandhan Bhawan’,
C 56/1 Sector 62, Noida – 201 307, India
karunesharora@cdacnoida.com

Translation Support System (English to Hindi)

Test suite for Translation Support Systems

Knowledge Management
Parallel Corpus & Tools

Gyan Nidhi : Parallel Corpus

‘GyanNidhi’, which stands for ‘Knowledge Resource’, is a parallel corpus in 12 languages (English and 11 Indian languages). The project is sponsored by TDIL, DIT, MC&IT, Govt. of India.

Gyan Nidhi: Multi-Lingual Aligned Parallel Corpus

What is it? A multilingual parallel text corpus contains the same text translated into more than one language.
What does Gyan Nidhi contain? The GyanNidhi corpus consists of text in English and 11 Indian languages (Hindi, Punjabi, Marathi, Bengali, Oriya, Gujarati, Telugu, Tamil, Kannada, Malayalam, Assamese). It aims to digitize 1 million pages in total, with at least 50,000 pages in each Indian language and in English.

Tools: Prabandhika: Corpus Manager

Categorisation of corpus data in various user-defined domains
Addition/Deletion/Modification of any Indian Language data files in HTML / RTF / TXT / XML format.
Selection of languages for viewing parallel corpus with data aligned up to paragraph level
Automatic selection and viewing of parallel paragraphs in multiple languages

Abstract and Metadata
Printing and saving parallel data in Unicode format
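
The paragraph-level alignment described above can be illustrated with a small sketch. This is not Prabandhika's implementation; the class name `ParallelDocument`, its fields, and the language codes are hypothetical and chosen only for illustration. It assumes every language version of a document has the same number of paragraphs, so that paragraph i in one language corresponds to paragraph i in the others.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelDocument:
    """One document with paragraph-aligned text per language (hypothetical layout)."""
    title: str
    domain: str  # user-defined category, e.g. "literature"
    # language code -> paragraphs; index i is the same paragraph in every language
    paragraphs: dict[str, list[str]] = field(default_factory=dict)

    def add_language(self, lang: str, paras: list[str]) -> None:
        # Paragraph-level alignment needs equal paragraph counts across languages.
        for existing in self.paragraphs.values():
            if len(existing) != len(paras):
                raise ValueError("paragraph count mismatch; cannot align")
        self.paragraphs[lang] = list(paras)

    def aligned(self, langs: list[str]) -> list[tuple[str, ...]]:
        """Return one tuple per paragraph with the text in each selected language."""
        return list(zip(*(self.paragraphs[lang] for lang in langs)))


doc = ParallelDocument(title="Sample", domain="literature")
doc.add_language("en", ["Knowledge is wealth.", "Read every day."])
doc.add_language("hi", ["ज्ञान ही धन है।", "प्रतिदिन पढ़ें।"])

# View the selected languages side by side, one row per paragraph.
rows = doc.aligned(["en", "hi"])
for en_para, hi_para in rows:
    print(en_para, "|", hi_para)
```

Keeping the paragraphs as parallel lists makes selecting any subset of languages a simple lookup, at the cost of rejecting documents whose paragraph counts differ.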

Sample Screen Shot : Prabandhika

Tools: Vishleshika : Statistical Text Analyzer

Vishleshika is a tool for statistical text analysis of Hindi, extensible to text in other Indian languages.
It examines the input text and generates various statistics, e.g.:

Sentence statistics
Word statistics
Character statistics

The Text Analyzer presents its analysis in both textual and graphical form.
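
As a rough illustration of the kinds of statistics listed above, the following sketch computes sentence, word, and character counts for a short Hindi string. It is not Vishleshika's method: the function name `text_statistics`, the splitting rules, and the sample sentence are assumptions made for this example. Sentences are split on the danda (।) and common Latin terminators, words on whitespace, and characters are counted as Unicode code points, so vowel signs (matras) are counted separately from their consonants.

```python
import re
from collections import Counter


def text_statistics(text: str) -> dict:
    """Compute simple sentence, word, and character statistics (illustrative only)."""
    # Sentence boundaries: danda (।) plus ., ? and !
    sentences = [s.strip() for s in re.split(r"[।.?!]+", text) if s.strip()]
    # Words: whitespace-separated tokens with edge punctuation removed
    words = [w.strip("।.,?!;:\"'()") for w in text.split()]
    words = [w for w in words if w]
    # Characters: every non-whitespace code point, punctuation included
    chars = Counter(ch for ch in text if not ch.isspace())
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "word_frequency": Counter(words),
        "char_frequency": chars,
    }


sample = "राम घर गया। राम ने खाना खाया।"
stats = text_statistics(sample)
print(stats["sentence_count"], stats["word_count"])
print(stats["word_frequency"].most_common(3))
```

A fuller analyzer would likely count aksharas (grapheme clusters) rather than code points; that refinement is left out here to keep the sketch within the standard library.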

Sample output: Character statistics

Speech Technology and tools

Other Areas of expertise

Areas for future work
