Essentials of Language Documentation

Chapter 4 – Data and language documentation

preparations for assigning their rights into the future by including information in your will and ensuring that your executors understand how to assign them on your death.

3.3.1. Archiving text materials

The preferred format for archiving text materials is eXtensible Markup Language (XML), a document description language used to encode the content of structured documents (see Sperberg-McQueen and Burnard 2002). XML is a subset of SGML (Standard Generalized Markup Language) and is used to explicitly describe a domain of knowledge through markup tags enclosed in angle brackets (see Chapter 14 with the example of a ‘play structure’ implicit in a published document). Each part of a structured document is described within a defined and logical structure (specified in XML schemas or in DTDs, ‘document type definitions’). XML is a good archival format because XML documents explicitly represent data structure, and are directly readable by humans even if computer software to display the documents is not available.

XML documents are typically created by export from working context materials, rather than being written directly by the researcher, because the process of writing well-structured XML tends to be tedious and error-prone. Various XML editors exist, and these can be used to create documents, to check markup tag syntax (well-formedness), to create DTDs, and to ensure that a document complies with a schema or DTD. XML-encoded documents can be transformed into various archival and presentation formats by XSLT (Extensible Stylesheet Language Transformations). Thus, an XSLT stylesheet could create a concordance of an annotated text collection, or HTML files for web publication. Archivists can provide advice on possible transformations of XML documents.
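The export-and-transform workflow can be sketched in Python using only the standard library. The record fields, the element names (`lexicon`, `entry`, `form`, `gloss`) and the sample words are hypothetical. The last step is a hand-written stand-in for the kind of transformation an XSLT stylesheet would perform, not XSLT itself, since the Python standard library has no XSLT processor. The well-formedness check covers tag syntax only; validation against a DTD or schema would need additional tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical working records, e.g. rows exported from a database or spreadsheet.
records = [
    {"form": "wordA", "gloss": "water"},
    {"form": "wordB", "gloss": "fire & smoke"},
]

def export_lexicon(rows):
    """Build a <lexicon> tree with one <entry> per working record."""
    lexicon = ET.Element("lexicon")
    for row in rows:
        entry = ET.SubElement(lexicon, "entry")
        ET.SubElement(entry, "form").text = row["form"]
        ET.SubElement(entry, "gloss").text = row["gloss"]
    return lexicon

def is_well_formed(xml_text):
    """Return True if the text parses as XML (tag syntax only, no DTD or schema check)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

def to_html_finderlist(lexicon):
    """Produce an HTML list of 'gloss: form' items, sorted by gloss."""
    pairs = sorted(
        (entry.findtext("gloss"), entry.findtext("form"))
        for entry in lexicon.iter("entry")
    )
    ul = ET.Element("ul")
    for gloss, form in pairs:
        ET.SubElement(ul, "li").text = f"{gloss}: {form}"
    return ET.tostring(ul, encoding="unicode", method="html")

# Export to XML text; the serializer escapes '&' so the output stays well-formed.
xml_text = ET.tostring(export_lexicon(records), encoding="unicode")

# Re-read the archived XML and derive a presentation format from it.
html = to_html_finderlist(ET.fromstring(xml_text))
print(xml_text)
print(html)
```

Generating the XML through a serializer rather than by string concatenation is the design choice that matters here: characters such as `&` are escaped automatically, which is one of the errors that hand-written markup tends to introduce.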

The following are two examples of XML encoding. First, consider the structure of a typical bilingual lexicon (such as seen in the Guwamu example presented above):

lexicons contain entries;

the attributes of entries are: form, category, subcategory, language, meaning specification (and any other additional information such as notes, speaker, recorder, sense relations, sentence examples);

meaning specification can be gloss (for morpheme-by-morpheme glossing and finderlist production) and definition;

Peter K. Austin

cross-references to other lexical entries have a sequential order chosen by the lexicographer;

cross-references to sentence examples also have a specified sequential order.

Table 3 shows the Guwamu sample entry discussed above in XML form, which would be a possible archival representation.

Table 3. Example of an XML structure (lexicon entry)

Gu
n
n
k.o.kangaroo
male red kangaroo
used as a generic term for kangaroos
SAW
WW
13/Mar/2005
gula
gumbarr
dhugandu
Gu206
Gu255
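As a minimal sketch of how an entry with this structure could be encoded, the following Python builds one with the standard library. The element names (`entry`, `form`, `lang`, `cat`, `subcat`, `meaning`, `xref`, `example` and so on), the `order` attribute, and the assignment of Table 3's values to those elements are assumptions made for illustration, and `HEADWORD` is a placeholder rather than an actual Guwamu form.

```python
import xml.etree.ElementTree as ET

def build_entry(form, lang, cat, subcat, gloss, definition,
                note, speaker, recorder, date, xrefs, examples):
    """Return an <entry> element (all element names are hypothetical)."""
    entry = ET.Element("entry")
    for tag, value in (("form", form), ("lang", lang),
                       ("cat", cat), ("subcat", subcat)):
        ET.SubElement(entry, tag).text = value

    # The meaning specification groups the gloss and the definition.
    meaning = ET.SubElement(entry, "meaning")
    ET.SubElement(meaning, "gloss").text = gloss
    ET.SubElement(meaning, "definition").text = definition

    for tag, value in (("note", note), ("speaker", speaker),
                       ("recorder", recorder), ("date", date)):
        ET.SubElement(entry, tag).text = value

    # Cross-references keep the sequence given by the lexicographer;
    # the order is recorded explicitly as well as by document position.
    for position, target in enumerate(xrefs, start=1):
        ET.SubElement(entry, "xref", order=str(position)).text = target
    for position, sentence_id in enumerate(examples, start=1):
        ET.SubElement(entry, "example", order=str(position)).text = sentence_id
    return entry

entry = build_entry(
    form="HEADWORD",  # placeholder, not an attested form
    lang="Gu", cat="n", subcat="n",
    gloss="k.o.kangaroo",
    definition="male red kangaroo",
    note="used as a generic term for kangaroos",
    speaker="SAW", recorder="WW", date="13/Mar/2005",
    xrefs=["gula", "gumbarr", "dhugandu"],
    examples=["Gu206", "Gu255"],
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```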

If we view this data using XML-aware software, such as an XML editor or a web browser (for example Mozilla Firefox or the current version of MS Internet Explorer), the hierarchical relationships between the data entities are displayed as in Figure 2.

Figure 2. XML structure display (lexicon entry)
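A hierarchical display of this kind can be approximated in a few lines of Python, which may help to show what XML-aware software does with the markup. This is only a rough sketch: the sample document and its element names are hypothetical, and the indented outline it prints is not a reproduction of the figure.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one line per element, indented two spaces per nesting level."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (f": {text}" if text else "")
    lines = [line]
    for child in element:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines

# Hypothetical fragment; HEADWORD is a placeholder form.
sample = (
    "<entry><form>HEADWORD</form>"
    "<meaning><gloss>k.o.kangaroo</gloss></meaning></entry>"
)
print("\n".join(outline(ET.fromstring(sample))))
```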

For an annotated corpus we can set up a structure where:

the corpus contains sentences;

sentence properties are: sentence number, sentence form, sentence gloss, speaker, recorder, sentence source reference, grammatical notes;

sentences contain words in sequential order;

word properties are: word form, word gloss;

words contain morphemes in sequential order;

morpheme properties are: morpheme form, morpheme gloss, morpheme category, morpheme subcategory.
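The containment relations listed above can be sketched as nested Python data classes that serialize to XML. The class names, element and attribute names, and the English sample sentence are all hypothetical and chosen only to make the nesting visible; an actual corpus schema may distribute the properties between elements and attributes differently.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET

@dataclass
class Morpheme:
    form: str
    gloss: str
    category: str
    subcategory: str = ""

@dataclass
class Word:
    form: str
    gloss: str
    morphemes: List[Morpheme] = field(default_factory=list)

@dataclass
class Sentence:
    number: str
    form: str
    gloss: str
    speaker: str = ""
    recorder: str = ""
    source: str = ""
    notes: str = ""
    words: List[Word] = field(default_factory=list)

def sentence_to_xml(sentence):
    """Serialize one sentence; words and morphemes keep their list order."""
    node = ET.Element("sentence", number=sentence.number)
    for tag in ("form", "gloss", "speaker", "recorder", "source", "notes"):
        ET.SubElement(node, tag).text = getattr(sentence, tag)
    for word in sentence.words:
        word_node = ET.SubElement(node, "word")
        ET.SubElement(word_node, "form").text = word.form
        ET.SubElement(word_node, "gloss").text = word.gloss
        for m in word.morphemes:
            ET.SubElement(
                word_node, "morpheme",
                gloss=m.gloss, category=m.category, subcategory=m.subcategory,
            ).text = m.form
    return node

# Hypothetical English sample, used only to show the nesting.
sentence = Sentence(
    number="1", form="dogs bark", gloss="Dogs bark.",
    words=[
        Word("dogs", "dog-PL", [Morpheme("dog", "dog", "n"),
                                Morpheme("-s", "PL", "sfx", "number")]),
        Word("bark", "bark", [Morpheme("bark", "bark", "v")]),
    ],
)
corpus = ET.Element("corpus")
corpus.append(sentence_to_xml(sentence))
print(ET.tostring(corpus, encoding="unicode"))
```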
