Nikolaus P. Himmelmann
– a brief characterization of the content of the session (what topic is being
talked about? what kind of communicative event [narrative, conversation,
song, etc.] is being documented?);
– links between different files which together constitute the session, e.g. a
media file (audio or video) and a file containing a transcription, translation,
and various types of commentary relevant for interpreting the recording
contained in the media file (on which see further below).
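The session-level metadata listed above can be pictured as a small record. The following is a minimal sketch in Python, not a rendering of any actual metadata standard; the class names, field names (`Session`, `LinkedFile`, `event_type`, `role`), and sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedFile:
    """One file belonging to a session (hypothetical structure)."""
    path: str  # location of the file within the documentation
    role: str  # e.g. "media" or "annotation"


@dataclass
class Session:
    """Session-level metadata (illustrative field names, not a real standard)."""
    name: str
    description: str  # brief characterization of the content
    event_type: str   # narrative, conversation, song, etc.
    files: list[LinkedFile] = field(default_factory=list)

    def files_with_role(self, role: str) -> list[LinkedFile]:
        """Return the linked files that play the given role in the session."""
        return [f for f in self.files if f.role == role]


# Invented sample data: a recording linked to its annotation file.
session = Session(
    name="session_001",
    description="An elder recounts how the village was founded.",
    event_type="narrative",
    files=[
        LinkedFile("session_001.wav", "media"),
        LinkedFile("session_001_annotation.txt", "annotation"),
    ],
)
print([f.path for f in session.files_with_role("media")])
```

Keeping the links as explicit entries in the session record, rather than relying on file naming alone, is one possible way of making the connection between a recording and its annotation file recoverable by a program.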
The metadata on both levels have two interrelated functions. On the one
hand, they facilitate access to a documentation or a specific record within a
documentation by providing key access information in a standardized format
(what, where, when, etc.). In this function, they are similar to a catalogue
in a library and we can thus speak of a cataloguing function.⁹ On the other
hand, they have an organizational function in that they define the
structure of the corpus which, in particular in the case of documentations in
digital format, in turn provides the basis for various procedures such as
searching, copying, or filtering within a single documentation or across a
set of documentations. Obviously, a metadata standard which targets the
organizational function has to be richer and more elaborate than one which
targets the cataloguing function. The former is actually a corpus management
tool, which defines digital structures and supports various computational
procedures, rather than just a standard for organizing a catalogue.
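To illustrate the organizational function: once session metadata are stored in a uniform structure, procedures such as searching or filtering reduce to simple operations over that structure. The sketch below assumes, purely for illustration, that each session is represented as a Python dictionary; the keys (`name`, `event_type`, `year`), the sample values, and the function `filter_sessions` are hypothetical and not taken from any metadata standard.

```python
def filter_sessions(sessions, **criteria):
    """Return the sessions whose metadata match every given key/value pair."""
    return [
        s for s in sessions
        if all(s.get(key) == value for key, value in criteria.items())
    ]


# Invented sample corpus; the keys are illustrative, not taken from a standard.
corpus = [
    {"name": "s01", "event_type": "narrative", "year": 2004},
    {"name": "s02", "event_type": "song", "year": 2004},
    {"name": "s03", "event_type": "narrative", "year": 2005},
]

narratives_2004 = filter_sessions(corpus, event_type="narrative", year=2004)
print([s["name"] for s in narratives_2004])
```

The same function could be applied to a list that merges sessions from several documentations, provided they share the same keys, which is where a common standard becomes relevant.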
Currently there exist two metadata standards which in fact complement
each other in that they target these different functions. The OLAC standard
targets exclusively the cataloguing function and provides easy and fast
access to a large number of diverse repositories of primary data on a
worldwide scale (in both digital and non-digital formats). The IMDI standard,
which incorporates all the information included in the OLAC standard
and hence is compatible with it, is actually a corpus management tool
which primarily targets digitally archived language documentations. Further
discussion of metadata concepts and standards is found in Chapters 4 and
13.
Apart from metadata, there is in most instances also a need for further
information accompanying each recording as well as the documentation as
a whole in order to make the corpus of primary data useful to users who do
not know the language being documented. On the level of individual sessions,
such additional information is called here an annotation.¹⁰ Thus, in the case
of audio or video recordings of communicative events, it is obviously useful
to provide at least a transcription and a translation so that users not
familiar with the language are able to understand what is going on in the
recording.
Chapter 1 – Language documentation: What is it and what is it good for?
However, the exact extent and format of the annotations that should be
included in each session is a matter of debate. It is common to distinguish
between minimal and more elaborate annotation schemes. A widely assumed
minimal annotation scheme consists of just a transcription and a free
translation which should accompany all, or at least a substantial number of,
primary data segments. More elaborate annotation schemes include various
levels of interlinear glossing, grammatical as well as ethnographical
commentary, and extensive cross-referencing between the various sessions and
resources compiled in a given documentation. See further Chapters 8 and 9.
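The difference between the minimal and the more elaborate annotation schemes can be sketched as a data structure. The following Python fragment is an illustration under stated assumptions only: the class `AnnotatedSegment`, its field names, the tier label `"gloss"`, and the helper `minimal_coverage` are hypothetical, and the sample segments contain placeholders rather than real language data.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSegment:
    """One segment of a recording with its annotation (hypothetical layout)."""
    transcription: str = ""
    free_translation: str = ""
    # Optional tiers of a more elaborate scheme, e.g. "gloss" or "commentary".
    extra_tiers: dict[str, str] = field(default_factory=dict)

    def meets_minimal_scheme(self) -> bool:
        """True if both a transcription and a free translation are present."""
        return bool(self.transcription.strip()) and bool(self.free_translation.strip())


def minimal_coverage(segments: list[AnnotatedSegment]) -> float:
    """Share of segments that meet the minimal annotation scheme."""
    if not segments:
        return 0.0
    return sum(s.meets_minimal_scheme() for s in segments) / len(segments)


# Placeholder segments: one elaborately annotated, one minimally annotated,
# one with a transcription only, and one without any annotation.
segments = [
    AnnotatedSegment("<transcription 1>", "<translation 1>", {"gloss": "<gloss 1>"}),
    AnnotatedSegment("<transcription 2>", "<translation 2>"),
    AnnotatedSegment("<transcription 3>"),
    AnnotatedSegment(),
]
print(minimal_coverage(segments))
```

A measure of this kind is one way to check whether "all, or at least a substantial number of" segments carry the minimal annotation; what counts as substantial is left open here, as it is in the text.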
On the level of the overall documentation, information accompanying
the primary data set other than metadata is, for lack of a well-established
term, subsumed here under the heading general access resources (alternatively,
it could also simply be called “annotation”). Such general (in the
sense of: relevant for the documentation as a whole) access resources
would include:
– a general introduction which provides background information on the
speech community and language (language name(s), affiliation, major
varieties, etc.), the fieldwork setting(s), the methods used in recording
primary data, an overview of the contents, structure, and scope of the
primary data corpus and its quality;
– brief sketches of major ethnographic and grammatical features being
documented;
– an explication of the various conventions that are being used (orthography,
glossing abbreviations, other abbreviations);
– indices for languages/varieties, key analytic concepts, etc.;
– links and references to other resources (books and articles previously
published on the variety or community being documented; other projects
relating to the community or its neighbors, etc.).
For further discussion of some aspects of relevance here, see Chapters 8
and 12.
Table 1 provides a schematic overview of the components of the language
documentation format sketched in this section.