Peter K. Austin
Media Player formats are all compressed in a way that loses information;
they are useful for working and presentation (e.g. for publication, on web
sites) but not suitable for archiving.
More on sound archiving
There are many well-equipped sound archives around the world, ranging
from regional to national to international in coverage. Some, such as the
Austrian National Sound Archive, have been established for a long time and
have extensive experience with material in older ‘legacy’ formats. The
International Association of Sound Archives (IASA) publishes much valuable
and up-to-date advice about archiving issues, and the Language Archives
Newsletter (http://www.mpi.nl/LAN) focuses on archiving for linguistic
research.
3.4. Presentation, publication, and distribution
One way in which the presentation, publication, and distribution of rich
language documentations can currently be achieved is via multimedia
which links media, annotations (time-aligned transcriptions, analysis and
translations, hyperlinks), and metadata. One such format is linked files
(including HTML, MP3 sound clips, QuickTime, etc.) distributed via the
world wide web, but bandwidth can be a problem for publication of media
files: even small movies of a few minutes in a compressed format can be
megabytes in size and take a long time to download via slow connections
(the use of video streaming software can partially overcome this limitation).
There is also SMIL (‘Synchronized Multimedia Integration Language’),
which is an application of XML for encoding mixed media, text, and image
information in a presentation form.
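As a rough illustration of what a SMIL presentation involves, the following
Python sketch generates a minimal document in which each segment of a
recording plays in sequence while its transcription is displayed in
parallel. The function name, the file names, and the use of millisecond
offsets are assumptions made for this example; the element and attribute
names (seq, par, audio, text, clipBegin, clipEnd, dur) follow SMIL 2.0 as
generally documented and should be checked against the specification
before use.

```python
import xml.etree.ElementTree as ET


def build_smil(audio_src, segments):
    """Return a SMIL document as a string.

    `segments` is a list of (start_ms, end_ms, text_src) tuples
    (a hypothetical layout chosen for this sketch).
    """
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    # Children of <seq> are played one after another.
    seq = ET.SubElement(body, "seq")
    for start_ms, end_ms, text_src in segments:
        # Children of <par> are played at the same time.
        par = ET.SubElement(seq, "par")
        ET.SubElement(par, "audio", {
            "src": audio_src,
            "clipBegin": f"{start_ms}ms",
            "clipEnd": f"{end_ms}ms",
        })
        # Show the transcription for as long as the audio clip lasts.
        ET.SubElement(par, "text", {
            "src": text_src,
            "dur": f"{end_ms - start_ms}ms",
        })
    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_smil("conversation.wav",
                     [(0, 2500, "turn1.txt"), (2500, 6000, "turn2.txt")]))
```

Keeping the timing information in a simple list and generating the
presentation from it means that the same time-aligned data could also be
used to produce other output formats.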
For highly complex, richly annotated, and linked media we currently
need to use multimedia platforms such as Macromedia Director, delivered
on CD-ROM or DVD as a publication format (see Chapter 15).
Unfortunately, the future of these formats and their carriers is unclear, and
how we can archive multimedia for the future is also currently problematic.
One major current need is for good multimedia players and for ways for
users to interact with rich documentations; it is necessary to model and
design interfaces and access formats for various audiences. An example of
such a format is the Spoken Karaim CD, described by Csató and Nathan (2003b),
Chapter 4 – Data and language documentation
which presents video and audio recordings with accompanying
transcriptions, translations, glosses, lexicon, and cultural information, all of
which are linked and interactive. Users can explore their own pathways
through the corpus: they can search, collect items of interest, and
backtrack. The simple, attractive interface enables maximum interactivity
without forcing the user to digest too much information, and the CD has
been used for Karaim language support in education, language
maintenance, and revitalization (Nathan and Csató, forthc.).
Figure 5 is a screenshot from a CD-ROM of conversational documentary
materials in the Sasak language of eastern Indonesia (Austin, Jukes,
and Nathan 2000), which is based on the Karaim model. The top-left
window shows images of the consultants who worked on the corpus, and below
it is a Sasak lexicon arranged alphabetically; clicking on an entry in the
lexicon reveals full details of the individual item in the top-left window in
place of the images. On the top right is the Sasak transcription of the
conversation; colors indicate the two speakers, whose voices can be heard in
the left and right channels respectively of the associated time-aligned
digital stereo recording. Below the transcription is a small central window
displaying morpheme-by-morpheme analysis and gloss for a selected item in
the text, and below that is a display of the free translation in English of
the speaker turns (again color-coded). In the bottom left of the display there
is a search facility which the user can employ to find occurrences of morphemes
Figure 5. Screenshot from a CD-ROM presenting Sasak conversational materials
or glosses of interest throughout the corpus, and in the top left is a set of
buttons that produce pronominal inflected forms of verbs (via a
morphological generator) when the user moves them over a selected lexical
entry in the top-left window (see Chapter 15 and Nathan 2000b for further
details about the morphological generator developed for the Spoken Karaim CD).
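The kind of search facility described above can be sketched in a few lines
of Python. This is only an illustration of the general idea and not the
implementation used on the Sasak or Karaim CDs; the Turn record, its field
names, and the search function are all hypothetical, and it assumes that
each speaker turn is stored with its time alignment, morphemes, glosses, and
free translation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One speaker turn in a time-aligned corpus (hypothetical layout)."""
    speaker: str          # e.g. "A" (left channel) or "B" (right channel)
    start_ms: int         # offset of the turn in the recording
    end_ms: int
    morphemes: List[str]  # morpheme-by-morpheme segmentation
    glosses: List[str]    # one gloss per morpheme
    translation: str      # free translation of the whole turn


def search(corpus: List[Turn], query: str,
           tier: str = "morphemes") -> List[Tuple[Turn, int]]:
    """Return (turn, position) for every item on `tier` equal to `query`."""
    if tier not in ("morphemes", "glosses"):
        raise ValueError("tier must be 'morphemes' or 'glosses'")
    hits = []
    for turn in corpus:
        for position, item in enumerate(getattr(turn, tier)):
            if item == query:
                hits.append((turn, position))
    return hits
```

Because each hit carries the whole turn, an interface built on such a
function could jump straight to the matching stretch of the recording (via
start_ms and end_ms) and display the translation alongside it.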
4. Conclusions
Language documentation is an emerging field that involves recording,
analysis, annotation, archiving, and publication of rich and complex data.
By properly structuring the data representations and planning how data
will move between different formats and contexts, you can work
productively with your materials, publish and distribute them for others,
and archive your resources to preserve them for the future. It is important
that all these aspects of a documentation project be incorporated in its
planning and execution, in order to ensure maximally effective and useful
documentation.
Acknowledgements
Most of the material presented here has been “road tested” in lectures at
Frankfurt University, Uppsala University, the School of Oriental and African
Studies, and the DoBeS summer school; I am grateful for comments and
feedback from audiences on these occasions. A proportion of this chapter
derives from information on language documentation and guidelines for
grant applicants co-written by David Nathan and myself and published on
the Hans Rausing Endangered Languages website (see particularly
http://www.hrelp.org/documentation/whatisit). I am grateful to David Nathan for
permission to incorporate this material into the present chapter, and for his
detailed comments on an earlier draft which picked up a number of errors
and infelicities. Thanks also to Jost Gippert, Nikolaus Himmelmann, Robert
Munro, and Peter Wittenburg for suggestions for improvement of earlier
presentations. Any remaining errors are solely mine.