Nikolaus P. Himmelmann
research economy. If someone worked on a minority language in the
Philippines 50 years ago and someone else wanted to continue this work
now, it would obviously be most useful if this new project could build on
the complete set of primary data collected at the time and not just on a
grammar sketch and perhaps a few texts published by the earlier project.
Similarly, even if a given project on a little-known language is geared
towards a very specific purpose – say, the conceptualization of space – it
would be in the interest of research economy (and accountability) for this
project to feed all the primary data collected in the project work into an
open archive rather than limiting itself to publishing the analytical results
plus possibly a small sample of primary data illustrating its basic materials.
While the set of primary data fed into an archive in these examples
would surely fail to constitute a comprehensive record of a language, it
could very well be of use for purposes other than the one motivating the
original project (data from matching tasks developed to investigate the lin-
guistic encoding of space, for example, are also quite useful for the analysis
of intonation, for conversation analytic purposes, for grammatical analysis,
and so on). More importantly, if it were common practice to feed complete
sets of primary data into open archives (which do not necessarily have to
form a physical unit), comprehensive documentations for quite a number of
little-known languages could grow over time, which in turn would strength-
en the empirical basis of all disciplines working on and with such lan-
guages and cultures. That is, while much of the discussion in this chapter
and book is concerned with projects specifically targeted at creating sub-
stantial language documentations, the basic idea of creating lasting, multi-
purpose documentations which are openly archived is not necessarily tied
to such projects. It is entirely possible, and desirable, to create such
documentations in a step-by-step fashion by compiling and integrating the
primary data sets collected in a number of different projects over an extended
period of time. In fact, it is highly likely that in most instances, really
comprehensive documentations can only be created in this additive way.
Finally, establishing open archives for primary data is also in the interest
of making analyses accountable. Many claims and analyses related to lan-
guages and speech communities for which no documentation is available
remain unverifiable as long as substantial parts of the primary data on which
the analyses are based remain inaccessible to further scrutiny. Accountability
here is intended to include all kinds of practical checks and methodological
tests with regard to the empirical basis of an analysis or theory, including
replicability and falsifiability. The documentation format developed here
encourages, and also provides practical guidelines for, the open and widely
accessible archiving of all primary data collected for little-known
languages, regardless of their vitality.
2
Chapter 1 – Language documentation: What is it and what is it good for?
3. A basic format for language documentations
This section presents a basic format for language documentations and then
highlights some features which distinguish this format from related enter-
prises.
3.1. The basic format
3.1.1. Primary data
Continuing the argument developed in the preceding sections, it should be
clear that a language documentation, conceived of as a lasting, multipurpose
record of a language, should contain a large set of primary data which
provide evidence for the language(s) used at a given time in a given com-
munity (in all of the different senses of “language”). Of major importance in
this regard are specimens of observable linguistic behavior, i.e. examples
of how the people actually communicate with each other. This includes all
kinds of communicative activities in a speech community, from everyday
small talk to elaborate rituals, from parents baby-talking to their newborn
infants to political disputes between village elders.
It is impossible to record all communicative events in a given speech
community, not only for obvious practical reasons but also for theoretical and
ethical ones. Most importantly, such a record would imply a totalitarian set-up
with video cameras and microphones everywhere and the speakers unable to
control which parts of their behavior are recorded and which are not. A major
theoretical problem is that there is no principled way of determining
a temporal boundary for such a recording (all communicative events in one
day? two weeks? one year? a century?).
Consequently, there is a need to sample the kinds of communicative
events to be documented. Once again, we can distinguish between a pragmatic
guideline and theoretically grounded targets. The pragmatic guideline
simply says that one should record as many communicative events as possible,
covering as broad a range as possible of those which commonly occur in the
speech community.