workers organized by a government, a university or an independent organization,
traveling through a particular region of a country with the intention of surveying the
languages used. This method is somewhat deeper than the census approach, in that it
involves face-to-face encounters, where a census may not, and can afford to focus more
specifically on language issues, as the purposes of the field survey allow. Through this
method, alert researchers can often avoid the pitfalls of census statistics that lead to
under-reporting of minority languages. Nonetheless, linguistic field surveys are often
more superficial than is necessary to fully confirm the identification of new languages,
and the population estimates reported are often educated guesses formed by observing
people in their home communities. Furthermore, interactions with the local people may be
mediated through government officials or agencies, leading to some of the same
problems as the responses to a national census. If the researchers are members of a
foreign or national metropolitan community, they may be ethnically distinct from the
local inhabitants, and less likely to build the necessary trust in the short duration of the
research to obtain reliable responses to some types of questions. Hence, field surveys are
often a good starting point for future work in language identification and enumeration,
but their identifications are necessarily more preliminary and incomplete than the detailed
field research that ideally follows.
The most valuable form of information about languages comes from in-depth
linguistic fieldwork. Documenting the existence of a previously undescribed language,
or identifying its relation to other languages, is a time-consuming process. Ideally it is
carried out on location in the area where the language in question is spoken, as this makes
it easier to recruit speakers of the language to serve as linguistic informants who supply
key information about the language, its words, judgments about appropriate sentence
structure, and meanings of expressions. Alternatively, linguistic fieldwork may be carried
out in a foreign context, such as in a research university, if one or more linguistic
informants have already been recruited. Often, work of this sort is done with native
speakers of the languages in question who are being trained as professional linguists,
whether to benefit language restoration efforts in their communities, language policy and
planning in the governments of their home countries, or their own intellectual goals.
The linguistic informant may be either bilingual or monolingual; monolingual
informants require more skill on the part of the field linguist, and in most areas
multilingualism is common enough that most linguistic fieldwork is done with
multilingual informants. Nonetheless, the field linguist must typically be knowledgeable
about other languages of the region, especially any related languages. On the one hand,
s/he must be able to communicate with the informant, so that s/he can successfully elicit
the words and expressions that will establish the structure of the language. On the other,
s/he needs to be able to relate those forms, where possible, to those of other languages, so
that it is clear in what ways the informant’s speech variety is distinct. Painstaking and
systematic procedures must be followed, and common sources of error carefully avoided.
Depending on the information being sought, the elicitation process can take
anywhere from a few hours of work to several months or even years. The more different
a speech variety is from known varieties, the more time is required to make a good
description. This alone explains why so little is known about so many languages. For
example, from the Tasmanian languages, all that survive are a few word lists, as this is all
that anyone had bothered to collect before the languages went extinct. In places of
extreme linguistic diversity, such as Papua New Guinea, we often have only general
descriptions provided by travelers and explorers in the region.
At present, field linguistics is only a small part of the occupation of linguists.
While many linguistics graduate programs require a component of training in linguistic
fieldwork, this requirement is not universal, nor is it focused entirely on under-described
languages. Linguistic field surveys are also rare, being complex to organize and
relatively expensive in their participants’ time and resources. Moreover, linguistics embraces a
range of other questions, some of which involve field research of other types, so a large
amount of linguistic fieldwork is actually focused on questions concerning large and
well-described languages. This results in a shortage of trained researchers, resources and
time focused on identifying and describing new and under-described languages. Since
any one researcher may be involved in many projects, repeat visits to areas of past
research may take place at intervals of twenty years or more. This is normally enough
time for war, disease, political change or economic fortune to completely alter the scene
one had witnessed earlier, often reducing once-thriving language groups to the
point of near extinction. Consequently, much of the information we have about smaller
language groups is likely to be out of date. Promoting ongoing linguistic field research is
one of the major challenges facing the collection of sound and useful language statistics.
2. Sources for language statistics
At present there are very few sources of language statistics. Probably the best known is
the Ethnologue, because of its publicly available web-based version. One can often type
the name of a lesser-known language into a web-search engine, and have the Ethnologue
page for that language returned as the first hit. The introduction of language statistical
summaries in the fifteenth edition (Gordon 2005) has also made the Ethnologue a popular
resource among researchers, marketers and others who desire information about the
languages spoken in specific parts of the world. A second source of language statistics,
also with web-accessible and print versions, is the Linguasphere (Dalby 2000). The
Linguasphere is primarily intended as a comprehensive taxonomic classification of the
world’s speech communities, and carries less in the way of actual population statistics
(populations are reported rounded to the nearest power of ten). At the same time, it
classifies speech communities to a much finer degree than the Ethnologue, and hence
provides an important point of comparison regarding language identifications. Finally,
there are a number of other linguistic academic references, which may deal with
languages at a global or regional level. We will not undertake a comprehensive review of
these here, but instead will survey a few of the more important ones.
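
To illustrate how coarse a figure rounded to the nearest power of ten is, as noted above for the Linguasphere, the following is a minimal Python sketch. It assumes that "nearest" is judged on a logarithmic scale, which is an interpretation and not something the source specifies; the function name and the sample figures are hypothetical and are not taken from either reference work.

```python
import math


def round_to_power_of_ten(population: int) -> int:
    """Round a positive head count to the nearest power of ten.

    Assumption: "nearest" is measured on a log10 scale, so the cut-off
    between 10**k and 10**(k+1) falls at about 3.16 * 10**k.
    """
    if population <= 0:
        raise ValueError("population must be a positive number")
    # floor(x + 0.5) rounds half up and avoids Python's round-half-to-even.
    exponent = math.floor(math.log10(population) + 0.5)
    return 10 ** exponent


# Hypothetical speaker counts, chosen only to show the loss of detail.
for count in (850, 2500, 4000, 120000):
    print(count, "->", round_to_power_of_ten(count))
```

Under this assumption, speech communities of 4,000 and 25,000 speakers would both be reported as 10,000, which is why such figures serve better as orders of magnitude than as population estimates.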
2.1. The Ethnologue
The Ethnologue can be described as a comprehensive catalogue of the known languages
spoken in the world. It is currently in its fifteenth edition, available in a free web-based