Evaluating Language Statistics:
The Ethnologue and Beyond
A report prepared for the UNESCO Institute for Statistics
John C. Paolillo
School of Informatics, Indiana University
Assisted by Anupam Das
Department of Linguistics, Indiana University
March 31, 2006
0. Introduction
How many languages are there in the world? In a region or a particular country? How
many speakers does a given language have? Are there more speakers of English or
Mandarin? How are the numbers of these speakers changing, in the world, in a country or
on the Internet? Linguists are often asked questions such as these, whether by members
of other disciplines, lay-people, or policy makers. Yet despite the interest in and obvious
importance of these questions, they are not easy questions to answer, and there are few
sources one can turn to for definitive answers.
Since the early 1990s, new awareness of a number of language-related issues has
foregrounded the need for good answers to these questions. On the one hand, there is the
economic trend of globalization, which requires people from a variety of different
countries, ethnicities, cultures and language backgrounds to communicate with one
another. Globalization has been accompanied by claims about the economic importance
of one language vis-a-vis another, and the importance of specific languages in global
communication functions or for scientific and cultural exchange. Such discussions have
led to re-evaluations of the status of many languages in a range of contexts, such as the
role of English globally and in the European Union, and the role of Mandarin Chinese in
the Pacific Rim and on the Internet.
On the other hand, there is an increased social consciousness around the importance of
language diversity in the development and maintenance of knowledge, cultural heritage,
and human dignity, under the related causes of linguistic human rights and the protection
of endangered languages. These social concerns raise new questions: when is a language
endangered? When can it still be protected, and when is it already extinct beyond hope?
How are the language rights of the world’s citizens best served? And what can one expect
for the evolution of the complex system represented by the world’s languages in all their
contexts of use? In short, what will be the contribution of language to the next century of
humanity’s existence?
Questions such as these underscore the need for good sources of information about
language statistics, and in particular, language population statistics, as the answer to all of
these questions, whether asked specifically for a given locale or in general for the world as
a whole, is likely to begin with an assessment of what is known about the affected
populations. For this reason it is essential that we survey the available information about
language populations and seek to evaluate its worth. In what ways is the existing
information adequate for our needs? In what ways might it be improved? Are there
countries or regions in which the information we have is better than in others? If there are
multiple sources of information, how far can these be trusted? Are some sources more
trustworthy than others?
This report seeks to answer this latter set of questions, through a systematic evaluation of
available information on language populations. Unfortunately, there are very few
comprehensive sources of information about language populations at present.
Consequently this report focuses principally on two different catalogues of language
information: (i) the Ethnologue, compiled by SIL International, and (ii) the Linguasphere,
compiled by David Dalby of the School of Oriental and African Studies in London. Both
catalogues have been actively compiled for more than 50 years, and both show reasonably
recent activity, with dedicated websites and ongoing development. Of the two, the
Ethnologue has more specific information about language populations, whereas the
Linguasphere is mainly concerned with cataloging linguistic relatedness among different
varieties of speech.
This report is organized as follows. Section 1 describes the linguistic issues that define
the context for collecting, reporting and interpreting language statistics: the definition of the
notion “language”, its relation to family relatedness and linguistic structure, the
phenomenon of language death and disappearance and the process of linguistic fieldwork.
Section 2 describes the main currently available sources of information in which
comprehensive language statistics are presented. Subsections describe the Ethnologue
and Linguasphere publications specifically, followed by a final subsection in which other
sources of language statistics, in particular for endangered languages, are discussed.
Section 3 presents an evaluation of currently available language statistics, focusing on
data availability and currency, as reflected in the existing sources. Section 4 presents a
global linguistic profile based on the existing language statistics, to ascertain what can be
learned from this information, and what other sorts of information would be desirable.
The fifth and final section suggests how the existing statistics might be developed and
improved in the future.
1. Language statistics: the challenge
1.1. The notion of “language”
Before one can discuss language statistics and the number of speakers of the world’s
languages, one must define what one means by the word “language”. While we all think
of a language as being a variety of speech which one can use to express oneself verbally
and be understood, identifying the boundaries of a language — a crucial issue if
languages are to be counted and their speakers enumerated — is not a trivial matter.
People may mean many different things by “language”. For some, “language” means the
linguistic form of a substantial literature. Such a definition is unsatisfactory for the simple
reason that writing is only a few thousand years old while humanity, and the distinctly
human attribute of speech, is far older. Further complicating the issue is that in some
societies, including the Arabic-speaking world, Greece, the German-speaking part of
Switzerland, and in many parts of India, written language employs a different linguistic
system from everyday speech.
Sometimes languages are regarded as associated with a particular nation or
country, as if each nation had only one language. While nation states and other forms of
nationalism have done much to spread particular languages, there is scarcely a country in
the world whose citizens speak a single language, and most countries have tens or even
hundreds of languages. Languages are also regarded as varieties of speech with a wider