3. Evaluation of language statistics
The Ethnologue provides extensive information on its sources both in the
entries themselves and in the bibliography. Hence, it is possible to get an idea of the nature of
the information in the Ethnologue and the quality of its data. The Linguasphere Register
does not provide the same kind of documentation within entries, but instead provides
links to many of its sources on various pages of its website. Hence, we cannot evaluate
the Linguasphere directly, but we can compare it to the Ethnologue. Of
particular interest are the cited population figures, their source and the currency of the
data represented. Also of interest are the methods by which the data were collected
(through field linguistic survey, census, etc.). Finally, we should also be interested in
what, if anything, we can learn from the statistics that are presented. By tabulating the
statistics presented in different ways, and attempting to understand what they might tell
us about language populations, diversity and endangerment, we can potentially learn
about the gaps in the existing knowledge about languages and their speakers, as well as
the nature of the sources of information that we do have.
This section comprises an evaluation primarily of the Ethnologue, comparing it at
relevant points with the Linguasphere Register as well as other relevant references. The
evaluation of language entries is conducted on two distinct sets of data. The first is a
random sample of 2001 entries from the 15th edition of the Ethnologue, for which we can
conduct a more in-depth investigation. The second data set is the complete set of
language entries from the 14th edition of the Ethnologue, which was collected for an
earlier project (Paolillo 2005). We also undertake a separate analysis of country entries
and maps, from the 15th edition.
The analysis of the language entries proceeds in three parts. First, in section 3.1
we investigate the cited sources for language entries, using summary counts of the
different sources classified according to type. Second, we examine the currency of the
data across language entries, by source, language family, country and region. Third, we
investigate the language group sizes recorded in the Ethnologue using the same
breakdowns as for currency, also comparing with the Linguasphere Register to examine
the consistency across the two resources. We then consider location information in
section 3.4, information about media, literatures and language use in section 3.5,
followed by classification issues in section 3.6. A short summary in section 3.8 concludes
this section of the report.
3.1. Sources and currency of data
Our evaluation of the sources of the Ethnologue is based upon the random sample of
2001 language entries. For each entry we identified the population estimate, if present,
along with its source and the year of the citation. We then classified each of the
sources according to one of several types: SIL, academic, Government, World Christian
Database, other Christian missionary, and other sources. When multiple sources were
given for a single entry, we used only the most recent one to determine both source type
and year. A number of entries had a date for a population figure, but no source. These
were recorded as “not indicated”. Still others gave a population figure, but had no source
or date. These were recorded as “none”. Finally, a number had no population estimate,
and hence no source information for it; these were recorded as “no estimate”. The types
and years of the sources are cross-tabulated in Table 2.
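The classification rules described above can be sketched in code. This is a minimal illustration only, not the procedure actually used for the report: the `Citation` record and the `classify_entry` function are hypothetical names, and the handling of a named source that carries no date (a case the text does not describe) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Citation:
    """One cited source for a population figure (hypothetical record)."""
    source_type: Optional[str]  # e.g. "SIL", "Academic"; None if no source is named
    year: Optional[int]         # None if no date is given


def classify_entry(has_estimate: bool,
                   citations: List[Citation]) -> Tuple[str, Optional[int]]:
    """Return (source type, year) for one language entry."""
    if not has_estimate:
        # No population figure at all, hence no source information.
        return ("No estimate", None)

    dated = [c for c in citations if c.year is not None]
    if not dated:
        named = [c for c in citations if c.source_type is not None]
        if named:
            # Assumption: a source with no date keeps its type, with no year.
            return (named[0].source_type, None)
        # A population figure with neither source nor date.
        return ("None", None)

    # With multiple sources, only the most recent one determines type and year.
    latest = max(dated, key=lambda c: c.year)
    return (latest.source_type or "Not indicated", latest.year)
```

For example, an entry citing an SIL figure from 1990 and a Government figure from 2000 would be counted once, as Government, under the year 2000.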
Table 2. Type of source by year for population figures in a random sample of 2001
Ethnologue entries.
                1920-5  1956-65  1966-75  1976-85  1986-95  1996-pres.  Total
SIL                  0        0       15       93      169         242    519
Academic             1        1       17      149      104         204    477
Government           0        1        1       25      114         103    245
WCD                  0        0        0        1        2         157    160
Missionary           0        0        0       10       64          47    121
Other                0        0        4        0        5           6     15
Not indicated        1        1        0       20       68         192    282
None                 0        0        0        0        0           0    118
No estimate          0        0        0        0        0           0     64
Total                2        3       37      298      526         951   2001
Table 2 indicates that almost half of the Ethnologue’s sources for population
figures in the language entries (951 of 2001) date from 1996 or later; the bulk of the remainder fall within
the last 30 years, but there are some disturbingly old sources, such as one from 1920 and
one from 1925, in this sample. The two languages in question are both reportedly spoken
in Nigeria: Beele [bxq], 120 speakers in Bauchi state in a few villages near the Bole, and
Sheni [scv], 200 speakers in Kaduna state. It is unclear whether these would have
survived to the present day with such small numbers of speakers. Nigeria has 510 living
languages listed, so perhaps it is understandable that these small languages have been
missed in subsequent reports.
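The proportions behind these statements can be checked directly from the column totals of Table 2. The sketch below is illustrative; the variable names are hypothetical, and treating the 1976-85 and 1986-95 columns as the remainder of "the last 30 years" is an assumption about how the periods map onto that phrase.

```python
# Column totals from Table 2 (dated population figures only).
period_totals = {
    "1920-5": 2,
    "1956-65": 3,
    "1966-75": 37,
    "1976-85": 298,
    "1986-95": 526,
    "1996-pres.": 951,
}
SAMPLE_SIZE = 2001  # all sampled entries, including undated ones


def share(count: int, total: int = SAMPLE_SIZE) -> float:
    """Percentage of the sample represented by count."""
    return 100.0 * count / total


# Share of entries whose most recent source dates from 1996 or later.
recent = share(period_totals["1996-pres."])

# Assumption: 1976-95 stands for the rest of "the last 30 years".
earlier_within_30 = share(period_totals["1976-85"] + period_totals["1986-95"])

# Share of entries with sources older than 1976.
older = share(sum(period_totals[p] for p in ("1920-5", "1956-65", "1966-75")))

print(f"1996 or later: {recent:.1f}%")
print(f"1976-95:       {earlier_within_30:.1f}%")
print(f"before 1976:   {older:.1f}%")
```

Because the denominator is the full sample, the three shares do not sum to 100%: entries with no date or no estimate are not counted in any period.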
The distribution of source types indicates that the Ethnologue relies on SIL
sources for more than a quarter of its population estimates, and on academic
sources for nearly as many. Presumably this is because many of the languages reported in the
Ethnologue are smaller and would not be reliably individuated by government and other
sources. A second major group of sources comprises the World Christian
Database (WCD) and other Christian missionary sources, which collectively account for just
over a tenth of the language entries. What distinguishes these sources from many others
is the possibility that they have staff reporting these estimates from the field, in the
manner of academic linguists and SIL. However, it is less likely that such estimates
would be from trained linguists employing established language survey methods.
Conversations with the Ethnologue editorial staff indicated that their main concern with these
data sources is that they might report ethnic populations, instead of actual language
populations. While the two methods of counting can give similar estimates, it is
hazardous to assume so, especially in cases of language shift.
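The shares quoted for each source type follow from the row totals of Table 2. The following sketch, with hypothetical variable names, shows the arithmetic; it is a check on the figures reported in the table, not a reanalysis of the underlying entries.

```python
# Row totals from Table 2, by source type.
source_totals = {
    "SIL": 519,
    "Academic": 477,
    "Government": 245,
    "WCD": 160,
    "Missionary": 121,
    "Other": 15,
    "Not indicated": 282,
    "None": 118,
    "No estimate": 64,
}
sample_size = sum(source_totals.values())  # equals the 2001 sampled entries

# Percentage of the sample accounted for by each source type.
shares = {name: 100.0 * n / sample_size for name, n in source_totals.items()}

# WCD and other Christian missionary sources taken together.
christian_share = shares["WCD"] + shares["Missionary"]

for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<14}{pct:5.1f}%")
print(f"{'WCD + Miss.':<14}{christian_share:5.1f}%")
```

On these totals SIL accounts for roughly 26% of the sample and academic sources for roughly 24%, while the WCD and missionary sources together account for about 14%.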
The “not indicated” and “none” categories also account for a large proportion of
the language entries. For both of these categories, the Ethnologue staff surmised that