these were most likely carried over from earlier editions of the Ethnologue, where a
citation for population had not originally been recorded, and no other more current
population information had been located. This may in fact be true for the “none”
category, which by definition is not associated with a date in Table 2, but the “not
indicated” figures are concentrated in the more recent years, and may be census or
almanac figures. Language entries without population estimates accounted for roughly
3% of those in our sample; this might be higher than we would hope, but little can be
done.
Table 3 compares the dates of population estimates in the sample of Ethnologue
language entries across geographic regions. The data for the different regions appear to be
about equally current, although there is a somewhat greater number of older
estimates in Africa, particularly in the 1966-1975 period. Entries where a date is not
indicated for a population estimate appear to be a bit more common in Europe and North
America. The estimates from Oceania, Southeast Asia and Africa, where greater numbers
of languages are found, tend to be better documented in this regard.
Table 3. Date of population estimates of Ethnologue language entries by region.
                   Not ind.     1920-5    1956-65    1966-75    1976-85    1986-95 1996-pres.
Africa                   44          2          2         29         49        160        309
E Asia                    6          0          0          0          7         15         38
S & Cent. Asia           25          0          0          1         13         40        100
SE Asia                  19          0          0          1         68         91        162
W Asia                    4          0          0          0          4          7         12
N America                16          0          1          0         12         26         39
S Am & Carib.            28          0          0          1         14         97        105
Europe                   23          0          0          0          4         27         30
Oceania                  19          0          0          5        127         63        156
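The regional comparison drawn from Table 3 is easier to check when each row is expressed as a share of that region's total. The following is a minimal sketch of that calculation, using the counts as printed in Table 3; the names PERIODS, TABLE3 and row_shares are illustrative only and are not part of the original analysis.

```python
# Counts transcribed from Table 3; column labels are kept as printed there.
PERIODS = ["Not ind.", "1920-5", "1956-65", "1966-75",
           "1976-85", "1986-95", "1996-pres."]

TABLE3 = {
    "Africa":         [44, 2, 2, 29, 49, 160, 309],
    "E Asia":         [6, 0, 0, 0, 7, 15, 38],
    "S & Cent. Asia": [25, 0, 0, 1, 13, 40, 100],
    "SE Asia":        [19, 0, 0, 1, 68, 91, 162],
    "W Asia":         [4, 0, 0, 0, 4, 7, 12],
    "N America":      [16, 0, 1, 0, 12, 26, 39],
    "S Am & Carib.":  [28, 0, 0, 1, 14, 97, 105],
    "Europe":         [23, 0, 0, 0, 4, 27, 30],
    "Oceania":        [19, 0, 0, 5, 127, 63, 156],
}


def row_shares(table):
    """Return each region's counts as fractions of that region's row total."""
    shares = {}
    for region, counts in table.items():
        total = sum(counts)
        shares[region] = {p: c / total for p, c in zip(PERIODS, counts)}
    return shares


shares = row_shares(TABLE3)

# Print the two columns discussed in the text: undated and 1966-75 estimates.
print(f"{'Region':<16}{'Not ind.':>10}{'1966-75':>10}")
for region, s in shares.items():
    print(f"{region:<16}{s['Not ind.']:>10.1%}{s['1966-75']:>10.1%}")
```

Shares rather than raw counts are used here because the regions differ greatly in the number of sampled entries, so raw counts alone can overstate differences for the larger regions.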
Table 4 compares source type and region among the language entries. Here we are
interested in understanding whether particular sources specialize in particular regions of
the world. It appears that this is the case: a chi-square test for independence on only the rows and
columns with sufficient data (rows: SIL, Academic, Government, WCD, Missionary, Not
indicated; columns: Africa, North America, South America, South and Central Asia,
Southeast Asia, Oceania) is significant (chi-square = 434.8426, df = 25, p < 0.0001).
Inspecting the residuals, we find that WCD is cited more for Africa and Southeast Asia
than for other regions, other Christian Missionary sources are cited more for Africa and
South America, and SIL is cited more for Oceania, North and South America. Academic
sources are more cited for North America and Oceania (not surprising given that Wurm
and Hattori 1981, a comprehensive reference for the Pacific, is cited 135 times in our
sample), and government sources are important to South America and South and Central
Asia. Overall numbers of languages in the sample from East Asia, West Asia and Europe
are a bit too low to know if these regions utilize different sources in any patterned way.
Population estimates with no source indicated appear with the greatest preponderance in
South and Central Asia, and to a less extreme extent in Africa, suggesting that source
documentation could be improved by a systematic effort to update these regions. A
country-by-country comparison of the full Ethnologue database could potentially reveal
where particular problem areas are.
Table 4. Source of population estimates of Ethnologue language entries by region.
                 Africa   N Amer   S Amer   E Asia  SC Asia  SE Asia   W Asia   Europe  Oceania
SIL                 144       30       92        0        2       96        1        1      153
Academic             88       36       25       28       27      115        8       29      121
Government           54       10       54       10       39       19        3       13       43
WCD                  76        0       18        5       15       37        2        3        4
Missionary           60        1       18        0        5       13        2       10       12
Not indicated       121        0       10       17       65       40        5        5       19
None                 43       16       28        6       25       19        4       23       18
Other                 9        1        0        0        1        2        2        0        0
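The test of independence reported for Table 4 can be sketched as follows. This is a minimal illustration, assuming a standard Pearson chi-square statistic computed on the six rows and six columns named in the text. The function name chi_square_independence and the variable names are illustrative, and the software actually used for the reported figures is not stated. Under these assumptions the statistic should come out close to the reported 434.84, with 25 degrees of freedom.

```python
import math

# Subset of Table 4 with sufficient data, as listed in the text.
ROWS = ["SIL", "Academic", "Government", "WCD", "Missionary", "Not indicated"]
COLS = ["Africa", "N Amer", "S Amer", "SC Asia", "SE Asia", "Oceania"]
OBSERVED = [
    [144, 30, 92, 2, 96, 153],    # SIL
    [88, 36, 25, 27, 115, 121],   # Academic
    [54, 10, 54, 39, 19, 43],     # Government
    [76, 0, 18, 15, 37, 4],       # WCD
    [60, 1, 18, 5, 13, 12],       # Missionary
    [121, 0, 10, 65, 40, 19],     # Not indicated
]


def chi_square_independence(observed):
    """Pearson chi-square statistic, degrees of freedom and Pearson residuals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    residuals = []
    for i, row in enumerate(observed):
        res_row = []
        for j, obs in enumerate(row):
            # Expected count if source type and region were independent.
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
            res_row.append((obs - exp) / math.sqrt(exp))
        residuals.append(res_row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, residuals


stat, df, residuals = chi_square_independence(OBSERVED)
print(f"chi-square = {stat:.4f}, df = {df}")

# A positive residual means the source is cited more often for that region
# than independence would predict; a negative one means less often.
for name, res_row in zip(ROWS, residuals):
    print(f"{name:<14}" + "".join(f"{r:>9.2f}" for r in res_row))
```

The residuals printed here are Pearson residuals; adjusted (standardized) residuals would give the same signs but somewhat different magnitudes.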
3.2. Language group sizes
Of central concern in our evaluation of the language entries are the population sizes. It is
from these that we potentially have the most to learn about the processes by which
languages grow or shrink, or become endangered. Grimes (1986), using an earlier version
of the Ethnologue’s database (approximately the 10th edition), observed that language
population sizes are log-normally distributed, and that different regions had different
typical sizes. Nearly a full generation has transpired, and with it, major changes in the
size and comprehensiveness of the Ethnologue, but the basic observation has been shown
to hold for updated versions of the data (e.g. Paolillo 2005, for the 14th edition). The same
observation can be made from our sample of language entries as well, as in Figure 1,
where the probability density is plotted against the logarithm of the population size.²
The central tendency of this distribution is 5661 (95% confidence interval
between 4907 and 6531), and 95% of the language populations lie in the range from 13
individuals to 2.53 million. This may seem to be a small population size, given that
languages such as Mandarin Chinese have nearly a billion speakers. However, there
are very few such languages: although they account for a large proportion of
the world’s people, languages of smaller size are far more numerous. Moreover,
this result is reasonably robust, and compares well with earlier work. The notion of what
a typical speaker experiences is a different one, which we take up subsequently.
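Summary figures of this kind can be obtained by working on the logarithm of population size. The sketch below assumes that the central tendency is the geometric mean (the exponentiated mean of the log populations) and that the intervals are normal-theory intervals on the log scale; the text does not state which estimator was used, so this is one plausible reading rather than the procedure actually followed. The sample is synthetic stand-in data, and the spread parameter 3.1 and the function name lognormal_summary are arbitrary illustrative choices.

```python
import math
import random
import statistics


def lognormal_summary(populations, z=1.96):
    """Summarise positive population sizes on the natural-log scale."""
    logs = [math.log(p) for p in populations if p > 0]
    n = len(logs)
    mu = statistics.fmean(logs)      # mean of log population
    sigma = statistics.stdev(logs)   # standard deviation of log population
    se = sigma / math.sqrt(n)        # standard error of the mean log
    return {
        "n": n,
        "geometric_mean": math.exp(mu),
        # Confidence interval for the central tendency.
        "ci95": (math.exp(mu - z * se), math.exp(mu + z * se)),
        # Range expected to contain about 95% of individual languages.
        "range95": (math.exp(mu - z * sigma), math.exp(mu + z * sigma)),
    }


# Hypothetical stand-in data, NOT the Ethnologue sample.
rng = random.Random(0)
sample = [rng.lognormvariate(math.log(5661), 3.1) for _ in range(2000)]

summary = lognormal_summary(sample)
print(f"n = {summary['n']}")
print(f"geometric mean = {summary['geometric_mean']:.0f}")
print("95% CI = ({:.0f}, {:.0f})".format(*summary["ci95"]))
print("95% range = ({:.0f}, {:.0f})".format(*summary["range95"]))
```

The distinction between the two intervals matters: the confidence interval narrows as the sample grows, while the 95% range reflects the spread of language sizes themselves and does not.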
² We use a logarithmic scale of population size so as to bring out the central tendency in
the data. Other scales (e.g. linear, or sorted log-log plots) tend to conceal this structure.