the same facts as yielding different entries. However, if an E16/E17/E18 entry shows
signs of misunderstanding (missing the existence of a variety, having an erroneous
indication of intelligibility level, or giving a blanket statement with no indicated basis, etc.),
any variety that is arguably not intelligible is listed as a missing language in Appendix A.
In all cases, references are provided to the literature that support the argument made
regarding the missing language in question.
Some 236 (E16), 477 (E17), and 198 (E18) missing languages were encountered.
More than half of the 477 missing languages for E17 (272) are languages known to be
extinct by 1951. Such languages were not intended to be included in E16/E18 but were,
at least according to the introduction to E17, intended to be included in E17. (The
corresponding numbers of missing languages in E16/E18, including those extinct by 1951,
would have been 501 (E16) and 468 (E18).) The exact numbers of missing languages divided by
macroarea are shown in Table 2.
3.2. Spurious languages. Appendix B lists entries in E16/E17/E18 that are spurious.
To be more precise, an entry is listed here as spurious if:
• it duplicates another extant E16/E17/E18 entry, or
• it cannot be asserted that the entity denoted in the entry was a language different
from every other entry in E16/E17/E18.
Again, I do not list languages that are spurious solely by virtue of the interpretation of a
dialect situation correctly understood (but interpreted differently) in E16/E17/E18, and
in all cases references are provided to the literature that support the argument made
about the spurious language in question.
Some 191 (E16), 168 (E17), and 141 (E18) spurious languages were encountered.
The numbers of spurious languages divided by macroarea are shown in Table 2.
REVIEW ARTICLE
                        missing A1951   (missing B1951)   spurious
E16   Africa                  64              (9)             47
      Australia               50             (35)              4
      Eurasia                 56             (93)             71
      North America           13             (39)              6
      Pacific                 29              (5)             22
      South America           24             (84)             41
      total                  236            (265)            191
E17   Africa                  55              11              41
      Australia               40              32               6
      Eurasia                 52              91              59
      North America           11              49               4
      Pacific                 25               5              17
      South America           22              84              41
      total                  205             272             168
E18   Africa                  49             (10)             25
      Australia               40             (32)              5
      Eurasia                 52             (90)             51
      North America           11             (49)              4
      Pacific                 24              (5)             16
      South America           22             (84)             40
      total                  198            (270)            141
Table 2. Numbers of missing and spurious languages in E16/E17/E18. The actual languages are detailed in
Appendix A and B. The column marked B1951 signifies that the languages in question were extinct by 1951,
while that marked A1951 signifies that the languages in question were not known to be extinct by 1951.
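The totals quoted in the text can be checked against the per-macroarea counts in Table 2. The following is a minimal sketch of that arithmetic: the counts are transcribed from the table, while the names `TABLE2` and `column_totals` are introduced here purely for illustration.

```python
# Counts transcribed from Table 2, per macroarea:
# (missing A1951, missing B1951, spurious)
TABLE2 = {
    "E16": {
        "Africa": (64, 9, 47), "Australia": (50, 35, 4),
        "Eurasia": (56, 93, 71), "North America": (13, 39, 6),
        "Pacific": (29, 5, 22), "South America": (24, 84, 41),
    },
    "E17": {
        "Africa": (55, 11, 41), "Australia": (40, 32, 6),
        "Eurasia": (52, 91, 59), "North America": (11, 49, 4),
        "Pacific": (25, 5, 17), "South America": (22, 84, 41),
    },
    "E18": {
        "Africa": (49, 10, 25), "Australia": (40, 32, 5),
        "Eurasia": (52, 90, 51), "North America": (11, 49, 4),
        "Pacific": (24, 5, 16), "South America": (22, 84, 40),
    },
}


def column_totals(edition):
    """Sum each column over the six macroareas for one edition."""
    rows = TABLE2[edition].values()
    return tuple(sum(column) for column in zip(*rows))


for edition in TABLE2:
    a1951, b1951, spurious = column_totals(edition)
    # Last-but-one figure: missing languages including those extinct by 1951.
    print(edition, a1951, b1951, a1951 + b1951, spurious)
# E16 236 265 501 191
# E17 205 272 477 168
# E18 198 270 468 141
```

The combined figures (501, 477, 468) match those given in the text for missing languages including those extinct by 1951.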
4. The language/dialect division. Many blanket statements have appeared regarding
the (too high?) number of languages in E16/E17/E18 and the language/dialect
division. To take a few recent examples, Gippert (2012:21), with an example involving
Germanic languages, declares that ‘How dubious the calculation of languages in “Ethnologue”
is … the number of 6,500 languages world-wide, consistently repeated in
both scientific and popular publications … is nothing but a popular myth’. Similarly,
Dixon (2012:463–64), citing a few examples of politically motivated language splits,
argues that ‘two modes of speaking are regarded as dialects of a single language if they
are mutually intelligible … even the figure of 5,445 “languages” [from the tenth edition
of Ethnologue—HH] is far too high … my estimate is that the figure is not more than
4,000, and probably a good deal less than this’. Indeed, it is easy to come up with examples
of overcounting from the E16/E17/E18 listing, or, given the leeway in the
E16/E17/E18 definition, to come up with examples of inconsistencies. It is also easy to
come up with examples where there is no overcounting and, less easy but still not difficult,
to come up with examples of undercounting (see e.g. the review of the 15th edition
for examples that are all retained in E16; Hammarström 2005). However, examples are
only examples and do not necessarily generalize.
I wish to point out here that defining languages on purely linguistic grounds is not
necessarily fraught with theoretical problems. A widespread belief holds that one
cannot define language vs. dialect in any consistent and intuition-preserving way based
solely on the binary (yes/no) criterion of mutual intelligibility. This view is premature:
Hammarström 2008 shows that, for any set of varieties and a yes-or-no relation of
intelligibility between the members of each pair, it is possible to define language/dialect in a
consistent way, that is, such that all varieties that belong to the same language are
mutually intelligible, and such that language entries are not unnecessarily multiplied. A
second widespread idea holds that treating intelligibility between languages as a binary
(rather than gradient) property necessarily involves an arbitrary decision, for example a
cutoff at 77% lexicostatistical similarity, 87% in a sentence-repetition test, or some other
threshold percentage in a text-comprehension test. This too may be premature, as a binary
intelligibility without thresholds is definable on formal languages that mimic essential
properties of natural languages (Hammarström 2010).
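To make the two stated requirements concrete, the sketch below takes a set of varieties and a yes/no intelligibility relation and finds a smallest grouping in which every group is pairwise mutually intelligible. This is only one possible reading of those requirements (a minimum partition into cliques, found by exhaustive search) and is not claimed to be the construction in Hammarström 2008. The function name `min_language_partition` and the three-variety chain are hypothetical illustrations.

```python
def min_language_partition(varieties, intelligible):
    """Return a smallest partition of `varieties` into groups whose members
    are pairwise mutually intelligible.

    `intelligible` is a set of frozenset({a, b}) pairs (a symmetric yes/no
    relation). Exhaustive search: suitable for small inputs only.
    """
    def fits(variety, group):
        return all(frozenset((variety, other)) in intelligible for other in group)

    def assign(index, groups, max_groups):
        if index == len(varieties):
            return [list(group) for group in groups]
        variety = varieties[index]
        # Try adding the variety to each existing group.
        for group in groups:
            if fits(variety, group):
                group.append(variety)
                result = assign(index + 1, groups, max_groups)
                if result is not None:
                    return result
                group.pop()
        # Otherwise open a new group, if the budget allows.
        if len(groups) < max_groups:
            groups.append([variety])
            result = assign(index + 1, groups, max_groups)
            if result is not None:
                return result
            groups.pop()
        return None

    # Try 1 group, then 2, ... so the first success uses the fewest groups.
    for max_groups in range(1, len(varieties) + 1):
        result = assign(0, [], max_groups)
        if result is not None:
            return result
    return []


# Hypothetical dialect chain: A~B and B~C are intelligible, A~C is not.
chain = {frozenset(("A", "B")), frozenset(("B", "C"))}
print(min_language_partition(["A", "B", "C"], chain))
# [['A', 'B'], ['C']]
```

In the chain example the count (two) is determined, but the grouping is not: `[['A'], ['B', 'C']]` would be equally small. Any definition along these lines therefore has to say something about how such ties are treated.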
To seriously address the question of whether there is overcounting in general in
E16/E17/E18, and to obtain a sharper estimate of the number of mutually intelligible
languages (henceforth MI-languages) in the world, I have sampled 100 entries from
E16 at random, checked each, and labeled it with one of the following:
• −1: represents varieties intelligible to speakers of some other entry,
• OK: represents varieties intelligible to all of its own speakers but not to those of
some other entry, or
• +1: represents varieties not intelligible to all of its own speakers nor to those of
some other entry.⁹
The languages sampled and the individual assessments (plus source and comments) for
each are given in Appendix D. In all cases, the information in the cited sources is
preferable to that in E16, since the sources explain how and where the information
presented was obtained.
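One way such sample labels could be turned into an estimate is sketched below. It rests on assumptions not stated in this passage: each −1 entry is counted as one entry too many, each +1 entry as one entry too few (split in two), and the net correction in the sample is scaled to the full listing. The function name `estimate_mi_languages` and all input figures are hypothetical placeholders, not the results of the sample.

```python
from collections import Counter


def estimate_mi_languages(labels, total_entries):
    """Scale the net correction observed in a sample to the whole listing.

    Assumption: a "-1" label removes one language from the count, a "+1"
    label adds one, and "OK" leaves the count unchanged.
    """
    counts = Counter(labels)
    sample_size = len(labels)
    net = counts["+1"] - counts["-1"]
    return total_entries * (sample_size + net) / sample_size


# Hypothetical inputs for illustration only (not figures from the article).
hypothetical_labels = ["OK"] * 80 + ["-1"] * 15 + ["+1"] * 5
print(estimate_mi_languages(hypothetical_labels, total_entries=1000))
# 900.0
```

Note that if two sampled entries are intelligible to each other and both receive −1, this simple scaling would subtract both, so the reading above may overcorrect.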
LANGUAGE, VOLUME 91, NUMBER 3 (2015)
⁹ This indicates that the entry, based on unintelligibility, should be split. In the cases encountered in the sample,
the entry should be split in two rather than into some higher number of entries.