Of the 100 entries, on the criterion of intelligibility, twenty-one should be merged
with another existing entry, six entries should be split (in two), and the other seventy-
three entries should remain. This boils down to a proportion of (73 + 6 * 2)/100 = 0.85
mutually intelligible languages to E16 entries. Since the sample was random, with high
probability, the results do generalize (Cochran 1963).
The sample was 100 out of 6,969 entries of mother-tongue spoken languages not already
deemed spurious. 0.85 * 7054 entries is 5995.9. At a confidence level of
99%, the number of L1 spoken languages in E16 is between 5,092 and 6,899. At a
confidence level of 95%, the number of L1 spoken languages in E16 is between
5,324 and 6,668.
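The point estimate and an interval of this kind can be approximated in a few lines. The sketch below is only illustrative. It assumes that each sampled entry contributes 0 (merged), 1 (kept), or 2 (split) languages, as the proportion (73 + 6 * 2)/100 implies, and it uses a plain normal approximation without a finite-population correction. The review does not state which interval method was used, so the bounds printed here come close to the quoted figures without matching them exactly. The names `sample`, `POPULATION`, and `estimate` are introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Assumed per-entry contribution: merged -> 0, kept -> 1, split in two -> 2.
sample = [0] * 21 + [1] * 73 + [2] * 6
POPULATION = 7054  # entry count used in the multiplication in the text


def estimate(level):
    """Return (point, lower, upper) for the number of MI-languages.

    Normal approximation on the sample mean; an assumption, not
    necessarily the method behind the published intervals.
    """
    p = mean(sample)
    se = stdev(sample) / sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return POPULATION * p, POPULATION * (p - z * se), POPULATION * (p + z * se)


for level in (0.95, 0.99):
    point, lower, upper = estimate(level)
    print(f"{level:.0%}: {point:.1f} ({lower:.0f} to {upper:.0f})")
```

The width of the interval is driven by the sample standard deviation (here about 0.5 languages per entry) and the sample size of 100, which is why a larger sample would narrow the bounds.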
Given that there are something like 5,996 L1 spoken MI-languages in E16, adding
the number of MI-languages not in E16 should give us the total number of known lan-
guages in the world. There are 236 MI-languages not extinct by 1951 and 265 extinct by
1951 (see Appendix A). Thus, a good estimate of the total number of known MI-languages
is 5,996 + 236 + 265 = 6,497 (at a confidence level of 99%, it is between 5,593 and 7,400, and
at a confidence level of 95%, it is between 5,825 and 7,169). These figures are
summarized in Table 3.
REVIEW ARTICLE
¹⁰ A stock is defined (Whalen & Simons 2012:156) as ‘the largest grouping of languages for which relatedness
can be demonstrated and for which a plausible protolanguage can be reconstructed’.
                                             95% interval      99% interval
                                  estimate   lower   higher    lower   higher
In E16                               5,996   5,324    6,668    5,092    6,899
MI-languages A1951 not in E16          236
MI-languages B1951 not in E16          265
total number of MI-languages         6,497   5,825    7,169    5,593    7,400

Table 3. Figures on the estimated number of attested assertable MI-languages spoken as a first language,
based on the E16 figures with missing languages added (A1951 signifies missing MI-languages not known to
be extinct before 1951, and B1951 signifies missing MI-languages extinct before 1951).
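The arithmetic behind the totals in Table 3 can be checked with a short sketch. It assumes, as the text does, that the two counts of missing languages are treated as exact, so they shift the estimate and both interval bounds by the same amount. The names `IN_E16`, `MISSING_A1951`, `MISSING_B1951`, and `add_missing` are illustrative only.

```python
# Figures quoted in the text for the E16 entries.
IN_E16 = {"estimate": 5996, "99%": (5092, 6899), "95%": (5324, 6668)}
MISSING_A1951 = 236  # missing MI-languages not known to be extinct before 1951
MISSING_B1951 = 265  # missing MI-languages extinct before 1951


def add_missing(row, extra):
    """Shift the estimate and both interval bounds by a fixed count."""
    shifted = {"estimate": row["estimate"] + extra}
    for level in ("99%", "95%"):
        lower, upper = row[level]
        shifted[level] = (lower + extra, upper + extra)
    return shifted


total = add_missing(IN_E16, MISSING_A1951 + MISSING_B1951)
print(total)
```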
Thus, a total number of living languages around 6,000 or of known languages around
6,500 is far from being ‘a popular myth’. It is a fairly well-justified estimate.
5. Classification. In §2, we reviewed the description of the principles said to be be-
hind the E16/E17/E18 classification of languages into families and subfamilies. The
present section addresses the actual outcome. Of spoken mother-tongue languages, Eth-
nologue recognizes 121 (E16), 140 (E17), or 132 (E18) language families, 50 (E16), 82
(E17), or 96 (E18) language isolates, and 73 (E16), 65 (E17), or 62 (E18) unclassified
languages, as well as a number of mixed languages and creoles. While language classi-
fication is not the primary focus of E16/E17/E18, it is worthwhile to evaluate it properly,
in order for it not to be mischaracterized and misapplied inside and/or outside the field of
linguistics. For example, Pompei and colleagues (2011) call the Ethnologue classifica-
tion an ‘expert classification’. Whalen and Simons (2012:161–62) interpret E16/E17’s
unclassified languages as being independent linguistic stocks¹⁰ and lament the loss of diversity
if these ‘unclassified’ languages go extinct. Are these inferences justified?
In fact, the E16/E17/E18 classification contains a large number of languages that are
not (sub)classified in harmony with experts. The first category of errors is of an elementary
kind: bookkeeping, name confusion, misunderstanding of linguistic vs. nonlinguistic
classification, not checking relevant research, and not keeping up with relevant
research. The second category is where expert publications provide contradictory or in-
sufficient information, and E16/E17/E18 have chosen to follow one or the other expert
inconsistently, rather than attempting to find out which expert has the most/least con-
vincing argument.
The first type of error seems to occur uniformly in all areas, except perhaps in North
America. Appendix C gives some examples of errors of this kind in order to illustrate
the point (for E16; the situation is not much different in E17/E18). In the interest of
space, this is not (in fact, it is far from) an exhaustive list.
At the end of the day, how ‘expert’-like is the E16/E17/E18 classification overall?
Hammarström et al. 2014 has a complete classification and subclassification of the lan-
guages of the world based on a consistent weighing of the arguments of experts, where
the justification for each node is traceable to the relevant publication. A standard way to
measure the difference between two trees $T_1$ and $T_2$ is the Robinson-Foulds distance,
which, in essence, counts the number of nodes found in $T_1$ but not in $T_2$ plus the number
of nodes found in $T_2$ but not in $T_1$ (Day 1985). We restrict the comparison to the 6,794
(E16)/6,812 (E17)/6,835 (E18) languages that are classified as part of a family, as an iso-
late, or left unclassified (i.e. excluding mixed languages, creoles, pidgins, sign lan-
guages, and speech registers) and that are not spurious (as per the listing in this review).
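The node-counting idea can be illustrated on a toy case. In the sketch below, a tree is a nested tuple of language names and a node is identified by the set of languages it dominates. Both choices are assumptions made for the example, not a description of how either classification is stored. The four "languages" A to D are hypothetical.

```python
def nodes(tree):
    """Return every internal node of a nested-tuple tree.

    Each node is represented by the frozenset of leaf names below it.
    """
    found = set()

    def leaves(subtree):
        if isinstance(subtree, str):
            return frozenset([subtree])
        below = frozenset().union(*(leaves(child) for child in subtree))
        found.add(below)
        return below

    leaves(tree)
    return found


def differing_nodes(tree_1, tree_2):
    """Count nodes found in one tree but not in the other, in both directions."""
    n1, n2 = nodes(tree_1), nodes(tree_2)
    return len(n1 - n2) + len(n2 - n1)


# Hypothetical four-language trees that agree on (A, B) and on the root only.
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = ((("A", "B"), "C"), "D")
print(differing_nodes(tree_1, tree_2))
```

Here the grouping of C with D occurs only in the first tree and the grouping of A, B, and C only in the second, so two nodes differ.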
The E16 classification thus has 2,242 nodes, of which 1,265 are also found in the clas-
sification of Hammarström et al. 2014. The Hammarström et al. 2014 classification has
a total of 3,596 nodes concerning E16 languages, of which, again, 1,265 are found in E16.
This amounts to an unnormalized Robinson-Foulds distance of $\frac{2242 - 1265 + 3596 - 1265}{2} = 1654$
and a normalized distance of $\frac{3308}{3308 + 1265 - 1} = 0.723$. This can be taken to mean that only
56.4% (1,265/2,242) of the E16 nodes are expert-like, and that only 35.2% (1,265/3,596)
of expert-like nodes are recognized in E16, yielding a total expert-like-ness of only
1 − 0.723 = 0.276 or 27.6%.
The E17 classification thus has 2,198 nodes, of which 1,337 are also found in the clas-
sification of Hammarström et al. 2014. The Hammarström et al. 2014 classification has
a total of 3,617 nodes concerning E17 languages, of which, again, 1,337 are found in E17.
This amounts to an unnormalized Robinson-Foulds distance of $\frac{2198 - 1337 + 3617 - 1337}{2} = 1570.5$
and a normalized distance of $\frac{3141}{3141 + 1337 - 1} = 0.702$. This can be taken to mean that
only 60.8% (1,337/2,198) of the E17 nodes are expert-like, and that only 37.0% (1,337/
3,617) of expert-like nodes are recognized in E17, yielding a total expert-like-ness of
only 1 − 0.702 = 0.298 or 29.8%.
The E18 classification thus has 2,200 nodes, of which 1,354 are also found in the clas-
sification of Hammarström et al. 2014. The Hammarström et al. 2014 classification has
a total of 3,654 nodes concerning E18 languages, of which, again, 1,354 are found in E18.
This amounts to an unnormalized Robinson-Foulds distance of $\frac{2200 - 1354 + 3654 - 1354}{2} = 1573$
and a normalized distance of $\frac{3146}{3146 + 1354 - 1} = 0.699$. This can be taken to mean that only
61.5% (1,354/2,200) of the E18 nodes are expert-like, and that only 37.1% (1,354/3,654)
of expert-like nodes are recognized in E18, yielding a total expert-like-ness of only
1 − 0.699 = 0.301 or 30.1%.
Thus, although E17 and E18 come marginally closer than E16, in no sense can
E16/E17/E18 be approximated to an ‘expert’-classification.
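The three comparisons follow the same arithmetic and can be recomputed from the node counts reported above. The sketch below mirrors that arithmetic, including the subtraction of 1 in the denominator of the normalized distance, which is taken over from the text without further interpretation. The function name `compare` and the labels `edition_share` and `reference_share` are illustrative, not terms from the review.

```python
def compare(edition_nodes, reference_nodes, shared):
    """Distances and overlap shares from node counts, following the text's arithmetic."""
    differing = (edition_nodes - shared) + (reference_nodes - shared)
    return {
        "unnormalized": differing / 2,
        "normalized": differing / (differing + shared - 1),
        "edition_share": shared / edition_nodes,      # share of edition nodes also in the reference
        "reference_share": shared / reference_nodes,  # share of reference nodes also in the edition
    }


# (edition nodes, Hammarström et al. 2014 nodes, shared nodes) as reported in the text.
COUNTS = {
    "E16": (2242, 3596, 1265),
    "E17": (2198, 3617, 1337),
    "E18": (2200, 3654, 1354),
}

for edition, counts in COUNTS.items():
    result = compare(*counts)
    print(edition, {key: round(value, 4) for key, value in result.items()})
```

For E16 the unrounded normalized distance is about 0.7235, which is the value behind both the printed 0.723 and the printed 0.276.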
6. Discussion. Apart from the languages listed as missing/spurious and apart from
languages that went extinct before 1951, as far as I have been able to tell, the remaining
entries in E16/E17/E18 exist in a one-to-one relationship with speech communities
recognizable from the literature. However, the literature itself does not cover the
world entirely. There are various regions of the world that are inhabited, but the linguistic
literature cannot fully account for which languages are spoken there and how they
relate to other known varieties. Thus, in all likelihood, there are further languages extant
in the world that neither E16/E17/E18 nor the literature can argue for convincingly.
LANGUAGE, VOLUME 91, NUMBER 3 (2015)
A few trends seem, impressionistically, to be present in the list of spurious languages:
• Cross-border languages counted twice
• Both an overarching language with considerable variation and its subvarieties
• Merging of different raw lists of languages, for example, old vs. new listings or
census lists vs. linguistic survey lists, without deep checking for duplicates
• Duplication of the ancestral or new language of an ethnic group who have shifted
language in near-historical times
• Thin entities, for example, a people are said to have lived on a certain island with-
out much further information
One and the same problem underlies these kinds of errors: the lack of explicit sources
for the justification of a language. If there had been a source for every entry detailing
what the entry is based on (location, name, linguistic data, or whatever is thought to
constitute the evidence for the language), it would be a near-mechanical task to merge
different lists by matching the data at hand. At present, one has to search the entire lit-
erature and second-guess the justification for the entry. Presumably, this is the reason
why there are almost as many spurious languages in E16/E17/E18 as there are missing
living languages.
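What a near-mechanical merge could look like is sketched below, under the strong simplifying assumption that every entry carries a single source string and that two entries citing the same source are candidates for being the same language. The record layout, the function names, and all names in the toy data are hypothetical placeholders, not real languages or publications. A real procedure would also have to compare location, names, and linguistic data.

```python
from collections import defaultdict


def group_by_source(listings):
    """Map each cited source to the (listing, entry name) pairs that rely on it."""
    grouped = defaultdict(list)
    for listing_name, entries in listings.items():
        for entry in entries:
            grouped[entry["source"]].append((listing_name, entry["name"]))
    return grouped


def duplicate_candidates(grouped):
    """Keep only sources cited by more than one entry: candidates for merging."""
    return {source: hits for source, hits in grouped.items() if len(hits) > 1}


# Hypothetical placeholder records.
listings = {
    "census list": [{"name": "Name A", "source": "Survey report 1"}],
    "linguistic survey": [
        {"name": "Name A'", "source": "Survey report 1"},
        {"name": "Name B", "source": "Word list 2"},
    ],
}
print(duplicate_candidates(group_by_source(listings)))
```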
E16/E17/E18 is not alone in not citing the individual justification for language list-
ings. Nearly all modern language listings for continent-sized areas produced by lin-
guists have the same policy of not citing explicit sources (or are derivative of the
Ethnologue), for example, Dixon 2002 for Australia, Tryon 2006 for the Pacific, Masica
1993 for the Indo-Aryan languages of South Asia, Maho 2003 for the Bantu languages,
Bradley 2007 for Southeast Asia, and so on. In fact, the only contemporary language
listings produced by linguists that do provide individual justifications are Goddard
1996 and Mithun 1999 for North America, Adelaar & Muysken 2004 for the Andes re-
gion of South America, and van Driem 2001 for the Himalayan region. In particular,
LINGUIST List,¹¹ which is in charge of listing extinct languages for ISO 639-3, has followed
the practice of not tying entries to sources. As a standard of comparison, this listing
contains more errors of all kinds mentioned in this review, on a far simpler task.
7. Conclusion. From a scientific perspective, there is really only one serious fault
with E16/E17/E18, namely, that the source for the information presented is not system-
atically indicated. Furthermore, the introduction contains a number of items where the
description of the principles behind E16/E17/E18 is questionable. Nevertheless, Ethno-
logue is an impressively comprehensive catalogue of world languages, and it is far su-
perior to anything else produced prior to 2009. In particular, it is superior by virtue of
being explicit. Most works with an overlapping goal produced by linguists contain ex-
traordinary amounts of vagueness in language definition, borders, justification, and
scope. I have listed upward of five hundred missing extinct and living languages and
several hundred spurious languages, so the number of errors that could have been pre-
vented with more work is far from negligible. The remaining entries, as far as I have
been able to tell, match one-to-one with a speech community recognizable in the literature.
A redivision of those speech communities along the lines of mutual intelligibility
would recognize fewer languages (about 85%) than E16 (likely also for E17/E18). The
number 85% can be ascertained with confidence intervals, so there are limits to the eagerness
to split. Many languages are known only through SIL surveys, and the language
inventory as a whole is reasonably well informed. There is a rapid stream of change requests
submitted to ISO 639-3 on behalf of the Ethnologue editor covering many of
the languages highlighted in the present review. Therefore, I look forward to an even
sharper 19th edition.

¹¹ Under http://multitree.org/codes/, accessed 20 January 2012.
REFERENCES
Adelaar, Willem F. H., and Pieter C. Muysken. 2004. The languages of the Andes.
    (Cambridge language surveys.) Cambridge: Cambridge University Press.
Bakker, Peter, and Mikael Parkvall. 2010. Catalogue of pidgin languages. Paper presented
    at the second Atlas of Pidgin and Creole Language Structures (ApiCS) Conference,
    11–14 November 2010.
Bartlett, P. O. 2006. Artificial languages. Encyclopedia of language and linguistics, 2nd
    edn., ed. by Keith Brown, vol. 1, 488–90. Amsterdam: Elsevier.
Blust, Robert. 2008. Is there a Bima-Sumba subgroup? Oceanic Linguistics 47.45–113.
Bradley, David. 2007. East and southeast Asia. Encyclopedia of the world’s endangered
    languages, ed. by Christopher Moseley, 349–424. London: Routledge.
Cage, Ken. 2003. Gayle—the language of kinks & queens: A history and dictionary of gay
    language in South Africa. Johannesburg: Jacana Media.
Campbell, Lyle. 1997. American Indian languages: The historical linguistics of Native
    America. (Oxford studies in anthropological linguistics.) Oxford: Oxford University
    Press.
Cochran, William G. 1963. Sampling techniques. 2nd edn. New York: Wiley.
Day, William. 1985. Optimal algorithms for comparing trees with labeled leaves. Journal
    of Classification 2.7–28.
Dieu, Michel, and Patrick Renaud. 1983. Situation linguistique en afrique centrale—inventaire
    préliminaire: Le cameroun. (Atlas linguistique de l’Afrique centrale.) Paris
    and Yaoundé: Agence de Coopération Culturelle et Technique (ACCT); Centre Régional
    de Recherche et de Documentation sur les Traditions Orales et pour le Développement
    des Langues Africaines (CERDOTOLA); Direction Générale de la Recherche Scientifique
    et Technique (DGRST), Institut des Sciences Humaines. [Carries the date 1983
    but did not come out of the presses until 1985.]
Dixon, R. M. W. 2002. Australian languages: Their nature and development. (Cambridge
    language surveys.) Cambridge: Cambridge University Press.
Dixon, R. M. W. 2012. How many languages? Basic linguistic theory, vol. 3: Further grammatical
    topics, 463–64. Oxford: Oxford University Press.
Frawley, William J. (ed.) 2003. International encyclopedia of linguistics. 2nd edn. Oxford:
    Oxford University Press.
Gippert, Jost. 2012. Language-specific encoding in endangered language corpora. Potentials
    of language documentation: Methods, analyses, and utilization (Language Documentation
    & Conservation special publication 3), ed. by Frank Seifart, Geoffrey Haig,
    Nikolaus P. Himmelmann, Dagmar Jung, Anna Margetts, and Paul Trilsbeek, 17–24.
    Honolulu: University of Hawai’i Press.
Goddard, Ives (ed.) 1996. Handbook of North American Indians, vol. 17: Languages.
    Washington, DC: Smithsonian Institution.
Gray, Russell D.; Alexei J. Drummond; and Simon J. Greenhill. 2009. Language phylogenies
    reveal expansion pulses and pauses in Pacific settlement. Science 323.479–83.
Grimes, Barbara F. (ed.) 1988. Ethnologue: Languages of the world. 11th edn. Dallas: SIL
    International.
Grimes, Barbara F.; Joseph E. Grimes; Malcolm Ross; Charles E. Grimes; and Darrell
    Tryon. 1995. Listing of Austronesian languages. In Tryon 1995, 121–80.
Hammarström, Harald. 2005. Review of Ethnologue, 15th edn, ed. by Raymond G. Gordon,
    Jr. LINGUIST List 16.2637. Online: http://linguistlist.org/issues/16/16-2637.html.
Hammarström, Harald. 2008. Counting languages in dialect continua using the criterion
    of mutual intelligibility. Journal of Quantitative Linguistics 15.34–45.
Hammarström, Harald. 2010. Defining intelligibility on formal languages. Paper presented
    at the conference of the Centre for Language Technology, Gothenburg, 9 November
    2010.
Hammarström, Harald; Robert Forkel; Martin Haspelmath; and Sebastian Nordhoff.
    2014. Glottolog 2.3. Leipzig: Max Planck Institute for Evolutionary Anthropology.
    Online: http://glottolog.org. Accessed on July 16, 2014. Database available online:
    http://dx.doi.org/10.5281/zenodo.10899.
Kießling, Roland, and Maarten Mous. 2004. Urban youth languages in Africa. Anthropological
    Linguistics 46.303–41.
Lastra, Yolanda. 1990. El náhuatl del sur de Puebla. Anales de Antropología 27.383–90.
Lastra de Suárez, Yolanda. 1986. Las áreas dialectales del náhuatl moderno. México:
    Universidad Nacional Autónoma de México.
Maho, Jouni Filip. 2003. A classification of the Bantu languages: An update of Guthrie’s
    referential system. The Bantu languages (Routledge language family series), ed. by
    Derek Nurse and Gérard Philippson, 639–51. London: Routledge.
Maho, Jouni Filip. 2009. Nugl online: The online version of the new updated Guthrie list,
    a referential classification of the Bantu languages. Gothenburg: University of Gothenburg,
    Department of Oriental and African Languages. Online:
    http://goto.glocalnet.net/mahopapers/nuglonline.pdf.
Masica, Colin P. 1993. The Indo-Aryan languages. (Cambridge language surveys.) Cambridge:
    Cambridge University Press.
Mithun, Marianne. 1999. The languages of native North America. (Cambridge language
    surveys.) Cambridge: Cambridge University Press.
Moñino, Yves. 1977. Conception du monde et langue d’initiation la’bi des gbaya-kara.
    Langage et cultures africaines: Essais d’ethnolinguistique, ed. by Geneviève Calame-Griaule,
    115–47. Paris: François Maspéro.
Muysken, Pieter. 2009. Kallawaya. Lenguas de Bolivia, vol. 1: Ambito andino, ed. by
    Mily Crevels and Pieter Muysken, 147–67. La Paz: Plural Editores.
Pompei, Simone; Vittorio Loreto; and Francesca Tria. 2011. On the accuracy of language
    trees. PloS One 6.6.e20109. Online:
    http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0020109.
Tryon, Darrell T. (ed.) 1995. Comparative Austronesian dictionary: An introduction to
    Austronesian studies. (Trends in linguistics: Documentation 10.) Berlin: Mouton de
    Gruyter. 4 vols.
Tryon, Darrell T. 2006. Australasia and the Pacific. Atlas of the world’s languages, 2nd
    edn., ed. by R. E. Asher and Christopher Moseley, 97–126. London: Routledge.
van Driem, George. 2001. Languages of the Himalayas. (Handbuch der Orientalistik
    2:10.) Leiden: E. J. Brill. 2 vols.
Whalen, Doug H., and Gary F. Simons. 2012. Endangered language families. Language
    88.155–73.
[harald.hammarstroem@mpi.nl]
[Received 23 February 2013; revision accepted 29 June 2015]