currency than dialects: speakers of English, for example, may speak different dialects of
the language, depending on their locale; the speech of someone from the
British Midlands is different from that of someone from Newcastle, London, New York, Atlanta, Lagos,
New Delhi, Port Moresby, Sydney, or Auckland. We nonetheless recognize all of these
forms of speech as English.
But again, there is a problem: many so-called “dialects” are in fact different
languages. A common example is Chinese. Mandarin Chinese is its most widely known
variety and the one closest to the written form of Chinese, but other varieties such as
Cantonese, Fukienese, Shanghainese, Wu, and others are actually related
languages as different from one another as French, Italian, Portuguese, Romanian and
Spanish. Because these languages are spoken in a single (although very large) country,
and because they share a common writing system, there is a tendency to regard them as a
single language, rather than the distinct language systems that they are.
The situation for the English dialects is also unclear: many of the speakers of the
different varieties of English listed would have a great deal of difficulty understanding
one another (for example, Newcastle and Atlanta speakers of English). Moreover, the
varieties of English spoken in each of those places are not unitary; markedly
different varieties of English can be found across socio-economic strata and ethnicities in
all of these places. Furthermore, in West Africa and Port Moresby, language varieties
exist that are quite clearly based on English, but which are highly divergent in structure
from most other varieties of English. Linguists generally concur in treating these speech
varieties, such as West African Creole English and New Guinea Tok Pisin, as languages
unto themselves, even though (standard) English-speaking people from the locale may
find them intelligible.
These situations are not unique to English and Chinese, but occur again and again
in many situations, regardless of group size. At times these issues go unnoticed, but at
other times they can develop into major concerns, as for example with the different
varieties of Quiché and other Mayan languages spoken in Guatemala. Some members of
the Mayan Academy have pressed for recognition of only a single Mayan language,
whereas others see as many as 56 distinct languages (Paul Lewis, personal communication,
Feb 27 2006). Likewise, we commonly refer to Arabic as if it were one language across
North Africa and Western Asia, and indeed there is a formal variety, Modern Standard
Arabic, which can be used in many countries, especially among educated people. The
everyday spoken varieties, however, differ considerably from one another and are not, in
general, mutually intelligible. Other standard languages, such as French, Spanish, and
German in Europe, have similar relations to dialects that are not necessarily mutually
intelligible with one another.
The converse of this situation also occurs. Sometimes two groups may speak
mutually intelligible varieties but, for various other reasons, see themselves as distinct.
Serbian and Croatian are two names for language varieties that are very similar and until
recently were referred to collectively as Serbo-Croatian. Similarly, Hindi and Urdu are
written using distinct scripts and are treated as standard varieties in two different
countries, but for all intents and purposes, they represent mutually intelligible spoken
varieties. Hindi and Urdu participate in another pattern, in which geographically
neighboring varieties may be mutually intelligible, and mutually intelligible with local
varieties of other languages, but varieties from opposite geographic extremes are not.
Languages that may have some degree of intelligibility with Hindi-Urdu include Punjabi,
Maithili, Nepali, and Bhojpuri, among others.
All of these issues complicate the definition of “language” for statistical purposes.
For linguists, two main principles are used to identify languages. First and foremost, a
language is considered to be a collection of speech varieties that are mutually intelligible.
The linguistic basis for this principle is that varieties that are mutually intelligible are
likely to be structurally similar, even homogeneous. The second principle is group self-
identification. If two groups of people see themselves as different people, and they
identify those differences through language, then it may not be practical to recognize a
single language for both groups.
For large dialect chains, like those involving English, Chinese, Hindi-Urdu,
Arabic, and most of the examples we have cited, applying the principle of mutual
intelligibility would require recognizing several distinct languages, e.g., at least among Standard English, West
African Creole English and Tok Pisin, or among Hindi-Urdu and the structurally distinct
Punjabi, Maithili, Nepali and Bhojpuri, or among several varieties of Arabic: Gulf,
Cairene, Levantine, Moroccan, Tunisian, etc. Ideally these distinctions would be
established on the basis of intelligibility testing, a rigorous procedure in which speakers
from different locales are tested for comprehension after listening to recordings of each
other’s speech (Grimes 1995). This procedure is costly in time and resources, and is only
used where necessary. Short of this, field interviews may be used, but these tend to
address issues of group identification more than intelligibility, even under the most
careful interview procedures.
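The dialect-chain problem can be made concrete with a small sketch. Assume that intelligibility testing yields a pairwise comprehension score for each pair of varieties, and that some cutoff counts as "mutually intelligible". The variety names A to D, the scores, and the 0.8 cutoff below are all hypothetical, not real test results. The sketch shows that linking any intelligible pair merges the whole chain into one group, even though the varieties at opposite ends fail the cutoff with each other.

```python
from itertools import combinations

# Hypothetical comprehension scores for a four-point dialect chain A-B-C-D
# (fraction of a recorded text understood); illustrative only, not real data.
SCORES = {
    ("A", "B"): 0.9, ("B", "C"): 0.85, ("C", "D"): 0.9,
    ("A", "C"): 0.6, ("B", "D"): 0.55, ("A", "D"): 0.3,
}
THRESHOLD = 0.8  # assumed cutoff for "mutually intelligible"


def intelligible(x, y):
    """Look up the score for a pair in either order and compare it with the cutoff."""
    score = SCORES.get((x, y), SCORES.get((y, x), 0.0))
    return score >= THRESHOLD


def chain_groups(varieties):
    """Merge varieties linked by any intelligible pair (connected components)."""
    groups = []
    for v in varieties:
        linked = [g for g in groups if any(intelligible(v, w) for w in g)]
        merged = {v}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups


def fully_intelligible(group):
    """True only if every pair of varieties in the group passes the cutoff."""
    return all(intelligible(x, y) for x, y in combinations(sorted(group), 2))


groups = chain_groups(["A", "B", "C", "D"])
print([sorted(g) for g in groups])      # one chain-linked group: A, B, C, D
print(fully_intelligible(groups[0]))    # False: A and D fall below the cutoff
```

Grouping by chained links and requiring every pair to be intelligible give different answers for the same data, which is one way of seeing why a dialect chain resists being counted as a single language on intelligibility grounds alone.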
Finally, it is often difficult to part with traditional notions of language identity
that come from outside linguistic analysis. Literary tradition and political association
may impose themselves in different ways on people’s understanding of language identity.
For example, in the German-speaking parts of Europe, varieties of language spoken near
the Dutch border may be linguistically closer to Dutch, but they are nonetheless
considered dialects of German, and many speakers consider themselves to be German,
rather than Dutch or of any other nationality. Similarly, in the former Soviet republics of
Azerbaijan, Kazakhstan, Turkmenistan and Uzbekistan, it is unclear how many Turkic
languages would be recognized on the basis of mutual intelligibility, as these and other
Turkic language varieties spoken in central Asia are mutually intelligible to some extent,
but differences in the writing systems used (including Cyrillic, Roman and Arabic
scripts) and political divisions dating back more than a century have led to separate
identities among the people of these countries.
Hence, when different speech varieties are called languages, and when people are
grouped together and counted as speakers of a common language, it will often be for
different reasons in different instances. Moreover, it will not always be clear in any given