Microsoft Word u lg rept doc

page	15/24
date	30.10.2018
size	1.04 MB
	#76647

Figure 1. Distribution of population size in the sample of language entries (figures in

thousands, logarithmic scale).

One small departure from a log-normal distribution is observable in

Figure 1 and should be noted: the somewhat elevated tail on the left. For a

true log-normal distribution, we should expect this to taper off to zero, as it does on the

right. There are two possible explanations for this. The first is that population sizes are

truncated at 1; populations smaller than that can only represent languages that are extinct,

which are not shown here.

This could prevent the left tail from dropping to zero

normally. The second is that the elevated tail may represent a tendency of the Ethnologue

to retain speakers for small languages even when they are no longer spoken. This has

already been suggested in a review of the Ethnologue by Hammarström (2005), in which

it was pointed out that a number of Australian languages recorded as already extinct

by another source were listed as extant in the Ethnologue. Hence, it might be profitable to

systematically examine smaller entries to ascertain whether more current data will show

there to be speakers for them or not.
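
The first explanation can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: it treats log10 population size as normally distributed and asks how much of that curve would lie below the floor of 1 speaker, and how high the curve still is at the floor. The parameters `MU` and `SIGMA` are hypothetical values chosen for the example, not estimates from the Ethnologue sample, and the helper names are our own.

```python
import math
from statistics import NormalDist


def truncated_share(mu_log10, sigma_log10, floor=1.0):
    """Share of a log-normal distribution lying below `floor` speakers."""
    return NormalDist(mu_log10, sigma_log10).cdf(math.log10(floor))


def density_at_floor(mu_log10, sigma_log10, floor=1.0):
    """Height of the normal curve (on the log10 scale) at the floor."""
    return NormalDist(mu_log10, sigma_log10).pdf(math.log10(floor))


# Hypothetical parameters (assumption): median of 1,000 speakers and a
# spread of 1.5 orders of magnitude. Not taken from the report.
MU, SIGMA = 3.0, 1.5

share = truncated_share(MU, SIGMA)
height = density_at_floor(MU, SIGMA)
print(f"share below 1 speaker: {share:.4f}")
print(f"curve height at the floor: {height:.4f}")
```

With these assumed parameters the floor sits two standard deviations below the mean, so about 2.3% of the curve is cut off and the density at the cutoff is still well above zero. That is the sense in which truncation at 1 could keep the left tail from tapering off the way the right tail does.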

[Footnote, on extinct languages: One might expect that properly counting extinct languages could improve the statistical

profile of the left tail. However, the number of extinct languages in all of human history

is very large, and it is not clear which ones would be relevant. Extinct languages in the

Ethnologue are regarded as recently so, i.e. all were reported as living at some earlier

point. Since the Ethnologue covers a 50-year time span, and there are no indications as to

when a language became extinct, it is not possible to decide which of these entries should

be considered relevant.]

Having established the general distributional nature of the population statistics,

we can now proceed to ask if there are systematic distributional effects, be they biases or

interpretable differences, according to the other factors we have already observed: the

type of source cited for population estimates, the date of the source, and geographic

region. These observations are presented as box-and-whisker plots in Figures 2-4. In each

of these plots, the vertical axis is the base ten logarithm of population size (3 corresponds

to 1,000; 4 corresponds to 10,000, etc.). Each box has a center bar indicating its median

value, with a notch around the bar indicating a 95% confidence interval for that value.

The box represents the range occupied by the central 50% of the data for that category,

and the whiskers extending above and below the box approximate a range enclosing 95%

of the data. Outliers are indicated as individual data points outside these latter ranges. By

comparing the central bars and the overlap among the notches of the different categories,

one can get a sense of the differences in population size across the set of categories.
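
The quantities read off each box can be sketched in a few lines. This is a minimal illustration, not the procedure used to draw Figures 2-4: it assumes the common notch convention of median ± 1.57 × IQR / √n, which the report does not state, and the two samples are made-up numbers rather than Ethnologue figures.

```python
import math
import statistics


def notched_box_stats(populations):
    """Median, quartiles and notch bounds of population sizes on the log10 scale."""
    logs = sorted(math.log10(p) for p in populations)
    q1, median, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    # Assumed notch convention: median +/- 1.57 * IQR / sqrt(n)
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(logs))
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "notch_low": median - half_width,
        "notch_high": median + half_width,
    }


def notches_overlap(a, b):
    """True if the two notch intervals share any point."""
    return a["notch_low"] <= b["notch_high"] and b["notch_low"] <= a["notch_high"]


# Made-up samples for illustration only
small_groups = [10, 30, 100, 100, 300, 1000, 3000]
large_groups = [1000, 3000, 10000, 10000, 30000, 100000, 300000]

small = notched_box_stats(small_groups)
large = notched_box_stats(large_groups)
print(small["median"], large["median"])
print("notches overlap:", notches_overlap(small, large))
```

When two notch intervals do not overlap, as in this made-up pair, the medians of the two categories are usually read as differing; overlapping notches suggest the difference may not be meaningful.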

Figure 2. Log10 population size by geographic region among language entries.

Figure 2 indicates clearly that different regions have somewhat different typical

language population sizes. Africa, East Asia, Europe, South and Central Asia and

Western Asia appear to have somewhat larger population sizes than North America,

Oceania, and South America and the Caribbean. Southeast Asia has language population

sizes intermediate between these two sets. This confirms the observation of Grimes

(1986) of different geographic regions having different language size norms. The

observations also comport with our prior knowledge about the languages of the regions.

North America, where shift from the indigenous languages to English is all but complete,

has a relatively small median size at almost exactly 100 individuals. Oceania has a

median size around 1,000 individuals, a value widely reported for the countries of the

region such as Papua New Guinea. Some regions with larger median sizes, such as

Africa, nonetheless have a substantial number of smaller language groups, as indicated by

the small language group outliers. Note that Africa, East Asia, Europe, Oceania, South

America, and Southeast Asia all have small outlying groups; these would be good

candidates for endangered languages. Since different regions appear to have different

typical language sizes, the cutoff for what is likely to be endangered is likely to be

different for different regions.
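
One way to make a region-specific cutoff concrete is to flag entries that fall below a lower fence computed separately for each region. The sketch below is an assumption-laden illustration: it uses the conventional fence of Q1 minus 1.5 × IQR on the log10 scale, which is not necessarily the rule behind the outliers in Figure 2, and the region labels and population figures are hypothetical.

```python
import math
import statistics


def low_outliers_by_region(entries):
    """Map each region to the populations below its own lower fence (log10 scale)."""
    flagged = {}
    for region, populations in entries.items():
        logs = [math.log10(p) for p in populations]
        q1, _, q3 = statistics.quantiles(logs, n=4, method="inclusive")
        # Assumed rule: lower fence at Q1 - 1.5 * IQR
        fence = q1 - 1.5 * (q3 - q1)
        flagged[region] = [p for p in populations if math.log10(p) < fence]
    return flagged


# Hypothetical data: both regions contain a language with 50 speakers
sample = {
    "Region A": [50, 5000, 8000, 10000, 12000, 15000, 20000],
    "Region B": [50, 80, 100, 150, 200, 400, 1000],
}

flagged = low_outliers_by_region(sample)
print(flagged)
```

In this made-up example a language with 50 speakers is flagged in Region A, where typical sizes run to the thousands, but not in Region B, where small groups are the norm.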

Figure 3. Log10 population size by source among language entries.

In Figure 3, we consider the contribution of different sources to population groups

of different sizes. Again we can see that there appear to be significant differences among

the different sources used. SIL and Academic sources tend to be for somewhat smaller

groups than Missionary, WCD, Government and other sources. This we might partly

expect, given the tendency for different sources to report on different regions, and the

different size trends observed for different regions in Figure 2. Populations for which no

source is given also tend to be smaller, while those that have a year but no source

indicated tend to be larger. This suggests that the two types of figures represent different

kinds of information entirely. Given that both reflect some uncertainty about the language

population data, and given that they account for about 20% of the language entries in our

sample, entries with such fragmentary citations on population data need to be thoroughly

checked before we can fully rely on them. Again, this effect is probably distributed

unevenly across regions, so focusing on particular regions as suggested earlier may help

to address these issues as well.
