Chapter 6 – Documenting lexical knowledge
Table 1. Hanunoo pronouns

                                       S   H   M
kuh  ‘I’                   1s          +   –   +
muh  ‘you’                 2s          –   +   +
yah  ‘s/he’                3s          –   –   +
tah  ‘we two’              1du         +   +   +
tam  ‘we all’              1pl INCL    +   +   –
yuh  ‘you all’             2pl         –   +   –
dah  ‘they’                3pl         –   –   –
mih  ‘we (but not you)’    1pl EXCL    +   –   –
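The componential analysis in Table 1 can be treated as a small lookup structure. The following Python sketch encodes each pronoun as a triple of binary features and checks that the three features distinguish all eight forms. Reading S, H and M as speaker, hearer and minimal membership is an assumption of the sketch (the column labels are not expanded in the table itself), and the names PRONOUNS, BY_FEATURES and lookup are purely illustrative.

```python
from itertools import product

# Feature triples (S, H, M) copied from Table 1; True stands for "+".
# Reading S/H/M as speaker / hearer / minimal is an assumption of this sketch.
PRONOUNS = {
    "kuh": (True, False, True),    # 'I' 1s
    "muh": (False, True, True),    # 'you' 2s
    "yah": (False, False, True),   # 's/he' 3s
    "tah": (True, True, True),     # 'we two' 1du
    "tam": (True, True, False),    # 'we all' 1pl INCL
    "yuh": (False, True, False),   # 'you all' 2pl
    "dah": (False, False, False),  # 'they' 3pl
    "mih": (True, False, False),   # 'we (but not you)' 1pl EXCL
}

# Invert the table: feature triple -> pronoun form.
BY_FEATURES = {features: form for form, features in PRONOUNS.items()}


def lookup(s, h, m):
    """Return the pronoun that carries the given feature values."""
    return BY_FEATURES[(s, h, m)]


# Each of the 2 x 2 x 2 feature combinations is used exactly once.
assert set(BY_FEATURES) == set(product([True, False], repeat=3))

print(lookup(True, True, True))    # tah
print(lookup(True, False, False))  # mih
```

Because the inverted dictionary has exactly eight keys, the check confirms that no two pronouns share a feature specification and that no cell of the paradigm is empty.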
Another useful descriptive paradigm widely applied to (and in fact driven
by) lexicographic practice is the “frame-semantics” approach associated
with Charles Fillmore (see, for example, Fillmore and Atkins 1992).
Individual words, on this view, project wider, structured “frames” –
configurations of elements and actions, some of which receive explicit
grammatical realization and some of which remain implicit in the frame.
Families of words then share frames. For example, the FrameNet description
of the “Commerce-buy” frame – which might be instantiated by such verbs as
buy, lease, or rent – is:
These are words describing a basic commercial transaction involving a
buyer and a seller exchanging money and goods, taking the perspective of
the buyer. The words vary individually in the patterns of frame element
realization they allow. For example, the typical pattern for the verb BUY:
BUYER buys GOODS from SELLER for MONEY. Abby bought a car
from Robin for $5,000.
Clearly, frames themselves can be interrelated. Compare the description for
the “Giving” frame, which the “Commerce” frame above “inherits”:
A Donor transfers a Theme from a Donor to a Recipient.¹³ This frame
includes only actions that are initiated by the Donor (the one that starts
out owning the Theme). Sentences (even metaphorical ones) must meet the
following entailments: the Donor first has possession of the Theme.
Following the transfer the Donor no longer has the Theme and the Recipient
does.
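One way to picture frames and their inheritance is as linked records. The Python sketch below stores only the frame elements named in the two descriptions above and fills the BUY pattern with the example sentence. The Frame class, the ELEMENT_MAP correspondence between Commerce-buy and Giving elements, and the pattern string are illustrative assumptions, not FrameNet's own data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    name: str
    elements: tuple                    # frame elements named in the text
    parent: Optional["Frame"] = None   # frame this one inherits from, if any

    def ancestors(self):
        """Return the names of inherited frames, nearest first."""
        names = []
        frame = self.parent
        while frame is not None:
            names.append(frame.name)
            frame = frame.parent
        return names


giving = Frame("Giving", ("Donor", "Theme", "Recipient"))
commerce_buy = Frame(
    "Commerce-buy", ("Buyer", "Seller", "Goods", "Money"), parent=giving
)

# Hypothetical correspondence between the elements of the two frames;
# the text says only that one frame inherits the other.
ELEMENT_MAP = {"Seller": "Donor", "Goods": "Theme", "Buyer": "Recipient"}

# Realization pattern for BUY, written as a format string (past tense).
BUY_PATTERN = "{Buyer} bought {Goods} from {Seller} for {Money}."

sentence = BUY_PATTERN.format(
    Buyer="Abby", Goods="a car", Seller="Robin", Money="$5,000"
)
print(sentence)                  # Abby bought a car from Robin for $5,000.
print(commerce_buy.ancestors())  # ['Giving']
```

Money has no counterpart in ELEMENT_MAP, which reflects the fact that the Giving description mentions no payment.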
John B. Haviland
In some ways related as a metasemantic device is the approach, most
explicitly developed in Levin (1993), that uses various syntactic
diagnostics – such as patterns of diathesis – to partition lexical sets into
families or classes. Testing various diagnostic syntactic behaviors against
their occurrence with specific verbs partitions the verbs into classes which
can, according to this logic, be expected to display commonalities of
meaning. For example, Levin proposes the following constructions as relevant
tests to discover semantic classes among transitive verbs.
(9) Diathesis diagnostics
MIDDLE: The bread cuts easily.
CONATIVE: Carla hit at the door.
BODY-PART POSSESSOR ASCENSION: Terry touched Bill on the shoulder.
Applied to specific verbs (each of which may have a variety of hyponyms,
thus forming meaning families), these tests reveal different syntactic classes
corresponding to putative meaning families. The meaning families can, in
turn, be used to group individual lexical items, and the groupings are thus
justified not simply on notional but also on syntactic grounds.
(10) Diathesis diagnostics applied to different verbs (from Levin 1993: 6)

                        touch   hit   cut   break
CONATIVE                No      Yes   Yes   No
BODY-PART POSS. ASC.    Yes     Yes   Yes   No
MIDDLE                  No      No    Yes   Yes
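The partitioning logic behind (10) can be sketched in a few lines of Python: verbs that return the same answers on every diagnostic fall into the same class. The judgments for touch, hit, cut and break are copied from (10). The extra verb kick and its judgments are a hypothetical addition, included only to show a class with more than one member.

```python
from collections import defaultdict

# Order of the diagnostics in each judgment triple.
DIAGNOSTICS = ("conative", "body-part possessor ascension", "middle")

# Judgments copied from (10); True stands for "Yes".
JUDGMENTS = {
    "touch": (False, True, False),
    "hit": (True, True, False),
    "cut": (True, True, True),
    "break": (False, False, True),
}


def partition(judgments):
    """Group verbs that show the same pattern across all diagnostics."""
    classes = defaultdict(list)
    for verb, pattern in judgments.items():
        classes[pattern].append(verb)
    return dict(classes)


# Hypothetical extra verb, NOT from the source table: judgments assumed
# for illustration to match those of "hit".
extended = dict(JUDGMENTS, kick=(True, True, False))

for pattern, verbs in partition(extended).items():
    accepted = [name for name, ok in zip(DIAGNOSTICS, pattern) if ok]
    print(verbs, "accept:", accepted)
```

With the four source verbs alone, every verb ends up in its own class, which is exactly the point of (10): each of the four stands for a different syntactic pattern.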
After one has documented the basic structures of a grammar, and collected
an ample corpus of texts, how does one supplement elicited examples and
textually situated tokens of use to achieve a systematic compilation of
lexical knowledge? Interlinear glossing of a large corpus can be used
mechanically to generate a structured word list, whose analytical
perspicacity is in direct proportion to the compiler’s care and consistency
in morphological and semantic tagging during the glossing procedure. Various
computational tools aid lexical extraction from text corpora – not only
dedicated linguistic database tools like SIL’s Shoebox/Toolbox, but also
both general and specialized concordance tools (written, for example, as
Unix shell scripts, or with programming languages like PERL or ICON¹⁴).
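As a rough illustration of how a glossed corpus can be turned mechanically into a word list, the Python sketch below indexes every morpheme by its glosses and by the places where it occurs. The corpus is a toy: English words with hyphen-aligned glosses stand in for real interlinear text, and the data layout and the function name build_word_list are assumptions of this sketch, not the format of Shoebox/Toolbox or any other tool.

```python
from collections import defaultdict

# Toy glossed corpus: each sentence is a list of (form, gloss) pairs with
# hyphen-aligned morphemes. English stands in for a real documentation corpus.
CORPUS = [
    [("dog-s", "dog-PL"), ("bark-ed", "bark-PST")],
    [("the", "DEF"), ("dog", "dog"), ("bark-s", "bark-3SG")],
]


def build_word_list(corpus):
    """Map each morpheme form to its glosses and to (sentence, word) references."""
    entries = defaultdict(lambda: {"glosses": set(), "refs": []})
    for s_idx, sentence in enumerate(corpus, start=1):
        for w_idx, (form, gloss) in enumerate(sentence, start=1):
            morphs, glosses = form.split("-"), gloss.split("-")
            if len(morphs) != len(glosses):
                # Inconsistent glossing: flag it rather than guess an alignment.
                raise ValueError(
                    f"misaligned gloss at {s_idx}.{w_idx}: {form} / {gloss}"
                )
            for morph, g in zip(morphs, glosses):
                entries[morph]["glosses"].add(g)
                entries[morph]["refs"].append((s_idx, w_idx))
    return entries


word_list = build_word_list(CORPUS)
for morph in sorted(word_list):
    entry = word_list[morph]
    print(morph, sorted(entry["glosses"]), entry["refs"])
```

The entry for s collects two glosses (PL and 3SG). Whether such a form is one lexical item or two is precisely the kind of decision that depends on careful and consistent tagging during glossing.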
Other computer techniques can also aid in eliciting lexemes in a language,
taking advantage of regular phonological patterns. A well-known example is
Terry Kaufman’s method for generating an exhaustive list of “potential
roots” in Mayan languages, based on the observation that the root canon in
Mayan is CVC or some simple variant thereof. Table 2 shows a short ICON
program that begins with all the consonants and vowels¹⁵ in the Mayan
language Tseltal and produces a complete list of all permutations of the
form CV(:)(j)C. The program produces 8820 potential roots. (The first of
those beginning with b are shown in Table 3.) Each of these can be
exhaustively (and exhaustingly) tested with native speakers to see which
forms actually produce recognizable lexical items – many speakers of Mayan
languages and others with similarly straightforward phonotactics have, over
the years, been subjected to such a mind-numbing task.
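Table 2. Tseltal root salad, in the Icon programming language
procedure main()
  # One character per segment; uppercase letters apparently code
  # glottalized consonants and long vowels (compare Table 3).
  C := "`bcCjkKlmnpPrstTwxyzZ"   # 21 consonants
  V := "aAeEiIoOuU"              # 10 vowels
  M := "0j"                      # "0" = no medial segment, "j" = medial j
  every (c1 := !C) do {
    every (v1 := !V) do {
      every (m1 := !M) do {
        every (c2 := !C) do {
          root := c1 || v1 || m1 || c2
          write(root)            # one candidate root per line
        }
      }
    }
  }
end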
Table 3. The first possible Tseltal roots beginning with b
ba' bab bach bach' baj bak bak' bal bam ban bap bap' bar bas bat
bat' baw bax bay bats bats' baj' bajb bajch bajch' bajj bajk bajk' bajl
bajm bajn bajp bajp' bajr bajs bajt bajt' bajw bajx bajy bajts bajts'
baa' baab baach … etc.