LAST: A single-mechanism account of type and token frequency effects and their relatives



The Problem

  • Usage-based approaches to language are committed to explaining language structure in terms of domain-general abilities and mechanisms influencing language use

  • The major focus of the investigation of how use impacts structure has been frequency effects

  • However, how does frequency do what it does?

I. Token frequency

Word recognition

  • High-frequency words are accessed faster in both recognition and production (for both RTs and the M350)

  • Coltheart et al. (1977), Becker (1979), Glanzer and Ehrenreich (1979), McClelland and Rumelhart (1981), Schvaneveldt and McDonald (1981), Paap et al. (1987), Gordon (1983), Norris (1984), Goldinger et al. (1989), Monsell (1991), Luce et al. (2000), Plaut and Booth (2000), Embick et al. (2001)

Token frequency effects in semantic and orthographic priming

  • High-frequency words are primed less by their orthographic and semantic neighbors than low-frequency words are

  • Perea and Rosa (2000a); Schuberth and Eimas (1977), Schuberth et al (1981), Becker (1979), Stanovich and West (1981, 1983), Stanovich et al (1981), West and Stanovich (1982), Neely (1991), Borowski and Besner (1993), Plaut and Booth (2000)

Token frequency effects in inhibitory phonological priming

  • Goldinger et al (1989), Luce et al (2000): RTs are slower when the target is preceded by a phonologically related prime

  • More inhibition is produced when primes are low frequency than when they are high frequency words

  • → The effect holds across excitatory and inhibitory priming.

Token frequency effects in morphological priming

  • Low frequency stems prime past tense patterns associated with them more than do high frequency stems (Moder 1992)

Token frequency effects in identity priming

  • Scarborough et al (1977), Jacoby and Dallas (1981), Jacoby (1983), Forster and Davis (1984), Norris (1984), Jacoby and Hayman (1987), Nevers and Versace (1998), Versace (1998), Perea and Rosa (2000a), Versace and Nevers (2003): high frequency words prime themselves less than low frequency words

II. Current accounts and their problems

Network Theory

  • Moder (1992), Bybee (2001): High frequency weakens a word’s connections to neighboring words


  • Why does high token frequency reduce the amount of identity priming?

Compound Cue Theory

  • Ratcliff and McKoon (1988): the prime and the target form a compound cue used to access long-term memory (LTM); the greater the familiarity of the cue, assessed as a function of the familiarities of prime and target, the faster the LTM access


  • Why does high frequency of the prime reduce priming rather than increase it?

Distributed Connectionism

  • Plaut and Booth (2000): prime and target are overlapping patterns of activation distributed over nodes with sigmoid activation functions; the greater the frequency of a prime/target, the smaller the ratio of input activation to output activation
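
The role of the sigmoid can be illustrated with a minimal sketch. It assumes, purely for illustration, that higher frequency raises a unit's baseline net input and that a prime adds a fixed increment to that input; the function names and the numbers are hypothetical and are not taken from Plaut and Booth (2000).

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def priming_gain(base_input: float, prime_boost: float = 1.0) -> float:
    """Change in output when a fixed prime increment is added to the net input."""
    return sigmoid(base_input + prime_boost) - sigmoid(base_input)

# Hypothetical baseline inputs (assumption: frequency raises the baseline).
low_freq_input = 1.0
high_freq_input = 3.0

low_gain = priming_gain(low_freq_input)
high_gain = priming_gain(high_freq_input)
print(f"low-frequency gain:  {low_gain:.3f}")
print(f"high-frequency gain: {high_gain:.3f}")
```

Both baselines sit on the upper, flattening part of the curve, so the same input increment buys a smaller output change for the high-frequency item.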

Plaut and Booth 2000


III. Basic features of LAST

Architectural assumptions of the model

  • Memory is a network

  • In this network, each unit corresponds to a node

  • There are type nodes and token nodes such that every memorized chunk, e.g. a word, a morpheme, a phoneme, a construction, owns a type node, and every presentation of a chunk forms a token node

  • Most or all of the token’s activation spreads to one type (its best match)

  • Every type is connected to all other types (in a module)

Architecture of memory

Evidence for type and token nodes

  • Token nodes are necessary to represent sequential structure (Pinker and Prince 1988, Marcus 1998, Pinker 1999)

    • If there are no token nodes, there is no way to unambiguously represent sequences with repetition

Evidence for type and token nodes

  • The token nodes are needed to represent exemplar-specific information (e.g. Palmeri et al. 1993, Miller 1994, Pierrehumbert 2002)

  • Type frequency and token frequency have opposite effects on morphological productivity: type frequency increases productivity, token frequency decreases it (Bybee 1988, 1995, 2001)

  • Voice variation influences the magnitude of identity priming (Palmeri et al. 1993) but allophonic variation does not (McLennan et al. 2003)

  • Identity priming can be preserved across large variations in perceptual input, e.g. cross-modal morphological priming, capital vs. lower-case letters (Bowers 2000)

Evidence for full connectedness among types

  • Ratcliff and McKoon (1981, 1988): Degree of similarity influences the magnitude of the priming effect in semantic priming but does not influence priming onset (how soon after prime presentation the effect is observed)

  • If types were not all directly connected to one another and activation spread were not instantaneous, activation would reach targets similar to the prime first, and those targets would show the effect of prime presentation earlier

Dynamics of Activation Spread

  • The amount of activation leaving a node is limited (Anderson 1974, 2000, Lewis and Anderson 1976)

  • As activation is leaving a node, it is divided between all links connected to that node (Anderson 1974, 2000, Lewis and Anderson 1976) and the node itself.
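
The division rule can be sketched as follows. This is a simplified illustration that assumes an even split among all links and the node itself (link strength is ignored here); the function name is hypothetical.

```python
def share_per_recipient(activation: float, n_links: int) -> float:
    """Even split of outgoing activation among n_links links plus the node itself."""
    if n_links < 0:
        raise ValueError("n_links must be non-negative")
    return activation / (n_links + 1)

# Same amount of activation, different numbers of links attached to the node.
sparse_share = share_per_recipient(1.0, n_links=3)
dense_share = share_per_recipient(1.0, n_links=30)
print(f"3 links:  each recipient gets {sparse_share:.3f}")
print(f"30 links: each recipient gets {dense_share:.3f}")
```

Under this assumption, a node with many links both retains less activation and passes less down any single link.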

IV. The LAST account for token frequency effects

Account of token frequency effects


V. A sample of other effects that have a LAST explanation

Further support for the model: Lexicon size effects

  • Perfetti and Hogaboam (1975), Stanovich et al (1981), Simpson and Lorsbach (1983), Schwantes (1985), Emmorey et al (1995), Nation and Snowling (1998), Castles et al (1999), Morford (2003)

  • More priming in younger children, poorer readers, and late signers, all of whom have smaller lexicons

  • Why?

Further support for the model: Speed of recognition and priming

  • Castles et al. (1999), Plaut and Booth (2000), Morford (2003)

  • More priming in subjects who are slower at word recognition (even when lexicon size is controlled)

  • Why?

Further support for the model: Age of acquisition effects

  • Bonin et al (2001), Meschyan and Hernandez (2002), Morrison et al (2002, 2003), Newman and German (2002), Zevin and Seidenberg (2002, 2004), Ghyselinck et al (2004):

  • words learned earlier are recognized and retrieved faster when token frequency is controlled

  • When the lexicon is small, more activation would reach any given node

Type and token frequency

  • Bybee (1995, 2001): Affixes that attach to many word types are more productive than affixes with low type frequency, while a high token/type ratio reduces the productivity of an affix

  • In a nonce-probe task, activation spreads from the new type node representing the nonce word

  • A nonce word is not strongly connected to any of the competing affixes → most of the activation reaching each of the competing affixes will come through existing words similar to the nonce word

  • More activation will reach affixes with higher type frequency since there are more possible mediators
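
A toy sketch of this mediation idea follows. It rests on several assumptions that are not in the slides: activation is split evenly, the nonce word is linked only to the listed similar words, and each mediating word has one link to its affix plus one link per stored token. Affix names and token frequencies are hypothetical.

```python
def activation_per_affix(neighbors: dict[str, list[int]], source: float = 1.0) -> dict[str, float]:
    """Activation reaching each affix from a nonce word via similar existing words.

    neighbors maps an affix to the token frequencies of the similar words bearing it.
    """
    n_words = sum(len(freqs) for freqs in neighbors.values())
    # The nonce node splits its activation evenly among its links and itself.
    per_word = source / (n_words + 1)
    result = {}
    for affix, token_freqs in neighbors.items():
        # Each mediator splits evenly among: itself, its affix link, and its token links.
        result[affix] = sum(per_word / (tf + 2) for tf in token_freqs)
    return result

neighbors = {
    "affix_A": [2, 2, 2, 2],      # high type frequency, low token frequency
    "affix_B": [2],               # low type frequency, low token frequency
    "affix_C": [20, 20, 20, 20],  # high type frequency, high token frequency
}
reached = activation_per_affix(neighbors)
for affix, amount in reached.items():
    print(affix, round(amount, 4))
```

In this sketch, more mediators deliver more activation to an affix, while mediators with many token links pass on less, which mirrors the opposite directions of the type and token frequency effects.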

Further support for the model: Associative activation/Habituation

  • Hall (2003): subjects habituate to a stimulus A iff presentations of the tokens of the stimulus are not interspersed with presentations of tokens of a related stimulus B

  • When A and B are presented in alternation, strong connections develop between them and presentations of B lead to activation of A.

  • Direct activation leads to habituation while associative activation counteracts this effect.

  • When B is activated, it activates A increasing A’s strength/resting activation level while keeping the number of links A heads constant

VI. Neighborhood density effects in priming and equitable distribution of activation

Further features of the model: Dynamics of Activation Spread

  • Equity Principle: the amount of activation allocated to a link is positively correlated with the strength of the link

  • A node will attract more activation if the links it heads are weak → more identity priming in sparse neighborhoods (Perea and Rosa 2000, Thomsen et al. 1996)

  • Relative strength effect: an associate of given strength will receive more activation if it has to compete with weak associates than if it has to compete with strong ones (Anaki and Henik 2003), e.g. hammer-nail vs. cat-mouse
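
The relative strength effect can be illustrated with a short sketch. It assumes the strongest reading of the Equity Principle, namely that each link's share is strictly proportional to its strength, and it leaves out the share retained by the node; the strengths below are made-up numbers, not estimates for any real word pair.

```python
def allocate(activation: float, strengths: dict[str, float]) -> dict[str, float]:
    """Split activation among links in proportion to link strength (assumed rule)."""
    total = sum(strengths.values())
    return {name: activation * s / total for name, s in strengths.items()}

# The same target link (strength 2.0) competing with weak vs. strong associates.
with_weak_rivals = allocate(1.0, {"target": 2.0, "rival_1": 1.0, "rival_2": 1.0})
with_strong_rivals = allocate(1.0, {"target": 2.0, "rival_1": 4.0, "rival_2": 4.0})
print(with_weak_rivals["target"])
print(with_strong_rivals["target"])
```

The target's absolute strength is identical in both cases; only the competition changes how much activation it receives.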


TP effects

  • Aslin et al. (1998): infants can segment speech based solely on transitional probabilities (TPs)

  • TP of B given A = relative strength of AB
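
As a worked illustration, the TP of B given A is the frequency of the sequence AB divided by the frequency of A. If link strength is assumed to track co-occurrence counts, this is the same quantity as the strength of the A→B link relative to all links leaving A. The syllable stream below is a hypothetical toy example.

```python
from collections import Counter

def transitional_probabilities(sequence: list[str]) -> dict[tuple[str, str], float]:
    """TP(b | a) = count(a followed by b) / count(a as the first member of a pair)."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    first_counts = Counter(sequence[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Toy stream: "pa bi ku" recurs as a unit, followed by varying material.
syllables = "pa bi ku ti bu do pa bi ku go la tu pa bi ku".split()
tp = transitional_probabilities(syllables)
print(tp[("pa", "bi")])  # within the recurring unit
print(tp[("ku", "ti")])  # across a unit boundary
```

Within-unit transitions come out with higher TPs than transitions across unit boundaries, which is the cue a learner could use for segmentation.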

VII. The basics of the LAST account of (frequency effects in) associative learning

The model’s view of associative learning

  • The model’s view of associative learning

    • If two nodes are activated simultaneously due to co-occurrence or similarity (shared attributes), the link between them is strengthened because some activation spreads to the link’s propagation filter (PF)
    • Propagation filters (PF’s): nodes whose resting activation levels determine the strength and sign of links they are situated on but are not influenced by activation passing through the link (Sumida and Dyer 1992, Sumida 1997)
    • → More activation spreads through a link whose PF has a high resting activation level (r), because more activation is allocated to that link

Link structure

  • Tail-driven, because:

  • Preceding → following > following → preceding

  • The following node is at a higher activation level when preceding and following become co-activated

  • → The link whose PF is tailed by a linktron headed by the following node must be stronger than the link whose PF is tailed by a linktron headed by the preceding node

  • The link whose tail is the following node is stronger

  • → Tail-driven link structure

Linktron PF’s

  • Binary (+ or -)

    • Otherwise, link strength would depend equally on characteristics of both the head and the tail of the link
  • Non-trainable

    • Otherwise, the head-driven inhibitory linktron would weaken the tail-driven inhibitory linktron on the first trial, leading to link strengthening when the tail is presented in isolation on later trials

Parameter setting

Pre-exposure, Desensitization, Blocking

  • The greater the token frequency of a node, the less activation would reach a given PF → the slower the speed of associative learning (lower associability)

  • Also, if a stimulus has a strong associate, it becomes harder to associate with other stimuli (e.g. Kamin 1969)

  • Not just rats:

    • /tlip/ less acceptable than /bwip/ (Moreton 2002)
    • Both onsets are unattested: */tl/, */bw/
    • But /t/ and /l/ are more frequent than /b/ and /w/ → harder to associate
    • Cf. also ‘He disappeared it’ vs. ‘He vanished it’ (Brooks and Tomasello 1999)

Logarithmicity of frequency effects

  • Increase tF by 1 →

    • increase T’s r,
    • add a Tt link
  • High prior r →

    • Less activation stays in T → increase in r is smaller
    • Less activation is received by the new link’s PF → decrease in connectivity is smaller
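
The diminishing returns described above can be sketched numerically. This is a deliberate simplification: it assumes that each token adds exactly one Tt link, that one unit of activation arrives per token, that activation is split evenly among the node and its links, and that r grows by the share the node retains. None of these parameter choices come from the slides.

```python
import math

def resting_activation(token_frequency: int) -> float:
    """Accumulated r of a type node after token_frequency presentations (toy model)."""
    r = 0.0
    for n_links in range(1, token_frequency + 1):
        # The n-th token adds the n-th Tt link; one unit of activation is then
        # split evenly among the n links and the node itself.
        r += 1.0 / (n_links + 1)
    return r

for tf in (10, 100, 1000):
    print(tf, round(resting_activation(tf), 3), round(math.log(tf), 3))

# Gain in r for a tenfold increase in token frequency, from 100 to 1000.
tenfold_gain = resting_activation(1000) - resting_activation(100)
```

Because the retained share shrinks as links accumulate, each additional token adds less to r, and a tenfold increase in frequency adds a roughly constant amount, which is the signature of a logarithmic relation.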

VIII. Decay

Size and decay

  • Activation unit – the cluster of stars in the diagrams; an amount of activation defined by its current location and time and location of creation

  • The younger the activation unit, the faster it decays

  • The larger the activation unit, the slower it decays

Asymmetries in priming

  • Hi freq → lo freq < lo freq → hi freq

    • Semantic: Koriat 1981, Chwilla et al. 1998
    • Visual: Rueckl 2003
    • Morphological: Schriefers et al. 1992, Feldman 2003
    • Acoustic: Goldinger et al. 1989
    • Phonological: Radeau et al. 1995
  • Hi → lo: divided into smaller chunks earlier → more decay

  • That is, links headed by a high-frequency node are generally absolutely stronger (because their PF’s are headed by low-frequency nodes) but relatively weaker (because there are more Tt links)

  • Reduces to r_linktron < r_Tt

Persistence of morphological, syntactic, and identity priming

  • Morphological priming – id priming of roots

  • Syntactic priming (Bock 1986, Bock et al. 2000) – id priming of constructions

  • Identity priming persists because activation units are larger: a relatively large amount of activation from a type’s token(s) stays in the type (r_T >> r_Tt, r_TT); r_T is the same for all nodes

  • The higher the frequency of a type, the smaller the activation unit remaining in the type → no morphological priming is found with affix repetition, and no persistence is found in phonological or orthographic priming (Emmorey 1989, Feldman 2003, although cf. VanWagenen 2005)

  • Masked priming: small activation unit, fast decay

IX. Further work

Some crucial experiments

  • Priming an association should reduce priming across unprimed associations (Zeelenberg 1998, Zeelenberg et al. 2003)

  • High degree (token frequency, neighborhood density) of neighbors should correlate with slow access (Gruenenfelder and Pisoni 2005)

  • Token frequency effect in productivity: artificial grammar learning (relevant work: Goldberg et al. 2004, roots and suffixes vs. verbs and constructions)

  • Is minimum SOA smaller in identity priming than in similarity-based or co-occurrence-based priming (semantic, phonological, orthographic)?

Some crucial experiments

  • Does frequency influence associability more for conditioned stimuli than unconditioned ones? (predicted by tail-driven link structure)

  • Slower decay of priming for low-frequency, low-density primes?

  • Reverse frequency effects when access to particular tokens is required; when production is delayed after type activation so that activation can leak out of the node? (Baayen et al. 2005)

Further issues

  • Segmentation: relative frequency effects (Hay 2003), stochastic artificial grammars (Saffran et al. 1996)

  • Creation of new type nodes due to unclassified tokens, chunking, or generalization (dependency detection vs. unit formation)

  • Integration of new type nodes (overlap in distributed reps linked to type nodes vs. co-activation with similar types that have been activated due to stimulus presentation)

  • Token-type matching (identity detection after Marcus et al. 1999, perceptual magnet effects)


  • Frequency effects, neighborhood effects and associative learning can be accounted for by the spread of activation through a localist associative network in which

    • There is a type layer and a token layer
    • All types are interconnected while a token is linked to only one type
    • Activation leaving a node is divided between all links connected to the node and the node itself
    • Stronger links receive more activation
  • Though recognized by many, an older node is heard by few as the neighborhood gets more crowded, the neighbors get more talkative, and the stories it tells become all too familiar. As it ages, it settles into familiar patterns interacting mostly with its old friends and growing unwilling and unable to interact with those it does not know well.
