Konečno, sgorel – nel’zja že



Date: 05.04.2018





  • Konečno, sgorel – nel’zja že v polden’ ležat’ na solncepeke. [«Domovoj», 2002]

  • ‘Of course, you got a sunburn! You can’t ŽE lie in the hot sun in the middle of the day!’






  • “The wide use of particles is a typical feature of colloquial Russian” (Vasilyeva 1972: 6)

  • Emu ja mogu poverit’ – ‘I can trust him’

  • Ved’ emu ja mogu poverit’ – ‘I can trust him, you know this’

  • Emu-to ja mogu poverit’ – ‘I know, I can trust him’

  • Emu ja ešče mogu poverit’ – ‘Well, I suppose, I can trust him’

  • Tak emu ja mogu poverit’ – ‘So I can trust him’

  • Vot emu ja mogu poverit’ – ‘He is the one I can trust’

  • Emu ja i mogu poverit’ – ‘Therefore I can trust him’

  • Da emu ja mogu poverit’ – ‘Well, I can surely trust him’

  • Xot’ emu ja mogu poverit’ – ‘At least I can trust him’



  • Active use of particles distinguishes L1 speakers from L2 learners (Nikolaeva 1985: 7)

  • Relevant for other languages too.

  • Heinrichs, W. 1981. Die Modalpartikeln im Deutschen und Schwedischen. Tübingen.

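  • L2 German speaker:

    • Bitte geben Sie mir das Buch. ‘Please give me the book.’
  • L1 German speaker:

    • Können Sie mir vielleicht mal das Buch da geben? ‘Could you maybe just give me that book there?’
    • Ach, geben Sie mir doch bitte mal das Buch. ‘Oh, do just give me the book, please.’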


  • Particles in Russian

    • Extent
    • Distribution in the corpus
  • Our data: 9 words

    • Database
  • Analysis

    • Alternative annotation scheme & guidelines
  • Experiments 1 and 2

    • Training a tagger to disambiguate between uses


  • Langacker (2013: 96) on parts of speech: “Traditional terms lack precise definition, are inconsistent in their applications, and are generally inadequate”

  • Croft (2001: 63-107): parts of speech are partly language-specific. The “same” categories might not coincide exactly across languages, though the focal points of certain categories, such as noun, pronoun, and verb, are typologically similar

  • Part of speech categories can be complex and can overlap:




  • Formal characteristics: morphological classes, e.g., nouns inflected for case, verbs for tense and person

  • Distributional characteristics: e.g., adpositions contiguous with noun phrases, pronouns substitute for nouns, conjunctions bind phrases

  • Semantic characteristics: e.g., nouns signify entities, verbs signify situations

  • Ideally, a classification should take into consideration all three types of characteristics




  • Automatic part-of-speech taggers are trained on a gold standard corpus

  • A single part-of-speech error can derail the parsing of a whole sentence

  • Manning (2011): taggers trained on the Penn Treebank of English reach about 97% per-token accuracy, but

    • this corresponds to only about 57% accuracy at the sentence level!
    • The main culprit is part-of-speech tagging errors
  • Accurate tagging is important not only for Natural Language Processing itself, but for all tools that build on NLP:

    • spelling and grammar checkers
    • intelligent computer-assisted language learning
    • linguistic corpora
    • machine translation
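
The gap between the per-token and per-sentence figures above follows from simple arithmetic. The sketch below is a rough illustration only: it assumes tagging errors are independent across tokens and uses hypothetical sentence lengths, neither of which comes from Manning (2011). The function name is likewise made up for this example.

```python
def sentence_accuracy(token_accuracy: float, sentence_length: int) -> float:
    """Chance that every token in a sentence is tagged correctly.

    Assumption (ours, for illustration): errors are independent across
    tokens, so the per-token accuracies simply multiply.
    """
    return token_accuracy ** sentence_length


if __name__ == "__main__":
    # Hypothetical sentence lengths; 0.97 is the per-token figure cited above.
    for length in (10, 20, 30):
        print(length, round(sentence_accuracy(0.97, length), 3))
```

Under these assumptions, a 20-token sentence is tagged without any error only a little over half the time, which is in the same range as the sentence-level figure cited above.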



  • Particle is not a valid category.

  • Russian particles have no coherent profile.

  • “Particle” looks like a garbage category that is used when one feels uncertain about how to classify a word.

  • Particle is not a classification but rather a failure to classify a word.



  • Estimates of the number of Russian particles vary:

  • Zaliznjak (1980) designates over 100 Russian words as particles.

  • Nikolaeva (1985: 8) lists the following alternative counts:

    • 131 particles in the 17-volume Academy dictionary
    • 110 particles in the 4-volume Academy dictionary
    • 84 particles in Ušakov’s dictionary
    • 75 particles in Ožegov’s dictionary
  • Starodumova (1997: 8-9) claims that Russian is among the most “particle-rich” languages in the world, with approximately 300 particles.




  • 1. Adverbial conjunction (ADVCNJ) – syntactically optional, usually preposed:

    • Konečno, sgorel – nel’zja že v polden’ ležat’ na solncepeke.
    • ‘Of course, you got a sunburn! After all, you can’t lie in the hot sun in the middle of the day!’
  • 2. Coordinating conjunction (CNJCOO) – usually postposed, obligatory for creating an explicit contrast between syntactic constituents:

    • Satira i jumor. Odni ix rezko razdeljajut ... , drugie že vidjat v jumore ... raznovidnost’ satiry.
    • ‘Satire and humor: some people keep them strictly distinct ... , others, however, see humor as a form of satire.’
  • 3. Emphasizer (EMPH) – syntactically optional, follows a word bearing phrasal stress and brings it into focus:

    • Seli s kraju ― i tut že iz veščmeška Vovka izvlek butylku portvejna.
    • ‘They sat down at the edge, and right away Vovka pulled a bottle of port wine out of the supply bag.’
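
The three uses of že listed above could be recorded in an annotation tool as a small tag inventory. This is a minimal sketch, not the authors' actual scheme: the class name `ZeUse`, its field names, and the dictionary `ZE_USES` are hypothetical, and only the tag labels, properties, and example fragments are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeUse:
    tag: str        # tag label as given on the slide
    name: str       # category name
    optional: bool  # True = syntactically optional, False = obligatory
    position: str   # typical position, as stated on the slide
    example: str    # fragment of the slide's example sentence


# Hypothetical lookup table keyed by tag label.
ZE_USES = {
    use.tag: use
    for use in (
        ZeUse("ADVCNJ", "adverbial conjunction", True, "usually preposed",
              "nel’zja že v polden’ ležat’ na solncepeke"),
        ZeUse("CNJCOO", "coordinating conjunction", False, "usually postposed",
              "drugie že vidjat v jumore ... raznovidnost’ satiry"),
        ZeUse("EMPH", "emphasizer", True, "follows a stress-bearing word",
              "i tut že iz veščmeška Vovka izvlek butylku portvejna"),
    )
}

if __name__ == "__main__":
    for tag, use in ZE_USES.items():
        status = "optional" if use.optional else "obligatory"
        print(f"{tag}: {use.name} ({status}, {use.position})")
```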



  • Interjection – ‘yes’

    • Da, zavtra ja priedu. ‘Yes, I will come tomorrow’
  • Coordinating conjunction – ‘and’, ‘but’

    • Ded da baba ‘grandfather and grandmother’
  • Adverb – ‘after all, well’

  • Predicative – stands for entire proposition, carries stress:

    • Neobxodimo ustanovit’, želaet li on vozvratit’sja. Esli da, to kogda.
    • ‘It is necessary to find out whether he wants to come back. If so, when’.
  • Modal verb – unstressed, used with imperatives, infinitives, present tense finite forms:

    • Da budet svet! ‘Let there be light!’


  • Source: Russian National Corpus gold standard, using the tags manually assigned there

  • Database: 100 randomly-selected sentences for each of nine high-frequency particles (= 900 sentences)

  • Method: Hidden Markov Model (HMM) tagger with 10-fold cross-validation: for each word, every fold uses 90 sentences as the training set and the remaining 10 sentences as the test set
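
The 10-fold split described above can be sketched as follows. This shows only the partitioning step, under our own assumptions: the function name `ten_fold_splits`, the fixed random seed, and the placeholder sentence ids are hypothetical, and the HMM tagger itself is left out.

```python
import random


def ten_fold_splits(sentences, k=10, seed=0):
    """Yield (train, test) pairs so that each sentence is tested exactly once."""
    items = list(sentences)
    random.Random(seed).shuffle(items)        # reproducible shuffle
    folds = [items[i::k] for i in range(k)]   # k folds of equal size when len % k == 0
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


if __name__ == "__main__":
    # Placeholder ids standing in for the 100 sentences sampled for one word.
    sentences = [f"sentence_{i}" for i in range(100)]
    for train, test in ten_fold_splits(sentences):
        print(len(train), len(test))
```

With 100 sentences per word, each of the ten rounds trains on 90 sentences and tests on the held-out 10, matching the setup on the slide.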




  • Same source, database, and method as Experiment 1, but using our tags for the nine words instead of those in the RNC gold standard











  • Can we eliminate particles from the part-of-speech classification in Russian?

    • Yes, “particle” is not a classification but a failure to classify a word.
    • It is possible to reclassify the words commonly classed as “particles”.
  • What are the practical benefits of this approach?

    • Particle-free annotation, where all categories are meaningful and useful for further applications.
    • Analysis that is descriptively more precise.
  • Our methods

    • Usage-based analysis of corpus data: 9 high-frequency “particles”.
    • Experiment: training an automatic tagger to disambiguate uses.




  • Wierzbicka, Anna. 1997. Jazyk. Kul’tura. Poznanie. Moscow: Russkie slovari.

  • Croft, William. 2001. Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford: Oxford University Press.

  • Heinrichs, W. 1981. Die Modalpartikeln im Deutschen und Schwedischen. Tübingen.

  • Kasatkina, R.F. 2004. “Častica že v roli tekstovogo konnektora (na materiale russkoj dialektnoj reči).” In Nikolaeva T.M. (ed.) Verbal’naja i neverbal’naja opory prostranstva mežfrazovyx svjazej. Moskva. Pp. 71-83.

  • Langacker, Ronald W. 2013. Essentials of Cognitive Grammar. Oxford: Oxford University Press.



  • Manning, C. D. 2011. “Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?” In Gelbukh, A. (ed.), Computational Linguistics and Intelligent Text Processing, 12th International Conference, CICLing 2011, Proceedings, Part I. Lecture Notes in Computer Science 6608, pp. 171-189.

  • McCoy, S. 2003. “Unifying the meaning of multifunctional particles: The case of Russian ŽE.” In University of Pennsylvania Working Papers in Linguistics: Vol. 9.1. Pp. 123-135.

  • Nikolaeva, T. M. 1985. Funkcii častic v vyskazyvanii (na materiale slavjanskix jazykov). Moscow.

  • Nikolaeva, T. M. 2008. Neparadigmatičeskaja lingvistika. Istorija “bluždajuščix častic”. Moscow: Jazyki slavjanskix kul’tur.



  • Sičinava, D. V. 2005. Obrabotka tekstov s grammatičeskoj razmetkoj: instrukcija razmetčika. http://ruscorpora.ru/sbornik2005/09sitch.pdf

  • Starodumova, E. A. 1997. Russkie časticy (pis’mennaja monologičeskaja reč’). Avtoreferat doktorskoj dissertacii.

  • Švedova, N. Ju. et al. 1980. Russkaja grammatika, tom I. Moscow: Nauka.

  • Vasilyeva, N.A. 1972. Particles in Colloquial Russian. Moscow: Progress Publishers.

  • Zaliznjak, A. A. 1980. Grammatičeskij slovar’ russkogo jazyka. Moscow: Russkij jazyk.



  • Common claim: heavier use of particles is characteristic of spontaneous spoken Russian (Vasilyeva 1972)

  • Is it true for our data?

  • The difference is statistically significant:

  • Chi-squared = 3709, degrees of freedom = 1, p-value < 2.2e-16

  • But the effect size is very small: Cramér’s V = 0.026

    • The conventional threshold for a reportable small effect is 0.1
  • We therefore combine the results for both types of data.

  • Possible explanation: underrepresentation of informal dialog (only 7.8%) in the spoken subcorpus.
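
The relation between the chi-squared statistic and Cramér’s V reported above can be sketched as follows. The function names and the 2×2 counts in the usage example are hypothetical; the slide's underlying frequency table is not reproduced here.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def cramers_v(chi2, n, rows=2, cols=2):
    """Cramér's V: chi-squared scaled by the sample size and table dimensions."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))


if __name__ == "__main__":
    # Made-up counts, e.g. rows = spoken/written, columns = particle/other tokens.
    a, b, c, d = 10, 20, 20, 10
    chi2 = chi_squared_2x2(a, b, c, d)
    print(round(chi2, 3), round(cramers_v(chi2, a + b + c + d), 3))
```

Because V shrinks as the sample grows for a fixed chi-squared, a statistic of 3709 paired with V = 0.026 implies a sample on the order of several million observations. That is why so small an effect still comes out as highly significant.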



  • Že never appears clause-initially.

  • Že is a clitic that forms a prosodic unit with a stressed lexeme, to which it is either preposed or postposed.

  • We suggest that the position of že with regard to its prosodic head is associated with different functions of že.

  • We differentiate between 3 uses of že:

    • Emphasizer
    • Adverbial conjunction
    • Coordinating conjunction

