Corpus-based Lexicography




Cognitive Linguistics, Logical Semantics and Corpus Linguistics
People normally—if they are not linguists, that is—listen to or read texts because of their meaning. They are interested in the syntactic features of phrases, sentences or texts only insofar as is necessary for understanding them. Meaning is the core feature of natural language, and this is the reason why semantics is the central linguistic discipline. Still, regardless of the enormous progress that phonology, syntax and many other disciplines have made, when it comes to explaining and describing the meaning of phrases, sentences, and texts, we are far from a consensus.
As said above, corpus linguistics regards language as a social phenomenon. This implies a strict division between meaning and understanding. Is it really the task of linguistics to investigate how the speaker and the listener understand the words, sentences or texts that they utter or perceive? Understanding is a psychological, a mental, or—in modern words—a cognitive phenomenon. This is why no bond exists between cognitive linguistics and corpus linguistics. Language as a social phenomenon is laid down in texts and only there. If we, as corpus linguists, wish to find out how a text is understood, we have to ask the listeners for paraphrases; these paraphrases, being texts themselves, again become part of the discourse and can become the object of linguistic analysis.
The difference between cognitive linguistics and corpus linguistics lies in how each deals with the unique property of language to signify. Any text element is inevitably both form (expression) and meaning. If you delete the form, the meaning is deleted as well. There is no meaning without form, without an expression. Text elements and segments are symbols, and being symbols, linguistic signs, they can be analysed in principle under two aspects: the form aspect or the meaning aspect. The consequence of this stance is that the only way to express the meaning of a text element or a text segment is to interpret it, that is, to paraphrase it. This is the stance of hermeneutic philosophy, as opposed to analytic philosophy (cf. Keller 1995, Jäger [2000]).
In cognitive linguistics, which is embedded in analytic philosophy, meaning and understanding are seen as one. Here, text elements and text segments correspond to conceptual representations on the mental level. Within this system, however, it is not clear what the term ‘representation’ means. Does it refer to content linked with a form (what we could call presentations) or does it refer to pure content disconnected from form (what we could call ideations)? This ambiguity is of vast consequence (Janik and Toulmin 1973: 133), as presentations themselves are signs, that is, symbols, and thus need to be understood, that is, interpreted. Cognitive linguistics, however, does not tell us how this is to happen. Rather, it describes the manipulation of mental representations as a process (whereas an interpretation is an act, presupposing intentionality). Processes themselves are meaningless. It is only the act of interpretation that assigns meaning to them. Both Daniel Dennett and John Searle point out this aporia of the cognitive approach. In their opinion, the mental processes would again require a central meaner (Dennett 1998: 287f.) or homunculus (Searle 1992: 212f.) on a level higher than cognition, that is, for understanding mental representations, and the same would then apply for that level, too, and so on, ad infinitum.
On the other hand, if we translate ‘representation’ with ‘ideation,’ we dismiss the assumption of the symbolic character of language. The meaning of a word, a sentence or a text would then correspond to something immaterial, something without form, formulated in a so-called ‘mental language,’ whose elements would consist of either complex or atomistic concepts, depending whether one refers to Anna Wierzbicka and the early Jerry Fodor (Wierzbicka 1996, Fodor 1975) or to the later Jerry Fodor (Fodor 1998). On a large scale, these concepts of cognitive linguistics seem to correspond to words, but the difference lies in the fact that they are not material symbols which call for interpretation, but instead they are pure astral ideation, not contaminated by any form (cf. Teubert 1999).
In practice, particularly in artificial intelligence and automatic translation, this cognitive approach has failed. Alan Melby gave a plausible explanation of why it was bound to fail, no matter which formal language was defined for encoding the conceptual representations: “The real problem could be that the language-independent universal sememes we were looking for do not exist. . . [O]ur approach to word senses was dead wrong.” (Melby 1995: 48.)
It seems that the idea behind cognitive linguistics is the transduction or translation of phrases, sentences and texts in natural language, that is, of symbolic units, into an obviously language-independent ‘language of thought’ or ‘mentalese,’ which is non-symbolic and is exclusively defined by syntax.
This transduction or translation is seen as a process and does not involve intentionality. Cognitive linguistics is committed to the computational model of mind. According to this theory, mental representations are seen as structures consisting of what is called uninterpreted symbols, while mental processes are caused by the manipulation of these representations according to rule-based, that is, exclusively syntactic, algorithms. But does it really make sense to use the term ‘symbols’ for these mental representation units, just as we call words ‘linguistic signs’? On a cognitive (or computational) level, those entities are only symbols inasmuch as a content can be assigned to them from outside the mental (or computational) calculus. This content or meaning, however, does not affect the permissibility of manipulations with regard to their representation.
The content of a text consisting of linguistic signs, on the other hand, is something inherent to the text itself (and not assigned from the outside), a feature we can and must investigate if we want to make sense of a text. As Rudi Keller has pointed out, the symbols of natural language are suitable for and in need of interpretation (Keller 1995).
What appeals to many researchers of semantics is the fact that in cognitive semantics the meaning of a text is expressed through a calculus whose expressions are based exclusively on syntactic rules, or in other words, that semantics is transformed into syntax. They take it for granted that this is possible, as they claim that both natural and formal languages work with symbols. But in natural language, these symbols need to be interpreted, whereas symbols in formal languages work without being assigned a certain (external) definition. Whether a formal language, a calculus, permits a certain permutation of symbols or not has nothing to do with the meaning or the definition of these symbols; it is just a question of syntax. As early as 1847, George Boole stated: “Those who are acquainted with the present state of the theory of Symbolic Algebra, are aware that the validity of the processes of analysis does not depend upon the interpretation of the symbols which are employed, but solely upon the laws of their combination.” Richard Montague also believes in the possibility of describing natural language semantics the same way as formal language semantics: “There is in my opinion no important theoretical difference between natural languages and the artificial languages of logicians; indeed, I consider it possible to comprehend the syntax and semantics of both kinds of languages within a single natural and mathematically precise theory. On this point I differ from a number of philosophers, but agree, I believe, with Chomsky and his associates.” (Both quotes from Devlin 1997: 73 and 117.)
From the point of view of corpus linguistics, the meaning of natural language symbols, of text elements or text segments, is negotiated by the discourse participants and can be found in the paraphrases they offer; it is contained in language usage, that is, in context patterns. Natural language symbols do not so much refer to language-external facts as create semantic links to other language signs. The meaning of a text segment is the history of the use of its constituents.
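A concordance is the traditional corpus-linguistic instrument for making such context patterns visible. The following is a minimal keyword-in-context (KWIC) sketch; the function name `kwic`, the window width, and the sample text are illustrative choices, not part of any established tool.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context hits: `width` tokens either side of each occurrence."""
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

sample = ("The meaning of a word is its use. The use of a word "
          "is visible in the discourse, and the discourse records each use.")
for left, kw, right in kwic(sample, "use"):
    print(f"{left:>30} | {kw} | {right}")
```

Aligning the keyword in a column, as above, lets recurrent left- and right-hand patterns emerge at a glance, which is precisely the sense in which usage, not introspection, exhibits meaning.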


Linguistic signs always require interpretation. Whoever understands a text is able to interpret it. This interpretation can be communicated as a text in itself, a paraphrase of the original text. The act of interpretation requires intentionality and therefore cannot be reduced to a rule-based, algorithmic, ‘mathematically precise’ procedure. If we see language as a social phenomenon, natural language semantics can leave aside the mental or cognitive level. Everything that can be said about the meaning of words, phrases or sentences will be found in the discourse. Anything that cannot be paraphrased in natural language has nothing to do with meaning. In a nutshell, this is the core programme that distinguishes corpus linguistics from cognitive linguistics.
Collocation and Meaning
In traditional linguistics, it is rather difficult to pinpoint the difference between a collocation such as harte Auseinandersetzung (heated argument) and a free combination such as harte Matratze (hard mattress). In corpus linguistics, on the other hand, it is possible to trace the language community’s awareness of a distinct semantic cohesion between the lexical elements of a collocation by statistical means, that is, by detecting a significant co-occurrence of these elements within a sufficiently large corpus. Before it was possible to process large amounts of language data procedurally and systematically, syntactic rules had been the only way to describe the complex co-occurrence behaviour of textual elements (i.e., words). Such rules describe the relation between different classes of elements, for instance, between nouns and modifying adjectives. Still, syntactic descriptions such as ‘Adjective + Noun’ are not specific enough to detect collocations as distinct types of semantic relationships.
Traditional lexicology fails to come up with a feasible definition for collocations that would allow their automatic identification in a corpus. To classify certain co-occurring textual elements as semantic units, that is, as collocations, it is necessary to recognise these text segments as recurrent phenomena, which is only possible within a sufficiently large corpus. Therefore, we must complement the intratextual perspective with its intertextual counterpart. By applying probabilistic methods, it is possible to measure recurrence within a virtual universe of discourse, or more precisely, within a real corpus.
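One common probabilistic measure for such recurrence is pointwise mutual information (PMI), which compares how often two words actually co-occur with how often they would co-occur by chance. The sketch below is one possible implementation over adjacent word pairs; the function name, the threshold, and the toy German corpus are invented for illustration, and a real study would use a far larger corpus and often a different statistic (e.g. log-likelihood).

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI):
    log2 of the observed bigram probability over the probability expected
    if the two words occurred independently. High PMI marks collocation
    candidates; pairs below `min_count` are skipped as non-recurrent."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue  # too rare to count as a recurrent phenomenon
        p_pair = c / (n - 1)
        p1, p2 = unigrams[w1] / n, unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores

# Toy corpus: 'harte auseinandersetzung' recurs, 'harte matratze' occurs once.
toy = ("harte auseinandersetzung über den haushalt eine harte auseinandersetzung "
       "im parlament eine harte matratze und eine weiche matratze").split()
scores = pmi_bigrams(toy)
```

On this toy data, harte Auseinandersetzung passes the recurrence threshold and receives a positive score, while the free combination harte Matratze is filtered out, mirroring the distinction drawn above.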
Collocation dictionaries in the strict sense are always corpus-based. Even so, the speaker’s competence is still needed to check statistically determined collocation candidates for their relevant semantic cohesion. The following case study aims to illustrate the potential of the corpus linguistic approach:
Corpus linguistics aims to analyse the meaning of words within texts, or rather, within their individual context. First and foremost, words are text elements, not lexicon or dictionary entries. Corpus linguistics is interested in text segments whose elements exhibit an inherent semantic cohesion which can be made visible through quantitative analyses of discourse or corpus (Biber, Conrad and Reppen 1998).
If the research focus is shifted from single words to text segments, the distinction between linguistic and encyclopaedic knowledge gradually becomes fuzzy. The word Machtergreifung (seizure of power), outside its context, may be described as an incident where a certain group, previously excluded from political influence, seizes power by its own force and without democratic legitimation. However, we will interpret text segments such as braune Machtergreifung or die Machtergreifung im Jahre 1933 as referring to the ‘seizure of power by the Nazis’ without hesitation. Is this because these texts refer to an extralingual reality, to a language-independent knowledge?
Although the majority of linguists would agree with this assumption, there may well be another, simpler, explanation: we have learned from a large number of citations that, whenever braune Machtergreifung or Machtergreifung im Jahre 1933 is mentioned, it refers to the seizure of power by the Nazis and to nothing else. There is a co-occurrence between both expressions that may result, for instance, in an anaphoric situation: the expressions are paraphrases of each other.
When translating a text into another language, we paraphrase the source text. The translation represents the meaning of the original text just like a paraphrase within the source language. Translation requires understanding and thus intentionality. Only if we understand a text can we interpret or even paraphrase it. This implies that different translations will yield different versions of the same text, which again shows that translation or paraphrasing cannot be reduced to algorithmic procedures.
The universe of discourse, containing all texts ever translated along with their translations, is the empirical base for multilingual corpus linguistics. It is a virtual universe, and it can be realised by multilingual parallel corpora (or a collection of bilingual parallel corpora). Parallel corpora consist of source texts along with their translations into other languages, whereas reciprocal parallel corpora contain the source texts in two languages along with their translations into the target languages.
Just as in monolingual corpus linguistics, meaning is also seen as a strictly linguistic (or better, textual) term here. Meaning is paraphrase. The entire meaning of a text segment within a multilingual universe of discourse is enclosed in the history of all translation equivalents of the segment.
The translation unit, that is, the text segment completely represented by the translation equivalent, is the base unit of multilingual corpus semantics.
Translation units, consisting of a single word or of several words, are the minimal units of translation. If they consist of several words, they are translated as a whole and not word by word. Therefore, translation equivalents correspond to the text segments of monolingual corpus linguistics.
Within the framework of multilingual corpus linguistics, we take it that the meaning of translation units is contained in their translation equivalents in other languages. This corresponds to the basic assumption of corpus linguistics, which does not regard semantic cohesion as something fixed but as belonging to a large spectrum reaching from inalterable units to text segments whose elements can be varied, expanded or omitted. Identifying these translation units (or text segments) again involves interpretation. The translation shows us whether a given co-occurrence of words is a single translation equivalent or a combination of them, that is, merely a chain of text elements.
This leads to two consequences. What counts as an integral translation equivalent in one target language may be a simple word-by-word translation in another. And this may even be the case within a single target language, depending on the stylistic preferences of different translators. In fact, it is the community of translators (along with the translation critics) who in their daily practice decide what counts as a translation equivalent, just as the monolingual language community decides what counts as a text segment.
Multilingual Corpus Linguistics in Practice
Neither a lexicon derived from a bilingual dictionary nor the supposedly language-neutral conceptual ontologies applied within Artificial Intelligence will solve the problem of machine translation of general language texts.
By now, this fact is acknowledged by the experts. Therefore, they focus on the machine translation of texts written in a controlled documentation language, a more or less formal language in which all technical terms are defined unambiguously, along with a syntax that rejects all ambiguous expressions as ungrammatical.
General language texts written in natural languages cannot be translated without interpretation. Here, multilingual corpus linguistics steers clear of this obstacle in an elegant way. Unlike disciplines such as Artificial Intelligence and Machine Translation, which are based on cognitive linguistics, it does not try to model and emulate mental processes, but instead tries to support the translator by processing parallel corpora. They contain the practice of previous human translation. In these corpora, those translation equivalents that are proven to be reliable and accepted will outweigh equivalents that have been dismissed as inadequate in the long run. If, for instance, proseúchomai is translated as to make my prayers three times out of eight, it may well be assumed that it is an accepted—albeit not the ideal—equivalent within the given context.
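The three-out-of-eight reasoning above amounts to counting the relative frequency of each attested equivalent for a translation unit across an aligned corpus. The sketch below illustrates this; the aligned pairs are entirely invented toy data, and real parallel corpora would require sentence alignment and unit identification first.

```python
from collections import Counter

# Hypothetical unit-aligned pairs (source translation unit, equivalent chosen
# by some translator), standing in for a real parallel corpus.
aligned = [
    ("proseúchomai", "to pray"), ("proseúchomai", "to make my prayers"),
    ("proseúchomai", "to pray"), ("proseúchomai", "to make my prayers"),
    ("proseúchomai", "to pray"), ("proseúchomai", "to make my prayers"),
    ("proseúchomai", "to pray"), ("proseúchomai", "to pray"),
]

def equivalent_profile(pairs, unit):
    """Relative frequency of each attested equivalent for one translation unit."""
    counts = Counter(tgt for src, tgt in pairs if src == unit)
    total = sum(counts.values())
    return {tgt: c / total for tgt, c in counts.items()}

profile = equivalent_profile(aligned, "proseúchomai")
```

On this toy repository, an equivalent attested in three of eight instances is visibly accepted but not dominant, which is exactly the kind of graded evidence the discourse of past translations provides.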
Parallel corpora are translation repositories. They link translation units with their equivalents. As first studies have shown (Steyer and Teubert 1998), we may assume that 90 percent of all translation units, along with their relevant equivalents, may be found in a carefully compiled corpus of about 20 million words per language, provided that the text to be translated is sufficiently close to the corpus with regard to text type and genre.
Multilingual corpus linguistics does not pretend to solve the problem of machine translation of general language. But it may help the human translator in finding a suitable equivalent for the unit to be translated more efficiently than traditional bilingual dictionaries, because it includes the context even in those cases where the translation equivalent is not a syntagmatically defined collocation but a certain textual element within a sequence. The goal is to select from among all given elements the one whose contextual profile is closest to that of the textual segment to be translated.
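Comparing contextual profiles, as proposed above, can be sketched as measuring the similarity between bags of context words. The implementation below is one simple way to do this with raw counts and cosine similarity; the function names and window size are illustrative, and practical systems would typically weight the counts (e.g. by association scores) rather than use them raw.

```python
import math
from collections import Counter

def context_profile(tokens, target, width=2):
    """Bag of context words within `width` positions of each occurrence of `target`."""
    profile = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            profile.update(tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width])
    return profile

def cosine(p, q):
    """Cosine similarity between two sparse count vectors (0.0 when either is empty)."""
    dot = sum(c * q[w] for w, c in p.items())
    norm = (math.sqrt(sum(c * c for c in p.values()))
            * math.sqrt(sum(c * c for c in q.values())))
    return dot / norm if norm else 0.0
```

To choose among candidate equivalents, one would build the contextual profile of the segment to be translated and select the candidate whose profile in the target-language corpus yields the highest cosine score.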

REFERENCES:


1. Medium Sprache.” In: Werner Kallmeyer (ed.), Sprache und neue Medien. Jahrbuch 1999 des Instituts für Deutsche Sprache. Berlin/New York: de Gruyter, 9–30.
2. Janik, Allan; Toulmin, Stephen (1973). Wittgenstein’s Vienna. New York: Simon and Schuster.
3. Cowie, A.P.; Howarth, P. (1996). Phraseological Competence and Written Proficiency. In: G.M. Blue and R. Mitchell (eds.), Language and Education. Oxford: Oxford University Press.
4. Crystal, D. (1981). The Ideal Dictionary, Lexicographer and User. Cambridge: Cambridge University Press.

