Learning to recognize features of valid textual inferences Bill MacCartney Stanford University

with Trond Grenager, Marie-Catherine de Marneffe, Daniel Cer, and Christopher D. Manning

Textual inference as graph alignment

Many efforts have converged on this approach [Haghighi et al. 05, de Salvo Braz et al. 05]

Represent P & H as typed dependency graphs

Find least-cost alignment of H to (part of) P

Use locally-decomposable cost model

Assume good alignment ⇒ valid inference
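As a rough illustration of what "locally decomposable" means here, the sketch below stores each sentence as a set of typed dependency triples and scores a given alignment as a sum of per-word and per-edge terms. The sentences, the cost values, and the names `lexical_cost`, `structural_cost`, and `alignment_cost` are all made up for illustration and are not taken from the talk.

```python
# Typed dependency graphs as sets of (governor, relation, dependent) triples.
# Toy example (an assumption, not from the slides):
#   P: "John bought a red car"    H: "John purchased a car"
P_EDGES = {("bought", "nsubj", "John"), ("bought", "dobj", "car"),
           ("car", "amod", "red")}
H_EDGES = {("purchased", "nsubj", "John"), ("purchased", "dobj", "car")}

# Hypothetical lexical resource: near-synonym pairs get a small cost.
SYNONYMS = {("purchased", "bought")}


def lexical_cost(h_word, p_word):
    """Cost of matching one H word to one P word (illustrative values)."""
    if h_word == p_word:
        return 0.0
    if (h_word, p_word) in SYNONYMS:
        return 0.1
    return 1.0


def structural_cost(h_edge, alignment):
    """Cost of one H edge: free if its image is an edge of P, else 0.5."""
    gov, rel, dep = h_edge
    image = (alignment[gov], rel, alignment[dep])
    return 0.0 if image in P_EDGES else 0.5


def alignment_cost(alignment):
    """Locally decomposable: a plain sum over words and over edges."""
    words = sum(lexical_cost(h, p) for h, p in alignment.items())
    edges = sum(structural_cost(edge, alignment) for edge in H_EDGES)
    return words + edges


good = {"purchased": "bought", "John": "John", "car": "car"}
bad = {"purchased": "bought", "John": "car", "car": "John"}
```

Under these made-up costs the `good` alignment scores far lower than `bad`, because every term in the sum depends on only one word pair or one edge.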

Example: graph alignment

Problems with alignment models

Alignments are important, but…

But good alignment does not imply valid inference:

Problem 1: non-monotonicity

In normal “upward monotone” contexts, broadening a concept preserves truth:

P: Some Korean historians believe the murals are of Korean origin.

H: Some historians believe the murals are of Korean origin.

But not in “downward monotone” contexts:

P: Few Korean historians doubt that Koguryo belonged to Korea.

H: Few historians doubt that Koguryo belonged to Korea.

Lots of constructs invert monotonicity!
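The contrast between the two examples can be sketched as a lookup on the quantifier that governs the modified noun. The small lexicon and the function name `dropping_modifier_is_safe` are assumptions made for illustration; a real system would need much wider coverage of monotonicity-inverting constructs.

```python
# Hypothetical lexicon: monotonicity of a quantifier with respect to the
# noun phrase it restricts. "up" means broadening the noun preserves truth.
RESTRICTOR_MONOTONICITY = {
    "some": "up",
    "several": "up",
    "few": "down",
    "no": "down",
    "every": "down",
}


def dropping_modifier_is_safe(quantifier):
    """Dropping a restrictive modifier broadens the concept.

    That preserves truth only in an upward-monotone context. Unknown
    quantifiers default to "up", mirroring the assumption that normal
    contexts are upward monotone.
    """
    return RESTRICTOR_MONOTONICITY.get(quantifier.lower(), "up") == "up"


# "Some Korean historians ..." -> "Some historians ...": safe
# "Few Korean historians ..."  -> "Few historians ...":  not safe
checks = {q: dropping_modifier_is_safe(q) for q in ("Some", "Few")}
```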

Problem 2: non-locality

To be tractable, alignment scoring must be local

But valid inference can hinge on non-local factors:

Problem 3: confounding alignment & inference

If alignment ⇒ entailment, lexical cost model must penalize e.g. antonyms, inverses:

But aligner will seek the best alignment:

Actually, we want the first alignment, and then a separate assessment of entailment! [cf. Marsi & Krahmer 05]

Solution: three-stage architecture

1. Linguistic analysis

Typed dependencies from statistical parser [de Marneffe et al. 06]

Collocations from WordNet (Bill hung_up the phone)

Statistical named entity recognizers [Finkel et al. 05]

Canonicalization of quantity, date, and money expressions

Semantic role identification: PropBank roles [Toutanova et al. 05]

Coreference resolution:

Hand-built: acronyms, country and nationality, factive verbs

TF-IDF scores

2. Aligning dependency graphs

Beam search for least-cost alignment

Locally decomposable cost model

Lexical matching costs

Structural matching costs
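A minimal sketch of beam search over alignments, assuming H nodes are aligned one at a time and that the cost of each step depends only on the new pair and the pairs already chosen. The function names, the toy sentences, and the cost values are illustrative assumptions, not the system's actual cost model.

```python
import heapq


def beam_search_alignment(h_nodes, p_nodes, step_cost, beam_width=3):
    """Return (cost, alignment) for the cheapest alignment found.

    Each beam entry is (cost so far, partial alignment as a dict from
    H node to P node). Only the beam_width cheapest partial alignments
    survive each step, so the result is approximate, not guaranteed optimal.
    """
    beam = [(0.0, {})]
    for h in h_nodes:
        expanded = []
        for cost, partial in beam:
            for p in p_nodes:
                new_cost = cost + step_cost(partial, h, p)
                expanded.append((new_cost, {**partial, h: p}))
        beam = heapq.nsmallest(beam_width, expanded, key=lambda item: item[0])
    return min(beam, key=lambda item: item[0])


# Toy data (assumed): P = "John bought a red car", H = "John purchased a car".
P_EDGES = {("bought", "nsubj", "John"), ("bought", "dobj", "car"),
           ("car", "amod", "red")}
H_EDGES = {("purchased", "nsubj", "John"), ("purchased", "dobj", "car")}


def toy_step_cost(partial, h, p):
    """Lexical cost for (h, p) plus structural cost for edges now decided."""
    if h == p:
        lexical = 0.0
    elif (h, p) == ("purchased", "bought"):
        lexical = 0.1
    else:
        lexical = 1.0
    structural = 0.0
    for gov, rel, dep in H_EDGES:
        if dep == h and gov in partial and (partial[gov], rel, p) not in P_EDGES:
            structural += 0.5
        if gov == h and dep in partial and (p, rel, partial[dep]) not in P_EDGES:
            structural += 0.5
    return lexical + structural


best_cost, best_alignment = beam_search_alignment(
    ["purchased", "John", "car"], ["John", "bought", "red", "car"], toy_step_cost
)
```

Because the beam discards partial alignments early, a wider beam trades running time for a better chance of finding the true least-cost alignment.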

3. Features of valid inferences

After alignment, extract features of inference

Extracted features → statistical model → score

(Score ≥ threshold)? → prediction: yes/no
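The pipeline above can be sketched as follows. The slide says only "statistical model"; the logistic form, the feature names, and the weights below are assumptions chosen for illustration, and in practice the weights would be learned from training data.

```python
import math

# Hypothetical feature weights (assumed, not learned here).
WEIGHTS = {
    "good_alignment": 1.5,
    "add_adjunct_up": -2.0,
    "polarity_mismatch": -3.0,
}
BIAS = 0.0


def score(features):
    """Map a dict of feature values to a score in (0, 1) with a logistic model."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, threshold=0.5):
    """Turn the score into a yes/no entailment prediction."""
    return "yes" if score(features) >= threshold else "no"


aligned_only = predict({"good_alignment": 1.0})
aligned_but_negated = predict({"good_alignment": 1.0, "polarity_mismatch": 1.0})
```

With these made-up weights, a good alignment alone yields "yes", while the same alignment combined with a polarity mismatch yields "no". This is the behaviour the separation of alignment from inference is meant to allow.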

Features: restrictive adjuncts

Does hypothesis add/drop a restrictive adjunct?

Generate features for add/drop, monotonicity
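One way to picture the feature generation is sketched below, assuming the adjuncts attached to an aligned pair of nodes are already available as lists of strings and the monotonicity of the context is given as "up" or "down". The function name, the feature names, and the example adjunct are hypothetical.

```python
def adjunct_features(p_adjuncts, h_adjuncts, context):
    """Boolean features for adjuncts added or dropped by the hypothesis.

    context is "up" for an upward-monotone context, "down" otherwise.
    Dropping an adjunct is usually safe in an upward-monotone context and
    adding one is usually safe in a downward-monotone context; the other
    two combinations are warning signs.
    """
    dropped = bool(set(p_adjuncts) - set(h_adjuncts))
    added = bool(set(h_adjuncts) - set(p_adjuncts))
    return {
        "drop_adjunct_up": dropped and context == "up",
        "drop_adjunct_down": dropped and context == "down",
        "add_adjunct_up": added and context == "up",
        "add_adjunct_down": added and context == "down",
    }


# The hypothesis adds a made-up temporal adjunct in an ordinary (upward) context.
features = adjunct_features([], ["in 1995"], "up")
```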

Features: modality

Features: factives & implicatives

P: Libya has tried, with limited success, to develop its own indigenous missile, and to extend the range of its aging SCUD force for many years under the Al Fatah and other missile programs.

H: Libya has developed its own domestic missile program.

Evaluate governing verbs for implicativity class

Need to check for downward-monotone context here too
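A minimal sketch of the implicativity check, assuming a small hand-built verb lexicon and treating negation as the only downward-monotone context. The class names, lexicon entries, and the function `complement_status` are illustrative assumptions.

```python
# Each class maps to (status of the complement in a positive context,
#                     status of the complement under negation).
IMPLICATIVITY_CLASSES = {
    "factive": ("true", "true"),                 # know that X
    "implicative": ("true", "false"),            # manage to X
    "neg_implicative": ("false", "true"),        # fail to X
    "non_implicative": ("unknown", "unknown"),   # try to X
}

# Hypothetical lexicon of governing verbs (lemmas).
VERB_CLASS = {
    "know": "factive",
    "manage": "implicative",
    "fail": "neg_implicative",
    "try": "non_implicative",
}


def complement_status(verb, negated=False):
    """Say whether the verb's complement is entailed, refuted, or unknown."""
    verb_class = VERB_CLASS.get(verb, "non_implicative")
    positive, negative = IMPLICATIVITY_CLASSES[verb_class]
    return negative if negated else positive


# "Libya has tried ... to develop its own indigenous missile"
libya = complement_status("try")
```

Under this sketch the complement of "tried to develop" comes out as "unknown", so the premise gives no license for the hypothesis that Libya has developed the program.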

Evaluation: PASCAL RTE

RTE = recognizing textual “entailment” [Dagan et al. 05]

Does premise P “entail” hypothesis H?

Three annual competitions (so far)

Considerable variance from year to year

High inter-annotator agreement (~95%)

Results & useful features

Results for all RTE data [updated]

What we have trouble with

Non-entailment is easier than entailment

Lots of adjuncts, but which are restrictive?

Multiword “lexical” semantics/world knowledge

Conclusion

Alignment models promising, but flawed:

Solution: align, then judge validity of inference

We extract global-level semantic features

Still lots of room to improve…