Learning Semantic Sub-graphs for Document Summarization Jure Leskovec, Marko Grobelnik



Date: 02.10.2018


Learning Semantic Sub-graphs for Document Summarization

  • Jure Leskovec, Marko Grobelnik

  • Jozef Stefan Institute, Slovenia

  • Natasa Milic-Frayling

  • Microsoft Research, Cambridge, UK


Outline

  • Problem statement

  • Proposed Solution

  • Experiment design and results

  • Conclusions



Document Summarization

  • The task is to produce a shorter version of an original document by selecting sentences from the text

  • Approach:

    • Train a machine learning model to select sentences
    • Use information about the semantic structure of the document (concepts and relations among concepts)




Detailed Summarization Procedure

  • Linguistic analysis of the text

    • Deep parsing of sentences
  • Refinement of the text parse

    • Named-entity consolidation: determine that ‘George Bush’ = ‘Bush’ = ‘U.S. president’
    • Anaphora resolution: link pronouns with named entities
  • Extract Subject–Predicate–Object triples



Deep Parsing – NLPWin Output

  • The NLPWin parse tree is the input to the procedures for anaphora resolution, named-entity consolidation, and extraction of triples



Named-entity consolidation

  • Consolidating different surface forms that refer to the same entities – only for names of people, places, companies, etc.

  • Example:

    • Hillary Rodham Clinton, Hillary Clinton, Hillary Rodham, Mrs. Clinton → Hillary Clinton
  • Heuristic based on the overlap in the surface forms of the name variants

  • Accuracy on a subset of the data set ~90%.
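
The slides do not spell out the overlap heuristic, so the following is only a minimal sketch of one way it could work. The function names, the list of honorifics, the greedy clustering and the rule for picking the canonical form are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

# Assumed list of honorifics to ignore when comparing names.
TITLES = {"mr.", "mrs.", "ms.", "dr."}

def name_tokens(name):
    """Lowercased tokens of a name, with honorifics removed."""
    return [t for t in name.lower().split() if t not in TITLES]

def consolidate(names):
    """Greedily cluster names that share a token, then map every
    surface form in a cluster to one canonical form."""
    clusters = []
    for name in names:
        toks = set(name_tokens(name))
        for cluster in clusters:
            if any(toks & set(name_tokens(other)) for other in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])

    mapping = {}
    for cluster in clusters:
        freq = Counter(t for n in cluster for t in name_tokens(n))

        def score(n):
            toks = name_tokens(n)
            # Prefer forms built from the most widely shared tokens;
            # break ties in favour of the longer form.
            return (sum(freq[t] for t in toks) / max(len(toks), 1), len(toks))

        canonical = max(cluster, key=score)
        for n in cluster:
            mapping[n] = canonical
    return mapping

variants = ["Hillary Rodham Clinton", "Hillary Clinton",
            "Hillary Rodham", "Mrs. Clinton", "Saddam Hussein"]
mapping = consolidate(variants)
```

On the slide's example, all four Clinton variants map to "Hillary Clinton". A heuristic this simple would also merge distinct people who share a surname, which is one plausible source of the remaining errors.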



Pronominal anaphora resolution

  • We link pronouns with their referents

      • Mary likes Paul. She went to buy him a present.
      • → Mary likes Paul. She [Mary] went to buy him [Paul] a present.
  • Method:

    • We restrict anaphora resolution to 5 pronouns: she, he, who, I, they.
    • Starting from the pronoun, traverse the text searching for candidate referents and assign each a score
    • The score is based on the distance from the pronoun and on semantic information
    • Note: we assume that pronouns refer only to named entities found in the document
    • Problem (the referent of ‘they’ is not a single named entity):
      • One passenger in King's car said they had been drinking liquor.
  • Average accuracy on 1,500 hand-labeled pronouns: 81.2%
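
The scoring step can be sketched as follows. The slides only say that the score combines distance and semantic information; the `Mention` class, the use of gender as the semantic signal, the weights and the backward-only search are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    name: str       # consolidated named entity
    position: int   # token index of the mention in the text
    gender: str     # "f", "m", or "?" if unknown (assumed annotation)

# Assumed gender constraints; other pronouns are left unconstrained.
PRONOUN_GENDER = {"she": "f", "he": "m"}

def resolve(pronoun, pronoun_pos, mentions):
    """Return the name of the best-scoring candidate that precedes
    the pronoun, or None if there is no candidate."""
    wanted = PRONOUN_GENDER.get(pronoun.lower())
    best, best_score = None, float("-inf")
    for m in mentions:
        if m.position >= pronoun_pos:
            continue  # only consider mentions before the pronoun
        # Closer mentions score higher.
        score = 1.0 / (1 + pronoun_pos - m.position)
        # Reward gender agreement, penalise disagreement.
        if wanted and m.gender != "?":
            score += 1.0 if m.gender == wanted else -1.0
        if score > best_score:
            best, best_score = m, score
    return best.name if best else None

# "Mary likes Paul . She went to buy him a present ."
mentions = [Mention("Mary", 0, "f"), Mention("Paul", 2, "m")]
referent = resolve("She", 4, mentions)
```

Here "She" resolves to Mary even though Paul is closer, because the agreement term outweighs the distance term.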



Extracting triples

  • Enhanced parse tree is traversed to identify Subject–Predicate–Object triples

  • Example:

  • “Conservatives embraced the nomination while liberals were cautious or hostile”

  • Resulting triples:

  • conservative → embrace → nomination

  • liberal → is → cautious

  • liberal → is → hostile
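
As a rough illustration of the traversal, the sketch below walks a hand-built toy tree and emits one triple per object of each clause. The dictionary layout is a hypothetical stand-in and does not reflect the actual NLPWin output format.

```python
def extract_triples(node):
    """Walk a toy parse tree and collect (subject, predicate, object)
    triples, one per object of every clause node."""
    triples = []
    if node.get("type") == "clause":
        for obj in node["objects"]:
            triples.append((node["subject"], node["predicate"], obj))
    for child in node.get("children", []):
        triples.extend(extract_triples(child))
    return triples

# Hand-built tree for: "Conservatives embraced the nomination
# while liberals were cautious or hostile" (lemmatised).
sentence = {
    "type": "sentence",
    "children": [
        {"type": "clause", "subject": "conservative",
         "predicate": "embrace", "objects": ["nomination"]},
        {"type": "clause", "subject": "liberal",
         "predicate": "is", "objects": ["cautious", "hostile"]},
    ],
}
triples = extract_triples(sentence)
```

The coordinated complement "cautious or hostile" yields two triples with the same subject and predicate, matching the example above.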






Learning: Feature construction

  • The graph consists of nodes, referred to as concepts, which can be subjects or objects, and of edges, which are predicates and capture relations among concepts.

  • We use WordNet to identify and merge synonymous nodes, since they correspond to the same concept.
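
A minimal sketch of the graph construction, assuming triples are plain tuples. The `synonyms` dictionary is a hypothetical stand-in for a WordNet lookup, and node degree is shown only as an example of a graph-derived attribute; the attributes actually used are not listed in the surviving text.

```python
from collections import defaultdict

def build_graph(triples, synonyms=None):
    """Build a semantic graph: keys are (subject, object) concept pairs,
    values are the sets of predicates linking them. Concepts are first
    mapped through `synonyms` so that synonymous nodes are merged."""
    synonyms = synonyms or {}
    edges = defaultdict(set)
    for subj, pred, obj in triples:
        s = synonyms.get(subj, subj)
        o = synonyms.get(obj, obj)
        edges[(s, o)].add(pred)
    return dict(edges)

def degrees(edges):
    """Number of distinct edges touching each concept node."""
    deg = defaultdict(int)
    for subj, obj in edges:
        deg[subj] += 1
        deg[obj] += 1
    return dict(deg)

triples = [
    ("conservative", "embrace", "nomination"),
    ("liberal", "is", "cautious"),
    ("liberal", "is", "hostile"),
    ("leftist", "criticize", "nomination"),  # invented extra triple
]
# Hypothetical synonym table standing in for WordNet.
graph = build_graph(triples, synonyms={"leftist": "liberal"})
node_degree = degrees(graph)
```

After merging, "leftist" and "liberal" become a single node with three edges, so attributes computed on the graph reflect the concept rather than its surface form.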



Triple attributes



Experiments

  • We use a linear SVM to classify triples as relevant or not relevant to the summary

    • Positive examples are triples from the sentences which were marked as summary sentences by experts
    • Negative examples are all other triples
  • Data:

    • 147 documents from the DUC 2002 collection for which we had extracted summaries.
  • Evaluation:

    • We report micro-averaged precision, recall, and F1 for the extracted triples, using 10-fold cross-validation.
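
Micro-averaging pools the individual decisions before computing the ratios, instead of averaging per-fold scores. The sketch below assumes the pooling is done over the cross-validation folds and that labels are 1 for summary triples and 0 otherwise; the function name and the toy labels are illustrative, not results from the experiments.

```python
def micro_prf(folds):
    """folds: iterable of (true_labels, predicted_labels) pairs, one per
    fold. Counts are summed over all folds before computing the scores."""
    tp = fp = fn = 0
    for y_true, y_pred in folds:
        for t, p in zip(y_true, y_pred):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Two toy folds with made-up labels, only to show the call.
toy_folds = [([1, 0, 1, 0], [1, 1, 0, 0]),
             ([1, 1, 0], [1, 1, 1])]
precision, recall, f1 = micro_prf(toy_folds)
```

Because the counts are pooled, folds with more triples contribute proportionally more to the final scores.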


Performance for various attribute sets






Insights



Example of automatic summary

  • Cracks Appear in U.N. Trade Embargo Against Iraq.

  • Cracks appeared Tuesday in the U.N. trade embargo against Iraq as Saddam Hussein sought to circumvent the economic noose around his country. Japan, meanwhile, announced it would increase its aid to countries hardest hit by enforcing the sanctions. Hoping to defuse criticism that it is not doing its share to oppose Baghdad, Japan said up to $2 billion in aid may be sent to nations most affected by the U.N. embargo on Iraq. President Bush on Tuesday night promised a joint session of Congress and a nationwide radio and television audience that ``Saddam Hussein will fail'' to make his conquest of Kuwait permanent. ``America must stand up to aggression, and we will,'' said Bush, who added that the U.S. military may remain in the Saudi Arabian desert indefinitely. ``I cannot predict just how long it will take to convince Iraq to withdraw from Kuwait,'' Bush said. More than 150,000 U.S. troops have been sent to the Persian Gulf region to deter a possible Iraqi invasion of Saudi Arabia. Bush's aides said the president would follow his address to Congress with a televised message for the Iraqi people, declaring the world is united against their government's invasion of Kuwait. Saddam had offered Bush time on Iraqi TV. The Philippines and Namibia, the first of the developing nations to respond to an offer Monday by Saddam of free oil _ in exchange for sending their own tankers to get it _ said no to the Iraqi leader. Saddam's offer was seen as a none-too-subtle attempt to bypass the U.N. embargo, in effect since four days after Iraq's Aug. 2 invasion of Kuwait, by getting poor countries to dock their tankers in Iraq. But according to a State Department survey, Cuba and Romania have struck oil deals with Iraq and companies elsewhere are trying to continue trade with Baghdad, all in defiance of U.N. sanctions. Romania denies the allegation. 
The report, made available to The Associated Press, said some Eastern European countries also are trying to maintain their military sales to Iraq. A well-informed source in Tehran told The Associated Press that Iran has agreed to an Iraqi request to exchange food and medicine for up to 200,000 barrels of refined oil a day and cash payments. There was no official comment from Tehran or Baghdad on the reported food-for-oil deal. But the source, who requested anonymity, said the deal was struck during Iraqi Foreign Minister Tariq Aziz's visit Sunday to Tehran, the first by a senior Iraqi official since the 1980-88 gulf war. After the visit, the two countries announced they would resume diplomatic relations. Well-informed oil industry sources in the region, contacted by The AP, said that although Iran is a major oil exporter itself, it currently has to import about 150,000 barrels of refined oil a day for domestic use because of damages to refineries in the gulf war. Along similar lines, ABC News reported that following Aziz's visit, Iraq is apparently prepared to give Iran all the oil it wants to make up for the damage Iraq inflicted on Iran during their conflict. Secretary of State James A. Baker III, meanwhile, met in Moscow with Soviet Foreign Minister Eduard Shevardnadze, two days after the U.S.-Soviet summit that produced a joint demand that Iraq withdraw from Kuwait. During the summit, Bush encouraged Mikhail Gorbachev to withdraw 190 Soviet military specialists from Iraq, where they remain to fulfill contracts. Shevardnadze told the Soviet parliament Tuesday the specialists had not reneged on those contracts for fear it would jeopardize the 5,800 Soviet citizens in Iraq. In his speech, Bush said his heart went out to the families of the hundreds of Americans held hostage by Iraq, but he declared, ``Our policy cannot change, and it will not change. America and the world will not be blackmailed.'' The president added: ``Vital issues of principle are at stake. 
Saddam Hussein is literally trying to wipe a country off the face of the Earth.'' In other developments: _A U.S. diplomat in Baghdad said Tuesday up to 800 Americans and Britons will fly out of Iraqi-occupied Kuwait this week, most of them women and children leaving their husbands behind. Saddam has said he is keeping foreign men as human shields against attack. On Monday, a planeload of 164 Westerners arrived in Baltimore from Iraq. Evacuees spoke of food shortages in Kuwait, nighttime gunfire and Iraqi roundups of young people suspected of involvement in the resistance. ``There is no law and order,'' said Thuraya, 19, who would not give her last name. ``A soldier can rape a father's daughter in front of him and he can't do anything about it.'' _The State Department said Iraq had told U.S. officials that American males residing in Iraq and Kuwait who were born in Arab countries will be allowed to leave. Iraq generally has not let American males leave. It was not known how many men the Iraqi move could affect. _A Pentagon spokesman said ``some increase in military activity'' had been detected inside Iraq near its borders with Turkey and Syria. He said there was little indication hostilities are imminent. Defense Secretary Dick Cheney said the cost of the U.S. military buildup in the Middle East was rising above the $1 billion-a-month estimate generally used by government officials. He said the total cost _ if no shooting war breaks out _ could total $15 billion in the next fiscal year beginning Oct. 1. Cheney promised disgruntled lawmakers ``a significant increase'' in help from Arab nations and other U.S. allies for Operation Desert Shield. Japan, which has been accused of responding too slowly to the crisis in the gulf, said Tuesday it may give $2 billion to Egypt, Jordan and Turkey, hit hardest by the U.N. prohibition on trade with Iraq. 
``The pressure from abroad is getting so strong,'' said Hiroyasu Horio, an official with the Ministry of International Trade and Industry. Local news reports said the aid would be extended through the World Bank and International Monetary Fund, and $600 million would be sent as early as mid-September. On Friday, Treasury Secretary Nicholas Brady visited Tokyo on a world tour seeking $10.5 billion to help Egypt, Jordan and Turkey. Japan has already promised a $1 billion aid package for multinational peacekeeping forces in Saudi Arabia, including food, water, vehicles and prefabricated housing for non-military uses. But critics in the United States have said Japan should do more because its economy depends heavily on oil from the Middle East. Japan imports 99 percent of its oil. Japan's constitution bans the use of force in settling international disputes and Japanese law restricts the military to Japanese territory, except for ceremonial occasions. On Monday, Saddam offered developing nations free oil if they would send their tankers to pick it up. The first two countries to respond Tuesday _ the Philippines and Namibia _ said no. Manila said it had already fulfilled its oil requirements, and Namibia said it would not ``sell its sovereignty'' for Iraqi oil. Venezuelan President Carlos Andres Perez dismissed Saddam's offer of free oil as a ``propaganda ploy.'' Venezuela, an OPEC member, has led a drive among oil-producing nations to boost production to make up for the shortfall caused by the loss of Iraqi and Kuwaiti oil from the world market. Their oil makes up 20 percent of the world's oil reserves. Only Saudi Arabia has higher reserves. But according to the State Department, Cuba, which faces an oil deficit because of reduced Soviet deliveries, has received a shipment of Iraqi petroleum since U.N. sanctions were imposed five weeks ago. And Romania, it said, expects to receive oil indirectly from Iraq. 
Romania's ambassador to the United States, Virgil Constantinescu, denied that claim Tuesday, calling it ``absolutely false and without foundation.''







Automatically generated summary graph



Conclusion

  • Experiments on the dataset used show:

    • Attributes that characterize the document semantic graph improve the selection of triples for summarization.
    • → These results need to be verified on additional data sets.
    • → We need to compare against additional summarization methods.
    • → Explore various strategies for extracting and generating summaries based on the extracted triples.
  • We observe:

    • No combination of features that we examined led to a good separation of positive and negative triples in the feature space.
    • → Opportunity for further investigation and improvement.

