Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.creator | Wang, Dayu | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/7369 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | -
dc.rights | All rights reserved | en_US
dc.title | A study on discourse type based information retrieval | en_US
dcterms.abstract | In ad hoc information retrieval (IR), some information needs (e.g., find the advantages and disadvantages of smoking) require the explicit identification of information related to the discourse type (e.g., advantages/disadvantages) as well as to the topic entity (e.g., smoking). Such information needs are not uncommon and may not be easily satisfied by conventional retrieval methods. We therefore propose retrieval methods that consider the discourse type of topics. We propose IU similarity models and graph-based models to compute the similarity between a part of a document (called an information unit, or IU for short) and a set of topic entity terms. Experimental results show that our IU similarity models with different term weighting schemes perform quite well and are able to overcome the difficulties caused by the small size of an IU. We also propose graph-based models that can compute the similarity of an IU based on topic entity terms only, or based on both topic entity terms and discourse-type-based terms. In graph-based models, the basic unit is an edge that links two terms, which may be two distinct topic entity terms or a topic entity term and a discourse type term. These two models can be regarded as baselines of IU-based retrieval that do not rely on any discourse type information. In actual documents, some individual terms are not adequate to express a discourse type. We therefore focus on text patterns, which have greater expressive power. We use word sequences, POS-tag sequences, and mixtures of both to match phrases and expressions in order to find the text patterns that relate to a specific discourse type. These text patterns can also be selected by treating the different types of sequences as features in a pattern recognition application. The text patterns are used to quantify whether an IU contains information on a specific discourse type. For evaluation, we focused on some discourse types that can easily be identified in the TREC topics that are not satisfied very well by conventional retrieval models. We evaluated the discourse type based retrieval using our novel retrieval models and the text patterns mined by selection conditions or learning algorithms. We showed that our concept of discourse type and the corresponding solutions are able to enhance the retrieval effectiveness for the selected TREC topics. | en_US
dcterms.extent | viii, 286 p. : ill. ; 30 cm. | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2013 | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.LCSH | Information retrieval. | en_US
dcterms.LCSH | Discourse analysis. | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.accessRights | open access | en_US

Files in This Item:
File | Description | Size | Format
b26818103.pdf | For All Users | 4.18 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/7369