A study on discourse type based information retrieval

Wang, Dayu

Author:	Wang, Dayu
Title:	A study on discourse type based information retrieval
Degree:	Ph.D.
Year:	2013
Subject:	Information retrieval. Discourse analysis. Hong Kong Polytechnic University -- Dissertations
Department:	Department of Computing
Pages:	viii, 286 p. : ill. ; 30 cm.
Language:	English
Abstract:	In ad hoc information retrieval (IR), some information needs (e.g., find the advantages and disadvantages of smoking) require the explicit identification of information related to the discourse type (e.g., advantages/disadvantages) as well as to the topic entity (e.g., smoking). Such information needs are not uncommon and may not be easily satisfied by conventional retrieval methods. We therefore propose retrieval methods that take the discourse type of a topic into account. We propose IU similarity models and graph-based models to compute the similarity between a part of a document (called an information unit, or IU) and a set of topic entity terms. Experimental results show that our IU similarity models perform well with different term weighting schemes and are able to overcome the difficulties caused by the small size of an IU. We also propose graph-based models that can compute the similarity of an IU based on topic entity terms only, or on both topic entity terms and discourse type terms. In the graph-based models, the basic unit is an edge linking two terms, which may be two distinct topic entity terms or a topic entity term and a discourse type term. These two models can be regarded as baselines for IU-based retrieval that does not rely on any discourse type information.
In actual documents, some individual terms are not adequate on their own to express a discourse type, so we focus on text patterns, which have greater expressive power. We use word sequences, POS-tag sequences, and mixtures of both to match phrases and expressions in order to find the text patterns that relate to a specific discourse type. These text patterns can also be selected by treating the different types of sequences as features in a pattern recognition application. The selected text patterns are then used to quantify whether an IU contains information on a specific discourse type.
For evaluation, we focused on discourse types that are easily identified in TREC topics and that conventional retrieval models do not serve well. We evaluated discourse type based retrieval using our novel retrieval models together with text patterns mined either by selection conditions or by learning algorithms. We showed that the concept of discourse type and the corresponding methods enhance retrieval effectiveness for the selected TREC topics.
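The abstract does not give the thesis's formulas, so the following is only a minimal Python sketch, under stated assumptions, of two ideas it describes: graph edges that link a topic entity term to another topic entity term or to a discourse type term inside an IU, and text patterns that mix literal words with POS tags. The representation of an IU as (word, POS tag) pairs, the Penn Treebank tags, the `POS:` prefix convention, the function names `edges_in_iu`, `count_matches`, and `discourse_score`, and the length-normalised weighted score are all hypothetical choices for illustration, not the author's method.

```python
from itertools import combinations
from typing import List, Set, Tuple

# Hypothetical representation: an information unit (IU) is a list of
# (word, POS tag) pairs. Penn Treebank tags are used purely for illustration.
Token = Tuple[str, str]


def edges_in_iu(iu: List[Token], topic_terms: Set[str],
                discourse_terms: Set[str]) -> Set[Tuple[str, str]]:
    """Collect term-pair edges present in the IU (hypothetical graph view).

    An edge links two distinct topic entity terms, or a topic entity term and
    a discourse type term, when both terms occur somewhere in the IU.
    """
    words = {w.lower() for w, _ in iu}
    topics = sorted(t for t in topic_terms if t in words)
    discourse = sorted(d for d in discourse_terms if d in words)
    edges = set(combinations(topics, 2))  # topic-topic edges
    edges |= {(t, d) for t in topics for d in discourse if t != d}  # topic-discourse edges
    return edges


def matches_at(iu: List[Token], pattern: List[str], start: int) -> bool:
    """Check whether `pattern` matches the IU beginning at index `start`.

    A pattern element is either a lowercase word, compared with the token text,
    or "POS:<tag>", compared with the token's POS tag, so one pattern can mix
    word elements and POS-tag elements.
    """
    if start + len(pattern) > len(iu):
        return False
    for elem, (word, tag) in zip(pattern, iu[start:start + len(pattern)]):
        if elem.startswith("POS:"):
            if tag != elem[len("POS:"):]:
                return False
        elif word.lower() != elem:
            return False
    return True


def count_matches(iu: List[Token], pattern: List[str]) -> int:
    """Number of start positions in the IU where `pattern` matches."""
    return sum(matches_at(iu, pattern, i) for i in range(len(iu)))


def discourse_score(iu: List[Token],
                    weighted_patterns: List[Tuple[List[str], float]]) -> float:
    """Hypothetical discourse-type score: weighted pattern matches per token."""
    if not iu:
        return 0.0
    total = sum(w * count_matches(iu, p) for p, w in weighted_patterns)
    return total / len(iu)


if __name__ == "__main__":
    iu = [("One", "CD"), ("advantage", "NN"), ("of", "IN"), ("smoking", "NN"),
          ("is", "VBZ"), ("weight", "NN"), ("control", "NN")]
    print(edges_in_iu(iu, {"smoking", "tobacco"}, {"advantage", "disadvantage"}))
    patterns = [(["advantage", "of"], 1.0), (["one", "POS:NN", "of"], 0.5)]
    print(discourse_score(iu, patterns))
```

Here the patterns and their weights are hand-picked for illustration; in the approach the abstract describes, patterns would instead be mined by selection conditions or by learning algorithms.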
Rights:	All rights reserved
Access:	open access

Files in This Item:

File	Description	Size	Format
b26818103.pdf	For All Users	4.18 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/7369