A study of document-context models in information retrieval

Wu, Ho-chung

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Computing	en_US
dc.creator	Wu, Ho-chung	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/6116	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	-
dc.rights	All rights reserved	en_US
dc.title	A study of document-context models in information retrieval	en_US
dcterms.abstract	In this thesis we study new retrieval models that simulate "local" relevance decision-making for every term location in a document; these local relevance decisions are then combined into the "document-wide" relevance decision for the document. The local relevance decision for a term t occurring at the k-th location in a document is made by considering its document-context, which is the window of terms centred on the term t at the k-th location. Therefore, different relevance scores (preferences) are obtained for the same term t at different locations in a document, depending on its document-contexts. This differs from traditional models, in which a term t receives the same score regardless of its location in a document. In particular, we study a hybrid document-context model that combines various existing effective models and techniques. It estimates the relevance decision preference of a document-context as the log-odds and uses smoothing techniques, as found in language models, to solve the problem of zero probabilities. It combines the estimated preferences of document-contexts using different types of aggregation operators that comply with the relevance decision principles. The model is evaluated using retrospective experiments with full relevance information to reveal its potential. The model obtained a mean average precision of 60%-80% in retrospective experiments using different TREC ad hoc English collections and the NTCIR-5 ad hoc Chinese collection. The experiments showed that the operators that are consistent with the aggregate relevance principle were effective in combining the estimated preferences of document-contexts. Besides retrospective experiments, we also use the top 20 documents from the initial ranked list to perform relevance feedback experiments with a probabilistic document-context model, and the results are promising.	en_US
dcterms.abstract	We also show that when the size of the document-contexts is shrunk to unity, the document-context model simplifies to a basic ranking formula that directly corresponds to TF-IDF term weights. Thus, TF-IDF term weights can be interpreted as making relevance decisions. This helps to establish a unifying perspective of information retrieval as relevance decision-making and to develop advanced TF-IDF-related term weights for future, more elaborate retrieval models. Empirically, using four TREC ad hoc retrieval data collections, we show that the IDF of a term t is related to the probability of randomly picking a non-relevant usage of the term t. Lastly, we apply the notion of document-context to develop a new relevance feedback algorithm. Instead of having the user judge the documents from the top of the ranked document list, we split the ranked document list into multiple lists of document-contexts. Therefore, the relevance of the documents is not judged sequentially. This is called active feedback. In experiments with various TREC data collections, our new relevance feedback algorithm using document-contexts obtained better results than the conventional relevance feedback algorithm, and it did so more reliably than a maximal marginal relevance (MMR) method that does not use document-contexts. The experimental results suggest that using document-contexts can improve retrieval effectiveness.	en_US
dcterms.extent	x, 166 p. : ill. ; 30 cm.	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2011	en_US
dcterms.educationalLevel	All Doctorate	en_US
dcterms.educationalLevel	Ph.D.	en_US
dcterms.LCSH	Information storage and retrieval systems	en_US
dcterms.LCSH	Text processing (Computer science)	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	open access	en_US

Files in This Item:

File	Description	Size	Format
b24415765.pdf	For All Users	1.16 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/6116