Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.creator | Luo, Weidong | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/3391 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | -
dc.rights | All rights reserved | en_US
dc.title | Ai-Times : a parallel web news retrieval system | en_US
dcterms.abstract | The explosion in the availability of online information easily accessible through the Internet is a reality. As the available information increases, the inability to process, assimilate and use such a large amount of information becomes more and more apparent. Online news information suffers from these problems. Currently available web news retrieval systems face a number of problems, because web-based news retrieval requires the ability to quickly and accurately process very large amounts of data that are constantly being updated. In this thesis, we present the design and implementation of Ai-Times, a parallel web news retrieval system whose goal is to accurately retrieve and organize web news information. This version of Ai-Times introduces the following novel algorithms: a novel optimized crawler algorithm whose fetching speed is 6 times faster than that of the traditional crawler; a keen tag-based extraction algorithm which can extract data-rich content with minimal manual effort and which also allows data to be classified as important or not important, so that the crawler can revisit and update important data; a modified vector space model improved using query expansion and term reweighting; and, the most valuable contribution, a modified MapReduce improved by estimating the execution time of each subtask, which is proven to reduce the number of unusual tasks and shorten the whole job execution time. | en_US
dcterms.extent | x, 88 leaves : ill. ; 30 cm. | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2007 | en_US
dcterms.educationalLevel | All Master | en_US
dcterms.educationalLevel | M.Phil. | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations. | en_US
dcterms.LCSH | News Web sites. | en_US
dcterms.LCSH | Web search engines. | en_US
dcterms.LCSH | Information storage and retrieval systems -- Newspapers. | en_US
dcterms.accessRights | open access | en_US

Files in This Item:
File | Description | Size | Format
b21459344.pdf | For All Users | 2.18 MB | Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/3391