Design of scalable and efficient information retrieval systems

Pao Yue-kong Library Electronic Theses Database

Design of scalable and efficient information retrieval systems

 

Author: Sun, Xiaocui
Title: Design of scalable and efficient information retrieval systems
Degree: Ph.D.
Year: 2011
Subject: Information storage and retrieval systems -- Design.
Information retrieval.
Ethernet (Local area network system)
Hong Kong Polytechnic University -- Dissertations
Department: Dept. of Computing
Pages: xx, 153 p. : ill. ; 30 cm.
Language: English
InnoPac Record: http://library.polyu.edu.hk/record=b2462565
URI: http://theses.lib.polyu.edu.hk/handle/200/6309
Abstract: As the fast growing of the Internet, scalability and efficiency become the critical issues for future Internet design. Caching is an effective way to improve the scalability and efficiency for information retrieval systems. As the media applications such as YouTube and Facebook explosively increase, video caching has gained significant attention recently. By caching video chunks at the proxy servers close to end users, the server bandwidth requirement can be significantly reduced, and hence it can greatly improve the system performance. Ethernet is a widely used layer 2 technology for information retrieval system. Re-cently, Ethernet has gained popularity to be deployed in Metropolitan Area Networks (MANs). Metro Ethernet is highly attractive mainly due to its cost effectiveness, minimal management and maintenance, easy interoperability and convenience. Clearly, deploying the same technology, such as Ethernet, in both MAN and Local Area Networks (LAN) segments can potentially reduce the complexity and cost in network design and management, and hence improve packet forwarding performance. Even though Ethernet has many advantages, to be deployed in MANs, it has to solve the scalability issues. Ethernet has poor scalability due to using a flat addressing scheme (i.e., non-hierarchical Media Access Control (MAC) addresses) and broadcasting based address resolution scheme. In a MAN composing of a large number of LAN segments, a node in the provider network needs to keep a potential large number of MAC-to-port-mapping entries for frame forwarding. This may either cause MAC learning table explosion or excessive frame flooding, thus significantly degrading the system performance. Moreover, a broadcast-based address resolution scheme in Metro Ethernet introduces a large amount of broadcast messages into the provider network. Many protocols such as Address Resolution Protocol (ARP) and Dynamic Host Configuration Protocol (DHCP) use the broadcast service as a service discovery mechanism. The broadcast based address resolution schemes make Ethernet extremely convenient and easily accomplished. However, for MANs with millions of end users, high frequency broadcast messages waste a lot of bandwidth for address resolution. Frequently broadcast frames also accelerate the replacement of the table entries in Ethernet switches, which may lead to forwarding table explosion and hence trigger excessive frame flooding. Moreover, every end user has to take resource to handle every broadcast message.
This thesis aims at developing efficient and scalable information retrieval systems: to achieve this goal, an optimized cache replacement/group algorithm for video system has been designed to improve the scalability of the video system; To make Metro Ethernet scalable and efficient, an enabled Cache effect on Forwarding Table (CFT) scheme, an End user enabled Mac-in-Mac (EMiM) encapsulation scheme and two distributed Registration based address resolution Protocol (RP) are proposed in this thesis. Firstly, an Optimized Cache Replacement (OCR) scheme for video streaming is designed. OCR groups the users into different cache groups based on their request patterns. It calculates the user density among all possible intervals, and then selects the maximized user density group to cache. Based on the optimized scheme in a single cache, we also extend OCR for cooperative cache. The simulation results show that OCR can increase the hit ratio and reduce the server load. Secondly, we describe the CFT scheme in Metro Ethernet. CFT learns the IP and MAC mapping pair in a frame and eliminates the subsequent broadcast frames asking for this mapping. To further reduce the forwarding table size of Provider Edge (PE) node, CFT can be cooperated with EMiM. The forwarding table entries in Customer Edge (CE) node and PE node in this scheme are modified to learn both the IP and MAC addresses. By receiving a frame, the CE and PE nodes cache the IP-MAC address mapping carried in the frame. Once the mapping is recorded, it can be served for its subsequent requests by searching it in the cache. Hence, the CE and PE nodes not only response for forwarding but also answer the ARP request. The proposed architecture is easy to be accomplished and fully backward compatible. The simulation results show that the proposed scheme can decrease both the communication messages for address resolution and forwarding table size in PE nodes. Thirdly, an EMiM encapsulation scheme is proposed. In the proposed scheme, a user's MAC address as well as its PE node’s MAC address are associated with its ARP entry. The modified ARP entry allows an end user to do Mac-in-Mac (MiM) encapsulation directly by adding the destination user's MAC address and PE node's MAC address in the frame when it initiates a session. Hence a PE node does not need to maintain the entries of mapping end user's MAC address to PE node's MAC address, thus significantly reducing the forwarding table size. The simulation results show that the proposed scheme could provide high scalability by reducing up to 65% maximum forwarding table size in PE nodes. Finally, we propose two RPs. In a RP, multiple ARP registers are allocated to support address resolution. Each IP address has a home register which stores its ARP entry. When an end user moves to another location but keeps its IP address, its current PE or CE node is considered to be its foreign register. A foreign register temporally caches the ARP entry for an immigrated user and is in charge of the ARP entry updating in the home register. The IP address is used as an index to locate the corresponding home register through unicast, thus eliminating the broadcast to solve an unknown address. The proposed schemes can save more than 60% messages for address resolution and reduces up to 80% forwarding table size in PE nodes.

Files in this item

Files Size Format
b24625656.pdf 2.868Mb PDF
Copyright Undertaking
As a bona fide Library user, I declare that:
  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.
By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

     

Quick Search

Browse

More Information