FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:     | 1 | 2 || 4 | 5 |   ...   | 25 |

«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

-- [ Page 3 ] --

Beyond that, new algorithms to approximate multi-key statistics by combining the statistics of arbitrary subsets are provided. Whereas hash sketches, used for compactly representing the documents, yield inaccurate results when considering continuous queries with more than two keys, the usage of very recent state-of-the-art techniques (KMV synopses) for compact representation of multisets is proposed and applied. These new structures allow the system to compute accurate synopses for multi-key queries, and improve the filtering effectiveness.

The experimental evaluation of both algorithms (USS and CSS) illustrates the filtering performance improvements in comparison to the basic MAPS approach. All experimental series use two different real-world collections for Web and blog data, and apply real-world queries from Google Zeitgeist. The evaluation also investigates filtering performance gains depending on the introduced correlation measure (conditional probability) representing a way to compute the relatedness among keys.

1.2.3 Prototype System Within this thesis, a prototype implementation [ZHTW08] has been developed. This prototype was initially meant to serve as a testing environment for the conducted experiments, in order to evaluate the novel methods for approximate filtering, but also aimed at demonstrating the feasibility of the combination of retrieval and filtering functionality in a single unifying system.

Thus, the approximate information filtering approach MAPS introduced in this thesis has been integrated into the Minerva search prototype [BMT+ 05b] such that Minerva also provides an approximate publish/subscribe functionality. In this regard, the implementation aspects concerning the extension of Minerva are investigated and some new components (to subscribe with continuous query, and to publish new documents) upgrade the existent prototype. An extensive use case describes the appliance of the extended Minerva prototype by executing sample one-time and continuous queries in detail. There, the various parts of the graphical user interface are illustrated. In this context, a short overview of existing retrieval and filtering prototypes is given.

1.2.4 Digital Library Use Case The thesis presents a digital library use case (called MinervaDL) as an application scenario to support approximate information retrieval and filtering functionality in a single

–  –  –

unifying framework. MinervaDL, as introduced in [ZTW07], is hierarchical and utilizes a distributed hash table to achieve scalability, fault-tolerance, and robustness in its routing layer. The MinervaDL architecture allows handling huge amounts of data provided by DLs in a distributed and self-organizing way, and provides an infrastructure for creating large networks of digital libraries with minimal administration costs. There are two kinds of

basic functionality that are offered in the DL architecture of MinervaDL:

• In an information retrieval scenario (or one-time querying), a user poses an one-time query and the system returns all resources matching the query (e.g., all currently available documents relevant to the requested query).

• In an information filtering scenario (or publish/subscribe or continuous querying), a user submits a continuous query (or subscription) and will later be notified from the DL system about certain events of interest that take place (i.e., about newly published documents relevant to the continuous query).

The proposed DL architecture is built upon a distributed directory storing metadata.

The thesis presents routing protocols for information filtering and retrieval that use this directory information to perform the two functionalities. MinervaDL distinguishes three

main components:

• Super-peers run the DHT protocol and form a distributed directory that maintains metadata about providers’ local knowledge in compact form. In MinervaDL, superpeers are responsible for serving information consumers and providers and act as their access point to the network. Super-peers can be deployed by large institutions like universities, research centers or content providers to provide access points for their users or digital libraries.

• Consumers are utilized by users (e.g., students, faculty or employees) to connect to the MinervaDL network, using a single super-peer as their access point. Utilizing a consumer peer allows users to pose one-time queries, receive relevant resources, subscribe to resource publications with continuous queries and receive notifications about published resources (e.g., documents) that match their interests.

• Providers are implemented by information sources that want to expose their content to the MinervaDL network. Typical examples are digital libraries deployed by larger institutions, like research centers or content providers (e.g., CiteSeer, ACM, Springer, or Elsevier). Provider peer use a directory peer (super-peer) as their access point and utilize it to distribute statistics about their local resources to the network. Providers answer one-time queries and store continuous queries submitted by consumers to match them against new documents they publish.

This thesis presents different scoring functions to select appropriate provider peers answering a one-time query or storing a continuous query for future matching publications.

The experimental evaluation investigates the influence of the individual scoring functions used to perform the two functionalities of MinervaDL. Furthermore, MinervaDL is compared to another DL architecture for P2P networks (LibraRing) providing retrieval and filtering at the same time. Unlike MinervaDL, this architecture provides exact retrieval and filtering functionality.

- 13 Chapter 1 Introduction

1.3 Thesis Outline

This thesis is comprised of eight chapters, the first being the current introductory chapter. The remainder of this thesis is organized as follows. Chapter 2 gives an background overview about existing work in the areas of P2P, information retrieval, information filtering, time series analysis, and distinct-value estimation. Chapter 3 introduces the main architecture that is used for approximate information retrieval and information filtering in structured P2P networks. This architecture extends the Minerva search architecture with the MAPS approach for approximate information filtering. This chapter also presents the services and protocols of the complete architecture but with focus on approximate filtering.

Chapter 4 focuses on publisher peer selection in the approximate information filtering approach MAPS and includes an extensive evaluation. Chapter 5 extends the MAPS approach and considers correlated keys. Two different algorithms that improve filtering quality are presented and compared to the baseline approach. Chapter 6 presents the implemented prototype system for approximate information filtering. This prototype extends the Minerva P2P search prototype allowing one-time searching. In Chapter 7, a DL use case for searching and filtering in a distributed digital library environment is put forward. The proposed DL architecture combines two functionality using the same infrastructure. This thesis is concluded in Chapter 8 by summarizing the main results and pointing out some open questions and directions for future work in this area.

- 14 Chapter 2 Background and Technical Basics Chapter 2 Background and Technical Basics This chapter introduces some background and state-of-the-art work used to develop this doctoral thesis. First, a short overview of existing P2P architectures and protocols is described in Section 2.1 including the basic underlying infrastructure. Sections 2.2 and

2.3 present necessary background knowledge about information retrieval and information filtering.

The main techniques in the area of time series analysis are explained in Section 2.4, providing the background for the publication prediction methods. Finally, Section 2.5 presents two data structures (or synopses) for distinct-value estimation on large data sets that are used to count the number of distinct data items within a multi-set.

2.1 Peer-to-Peer Systems In recent years, the peer-to-peer approach has been receiving increasing attention and has become a true hype paradigm for communication on the Internet and the World Wide Web. Although becoming popular mainly in the context of file sharing applications (e.g., Napster, Gnutella, or BitTorrent), the P2P paradigm can be applied to access any kind of distributed data. Currently, P2P is making its way into distributed data management systems and is offering possibilities for new Web applications.

In contrast, the traditional client-server approach (shown in Figure 2.1) requires a huge amount of effort and resources to meet the increasing challenges of the continuously growing resources shared over the Internet. This centralized client-server network model results in increased load for the server component, and creates scalability bottlenecks and single-points-of-failure, where a failure of one entity results in terminating the functionality of the whole system. As a consequence, such systems can easily be attacked, e.g., by denial-of-service attacks. In addition, dedicated servers are often difficult and expensive to administrate and to relocate due to their strategic placement within the Internet infrastructure.

Contrary to the client-server model, P2P computing promises to offer enormous potential benefits to important issues including scalability, security, reliability, efficiency, flexibility, and resilience to failures and dynamics. [SW05] discusses in detail the issues addressed by this fundamental paradigm shift.

An exact answer to the question What exactly is P2P? is not obvious. Especially the early P2P applications that have not even been true P2P in a strict sense, make it difficult to give a precise definition. The Internet encyclopedia Wikipedia currently defines the

following aspects of a P2P network:

• A peer-to-peer (or "P2P", or, rarely, "PtP") computer network exploits diverse connectivity between participants in a network and the cumulative bandwidth of network

–  –  –

A more technical definition of P2P systems (shown in Figure 2.2) can be found in [Ora01, SW05]: A peer-to-peer system is a self-organizing system of equal, autonomous entities (peers) which aims for the shared usage of distributed resources in a networked environment avoiding central services. Both definitions share the idea of decentralization and point at decentralized resource usage and decentralized self-organization as potential benefits.

2.1.1 The Lookup Problem How to find a certain data item in a large P2P system in a scalable manner without any centralized service? This problem is at the heart of any P2P system [BKK+ 03] and is often referred to as the lookup problem. The main reason of this challenge is caused by decentralization such that the main system strength also yields to the main system challenge.

In contrast to centralized client-server systems, where data is provided by dedicated physical entities that are explicitly referenced, P2P systems store data in multiple, distant, transient, and unreliable locations within the network. Thus, the efficient location of data stored in the network is one of the predominant challenges of a P2P system.

–  –  –

Figure 2.2: Peer-to-Peer Network Architecture.

2.1.2 P2P Architectures There are several ways to classify P2P networks. One approach considers the application a P2P network is used for (e.g., file sharing, telephony, media streaming etc.). Another approach includes the degree of centralization and distinguishes between pure P2P without central server (peers act as equals) and networks with central server keeping information on peers. In this context, the following terminology can be found: centralized, or decentralized P2P networks, structured, unstructured, or hybrid (so-called super-peer architectures) P2P networks.

The research literature commonly tries to classify the existing approaches to deal with the lookup problem. The following sections present the respective approaches in more detail.

Subsequent, two example protocols of P2P architectures are presented: Chord (2.1.3) and Pastry (2.1.4). Centralized P2P Architectures

The first occurrence of the P2P paradigm in a broader public cognition was probably Napster 1. This famous file-sharing system elegantly solved the lookup problem by utilizing an architecture in which a centralized entity provides a directory service to all participating peers (or users), effectively forming a star network. All peers joining the system have to register their data (mostly music files in the early days) with this centralized server thus allowing a comfortable way for other peers in the system to locate any data in the network by presence of a physically centralized directory. The fact that only pointers to decentralized available peers are stored at the centralized server (instead of the actual data) conceptually decreases the load at the central entity; the fact that (after relevant data has been located by means of the directory) each peer could directly communicate with other peers that store the data in a decentralized manner, completely bypassing the centralized directory entity, drives the perception of Napster as a P2P system.

1 Napster’s brand and logo continue to be used by a pay service, having been acquired by Roxio.

–  –  –

Figure 2.3: Unstructured P2P Network with Message Flooding.

However, in a pure P2P system, it should be possible to remove any entity from the network without loss of functionality. Instead, a peer should conceptually fulfill both roles as server and client, such that the functionality of the system is spread equally over all the participating peers in the network. Thus, by this definition, Napster can not be denoted as a P2P system. The shutdown of the the centralized server of Napster by legal authorities allowed the easy shutdown of the complete system. Unstructured P2P Architectures

Pages:     | 1 | 2 || 4 | 5 |   ...   | 25 |

Similar works:

«Annex 1: News AIR TRANSPORT: QUARTERLY REPORT NO.12 3rd QUARTER 2006 (July to September) A EU/REGULATORY Commission approves public financing for six regional airports in Ireland High-level conference discusses future of aviation regulation EU – African Union aviation seminar: meeting the needs of modern aviation Commission approves changes to social aid for Martinique EU and the Republic of Korea seal their agreement on GALILEO Commission opens formal enquiry into restrictions to air...»

«King Henry IV, Second Part by William Shakespeare [Chiswick edition] King Henry IV, Second Part by William Shakespeare [Chiswick edition] KING HENRY IV, SECOND PART by William Shakespeare Dramatis Personae RUMOUR, the Presenter. KING HENRY the Fourth. His sons HENRY, PRINCE OF WALES, afterwards King Henry V.THOMAS, DUKE OF CLARENCE. PRINCE JOHN OF LANCASTER. PRINCE HUMPHREY OF GLOUCESTER. EARL OF WARWICK. EARL OF WESTMORELAND. page 1 / 209 EARL OF SURREY. GOWER. HARCOURT. BLUNT. Lord Chief...»

«Начало мудрости (Псалом 110:10) Свидетельства раввинов Copyright © by Good News Society, Johannesburg, South Africa Copyright of the Israeli edition © 1992 by Keren Ahvah Meshihit P. 0. Box 10382, Jerusalem 91ЮЗ Printed in Israel, September 1992 ISBN 965-447-005-5 No part of this publication may be reproduced in any form without prior permission in writing from the publisher. Авторское право © Гуд Нью с Сисайти...»

«Brochure More information from http://www.researchandmarkets.com/reports/3173520/ The Netherlands: Prepared Anchovies Market Description: This report presents a comprehensive overview of the prepared anchovies market in the Netherlands and a forecast for its development in the next five years. It provides a detailed analysis of the market, its dynamics, structure, characteristics, main players, growth and demand drivers, etc. The purpose of the report is to describe the state of the prepared...»

«Copyright by John F. Cline The Dissertation Committee for John F. Cline Certifies that this is the approved version of the following dissertation: Permanent Underground: Radical Sounds and Social Formations in 20th Century American Musicking Committee: Mark C. Smith, Supervisor Steven Hoelscher Randolph Lewis Karl Hagstrom Miller Shirley Thompson Permanent Underground: Radical Sounds and Social Formations in 20th Century American Musicking by John F. Cline, B.A.; M.A. Dissertation Presented to...»

«FRO301232.I o... *DE019264587* STRATEGIC ELEMENTS OF STEAM CYCLE CHEMISTRY CONTROL PRACTICES AT TXU'S COMANCHE PEAK STEAM ELECTRIC STATION B. FELLERS, J. STEVENS, G. NICHOLS, All TXU ELECTRIC ABSTRACT Early industry experience defined the critical importance of Chemistry Control Practices to maintaining long-term performance of PWR steam generators. These lessons provided the impetus for a number of innovations and alternate practices at Comanche Peak. For example, advanced amine...»

«Computer Assisted Instruction In The Teaching Of Japanese As A Second Language Regarding the prospect day will steal you double self-awareness of your spot. Every employees focused and the storage is a global sales on another available sphere. You promise compared of the CRA Financial South Days March, but for our organization of the epub for the Ouachita, and there find not color effects if customer to tell in a different growth for tractors, much or also, law for a types. You never has the...»

«First Christian Reformed Church of Kingston The mission of First Christian Reformed Church of Kingston is to honour and glorify our Lord by leading people into a devoted relationship with Jesus Christ. May 18, 2014 Announcements Today’s Services and Fellowship Today we welcome Elder Elizabeth W. as she leads the service. Pastor Kevin has been invited to lead a service elsewhere. We pray the Lord’s blessing on these services this morning. Upcoming Services and Fellowship May 25 – 10:00 AM...»

«Constantinople or the Sensual Concealed. The Imagery of Sean Scully Glancing from a moving train or car at the world flashing by, at the landscape and the people, the objects distort into coloured stripes. The greater our speed the less discernible these objects appear. Leaving behind and surrendering their materiality, the stripes encapsulate in their growing deformation and abstraction the minor and major events transpiring ‘outside’. It is this intimation of the real world, of the...»

«heart rate monitor herzfrequenz pulsuhr german|engineering PM 80 Operating Instructions Gebrauchsanleitung TABLE OF CONTENTS G Scope of delivery Important Notes General Information for Training Functions of the HR monitor Transmission of signal and methods of Devices Measurement.9 Getting started General operation of the HR monitor Buttons on the HR watch Display Menus Basic settings Overview Enter personal data Set training zone Set units Pairing the signals from the device Watch settings...»

«The Russian Financial Crisis: Causes and Effects on ENI Countries Robert M. Birkenes, USAID/PPC/CDIE/DIO/ESDS John A. Pennell, USAID/PPC/CDIE/DIO/RRS June 4, 1999 Introduction: Scope and Purpose of Paper On 17 August 1998, the Russian government defaulted on its GKO Treasury Bonds, imposed a 90-day moratorium on foreign debt payments, and abandoned the ruble exchange rate corridor. Within the next couple of weeks, the Russian Central Bank announced that it would stop selling U.S. dollars,...»

«Annulment: The Process and its Meaning By Patrick Lagges, JCD For some, it is a source of healing; for others, a source of scandal. For most, it remains a dark, murky process that is heard about only through rumor and gossip. The subject of declarations of nullity in the Roman Catholic Church has been a source of misunderstanding for many Catholics, especially those who grew up in the Church prior to the Second Vatican Council. For many people with a traditional Catholic education, annulments...»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.