FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:     | 1 |   ...   | 15 | 16 || 18 | 19 |   ...   | 25 |

«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

-- [ Page 17 ] --

In the US, more than 10 billion searches have been conducted at core search engines in November 2007. Here, the big players share the market as shown in Table 6.11. In Germany, Google is even more dominant with a market share of almost 90%2. Various projects have been started to build and operate a P2P Web search network (e.g., [TD04, LKP+ 05, CAPMN03, YVGM04]) including the Minerva project. Web search and Internet-scale file

content search seem to be perfect candidates for a P2P approach for several reasons:

• The Web is increasing at a much faster rate than the indexing capability of any centralized search engine [HT99, WMYL01, SE00]. In addition, the data is highly distributed by nature, residing on millions of sites.

• A P2P network could potentially outperform even the largest server farm in terms of processing power and could, thus, enable much more advanced methods (e.g., ontology-based background knowledge). A P2P Web search engine can potentially benefit from the intellectual input of a large user community, as every peer’s behavior is influenced by a human user.

• There is growing concern about the world’s dependency on a few monopolistic search engines and their susceptibility to commercial interests.

1 http://www.comscore.com 2 http://www.webhits.de

–  –  –

6.2 The Minerva Search System This section introduces the Minerva3 prototype for P2P search [BMWZ05, BMT+ 05a, BMT+ 05b]. The next sections elaborate the main principles of the Minerva search engine (Section 6.2.1), present its architecture (Section 6.2.2), and summarize some core fundamentals of its prototype implementation (Section 6.2.3).

6.2.1 System Principles In [LLH+ 03], approaches to comprehensive Web search based on a P2P network have been considered infeasible, or at least being a grand challenge, from a scalability viewpoint.

Early approaches typically spread inverted index lists across the directory such that each peer is responsible for maintaining a subset of index lists. Such systems allow for exact and complete execution of top-k style aggregation queries over the P2P network. However, bandwidth requirements and also latency issues raise concerns about their scalability claims.

Novel approaches [CW04, MTW05] try to overcome these challenges by utilizing efficient large-scale top-k query aggregation algorithms for distributed systems.

The system design of Minerva differs from the approaches mentioned above. Instead of disseminating inverted index lists across the directory, Minerva uses only pointers to promising peers (enriched with compact statistical metadata describing the index contents of that peer) and utilize these pointers to answer multi-keyword queries. Here, some fundamental design principles are listed that influenced the architecture of the Minerva search

system presented in the next section:

–  –  –

Figure 6.1: Minerva Search Architecture and Query Execution.

• Minerva does not forward requests to all possible peers in the network. The scalability principle ensures that only the most appropriate peers are involved in the query execution.

6.2.2 System Architecture The P2P Web search prototype system Minerva [BMT+ 05b] assumes a P2P collaboration in which every peer is autonomous and has a local index that can be built from the peer’s own crawls or imported from external sources representing the user’s interest profile. The local index contains inverted index lists with URLs for Web pages that contain specific keywords. A conceptually global but physically distributed directory, which is layered on top of a distributed hash table, holds compact, aggregated information about the peers’ local indexes. The DHT partitions the key space, such that each directory peer is responsible for maintaining the metadata about a randomized subset of keys. For failure resilience and availability, the metadata may be replicated across multiple directory peers. The three

steps of query execution work as follows (see Figure 6.1):

1. Directory Maintenance: Every peer publishes in step 1 a number of key-specific statistical summaries (posts) describing its local index to the directory (shown in Figure

6.1a). Posts contain contact information about the peer who published this summary together with statistical information to support appropriate query routing strategies (e.g., the size of the inverted list for the key, the maximum average score among the key’s inverted list entries, or various other statistical synopses [MBN+ 06]). The DHT is used to determine the directory peer responsible for this key.

2. Query Routing: If the results of a local query execution are unsatisfactory, the user can utilize in step 2 the distributed directory to identify more promising peers for a particular query as follows (shown in Figure 6.1b): for each query key, the query initiator identifies the peer that is currently maintaining the appropriate statistics, using the DHT functionality. Then the query initiator retrieves the relevant posts

- 107 Chapter 6 Prototype Implementation

by issuing requests directly to these peers. The statistical synopses contained in the posts are used to perform query routing, i.e., to identify the most promising peers for that particular query.

3. Query Processing: After a small subset of peers has been selected, the query is forwarded and executed based on their local indexes in step 3 (shown in Figure 6.1c).

Note that this communication is carried out in a point-to-point manner. Finally, the remote peers return their local results to the query initiator, where they are combined into a single result list (result merging).

This Minerva baseline approach can be extended such that multiple directories are utilized to maintain information beyond local index summaries, such as information about local bookmarks [BMWZ04], information about relevance assessments (e.g., from peer-specific query logs or click streams [KLFW06]), and (implicit or explicit) user feedback.

The upcoming sections discuss the two major challenges of query execution with Minerva, namely query routing in Section and result merging in Section Query Routing

Query routing is one of the key issues to make P2P search feasible. A user requesting a multi-keyword query expects a high-quality top-10 or top-100 ranked result list. Therefore, the system has to select the peers that have to answer the query. This decision is based on statistical information stored in the directory. [BMWZ05] introduces the baseline query routing approach utilizing resource selection strategies (e.g., CORI [CLC95], DTF [NF03], or GlOSS [GGM95]). These resource selection strategies (or database selection) originally been designed for distributed IR need to be adapted to the large-scale and the high dynamics of a P2P system. The work presented in [BMT+ 05a, MBTW06] introduces an overlap-aware query routing approach that takes the novelty of peers into account. Another extension [MBN+ 06] considers the correlations among keywords to improve P2P query routing. Caching strategies as shown in [ZBW08] can improve result-quality of P2P search and reduce response times by retrieving cached results of previously executed queries. In addition, aggressively reuse cached results of even subsets of a query towards an approximate caching technique can drastically reduce the bandwidth overheads.

–  –  –

Figure 6.2: Minerva Search Engine Implementation.

6.2.3 Implementation Minerva is implemented in a platform-independent way using Java 5. The software architecture of a peer is illustrated in Figure 6.2. Each peer is build on top of a globally distributed directory which is organized as a distributed hash table, e.g, Chord [SMK+ 01] or Pastry [RD01a]. Minerva utilizes the lookup functionality of the DHT to provide a mapping from keys to peers. Early versions of Minerva relied on a reimplementation of the Chord protocol, and more recent versions run as a FreePastry application, using Pastry’s network routing mechanisms and Past’s storage functionalities to become resilient to network dynamics and peer failures. The latter version works as follows: each peer maintains a PastryNode, implementing the PastryApplication interface, and is registered at a PastryEndpoint. Once registered, the PastryNode delivers incoming messages to the registered applications. There exist two different implementations of Past (PastImpl and GCPastImpl). GCPastImpl is an extension of PastImpl that offers garbage collection based on time-stamps. Minerva uses this extended version in order to prune outdated metadata objects after a specific time interval.

The DHT layer returns a Peer Descriptor object containing the contact information (e.g., IP address and port) of the peer currently responsible for a key. A Communicator is instantiated with this data to perform the communication with remote peers. Each peer runs an Event Handler listener that receives incoming messages and forwards them to the appropriate local components of a peer. Every peer has a Local Index holding the peer data using any database system capable of executing standard SQL commands (e.g., Oracle, MySQL, or Cloudscape/Derby). The index can be used for query execution by the Local Query Processor component. Additionally, the Poster component uses the local index to produce the key-specific summaries that are published to the global directory using the Communicator. Each peer implements a PeerList Processor to maintain the incoming posts, i.e., all posts from across the network regarding the subset of keys that the actual peer is currently responsible for. Notice that Past is designed to handle such inserts natively, such that recent versions of Minerva do not need to use a PeerList Processor.

- 109 Chapter 6 Prototype Implementation

When the user initiates a one-time query using Minerva, the Global Query Processor component uses the DHT to locate the peer responsible for each query key and retrieves the respective metadata using Communicator components. After appropriately processing the metadata, the Global Query Processor forwards the complete query to selected peers, which in turn process the query using their Local Query Processors and return their results.

In a final step, the Global Query Processor merges these remote results and presents the merged result to the user. There are some cases where Minerva peers communicate directly, i.e., without using the DHT lookup functionality.

6.3 The MAPS Filtering Extension The MAPS approximate information filtering approach introduced in this thesis has been integrated into the Minerva search prototype such that Minerva provides in addition to onetime searching an approximate publish/subscribe functionality in addition. Section 6.3.1 presents the implementation aspects concerning the extension of Minerva, while Section 6.3.2 explains the usage of the extended Minerva prototype by executing example one-time and continuous queries. There, the various parts of the graphical user interface (GUI) are illustrated.

6.3.1 Implementation In this section, the changes implemented to add the publish/subscribe functionality at the Minerva prototype are explained. Figure 6.3 shows how the three new components of a peer are integrated in the existing system.

Besides one-time queries, the modifications described below, allow issuing continuous queries. Thus, the Global Query Processor component is able to handle both types of queries as input. A continuous query has in addition a lifetime parameter to determine how long such a request should be valid. To process a continuous query requested by a user, the Global Query Processor utilizes the Time Analysis Storage component that maintains statistical metadata of active continuous queries. Therefore, the future behavior of publisher peers can be predicted by applying time series analysis techniques to stored metadata as described in Chapter 3 in detail.

Collecting publisher metadata is performed similarly to the one-time querying by utilizing the lookup functionality of the DHT, and asking the peers responsible for the keys in the continuous query. Having selected the most promising publisher peers for a query, the Global Query Processor uses the Communicator to send the continuous query to the remote peers.

Whenever a peer receives a continuous query (as an event of the Event Handler ), the query is stored at the Continuous Query Store. This store denotes the second new component for filtering. Each time, a peer inserts a new query, outdated queries stored at the peer are removed from the Continuous Query Store.

The third new component to integrate approximate publish/subscribe functionality to Minerva, is the Publisher module. This component receives new documents as input and adds these documents to the Local Index. In addition, the Publisher checks the Continuous Query Storage for active continuous queries matching the publication. Subscriptions that are no longer valid (e.g., because the lifetime has expired) are removed from the store.

Checking the matching continuous queries delivers a set of peers that have subscribed with one of these queries, and have to be notified about the published document. The Publisher component sends a notification message using the Communicator to inform the subscriber peers about the new document.

–  –  –

Figure 6.3: Extended MAPS Implementation with IF Components.

6.3.2 Example Use of MAPS This section showcases the usage of Minerva (respectively MAPS) with the extended approximate publish/subscribe functionality by means of screenshots taken from the latest prototype version. It also serves as a short explanation how one-time and continuous query functionality is used within Minerva.

In this showcase example, 10 Cloudscape or Derby databases are used to host about 100 documents per database. Thus, 10 peers can be created to manage one of the 10 collections each. To publish new documents, there is an additional database that hosts about 100, 000 additional documents simulating the input from a crawling component such as BINGO!

Pages:     | 1 |   ...   | 15 | 16 || 18 | 19 |   ...   | 25 |

Similar works:

«PERCEIVED VALUE OF MALAYSIAN LOW·COST AIRLINES: THE VIEWS OF THE EXISTING DOMESTIC CUSTOMERS Jennifer Chan Kim Lian and Eileen Yeoh Universiti Malaysia Sabah Locked Bag 2073 88999 Kota Kinabalu, Sabah, Malaysia ABSTRACT Paper accepted for presentation inAsia Pacific Marketing and Management Conference, 11th 9th November 2011.•. This paper aims to explore the meaning of perceived value of Malaysian low cost airlines from the perspectives of. ·~their existing domestic customers. The...»

«XIX INCOSAI Mexiko-Abkommen Präambel Auf der 55. Präsidialtagung wurden die Fachthemen für die Diskussionen des XIX. INTOSAI-Kongresses festgelegt und die für den Themenvorsitz verantwortlichen ORKB bestimmt. Da der Kongress ein privilegiertes Forum für den Erfahrungsaustausch aller Länder darstellt, in dem aufgetretene Probleme und die entsprechenden Lösungen vorgestellt werden können, verabschiedeten die Mitglieder des Präsidiums zwei Fachthemen, die im Bereich der heutigen...»

«21.12.2015 Gericht BVwG Entscheidungsdatum 21.12.2015 Geschäftszahl W211 1424913-2 Spruch W211 1424913-2/10E IM NAMEN DER REPUBLIK! Das Bundesverwaltungsgericht hat durch die Richterin Mag.a SIMMA als Einzelrichterin über die Beschwerde von XXXX, geboren am XXXX, StA. Somalia, gegen den Bescheid des Bundesasylamtes vom XXXX, Zl.XXXX, nach Durchführung einer mündlichen Verhandlung zu Recht erkannt: A) Die Beschwerde wird gemäß § 3 Abs. 1 AsylG 2005 als unbegründet abgewiesen. B) Die...»

«Institut für Tierwissenschaften Rheinische Friedrich-Wilhelms-Universität Bonn Effects of medium-chain fatty acids and ration type on in vitro ruminal methane production Inaugural-Dissertation zur Erlangung des Grades Doktor der Agrarwissenschaften (Dr. agr.) der Landwirtschaftlichen Fakultät der Rheinischen Friedrich-Wilhelms-Universität Bonn vorgelegt im Oktober 2014 von M. Sc. Taufiq Wisnu Priambodo aus Jakarta, Indonesien Referent: Prof. Dr. Karl-Heinz Südekum Korreferent: PD Dr....»

«This is a summary of the meeting of the Accounting Standards Accounting Forum (ASAF) held in London on 25-26 September 2013. It was prepared by staff of the IASB, and is a high-level summary of the discussions that took place. A full recording of the meeting is available on the IASB website. The meeting was chaired by Hans Hoogervorst, Chairman of the IASB, with Ian Mackintosh, Vice-Chairman of the IASB. ASAF Members attending Kim Bromfield for the South African Financial Reporting Standards...»

«NEFAB Project Feasibility Study Report Initiative 5 Optimisation of Ancillary Services NEFAB FEASIBILITY STUDY Appendix 5 Optimisation of Ancillary Services Page 1 of 24 TABLE OF CONTENTS 1. EXECUTIVE SUMMARY 2. DESCRIPTION OF THE INITIATIVE 2.1 Scope of the Initiative 4 2.2 AIS/AIM 5 2.3 MET 5 2.4 Briefing 6 3. RATIONALE AND PURPOSE OF THE INITIATIVE 4. DESCRIPTION OF CURRENT STATUS OF SERVICES 4.1 Aeronautical Information Service 8 4.2 MET 9 4.3 Briefing 9 5. ONGOING DEVELOPMENT 5.1 Baseline...»

«Freie und Hansestadt Hamburg Behörde für Stadtentwicklung und Umwelt A7, 6-/8-streifige Erweiterung von der AS HH-Othmarschen bis zur Landesgrenze HH/SH Verkehrskonzept für das nachgeordnete Netz während der Bauzeit Bauabschnitte Schnelsen und Stellingen Berlin, 10.08.2010 im Auftrag von: DEGES Deutsche Einheit Fernstraßenplanungsund -bau GmbH Aufgestellt: Beratung Forschung Erhebung Analyse Planung Bauleitung Stadtund Verkehrsplanung A7, 6-/8-streifige Erweiterung von der AS...»

«Scientometrics DOI 10.1007/s11192-011-0369-y Sex differences in research funding, productivity ´ and impact: an analysis of Quebec university professors ` ´ Vincent Lariviere • Etienne Vignola-Gagne • Christian Villeneuve • ´ Pascal Gelinas • Yves Gingras Received: 14 October 2010 Ó Akademiai Kiado, Budapest, Hungary 2011 ´ ´ Abstract Using the entire population of professors at universities in the province of Quebec (Canada), this article analyzes the relationship between sex and...»

«Disclaimer: This is a machine generated PDF of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace original scanned PDF. Neither Cengage Learning nor its licensors make any representations or warranties with respect to the machine generated PDF. The PDF is automatically generated AS IS and AS AVAILABLE and are not retained in our systems. CENGAGE LEARNING AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS...»

«Militarization and Identity on Guahan/Guam: Exploring intersections of indigeneity, gender and security Ronni Alexander Abstract While the Pacific island of Guahan/Guam is best known as a vacation resort, in fact much of the island has been taken to house US military bases and other facilities. Over 400 years of military colonialism has devalorized the local ‘CHamoru’ culture, making it virtually invisible not only for those outside of Guahan/Guam but also for many CHamoru people...»

«Enforcement Instructions and Guidance Chapter 48 – Making flight arrangements 48.1 Introduction 48.2. Collection of tickets 48.3. Procedures for deferral of removal 48.4. Paragraph 10 removals without prejudice 48.5. Carrier's obligations 48.6. Voluntary departures at own expense 48.6.1 Voluntary departures other than at own expense 48.7. Removing Irish Land Border illegal entrants 48.8. Removing BDTC, BOTC, BNO and BOC passport holders 48.9 Illegal entrants required as witnesses 48.10....»

«ZUSAMMENFASSUNG und EVALUATION des WORKSHOPS „Sex und Gender in Neurowissenschaft und Genetik -Grundlagen-„ 26.04. – 27.04.2012 in Essen 0 Inhaltsverzeichnis Vorwort Zusammenfassung des Workshops Ziel Aufteilung der TeilnehmerInnen Ablauf Evaluation des Workshops Organisation Inhalt Fazit Anhang 1: Zusammensetzung des Workshops Anhang 2: Programm Literaturverzeichnis 1 Vorwort Liebe TeilnehmerInnen, unser Workshop „Sex und Gender in Neurowissenschaft und Genetik -Grundlagen-“...»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.