FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:     | 1 |   ...   | 7 | 8 || 10 | 11 |   ...   | 25 |

«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

-- [ Page 9 ] --

This scoring function can be based on standard resource selection approaches [LCC00] from the IR literature (e.g., CORI). However, as it will be shown in Chapter 4, these approaches alone are not sufficient in an IF setting, since they were designed for retrieval scenarios, in contrast to the IF scenario considered here, and are aimed at identifying specialized authorities. Once the peers that will store q have been determined, S uses the IP addresses associated with them to construct a message IndexQ(id(S), ip(S), q).

The message is forwarded to the peer that will store q. When a publisher P receives a message IndexQ containing q, it stores q using a local query indexing mechanism such as [TKD04, YGM99, TKD08]. This procedure is shown in step 2 of Figure 3.3, where S contacts publishers P1, P2 and P3. Filtering and peer selection are dynamic processes, therefore periodic query repositioning, based on user-set preferences, is necessary to adapt to changes in publisher’s behavior. To reposition an already indexed query q, a subscriber would re-execute the subscription protocol, to acquire new peer statistics, compute a new ranking, and appropriately modify the set of peers indexing q.

3.3.3 Publication and Notification Protocol The publication service is employed by users that want to expose their content to the network. A publisher P utilizes the directory to update statistics about the keys contained in the documents it publishes. All queries that match a published document produce appropriate notifications to interested subscribers. According to this, the procedure followed by P at publication time is as follows.

When a document d is published by P, it is matched against P ’s local query database to determine which subscribers should be notified. Then, for each subscriber S, P constructs a notification message Notify(id(P ), ip(P ), d) and sends it to S using the IP address associated with the stored query (shown in Figure 3.4). If S is not online at notification arrival, then P utilizes function send() to send the message through the DHT, by using the id(S) also associated with q. In this way, S will receive the message from its successor upon reconnection. Notice that peers publishing documents relevant to a query q, but not storing it, will produce no notification (peer P4 in Figure 3.4).

–  –  –

3.3.4 One-Time Search Protocol The one-time search protocol is similar to the subscription protocol, and is used by peers to request one-time queries to the network. It is assumed that a querying peer Q requests a multi-key or multi-keyword query q of the form k1 k2... kk with t distinct keys. Q needs to determine which content providers in the network are currently the most promising candidates to satisfy the query. To do resource selection, metadata about data sources is collected from the directory using the directory service and protocol. Thus, a ranking of these information sources is calculated based on resource selection techniques. In this theses, a full discussion of query routing in Minerva is not possible for space reasons (see [BMT+ 05a, MBN+ 06].

To collect statistics about data sources, Q needs to contact all directory peers responsible for the query keys. Similar to the subscription protocol, for each query key ki, Q computes H(ki ), which is the identifier of the peer responsible for storing statistics about other peers that publish documents containing the key ki. The same CollectStats(id(S), ip(S), ki ), is created and the function send() is used to forward the message in O(log n) hops to the peer responsible for identifier H(ki ). A peer D receiving the message searches its local store of Post messages to retrieve the peer list Li of all posts of the key. Subsequently, a message RetStats(Li, ki ) is created by D and sent to Q using its IP found in the CollectStats message. Figure 3.5 illustrates this first step where Q contacts directory peers D1 and D4. Once Q has collected all the peer lists Li for the keys contained in q, it utilizes an appropriate scoring function score(P, q) to compute a peer score with respect to q, for each one of the peers P contained in Li. Based on the score calculated for each peer, a ranking of peers is determined and the top-k ranked peers are candidates for answering the request q. In contrast to the subscription protocol, Q sends the query q to all selected content providers contained in a message RequestQ(id(S), ip(S), q). When a content provider Ci receives a message RequestQ containing q, it locally executes q on its own current document collection and creates a list Ri of result documents matching the query. This result list is returned to the query initiator Q using a RetResults(Ri, q) message. This is illustrated in step 2 of Figure 3.5.

- 46 Chapter 3 System Architecture and Protocols In the third and last step of the protocol (also illustrated in Figure 3.5, the query initiator Q merges all received local result lists Ri (e.g., using appropriate merging algorithms) to provide one combined global result list to the user. Section 6.2.2 gives some more details.

3.4 Comparison to Exact Information Filtering

This Section compares the MAPS architecture for approximate information filtering presented in this chapter with an exact IF approach named DHTrie as introduced in [TIK05b].

This comparison considers different characteristics of both approaches [TZWK07]. Table

3.1 gives an general overview concerning them. The following sections will investigate four major characteristics concerning DHTrie and MAPS in more detail: routing infrastructure, query placement, statistical information, and load balancing. Additional experimental issues (e.g., message traffic costs) are included in Chapter 4.

–  –  –

A crucial design decision in both approaches is the use of the distributed hash table (DHT) as the underlying routing infrastructure. Content-based filtering requires an efficient object location mechanism to support expressive query languages that capture the user’s specific interests. This renders Gnutella-style networks an inefficient solution and simple keyword functionality of standard DHTs an insufficient one.

To overcome this limitation of exact lookup, both approaches extend the DHT functionality to support richer data models and more expressive queries. However, to be able to efficiently support this functionality, changes and extensions in DHT protocols and data structures are necessary. DHTrie uses message grouping and extra routing tables to overcome inefficiencies, whereas MAPS uses batch posting of key summaries, piggy-backing of post messages to directory maintenance messages and decrease in the key space by using correlated multi-key sets to reduce network traffic (see Chapter 5).

3.4.2 Query Placement

Distributed publish/subscribe systems involve some sort of peer selection techniques to decide where to place a user query. This selection is critical, since future publications that may match this query should also be able to reach the peer storing it to trigger a notification in case of a match. Query placement in DHTrie is deterministic and it depends upon the keys contained in the query and the hash function provided by the DHT. To decide where a keyword query should be placed, all query keys are hashed and the query is forwarded to the peers responsible for these identifiers to ensure correctness at publication time. These query placement protocols lead to filtering performance that is exactly the same with that of a centralized system.

On the other hand, in MAPS only the most specialized and promising peers store a user query and are thus monitored. Publications produced by each peer are matched against its local query database only, since for scalability reasons no publication forwarding is used.

In this case recall is lower than that of DHTrie, but document-granularity dissemination to the network is avoided.

3.4.3 Statistical Information

Matching incoming documents with stored subscriptions involves maintenance and estimation of important global statistics such as the document frequency of a certain key, i.e., the number of distinct documents seen in the last interval that contain a specific keyword. In DHTrie this functionality is implicit; every time a new document is published by some peer, it reaches the peers responsible for the distinct keywords contained in the document, since these are the candidate peers that may store continuous queries that will potentially match the document. These peers do all the necessary bookkeeping of the statistical information, and can be queried for this when other peers need to compute a similarity score.

On the other hand, MAPS explicitly addresses this issue with the maintenance of a directory. Notice that in the case of DHTrie the statistics maintenance is a byproduct of the filtering process and it is only necessary for computing similarity scores, whereas in the MAPS case it is the cornerstone of the filtering process, since both peer selection (and thus query routing) and also similarity computation utilize it.

- 48 Chapter 3 System Architecture and Protocols 3.4.4 Load Balancing The effect of load imbalances between peers in the two approaches is different. DHTrie peers are more susceptible to load imbalances due to the nature of the subscription and publication protocol. Through the mapping of the DHT hash function, an overloaded peer or a peer with small processing capabilities may get responsible for a popular key, and thus be forced to store high volumes of user queries and process high numbers of publications.

DHTrie load balancing mechanisms are based on load-shedding and prove efficient even for highly skewed distributions.

On the other hand, MAPS has an intrinsic load balancing mechanism to cope with imbalances. Assuming that peers are willing to offer some of their resources to the community, a peer with resources to share soon becomes a hub peer for some topic and receives more subscriptions, whereas a peer with few resources will be forced to specialize more, and thus reduce the number of users that are interested in its publications. MAPS is more sensitive to filtering load balancing (i.e., load imposed by the filtering requests that need to be processed), since a directory peer responsible for a popular key will get high numbers of statistics retrieval requests. Finally, routing load (i.e., load imposed by the messages a peer has to forward due to the overlay maintenance protocols) is similar in both systems, and it mainly depends on the DHT usage.

3.5 Discussion This chapter presented the architecture with services and protocols of the MAPS approach for approximate publish/subscribe functionality in a structured P2P environment. While most information filtering approaches taken so far have the underlying hypothesis of potentially delivering notifications from every information producer, MAPS relaxes this assumption by monitoring only selected sources that are likely to publish documents relevant to the user interests in the future. A subscriber requesting a continuous query uses a distributed directory to collect metadata concerning publishers. Next chapter will focus on publisher peers selection approximate information filtering where a combination of resource selection and behavior prediction techniques is applied to the collected metadata such that the most promising publishers can be selected. The directory, subscription, publication and notification protocols regulate the interactions of peers in the P2P network.

For completeness, this chapter also provided the retrieval protocols for approximate onetime searching as presented in the Minerva P2P search architecture. The MAPS approach extends the Minerva system such that two functionalities (one-time and continuous searching) are available. In addition, this chapter compared the main characteristics of MAPS with an existing information filtering system called DHTrie realizing exact filtering functionality over P2P networks. This comparison highlighted the differences, advantages, and disadvantages of both architectural approaches. The next chapter will discuss the strategies to select the publisher peers that a subscriber should monitor.

–  –  –

Chapter 4 Publisher Peer Selection This chapter discusses appropriate strategies to select the publisher peers that a subscriber should monitor. Notice that in MAPS peer selection is a critical task since it may lead to poor filtering effectiveness [ZTB+ 08].

In Section 4.1, the peer selection problem in an IF setting is introduced. Then, Section

4.2 presents the main peer selection algorithm that combines resource selection and behavior prediction to improve filtering effectiveness. The proposed prediction function of MAPS uses time series analysis with double exponential smoothing (as described in Section to predict the future behavior of publisher peers. An extensive experimental evaluation in Section 4.3 shows the effectiveness and efficiency of the proposed algorithm. Section 4.4 develops an algorithm called MAPS Selective Method (MSM) to improve the prediction results without additional communication overhead, while Section 4.5 presents the benefits of the prediction improvement method MSM by analyzing several publishing scenarios.

Finally, Section 4.6 discusses and summarizes the results of this chapter.

4.1 Introduction

The previous chapter introduced the architecture for approximate information retrieval and information filtering including the appropriate services and protocols. This doctoral thesis does not focus on one-time search in structured P2P networks since there is a well-known understanding concerning query routing in the Minerva search architecture. More details about Minerva query routing can be found in Chapter 6 where the prototype implementation is explained. Here, approximate publish/subscribe will be presented in detail using the MAPS approach. The most critical task in MAPS is the selection of most promising publisher peers for a requested continuous query.

Pages:     | 1 |   ...   | 7 | 8 || 10 | 11 |   ...   | 25 |

Similar works:

«Geo-touristische Potenziale und Entwicklungsvorschläge in der Geopark-Region Ederbergland Im Steinbruch „Hohenäcker“ bei Frankenberg Geo-touristische Potenziale und Entwicklungsvorschläge in der Geopark-Region „Ederbergland“ Zusammenfassung und abschließende Empfehlungen Die Geopark-Region „Ederbergland“ ist in das Netzwerk des Nationalen Geoparks „GrenzWelten“ mit dem Leitmotto „Zwischen Kelten und früher Montanindustrie“ eingebunden. Zusammen mit anderen Anbietern...»

«Высшая аттестационная комиссия Министерства образования и науки Российской Федерации Перечень российских рецензируемых научных журналов, в которых должны быть опубликованы основные научные результаты диссертаций на соискание ученых степеней доктора и кандидата наук...»

«Civic Education and Charter Schools: Current Knowledge and Future Research Issues Summary of Findings Preparing young people to be good citizens has been a mission of public education since the early days of the nation. In recent years, as schools have shifted more attention to English language arts and mathematics, several groups have made a plea for renewed attention to civic education for all students. One such group is the Spencer Foundation, which promotes research to improve students’...»

«WORKING PAPER 1-12 Federaal Planbureau Economische analyses en vooruitzichten De elasticiteit van de personenbelasting Prospectieve macro-economische benadering van de nationale elasticiteit en van de elasticiteit van een geregionaliseerde personenbelasting Maart 2012 Vincent Frogneux, Michel Saintrain Kunstlaan 47-49 1000 Brussel E-mail: contact@plan.be http://www.plan.be Het Federaal Planbureau Het Federaal Planbureau (FPB) is een instelling van openbaar nut. Het FPB voert beleidsrelevant...»

«Spiritual and Social Trends and Patterns in the Christian Reformed Church in North America A report on the CRCNA 150th Anniversary Survey, 2007-2008 Rodger Rice, Neil Carlson and Christina Vanden Bosch Foreword by Gerard L. Dykstra September 2009 Prepared for the Christian Reformed Church in North America CENTER FOR SOCIAL RESEARCH A C E N T E R O F C A LV I N C O L L E G E This report is based on research conducted by the Calvin College Center for Social Research with funding and direction...»

«The Nanotube (NT) Conference Series Since their conception in 1999, the NT Conferences attempt to provide an informal setting to exchange the most current information in the rapidly evolving nanotube research field. During the initial years, the number of registered participants exceeded the planned capacity by a factor of two. This was manageable at NT'99 in East Lansing, with 120 instead of the planned 60 participants. The following conference, NT'01, was planned to accommodate 140...»

«Bulletin of the International Society of Soil Science de l'Association Internationale de la Science du Sol der Internationalen Bodenkundlichen Gesellschaft •fr de la Sociedad Internacional de la Ciencia del Suelo No. 79 1991/1 ISSN: 0374-0447. Edited and published by/rédigé el publié par/redigiert und publiziert von: INTERNATIONAL SOCIETY OF SOIL SCIENCE ASSOCIATION INTERNATIONALE DE LA SCIENCE DU SOL INTERNATIONALE BODENKUNDLICHE GESELLSCHAFT Founded/Fondée/Gegründet: 19-05-1924....»

«Alexander Steiger Elektor-Verlag GmbH Internationaler R8C-Design-Wettbewerb steigeralex@gmx.de Elektronik-Projekt Alexander Steiger MusicTree August 2006 Zusammenfassung Elektronik muss nicht immer so kompliziert sein. Besonders für Anfänger ist der Einstieg in die Mikrocontroller-Welt sehr schwer. Ohne Spass an der Sache wird sich der Einstieg in diese verborgene Welt sehr zeitraubend und sogar deprimierend gestalten. Das R8C-Board, das im Heft 12/2005 von Elektor zum ersten Mal vorgestellt...»

«22 Rahmenprogramm zur Förderung der empirischen Bildungsforschung Framework Programme for the Promotion of Empirical Educational Research Bildungsforschung Band 22 Rahmenprogramm zur Förderung der empirischen Bildungsforschung Framework Programme for the Promotion of Educational Research Impressum Herausgeber Bundesministerium für Bildung und Forschung (BMBF) Referat Bildungsforschung 11055 Berlin Bestellungen Schriftlich an den Herausgeber Postfach 30 02 35 53182 Bonn oder per Tel.:...»

«JUAN AGUERO : WASSER-FUEL MOTOR Patentanmeldung EP0405919 1. Februar 1991 Erfinder: Juan C. Aguero WASSER ANGETRIEBENE VERBRENNUNGSMOTOR-SYSTEM Bitte beachten Sie, dass dies ein Wieder formuliert Auszug aus dieser Patentanmeldung. Es wird ein Verfahren beschrieben, das behauptet wird, kann zur Steuerung einer Brennkraftmaschine aus einem Gemisch aus Wasserdampf und Wasserstoffgas. ZUSAMMENFASSUNG Dies ist ein Energietransformationssystems für Fahren, zum Beispiel ein Verbrennungsmotor, der...»

«1 Bulletin No. 119 September 2013 The first train from Salisbury to Umtali, photographed at “BBY” Station on 22 May 1899. Does anyone know where “BBY” Station was? The locomotive is Mashonaland Railways No. 1, named “Cecil J. Rhodes”. It is an ex Cape Government Railways 4th class converted Joys. Although not clear, in the photo, the tender was filled with wood. The three coaches are a mystery. They have an unusual roofline and are unlike any other coaches in Southern Africa. Can...»

«Responding to high risk cases of family and domestic violence: Guidelines for multi-agency case management SAFETY 1 in 4 ch en are wom ildr l na & en igi 1 in 2 A or bo s 35t Ab ime chil rigin dre a nl Ch ar e to ly to b ild e FDV expo like alised founr Chil referrals % ren du se 7 o incddto by po FDttend e or spit FDV childho ring d a Ve m o to od we he ut ioences lice experoenen –  –  – Government of Western Australia Department for Child Protection and Family Support Published...»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.