WWW.ABSTRACT.XLIBX.INFO
FREE ELECTRONIC LIBRARY - Abstract, dissertation, book
 
<< HOME
CONTACTS



Pages:     | 1 |   ...   | 6 | 7 || 9 | 10 |   ...   | 25 |

«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

-- [ Page 8 ] --

All the distributed pub/sub systems described above involve some sort of resource selection technique to decide where to index a user query. This selection is critical, since future publications that may match this query should also be able to reach the peer storing it to trigger a notification in case of a match. Query placement, as implemented in exact information filtering approaches such as [TX03, TIK05b], is deterministic, and depends upon the keys contained in the query and the hash function provided by the DHT. These query placement protocols lead to filtering effectiveness that is exactly the same as that of a centralized system. Compared to a centralized approach, [TX03, TIK05b] exhibit scalability, fault-tolerance, and load balancing at the expense of high message traffic at publication time.

In MAPS, only the most promising peer store a user continuous query and are thus monitored by the requestor. Publications produced by each peer are matched against its local query database only since, for scalability reasons, no publication forwarding is used.

Thus, in the case of approximate filtering, the recall achieved is lower than that of exact filtering, but document-granularity dissemination to the network is avoided. This improves scalability and system efficiency since, in a typical publish/subscribe setting, the rate of publications is expected to be high. In the case of exact matching, the network cost (and thus system performance) is directly dependent on this rate, whereas in MAPS approach it only triggers more local peer computations.

3.2 Services The architecture of MAPS distinguishes the following three types of services: directory service, publication service, and subscription service. Peers in the MAPS network can implement any (or all) types of these services. Figure 3.1 shows an high-level overview of the service architecture including information producers, information consumers, and the P2P network.

- 40 Chapter 3 System Architecture and Protocols Peers implementing the publication service are utilized by users to publish content to the network. This content may be locally created, e.g., from a Web server or an underlying content management system (CMS), or it may be gathered from the Web with a (possibly thematically focused) crawler, e.g., the BINGO! system [STSW02]. Additionally, other content providers such as digital libraries (DLs) or publishing houses (e.g., ACM, Elsevier, etc.) may also utilize publisher peers to make metadata about their resources available to the system. This setting also expects to see observer modules (as in [FFS+ 01]) for information sources that do not provide their own alerting service. These modules will query the sources for new material in a scheduled manner and inform subscribers accordingly. All this information flow will be filtered and redirected to users according to their submitted queries, by making use of different types of network peers. In this setting, information production, consumption, and flow are highly decentralized and asynchronous, making a P2P architecture a natural approach. In the following, the three service types are presented. The corresponding protocols implementing the services are shown in Section 3.3.

In addition, the one-time search service not included in Figure 3.1 completes the whole set of services. This service is implemented by all peers in the network to perform one-time searching as in the Minerva system.

3.2.1 Directory Service All peers participating in the MAPS network implement the directory service. This service provides the DHT-based routing infrastructure and is responsible for the maintenance of a distributed index storing statistics for both document and query keys. This index forms a conceptually global, but physically distributed directory, which is layered on top of a Chord-style DHT [RFH+ 01], and manages aggregated information about each peer’s local knowledge in compact form, similarly to the directory utilized in [BMT+ 05a].

The DHT partitions the key space, such that every peer is responsible for the statistics of a randomized subset of keys within the global directory. To keep IR statistics up-to-date, each peer distributes per-key summaries of its local index along with contact information to the global directory. For efficiency reasons, these messages are piggy-backed to DHT maintenance messages and batching strategies are used. A balanced key distribution among the peers forming the directory is ensured by the DHT hash function. Additionally, to reduce the key space and also improve recall, MAPS exploits correlations among query keys and considers multi-key statistics. Chapter 5 presents two algorithms that cope with correlation awareness in MAPS, and are suited for approximate IF environments. The DHT determines which peer is responsible for collecting statistics for a specific key set. This peer maintains statistics and pointers to other peers that publish documents containing this key.

The appropriate protocols are described in Section 3.3.1.

3.2.2 Subscription Service The subscription service is implemented by peers that want to monitor specific information producers to get notifications for new published documents. The subscription service is critical to the recall that will be achieved at filtering time, since it is responsible for selecting the appropriate peers that will index the query. This peer selection procedure utilizes the directory service to discover and retrieve peer statistics that will guide query indexing.





Once these statistics are retrieved, a ranking of the potential sources is performed and the user query is sent to the top-k ranked publishers. Only these publishers will be monitored for new publications, and since their publication behavior may not be consistent over time, query repositioning is necessary to achieve higher recall.

- 41 Chapter 3 System Architecture and Protocols

In the MAPS architecture, publications and subscriptions could be expressed using any appropriate IR model (e.g., Boolean, VSM, or LSI ). Because, the focus is not on the data model itself, it is assumed, for simplicity, that published documents and subscriptions are sets of words, and the Boolean model is used to decide when a document matches an active subscription. The protocols that define the behavior of subscriber peers are explained in detail in Section 3.3.2.

3.2.3 Publication Service The publication service can be used by users that want to expose their content to the MAPS network (e.g., by using a thematically focused Web crawler that locates and publishes documents on their behalf). A publisher peer utilizes the directory service to update statistics about the keys contained in the documents it publishes. Publisher peers are also responsible for storing continuous queries submitted by the users and matching them against their own publications. All queries that match a publication produce appropriate notifications to interested subscribers. Details of the publication and notification protocols are presented in Section 3.3.3.

3.2.4 One-Time Search Service Finally, the one-time search service can be used by users that want to start a one-time query to receive the best matching results currently available in the P2P system. As specified in Minerva, this service utilizes the directory service to select the most promising information providers for the requested one-time query. Having selected the top-k content providers, the multi-key query is send to these peers that locally compute the matching query results.

So, this service also covers the local search engine functionality of peers that return the best query results to the requestor. The last step merges all retrieved query results to one combined global result presented to the user. Details of the one-time search protocol are explained in Section 3.3.4.

3.3 Protocols Having introduced the service architecture of MAPS, next, the main protocols – including the directory protocol (Section 3.3.1), the subscription protocol (Section 3.3.2), and the publication and notification protocol (Section 3.3.3) – are presented. Here, the thesis also presents the IR relevant one-time search protocol (Section 3.3.4).

3.3.1 Directory Protocol The directory service provides the DHT-based routing infrastructure and is responsible for the maintenance of a distributed index storing statistics about document keys. This index forms a conceptually global, but physically distributed directory, which is layered on top of a Chord-style DHT [SMK+ 01], and manages aggregated information about each peer’s local knowledge in compact form, similarly to [BMT+ 05a]. The DHT partitions the key space, such that every peer is responsible for the statistics of a randomized subset of keys within the directory. To keep IR statistics up-to-date, each peer distributes per-key summaries of its local index along with contact information to the directory. For efficiency reasons, these messages are piggy-backed to DHT maintenance messages and batching strategies are applied.

–  –  –

Figure 3.2: The MAPS Directory Protocol.

To facilitate message sending between peers, MAPS will use the function send (msg, I) to send the message msg to the peer responsible for identifier I. Function send() is similar to the Chord function lookup(I) [SMK+ 01], and costs O(log n) overlay hops for a network of n peers (see lookup problem in Section 2.1.1). In MAPS, every publisher peer uses Post messages to distribute per-key statistics. This information is periodically updated (e.g., every k time units or every k publications) by the peer, in order to keep the directory information as up-to-date as possible.

Publisher peer P updates the global directory as follows: Let K = {k1, k2,..., kl } denote the set of all keys contained in all document publications of P occurring after the last directory update. For each key ki, where 1 ≤ i ≤ l, P computes the maximum frequency of max occurrence of key ti within the documents contained in P ’s collection (tfki ), the number of documents in the document collection of P that ki is contained in (dfki ), and the size of the document collection cs. Having collected the statistics for key ti, P creates message max Post(id(P ), ip(P ), tfki, dfki, cs, ki ), where id(P ) is the identifier of peer P and ip(P ) is the IP address of P. P then uses function send() to forward the message to the directory peer responsible for identifier H(ki ) (i.e., the peer responsible for maintaining statistics for key ki ). Once a peer D receives a Post message, it stores the statistics for P in its local statistics database to keep them available on request for any peer.

Figure 3.2 illustrates the directory protocol where a peer distributes statistics (Post messages) to directory peers.

Notice that the directory service does not have to use Chord or any other DHT; the MAPS architecture allows for the usage of any type of P2P network (structured or unstructured), given that the necessary information (i.e., the per-peer IR statistics) is made available through appropriate protocols to the rest of the services.

3.3.2 Subscription Protocol The subscription protocol of MAPS is used by subscriber peers to monitor specific information producers (publisher peers) for future matching publications. It is assumed that a subscriber peer S wants to subscribe with a multi-key or multi-keyword query q of the form k1 k2... kk with t distinct keys.

–  –  –

To do so, S needs to determine which publisher peers in the network are promising candidates to satisfy the continuous query with appropriate documents published in the future. This source ranking can be decided once appropriate statistics about data sources are collected from the directory, and a ranking of these sources is calculated based on the peer selection strategy described in the following Chapter 4 based on resource selection and behavior prediction.

To collect statistics about data sources, S needs to contact all directory peers responsible for the query keys. Thus, for each query key ki, S computes H(ki ), which is the identifier of the peer responsible for storing statistics about other peers that publish documents containing the key ki. Similar to the directory protocol, function H() denotes the Chord hash function assigning keys to peer identifiers. Subsequently, S creates message CollectStats(id(S), ip(S), ki ), and uses the function send() to forward the message in O(log n) hops to the peer responsible for identifier H(ki ). Notice that the message contains ip(S), so its recipient can directly contact S without using the DHT routing facility.

When a peer D receives a CollectStats message asking for the statistics of key ki, it searches its local store of Post messages to retrieve the peer list Li of all posts of the key.

Subsequently, a message RetStats(Li, ki ) is created by D and sent to S using its IP found in the CollectStats message. This collection of statistics is shown in step 1 of Figure 3.3, where S contacts directory peers D1 and D4. Once S has collected all the peer lists Li for the keys contained in q, it utilizes an appropriate scoring function score(P, q) to compute a peer score with respect to q, for each one of the peers P contained in Li. Based on the score calculated for each peer, a ranking of peers is determined and the highest ranked peers are candidates for storing q.

Once the peers have been ranked, S selects the highest ranked peers that will index q. Thus, only publications occurring in those peers will be matched against q and create appropriate notifications. Peers publishing documents relevant to q, but not indexing q, will not produce any notification, simply because they are not aware of q. Since only selected peers are monitored for publications, the peer ranking function becomes a critical component, which will determine the final recall achieved.

–  –  –

Figure 3.4: The MAPS Publication & Notification Protocol.



Pages:     | 1 |   ...   | 6 | 7 || 9 | 10 |   ...   | 25 |


Similar works:

«S. 331-353 Wien, Dezember 1984 Mitt, österr. geol. Ges. 3 Abb., 5 Taf. Palynostratigraphische Untersuchung eines Rhät/Lias-Profils am Fonsjoch, Achensee (Nördliche Kalkalpen, Österreich) Von Uta KARLE ::) Mit 3 Abbildungen und 5 Tafeln Kurzfassung Bei der palynologischen Bearbeitung des Rhät/Lias-Profils am Fonsjoch (Achensee) zeigte sich, daß die Basis des Jura auch hier schon unterhalb des ersten Auftretens von Psiloceras planorbis anzunehmen ist. Die oberste Mergelbank der Kössener...»

«Гинекология Handbook of Gynaecology Management SYLVIA K. ROSEVEAR MD, FRCOG Consultant Obstetrician and Gynaecologist b Blackwell Science СИЛЬВИЯ К. РОУЗВИА Гинекология Справочник практического врача Перевод с английского Под общей редакцией акад. РАМН Э.К.Айламазяна 2 е издание Москва «МЕДпресс информ» УДК 618.1 ББК 57.1 Р79 Все...»

«Führen im winterlichen Hochgebirge Zusammenfassung der Gesprächsergebnisse der Expertenrunde auf der Jamtalhütte vom 22.09.-24.09.2000 Endgültige Fassung 11.12.2000 Teilnehmer der Expertenrunde: ♦ DAV Summit Club: Schweiz: Karl Schrag ♦ Günter Sturm ♦ Werner Munter ♦ Günther Härter ♦ Emanuel Wassermann Österreich: ♦ Dr. Karl Gabl ♦ Manfred Lorenz Deutschland: Südtirol: ♦ Dr. Stefan Beulke ♦ Othmar Zingerle ♦ Franz Kröll ♦ Martin Engler ♦ Peter Geyer 1....»

«UNIVERSIDADE FEDERAL DE SANTA CATARINA DEPARTAMENTO DE LÍNGUA E LITERATURA ESTRANGEIRAS SKEPTICISM AND HUMANISM IN FORSTER'S TREATMENT OF PERSONAL RELATIONS Tese submetida à Universidade Federal de Santa Catarina para obtenção do grau de MESTRE EM LETRAS Sandia Sirangelo Maggio Florianópolis, S.C. Dezembro de 1981 Esta tese foi julgada adequada para a obtenção do grau de — MESTRE EM LETRAS — opção Inglês e Literatura Correspondente e aprovada em sua forma final pelo Programa de...»

«Bericht über das 1. Halbjahr 2015 Jänner bis Juni Halbjahresfinanzbericht zum 30. Juni 2015 Die Baumaßnahmen am Standort Wien erreichen finale Phase Leichter Umsatzzuwachs im Vergleich zum Vorjahr Wesentliche Entwicklungen • Umsatzzahlen im ersten Halbjahr 1,8% über den • Außerordentliche Effekte bestimmen das Vorjahreswerten. Halbjahresergebnis.• Das Standortprojekt, sowie die Maßnahmen in • Rohstoffkostenniveau wird auch für die Folge des Gebäudeeinsturzes schreiten nach Plan...»

«RHINO RESOURCE CENTER www.rhinoresourcecenter.com NEWSLETTER #41 DECEMBER 2015 Dear colleagues and friends, This is the 41st issue of the quarterly e-newsletter of the Rhino Resource Center. Edited by Dr Kees Rookmaaker. The total number of references in the collection of the RRC now stands at 19,850. This is an increase of 214 items in the last quarter. SUPPORT the RRC Over 19,540 references are available as PDF on RRC website CLICK ON RHINO IN THIS ISSUE: The RRC has moved p.2 Newsletter...»

«Aurel Croissant, Teresa Schächter Demokratiestrukturen in Asien Der Beitrag untersucht die Demokratiestrukturen in acht asiatischen Transformationsstaaten. Ausgangspunkt ist Lijpharts Unterscheidung von Mehrheitsund Konsensdemokratie. Die Studie zeigt, dass weder Lijpharts zweidimensionales Demokratiemuster, noch ein alternatives Muster in Asien existiert. Die Untersuchung möglicher Erklärungen für diesen Befund stützt generelle Zweifel an der Übertragbarkeit der Lijphart’schen Annahmen...»

«TRACEABILITY OF REQUIREMENTS AND SOFTWARE ARCHITECTURE FOR CHANGE MANAGEMENT ARDA GÖKNIL Ph.D. dissertation committee: Chairman and secretary: Prof. Dr. Ir. Anton J. Mouthaan, University of Twente, The Netherlands Promoter: Prof. Dr. Ir. Mehmet Akşit, University of Twente, The Netherlands Assistant promoter: Dr. Ivan Kurtev, University of Twente, The Netherlands Members: Prof. Dr. Roel Wieringa, University of Twente, The Netherlands Prof. Dr. Ir. Arend Rensink, University of Twente, The...»

«AJAX and PHP Building Responsive Web Applications Cristian Darie, Bogdan Brinzarea, Filip Chereches Tosa, Mihai Bucica AJAX Ë PHP Разработка динамических веб приложений Кристиан Дари, Богдан Бринзаре, Филип Черчез Тоза, Михай Бусика Санкт Петербург–Москва Серия «High tech» Кристиан Дари, Богдан Бринзаре, Филип Черчез Тоза, Михай...»

«“An O without a figure”: King Lear and the Mask of the Fool A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the Degree of Master of Arts in the Department of English University of Saskatchewan Saskatoon By Miranda Jane Traub © Copyright Miranda Jane Traub, September 2004. All rights reserved. PERMISSION TO USE In presenting this thesis in partial fulfilment of the requirements for a Postgraduate degree from the University of...»

«Azerbaijan Annual Report 2014 MAAAZ002 30 April 2015 This report covers the period from 1 January 2014 to 31 December 2014. Volunteers of the Azerbaijan Red Crescent participated at the 15th World Youth Forum organized in Casablanca city of the Moroccan Kingdom Photo: AzRC Overview All programmes and activities of the Azerbaijan Red Crescent Society (AzRC) are aligned with and contribute to the IFRC`s Strategy 2020 aiming to improve the lives and to alleviate the sufferings of the most...»

«ÇUKUROVA UNIVERSITY INSTITUTE OF NATURAL AND APPLIED SCIENCES MSc THESIS Esra MAHSERECİ KARABULUT A RESEARCH ON PERFORMANCE OF DECISION SUPPORT SYSTEMS IN DIAGNOSIS OF CORONARY ARTERY DISEASE DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING ADANA, 2012 ÇUKUROVA UNIVERSITY INSTITUTE OF NATURAL AND APPLIED SCIENCES A RESEARCH ON PERFORMANCE OF DECISION SUPPORT SYSTEMS IN DIAGNOSIS OF CORONARY ARTERY DISEASE Esra MAHSERECİ KARABULUT MSc THESIS DEPARTMENT OF ELECTRICAL AND ELECTRONICS...»





 
<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.