FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:     | 1 |   ...   | 20 | 21 || 23 | 24 |   ...   | 25 |

«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

-- [ Page 22 ] --

The main challenge addressed in this thesis was to exploit P2P technology for efficient and approximate information filtering. While there exist several approaches to perform exact information filtering in P2P environments, the work in this thesis emphasized a novel architecture to support content-based approximate information filtering. Most information filtering approaches taken so far have the underlying hypothesis of potentially delivering notifications from all information producers. This exact pub/sub functionality has proven expensive for such distributed environments. The approximate approach in this thesis relaxed this assumption by monitoring only selected sources that are likely to publish documents relevant to the user interests in the future. Since in an IF scenario the data is originally highly distributed residing on millions of sites (e.g., with people contributing to blogs), a P2P approach seems an ideal candidate for such a setting.

1 This chapter focuses on the most important open research questions.

- 137 Chapter 8 Conclusion and Open Questions

The system presented here offers the ability to identify the top-k most appropriate publisher peers for a given continuous query, i.e., those publishers that are expected to provide the best documents matching it in the future. In this thesis, this task was referred to as publisher peer selection. To select this set of peers, this thesis introduced new strategies based on already known resource selection techniques in combination with new behavior prediction techniques. These new techniques utilize a distributed directory to predict future publishing behavior of peers by applying prediction techniques on time series of IR statistics.

8.2 Contributions The present thesis gives the following four main contributions that have been discussed in

detail in the previous chapters:

8.2.1 Approximate Information Filtering One of the main contributions of this thesis was to introduce the novel MAPS architecture to support content-based approximate information filtering in P2P environments. Most IF approaches so far have the underlying hypothesis of potentially delivering notifications from every information producer. Contrary, MAPS relaxes this assumption and monitors only selected sources that are likely to publish documents relevant to the user interests in the future. In MAPS, a user subscribes with a continuous query and only published documents from these monitored sources are forwarded to him. The system provides a network-agnostic P2P architecture with different services and its related protocols (directory, subscription, publication, and notification protocol) for supporting approximate IF functionality in a distributed P2P environment. It is the first approach that looks into the problem of approximate IF in such a setting by exploiting metadata stored in a conceptually global, but physically distributed directory. The most critical task in approximate IF is the selection of appropriate publisher peers to meet the information demand in the future. Therefore, MAPS combines existing resource selection techniques with new behavior prediction strategies. The thesis showed that existing resource selection approaches (also referred to as collection or database selection) are not sufficient in a dynamic filtering setting since resource selection can only determine appropriate authorities that have already published matching documents in the past.

A novel method aiming at behavior prediction of publisher peers complements the peer selection strategy by applying prediction techniques to time series of IR statistics. So, MAPS introduces research of time series analysis to P2P information filtering environments. The experimental evaluation of MAPS approved the effectiveness and efficiency in several settings using real Web data. Various publishing behaviors have been investigated to conclude that only the combination resource selection and behavior prediction allows to improve recall, while monitoring only a small number of publishers.

Finally, the aforementioned prediction method was further enhanced by an automatic parameter tuning component. The MAPS Selective Method introduces an automatic method to tune prediction parameters by utilizing known values to adjust the parameters used in the time-series analysis. This is achieved at no additional communication cost, and is a local task performed at each peer individually. To demonstrate the potential gaining from the usage of MAPS, the thesis compared it against an existing exact information filtering approach in the P2P setting and studied several major characteristics (e.g., load balancing issues, query placement, or routing infrastructure).

- 138 Chapter 8 Conclusion and Open Questions 8.2.2 Correlation Awareness The second contribution of this thesis covers publisher peer selection for a given continuous query with multiple keywords. For scalability reasons, MAPS’s summaries are stored in the distributed directory and have publisher (rather than document) granularity, thus capturing the best publisher for certain keys. This, together with per-key organization of the directory that disregards keyword correlations (also referred to as correlated key sets) are two of the basic reasons that may lead to low recall. However, considering statistics for all possible key sets is clearly not possible due to the explosion in the feature space. In the baseline approach, a continuous query is decomposed into individual keys, and the statistics from the directory are used to compute a combined score for each publisher. This score would represent the probability of each source to publish documents matching the information demand in the near future. This approach may lead to poor filtering quality as the topranked publishers for the complete query may not be among the top selected publishers. In the worst case, a selected publisher may deliver many documents for each single keyword, but no single document matching all keywords, since this information is not present in the directory.

To overcome his problem, this thesis introduced two approaches that use correlations among keys to improve filtering quality: (i) the USS (Unique Synopses Storage) algorithm that uses existing single-key synopses stored in the directory to estimate the publishing behavior of information sources for key sets, and (ii) the CSS (Combined Synopses Storage) that enhances the directory to explicitly maintain statistical metadata about selectively chosen key sets. Contrary to distributed IR settings for one-time searching where sources are ranked according to their document collections (i.e., using resource selection strategies), in approximate IF the publishers are ranked according to their probability to publish relevant documents in the near future, which poses different requirements for maintaining metadata. This thesis presented the first work to develop algorithms for exploiting keyword correlations in such a dynamic IF setting. Existing and self-limited approaches for two-key queries have been extended to the case of multi-key continuous queries for an arbitrary number of keys. Beyond that, new algorithms to approximate multi-key statistics by combining the statistics of arbitrary subsets have been provided. Hash sketches have been used in similar algorithms with an IR emphasis to compactly represent the documents. This choice has however yielded inaccurate results when considering continuous queries with more than two keys. For this reason the usage of very recent state-of-the-art techniques (KMV synopses) for compact representation of multisets is proposed and applied to an IF setting. These new structures allow the system to compute accurate synopses for multikey queries, and improve the filtering effectiveness. The experimental evaluation of both algorithms illustrated the filtering performance improvements in comparison to the basic MAPS approach. All experimental series used two different real-world collections for Web and blog data, and applied real-world queries from Google Zeitgeist. The evaluation also investigated filtering performance gains depending on the introduced correlation measure (conditional probability) representing a way to compute the relatedness among keys.

8.2.3 Prototype System In addition to the architecture and algorithms for approximate information filtering presented above, the thesis has also developed a prototype implementation of the approximate information filtering approach of MAPS. This implementation was meant to serve as a testing environment for conducted experiments, but also serve as a prototype that is able to demonstrate the applicability of the proposed techniques.

- 139 Chapter 8 Conclusion and Open Questions

The approximate information filtering approach introduced in this thesis has been integrated into the Minerva search prototype [BMT+ 05b] such that Minerva provides an approximate publish/subscribe functionality in addition. In this regard, the implementation aspects concerning the extension of Minerva have been investigated and new components to support IF in addition to IR have been implemented and integrated into the existing prototype. An extensive showcase has been used to describe the application of the extended Minerva prototype by executing sample one-time and continuous queries in detail.

8.2.4 Digital Library Use Case Finally, the thesis presented a digital library use case using the MinervaDL architecture as an application scenario to support approximate information retrieval and filtering functionality in a single unifying framework. MinervaDL is hierarchical and utilizes a distributed hash table to achieve scalability, fault-tolerance, and robustness in its routing layer.

The MinervaDL architecture allows handling huge amounts of data provided by DLs in a distributed and self-organizing way, and provides an infrastructure for creating large networks of digital libraries with minimal administration costs. There are two kinds of basic functionality that are offered in the DL architecture of MinervaDL: (i) an information retrieval scenario (or one-time querying, where a user poses an one-time query and the system returns all resources matching the query (e.g., all currently available documents relevant to the requested query), and (ii) an information filtering scenario (or publish/subscribe or continuous querying), where a user submits a continuous query (or subscription) and waits to be notified from the system about certain events of interest that take place (i.e., about newly published documents relevant to the continuous query).

The proposed DL architecture is built upon a distributed directory storing metadata.

The thesis presents routing protocols for information filtering and retrieval that use this directory information to perform the two functionalities. MinervaDL introduces three main components. Super-peers run the DHT protocol and form a distributed directory maintaining metadata about providers’ local knowledge in compact form. Consumers are utilized by users to connect to the MinervaDL network, using a single super-peer as their access point. Providers are implemented by information sources that want to expose their content to the MinervaDL network. Typical examples of provider peers are digital libraries of large institutions, like research centers or content providers.

Finally, this thesis presented different scoring functions to select appropriate provider peers answering a one-time query or storing a continuous query for future matching publications. The experimental evaluation investigated the influence of the individual scoring functions used to perform the two functionalities of MinervaDL. Furthermore, MinervaDL was compared to another DL architecture for P2P networks (LibraRing) providing retrieval and filtering at the same time. Unlike MinervaDL, this architecture provides exact retrieval and filtering functionality.

8.3 Open Questions In this section, a list of interesting open problems for approximate information filtering is presented. The topics discussed are relevant not only within the scope of this thesis, but are also interesting in the context of P2P data management in general. The discussion will focus on topics relevant to the efficiency and effectiveness of data management, and will not go into security or privacy issues that, although of high importance, are independent of the proposed solutions.

–  –  –

Thus, the remainder of this section will discuss open questions in the fields of load balancing, dynamics, economics, and replication & caching.

8.3.1 Load Balancing The first open problem related to the thesis is load balancing in a distributed P2P setting.

Load balancing can be defined as distributing processing and communication activity evenly across a computer network so that no single device is overwhelmed. In a P2P system, the data load of a peer is usually determined by the amount of stored data items per peer. The total data load of the P2P system is defined as the sum of the loads of all peers participating in the system. To talk about a load balanced network, the load of a peer should be around 1/n of the total load when considering a network of n peers. A peer is referred to as overloaded or heavy if it has a significantly higher load compared to the load it would have in a load balanced case. Similarly, a peer is light if it has significantly less load than the load in the load balanced case.

As presented in the previous chapters of this thesis, the MAPS approach utilizes distributed hash tables (e.g., Chord [SMK+ 01] or Pastry [RD01a]) as underlying infrastructure thus exploiting existing work on load balancing techniques for this kind of overlay network. The literature distinguishes two particular problems related to load balancing in

distributed hash tables:

Pages:     | 1 |   ...   | 20 | 21 || 23 | 24 |   ...   | 25 |

Similar works:

«Discovering an Integral Civic Consciousness in a Global Age Global Problems, Global Governance, and Denial John M. Bunzl ABSTRACT This article asks why, in an age of global crisis, global governance still remains a low priority for the integral community. It posits a civic line of development, suggesting only those possessing a world-centric level of civic awareness can fully comprehend global problems and the need for binding global governance. I argue that modern (orange altitude), postmodern...»

«Ellis Island Tie In Edition It products examining to read this live course with notification Lupin foreclosure basis, control people, person neighborhood, and harmful you need Ellis Island: Tie-in Edition used Ellis Island: Tie-in Edition that your information and in drink. At you are two otherwise he clearly do some many life of the free amount credit. In a visit is, all the inquiry would do to be many members. Although Service tracks a visit so Convenience, and is, them could be categorized...»

«Beitr. naturk. Forsch. SüdwDtl. Band 36 S.18Si26 Karlsruhe, 1. 12. 1977 Die palaearktischen Agdistis -Arten (Lepidoptera, Pterophoridae) von ERNST ARENBERGER, Wien Mit 1 Abb. u. Tafeln 1-XIV Nachdem die Literatur über die Familie Pterophoridae durch kurze Einzelpublikationen in verschiedenen Zeitschriften sehr verstreut und schwer zu überblicken ist, erscheint es notwendig, eine Zusammenfassung und taxonomische Darstellung der einzelnen Gattungen zu geben. Besonders wichtig ist eine Revision...»

«PAUL SCOTT EDUCATION AND QUALIFICATIONS Ph.D., Department of French, Durham University, UK (dissertation title: “The Martyr-Figure in French Theatre, 1596–1675”). Examiners: Ann Moss (Durham) and Henry Phillips (Manchester) M.A. (with Distinction), Centre for Seventeenth-Century Studies, Durham University B.A. (with Honors) Modern European Languages, Durham University Diploma in French, Institute of Linguists ACADEMIC POSITIONS Associate Professor with Tenure, Department of French &...»

«EURYD ICE Education, Audiovisual & Culture Executive Agency Integrating Immigrant Children into Schools in Europe Measures to foster: – Communication with immigrant families – Heritage language teaching for immigrant children April 2009 European Commission Integrating Immigrant Children into Schools in Europe Measures to foster: – Communication with immigrant families – Heritage language teaching for immigrant children April 2009 Eurydice network This document is published by the...»

«General Terms and Conditions The below terms and conditions apply to all Air India flights General Information While compiling this information, Air India has endeavored to ensure that all information is correct. However, no guarantee or representation is made to the accuracy or completeness of the information contained here. This information is subject to changes by Air India without notice. Class of Service Air India offers three classes of service: First Class Executive Class Economy First...»

«VERNEHMLASSUNGSBERICHT DER REGIERUNG ÜBER DIE NEUFASSUNG DES BAUGESETZES Ressort Bauwesen Vernehmlassungsfrist: 31. März 2004 INHALTSVERZEICHNIS Seite Zusammenfassung 3 Betroffene Amtsstellen und Institutionen 4 1. Ausgangslage 5 1.1 Anlass 5 1.2 Zielsetzung des Baugesetzes 8 1.3 Vorgehen 11 2. Schwerpunkte der Gesetzesvorlage 14 2.1 Planungsrechtliche Bestimmungen 14 2.2 Baurechtliche und bautechnische Bestimmungen 15 2.3 Baurechtliches Verfahren 18 2.4 Vollzugsbestimmungen 22 3....»

«Das Wichtigste in Kürze Liebe Leserin, lieber Leser, zunächst verweisen wir auf die Beilage zum Mehrwertsteuer-Paket, die wir als Anlage zu diesem USt-Infoletter versenden. Darin sind die Gesetzesänderungen zum 01.01.2010 noch einmal kurz und prägnant zusammengefasst. Auch berücksichtigt wurde der Entwurf eines „Gesetzes zur Umsetzung steuerrechtlicher EU-Vorgaben sowie weiterer steuerrechtlicher Regelungen“, das zum 01.07.2010 in Kraft treten soll und einige handwerkliche...»

«Article Yeah, that’s it!: Verbal Reference to Visual Information in Film Texts and Film Translations Nicole Baumgarten Meta : journal des traducteurs / Meta: Translators' Journal, vol. 53, n° 1, 2008, p. 6-25.Pour citer cet article, utiliser l'information suivante : URI: http://id.erudit.org/iderudit/017971ar DOI: 10.7202/017971ar Note : les règles d'écriture des références bibliographiques peuvent varier selon les différents domaines du savoir. Ce document est protégé par la loi...»

«IMPETUS Westafrika Integratives Management-Projekt für einen Effizienten und Tragfähigen Umgang mit Süßwasser in Westafrika: Fallstudien für ausgewählte Flusseinzugsgebiete in unterschiedlichen Klimazonen Siebter Zwischenbericht Zeitraum: 1.1.2006 31.12.2006 Ein interdisziplinäres Projekt der Universität zu Köln und der Universität Bonn 01. April 2007 IMPETUS Koordinierende Institutionen Universität zu Köln Universität Bonn Institut für Geophysik und Meteorologie Geologisches...»

«Somalia Humanitarian Situation Report @UNICEFSomalia SitRep 9. Reporting Period September 2014 SITUATION IN NUMBERS Highlights September 2014  More than 7,751 suspected measles cases have been reported in 2014 with 218,000 thousands of unvaccinated Somali children at risk of disability or death. UNICEF will vaccinate 520,000 children under-5 in the outbreak areas of Lower Juba, # acutely malnourished children under-5 Banadir and Puntland in October. UNICEF seeks funding for a national...»

«Feature Article One Recovering The Third Mark Of The Church Arturo G. Azurdia III It is safe to assume that the majority of people who read this journal are in sympathy with its aspirations: to promote the work of reformation and revival in the church of Jesus Christ. But how will this work be accomplished? Many of these readers would readily affirm that a return to the faithful exposition of the Scriptures is essential to the accomplishment of this task. Repentance and prayer, both on an...»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.