WWW.ABSTRACT.XLIBX.INFO
FREE ELECTRONIC LIBRARY - Abstract, dissertation, book
 
<< HOME
CONTACTS



Pages:     | 1 || 3 | 4 |   ...   | 25 |

«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

-- [ Page 2 ] --

-7Zusammenfassung Die in dieser Arbeit präsentierte Architektur bietet die Werkzeuge und Protokolle, um die am meisten geeigneten Anbieter für eine Daueranfrage zu ermitteln, indem Metadaten genutzt werden, die in einem konzeptionell globalen, aber physisch verteilten Verzeichnis gespeichert werden. Ein Abonnent berechnet zur Wahl der besten Anbieter Bewertungszahlen, die das bisherige Verhalten beinhalten und dieses nutzen, um zukünftiges Verhalten vorauszusagen. Diese Bewertungszahlen basieren auf einer Kombination aus Resource Selection und Behavior Prediction, um die gegebene Dynamik zu berücksichtigen. Behavior Prediction benutzt Zeitreihenanalyse mit Double Exponential Smoothing Techniken zur Vorhersage zukünftigen Verhaltens mit schnellen Veränderungen. Zusätzlich können Korrelationen innerhalb der Schlüsselwörter einer Anfrage ausgenutzt werden, um die Auswahl der Anbieter weiter zu verbessern. Aus Skalierbarkeitsgründen besitzen die Metadaten, die im Verzeichnis gespeichert werden, Anbieter-Granularität und keine Dokumenten-Granularität. Dadurch werden die besten Anbieter für einzelne Schlüsselwörter festgestellt. Auf der Hauptarchitektur aufbauend bietet diese Arbeit zwei Algorithmen, um Korrelationen unter Schlüsselwörter zur Verbesserung der Auswahl der Anbieter auszunutzen. Der erste Algorithmus benutzt individuelle Schlüsselwörter Synopsen aus dem Verzeichnis, um die Auswahl zu verbessern. Der zweite Algorithmus erweitert das verteilte Verzeichnis um Metadaten von explizit ausgewählte Mengen aus mehreren Schlüsselwörtern.

Bereits existierende Ansätze, die sich auf Mengen aus zwei Schlüsselwörtern beschränken, werden auf beliebig viele Schlüsselwörter erweitert. Darüber hinaus zeigt diese Arbeit einen Algorithmus auf, welcher Statistiken für Mengen aus mehreren Schlüsselwörtern durch die Kombination von beliebigen Teilmengenstatistiken approximiert. Durch Anwendung der vorgeschlagenen Techniken erhält approximatives Information Filtering eine höhere Skalierbarkeit. Für einen akzeptablen Verlust an Recall erhält man schnellere Antwortzeiten und einen geringeren Nachrichtenverkehr.

Schließlich präsentiert diese Doktorarbeit einen Anwendungsfall für Digitale Bibliotheken, welcher approximative Retrieval und Filtering Funktionalität in einer vereinten Architektur unterstützt, und eine prototypische Implementierung, die die Anwendbarkeit von approximativem Information Filtering herausstellt. Vergleichende Experimente auf unterschiedlichen echten Datenmengen messen die Effizienz und Effektivität der Methoden dieser Arbeit.

-8Chapter 1 Introduction Chapter 1 Introduction This introductory chapter motivates the doctoral thesis in Section 1.1 by introducing the problem of P2P IF. Section 1.2 highlights the main solution and presents the major contributions that will be investigated in the following chapters, while Section 1.3 presents the thesis outline.

1.1 Motivation The peer-to-peer (P2P) paradigm has been receiving increasing attention in recent years, mainly in the context of file-sharing applications (Gnutella, BitTorrent) or other Internet applications such as IP telephony (e.g., Skype). In the meantime, the P2P paradigm is moving towards distributed data management systems such as information retrieval (IR) or information filtering (IF). The main advantage of P2P is its ability to handle huge amounts of data in a decentralized and self-organizing manner. The characteristics of P2P offer high potential benefit for information systems powerful regarding scalability, efficiency, and resilience to failures and dynamics. Beyond that, such an information system can potentially benefit from the intellectual input of a large user community participating in the data sharing network. Finally, P2P information systems can also bring forward pluralism in informing users about Internet content, which is crucial in order to guard against information-resource monopolies and the biased visibility of content from economically powerful sources. Information censorship is often mentioned in this context.

Information filtering, also referred to as publish/subscribe or continuous querying or information push, can be seen as equally important to information retrieval (or one-time querying), since users are able to subscribe to information sources and be notified when new events of interest happen. The need for content-based push technologies is also stressed by the deployment of new applications (e.g., Google Alerts) where a user posts a subscription (or continuous query) to the system to receive notifications whenever matching events of interest take place. Information filtering and information retrieval are often referred as two sides of the same coin [BC92]. Although many of the underlying issues and goals are similar in retrieval and filtering, since in both cases a document needs to be matched against an information demand, the design issues, the techniques and algorithms devised to increase filtering efficiency differ significantly. This concept will be one of the main drivers of this thesis.

The main challenge addressed in this thesis is to exploit P2P technology for efficient and approximate information filtering. While there exist several approaches to perform exact information filtering in P2P environments, the work in this thesis emphasizes a novel architecture to support content-based approximate information filtering. Most information filtering approaches taken so far have the underlying hypothesis of potentially delivering notifications from all information producers.





-9Chapter 1 Introduction

An approximate approach relaxes this assumption by monitoring only selected sources that are likely to publish documents relevant to the user interests in the future. Since in an IF scenario the data is originally highly distributed residing on millions of sites (e.g., with people contributing to blogs), a P2P approach seems an ideal candidate for such a setting.

However, exact publish/subscribe functionality has proven expensive for such distributed environments. Contrary, approximate IF offers a natural solution to this problem, by avoiding document granularity dissemination as this presents the main scalability bottleneck for the exact IF approaches.

The targeted P2P system consists of a number n of publisher and/or subscriber peers (or nodes 1 ) dj with j = 1,..., n, forming a network. In general, peers are assumed to be autonomous publishing and requesting data. All publisher peers dj construct and store a local index structure consisting of index lists, Ij (k) over each key k (also referred to as keyword or term). A continuous query in such a filtering system is a set of multiple distinct keywords q = k1, k2,..., tm. The subscriber wants to be notified regarding new documents containing these keywords published by other peers in the network. Therefore, all peers have precomputed statistical summaries (metadata) on their local index contents organized on a per-key basis and typically including measures such as the number of documents that the peers’s local index contains for a given key, the average term frequency in these documents, and so on. Additionally, these summaries may contain compact synopses representing the documents each peer maintains. These summaries are then posted to a conceptually global, but physically distributed directory conveniently accessible by every peer, with O(log n) communication costs per key where n is the size of the network. Assuming the existence of a such a directory, the approximate information filtering system has to provide the ability to identify the top-k most appropriate publisher peers p1, p2,..., pk for a given continuous query q, i.e., those publishers that are expected to provide the best documents matching q in the future. In this thesis, this task is referred to as publisher peer selection. To select this set of peers, this thesis introduces new strategies based on already known resource selection techniques in combination with new behavior prediction techniques. These new techniques utilize the distributed directory to predict future publishing behavior of peers by applying prediction techniques based on time series of IR statistics.

1.2 Main Contributions This section summarizes the major contributions of this doctoral thesis. Section 1.2.1 discusses the general framework for supporting approximate information filtering whereas Section 1.2.2 extends the framework to consider correlations among keywords. Section 1.2.3 introduces a prototype system, and Section 1.2.4 presents a digital library (DL) use case.

1.2.1 Approximate Information Filtering This thesis presents a novel architecture, coined MAPS (Minerva Approximate Publish Subscribe) [ZTB+ 08, ZTB+ 07], to support content-based approximate information filtering in P2P environments. Most IF approaches so far have the underlying hypothesis of potentially delivering notifications from every information producer. Contrary, MAPS relaxes this assumption and monitors only selected sources that are likely to publish documents relevant to the user interests in the future. In MAPS, a user subscribes with a continuous query and only published documents from these monitored sources are forwarded to him.

1 The thesis uses the terminology peer rather than node.

- 10 Chapter 1 Introduction MAPS provides a network-agnostic P2P architecture with different services and its related protocols (directory, subscription, publication, and notification protocol) for supporting approximate IF functionality in a distributed P2P environment. It is the first approach that looks into the problem of approximate IF in such a setting. The architecture exploits metadata stored in a conceptually global, but physically distributed directory. The most critical task in approximate IF is the selection of appropriate publisher peers to meet the information demand in the future. Therefore, MAPS combines existing resource selection techniques with new behavior prediction strategies. It is shown, that existing resource selection approaches (also referred to as collection or database selection) are not sufficient in a dynamic filtering setting since resource selection can only select appropriate authorities that have already published matching documents in the past.

The novel method of behavior prediction of publisher peers completes the peer selection strategy by applying prediction techniques to time series of IR statistics. So, MAPS introduces research of time series analysis to P2P information filtering environments. The experimental evaluation of MAPS approves the effectiveness and efficiency in several settings using real Web data. Various publishing behaviors are investigated to conclude that only the combination of resource selection and behavior prediction allows to improve recall while monitoring only a small number of publishers.

MAPS also conducts an approach to improve the proposed prediction method of time series analysis with double exponential smoothing (DES). The MAPS Selective Method (MSM) does not need any additional communication to adjust prediction parameters.

There, the key concept allows to recognize individual publishing behaviors of peers for various continuous queries. In addition, MAPS is compared to an existing exact information filtering approach in the P2P setting with regard to several major characteristics (e.g., load balancing issues, query placement, or routing infrastructure) [TZWK07].

1.2.2 Correlation Awareness In MAPS, publisher peer selection for a given continuous query with multiple keywords is driven by statistical summaries (metadata) that are stored by the system. These summaries are provided to the directory by the publishers and can be managed in different ways ranging from centralized solutions like servers or server farms, to super-peer or pure peer-to-peer solutions in the form of a distributed P2P directory built on top of a distributed hash table (DHT) or other kinds of overlay networks. For scalability of MAPS, the summaries have publisher granularity, not document granularity, thus capturing the best publisher for certain keywords or keys.

This, together with per-key organization of the directory that disregards keyword correlations2 or correlated key sets are two of the basic reasons that may possibly lead to insufficient recall. Obviously, considering statistics for all possible key sets is clearly not possible due to the explosion in the feature space. The baseline approach would decompose a continuous query into the individual keys and use the statistics from the directory to compute a combined score (e.g., intersection or some other kind of aggregation of individual key scores) for each key and publisher. This score would represent the probability of each source to publish documents matching the information demand in the near future. This approach may lead to poor filtering quality as the top-ranked publishers for the complete query may not be among the top selected publishers. In the worst case, a selected publisher may deliver many documents for each single keyword, but no single document matching all keywords, since this information is not present in the directory.

2 In general statistical usage, correlation or co-relation refers to the departure of two variables from independence.

- 11 Chapter 1 Introduction

Thus, the thesis introduces two approaches as suggested in [ZTW08] that use correlations

among keys to improve filtering quality in the scenario described above:

• The USS (Unique Synopses Storage) algorithm uses existing single-key synopses stored in the directory to estimate the publishing behavior of information sources for key sets.

• The CSS (Combined Synopses Storage) algorithm enhances the directory to explicitly maintain statistical metadata about selectively chosen key sets.

Contrary to distributed IR settings for one-time searching where sources are ranked according to their document collections (i.e., using resource selection strategies), in approximate IF the publishers are ranked according to their probability to publish relevant documents in the near future, which poses different requirements for maintaining metadata.

This thesis presents the first work to develop algorithms for exploiting keyword correlations in such a dynamic IF setting. Existing and self-limited approaches for two-key queries are extended to the case of multi-key continuous queries for an arbitrary number of keys.



Pages:     | 1 || 3 | 4 |   ...   | 25 |


Similar works:

«Reauthorization of the FISA Amendments Act Edward C. Liu Legislative Attorney April 8, 2013 Congressional Research Service 7-5700 www.crs.gov R42725 CRS Report for Congress Prepared for Members and Committees of Congress Reauthorization of the FISA Amendments Act Summary On December 30, 2012, President Obama signed H.R. 5949, the Foreign Intelligence Surveillance Act (FISA) Amendments Act Reauthorization Act of 2012, which extends Title VII of FISA until December 31, 2017. Reauthorizations of...»

«2012 ARBEITSPAPIER – WORKING PAPER 132 Cornelia Günauer „The whole idea is to entertain them“ Musik im indischen Wahlkampf ARBEITSPAPIERE DES INSTITUTS FÜR ETHNOLOGIE UND AFRIKASTUDIEN WORKING PAPERS OF THE DEPARTMENT OF ANTHROPOLOGY AND AFRICAN STUDIES AP IFEAS 132/2012 Herausgegeben von / The Working Papers are edited by: Institut für Ethnologie und Afrikastudien, Johannes Gutenberg-Universität, Forum 6, D-55099 Mainz, Germany. Tel. +49-6131-3923720; Email: ifeas@uni-mainz.de;...»

«Price Movements of the Competing Airlines in the Indian Market: An Empirical Study (A) Goutam Dutta Sumitro Santra W.P. No. 2015-01-02 January 2015 The main objective of the working paper series of the IIMA is to help faculty members, research staff and doctoral students to speedily share their research findings with professional colleagues and test their research findings at the pre-publication stage. IIMA is committed to maintain academic freedom. The opinion(s), view(s) and conclusion(s)...»

«TECHNICAL FEATURES OF THE GREENKETT MULTILAYER PARQUET April 2004 TABLE OF CONTENTS TECHNICAL FEATURES OF THE GREENKETT MULTILAYER PARQUET 3 1.COMPOSITION. 1.1.WOOD 1.1.1.TOP LAYER OF NOBLE WOOD. 1.1.2.CORE LAYER MADE OF CONIFERS 1.1.3.BOTTOM LAYER OF ROTARY CUT VENEER. 1.2.ADHESIVES. 1.3.LACQUERS. 2.DIMENSIONS OF THE FINISHED PRODUCT. 3. – MACHINING. 4.PRESENTATION 4.1. PACKAGE 4.2. PALLET. 5. – BEHAVIOUR. 5.1. – DIMENSIONAL STABILITY. 5.2. – DIMENSIONAL TOLERANCES 5.3. – RESISTANCE...»

«Деменция-сервис Дженни Пауэлл Помощь в общении при деменции Russische Ausgabe von „Hilfen zur Kommunikation bei Demenz“ von Jennie Powell Heft 2 der Reihe Demenz Service gefördert von: LANDESVERBÄNDE DER PFLEGEKASSEN Informationsund Koordinierungsstelle der Информационно-координационный пункт инициативы Landesinitiative Demenz-Service Nordrhein-Westfalen федеральной земли...»

«Shakuntala Banaji Loving with irony: young Bombay viewers discuss clothing, sex and their encounters with media Article (Accepted version) (Refereed) Original citation: Banaji, Shakuntala (2006) Loving with irony: young Bombay viewers discuss clothing, sex and their encounters with media. Sex education, 6 (4). pp. 377-391. ISSN 1472-0825 DOI: 10.1080/14681810600982044 © 2006 Taylor & Francis This version available at: http://eprints.lse.ac.uk/27015/ Available in LSE Research Online: February...»

«— handbook for — Evaluating Infrastructure Regulatory Systems Ashley C. Brown, Jon Stern, and Bernard Tenenbaum with Defne Gencer Handbook for Evaluating Infrastructure Regulatory Systems Handbook for Evaluating Infrastructure Regulatory Systems Ashley C. Brown, Jon Stern, and Bernard Tenenbaum with Defne Gencer THE WORLD BANK Washington, D.C. © 2006 The International Bank for Reconstruction and Development / The World Bank 1818 H Street NW Washington DC 20433 Telephone: 202-473-1000...»

«Critical Incidents In Child Care Well set professional to deliver a other ensuring employee on while of existing your product to the great repayment. You are access and as you do larger and more for an insurance you do, etc., making resume starts or being a Call, your incentive for course is redeemed. The entrepreneurs if an worried search are behind qualified by tax of the set company, also Critical Incidents in Child Care of outs as Critical Incidents in Child Care the obtaining track. For...»

«ZUR VOLKSKUNDE DER UNGARNDEUTSCHEN. EIN LEHRUND ARBEITSBUCH FÜR DIE STUDENTEN DER NATIONALITÄTENGRUNDSCHULLEHRERUND -KINDERGÄRTNERINNENBILDUNG Zusammengestellt von ÉVA MÁRKUS Zur Volkskunde der Ungarndeutschen. Ein Lehrund Arbeitsbuch für die Studenten der Nationalitätengrundschullehrerund -kindergärtnerinnenbildung Zusammengestellt von Éva Márkus TREZOR KIADÓ Budapest, 2010 Közreadja az Eötvös Loránd Tudományegyetem Tanítóés Óvóképző Karának Idegen Nyelvi és Irodalmi...»

«Seacliff Restaurant Wedding Packages Seacliff is Wollongong’s newest and most spectacular wedding venue catering from 40 – 290 guests. We offer newly renovated, modern facilities with quality cuisine, fine service and detailed event coordination all in a superb waterfront location. The Seacliff’s food, wine, service and atmosphere all combine to make it one of Wollongong’s most enjoyable wedding experiences. Seacliff is situated high above the Pacific Ocean on Cliff Road in North...»

«Published as: Krasnikov, S. A.; Doyle, C. M.; Sergeeva, N. N.; Preobrajenski, A. B.; Vinogradov, N. A.; Sergeeva, Y. N.;Zakharov, A. A.; Senge, M. O.; Cafolla, A. A. (2011): Formation of Extended Covalently Bonded Ni Porphyrin Networks on the Au(111) Surface. Nano Research 4, 376–384. The formation of extended covalently bonded Ni porphyrin networks on the Au(111) surface Sergey A. Krasnikov, *,1 Catherine Doyle, 1 Natalia N. Sergeeva, 2 Alexei B. Preobrajenski, 3 Nikolay A. Vinogradov, 3...»

«ВЕСТНИК НП «АРФИ» НАУЧНО-ПРАКТИЧЕСКОЕ ЭЛЕКТРОННОЕ ИЗДАНИЕ ДЛЯ СПЕЦИАЛИСТОВ ПО СВЯЗЯМ С ИНВЕСТОРАМИ #8 Сентябрь / 2014 ВЕСТНИК НП «АРФИ», научно-практическое электронное издание для специалистов по связям с инвесторами, распространяется бесплатно. В электронной форме...»





 
<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.