FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:     | 1 |   ...   | 19 | 20 || 22 | 23 |   ...   | 25 |

«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

-- [ Page 21 ] --

Contrary to approaches such as [TIK05a] that distribute the documents or the index lists among the peers to achieve exact retrieval and filtering functionality, publications in MinervaDL are kept locally at each provider. This lack of publication forwarding mechanism is a design decision that offers increased scalability in MinervaDL by trading recall. Thus, only monitored provider peers (i.e., indexing a continuous query cq) can notify a consumer C, although other providers may also publish relevant documents. As already stated, this makes the scoring function pred(P, cq) a critical component.

Thus, when a provider peer P wants to publish a new document d to the MinervaDL network, the document is only matched against P ’s local continuous query database to determine which continuous queries match d, and thus which consumers should be notified.

Additionally, at certain intervals P creates Post messages with updated statistics and sends them to its access point S.

–  –  –

7.4 Scoring Functions Selecting the appropriate provider peers to forward an one-time or a continuous query requires a ranking or scoring function that will be used to determine the most appropriate sources (providers) for a given information demand. Although, both IR and IF protocols utilize the same metadata stored in the distributed directory, the peer ranking strategies

in this use case differ significantly and have to consider varying objectives:

–  –  –

So, it is clear, that both functionalities have to consider different objectives. In the filtering scenario, the main approach of this doctoral theses is used to select the best providers. To even better illustrate this, consider the following example: Assume two providers, where the first is specialized in soccer (i.e., in the past has published a lot of documents about soccer), although now it is rarely publishing new documents. The second provider is not specialized in soccer but currently it is publishing many documents about soccer. Now, a consumer subscribes beginning 2008 for documents with the continuous query soccer Euro 2008 1. A ranking function based on resource selection algorithms would always choose the first provider. To get a higher ranking score, and thus get selected for indexing the query, the second provider has to specialize in soccer, a long procedure that is inapplicable in a filtering setting, which is by definition dynamic. The above example illustrates that the ranking formula should be able to predict the publishing behavior of providers by observing both their past and current behavior and projecting it to the future.

Section 7.4.

2 discusses the appropriate technique from this doctoral thesis.

1 The2008 UEFA European Football Championship, commonly referred to as Euro 2008, takes place in Austria and Switzerland, from 7 to 29 June 2008.

- 130 Chapter 7 Digital Library Use Case 7.4.1 Resource Selection As mentioned before, a number of resource selection methods (e.g., CORI [CLC95], GlOSS [GGMT99], CVV [YL97], language models, etc.) are available to identify good authorities with respect to a given multi-key query q. A simple but effective approach utilizes term frequencies (tf ) and document frequencies (df ) already stored in the directory to compute

a score sel(P, q) for all provider peers P :

–  –  –

The parameter β in Equation 7.1 can be chosen in the range between 0 and 1. It is used to stress the importance of df or tf max. Preliminary experiments have shown that β = 0.5 is an appropriate value in most cases [BMT+ 05a]. Using the scoring function presented above, providers can be can identified that store high-quality documents for query q, and thus achieve high recall by querying only a few providers in the MinervaDL network. To further improve recall, there are peer selection strategies that consider overlap-awareness [BMT+ 05a] and key correlations [MBN+ 06].

7.4.2 Behavior Prediction The prediction mechanism collects IR statistics from the distributed directory and treats them as time series data to perform statistical analysis over them. A statistical analysis assumes that the data points taken over time have some sort of internal structure (e.g., trend etc.), and uses this observation to analyze older values and predict future ones [Cha04].

There exist various approaches (see 2.4) that differ in their assumptions about the internal structure of the time-series (e.g., whether it exhibits a trend or seasonality). Moving average techniques as described in are a well-known group of time series prediction techniques that assign equal weights to past observations (e.g., averaging is the simplest form of moving average techniques), and thus cannot cope with trends or seasonality. In the Minerva DL setting, it is reasonable to put more emphasis on a peer’s recent behavior and thus assign higher weights to recent observations. Double exponential smoothing assigns exponentially decreasing weights to past observations and assumes that the time series data present some trend to predict future values. Since many queries are expected to be short-lived so that no seasonality will be observed in the IR statistics time series, seasonality is not considered in the predictions (and thus, triple exponential smoothing is not used).

The scoring function pred(P, cq) returns for a score representing the probability that provider P will publish in the future documents relevant to the continuous query cq. MinervaDL uses double exponential smoothing to predict the following two statistical values.

ˆ Initially, for all keys k in cq, the approach predicts the value for dfP,k (denoted as df P,k ), ˆ )) between df ˆ and uses the difference (denoted as δ(df P,k P,k and the last received value ˆ ) reflects the number of relevant from the directory to calculate the score for P. δ(df P,k documents concerning t that P will publish in the next period. Secondly, MinervaDL predicts δ(cs) as the difference in the collection size of P reflecting the provider peer’s overall ˆ expected future publishing activity. In this way two aspects of the peer’s behavior are modeled: its ability to publish relevant documents in the future and its overall expected publishing activity. The consumer peer that submits cq obtains the IR statistics that are needed as an input to the prediction mechanism by utilizing the distributed directory. The following formula is used to compute the prediction score for a provider P with respect to

a continuous query cq:

–  –  –

7.5 Experimental Evaluation This section presents the evaluation of MinervaDL. In the experimental evaluation, a total number of 1000 providers and the same number of consumers is assumed. All these peers are connected to a MinervaDL network using 400 super-peers as access points. At bootstrapping, each provider stores about 300 documents and is mainly specialized in one out of ten categories: Music, Finance, Arts, Sports, Natural Science, Health, Movies, Travel, Politics, and Nature. Thus, there are 100 specialized information sources per category.

Overall, the experimental runs use 30 queries2 containing two, three or four keys that are strong representatives of a certain document category and are used both as one-time and continuous queries. The next sections investigate search performance, filtering performance, and a message cost analysis.

7.5.1 Search Performance In Figure 7.3, the experimental results of the information retrieval functionality of MinervaDL are presented. Consumers issue the 30 one-time queries and the evaluation measures

the relative recall to a centralized search engine hosting the complete document collection:

the ratio of top-25 documents for a query included in the merged P2P query result. Thus, MinervaDL applies the resource selection strategy as described before in 7.4.1 and increase the percentage ρ of providers in the system that are requested to answer the query.

Figure 7.3 shows that the retrieval performance of MinervaDL manages to retrieve more than 90% of the top rated documents by asking only a fraction of the providers, whereas the baseline approach of random peer selection reaches a much lower recall (around 20%).

This experiment shows that a random selection of provider peers does not lead to satisfying retrieval results.

7.5.2 Filtering Performance To evaluate the filtering functionality of MinervaDL, the experiments assume that providers publish 30 documents within a specific time period (called publishing round ) and the results after ten publishing rounds are investigated. A scenario that models periods of inactivity in the publishing behavior is considered. This scenario assumes that consumers subscribe with 30 continuous queries and reposition them in each publishing round. In the experiments, the percentage ρ of monitored providers is varied and the providers are ranked using the two scoring functions approaches described in Section 7.4. Recall is used as filtering measures;

it is defined as the ratio of total number of notifications received by the consumers to the total number of published documents matching a subscription.

2 Example queries are museum modern art, or space model.

–  –  –

Figure 7.5: Approximate vs.

Exact Information Filtering.

As illustrated in Figure 7.4, the use of behavior prediction improves recall over resource selection as it manages to model more accurately the publishing behavior of providers.

In this way, the proposed scoring function manages to achieve a recall of around 80% by monitoring only 20% of the providers in the MinervaDL network. Notice that for resource selection, this number is only 35% which makes it an inapplicable solution to a filtering environment.

7.5.3 Message Costs Analysis Figures 7.5 and 7.6 show different aspects of a message costs analysis of MinervaDL. Here, in each round, the subscriptions are repositioned and the one-time queries are requested

again (e.g., by another consumer peer). In this setting, there are three types of messages:

directory, retrieval, and filtering messages. Figure 7.5 compares MinervaDL with an implementation of the exact matching protocols of LibraRing [TIK05a].

The overall message costs are considered as a function of the total number of documents published in the system for both approaches. As shown, message costs of MinervaDL are independent of the number of publications, whereas LibraRing, as with all exact matching approaches, is sensitive to this parameter since documents have to be disseminated to the network at publication time. This imposes a higher network cost especially for high publication rates as those expected in a DL setting.

Finally, in Figure 7.6, it can be observed that directory messages dominate both retrieval and filtering messages. This is expected since matching and filtering mechanisms are by design local to accommodate high publication rates. This means that building both IR and IF functionality on top of the routing infrastructure imposes no extra cost on the architecture, compared to one that supports only one type of functionality. Since other types of information can also be stored in the directory in the same fashion (e.g., QoS statistics or load balancing information) building extra functionality will increase the added value of the directory maintenance, and come at almost no extra cost.

–  –  –

1.5 Figure 7.6: Message Costs in MinervaDL.

7.6 Discussion This chapter presented a digital library use case utilizing a novel DL architecture coined MinervaDL, designed to provide both IR and IF functionality in a single unifying framework.

All introduced protocols regulate the communication between the three peer types (superpeers, providers, and consumers). The experimental evaluation of this use case investigated the influence of the individual scoring functions used to perform the two functionalities of MinervaDL. Furthermore, MinervaDL was compared against the LibraRing architecture designed to provide exact retrieval and filtering. The next chapter will conclude this thesis by summarizing the main contributions, and giving an outlook of open problems.

–  –  –

Chapter 8 Conclusion and Open Questions The last chapter of this doctoral thesis summarizes the presented work (Section 8.1), and lists the main contributions (Section 8.2). Finally, Section 8.3 discusses possible extensions to the presented work and elaborates on open research questions with regard to load balancing, dynamics in P2P networks, replication & caching, and economic aspects1.

8.1 Summary Recently, the P2P paradigm has moved towards distributed data management systems that are able to offer information retrieval or information filtering functionality. The main advantage of P2P is its ability to handle huge amounts of data in a decentralized and selforganizing manner. The characteristics of P2P offer high potential benefit for information systems powerful regarding scalability, efficiency, and resilience to failures and dynamics.

Beyond that, such an information management system can potentially benefit from the intellectual input of a large user community participating in the data sharing network. Finally, P2P information systems can also bring forward pluralism in informing users about Internet content, a crucial property that breaks information-resource monopolies and bypasses the biased visibility of content from economically powerful sources. In the above distributed and dynamic setting, information filtering, also referred to as publish/subscribe or continuous querying or information push, can be seen as equally important to information retrieval (or one-time querying), since users are able to subscribe to information sources and be notified when new events of interest happen. A user posts a subscription (or continuous query) to the system to receive notifications whenever matching events of interest take place. Information filtering and information retrieval are often referred as two sides of the same coin [BC92].

Pages:     | 1 |   ...   | 19 | 20 || 22 | 23 |   ...   | 25 |

Similar works:

«ZUR VOLKSKUNDE DER UNGARNDEUTSCHEN. EIN LEHRUND ARBEITSBUCH FÜR DIE STUDENTEN DER NATIONALITÄTENGRUNDSCHULLEHRERUND -KINDERGÄRTNERINNENBILDUNG Zusammengestellt von ÉVA MÁRKUS Zur Volkskunde der Ungarndeutschen. Ein Lehrund Arbeitsbuch für die Studenten der Nationalitätengrundschullehrerund -kindergärtnerinnenbildung Zusammengestellt von Éva Márkus TREZOR KIADÓ Budapest, 2010 Közreadja az Eötvös Loránd Tudományegyetem Tanítóés Óvóképző Karának Idegen Nyelvi és Irodalmi...»

«Karadeniz İncelemeleri Dergisi: Yıl 8, Sayı 16, Bahar 2014 39 AĞA MUHAMMED HAN’IN KAFKASYA SEFERLERİ VE OSMANLI-İRAN İLİŞKİLERİ (1795-1797) Abdurrahman ATEŞ ÖZ  Bu çalışmada, Ağa Muhammed Han’ın Kafkasya seferleri ve Osmanlıİran siyasi ilişkileri (1795-1797) ele alınarak, kaynak-ların verdiği bilgiler nispetinde bazı değerlendirmeler yapılmaya çalışılmıştır. Konunun ana mihveri çerçevesinde, giriş kısmında Ağa Muhammed Han hakkında kısaca bilgi...»

«Concept development of rotating bed chemical looping combustion reactor C.W.M. Hermans Master of Science Thesis b ( ) ( ) ( ) ( ) Supervisors: Dr. Ir. Bendiks Jan Boersma Document number:2600 Dr. Ir E.L.V. Goetheer Dr. Erin Schols 11/11/2013 Error! Use the Home tab to apply Kop 1 to the text that you want to appear here.Master thesis: “That's one small step for technology, one giant leap for students” C.W.M. Hermans Master of Science Thesis Error! Use the Home tab to apply Kop 1 to the text...»

«Conditional Ablation of the Gene Encoding Transforming Growth Factor-β1 in the Mouse Inaugural-Dissertation zur Erlangung des Doktorgrades Der Mathematisch-Naturwissenschaftlichen Fakultät der Universität zu Köln vorgelegt von Arti Gaur Berichterstatter: Prof. Dr. Klaus Rajewsky Prof. Dr. Manfred Blessing Tag der mündlichen Prüfung: 14.Mai, 2002 ABBREVIATIONS Aa amino acids Ab antibody Ag antigen APC antigen presenting cell BCR B cell receptor BM bone marrow Bp base pair(s) BSA bovine...»

«ronaldo trikot 2015 ronaldo trikot 2015 Ronaldo Trikot Jetzt das Original Trikot sichern. Jetzt das Original Trikot sichern. 0 Versandkosten! C. Ronaldo Fußballtrikots Ronaldo Fanshirts und Trikots. Ronaldo Fanshirts und Trikots. Riesenauswahl bei LadenZeile.de! Trikots von Top Marken Riesen Auswahl an Top Trikots. Riesen Auswahl an Top Trikots. Shops vergleichen Geld sparen! Ronaldo Trikot | preis.de Vergleiche finde günstige Preise. Real Madrid 2015 2016 RONALDO 7 Away Trikot 29.99 Real...»

«De Búrca Ra re Books A selection of fine, rare and important books and manuscripts Catalogue 117 Easter DE BÚRCA RARE BOOKS Cloonagashel, 27 Priory Drive, Blackrock, County Dublin. CATALOGUE 117 Easter 2015 PLEASE NOTE 1. Please order by item number: Armstrong is the code word for this catalogue which means: “Please forward from Catalogue 117: item/s.”.2. Payment strictly on receipt of books. 3. You may return any item found unsatisfactory, within seven days. 4. All items are in good...»

«PART I SUMMARY OF FACTS UPON WHICH THE PROPOSALS ARE BASED CHAPTER 1 1. DESCRIPTION OF TRACT DEALT WITH 1.1. NAME AND SITUATION: This working plan deals with the forest tracts of the Kachugaon Forest Division under the Western Assam Circle Conservancy. The territorial division is spread over parts of present Kokrajhar and Dhubri districts, on the northern bank of river Brahmaputra. The area dealt with falls between 260 19/ 33.24// N to 260 51/ 54.36// N latitude and 89051/ 18.70// E to 900 16/...»

«DAS AKUTE ABDOMEN BEIM KLEINTIER AUS CHIRURGISCHER SICHT Eine retrospektive Studie der Jahre 2000 bis 2005 CHARLOTTE GÜNTHER INAUGURAL-DISSERTATION zur Erlangung des Grades eines Dr. med. vet. beim Fachbereich Veterinärmedizin der Justus-Liebig-Universität Gießen édition scientifique VVB LAUFERSWEILER VERLAG Das Werk ist in allen seinen Teilen urheberrechtlich geschützt. Jede Verwertung ist ohne schriftliche Zustimmung des Autors oder des Verlages unzulässig. Das gilt insbesondere für...»

«CA FASHIONS AND FUNDAMENTALISMS IN ` FIN-DE-SIECLE YEMEN: Chador Barbie and Islamic Socks ANNE MENELEY Trent University, Peterborough, Ontario I had to say goodbye to my friends with whom I had lived for nearly two years in the Yemeni town of Zabid, during the tense time that followed the Iraqi invasion of Kuwait in 1990. The common connections we had forged as neighbors, friends, fictive kin, and fellow residents of Zabid were suddenly strained. Zabidis asserted that Americans (and, by...»

«Modelling airport and airline choice behaviour with the use of stated preference survey data Stephane Hess a,1 Thomas Adlerb John W. Polaka a Centre for Transport Studies, Imperial College, London SW7 2AZ, UK b Resource Systems Group, White River Jct., Vermont VT 05001, USA Abstract The majority of studies of air travel choice behavior make use of Revealed Preference (RP) data, generally in the form of survey data collected from departing passengers. While the use of RP data has certain...»

«BILLING CODE: 4810-AM-P BUREAU OF CONSUMER FINANCIAL PROTECTION 12 CFR Part 1003 Docket No. CFPB-2014-0019 RIN 3170-AA10 Home Mortgage Disclosure (Regulation C) AGENCY: Bureau of Consumer Financial Protection. ACTION: Final rule; official interpretations. SUMMARY: The Bureau of Consumer Financial Protection is amending Regulation C to implement amendments to the Home Mortgage Disclosure Act made by section 1094 of the Dodd-Frank Wall Street Reform and Consumer Protection Act (Dodd-Frank Act)....»

«Enforcement Instructions and Guidance Chapter 48 – Making flight arrangements 48.1 Introduction 48.2. Collection of tickets 48.3. Procedures for deferral of removal 48.4. Paragraph 10 removals without prejudice 48.5. Carrier's obligations 48.6. Voluntary departures at own expense 48.6.1 Voluntary departures other than at own expense 48.7. Removing Irish Land Border illegal entrants 48.8. Removing BDTC, BOTC, BNO and BOC passport holders 48.9 Illegal entrants required as witnesses 48.10....»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.