FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:     | 1 |   ...   | 11 | 12 || 14 | 15 |   ...   | 25 |

«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

-- [ Page 13 ] --

As a result, Figure 4.19 illustrates that there is a high variation of prediction errors depending on the choice of the parameter combination of η and γ. Although different value pairs may lead to small prediction error for double exponential smoothing, notice that there is no optimal parameter combination. The best combination for one behavior does not necessarily result in satisfying predictions for another one. The conclusion of this observation is that a global choice for η and γ cannot be used to predict all peer behaviors for all active continuous queries. This is reasonable since some behaviors show stronger or less stronger trends that have to be addressed with DES. As a result, MAPS has to adapt the parameter combination for η and γ to the observed value behavior. In the next section, the MAPS Selective Method is presented as a novel method for capturing parameter combinations that will result in high recall at no extra communication cost.

4.4.2 The MAPS Selective Method The conclusion of the previous investigation is that there exists no single setting for parameters η and γ to effectively model all different publishing behaviors. Therefore, the MAPS Selective Method is introduced to adapt the parameter setting to the given scenario. The approach works as follows.

Let the values x1,..., xn−1 denote the observed time series values and let xn be the ˆ predicted value. The Selective Method uses the values x1,..., xn−2 to predict the already known last observed value xn−1. Let xn−1,η,γ denote the predicted value for all combiˆ nations of η and γ. Now, the parameter combination with the smallest error concerning the real observed value of xn−1 is selected. If there are more than one combination with smallest error, the one with the lowest distance to η = 0.5 and γ = 0.5 is picked. The selected parameters are used to predict the next future value xn.ˆ The MSM algorithm uses the observed data series x1,..., xn−1 and the function DESη,γ as input to predict the next value. The algorithm outputs the selected parameter values for η and γ, as well as the prediction value xn as the result of DESη,γ. Notice that the ˆ algorithm is simplified such that the case that several parameter combinations have the smallest error is not considered. Moreover, the observed time series needs at least two values such that the MSM algorithm can be applied.

The Selective Method means that the most appropriate parameter setting concerning the last observed value is always used. In this way, double exponential smoothing is adapted to the given data series. Obviously, at least four observed values are needed to apply this method because three values are indispensable to properly predict the last observed value.

Next, to analyze the approach, the absolute prediction errors for the various behaviors in Table 4.4 are computed. Figure 4.20 compares the MAPS Selective Method, named in the graph as selective, with the minimum absolute prediction error per round (min), the average prediction error (average), and the prediction errors for parameter combination η = 0.5 and γ = 0.5 (as an obvious parameter choice). This series is named 0.5/0.5 in the graph. Since the maximum prediction error is very high it is not plotted in the graph to allow for the better illustration of the other methods.

As shown in Figure 4.20 the Selective Method is in all behaviors almost as good as min that means that the approach selects the appropriate parameters. The combination 0.5/0.5 does not really deliver satisfying error rates although this could be an obvious choice of parameters and is often recommended. As expected, average presents higher prediction errors even when compared against 0.5/0.5. This is caused by the fact that most parameter combinations result in high error rates. Looking at the different behaviors, the MAPS Selective Method accurately predicts the real values for the LinInc and LinDec behaviors because these data series are the simplest without any trend.

–  –  –

In addition, Figure 4.20 shows the average over all eight behaviors. Here, the novel method performs even better than min which might seem counter-intuitive at a first glance.

Notice however that in the MSM case the average prediction error over all publishers is measured while min refers to the best global parameter combinations of parameters per publisher, averaged for all publishers. This is caused by the fact that a per publisher parameter combination yields to the best possible prediction. If the errors of the MAPS Selective Method are combined, the individual best combination is almost reached and this is still better than the global min. Notice also the proportion between predicted values and prediction errors. Since the observed values range from 0 to 600 average prediction errors higher than 50 or 100 are not acceptable. In contrast, the selective method shows satisfying prediction regarding the predicted value range.

In the next section, an alternative approach for parameter setting is presented and subsequently, Section 4.5 evaluates MSM in terms of usability in the MAPS filtering system.

4.4.3 An Alternative Approach

In this section, an alternative approach to improve the prediction of double exponential smoothing is described. Usually, to select the most appropriate values for the parameters η and γ, each peer would investigate the parameter influence using a test collection with a centralized computation, and this training phase would estimate the parameters. Obviously, each peer would have one fixed combination for all publisher peers. In this case, MAPS would not be able to recognize the individual publishing behavior of peers. Therefore, this training approach helps selecting a local parameter combination for each querying peer but is not as flexible as the Selective Method which is able to individually recognize and adopt to the publishing behavior of every peer that is a candidate to monitor.

All alternative ways to select parameter values for η and γ suffer from the the problem of selecting the appropriate training set. Even if distributed approaches are considered, the wrong choice of text collections may result in unsatisfying results.

–  –  –

Figure 4.20: Comparing MSM For Different Publishing Scenarios.

4.5 Experimental Evaluation This section evaluates the Selective Method implemented for the MAPS system. The following sections explain the experimental setup (Section 4.5.1), measures (Section 4.5.2), and data (Section 4.5.3). In Section 4.5.4 the detailed experimental results using different peer publishing scenarios (Mixed, ExpInc, and QuadDec) are presented. Section 4.5.5 summarizes the experimental results regarding MSM.

4.5.1 Experimental Setup

In all experiments described below, the following steps are executed. The network is set up and the underlying distributed hash table (DHT) is initiated. Next, there are four bootstrapping rounds where the subscriber peers collect the posted directory statistics to the requested continuous queries. After this bootstrapping phase, in the six subsequent rounds, the querying peers subscribe to selected publisher peers, to reach a total of ten rounds.

During these six rounds that represent the monitoring phase, a publisher peer is monitored by a subscriber, when the subscriber’s continuous query is stored in the publishers local query database. The monitored peers notify the subscriber for all published documents matching the stored continuous query.

All peers in the network publish documents during both phases and at certain intervals (or rounds), the continuous queries are repositioned. The intervals and the number of published documents per round depend on the individual peer behavior that will be inspected in Section 4.5.4. At the end of each round of the subscription phase, the subscriber peers rank publishers using the formula described in Section 4.2 and reposition their queries accordingly. In this experimental setting, resource selection is ignored to emphasize the prediction benefits. Thus, in Equation 4.1, α = 0.0 is utilized such that only behavior prediction is considered.

- 77 Chapter 4 Publisher Peer Selection

4.5.2 Experimental Measures To evaluate the retrieval effectiveness, the measures recall is used as the ratio of the total number of notifications received by subscribers to the total number of published documents matching subscriptions. The average recall over all rounds of the subscription phase is reported.

Similar to the publishing behaviors shown previously in the analysis of double exponential smoothing, the recall of the MAPS Selective Method is compared with the minimum and maximum recall that would be possible if a single global pair of parameter values is used for η and γ for all the peers in the network. In addition, an oracle peer selection approach (referred to as oracle) is applied such that it always predicts the accurate IR statistics.

The recall of oracle represents the highest possible recall that could be achieved by MAPS.

The random peer selection approach is implemented only for comparison purposes and demonstrates the performance of a random baseline approach.

Besides recall, the prediction quality is also analyzed. The average absolute prediction error per key, peer and round is measured. In the graphs MSM is compared only to the minimum prediction error opponent (min) since the maximum prediction error (max ) is very high. Notice that, similarly to the case of recall measures, the min and max opponents refer to globally selecting the set of parameters that would minimize or maximize the prediction error. The two other approaches are not considered because random does not utilize a prediction-based peer selection and oracle has by definition a prediction error of zero. In Section 4.5.4 different peer publishing behaviors are investigated in terms of recall and average prediction error.

4.5.3 Experimental Data The document collection that was used for the experiments contains more than 2 million documents from a focused Web crawl by the focused crawler BINGO! [STSW02]. All documents are categorized in one of ten categories: Music, Finance, Arts, Sports, Natural Science, Health, Movies, Travel, Politics, and Nature. The category size ranges from 68, 000 to more than 325, 000 documents (see Table 4.2). There are more than 500, 000 different keys (stop words are not considered) and the documents from different categories are used to categorize the peer set. In all experiments, the network consists of 100 peers with 10 peers per category. Using the document collection, seven strong representative single-key queries are extracted: music, arts, sports, travel, hotel, offer, city. Single-key queries have the advantage that there is a direct dependency between correctly predicting values and recall. In the case of multi-key queries, there can be the effect that a peer publishes a lot of documents containing the single keys but only a few containing the whole query key set.

For simplicity, only single-key queries are considered in this experimental setting. Multi-key

queries are considered in Chapter 5.

4.5.4 Experimental Results After explaining the setup and the dataset of the experimental evaluation, the recall and absolute prediction error results for different peer behaviors are presented. In the set of experiments shown in this section, the publishing behaviors listed in Table 4.4 are utilized.

This means that a peer following the LogInc behavior publishes in the first round no documents and in the last of the ten rounds 600 documents. In addition, a constant publishing behavior is considered where a peer constantly publishes 300 documents per round during both publishing phases.

- 78 Chapter 4 Publisher Peer Selection To investigate the effectiveness of the MAPS Selective Method, three different scenarios are analyzed. In the first scenario, all mentioned behaviors are used such that some peers have a constant publishing behavior and others increase or decrease their publication rate accordingly. The second scenario looks at the results when all peers in the network have an ExpInc behavior, and in the last scenario, all peers follow a QuadDec publishing behavior.

The graphs for all scenarios illustrate the performance of the different opponents. The oracle opponent shows the maximum recall MAPS can reach by accurately predicting the IR statistics. Naturally, the prediction error for the oracle opponent is zero. The random peer selection shows the results when publisher peers are selected completely at random. In the random setting, the prediction error is not of interest, because prediction is completely ignored. The min and max opponents present the best and worst recall MAPS can get with a global setting of parameters across all peers. Similarly to Section 4.4.2, in the graphs only the min absolute prediction error is included to better illustrate the differences between the different opponents. The MSM approach shows the recall and prediction error of the local parameter computation that is used to adapt the DES parameters. The Mixed Publishing Scenario

Over the ten rounds of this scenario, all 100 peers publish together 285, 960 documents following the different publishing behaviors. Figure 4.21 shows the recall depending on the percentage of monitored peers in the network (ρ). As it can be seen, the MAPS Selective Method performs marginally better than the max opponent and slightly worse than the oracle, whereas min and random achieve significantly lower recall. This means that the correct choice for the parameters can greatly affect the recall level. Notice that in this scenario the Selective Method not only performs slightly better than the best global parameter choice for η and γ, but also this performance is not greatly affected by the percentage of monitored publishers. This shows that a local auto-adjustment of prediction parameters is possible and results in recall similar to approaches that require global knowledge to compute these parameters. Thus, MSM is able to achieve recall of 60% by monitoring only 20% of all publisher peers in the network.

Additionally, Figure 4.22 shows the absolute prediction error per key, peer, and round.

Pages:     | 1 |   ...   | 11 | 12 || 14 | 15 |   ...   | 25 |

Similar works:

«Bericht über das 1. Halbjahr 2015 Jänner bis Juni Halbjahresfinanzbericht zum 30. Juni 2015 Die Baumaßnahmen am Standort Wien erreichen finale Phase Leichter Umsatzzuwachs im Vergleich zum Vorjahr Wesentliche Entwicklungen • Umsatzzahlen im ersten Halbjahr 1,8% über den • Außerordentliche Effekte bestimmen das Vorjahreswerten. Halbjahresergebnis.• Das Standortprojekt, sowie die Maßnahmen in • Rohstoffkostenniveau wird auch für die Folge des Gebäudeeinsturzes schreiten nach Plan...»

«Dissertation Submitted to the Combined Faculties for the Natural Sciences and for Mathematics of the Ruperto-Carola University of Heidelberg, Germany for the degree of Doctor of Natural Sciences Presented by Master of Science (M.Sc. Biotechnology) Kirti Shukla Born in: New Delhi, India Oral examination: 5.11.2014 Regulation of NF-κB signaling and cell cycle progression by microRNA-30c-2-3p in breast cancer Referees: Prof. Dr. Stefan Wiemann Prof. Dr. Petra Kioschis Thesis Declaration I hereby...»

«PERSISTENCE AMONG ADULT BASIC EDUCATION STUDENTS IN PRE-GED CLASSES by John P. Comings, Andrea Parrella, and Lisa Soricone Harvard Graduate School of Education NCSALL Reports #12 December 1999 The National Center for the Study of Adult Learning and Literacy Harvard Graduate School of Education 101 Nichols House, Appian Way Cambridge, MA 02138 The work reported herein is supported by a subcontract from Harvard University under the Educational Research and Development Centers Program, Award...»

«The Research Foundation for The State University of New York Travel Handbook Introduction to Travel Procedures and Responsibilities General Guidelines and Provisions Pre-Travel and Post-Travel Procedures Tax-Exempt Status Expenses and Rates Lodging and Meals Method I (Unreceipted Lodging) Method II (Receipted Lodging) Non-overnight Meal Payments Reimbursement Rates Transportation Commercial Airplane Automobile Use Personal Vehicles RF Vehicles State Vehicles Rental Vehicles Taxi and Public...»

«Ernst Halbmayer Debating animism, perspectivism and the construction of ontologies The contributions to this dossier provide a discussion of new animism, perspectivism and multinaturalism. In the last twenty years these concepts have emerged as central theoretical innovations of classical structuralist assumptions1 in Lowland South American anthropology (Århem 1996, Descola 1992, 1996, 2005, Stolze Lima 1999, Viveiros de Castro 1998). They reformulated outdated evolutionist notions of...»

«Том 7, №1 (январь февраль 2015) Интернет-журнал «НАУКОВЕДЕНИЕ» publishing@naukovedenie.ru http://naukovedenie.ru Интернет-журнал «Науковедение» ISSN 2223-5167 http://naukovedenie.ru/ Том 7, №1 (2015) http://naukovedenie.ru/index.php?p=vol7-1 URL статьи: http://naukovedenie.ru/PDF/124EVN115.pdf DOI: 10.15862/124EVN115 (http://dx.doi.org/10.15862/124EVN115) УДК 311.311 Леднева Ольга...»

«What contributes to the perception of musical phrases in western classical music? Neta Spiro What contributes to the perception of musical phrases in western classical music? ILLC Dissertation Series DS-2007-02 INSTITUTE FOR LOGIC, LANGUAGE AND COMPUTATION For further information about ILLC-publications, please contact Institute for Logic, Language and Computation Universiteit van Amsterdam Plantage Muidergracht 24 1018 TV Amsterdam phone: +31-20-525-6051 fax: +31-20-525-5206 email:...»

«Bestandsschutz Der Arbeitsverhaeltnisse Von Betriebsratsmitgliedern Presence for telemarketing should Bestandsschutz Der Arbeitsverhaeltnisse Von Betriebsratsmitgliedern want with your adjustment course is not potential to have in, and Bestandsschutz Der Arbeitsverhaeltnisse Von Betriebsratsmitgliedern for his average is monthly diverse and unproductive of a 8 is started on you away. Payday way only use suffering a entry without every pdf is, but personal traditional members, Bestandsschutz Der...»

«KÖZJEGYZŐK KÖZLÖNYE KÖZJEGYZŐK KÖZLÖNYE A Magyar Országos Közjegyzői Kamara szakmai folyóirata 2015 / 1. szám 2015. január / február . oldal Juhász Gábor Az örökhagyó szabad rendelkezési jogának korlátjaként megjelenő kötelesrész a spanyol jogban (II. rész) . oldal Molnár Judit „Elköltözött” vagy a „címzett ismeretlen” – A kötelezett menekülési lehetősége a fizetési meghagyásos eljárásban . oldal Bodzási Balázs Háromoldalú...»

«ACTA CLASSICA XLVIII. 2012. UNIV. SCIENT. DEBRECEN. p. 141–147. AUDOLLENTIANA BY GYÖRGY NÉMETH Abstract: The archival bequest of Auguste Audollent, one of the most distinguished students of ancient defixiones, includes numerous drawings of North African curse tablets. His drawings are the only reliable sources for the shape and appearance of inscriptions that have been lost or have become illegible. Defixiones once owned by Audollent are now in the Musée Bargoin in Clermont-Ferrand. This...»

«ANTICIPATED CONSEQUENCES: DEVELOPING A STRATEGY FOR THE TARGETED MEASUREMENT OF DISPLACEMENT AND DIFFUSION OF BENEFITS by Niall Hamilton-Smith Policing and Reducing Crime Unit, U.K. Home Office Abstract: This paper examines how displacement and diffusion of benefits can be measured within the context of crime reduction project evaluations. Attempts to monitor the impact of Reducing Burglary Initiative (RBI) projects in the United Kingdom highlighted an existing lack of measurement strategies in...»

«Карташoва Людмила и Мадагаскар Я «Экон-Информ» Москва УДК ББК Карташова Л. МАДАГАСКАР и Я. – Москва: Экон-Информ, 2011. – 272 с. Романтические эмоции вызывало у многих поколений россиян слово “Мадагаскар”! Но так ли многим удавалось – и удается сейчас – повидать его своими...»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.