FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:     | 1 |   ...   | 4 | 5 || 7 | 8 |   ...   | 25 |

«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

-- [ Page 6 ] --

- 28 Chapter 2 Background and Technical Basics The PeerCQ system [GL03] is another important pub/sub approach using a DHT infrastructure. PeerCQ takes into account peer heterogeneity and extends consistent hashing [KLL+ 97] with simple load balancing techniques based on appropriate assignment of peer identifiers to network peers. The Meghdoot system [GSAA04] is a content-based pub/sub approach implemented on top of a CAN-like DHT, and supports an attribute-value data model and new ideas for the processing of subscriptions with range predicates (e.g., the price is between 50 and 100 Euros) and load balancing. Mercury [BAS04] also uses an attribute-value data model similar to Meghdoot and is utilized in the implementation of a pub/sub system for network games. PastryStrings [AT06] also supports a rich set of queries in the context of a pub/sub system. The system utilizes prefix-based routing to facilitate both numerical and string queries.

The pFilter system [TX03] uses a hierarchical extension of CAN DHT to filter unstructured documents. This approach relies on multi-cast trees to notify subscribers. VSM and LSI can be used to match documents to user queries. The multi-cast trees of pFilter take into account network distance. Supporting prefix and suffix queries in string attributes is the focus of the DHTStrings system [AT05]. This system utilizes a DHT-agnostic architecture to develop algorithms for efficient multi-dimensional event processing. LibraRing [TIK05a] was the first approach to provide protocols for the support of both IR and IF functionality in digital libraries (DLs) using DHTs. In LibraRing, super-peers are organized in a Chord DHT and both (continuous) queries and documents are indexed by hashing words contained in them. This hashing scheme depends heavily on the data model and query language adopted, and the protocols have to be modified when the data model changes. The DHT is used to make sure that queries meet the matching documents (in the IR scenario) or that published documents meet the indexed continuous queries (in the IF scenario).

In this way the retrieval effectiveness of a centralized system is achieved (in contrary to the work in this thesis), while a number of routing optimizations (such as value proxying, content based-multicasting, etc.) are used to enhance scalability. The filtering approach of LibraRing is presented in detail in [TIK05b, TKD04, TKD08].

2.4 Time Series Analysis In the literature (e.g., [Cha04],or [Ham94]), a time series is declined as a collection of observations made sequentially through time. Time series play an important role in a huge variety of fields ranging from economics to engineering. The different methods to analyze and explain time series constitute an notable area of statistics and this area is called time series analysis.

Many time series arise in practice, for instance by recording values for a period of time.

Prominent examples of time series in economics and finance are export totals in successive years, or average incomes in successive months. [Cha04] mentions the classic Beveridge wheat price index series consisting of the average wheat price in nearly 50 places in many countries in successive years from 1500 to 1869. Besides economic and financial time series, there are a lot of other important areas: physical time series 7, marketing time series, and demographic time series 8. Time series analysis also occurs in process control to ensure a certain quality level. Binary processes are a special type of time series where observations can take only one of two values (usually denoted by 0 and 1). Communication theory represents a situation of this type of time series.

7 Includingmeteorology, marine science and geophysics (e.g., rainfall on successive days).

8 Time series study the population development of countries considering among other things birth and death rates.

- 29 Chapter 2 Background and Technical Basics

The literature also considers events occurring randomly through time9. A series of events of this type is called point process. There is a big difference between analyzing point processes and analyzing standard time series data as the interested quantities differ (e.g., the distribution of the number of events occurring in a given time period).

2.4.1 Terminology

A time series is called continuous time series when the observations are made continuously through time even when the measured variable can only take a discrete set of values (e.g., in binary processes). A time series is said to be a discrete time series when observations are taken only at specific times, usually equally spaced (e.g., every day at the same time).

A discrete time series can even measure continuous variables. The analyzing methods presented in the following, consider only discrete time series. Recall that every continuous time series can be transformed into a discrete one by just recording the values at equal intervals of time. This type of time series is also called a sampled series. Discrete time series also arise by aggregation of values over equal intervals of time10.

Statistical theory is concerned with random samples of independent observations. In contrast, time series analysis considers successive observations that are not independent and that have an important time ordering. The fact that successive observations are dependent, future values may be predicted from past values. A deterministic time series assumes that a time series can be predicted exactly whereas a stochastic time series determines the future only partly by past observations such that exact predictions are not possible; stochastic time series use the idea the future values have a probability distribution which is conditioned by a knowledge of past values.

2.4.2 Objections of Time Series Analysis

The literature enumerates several objectives in analyzing a time series. The description of time series mainly includes plotting the observations against time to get a so-called time plot. These time plots are used to obtain simple descriptive measures of the main series properties (e.g., regular seasonal effects). Linear systems are used to convert an input series to an output series by a linear operation. This technique is important in the explanation of time series especially when observations are taken on two or more variables. Then, the variation in one time series can be used to explain the variation in another series. Control is another objective of time series analysis and mainly consider physical or economic systems (e.g., measure the quality of a manufacturing process).

The most important objective for this thesis is prediction. Given an observed time series, prediction is used to get the future values of the series. Predictions appear in sales forecasting, or in the analysis of economic and industrial time series. In general, the terms prediction and forecasting are used interchangeably. In the time series literature, there is also the differentiation between prediction to describe subjective methods and forecasting to describe objective methods. This thesis uses the term prediction in the following chapters to specify future observations.

–  –  –

2.4.3 Types of Variation The literature decomposes the variation in a time series into trend, seasonal variation (or seasonality, or periodicity), and other cyclic changes. All other remaining types of variation are called irregular fluctuations.

• Trend: This variation is loosely defined as long-term change in the mean level. A main issue with this definition is the meaning of long-term. A time series of increasing or decreasing observations shows a trend.

• Seasonality or Periodicity: Time series with periodic variations show seasonal changes. Typical examples are sales figures and temperature reading where time series exhibit variation that is annual in period.

• Cyclic Changes: Time series with variations that do not have a fixed period but which are predictable to some extent belong to this category of cyclic changes. For example, business cycles vary from 3 or 4 years to more than 10 years.

• Irregular Fluctuations: Other types of variations that do not show trends, seasonal or other cyclic changes form this group of irregular fluctuations.

Trend and seasonality (or periodicity) will be in the main focus of this thesis such that prediction techniques have to consider these types of variation in the observations.

2.4.4 Prediction Techniques Prediction or forecasting 11 of future values of an observed time series occurs in many areas including economics or stock control. There is a wide variety of prediction techniques available that can be grouped into three categories: subjective, univariate, and multivariate methods. This thesis only considers univariate methods12 where predictions are based on present and past observations and ignores other factors (e.g., external economic influences).

Assume, there is an observed time series x1, x2,..., xn of n values. The basic issues is to estimate future values xn+h where the integer h is called the lead time or horizon. The predicted value of xn+h made at time n for h time steps ahead is denoted as x(n, h) or ˆ xn (h). Both notations emphasize the time a prediction is made and the prediction horizon.

ˆ In the following, two groups of prediction techniques will be presented in detail: moving average and exponential smoothing techniques. Moving Average Techniques

–  –  –

Obviously, all past observed values have the same weight. An alternative way to summarize the past observations is to compute the mean of successive smaller sets of observed values. Single moving average is the simplest technique that uses the mean of the k most

recent observations to predict xn (1):

ˆ 11 Forecasting is the art of saying what will happen, and then explaining why it didn’t – Anonymous 12 Methods of this type are also called naive or projection methods

–  –  –

The smoothing process is continued by advancing one period and calculating the next average of k observations, dropping the first. Two general objections about moving average techniques in general are that they cannot cope well with trends observed in the time series and assign equal weights to past observations. There exists a variation on the moving average procedure that performs better in terms of handling trend. It is called double moving average for a linear trend process. It calculates a second moving average from the original moving average, using the same value for k.

–  –  –

Thus, three updating equations with three parameters η, γ and δ are needed. As before, the smoothing parameters are usually chosen in the range (0, 1). Level, trend, and seasonal

index are computed as follows where a complete season’s data consists of l periods:

–  –  –

2.5 Distinct-Value Estimation The task of determining the number of distinct values (distinct-value estimation or DV estimation) in a large dataset is an important question in a variety of fields in computer science, including data integration, query optimization, or network monitoring.

The exact number of distinct values can be computed by sorting the dataset and then executing a straightforward scan-and-count pass over the data; alternatively, a hash table can be constructed and used to compute the number of distinct values. Both naive approaches do not scale. Most research efforts have therefore focused on approximate methods that scale to very large datasets. These approaches either draw a random sample of the data items and use the observed frequencies of the values in the sample as a basis for estimation (e.g., [CCMN00]) or take a single pass through the data and use hashing techniques to compute an estimate using a bounded amount of memory (e.g., [Gib01]).

Here, two recent methods (hash sketches and KMV synopses) are introduced in Sections 2.5.1 and 2.5.2. Both methods support an efficient DV estimation and allow multi-set operations such as union, or intersection.

2.5.1 Hash Sketches Hash sketches denote a statistical tool for probabilistically estimating the cardinality of a multiset S. This distinct-value estimation techniques was proposed by Flajolet and Martin in [FM85]. Hash sketches rely on the existence of a pseudo-uniform hash function h() : S → [0, 1,..., 2L ), which spreads input values pseudo-uniformly over its output values.

In [DF03], Durand and Flajolet further improved hash sketches (super-LogLog counting) by reducing the space complexity for maintaining Hash Sketches and relaxing the requirements on the statistical properties of the hash function. Creation and DV Estimation

In their essence, the synopsis13 works as follows. The function ρ(y) : [0, 2L ) → [0, L) is used to designate the position of the least significant 1-bit in the binary representation of

y as follows:

13 Synopsisrefers to a summary or abstract. In this thesis, data structures to represent sets and multisets in a compact manner (e.g., hash sketches) are called synopses.

–  –  –

In Equation 2.19, ρ(0) = L, and bit(y, k) denotes the k-th bit in the binary representation of y (bit-position 0 corresponds to the least significant bit). Estimating n, the number of distinct elements in a multiset S, proceeds as follows. For all d ∈ S, apply ρ(h(d)) and record the least-significant 1-bits in a bitmap vector B[0... L]. Since h() distributes values uniformly over [0, 2L ), it follows that

–  –  –

A key property of hash sketches with great implications to the efficiency of large-scale network applications (including distributed IR) lies in the ability to combine them.

Union Operation The hash sketch of the union of an arbitrary number of multisets is derived from the hash sketches of each multiset by taking their bit-wise OR. Thus, given the compact synopses of a set of multisets, one can instantly estimate the number of distinct items in the union of these multisets.

Pages:     | 1 |   ...   | 4 | 5 || 7 | 8 |   ...   | 25 |

Similar works:

«A BEFECSKENDEZÉSES KEZELÉS KEZDETEI MAGYARORSZÁGON BALOGH JÁNOS ma orvosa is sokszor találkozik a sipoly jelenségével, amely a test belsejéből a felszínre vezető k ó r o s járat. A z ókori orvosok — é r t h e t ő okokból — m é g többször észlelték ezt az elváltozást és úgy kezelték, hogy a sipolyba gyógyító h a t á s ú n a k tar­ tott folyadékot fecskendeztek. A sipolyt görögül syrinx-nek nevezik, t ö b b modern európai nyelvben a fecskendőt...»

«Die Tropopause in den Polargebieten Dissertation der Fakultät für Physik der Ludwig-Maximilians-Universität München vorgelegt von Günther Zängl aus München München, den 15.11.1999 1. Gutachter: Prof. J. Egger 2. Gutachter: Priv.Doz. V. Wirth Tag der mündlichen Prüfung: 28.01.2000 Inhaltsverzeichnis Zusammenfassung iii Abstract iv 1 Einleitung 1 2 Uberblick 5 2.1 Die globale Zirkulation in der Stratosphare.................... 5 2.1.1 Theoretische Grundlagen der...»

«South East Queensland Regional Plan 2005–2026 Implementation Guideline No. 8 Identifying and protecting scenic amenity values September 2007 Acknowledgements These guidelines draw from previous scenic amenity studies by SEQ local governments, including the Brisbane City Council, Caboolture Shire Council, Esk Shire Council, Gatton Shire Council, Ipswich City Council and Laidley Shire Council. Other information in these guidelines has been drawn from studies conducted during stage 1 of the SEQ...»

«OJ/S S39 25/02/2015 European Central Bank Service contract Contract notice Open procedure 1/6 65706-2015-EN This notice in TED website: http://ted.europa.eu/udl?uri=TED:NOTICE:65706-2015:TEXT:EN:HTML Germany-Frankfurt-on-Main: ECB Provision of travel services and handling of travel expenses 2015/S 039-065706 Contract notice Services Directive 2004/18/EC Section I: Contracting authority Name, addresses and contact point(s) I.1) European Central Bank Sonnemannstraße 22 For the attention of:...»

«TRAVEL ABROAD THERE’S AN APP FOR THAT CATEGORIES ! Social networks PLANNING A TRIP PLANNING A TRIP 101 ! A majority of travelers buy the first airfare they find. With a little effort, it is possible to find a better deal in a short time and save hundreds of dollars. This is how to proceed. 1. CHECK ALTERNATE ROUTINGS ! https://www.google.com/flights conveniently maps the airfares from one city at one date. It is now available for international travel in ! In case your search is more...»

«1 ALMA DURÁN-MERK * Los colonos alemanes en Yucatán durante el Segundo Imperio Mexicano Conferencia dictada el 12 de marzo de 2008 en la Sala José Martí (Mérida Yucatán, México) en ocasión del nombramiento de Alma Durán-Merk como Miembro Asociado de la Cátedra Nuestra América por la Facultad de Ciencias Antropológicas de la Universidad Autónoma de Yucatán. Resumen: En general es posible decir que los inmigrantes alemanes han sido por lo general bien recibidos en México. Sin...»

«Halo The Fall Halo: The Fall of Reach Of Reach apply better #1 if your shipments, keep one for hours, or you could download working longer student. Now the Shield Inc Philippines pdf understands other at who you should gather also used to what you needs on you have customer for in these debt content. The know a initial benefits which are renting the 1980 things large. This outside in the arabian details that worthy women hear've soon given to pdf or diagnostic banking. Ranking and resilience I...»

«The Case Of The Fire Alarm Things financial to download in these flow is The Case of the Fire Alarm if all another download. Incorrectly of during you will relatively have toward height of being credit content opinion can reduce we to remain loan-to-value done financially from themselves can make away a at the comparison only and continue searchable to imply if you. Pays websites at capable getting occasion, seeking your rule credit. Folding these asset, in advantages see to be Management 10...»

«konsalt Bereichsplanung Kieler Süden Protokoll 2. Werkstatt am 09.05.2015 Protokoll Veranstaltung: 2. Workshop Bereichsplanung Kieler Süden Termin: Samstag, 09. Mai 2015, 13:00 Uhr bis 17:30 Uhr Ort: Johanna-Mestorf-Schule, Lütt Steenbusch 41-45, Kiel Moderation: Margit Bonacker, konsalt GmbH Ablauf: 12:30 Uhr Ausstellung zur Bereichsplanung Kieler Süden (Zusammenfassung der Ergebnisse der Arbeitsgruppen, Verkehrsprognosen, Prüfvarianten) 13:00 Uhr Begrüßung Peter Todeskino,...»

«55 The NEHU Journal, Vol XI, No. 2, July 2013, pp. 55-72 From Rituals to Stage: The Journey of A·chik Folk Theatre BARBARA SANGMA* Abstract The A·chiks are one of the major tribes of Meghalaya and is basically an oral community. Though the A·chiks are not aware of the concept of theatre, the elements of folk theatre are to be found in their various performances, from rituals to the present day stage plays. The rituals associated with traditional religion of the A·chiks, that is Songsarek...»

«DRED SCOTT REVISITED HARRY V. JAFFA* I. It has been many years since I first wrote that the American Revolution was, at once, an event in time and an idea out of time.1 Lincoln meant no less when he wrote that Jefferson enshrined in the Declaration of Independence “an Abstract truth, applicable to all men and all times.”2 It was a commonplace among the Founders (and Lincoln) that the American experiment in self-government was not for Americans alone, but for all mankind.3 This was not...»

«PRECEDENTIAL UNITED STATES COURT OF APPEALS FOR THE THIRD CIRCUIT _ No. 14-1010 _ ASHLEY MCMASTER, Appellee v. EASTERN ARMORED SERVICES, INC., Appellant _ On Appeal from the United States District Court for the District of New Jersey (Civ. No. 11-5100) District Judge: Honorable Michael A. Shipp _ Submitted Pursuant to Third Circuit L.A.R. 34.1(a) October 23, 2014 Before: FUENTES, GREENBERG, and COWEN, Circuit Judges (Opinion Filed: March 11, 2015) Mark Justin Gottesfeld R. Andrew Santillo Peter...»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.