«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»

Chapter 6 Prototype Implementation

6.4.2 P2P Filtering Prototypes

Scribe [RKCD01] is a large-scale event notification infrastructure for topic-based publish/subscribe applications. It supports large numbers of topics, with a potentially large number of subscribers per topic. Scribe is built on top of Pastry [RD01a], and leverages Pastry’s reliability, self-organization and locality properties. Pastry is used to create a topic (group) and to build an efficient multi-cast tree for the dissemination of events to the topic’s subscribers. Hermes [FFS+01] is similar to Scribe because it uses the same underlying DHT, but it allows more expressive subscriptions by supporting the notion of an event type with attributes. Hermes offers sophisticated filtering capabilities that spare the user notifications about uninteresting events. From the user’s point of view, Hermes integrates the providers into a single source. Its simple provider interface makes it easy for publishers to join the service and thus to reach potential readers directly.

The pFilter system [TX03] is a global-scale decentralized information filtering and dissemination system for unstructured documents that connects potentially millions of computers in national (and international) computing Grids or ordinary desktops into a structured P2P overlay network. The pFilter system uses a hierarchical extension of the CAN DHT to filter unstructured documents and relies on multi-cast trees to notify subscribers. VSM and LSI can be used to match documents to user queries. The DHTStrings system [AT05] utilizes a DHT-agnostic architecture to develop algorithms for efficient multi-dimensional event processing. It addresses the issue of efficiently supporting queries over string attributes involving prefix, suffix, containment, and equality operators in large-scale data networks.
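As an illustration of the VSM matching mentioned above, the following is a minimal sketch that compares a document with a query by cosine similarity. It assumes raw term frequencies as weights and a fixed threshold; the weighting and matching rules actually used by pFilter are not given in this section, and the names `vector`, `cosine`, and `matches` are hypothetical.

```python
import math
from collections import Counter


def vector(text: str) -> Counter:
    """Represent a text as a bag of lower-cased terms with raw frequencies."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


def matches(document: str, query: str, threshold: float = 0.3) -> bool:
    # Assumption: a document matches when its similarity to the query
    # reaches a fixed threshold; the value 0.3 is arbitrary.
    return cosine(vector(document), vector(query)) >= threshold


hit = matches("peer to peer information filtering", "information filtering")
miss = matches("selective laser trabeculoplasty", "information filtering")
```

The same comparison can serve a one-time query and a stored continuous query; only the moment at which it is evaluated differs.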

PeerCQ is a totally decentralized system that performs information monitoring tasks over a network of peers with heterogeneous capabilities. It uses Continual Queries (CQs) as its primitives to express information monitoring requests. A primary objective of the PeerCQ system is to build a decentralized Internet-scale distributed information monitoring system, which is highly scalable, self-configurable and supports efficient and robust CQ processing.

PeerCQ’s most important contribution is that it takes into account peer heterogeneity and extends consistent hashing with simple load-balancing techniques based on appropriate assignment of peer identifiers to network peers.
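The idea of balancing load through identifier assignment can be sketched as follows. This is not PeerCQ's actual scheme, which is not detailed in this section; the sketch assumes the common virtual-node technique, in which a peer with higher capability receives proportionally more identifiers on the consistent-hashing ring. The names `Ring`, `add_peer`, and `lookup` are hypothetical.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a position on a 2**32 identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Ring:
    """Consistent-hashing ring with one or more identifiers per peer."""

    def __init__(self) -> None:
        self._positions: list[int] = []   # sorted ring positions
        self._owner: dict[int, str] = {}  # position -> peer name

    def add_peer(self, peer: str, capability: int) -> None:
        # Assumption: capability is a small positive integer; a peer with
        # capability c receives c identifiers (virtual nodes) on the ring.
        for i in range(capability):
            pos = ring_hash(f"{peer}#{i}")
            if pos in self._owner:
                continue  # skip the unlikely case of a collision
            bisect.insort(self._positions, pos)
            self._owner[pos] = peer

    def lookup(self, key: str) -> str:
        # A key is served by the first identifier clockwise from its hash.
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = Ring()
ring.add_peer("weak-peer", capability=1)
ring.add_peer("strong-peer", capability=8)

# Count how many continual queries each peer would be responsible for.
load = {"weak-peer": 0, "strong-peer": 0}
for n in range(1000):
    load[ring.lookup(f"continual-query-{n}")] += 1
```

With more identifiers, the stronger peer owns a larger share of the ring on average, so it tends to receive more queries; the exact split depends on the hash values.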

P2P-DIET [IKT04a] utilizes an expressive query language based on IR concepts and is implemented as an unstructured P2P network with routing techniques based on shortest paths and minimum weight spanning trees. P2P-DIET has been implemented on top of the open source DIET Agents Platform (see footnote 4) [HWBM02] and combines ad-hoc querying as found in other super-peer networks and also as proposed in the DIAS system [KKTR02].

Finally, the AGILE system [DFK05] presents Context-aware Information Filters (CIF) using two input streams with messages and context updates. The system extends existing index structures such that the indexes adapt to the message/update workload with satisfactory performance.

6.5 Discussion

This chapter presented the current prototype implementation of the MAPS approximate information filtering approach. The prototype extends the Minerva search system with additional components to realize filtering functionality. An extensive showcase illustrated the usage of the extended prototype system. The following chapter will present a use case for digital libraries that supports retrieval and filtering functionality under a single unifying framework.

Footnote 4: http://diet-agents.sourceforge.net/

Chapter 7 Digital Library Use Case

This chapter presents a digital library use case using the MinervaDL architecture. MinervaDL is built upon a two-tier version of the MAPS architecture and designed to support approximate retrieval and filtering functionality under a single unifying framework. The architecture of MinervaDL as introduced in [ZTW07] is able to handle huge amounts of data provided by digital libraries in a distributed and self-organizing manner. The super-peer architecture and the use of the distributed hash table as the routing substrate provide an infrastructure for creating large networks of digital libraries with minimal administration costs.

Section 7.1 introduces the main characteristics of this DL use case, and discusses some related work in this area. The high-level DL architecture including the involved components is presented in Section 7.2, whereas Section 7.3 explains in detail the appropriate protocols to ensure the two functionalities (retrieval and filtering). Section 7.4 discusses the two different scoring functions to ensure approximate search and filtering, and Section 7.5 presents the experimental evaluation of MinervaDL. Finally, Section 7.6 concludes this chapter.

7.1 Introduction

This chapter presents a novel DL architecture called MinervaDL; it is designed to support approximate information retrieval and filtering functionality in a single super-peer-based architecture. In contrast to the MAPS architecture presented in the previous chapters, MinervaDL is hierarchical like the architectures in [TIK05a, LC03, SMwW+03, TZWK07, RPTW08] and utilizes a DHT to achieve scalability, fault-tolerance, and robustness in its routing layer. The MinervaDL architecture allows handling huge amounts of data provided by DLs in a distributed and self-organizing way, and provides an infrastructure for creating large networks of digital libraries with minimal administration costs. There are two kinds of basic functionality that are offered in MinervaDL:

• Information Retrieval: In an information retrieval scenario (also known as one-time querying), a user poses a one-time query and the system returns all resources matching the query (e.g., all currently available documents relevant to the requested query).

• Information Filtering: In an information filtering scenario (also known as publish/subscribe, continuous querying, or selective dissemination of information), a user submits a continuous query (or subscription or profile) and will later be notified by the system about certain events of interest that take place (i.e., about newly published documents relevant to the continuous query).
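The difference between the two functionalities can be sketched in a few lines. This is only an illustration on a single machine, not the MinervaDL protocol: the class `TinyLibrary` and its methods are hypothetical, and matching is simplified to "every query term occurs in the document".

```python
from typing import Callable


def terms(text: str) -> set[str]:
    """Split a text into a set of lower-cased terms."""
    return set(text.lower().split())


class TinyLibrary:
    def __init__(self) -> None:
        self.documents: list[str] = []
        self.subscriptions: list[tuple[set[str], Callable[[str], None]]] = []

    def search(self, query: str) -> list[str]:
        """Information retrieval: answer once, from the documents available now."""
        wanted = terms(query)
        return [doc for doc in self.documents if wanted <= terms(doc)]

    def subscribe(self, query: str, notify: Callable[[str], None]) -> None:
        """Information filtering: store a continuous query for future documents."""
        self.subscriptions.append((terms(query), notify))

    def publish(self, document: str) -> None:
        """Store a new document and notify every matching subscription."""
        self.documents.append(document)
        for wanted, notify in self.subscriptions:
            if wanted <= terms(document):
                notify(document)


library = TinyLibrary()
library.publish("a survey of constraint programming")

inbox: list[str] = []
library.subscribe("constraint programming", inbox.append)

# One-time query: sees only what has been published so far.
first_answer = library.search("constraint programming")

# Later publications reach the subscriber without a new query.
library.publish("new results in constraint programming")
library.publish("advances in laser physics")
```

After these calls, `first_answer` holds only the survey, while `inbox` holds only the later matching paper; the subscription is not applied retroactively in this sketch.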

Figure 7.1: DHT-based Distributed Directory to Perform IR and IF.

The proposed DL architecture is built upon a distributed directory similar to [BMT+05b, BMT+05a] that stores metadata. Routing protocols for information filtering and retrieval use this directory information to perform the two functionalities. Figure 7.1 shows this design principle. MinervaDL identifies three main components: super-peers, providers, and consumers. Providers are implemented by information sources (e.g., digital libraries) that want to expose their content to the rest of the MinervaDL network, while consumers are utilized by users to query for and subscribe to new content. Super-peers utilize the Chord DHT [SMLN+03] to create a conceptually global, but physically distributed directory that manages aggregated statistical information about each provider’s local knowledge in compact form. This distributed directory allows information consumers to collect statistics about information sources and rank them according to their likelihood of answering a specific information need. This reduces network costs and enhances scalability since only the most relevant information sources are queried. In MinervaDL, both publications and (one-time and continuous) queries are interpreted using the vector space model (VSM), but other appropriate data models and languages could also be used, e.g., latent semantic indexing (LSI) or language models.
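The directory principle described above can be sketched as follows. This is a simplified illustration, not the MinervaDL implementation: terms are assigned to super-peers by a successor rule on a hash ring (in the spirit of Chord, but without finger tables or network messages), the per-term statistic is assumed to be the fraction of a provider's documents that contain the term, and the ranking simply sums these fractions. The scoring functions actually used are the subject of Section 7.4 and are not reproduced here. All class and function names are hypothetical.

```python
import bisect
import hashlib
from collections import defaultdict


def ident(key: str, bits: int = 32) -> int:
    """Hash a key to a position on the identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Directory:
    """Term -> per-provider statistics, partitioned over super-peers."""

    def __init__(self, super_peers: list[str]) -> None:
        self._ring = sorted((ident(sp), sp) for sp in super_peers)
        # One store per super-peer: term -> {provider: statistic}.
        self._stores: dict[str, dict[str, dict[str, float]]] = {
            sp: defaultdict(dict) for sp in super_peers
        }

    def responsible(self, term: str) -> str:
        # Successor rule: first super-peer clockwise from the term's hash.
        positions = [pos for pos, _ in self._ring]
        idx = bisect.bisect_left(positions, ident(term)) % len(self._ring)
        return self._ring[idx][1]

    def post(self, provider: str, term: str, doc_fraction: float) -> None:
        """A provider publishes its statistic for one term."""
        self._stores[self.responsible(term)][term][provider] = doc_fraction

    def statistics(self, term: str) -> dict[str, float]:
        """A consumer fetches all provider statistics for one term."""
        return dict(self._stores[self.responsible(term)].get(term, {}))


def rank_providers(directory: Directory, query: str, k: int) -> list[str]:
    # Assumed scoring: sum of the per-term statistics of each provider.
    scores: dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        for provider, fraction in directory.statistics(term).items():
            scores[provider] += fraction
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [provider for provider, _ in ranked[:k]]


directory = Directory(["sp-literature", "sp-sciences", "sp-biology"])
directory.post("cs-library", "constraint", 0.20)
directory.post("cs-library", "programming", 0.35)
directory.post("bio-library", "programming", 0.02)
directory.post("lit-library", "poetry", 0.40)

top = rank_providers(directory, "constraint programming", k=2)
```

Only the providers in `top` would then receive the query (or the subscription), which is where the reduction in network cost comes from.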

7.1.1 A Motivating Example

To give a better understanding of the potential benefits of a system that integrates both IR and IF functionality in the DL context, consider the example of John, a professor in computer science, who is interested in constraint programming. He wants to follow the work of prominent researchers in this area. He regularly uses the digital library of his department and a handful of other digital libraries to search for new papers in the area.

Even though searching for interesting papers this week turned up nothing, a search next week may turn up new information. Clearly, John would benefit from a system that not only provides search functionality integrating a large number of sources (e.g., organizational digital libraries or even libraries from big publishing houses), but also captures his long-term information need (e.g., in the spirit of [TIK05a, PB02, YJ06]).

This system would be a valuable tool, beyond anything supported in current digital library systems, that would allow John to save time and effort. In the example scenario, John's university comprises three geographically distributed campuses (Literature, Sciences, and Biology) and each campus has its own local digital library. In the context of MinervaDL, each campus would maintain its own super-peer, which provides an access point for the provider representing the campus’ digital library, and the clients deployed by users such as John. Other super-peers may also be deployed by larger institutions, like research centers or content providers (e.g., CiteSeer, ACM, Springer, or Elsevier), to provide access points for their users (students, faculty or employees) and make the contents of their digital libraries available in a timely way. MinervaDL offers an infrastructure, based on concepts of P2P systems, for organizing the super-peers in a scalable, efficient and self-organizing architecture. This architecture allows seamless integration of information sources, enhances fault-tolerance, and requires minimal administration costs.

7.1.2 The Evolution of Digital Libraries

In [GT02], a digital library is defined as follows: A digital library is a library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible by computers. The digital content may be stored locally, or accessed remotely via computer networks. The term digital library was first used in 1988 in a report to the Corporation for National Research Initiatives. The older names electronic library or virtual library are also occasionally used, though electronic library nowadays more often refers to portals, often provided by government or public agencies (e.g., the Florida Electronic Library). Digital libraries (DLs) have been made possible by the integration and use of a number of information technologies, the availability of digital content on a global scale, and strong demand from users who are now online. They are destined to become an essential part of the information infrastructure in the 21st century (the Information Age, also known as the Digital Age or Wireless Age).

The term digital library can be applied to a wide range of collections and organizations, but, to be considered a digital library, an online collection of information must be managed by and made accessible to a community of users. Many well-known digital libraries are older than the Web (e.g., Project Gutenberg). But, as a result of the development of the Web and its search potential, digital libraries are now moving towards Web-based environments.

A distinction is often drawn between content created in a digital format, known as born-digital, and content converted from a physical medium (e.g., paper) by digitizing.

Most digital libraries provide a search interface to locate resources (typically Deep Web resources that cannot be located by Web search engines). Some digital libraries create special pages or sitemaps to allow search engines to find all their resources. Digital libraries frequently use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [LdS01] to expose their metadata to other digital libraries, and search engines like Google and Yahoo.

There are two general strategies for searching a federation of digital libraries: (i) distributed searching, and (ii) searching previously harvested metadata.

Distributed searching typically involves a client sending multiple search requests in parallel to a number of servers in the federation. The results are gathered, duplicates are eliminated or clustered, and the remaining items are sorted and presented back to the client. Protocols like Z39.50 are frequently used in distributed searching. A benefit to this approach is that the resource-intensive tasks of indexing and storage are left to the respective servers in the federation. A drawback to this approach is that the search mechanism is limited by the different indexing and ranking capabilities of each database, making it difficult to assemble a combined result consisting of the most relevant items found.
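The gather, deduplicate, and sort steps of distributed searching can be sketched as follows. The servers are plain Python callables standing in for remote search targets; no Z39.50 client is involved, and the names `Hit`, `federated_search`, `campus_library`, and `publisher_library` are hypothetical. Duplicates are assumed to be detectable by a shared identifier, and the merged list is naively sorted by the raw server scores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Hit(NamedTuple):
    identifier: str  # shared identifier used to detect duplicates
    title: str
    score: float     # relevance score as computed by the answering server


Server = Callable[[str], list[Hit]]


def federated_search(servers: list[Server], query: str) -> list[Hit]:
    # Send the query to all servers in parallel and gather their results.
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        result_lists = list(pool.map(lambda server: server(query), servers))

    # Eliminate duplicates, keeping the best-scored copy of each item.
    best: dict[str, Hit] = {}
    for hits in result_lists:
        for hit in hits:
            known = best.get(hit.identifier)
            if known is None or hit.score > known.score:
                best[hit.identifier] = hit

    # Sort the remaining items by score (ties broken by identifier).
    return sorted(best.values(), key=lambda h: (-h.score, h.identifier))


def campus_library(query: str) -> list[Hit]:
    return [Hit("doc-1", "Constraint Programming Basics", 0.9),
            Hit("doc-2", "Global Constraints", 0.4)]


def publisher_library(query: str) -> list[Hit]:
    return [Hit("doc-2", "Global Constraints", 0.7),
            Hit("doc-3", "Search Heuristics", 0.5)]


merged = federated_search([campus_library, publisher_library],
                          "constraint programming")
```

The final sort also exposes the drawback named above: the scores 0.9, 0.7, and 0.5 come from servers with different ranking functions, so comparing them directly is not guaranteed to be meaningful.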

Searching over previously harvested metadata involves searching a locally stored index of information that has previously been collected from the libraries in the federation. When a search is performed, the search mechanism does not need to make connections with the digital libraries it is searching, because it already has a local representation of the information. This approach requires the creation of an indexing and harvesting mechanism which operates regularly; it connects to all the digital libraries and queries the whole collection in order to discover new and updated resources. OAI-PMH is frequently used by digital libraries for allowing metadata to be harvested. A benefit to this approach is that the search mechanism has full control over indexing and ranking algorithms, possibly allowing more consistent results. A drawback is that harvesting and indexing systems are more resource-intensive and therefore expensive.

Now, large-scale digitization projects are underway (e.g., Google Books). With continued improvements in book handling and presentation technologies, and development of alternative depositories and business models, digital libraries are rapidly growing in popularity. Beyond that, libraries have just ventured into audio and video collections (e.g., digital libraries such as the Internet Archive).
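The harvested-metadata strategy described above can be sketched as a local inverted index that is filled by a periodic harvest and then searched without any network access. The request builder uses the OAI-PMH `ListRecords` verb with the `oai_dc` metadata prefix and the optional `from` argument; everything else is an assumption for illustration: the base URL is a placeholder, the sources are plain callables instead of HTTP clients with XML parsing, and only titles are indexed. The names `list_records_url` and `HarvestedIndex` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, since: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for Dublin Core metadata."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if since is not None:
        params["from"] = since  # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# A source yields (identifier, title) pairs. A real harvester would fetch
# the URL above and parse the XML response; that part is omitted here.
Source = Callable[[], list[tuple[str, str]]]


class HarvestedIndex:
    def __init__(self) -> None:
        self.titles: dict[str, str] = {}
        self.postings: dict[str, set[str]] = defaultdict(set)

    def harvest(self, sources: list[Source]) -> None:
        """Contact every library and index what it returns."""
        for source in sources:
            for identifier, title in source():
                # Simplification: stale postings of updated records are
                # not removed in this sketch.
                self.titles[identifier] = title
                for term in title.lower().split():
                    self.postings[term].add(identifier)

    def search(self, query: str) -> list[str]:
        """Answer from the local index only; no library is contacted."""
        matching = [self.postings.get(t, set()) for t in query.lower().split()]
        if not matching:
            return []
        found = set.intersection(*matching)
        return sorted(self.titles[i] for i in found)


index = HarvestedIndex()
index.harvest([
    lambda: [("oai:a:1", "Constraint Programming Basics")],
    lambda: [("oai:b:7", "Logic Programming"),
             ("oai:b:9", "Constraint Solving")],
])

url = list_records_url("https://library.example.org/oai", since="2007-01-01")
results = index.search("programming")
```

Because ranking and indexing happen in one place, results are consistent across libraries, at the price of running and storing the harvest locally.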
