«Approximate Information Filtering in Structured Peer-to-Peer Networks Christian Zimmer Max-Planck Institute for Informatics Saarbrücken ...»
• Subscribers are interested in receiving documents and data items matching their information demand. Thus, for each continuous query, subscribers have to collect metadata from the directory to perform publisher selection. To avoid harvesting metadata too frequently, subscribers may be charged for that service. In the second step of the payment model, subscribers send the continuous query to providers to store the information demand. Under this assumption, subscribers have to pay for storing their continuous queries to prevent them from subscribing to a high number of publishers.
If a subscriber does not receive enough or receives unsatisfying notiﬁcations from a publisher, the payment ends and the continuous query is removed.
• Publishers may also need to pay to disseminate their metadata to the directory, since this improves the probability to get selected for storing a continuous query. If a publisher stores a continuous query, and publishes a new document matching this request, a notiﬁcation is send to the subscribers. Under this model, the publisher may charge on a pay-per-view basis, and the subscriber might choose or not to view
This simple payment model needs to consider some more issues, e.g., computational power restrictions. Currently, there are eﬀorts to introduce economic interactions into the MAPS architecture.
Appendix A Abbreviations Here, the appendix presents a list of abbreviations used in the present doctoral thesis. All abbreviations of table A.1 have been introduced in the previous chapters.
Appendix B Zeitgeist Queries This chapter in the appendix lists the set of used Zeitgeist queries. the list of queries from [NBMW06] is used, and distinguishes one-, two-, and three-key queries as shown in Tables B.1, B.2, and B.3. Notice that all queries are stemmed using Porter Stemmer, and stop words are removed.
Table B.2: 41 Two-Key Queries from Zeitgeist Query-Log.
celebr love island charli chocol factori red nose day Table B.3: 3 Three-Key Queries from Zeitgeist Query-Log.
Appendix C Acknowledgements Writing a doctoral thesis needs assistance and supervision. Thus, sincere thanks are given to all that helped me to make this doctoral thesis during the last years.
• First of all, I have to thank Prof. Dr.-Ing. Gerhard Weikum for his support throughout the years of my PhD. His experiences and cooperation improved many ideas. In the last years, I enjoyed the great team work with Dr. Christos Tryfonopoulos. It was a great pleasure to work together in such a friendly atmosphere. I have to say thank you to all members of the doctoral committee, Prof. Dr. Jens Dittrich, Prof. Dr.
Manolis Koubarakis, and Dr.-Ing. Ralf Schenkel, for their comments and suggestions on improvements and extensions of this work.
• Last years, I was allowed to work in a great research group. So, many thanks to all members of the Databases and Information Systems Department for the nice and international atmosphere. A special thank goes to the secretary team and the whole Max-Planck Institute for Informatics for providing a professional research infrastructure.
• Special thanks go to my family and my parents for their love and encouragement throughout all my life endeavors. Their courage in life and their strength in diﬃculties was an extra motivation for this eﬀort.
Bibliography [Abe01] Karl Aberer. P-Grid: A Self-Organizing Access Structure for P2P Information Systems. In Carlo Batini, Fausto Giunchiglia, Paolo Giorgini, and Massimo Mecella, editors, Cooperative Information Systems, 9th International Conference, CoopIS 2001, Trento, Italy, September 5-7, 2001, Proceedings, volume 2172 of Lecture Notes in Computer Science, pages 179–194. Springer, 2001.
20, 28, 85 [Ac04] Yanif Ahmad and Ugur Çetintemel. Networked Query Processing for Distributed Stream-Based Applications. In Mario A. Nascimento, M. Tamer Özsu, Donald Kossmann, Renée J. Miller, José A. Blakeley, and K. Bernhard Schiefer, editors, (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, August 31 - September 3 2004, pages 456–467. Morgan Kaufmann, 2004. 39 [ADH05] Karl Aberer, Anwitaman Datta, and Manfred Hauswirth. Multifaceted Simultaneous Load Balancing in DHT-Based P2P Systems: A New Game with Old Balls and Bins. In Özalp Babaoglu, Márk Jelasity, Alberto Montresor, Christof Fetzer, Stefano Leonardi, Aad P. A. van Moorsel, and Maarten van Steen, editors, Self-star Properties in Complex Information Systems, Conceptual and Practical Foundations, volume 3460 of Lecture Notes in Computer Science, pages 373–391. Springer, 2005. 141 [AF00] Mehmet Altinel and Michael J. Franklin. Eﬃcient Filtering of XML Documents for Selective Dissemination of Information. In Amr El Abbadi, Michael L. Brodie, Sharma Chakravarthy, Umeshwar Dayal, Nabil Kamel, Gunter Schlageter, and Kyu-Young Whang, editors, VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10-14, 2000, Cairo, Egypt, pages 53–64. Morgan Kaufmann, 2000. 28 [AIS93] Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining Association Rules between Sets of Items in Large Databases. In Peter Buneman and Sushil Jajodia, editors, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26-28, 1993, pages 207–216. ACM Press, 1993. 89 [AT05] Ioannis Aekaterinidis and Peter Triantaﬁllou. Internet Scale String Attribute Publish/Subscribe Data Networks. In Otthein Herzog, Hans-Jörg Schek, Norbert Fuhr, Abdur Chowdhury, and Wilfried Teiken, editors, Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31 - November 5, 2005, pages 44–
51. ACM, 2005. 29, 85, 119
[BKK+ 03] Hari Balakrishnan, M. Frans Kaashoek, David R. Karger, Robert Morris, and Ion Stoica. Looking Up Data in P2P systems. Communications of the ACM (CACM), 46(2):43–48, 2003. 16 [Blo70] Burton H. Bloom. Space/Time Trade-oﬀs in Hash Coding with Allowable Errors. Communications of the ACM (CACM), 13(7):422–426, 1970. 53 [BM96] Timothy A. H. Bell and Alistair Moﬀat. The Design of a High Performance Information Filtering System. In Hans-Peter Frei, Donna Harman, Peter Schäuble, and Ross Wilkinson, editors, Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’96, August 18-22, 1996, Zurich, Switzerland (Special Issue of the SIGIR Forum), pages 12–20. ACM, 1996. 28 [BMPC07] Matthias Bender, Sebastian Michel, Josiane Xavier Parreira, and Tom Crecelius. P2P Web Search: Make It Light, Make It Fly (Demo). In CIDR 2007, Third Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 7-10, 2007, Online Proceedings, pages 164–168.
www.crdrdb.org, 2007. 105 [BMS+ 06] Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Martin Theobald, and Gerhard Weikum. IO-Top-k: Index-access Optimized Top-k Query Processing. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006, pages 475–486. ACM, 2006. 27 [BMT+ 05a] Matthias Bender, Sebastian Michel, Peter Triantaﬁllou, Gerhard Weikum, and Christian Zimmer. Improving Collection Selection with Overlap Awareness in P2P Search Engines. In Ricardo A. Baeza-Yates, Nivio Ziviani, Gary Marchionini, Alistair Moﬀat, and John Tait, editors, SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, August 15-19, 2005, pages 67–74. ACM, 2005. 38, 41, 42, 46, 53, 106, 108, 122, 131 [BMT+ 05b] Matthias Bender, Sebastian Michel, Peter Triantaﬁllou, Gerhard Weikum, and Christian Zimmer. MINERVA: Collaborative P2P Search. In Klemens Böhm, Christian S. Jensen, Laura M. Haas, Martin L. Kersten, Per-Åke Larson, and Beng Chin Ooi, editors, Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005, pages 1263–1266. ACM, 2005. 12, 106, 107, 122, 140 [BMTW06] Matthias Bender, Sebastian Michel, Peter Triantaﬁllou, and Gerhard Weikum. Global Document Frequency Estimation in Peer-to-Peer Web Search. In Ninth International Workshop on the Web and Databases, WebDB 2006, Chicago, Illinois, USA, June 30, 2006, 2006. 108 [BMW06] Matthias Bender, Sebastian Michel, and Gerhard Weikum. P2P Directories for Distributed Web Search: From Each According to His Ability, to Each According to His Needs. In Roger S. Barga and Xiaofang Zhou, editors, Proceedings of the 22nd International Conference on Data Engineering Workshops, ICDE 2006, 3-7 April 2006, Atlanta, GA, USA, page 51. IEEE Computer Society, 2006. 71
[CCMN00] Moses Charikar, Surajit Chaudhuri, Rajeev Motwani, and Vivek R.
Narasayya. Towards Estimation Error Guarantees for Distinct Values. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, May 15-17, 2000, Dallas, Texas, USA, pages 268–279. ACM, 2000. 33 [CCR04] Miguel Castro, Manuel Costa, and Antony I. T. Rowstron. Performance and Dependability of Structured Peer-to-Peer Overlays. In 2004 International Conference on Dependable Systems and Networks (DSN 2004), 28 June - 1 July 2004, Florence, Italy, Proceedings, pages 9–18. IEEE Computer Society, 2004. 143 [CDTW00] Jianjun Chen, David J. DeWitt, Feng Tian, and Yuan Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. In Weidong Chen and Jeﬀrey F. Naughton and Philip A. Bernstein, editor, Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 16-18, 2000, Dallas, Texas, USA, pages 379–390. ACM, 2000. 39 [CF03] Sirish Chandrasekaran and Michael J. Franklin. PSoup: A System for Streaming Queries Over Streaming Data. VLDB Journals, 12(2):140–156, 2003. 39 [CFGR02] Chee Yong Chan, Pascal Felber, Minos N. Garofalakis, and Rajeev Rastogi.
Eﬃcient Filtering of XML Documents with XPath Expressions. In Proceedings of the 18th International Conference on Data Engineering, 26 February - 1 March 2002, San Jose, CA, pages 235–244. IEEE Computer Society, 2002.
[CG05] Graham Cormode and Minos N. Garofalakis. Sketching Streams Through the Net: Distributed Approximate Query Tracking. In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005, pages 373–384. ACM, 2005. 36 [CGR05] Gianna M. Del Corso, Antonio Gulli, and Francesco Romani. Ranking a Stream of News. In Allan Ellis and Tatsuya Hagino, editors, Proceedings of the 14th international conference on World Wide Web, WWW 2005, Chiba, Japan, May 10-14, 2005, pages 97–106. ACM, 2005. 28 [Cha02] Soumen Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan-Kauﬀman, 2002. 24, 25 [Cha04] Christopher Chatﬁeld. The Analysis of Time Series: An Introduction. CRC Press, 2004. 29, 33, 54, 131 [CIKN04] Paul-Alexandru Chirita, Stratos Idreos, Manolis Koubarakis, and Wolfgang Nejdl. Publish/subscribe for rdf-based p2p networks. In Christoph Bussler,
John Davies, Dieter Fensel, and Rudi Studer, editors, The Semantic Web:
Research and Applications, First European Semantic Web Symposium, ESWS 2004, Heraklion, Crete, Greece, May 10-12, 2004, Proceedings, volume 3053 of Lecture Notes in Computer Science, pages 182–197. Springer, 2004. 124 [CLC95] James P. Callan, Zhihong Lu, and W. Bruce Croft. Searching Distributed Collections with Inference Networks. In Edward A. Fox, Peter Ingwersen, and Raya Fidel, editors, SIGIR’95, Proceedings of the 18th Annual International
[FFS+ 01] Daniel Faensen, Lukas Faulstich, Heinz Schweppe, Annika Hinze, and Alexander Steidinger. Hermes: A Notiﬁcation Service for Digital Libraries. In ACM/IEEE Joint Conference on Digital Libraries, JCDL 2001, Roanoke, Virginia, USA, June 24-28, 2001, Proceedings, pages 373–380. ACM, 2001.
41, 119 [FJL+ 01] Françoise Fabret, Hans-Arno Jacobsen, François Llirbat, João Pereira, Kenneth A. Ross, and Dennis Shasha. Filtering Algorithms and Implementation for Very Fast Publish/Subscribe. In ACM SIGMOD Conference 2001: Santa Barbara, CA, USA, pages 115–126. ACM, 2001. 28 [FLN01] Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal Aggregation Algorithms for Middleware. In Proceedings of the Twentieth ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems, May 21Santa Barbara, California, USA. ACM, 2001. 26 [FM85] Philippe Flajolet and G. Nigel Martin. Probabilistic Counting Algorithms for Data Base Applications. Journal of Computer and System Sciences (JCSS), 31(2):182–209, 1985. 33 [FPC+ 99] James C. French, Allison L. Powell, James P. Callan, Charles L. Viles, Travis Emmitt, Kevin J. Prey, and Yun Mou. Comparing the Performance of Database Selection Algorithms. In SIGIR ’99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 15-19, 1999, Berkeley, CA, USA, pages 238–