«Abstract. In a sensor network information from multiple nodes must usually be aggregated in order to accomplish a certain task. A natural way to view ...»
Distance-Sensitive Information Brokerage
in Sensor Networks
Stefan Funke1 Leonidas J. Guibas1 An Nguyen1 Yusu Wang2
Department of Computer Science
Stanford University, Stanford, CA 94305
Department of Computer Science and Engineering
The Ohio State University, Columbus, OH 43210
Abstract. In a sensor network information from multiple nodes must
usually be aggregated in order to accomplish a certain task. A natural
way to view this information gathering is in terms of interactions between nodes that are producers of information, e.g., those that have collected data, detected events, etc., and nodes that are consumers of information, i.e., nodes that seek data or events of certain types. Our overall goal in this paper is to construct eﬃcient schemes allowing consumer and producer nodes to discover each other so that the desired information can be delivered quickly to those who seek it. Here, eﬃciency means both limiting the redundancy of where producer information is stored, as well as bounding the consumer query times. We introduce the notion of distance-sensitive information brokerage and provide schemes for eﬃciently bringing together information producers and consumers at a cost proportional to the separation between them — even though neither the consumers nor the producers know about each other beforehand.
Our brokerage scheme is generic and can be implemented on top of several hierarchical routing schemes that have been proposed in the past, provided that they are augmented with certain key sideway links. For such augmented hierarchical routing schemes we provide a rigorous theoretical performance analysis, which further allows us to prove worst case query times and storage requirements for our information brokerage scheme. Experimental results demonstrate that the practical performance of the proposed approaches far exceeds their theoretical (worstcase) bounds. The presented algorithms rely purely on the topology of the communication graph of the sensor network and do not require any geographic location information.
1 Introduction Early sensor networks were primarily data acquisition systems, where the information collected by the sensor nodes was to be aggregated and routed to a central base station. Newer generations of sensor networks, however, act more as peer-to-peer systems, where arbitrary nodes in the network may wish to collect information about measurements and events elsewhere in the network. Furthermore, the information needed may be quite speciﬁc, with only a very small amount of sensor data being relevant for any particular query. This peer-to-peer view is necessitated as sensor networks expand to serve multiple geographically dispersed users, as more powerful mobile nodes move through a static sensor network and use it as a communication backbone to issue queries and collect data, or even to process complex queries, where sensor nodes may ﬁnd it necessary to issue sub-queries themselves. The basic problem this creates is that of information brokerage: how producers of information, e.g., nodes that have collected data, detected events, etc., and consumers of information, i.e., nodes that seek data or events of certain types, can ﬁnd out about each other and exchange the desired information.
Information brokerage is closely coupled with node naming and routing: even if we know the exact location of the information we want in the network, we still need to discover a good path for its retrieval. As sensor networks scale to larger sizes, the issue of information locality becomes more important. It is natural to expect that a consumer will be more interested in data collected near its current location. This is because such data can be accessed at lower communication cost/delay, and because in almost all sensor network applications local information has higher value and relevance to the task at hand than remote information.
The main problem studied in this paper is what we call distance-sensitive information brokerage — information brokerage where the cost for a consumer node to discover a certain piece of information is proportional to the network distance to where that information was collected. We want to have this property of distance sensitivity even though neither the consumer node nor the producer node involved in the information exchange know directly anything about the location of the other node in the network. Current common information brokerage schemes do not have this property. For example, directed diﬀusion  performs ﬂooding and thus in a 2-D network will visit O(d2 ) nodes to reach a distance d from a consumer (sink), while geographic hash tables (GHT)  may hash the information quite far away from a nearby producer-consumer pair.
Our goal in this paper is to develop a uniﬁed framework for routing and information brokerage which can provide provable guarantees for both good (almostoptimal) routing paths and distance-sensitive information brokerage. Note that in all producer-consumer brokerage schemes there is a trade-oﬀ between the time and space eﬀort for information diﬀusion when producer nodes record new data and have new detections, vs. the query time of consumer nodes to discover this information. In this work we aim at minimizing the amount of work/storage that producers have to invest so that they can be discovered within the consumers’ budget, that is, to be distance-sensitive. We focus on the static case typical for sensor networks where nodes do not move during the lifetime of the network, though links may fail or nodes may die. We also assume that, at any one time, only a small fraction of the network nodes have information to be made available to the network (such as sightings/measurements of rather exceptional events).
Related work. Hierarchies for addressing and routing within networks have been used for a long time and form the basis of the standard TCP/IP protocols. The basic idea is to deﬁne a tree-like hierarchy of node clusters, based on the inter-node distances in the network. This tree structure is then used to derive unique addresses for all nodes, based on which local routing schemes can be developed. Many previous hierarchical approaches have designated nodes as gateways to route between clusters [1, 22], causing unbalanced node traﬃc, as well as making routing sensitive to gateway node failures. Furthermore, since hierarchies partition the network, there can be nearby nodes that end up in different clusters even at the top level, causing the routing paths to be possibly much longer than the true shortest paths, violating distance sensitivity. Solutions have been proposed for this diﬃculty by introducing cross-branch links in the clustering hierarchy by Tsuchiya  and others, and empirical studies have established their eﬀectiveness. For the case where the underlying communication graph has constant doubling dimension , it has been shown very recently in  (although in a diﬀerent context) that paths of bounded dilation can be achieved. Chan et al.  present a diﬀerent hierarchical framework to achieve slightly better results, but their construction is based on a probabilistic argument and a non-trivial derandomization thereof.
Since sensor networks are embedded in the physical world, several routing schemes attempt to exploit naming and routing using this host space to gain eﬃciency and avoid expensive preprocessing. For example, geographic routing schemes [2, 15, 16] name nodes by their geographic coordinates and provide local rules for forwarding messages towards their target. Such schemes may have problems in the presence of holes, in which case packets might get stuck in local minima of the distance function to the target. Protocols like GPSR () or the one proposed by Bose et. al.  have been developed to alleviate these problems, and indeed they can guarantee delivery of the packets by performing a perimeter routing step around network holes, after an appropriate planarization of the network graph. Still, the paths might be considerably worse than the true shortest paths. In particular, Kuhn et. al. in  show that if the shortest path has distance d, any geographic routing algorithm might produce a path of length Ω(d2 ). Furthermore, in many situations, it is challenging to obtain geographic coordinates. Various approaches have been developed for cases in which either a few nodes  or no nodes [6, 21, 23] are aware of the geographic positions. However, a robust routing scheme with proven guaranteed performance for sensor networks with arbitrary underlying topology is still missing.
At the same time, in the past few years, sensor networks have started to serve more as information processing mechanisms instead of simply as data collection tools [7, 14, 18, 25, 29], requiring more sophisticated operations, such as data aggregation and range queries. From this point of view, the location of the sensor that owns information becomes less important than the information itself. This explains the rise of various data-centric information storage and retrieval schemes . A representative is the Geographic Hashing Table (GHT) approach , where each type σ of information (like the measured occurrence of substance A) is mapped to a speciﬁc node v by using a geographic hash function which depends only on the information type σ and is commonly known to every node in the network. Upon detection of A, a producer sends a message to node v, indicating its possession of some data of type σ. Any consumer can then obtain the data by ﬁrst visiting v to ﬁnd out who owns it and then retrieving it from the owner (many variations are possible). This node v, however, might be far away from both the producer and the consumer even when the producer and the consumer are very close in the network. The problem can be partly alleviated by the GLS  approach (originally proposed for providing location services on mobile nodes), where a producer performs an information diﬀusion process by sending a message to a list of server nodes determined by its location and the data type. A consumer can then retrieve this data in time proportional to the distance to the producer in the underlying hierarchy, which in the case of GLS is a positional quad-tree. But since — as in the case of hierarchy-based routing strategies — nearby nodes might be far away in the hierarchy, the consumer still does not experience distance-sensitive query times. Furthermore this assumes the availability of geographic location information and an auxiliary ID server structure that has to be precomputed to allow for information brokerage3. Very recently, an approach  combining Geographic Hash Tables with landmark-based routing via the GLIDER  scheme has been proposed. There are also several other approaches focusing on data aggregation, multi-resolutional storage, or database-like queries [10, 8, 20]. They all suﬀer from the above two problems, however.
Closest to our approach is the work of , where the authors developed a location/address lookup service called L+ for mobile nodes in a landmarkbased routing scenario aiming at distance sensitivity in the queries, that is, the time required to lookup the address of a target node should be proportional to its distance. Their construction could also be extended to an information brokerage scheme (though they did not do so). Their simulation results show the eﬀectiveness of the approach, even in a mobile scenario, but no rigorous theoretical analysis was conducted, which this paper aims to provide.
Our contributions. In this paper we analyze augmented hierarchical decomposition schemes for routing and information brokerage built on only the topology of the communication graph. While such routing hierarchies have been around for many years and a related theoretical analysis has been very recently found in a diﬀerent context , this paper is the ﬁrst to present a uniﬁed generic framework for both routing and information brokerage, and this framework as well as the accompanying analysis can be applied to various implementations of hierarchical decompositions (such as those in [1, 9, 22]).
Our framework can be applied to networks with arbitrary communication graphs and guarantees distance-sensitivity of both routing as well as information brokerage. If the shortest path metric of the communication graph has bounded doubling dimension , we can guarantee that the routing tables that The GLS paper did not address the information brokerage application, though the ID service presented there, which provides a mapping of unique node IDs to geographic coordinates, can be seen as a special case of information brokerage.
need to be installed at every node have size only O(log n) bits. A metric space has bounded doubling dimension if any ball with radius R in the metric space can be covered by a constant number of balls with radius R/2. In practice, sensor networks frequently experience low-level link and node volatility. We show through simulations that our routing protocol is robust: it performs gracefully against the failure of a small fraction of network nodes, due to the absence of any backbone or hub structure.
Our information brokerage scheme is built on top of the presented routing structure at no extra overhead. To our knowledge, this is the ﬁrst approach that works for arbitrary communication graphs and has theoretical performance guarantees in terms of the eﬀort on both the producer and consumer sides. In particular, after the producer stores references to its data at a small number (O(log n) in case of metrics of bounded doubling dimension) of nodes (in a multi-resolution manner), any consumer can retrieve it in a distance-sensitive way. Furthermore, by visiting only O(d) nodes, the consumer can collect all occurrences of a particular type of data within a neighborhood of radius d.
This kind of range query 4 can be useful for implementing eﬃcient in-network processing and data aggregation. We are not aware of any other scheme that eﬃciently supports this type of query. Our information brokerage scheme inherits both load-balanced information diﬀusion and robustness against node failures from the employed routing scheme. All these results are backed by an evaluation in simulation which indicates that the practical performance of the analyzed schemes is signiﬁcantly better than their theoretical guarantees.