Abstract. In a sensor network information from multiple nodes must usually be aggregated in order to accomplish a certain task. A natural way to view ...

In other words, by visiting the information server µ(σ, L(w, d)) directly, a consumer w is guaranteed to collect all sources that have information about data item σ within roughly distance 2d of itself. If the consumer only wants to know the number of such sources (range counting), the server can simply return this count. Otherwise, if the consumer wishes to report all such data (range reporting), it can then route to each of these sources and fetch the data.

Hash function µ(σ, L). We still have to define the hash function µ(σ, L) that maps any given data item σ to an information server in a particular cluster L.

Ideally, to achieve good load balance, this function should distribute the information servers uniformly over the nodes contained in L as σ ranges over Σ.

One possible choice of this function is deployed in the GLS approach, where each sensor node in the network has a unique id (an integer). Given a data item σ ∈ Σ, assume there is a function that maps σ to one of these ids, say IDσ, uniformly at random. The information server µ(σ, L) is then defined as the node s ∈ L with the smallest id greater than IDσ. However, in order to identify and reach node s, one has to build extra structure (requiring about O(h log n) bits of per-node memory) on top of whatever routing structure is used.
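As a concrete illustration, the GLS-style choice can be sketched as follows. The class name `GlsHash`, the `TreeSet` representation of the cluster's node ids, and the wrap-around rule used when no id exceeds IDσ are our assumptions for this sketch, not details taken from GLS itself:

```java
import java.util.TreeSet;

// Sketch of a GLS-style hash (assumed details): given the set of node ids in
// cluster L and the pseudo-random id IDsigma assigned to data item sigma, the
// information server is the node in L with the smallest id greater than
// IDsigma, wrapping around to the smallest id in L if no such node exists.
public class GlsHash {
    // nodeIds: ids of the nodes in cluster L; idSigma: id assigned to sigma
    static int server(TreeSet<Integer> nodeIds, int idSigma) {
        Integer s = nodeIds.higher(idSigma);      // smallest id > IDsigma
        return (s != null) ? s : nodeIds.first(); // wrap around the id space
    }

    public static void main(String[] args) {
        TreeSet<Integer> ids = new TreeSet<>(java.util.Arrays.asList(3, 8, 15, 42));
        System.out.println(server(ids, 10)); // some id in the cluster
        System.out.println(server(ids, 50)); // wraps to the smallest id
    }
}
```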

A simpler way to define µ(σ, L) is to use σ as the seed of some pseudorandom function and traverse the HD tree downward randomly according to that function. µ(σ, L) is then defined as the leaf node reached in the process.
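This seeded descent can be sketched as follows. The `Cluster` class and the use of σ's integer value to seed `java.util.Random` stand in for the paper's unspecified pseudorandom function and are illustrative only; the key property is that every node can reproduce the same leaf from σ alone:

```java
import java.util.Random;

// Sketch: mu(sigma, L) as a seeded pseudo-random descent of the HD tree.
// Because sigma deterministically seeds the generator, any node that knows
// the tree structure reaches the same leaf, with no directory lookup needed.
public class MuHash {
    static class Cluster {
        java.util.List<Cluster> children = new java.util.ArrayList<>();
        int leafId = -1;                         // valid only for leaves
        boolean isLeaf() { return children.isEmpty(); }
    }

    // sigma seeds the generator, so the random walk is reproducible.
    static int mu(long sigma, Cluster L) {
        Random rng = new Random(sigma);
        while (!L.isLeaf()) {
            L = L.children.get(rng.nextInt(L.children.size()));
        }
        return L.leafId;
    }
}
```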

Note that this definition of µ(σ, L) also yields a routing path to access it. We can first route to any node in L, use σ to determine the pseudorandom subcluster L′ of L that contains µ(σ, L), and then recursively route toward µ(σ, L′) = µ(σ, L).

The number of hops to route from any node in L to µ(σ, L) is obviously bounded by ∑_{i=1}^{d} α · 2^i ≤ α · 2^{d+1}.
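Writing the sum out, the bound is just the geometric series:

```latex
\sum_{i=1}^{d} \alpha \cdot 2^{i} \;=\; \alpha \left(2^{d+1} - 2\right) \;\le\; \alpha \cdot 2^{d+1}.
```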

Robustness The routing components of our information brokerage scheme inherit the robustness properties of the routing scheme; in particular, messages that are unable to make progress due to node or link failures can recover by simple local rules and be eventually forwarded to their destinations. Furthermore, recovery after failure of an information server is possible by querying the information server one level higher, incurring only a constant factor overhead.

**3 Experimental Results**

We implemented the discrete center hierarchy in Java to experimentally evaluate the performance of our proposed schemes. Currently our implementation simulates the network at the graph level only. While it does not mimic network attributes like packet loss or delay, we are quite confident that the results reported here give a good indication of the usefulness of our approach in practical scenarios.

**3.1 Data Generation and Implemented Algorithms**

All our measurements were carried out on a unit disk graph of nodes spread uniformly at random in a unit square; subsequently, nodes may have been removed to simulate the presence of large holes in the network topology.

Note that this “unit disk graph” is not required in our approach: our system can take an arbitrary graph as input. We also want to emphasize that in this model, as long as the average node degree is below about 10, a large fraction of the unit disk graph is not connected and the dilation factor is rather large, i.e., the shortest path in the graph between two nodes is much longer than their Euclidean distance. This is quite different from the ‘skewed grid’ model, where the node positions are determined by randomly perturbing points on a grid by some rather small amount. There the unit disk graph is almost always connected for any grid width slightly smaller than 1 (which corresponds to an average degree of 4 or more) and the geometric dilation is very small. See Figure 2 for an example.

While our scheme performs much better in the skewed grid model than in the random model, we feel the latter provides more insight into realistic scenarios, and we present all our results in this random model.

Fig. 2: The random sensor field (left) has average degree 6.26 and many holes, some of which are quite large. The skewed grid field (right) has a lower average degree, 5.18, yet is much more regular.

In the following subsections, we compare our routing scheme with GPSR and our information brokerage scheme with GHT, assuming that the geographic locations of sensors are known (although they are not used in our approach).

The underlying planar graph for GPSR is the Gabriel graph.

**3.2 Evaluation of Routing Strategies**

Routing quality. In the first experiment we evaluate the quality of the paths produced by our routing scheme. We fix a sensor field of 2000 nodes and vary the average degree from roughly 6 to 12. We then select at random 1000 pairs of nodes from the largest connected component of the sensor field, and compute the path between the nodes of each pair using the HD scheme as well as GPSR.

For each such path, its quality is defined as the ratio between its length and the shortest-path distance between its two endpoints. We show in Table 1 (a) the average and standard deviation of the quality of these 1000 paths. Note that our HD routing scheme always produces near-optimal paths regardless of the node density. The constant observed in practice is much better than the worst-case bound of 4 we could prove.

Fig. 3: (a) The required storage grows slowly as the network becomes large. (b) The success rate of routing vs. node depletion for sensor fields with various average degrees. (c) Number of hops to the information server using HD, and number of hops in a shortest path to an ideally random information server.

We note that the cost of initializing the network, i.e., the number of messages sent to establish our hierarchical decomposition, is directly proportional to the storage at each node. As the storage cost is low, the cost of network initialization is also low.

Hot spots. Even though our implementation of the HD uses cluster heads, they are not special nodes in the network. In a typical route, the moment a packet heading towards a cluster reaches any of its nodes, the packet starts heading towards a different cluster. So these cluster heads do not form a backbone structure, nor do they create bottlenecks in network traffic. Figure 4 (c) gives an example where two routes with nearby source and destination nodes stay separate during their course. On the other hand, when large holes are present in the network, nodes close to holes will naturally carry a heavier burden, as our HD paths approximate the shortest paths well. Still, our paths do not hug the holes in the sensor net as tightly as GPSR paths do, as shown in Figure 4 (a) and (b) for a sensor field with 2000 nodes and average degree 9.5. Our HD scheme produces far fewer high-load nodes (larger dots in the picture).

Robustness. To measure the performance of our routing scheme under node depletion, we start with a graph with average degree 7, 8, 9, or 12, build the HD routing structure on top of it, and then randomly remove a small percentage of sensors (from 2% to 20%) from the sensor field. We then pick 1000 pairs of live nodes at random, and show the success rate of routing between these pairs in Figure 3 (b). During the routing process, if a node finds that the next sensor on its shortest path to some cluster L is dead, it locally floods a neighborhood of nodes at most 5 hops away from itself to find a node with smaller distance to L. The results show that performance degrades gracefully as the node failure rate increases.

Fig. 4: Hotspot comparison for (a) GPSR and (b) the HD scheme. Larger dots are nodes with higher traffic loads. (c) Two paths generated by the HD scheme with nearby sources and destinations.
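The 5-hop recovery rule described above can be sketched as a bounded breadth-first flood. The adjacency-list graph, the `alive[]` flags, and the `distToL[]` array of each node's stored distance to cluster L are our modeling assumptions for this sketch:

```java
import java.util.*;

// Sketch of local recovery (assumed details): when the next hop toward
// cluster L is dead, flood a neighborhood of at most maxHops hops through
// live nodes and return any node whose stored distance to L is smaller than
// the stuck node's own distance; return -1 if none exists within range.
public class LocalRecovery {
    static int recover(List<List<Integer>> adj, boolean[] alive, int[] distToL,
                       int start, int maxHops) {
        Map<Integer, Integer> hops = new HashMap<>(); // node -> hop count
        Deque<Integer> queue = new ArrayDeque<>();
        hops.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            if (u != start && alive[u] && distToL[u] < distToL[start]) return u;
            if (hops.get(u) == maxHops) continue;    // don't flood further out
            for (int v : adj.get(u)) {
                if (alive[v] && !hops.containsKey(v)) {
                    hops.put(v, hops.get(u) + 1);
                    queue.add(v);
                }
            }
        }
        return -1; // no closer live node within maxHops
    }
}
```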

**3.3 Evaluation of the Information Brokerage Scheme**

Brokerage quality. The efficiency of a brokerage system comprises both the number of messages a producer needs to send for replication (i.e., the number of information servers) and the number of hops a consumer needs to traverse before locating the data it needs (i.e., the query time). In Table 1 (b), we vary the size of the sensor field from 200 to 10,000 nodes. For each sensor field, we randomly choose 1,000 producer/consumer pairs, with each pair producing/requesting a random data item. Columns 2 and 3 show the average number and standard deviation of information servers per producer. Columns 4 and 5 show the quality of the path from the consumer to the information server, defined as the ratio between the query time in our scheme (i.e., the path length to the respective information server) and the shortest hop distance between the producer and the consumer.

We see from the table that this ratio is always close to 1.0 (in fact, in most cases it is smaller than 1.0, since the information server can be even closer than the producer). If the data is large, the producers may not replicate their data and instead leave only their addresses at the information servers. In that case, after locating the desired information server, a consumer must further route to the producer to fetch the data itself. The quality of the brokerage path is then defined as the ratio between the path length from the consumer to the producer obtained by our scheme and the shortest path length between the consumer and the producer.

Columns 6 and 7 in Table 1 (b) show this ratio in our experiment. In all cases it is small, around 2.0.

While the number of replications used by a particular producer is higher in our system than in the GHT approach, the query time can be much smaller, especially when the consumer is close to the producer. This phenomenon is illustrated in Figure 3 (c), where we compare the query time in our scheme (lower curve) with the length of the shortest path to an ideally random information server (upper curve). Note that the latter is in fact a lower bound on the query time of any scheme using GHT for information brokerage. The query time for GHT with GPSR as the underlying routing scheme may be much longer than this shortest path, due to the quality of the paths returned by GPSR. In short, our system is attractive for scenarios with multiple queries for the same data, as the overhead for the producer is then amortized.

It is also important to keep the distribution of information servers for different data items as uniform as possible. To test this, we let each sensor in the network produce a different data item and record, for each node, the number of times it serves as an information server for some data item. The results are shown in Table 2.

The distribution of information servers observed is reasonably good compared to a distribution obtained by a centralized uniformly random hash function.

Table 2: The average/standard deviation of the number of times that a sensor serves as an information server for some data item for various network sizes.

Approximate range counting. One important application of our information brokerage system is approximate range counting, such as reporting all horses detected within some distance r of a particular sensor. When r is quite small, flooding is simple and effective. However, the number of nodes accessed by the flooding approach increases quadratically with r. This is illustrated in Figure 5 (a), where the query cost of our approach (lower curve) increases linearly, while that of flooding (upper curve) increases quadratically. The sensor field in this example has 2,000 nodes and an average degree of 6.1.
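The asymptotic gap can be made concrete with a back-of-the-envelope model. The idealized grid field, the constant α, and both cost functions below are our illustrative assumptions, not the experiment itself; they only show why flooding is quadratic in r while the hierarchical query is linear:

```java
// Illustration (assumed model): on an idealized grid field, flooding to
// radius r touches all nodes of an L1 ball, which has 2r^2 + 2r + 1 cells,
// i.e. Theta(r^2) nodes. The hierarchical query instead visits one server
// per level i with 2^i <= r, each reachable in at most alpha * 2^(i+1) hops,
// so the total hop count is a geometric sum in O(alpha * r).
public class RangeQueryCost {
    // nodes within r hops on the integer grid (L1 ball around the query node)
    static long floodCost(int r) { return 2L * r * r + 2L * r + 1; }

    // sum over levels of alpha * 2^(i+1) hops; geometric, hence O(alpha * r)
    static long hierarchicalCost(int r, int alpha) {
        long cost = 0;
        for (int hop = 1; hop <= r; hop <<= 1) cost += 2L * alpha * hop;
        return cost;
    }
}
```

Doubling r roughly quadruples the flooding cost but only doubles the hierarchical cost, matching the two curves in Figure 5 (a).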

Fig. 5: (a) The approximate query cost is very low compared to naive flooding. (b) The success rate of information brokerage under node depletion.

Robustness. Again, we fix a sensor field of 2000 nodes with various average degrees, compute the HD routing structure, and randomly remove a portion of sensors (from 2% to 20%) from the field. We then randomly choose a set of producer/consumer pairs, each generating/seeking a random data item. The resulting success rates are shown in Figure 5 (b). The brokerage system is slightly more robust than the routing scheme, which is not surprising: the robustness of our information brokerage system comes partly from the robustness of the routing scheme, and partly from the fact that even if a query fails to route to a particular information server, it can simply go to the information server one level up.

**4 Conclusion and Discussion**

In this paper we have presented a unified framework for efficient routing and distance-sensitive information brokerage based on augmented hierarchical decompositions of communication graphs. In particular, for communication graphs of constant doubling dimension, the guarantee of almost optimal routing paths comes at an additional cost of only O(log n) bits of storage per network node.

Our routing scheme does not rely on a dedicated backbone or hub structure, and hence performs well when some network nodes fail, while providing a natural load balance between routing paths.

Our novel distance-sensitive information brokerage scheme builds on the above routing scheme. We showed how information producers can diffuse pointers to their information to O(log n) other locations, and in turn how information consumers can exploit this stored data to retrieve the information they want in a distance-sensitive manner. As an application, our brokerage infrastructure allows range queries with a specified radius to take time proportional to the radius rather than to the area of the relevant query region.

All our procedures come with rigorous proofs of their worst-case performance guarantees, and the experimental results for both routing and information brokerage show considerably better performance than the worst-case bounds of the theoretical analysis (though the latter in some way explain this good behavior in practice).