Abstract. In a sensor network information from multiple nodes must usually be aggregated in order to accomplish a certain task. A natural way to view ...

We would like to emphasize again that even though our approach uses a hierarchy, nodes high up in the hierarchy carry no additional load compared to nodes at lower levels. During routing, nodes in the hierarchy act as landmarks toward which packets are forwarded, but they serve as landmarks only while the packets are still far away from them; in this sense the failure of a node does not hinder its function as a routing waypoint. As with any other node, the failure of a node at a high level of the hierarchy has only a local effect on the routing and information brokerage capabilities of the sensor network.

Outline. In Section 2, we introduce the notion of a hierarchical decomposition as a generic hierarchical framework for organizing a sensor network, and then describe our routing scheme and an information brokerage system under this framework. The performance of our framework is experimentally evaluated in Section 3 by extensive simulations. Finally, we conclude and discuss possible extensions to our scheme in Section 4.

2 Distance-Sensitive Routing and Information Brokerage

In this section, we first introduce the notion of a hierarchical decomposition (HD) constructed on an arbitrary sensor network in which geographic coordinates of the nodes may not be available. (We remark that we use the term range query differently here from work like [8, 20], where 'range' refers to a range in data space and not in the space of sensor locations.) As it becomes clear later, a HD captures all the necessary properties we need for routing and information brokerage. We then show how to use a HD of the communication graph for efficient routing and for distance-sensitive information brokerage.

2.1 Hierarchical Decompositions of Graphs

In the following we consider an undirected, weighted, connected graph on a node set S of n nodes, with node distances induced by the shortest-path metric. We call a tree H of height h a hierarchical decomposition (HD) of S if

– each node c ∈ H is associated with a set of nodes Sc ⊆ S (its cluster),
– for the root r ∈ H (which is at level h − 1) we have Sr = S,
– all leaves of H are at the same level 0,
– for any node c ∈ H at level k, the cluster Sc associated with c has diameter less than α · 2^k for some constant α,
– if c ∈ H has children c1, ..., cl, then Sc = Sc1 ∪ ... ∪ Scl.

In case of an unweighted graph (i.e., all edge weights are 1), the diameter of a connected n-node graph is at most n, and one can construct a hierarchical decomposition of height at most h = 1 + ⌈log n⌉ ≤ 2 + log n. The following discussion focuses on this case for simplicity. We also remark that the constraint of all leaves being at the same level 0 is not mandatory and could be removed.

We make this assumption for simplicity of the presentation and also because the hierarchical decomposition we use in our experiments has this property.

There are different ways to construct a HD. In our implementation we use a specific HD based on the discrete center hierarchy (DCH) from [9], which is also similar to the hierarchy in [12]. It can be built efficiently and in a distributed fashion on an arbitrary sensor network given only its connectivity graph (details omitted for lack of space).
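Since the construction details are omitted here, the following hypothetical, centralized Python sketch illustrates only the core idea of a DCH-style hierarchy: level-k centers are greedily chosen among level-(k−1) centers so that every node is within fewer than 2^k hops of some center. The names `center_hierarchy` and `bfs` and the exact greedy rule are our own assumptions; the actual scheme in [9] is distributed and may differ.

```python
from collections import deque

def bfs(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def center_hierarchy(adj, height):
    """Greedy sketch: level-k centers are a subset of the level-(k-1)
    centers covering every node within fewer than 2**k hops."""
    levels = [sorted(adj)]            # level 0: every node is its own cluster
    for k in range(1, height):
        centers, covered = [], set()
        for u in levels[-1]:          # greedily keep still-uncovered centers
            if u not in covered:
                centers.append(u)
                covered |= {v for v, d in bfs(adj, u).items() if d < 2 ** k}
        levels.append(centers)
    return levels
```

On a 5-node path graph this yields centers 0, 2, 4 at level 1 and 0, 4 at level 2: higher levels thin out geometrically, matching the α · 2^k diameter bound in the HD definition.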

2.2 Routing Using Hierarchical Decompositions

First we provide a routing scheme based on a hierarchical decomposition H of a communication graph. Important properties which we want to ensure are:

– scalability: the routing information that an individual node of the network has to store should be small compared to the network size; the load on the nodes should be distributed in a balanced fashion (so we disallow dedicated hub nodes or backbones).

– eﬃciency: the path generated by our routing scheme between nodes v and w should be only a constant factor worse than the optimal shortest path in the communication network.

– robustness: the impact of nodes or links failing should be limited and local. In particular, packets that get temporarily stuck due to node or link failures should be able to recover using local rules.

The scheme we provide has all three of these properties.

An addressing scheme. Let ∆max be the maximum degree of H. We number the children c1, ..., co, with o ≤ ∆max, of a node c arbitrarily, and define the following addresses for the nodes of the tree:

– the root r has as address the h-dimensional vector f(r) := (1, 0, 0, ..., 0) (recall that h is the height of H);
– a node c′ ∈ H at level k which is child ci of its parent c is assigned the address f(c′) := f(c) + i · e_{h−k}, where e_{h−k} is the h-dimensional unit vector with a 1 at the (h − k)-th position.

Essentially this constructs an IP-type address for each node of the tree and hence also for each cluster in the hierarchical decomposition. The entries in the vector are bounded by the maximum number of children ∆max.
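For concreteness, the address assignment can be sketched in a few lines of Python. This is a hypothetical sketch; the function and argument names are ours, not the paper's.

```python
def assign_addresses(children, root, h):
    """children: dict mapping each tree node to its ordered list of children.
    h: height of the decomposition tree; the root sits at level h - 1.
    Returns a dict node -> h-dimensional address vector (as a tuple)."""
    addr = {root: tuple([1] + [0] * (h - 1))}    # f(r) = (1, 0, ..., 0)
    stack = [(root, h - 1)]
    while stack:
        node, level = stack.pop()
        for i, child in enumerate(children.get(node, []), start=1):
            k = level - 1                        # the child's level
            vec = list(addr[node])
            vec[h - k - 1] += i                  # f(c') = f(c) + i * e_{h-k}
            addr[child] = tuple(vec)
            stack.append((child, k))
    return addr
```

For a height-3 tree with root r, children a and b, and grandchildren x and y under a, this produces (1,0,0), (1,1,0), (1,2,0), (1,1,1), (1,1,2): a prefix of the address identifies each ancestor cluster, exactly like an IP prefix.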

Connecting levels and efficient routing. Let us now extend the addressing scheme to an efficient routing protocol. For that we need the notion of neighboring clusters of a node:

Definition 1. A cluster L at level k is called a neighboring cluster of a node v if there exists a node q ∈ L such that d(v, q) ≤ α · 2^{k+1}.

Note that while the number of neighboring clusters of a given node at each level may be larger than the degree ∆max of the decomposition tree, in 'well-behaved' decompositions we expect this number to be small. In particular, when the hop distance between the nodes is a metric with bounded doubling dimension [11] and a DCH is used as the hierarchical decomposition, the maximum number λmax of neighboring clusters of a node at any given level is a constant, and a node has at most O(h) = O(log n) neighboring clusters in total. From now on, we assume this is the case (so ∆max and λmax are constants) to simplify our presentation.

We let each node in the sensor network store its distances to all of its neighboring clusters. A message can then be routed from a node v to a node w as follows. Node v first looks up its neighboring clusters and finds the smallest cluster L containing w. Then v locally forwards the message for w to any neighbor that is closer to L than v itself.
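A minimal sketch of this forwarding rule follows, where a multi-source BFS from the target cluster stands in for the per-node distance entries each node would store in practice. The names and data layout are our own; this is an illustration, not the paper's implementation.

```python
from collections import deque

def dist_to_cluster(adj, cluster):
    """Hop distance from every node to the nearest node of `cluster`
    (multi-source BFS); models the stored per-node distance entries."""
    dist = {s: 0 for s in cluster}
    q = deque(cluster)
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def route(adj, src, cluster):
    """Greedily forward to any neighbor strictly closer to the cluster."""
    dist = dist_to_cluster(adj, cluster)
    path, cur = [src], src
    while dist[cur] > 0:
        # in a connected graph some neighbor always has distance dist[cur]-1
        cur = min(adj[cur], key=lambda v: dist[v])
        path.append(cur)
    return path
```

On a 5-node path graph, routing from node 0 toward the cluster {4} visits 0, 1, 2, 3, 4.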

We claim that the length of the path when the message arrives at the smallest cluster containing w is at most a constant times longer than the optimal, shortest path from v to w in the communication graph. If our hierarchical decomposition has singleton clusters associated with the leaves, this implies a complete path from v to w which is a constant factor approximation of the shortest path. In particular, we have the following result (proof omitted for lack of space).

**Lemma 1.** Let v, w be two nodes in the network. Then the path generated by the above routing scheme from v to the cluster of the lowest level containing w has length at most 4 · d_{vw}, where d_{vw} denotes the shortest path distance between v and w in the communication graph.

Since there are typically many close-to-shortest paths from a given node to a cluster, this routing scheme has a natural robustness against node or link failures. And even if none of the immediate neighbors is closer to the target cluster, inspecting a slightly larger local neighborhood most of the time results in a successful forwarding of the message towards the target cluster.

If every bit counts. Assuming that the maximum number of neighboring clusters λmax and the maximum degree ∆max of the HD are constants, each node has to store the address of and distance to O(h) clusters. In case of a communication graph with unit edge weights and singleton clusters at the leaves, h = Ω(log n), and hence each node has to store Ω(log² n) bits: the address of a cluster has h = Ω(log n) components, and the bit-complexity of the distance value stored for a neighboring cluster might be Ω(log n) as well.

If every bit of space is relevant, we can still do better. Let us first consider a more efficient way to store the addresses of the neighboring clusters of a node v. The key observation is the trivial fact that if at level k − 1 some cluster L is a neighbor of v, then at level k its parent p(L) is also a neighbor. That means if a node has already stored the address of p(L), it can store the neighbor L at level k − 1 at an additional cost of only log ∆max bits. Hence for constant λmax and ∆max, the addresses of all neighbors at all levels can be stored using O(log n) bits. The need for Ω(log n) bits per distance value per neighbor can easily be reduced to a constant by remembering, instead of the actual distance, one edge to an adjacent node in the communication graph that is closer to the neighboring cluster.
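The parent-pointer trick can be illustrated by an encode/decode round-trip. All names and the data layout below are our own hypothetical choices, purely for illustration.

```python
def encode(nbrs, parent, child_index, top):
    """nbrs[k]: a node's neighboring clusters at level k; the parent of every
    level-(k-1) neighbor appears in nbrs[k]. Full addresses are stored only at
    the top level; below, each neighbor costs one (parent position, child
    index) pair, i.e. about log(lambda_max) + log(delta_max) bits."""
    table = {top: list(nbrs[top])}
    for k in range(top - 1, -1, -1):
        table[k] = [(nbrs[k + 1].index(parent[L]), child_index[L])
                    for L in nbrs[k]]
    return table

def decode(table, children, top):
    """Reconstruct the cluster ids level by level from the compact table."""
    out = {top: list(table[top])}
    for k in range(top - 1, -1, -1):
        out[k] = [children[out[k + 1][pos]][i - 1] for pos, i in table[k]]
    return out
```

Encoding a small three-level neighbor table and decoding it again recovers exactly the original cluster ids, while only the single top-level address is stored in full.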

Corollary 1. The routing scheme can be implemented by storing O(log n) bits per node in the network.

We remark that the above addressing scheme as well as the neighboring information can be computed efficiently by restricted flooding during the initialization stage. This increases the construction cost only slightly over that of the DCH.

2.3 Efficient Information Brokerage using Hierarchical Decompositions

Given some fixed HD H, we now show how to achieve efficient distance-sensitive information brokerage based on the routing scheme described above. Let Σ = {σ1, σ2, ..., σm} denote the discrete set of all data items possibly produced or queried in the sensor net. Some properties of a desirable information brokerage system include:

– load balance: no node should have the burden of providing lookup-information for many diﬀerent types of data items;

– eﬃciency: references to a certain type of data should be stored at a small number of nodes, and the time required for node w to access data produced by node u should be proportional to the distance between u and w.

– robustness: failure of nodes or links should only increase the time to store or retrieve a certain data item, but not make storage/retrieval impossible.

Our information brokerage system exploits the routing structure described above to meet these desiderata.

First assume that we have a hash function µ : Σ × HD → S, such that given any data item σ ∈ Σ and a cluster L ∈ HD, we can compute a unique sensor node µ(σ, L) ∈ S within this cluster. Furthermore, µ(σ, L) can be accessed from any node in cluster L (of diameter D) within 2D hops using our routing structure (we will describe one such hash function at the end of this section). We call µ(σ, L) the information server of data item σ in cluster L. Our information brokerage system relies on collecting and distributing some synopses of data items to a small set of information servers.
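One possible realization of such a hash function is sketched below. SHA-256 and the convention of sorting node ids are our own arbitrary assumptions; the paper leaves the concrete hash open until the end of the section.

```python
import hashlib

def info_server(item, cluster_addr, cluster_nodes):
    """mu(sigma, L): deterministically pick one member of cluster L from the
    item name and the cluster's address, so that every node computing it
    independently agrees on the same information server."""
    key = f"{item}|{'.'.join(map(str, cluster_addr))}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    nodes = sorted(cluster_nodes)          # any globally agreed order works
    return nodes[digest % len(nodes)]
```

Because the computation depends only on the item and the cluster address, producers and consumers anywhere in the network resolve the same server without any coordination.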

Information diffusion. Suppose a node (producer) u has data item σ ∈ Σ. Recall that u is contained in h clusters of the tree H (i.e., its ancestors), one at each level. Let L(u, d) be the cluster containing u at level d, and L_d^1, ..., L_d^l the neighboring clusters of L(u, d). The producer u sends a message (containing its own address and a synopsis of σ) to the information server µ(σ, L_d^j) associated with each of these L_d^j's, for all 0 ≤ d < h; see Figure 1. If the maximum number of neighbors at each level and the maximum degree of H are constants, then a producer stores a synopsis of σ at O(h) = O(log n) nodes. Since the routing structure already specifies how to access all these neighboring clusters, no extra per-node storage is required to implement the diffusion process.

Fig. 1: There are exponentially fewer information servers away from the producer (upper right). The consumer (bottom center) looks for information at information servers in exponentially bigger clusters containing itself.
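The diffusion step can be sketched as follows. The names and data layout are hypothetical, and `mu` is a simple stand-in for the hash function µ(σ, L), not the paper's concrete choice.

```python
import hashlib

def mu(item, cluster_id, members):
    """Stand-in for the hash function mu(sigma, L)."""
    h = int.from_bytes(hashlib.sha256(f"{item}|{cluster_id}".encode())
                       .digest()[:8], "big")
    return sorted(members)[h % len(members)]

def diffuse(item, ancestors, neighbors, members):
    """ancestors: ids of the producer's clusters, one per level.
    neighbors[c]: neighboring clusters of c at the same level (incl. c).
    members[c]: sensor nodes belonging to cluster c.
    Returns the set of information servers that receive a synopsis."""
    servers = set()
    for c in ancestors:                  # one round per level of the hierarchy
        for nb in neighbors[c]:          # at most lambda_max clusters per level
            servers.add(mu(item, nb, members[nb]))
    return servers
```

With constant λmax this touches O(h) = O(log n) servers in total, matching the storage bound stated above.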

The length of the paths to the information servers decreases geometrically with decreasing level in the hierarchy. Hence the total number of hops to send synopses to all information servers, i.e., the communication cost for the producer, is dominated by the length of the paths to the information servers at the highest level of the hierarchy. For graphs of constant doubling dimension, this cost is on the order of the diameter of the network. We summarize this in the following lemma:

**Lemma 2.** In the above diffusion scheme a producer of some data item σ ∈ Σ distributes σ to O(log n) nodes in the network at a total cost of O(D) hops, where D denotes the diameter of the network.

Information retrieval. When a consumer w wants to access some data item σ, it will look for it in growing neighborhoods, namely in the clusters L(w, i) in increasing order of i, where L(w, i) denotes the ancestor of w at level i; see Figure 1.

More precisely, it starts from w, and visits information servers µ(σ, L(w, 1)), µ(σ, L(w, 2)),..., in sequential order to check whether the data item σ is there.

Note that unlike the producer which sent out messages in diﬀerent branches to various information servers, the consumer will only follow one path, and return as soon as it ﬁnds the information sought. The following lemma guarantees the distance sensitivity of our method (proof omitted from this extended abstract).

**Lemma 3.** If a node w wants to retrieve the data item σ ∈ Σ associated with node u, this request can be completed in O(d_{uw}) time steps, where d_{uw} denotes the distance between u and w.

We would like to emphasize that the constant in the O-notation observed in practice is very close to 1. Furthermore, it frequently happens that when a node observes an event, nearby nodes observe the same event. To prevent multiple hashings of the same data item, each node can first attempt to retrieve the item from its local neighborhood using the information retrieval process; only if the item is not found does it start the information diffusion process.
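The retrieval loop is correspondingly simple. The sketch below (our own, hypothetical layout; `mu` again stands in for the hash function µ(σ, L)) walks the consumer's ancestor clusters from level 0 upward and stops at the first information server that knows the item.

```python
import hashlib

def mu(item, cluster_id, members):
    """Stand-in for the hash function mu(sigma, L)."""
    h = int.from_bytes(hashlib.sha256(f"{item}|{cluster_id}".encode())
                       .digest()[:8], "big")
    return sorted(members)[h % len(members)]

def retrieve(item, ancestors, members, store):
    """ancestors: the consumer's cluster ids, ordered level 0, 1, ..., h-1.
    store[node]: set of items whose synopses that node holds."""
    for c in ancestors:                   # growing neighborhoods around w
        server = mu(item, c, members[c])
        if item in store.get(server, set()):
            return server                 # first hit: stop and fetch from here
    return None                           # item unknown anywhere in the network
```

Unlike the producer, which branches out to many servers, the consumer follows a single path and returns as soon as the sought item is found, which is what gives the distance-sensitive O(d_{uw}) bound of Lemma 3.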

Approximate range counting and reporting Let RC(w, r; σ) denote the number of occurrences of data item σ detected by some sensor at most r hops away from w. Our information brokerage system can also be used to perform approximate range counting or range reporting for a consumer. In particular, we have the following lemma (proof omitted).

**Lemma 4.** Let s be the number of distinct messages about σ received at node µ(σ, L(w, d)), where d = log(r/α). Then RC(w, r; σ) ≤ s ≤ RC(w, 4r; σ).