Gnutella Protocol Development

Our blue logo

Gnutella Protocol Development

Home :: Developer :: Press :: Research :: Servents

Hints for developers - This chapter is not related with the various 
Gnutella Protocols. Its purpose is to propose some implementation
guidelines for the message routing and duplicates checking. The goal
is to give hints to new servents developer so that they can make a 
clean servent implementation. Actually the impact of a poorly
implemented servent on the GNet can be very high.


3.6 Routing and Duplicates Tables Implementation Guidelines


3.6.1 Maintaining GUID Routing Tables

In order to route a QueryHit or a Push message, a servent MUST 
know to which source message it is a reply, and from which connection 
the source message came from. We call this connection the originator 
connection. This can be performed by maintaining routing tables
for Push and QueryHit messages.

Pong messages are not routed anymore since the wide deployment of the
Pong Caching proposal.


3.6.1.1 QueryHit messages routing

To ensure proper QueryHits routing, the servent MUST store, for each
Query message received from the GNet, its GUID along with the originator
connection. One possible implementation is to use a map (an associative
table) with the GUID as a key and the connection as a value.

Each time a QueryHit is received, its GUID is checked in the map. If 
there is no entry in the map for this GUID, this can mean two things, 
depending on the implementation of your servent: either this message
was wrongly sent to the servent, and it should be discarded, or it
is a reply to a Query sent by the local servent.

There's a simple solution to avoid this ambiguity : when a Query
is sent by the servent, store its GUID in the queryHits routing Map,
with a special value (as there's no originator connection), for 
instance a null or a dummy connection. With this scheme, when a queryHit
message is received there will be three cases : either there's a 
connection associated with the GUID, in which case the message should be
routed to that connection, or there's a dummy object associated with the
GUID, in which case this is an anwser to a query sent by the servent,
or there's no entry for this GUID in the map, meaning that this message
was wrongly sent to the servent and must be dropped (this can also be
because the entry for this GUID has expired, still meaning that the
route is lost. See below).

One very sensible issue for the GNet health is how much time the table 
will keep any entry. A servent can't store all history of messages 
seen by the servent as there can be millions of messages routed each
day. A possible solution is to store a given amount of entries, but 
this solution is not scalable. Thus, a better choice is to store each 
entry for a certain amount of time. Note that there are many ways to 
implement such a solution, with or without multithreading.

Computing how much time messages should be kept is complex, because among
other things, the flow control as well as the TCP/IP layer behavior should 
be taken in account.

A servent MUST store each entry in the queryHits routing table for
at least 10 minutes [IS THIS CORRECT ?]. It may store them more time.


3.6.1.2 Push messages routing

Push messages are important messages, because the only way (until 
push-proxy is widely deployed, see XXXXX) to download a file from a 
firewalled servent. Moreoever, as a servent may choose to send
a push message every few minutes during some time, it may happen that
a push message is received a very long time after the corresponding 
queryHit was sent.

The rate of queryHits is not very high (maybe about 5 %). Thus, the 
memory cost of storing Push message routes is much smaller
than storing Query message routes.

For these reasons, a servent SHOULD store each entry in the push route
table for at least 30 minutes [IS THIS CORRECT ?]. It may store them much
more time.

3.6.1.2.1 Standard routing

Unless for QueryHits routing, the source message GUID cannot be used
to route Push Message. This is because each Push message has its own
unique GUID which is not related to the source QueryHit GUID. Push
messages are routed using the Servent identifier which is included in
both QueryHit and Push messages. 

For each queryHit received by the servent, the Servent Identifier 
field is stored in a map along with the originator connection. When
a Push message is received, the map is checked for an entry for its
servent Identifier. If there's a connection associated with this 
Identifier, the message is sent to that connection. If there's no
object associated with it, then the message must be dropped (either
it was wrongly routed to the servent, or its entry in the route table
has expired).

For QueryHit messages created by the local servent, there's no real
need to store them in route tables (as suggested with QueryHit routing),
as there is another way to know when a Push message is a request for
the local servent: check if the Servent Identifier field of the Push 
message matches the GUID of the local servent. But the solution is 
still effective anyway.

3.6.1.2.2 Broadcasting Push messages

Given the importance of the Push messages a more advanced solution 
may be use to route Push messages. 

Instead of using one unique Push route map, each connection manages its
own route map. Hence when a Push message is received, it is sent to
all the connections that sent the same queryHit (Note that if the 
servent follows the recommandations of this document, only one of the
source queryHits will be routed, the others being dropped as duplicates,
see 3.6.2 Maintaining duplicates tables, and also 2.4 standard message
architecture).

The procedure is close from the standard Push messages routing. But
instead of an associative table, lists are used to store the
QueryHit Servent Identifer: For each connection, a list stores the
Servent Identifier of each QueryHit message received on this connection.
This MUST be done even if the QueryHit has already be seen on another
connection. Each time a push message is received, each connection is 
asked to look in its list for the Servent Identifier of this message. 
If there's an entry, then the message is sent to the connection. If no
connection has an entry for this Servent Identifier, then the message 
is dropped.

[TODO: Add comments about TTL correction for broadcasted pushes ?]



3.6.2 Maintaining duplicates tables

Another issue while implementing a servent is to manage the duplicate 
messages. See section 2.7 to get the basic information about duplicates
handling.

Like routing tables, the recommended way to manage duplicates is to
store them temporarily for a given amount of time, rather than 
using a max size for the data structure. A servent MUST store each
entity in the duplicate tables for at least 10 minutes.
[IS THIS CORRECT ?]

3.6.2.1 Ping

As there's no need anymore for pong routing tables for Pong-Caching
servents, the servent has to manage a specific global Ping duplicates 
List. Each Ping message sent by a servent MUST have a unique GUID. Thus,
to check for Ping message duplicates, one can simply store the GUID of
each ping received in a list. If a Ping appears to have its GUID already
stored, then it is a duplicate, and it must be dropped. Note that if Ping
messages use and Pong-Caching are well implemented in all servents, this
should never happen.

This feature may seem to not be important, but it SHOULD be implemented, 
because there are still many legacy servents on the GNet, which are not
Pong-Caching compliant, and thus which may send ping duplicates.

However, Ping duplicates should be very rare.

3.6.2.2 Pong

As for ping messages, there's no real need today to check for duplicates
with updated servents. But for compatibility reasons with older clients,
a servent SHOULD implement a Pong message duplicates verification. But 
the message GUID for all pongs received in response for one Ping message
(whether or not it is Pong-Caching enabled) have all the same value (the
GUID of the Ping). Thus another value should be used to check for Pong
duplicates. The simplest way is to store for each Pong received, a
String which is the concatenation of the message GUID, the IP address
and the port of the responding host.

Other solutions are possible, like implementing a hashing function for 
Pong messages, and storing the hash value associated with the pongs to
check for duplicates.

Pong duplicates should be very rare.

3.6.2.3 Query

Query messages duplicates verification is very simple: like with Ping
messages, a servent can simply store the GUID of each Query message
and drop a message whenever its GUID is already stored in the list.

However, as the servent also manages a queryHit route table, the
GUID of received Query messages are already stored. In this scheme,
we suggest to use a query duplicate table which is not the queryHits 
route table, but others implementations are possible using a unique
table for both queryHit routing and Query duplicates checking. Such
solutions will generally spare some RAM, but use more CPU.

Query duplicates are very common. They're caused mainly by cycles in 
the GNet.

3.6.2.4 QueryHit

QueryHits are the most complicated messages to check for duplicates. 
A servent receiving a message with the same
payload Message and Message ID as one it has received before, MUST
discard the message. It means the message has already been seen.

A suggested way to compute a unique identifier for each QueryHit 
received is to use a hashing function for the result set in the
message, and to store a String which is the concatenation of
the message GUID, the remote servent GUID (located at the end of
the QueryHit message), and the hash value. This ensures that any
duplicate message will be detected by the servent, and dropped.

Note that you can't simply store the message GUID and the remote
servent GUID, as the remote servent may choose to send more than
one QueryHit for one given Query, for instance to avoid sending
too big queryHits through the GNet when there are several results 
for the Query.

Like with Queries, others implementations are possible, using a 
unique table for both Push routing and QueryHits duplicates checking.

QueryHits duplicates are very common. They're caused by cycles in the
GNet and possibly by poorly implemented servents which answer to
duplicate queries instead of dropping them.

3.6.2.5 Push

Push messages are handled specially by the servent, as this is not
the message GUID which is used to store the message route, but instead,
the servent GUID (see 2.4.8 Push). Thus, each Push message has a unique
message GUID (this is not the case with QueryHits), and like with
Query messages, duplicates checking can be performed by storing a list
with the GUID of each Push message and dropping a message if its
GUID is already stored.

3.6.2.6 Bye

There is no need to check for Bye message duplicates as the servent 
which received the message MUST disconnect from the requesting 
servent.

3.6.2.7 VM

[WHAT SHOULD I SAY HERE ?]

[3.6.2.8 QRP ?]


APPENDIX : Computing a hash value.

The preferred way to compute a hash value for a compound object is to 
XOR the hash value of each of its components [IS THIS CORRECT ?], or 
alternatively, in the case of a subcomponent which is a list of
components only, to add the hash value of each object in the list.

Note for Java developpers :

The Java language includes a default hashing function for all objects. 
However this method can't be used as such to compute a hash value for
Gnutella messages. This is because it does not by default use the values
of the object's member variables, but instead, only the references. 
Thus, even if two messages are identical, the hashCode() method will
return a different value. As told in the javadoc for the 
java.lang.Object class, there's a general contract when overwriting 
hashCode(): the equals() method also needs to be overwritten, to be 
consistent with the hashCode() method.

If for the purpose of message routing or duplicates checking you store
an object (message, GUID, ...) in a Collection (List, Map, Set), then
you MUST overwrite the equals method, as it is called by many
methods of the Collection framework, for instance contains() or 
get(key).

Example : An object MyObject has three member objects _a _b and _c, and
one numeric member variable _n (primitive type). 
Suggested code :

	public int hashCode() {
		return _a.hashCode()^_b.hashCode()^_c.hashCode() ^_n;
	}
	
	public boolean equals(Object o) {
		if (!(o instanceof MyObject)) return false;
		MyObject mo = (MyObject)o;
		return _a.equals(mo.getA()) && _b.equals(mo.getB()) && 
                         _c.equals(mo.getC() && _n == mo.getN());
	}

 

 

 

Home :: Developer :: Press :: Research :: Servents

SourceForge.net Logo