Gnutella Protocol Development

Our blue logo

Gnutella Protocol Development

Home :: Developer :: Press :: Research :: Servents

3.8 Query Routing Protocol
Source - Latest draft

The Query Routing Protocol (QRP for short) is an essential part of
the Ultrapeer specification: it governs how the Ultrapeer will filter
queries and only forward those to the leaf nodes most likely to have
a match.  This is done without even knowing the resource names, by
looking the query words through a big hash table, that is sent by the
leaf node to its Ultrapeer.

The aim of the QRP is to avoid forwarding a query that cannot match,
it is not to forward only those queries that will match.

The overall operation goes thusly:

* At the leaf node level:

  + Break all the resource names into individual words.  A word is
    made of a consecutive sequence of letters and digits.

  + Hash each word with a well-known hash function and insert a
    "present" flag in the corresponding hash table slot.
    Note that this hash table is a big array, and we don't store
    the key, only the fact that a key ended up filling some slot.
    All words are lower-cased and all accents are removed from
    them, i.e. "déjà" is transformed into "deja", so that only
    ASCII characters remain.  Only those words that are made of
    at least 3 letters are retained.

  + All words are re-hashed with their trailing 1, 2, or 3 letters
    removed, provided the word length after such trimming is at
    least 3 letter long.  This is a simple attempt to remove plural
    from words.  Optionally, nodes can chop off more letters from the
    end, provided that each hashed word is at least 3 character long.

  + The "boolean vector" built at later stage is optionally 
    compressed, broken up in small messages, and sent mixed with 
    regular Gnet traffic to the ultrapeer.

* At the Ultrapeer level:

  + Until the whole "boolean vector" is received from a leaf node, 
    all queries are forwarded to that node.

  + When the "boolean vector" is fully received, it is going to be
    used as the Query Routing table for that leaf node: queries are
    broken into individual words, all accentuated letters are 
    removed.

  + For each leaf node with a Query Routing table:

    . Each word is then hashed and looked up in the Query Routing
      table.
  
    . Depending on the query matching rules (see 2.2.7.3), either ALL
      the words will be required to be found in the Query Routing, or
      only some of them, to declare a Query Routing Hit.

    . Only those queries that were declared a Hit at the previous
      stage will be forwarded to a given leaf node.

The remaining sections define the hashing function, the mechanism
used to build up the "boolean vector" and compress it, the protocol
to transmit the vector to the Ultrapeer, and finally give operating
hints for the table sizing.

[TODO: finish the QRP description --RAM]

The Query Routing Protocol (QRP) used in Ultrapeer can be found at:
http://rfc-gnutella.sourceforge.net/src/qrp.html




 

 

 

Home :: Developer :: Press :: Research :: Servents

SourceForge.net Logo