Gnutella Protocol Development
Home :: Developer :: Press :: Research :: Servents
3.8 Query Routing Protocol Source - Latest draft The Query Routing Protocol (QRP for short) is an essential part of the Ultrapeer specification: it governs how the Ultrapeer will filter queries and only forward those to the leaf nodes most likely to have a match. This is done without even knowing the resource names, by looking the query words through a big hash table, that is sent by the leaf node to its Ultrapeer. The aim of the QRP is to avoid forwarding a query that cannot match, it is not to forward only those queries that will match. The overall operation goes thusly: * At the leaf node level: + Break all the resource names into individual words. A word is made of a consecutive sequence of letters and digits. + Hash each word with a well-known hash function and insert a "present" flag in the corresponding hash table slot. Note that this hash table is a big array, and we don't store the key, only the fact that a key ended up filling some slot. All words are lower-cased and all accents are removed from them, i.e. "déjà" is transformed into "deja", so that only ASCII characters remain. Only those words that are made of at least 3 letters are retained. + All words are re-hashed with their trailing 1, 2, or 3 letters removed, provided the word length after such trimming is at least 3 letter long. This is a simple attempt to remove plural from words. Optionally, nodes can chop off more letters from the end, provided that each hashed word is at least 3 character long. + The "boolean vector" built at later stage is optionally compressed, broken up in small messages, and sent mixed with regular Gnet traffic to the ultrapeer. * At the Ultrapeer level: + Until the whole "boolean vector" is received from a leaf node, all queries are forwarded to that node. + When the "boolean vector" is fully received, it is going to be used as the Query Routing table for that leaf node: queries are broken into individual words, all accentuated letters are removed. + For each leaf node with a Query Routing table: . Each word is then hashed and looked up in the Query Routing table. . Depending on the query matching rules (see 2.2.7.3), either ALL the words will be required to be found in the Query Routing, or only some of them, to declare a Query Routing Hit. . Only those queries that were declared a Hit at the previous stage will be forwarded to a given leaf node. The remaining sections define the hashing function, the mechanism used to build up the "boolean vector" and compress it, the protocol to transmit the vector to the Ultrapeer, and finally give operating hints for the table sizing. [TODO: finish the QRP description --RAM] The Query Routing Protocol (QRP) used in Ultrapeer can be found at: http://rfc-gnutella.sourceforge.net/src/qrp.html