Gnutella Protocol Development
Section 2. Protocol Definition
Source - GnutellaProtocol-v0.4-r1.6 by Philippe Verdy
The Gnutella protocol defines the way in which servents communicate over the network. It consists of a set of descriptors used for communicating data between servents and a set of rules governing the inter-servent exchange of descriptors. Currently, the following descriptors are defined:
Descriptor    Description
Ping          Used to actively discover hosts on the network. A servent receiving a Ping descriptor is expected to respond with one or more Pong descriptors.
Pong          The response to a Ping. Includes the address of a connected Gnutella servent and information regarding the amount of data it is making available to the network.
Query         The primary mechanism for searching the distributed network. A servent receiving a Query descriptor will respond with a QueryHits descriptor if a match is found against its local data set.
QueryHits     The response to a Query. This descriptor provides the recipient with enough information to acquire the data matching the corresponding Query.
Push          A mechanism that allows a firewalled servent to contribute file-based data to the network.
A Gnutella servent connects itself to the network by establishing a connection with another servent currently on the network. The acquisition of another servent’s address is not part of the protocol definition and will not be described here (Host cache services are currently the predominant way of automating the acquisition of Gnutella servent addresses).
2.1 Handshaking
Source - http://f4.grp.yahoofs.com/v1/sJ_4Pti8ctDrtfwC80WZmgVoFHlBK1bLkQHaBzqq6AKYcc_rkVS0mjoH99W5NZAjlAH-iNCaL7GRhE7kvoK4Q592tUg/Development/Gnutella%200.6%20Handshaking%20Protocol
Once the address of another servent on the network is obtained (its IPv4 address and its port number), a TCP connection to that servent is created, and the following handshake (ASCII encoded) takes place:
1. The client establishes a TCP connection with the server.
2. The client sends "GNUTELLA CONNECT/0.6<cr><lf>".
3. The client sends all capability headers--except for
vendor-specific headers--each terminated by "<cr><lf>", with
an extra "<cr><lf>" at the end.
4. The server responds with "GNUTELLA/0.6 200 OK<cr><lf>".
5. The server sends all its headers, in the same format as in (3).
6. The client sends "GNUTELLA/0.6 200 OK<cr><lf>", as in (4).
7. The client sends any vendor-specific headers as needed, in the
same format as (3).
8. Both client and server send binary messages at will, using the
information gained in (3) and (5).
Here is a sample interaction between a client and a server. Data sent
from client to server is shown on the left; data sent from server to
client is shown on the right.
Client                          Server
-----------------------------------------------------------
GNUTELLA CONNECT/0.6<cr><lf>
User-Agent: BearShare<cr><lf>
Query-Routing: 0.2<cr><lf>
<cr><lf>
                                GNUTELLA/0.6 200 OK<cr><lf>
                                User-Agent: BearShare<cr><lf>
                                Query-Routing: 0.1<cr><lf>
                                BearShare-Data: 5ef89a<cr><lf>
                                <cr><lf>
GNUTELLA/0.6 200 OK<cr><lf>
BearShare-Data: a04fce<cr><lf>
<cr><lf>
[binary messages]               [binary messages]
A few notes about the responses: first, the client (server) should disconnect if receiving any response other than "200 OK" at step 4 (6). There is no need to define these error codes now. Second, servents should ignore higher version numbers in steps (2), (4), and (6). For example, it is perfectly legal for a future client to connect to a server and send "GNUTELLA CONNECT/0.7". The server should respond with "GNUTELLA/0.7 200 OK" if it supports the 0.7 protocol, or "GNUTELLA/0.6 200 OK" otherwise. However, note that an old-fashioned "GNUTELLA OK" is not an appropriate response.
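The client side of steps 2 to 4 can be sketched in Python as follows. This is a minimal illustration, not a complete implementation: the function names build_connect_request and is_handshake_ok are hypothetical, and the header values are taken from the sample interaction above.

```python
CRLF = "\r\n"

def build_connect_request(headers, version="0.6"):
    """Build steps 2 and 3: the connect line, the headers, and a final blank line."""
    lines = [f"GNUTELLA CONNECT/{version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")

def is_handshake_ok(response):
    """Return True if the first response line is 'GNUTELLA/<version> 200 OK'."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = first_line.split(" ", 1)
    return (len(parts) == 2
            and parts[0].startswith("GNUTELLA/")
            and parts[1] == "200 OK")
```

A client following the notes above would disconnect whenever is_handshake_ok returns False, which includes the old-fashioned "GNUTELLA OK" reply.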
A few notes about the headers: servents should use standard HTTP headers whenever appropriate. For example, servents should use the standard "User-Agent" header rather than make up a "Servent-Vendor" header. However, it is perfectly legal to add new headers (e.g., "Query-Routing") when no appropriate HTTP header exists, as long as they follow HTTP syntax. Such headers should be approved by other developers on the GDF.
One difficulty with this scheme is that it does not provide automatic backwards-compatibility with existing servents. A 0.6 client that sends "GNUTELLA CONNECT/0.6" to a 0.4 server will be disconnected. For this reason, clients are encouraged to reconnect at the 0.4 level if the 0.6 handshake fails. Furthermore, servers should respond with an old-fashioned "GNUTELLA OK" if they receive "GNUTELLA CONNECT/0.4<lf><lf>" instead of the more modern "GNUTELLA CONNECT/0.6<cr><lf>". Together, these steps prevent 0.4 servents and 0.6 servents from being disconnected from each other.
2.2 Peer-to-Peer Gnutella Packets: Descriptors
Source - http://a4.grp.yahoofs.com/v1/sJ_4PlwvJhnrtfwC3MckyJSu5-bQyioxN9S3JNywyzQUzWDngjUmSFLzR9k96qnPexO8ylKcTHbzJkFXN-xQThZd74M/Development/GnutellaProtocol-v0.4-r1.6.html
Once a servent has connected successfully to the network, it communicates with other servents by sending and receiving Gnutella protocol descriptors. Each descriptor is preceded by a Descriptor Header with the byte structure given below.
Note 1:
All fields in the following structures are in little-endian byte order unless otherwise specified.
This differs from the network byte order traditionally used in other networking protocols, but it has been kept for historical reasons and for interoperability with existing Gnutella servents.
The traditional byte-ordering functions ntohs(), ntohl(), htons() and htonl() provided by networking libraries MUST NOT be used for descriptors, as they assume a big-endian byte order for the network encoding (these functions or macros are no-operation identities only on big-endian machines, such as Motorola systems, and perform byte swaps on Intel systems). They MUST be replaced by equivalent functions (named, for example, nltoh() and nltol()) that assume little-endianness on the network.
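As an illustration of Note 1, the following Python sketch encodes and decodes a 32-bit field in little-endian order with the standard struct module. The helper names encode_u32_le and decode_u32_le are hypothetical.

```python
import struct

def encode_u32_le(value):
    """Encode an unsigned 32-bit integer in little-endian order ("<" prefix)."""
    return struct.pack("<I", value)

def decode_u32_le(data):
    """Decode 4 little-endian bytes into an unsigned 32-bit integer."""
    return struct.unpack("<I", data)[0]
```

The "<" prefix fixes the byte order independently of the host CPU, so the same code behaves identically on big-endian and little-endian machines.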
Note 2:
All 32-bit IP addresses in the following structures are in IPv4 format. For example, the IPv4 byte array:
Byte value 0xD0 0x11 0x32 0x04
Byte offset 0 1 2 3
represents the dotted IPv4 address "208.17.50.4". That is, network addresses use the standard network byte order, defined as big-endian for IPv4.
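The example of Note 2 can be checked with Python's standard ipaddress module. The helper names below are hypothetical.

```python
import ipaddress

def bytes_to_dotted(raw):
    """Convert 4 bytes in network (big-endian) order to a dotted IPv4 string."""
    return str(ipaddress.IPv4Address(raw))

def dotted_to_bytes(dotted):
    """Convert a dotted IPv4 string to its 4 bytes in network order."""
    return ipaddress.IPv4Address(dotted).packed
```

For instance, bytes_to_dotted(bytes([0xD0, 0x11, 0x32, 0x04])) yields "208.17.50.4".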
3.1. Descriptor Header
Fields        Descriptor ID   Payload Descriptor   TTL   Hops   Payload Length
Byte offset   0...15          16                   17    18     19...22
Descriptor ID
A 16-byte string uniquely identifying the descriptor on the network. Its value must be preserved when forwarding messages between servents. Its use allows detection of cycles and helps reduce unnecessary traffic on the network.
When generating 128-bit Descriptor IDs, servents can use the UUID generation algorithm, or use a cryptographically strong random generator. The value of the Descriptor ID carries no meaning, and should always be treated as an opaque binary string, whose byte order must be preserved when forwarding messages.
However, a Pong descriptor uniquely identifies a servent host by a stable GUID, which is more reliable and persistent than its current IP address and port number. For this reason, a special Pong marking is required for newer applications: byte 8 SHOULD be set to 0xFF (indicating that the GUID unambiguously and uniquely identifies the servent), and byte 15 SHOULD be set to 0x00 for future use.
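A minimal sketch of Descriptor ID generation, assuming the operating system's random source (os.urandom) is an acceptable cryptographically strong generator. The function names are hypothetical.

```python
import os

def new_descriptor_id():
    """Return an opaque 16-byte Descriptor ID from a strong random source."""
    return os.urandom(16)

def new_marked_servent_guid():
    """Return a 16-byte GUID carrying the Pong marking described above."""
    guid = bytearray(os.urandom(16))
    guid[8] = 0xFF    # marks the GUID as uniquely identifying the servent
    guid[15] = 0x00   # reserved for future use
    return bytes(guid)
```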
Payload Descriptor
0x00 = Ping (see section 3.2.1)
0x01 = Pong (see section 3.2.2)
0x80 = Query (see section 3.2.3)
0x81 = QueryHits (see section 3.2.4)
0x40 = Push (see section 3.2.5)
Extension Descriptors:
0x02 = Bye (see section 3.2.6)
0x10 = IBCM (Reserved for the non-standard InBandControlMessage descriptor, but MAY cause compatibility problems with legacy, non IBCM-aware, servents)
0x30 = QRP (see section 3.2.7)
0x31 = Open Vendor Extension (see section 3.2.8)
0x32 = Standard Vendor Extension (see section 3.2.9)
Note: Other values SHOULD NOT be used for now, as remote servents may consider them invalid. Their use will be specified in a higher version of the protocol than the current 0.4 protocol (or its 0.6 extension).
TTL
Time To Live. The number of times the descriptor will be forwarded by Gnutella servents before it is removed from the network.
Each servent MUST decrement the TTL before passing it on to another servent. When the TTL reaches 0, the descriptor MUST no longer be forwarded.
Note: This field is unsigned, however a value higher than 127 will very probably be considered as excessive when used on the Internet.
Hops
The number of times the descriptor has been forwarded.
Note: This field is unsigned, however a value higher than 127 will very probably be considered as excessive when used on the Internet.
As a descriptor is passed from servent to servent, the TTL and Hops fields of the header MUST satisfy the following conditions:
TTL(i) + Hops(i) = TTL(0)
TTL(i + 1) < TTL(i)
Hops(i + 1) > Hops(i)
where TTL(i) and Hops(i) are the value of the TTL and Hops fields of the header at the descriptor’s i-th hop, for i >= 0.
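The forwarding rule can be sketched as a single step function. The name forward_hop is hypothetical, and the sketch assumes the simplest case in which each forwarding counts exactly one hop.

```python
def forward_hop(ttl, hops):
    """Return (ttl, hops) for the next hop, or None if the descriptor
    must not be forwarded because its TTL would reach 0."""
    if ttl <= 1:
        return None
    return ttl - 1, hops + 1
```

Each call decrements the TTL and increments Hops by the same amount, so TTL + Hops stays equal to the initial TTL at every hop.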
Payload Length
The length of the descriptor immediately following this header. The next descriptor header is located exactly Payload Length bytes from the end of this header i.e. there are no gaps or pad bytes in the Gnutella data stream.
Note: This field is unsigned however a value of 2GB or more will very probably be considered as excessive. With the current specification of the protocol, the last encoded byte of the Payload Length field SHOULD then be 0 (as Payloads won't reach 16MB when used on the Internet).
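The 23-byte header layout can be expressed with Python's struct module as a sketch. The names HEADER, DescriptorHeader, pack_header and unpack_header are hypothetical; the format string follows the byte offsets given in the table above, with the Payload Length in little-endian order.

```python
import struct
from collections import namedtuple

# "<" = little-endian, no padding: 16-byte ID, three single bytes, one 32-bit length.
HEADER = struct.Struct("<16sBBBI")

DescriptorHeader = namedtuple(
    "DescriptorHeader",
    "descriptor_id payload_descriptor ttl hops payload_length",
)

def pack_header(header):
    """Serialize a DescriptorHeader into its 23-byte wire form."""
    return HEADER.pack(*header)

def unpack_header(data):
    """Parse the first 23 bytes of data into a DescriptorHeader."""
    return DescriptorHeader(*HEADER.unpack(data[:HEADER.size]))
```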
The TTL is the only mechanism for expiring descriptors on the network. Servents SHOULD carefully scrutinize the TTL field of received descriptors and lower them as necessary. Abuse of the TTL field will lead to an unnecessary amount of network traffic and poor network performance.
Note:
Some servents MAY consider excessive values for TTL+Hops as indicating desynchronization of the connection input stream. Also, a descriptor where TTL=0 and Hops=0 is invalid. All servents MUST consider that TTL+Hops values between 1 and 7 are valid (a higher range is possible but not recommended for use on the Internet). A servent MAY reduce an excessive TTL value, but MUST NOT increase it when forwarding or caching descriptors. A servent MUST NOT reduce the Hops value, as this would break the discovery of shorter routes and affect route caches. When forwarding a descriptor to remote servents connected through slow or unreliable connections, a servent MAY also count more than 1 hop and reduce the TTL by the equivalent number, provided that the resulting TTL value does not reach 0 (in which case the descriptor MUST be discarded).
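One possible local policy for the rules in this note is sketched below. The limit of 7 for TTL+Hops is taken from the range that all servents must accept; using it as an upper bound is an assumption of this sketch, as is the function name sanitize_ttl.

```python
MAX_TTL_PLUS_HOPS = 7  # assumed local policy, not mandated as a maximum

def sanitize_ttl(ttl, hops, limit=MAX_TTL_PLUS_HOPS):
    """Return a possibly reduced TTL, or None if the descriptor should be
    discarded. The Hops value is never modified."""
    if ttl == 0 and hops == 0:
        return None              # invalid descriptor
    if ttl + hops > limit:
        ttl = limit - hops       # reduce the TTL only, never increase it
    return ttl if ttl > 0 else None
```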
The Payload Length field is the ONLY reliable way for a servent to find the beginning of the next descriptor in the input stream. The Gnutella protocol does NOT provide an "eye-catcher" string or any other descriptor synchronization method (it assumes that reliable TCP connections are used). Therefore, servents SHOULD rigorously validate the Payload Length field for each descriptor received (at least for fixed-length descriptors). If a servent becomes out of synch with its input stream, it SHOULD drop the connection associated with the stream since the upstream servent is either generating, or forwarding, invalid descriptors.
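The framing rule can be sketched as follows: descriptors are cut out of a receive buffer using only the Payload Length field. The function name split_descriptors and the MAX_PAYLOAD sanity limit are assumptions of this sketch; a real servent would choose its own validation policy.

```python
import struct

HEADER_SIZE = 23
MAX_PAYLOAD = 64 * 1024  # hypothetical local sanity limit

def split_descriptors(buffer):
    """Split buffer into (header, payload) pairs.
    Returns the list of complete descriptors and the leftover bytes."""
    descriptors, pos = [], 0
    while len(buffer) - pos >= HEADER_SIZE:
        # Payload Length: little-endian 32-bit value at header offset 19.
        (length,) = struct.unpack_from("<I", buffer, pos + 19)
        if length > MAX_PAYLOAD:
            raise ValueError("implausible Payload Length, stream probably out of sync")
        end = pos + HEADER_SIZE + length
        if end > len(buffer):
            break  # incomplete descriptor, wait for more data
        descriptors.append((buffer[pos:pos + HEADER_SIZE],
                            buffer[pos + HEADER_SIZE:end]))
        pos = end
    return descriptors, buffer[pos:]
```

When ValueError is raised, the caller would typically drop the connection, as recommended above.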
Note:
A desynchronization MAY be detected by the presence of an unknown value for the Payload Descriptor field in a single descriptor message, which servents are NOT required to silently discard.
For example, a new Payload Descriptor value has been proposed: the "Bye" descriptor, with value 0x02, which gives the reason why a servent is being disconnected. The currently defined policy for unknown Payload Descriptors allows this, because this message will not be followed by any other descriptor, so the connection MAY still be silently dropped. This is however a proposed extension, whose payload format has still not been agreed upon among servent implementors. Its specification is not part of this document, and MAY be documented later.
3.2. Descriptor Payloads
Immediately following the descriptor header is an optional payload, whose content and structure depend on the Payload Descriptor field in the descriptor header. The following sections detail them:
3.2.1. Ping (0x00) Descriptor Payload
Fields Optional Ping Data
Byte offset 0...L-1
Optional Ping Data
This is an optional field consisting of a variable number of bytes. It is reserved for extensions of the current version of the protocol, to specify filters about expected Pong replies. Its maximum length is bounded by the Payload Length field of the header.
When used, this field SHOULD be small and agreed upon with other Gnutella servent implementors, as this field MAY be specified in a further specification of the protocol.
Standard Ping descriptors currently have no associated payload and are of zero length. A Ping is simply represented by a descriptor header whose Payload Descriptor field is 0x00 and whose Payload Length field is 0x00000000.
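A standard Ping is therefore just a 23-byte header. The sketch below builds one; the function name build_ping and the default TTL of 7 are assumptions of this example.

```python
import os
import struct

def build_ping(ttl=7):
    """Build a standard Ping: random Descriptor ID, Payload Descriptor 0x00,
    Hops 0, and a Payload Length of 0 (no payload follows)."""
    return struct.pack("<16sBBBI", os.urandom(16), 0x00, ttl, 0, 0)
```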
A servent uses Ping descriptors to actively probe the network for other servents. A servent receiving a Ping descriptor MAY elect to respond with a Pong descriptor, which contains the address of an active Gnutella servent (possibly the one sending the Pong descriptor) and the amount of data it’s sharing on the network.
This specification makes no recommendations as to the frequency at which a servent SHOULD send Ping descriptors, although servent implementers SHOULD make every attempt to minimize Ping traffic on the network.
Note:
There is no requirement to forward every Ping request to other connected servents, or to use a large TTL+Hops value. Consequently, most actual servents implement a traffic-limiting policy for Ping descriptors.
3.2.2. Pong (0x01) Descriptor Payload
Fields        Port    IP Address   Number of Files Shared   Number of Kilobytes Shared   Optional Pong Data
Byte offset   0...1   2...5        6...9                    10...13                      14...L-1
Port
The TCP port number on which the responding host can accept incoming Gnutella connections. (See section 3.2.2.2 below)
IP Address
The IPv4 address of the responding host. (See section 3.2.2.3 below)
This field is in big-endian format.
Number of Files Shared
The number of files that the servent with the given IP Address and Port is sharing on the network.
Note: An excessive number of shared files will sometimes be ignored by servents receiving it, because it is suspect or because accumulating it could produce internal overflows. This informative field can be null, but some servents have local policies that restrict access from "freeloaders" that don't share a minimum number of files.
Number of Kilobytes Shared
The number of kilobytes of data that the servent with the given IP Address and Port is sharing on the network.
Note: An excessive total shared size (more than 2GB), or an excessive mean size per shared file, will sometimes be ignored by servents receiving it, because it is suspect or because accumulating it could produce internal overflows. This informative field can be null, but some servents have local policies that restrict access from "freeloaders" that don't share a minimum volume of files.
Optional Pong Data
This is an optional field of variable length. It is reserved for extensions of the current version of the protocol, to give other information about the servent, or to provide alternate transport protocols or addresses that allow incoming connections to the servent. Its maximum length is bounded by the Payload Length field of the header.
When used, this field SHOULD be small and agreed upon with other Gnutella servent implementors, as this field MAY be specified in a further specification of the protocol.
Pong descriptors are ONLY sent in response to an incoming Ping descriptor. Multiple Pong descriptors MAY be sent in response to a single Ping descriptor. This enables host caches to send cached servent address information in response to a Ping request.
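The Pong payload layout can be sketched as follows. The function names pack_pong and unpack_pong are hypothetical; the Port and the two counters are little-endian, while the IP Address is big-endian, as stated above.

```python
import ipaddress
import struct

def pack_pong(port, ip, files, kilobytes, extra=b""):
    """Build a Pong payload; extra is the Optional Pong Data, if any."""
    return (struct.pack("<H", port)
            + ipaddress.IPv4Address(ip).packed      # big-endian address
            + struct.pack("<II", files, kilobytes)
            + extra)

def unpack_pong(payload):
    """Parse a Pong payload into (port, ip, files, kilobytes, extra)."""
    (port,) = struct.unpack_from("<H", payload, 0)
    ip = str(ipaddress.IPv4Address(payload[2:6]))
    files, kilobytes = struct.unpack_from("<II", payload, 6)
    return port, ip, files, kilobytes, payload[14:]
```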
3.2.2.1. Usage policy
1) Fields that SHOULD be preserved from incoming Pongs:
In order to reduce the network traffic used by Pong descriptors and to discover shorter or alternate routes to the same servent, the Descriptor ID field of cached Pongs SHOULD be preserved locally along with the Hops and TTL fields.
However, excessive Hops+TTL values in incoming Pongs SHOULD be reduced while keeping the Hops field unchanged. If this produces a negative or null TTL value, the Pong MAY be marked as invalid and discarded, as the corresponding advertized servent may be unreachable via the Gnutella network.
2) Generating the Descriptor Id for Pong descriptors:
The Descriptor Id associated with the payload information of Pong replies SHOULD be constant for all Ping requests received from the same or an alternate connection, at least as long as the responding servent has an active connection to the network, unless the servent implements multiple listening IP interfaces attached to distinct networks, which are treated as if they were different servents.
This Descriptor Id SHOULD be globally unique for that server instance. So its generation should use the UUID algorithm or a cryptographically strong random generator. However byte 8 SHOULD be set to 0xFF (indicating that the GUID unambiguously and uniquely identifies the servent), and byte 15 SHOULD be reserved and set to 0x00 for future use.
When receiving or forwarding Pong descriptors, the Descriptor Id field MUST NOT be modified, whatever its value.
3) Responding to incoming "direct" and "browsing" Ping requests:
Each servent SHOULD respond (at least once for each connected remote servent) with a valid Pong answer to an incoming "direct" Ping request with TTL=1 and Hops=0. To allow the implementation of large Pong caches, they SHOULD also advertize (at least once) with Pong the list of their currently connected (or recently cached) accessible neighbor servents in reply to an incoming "browsing" Ping request (with TTL=2 and Hops=0).
3.2.2.2. Port numbers in standard Pong descriptors
1) Standard and default Port numbers:
Even though Gnutella servents traditionally use TCP Port number 6346 by default for incoming Gnutella connections, this is NOT a requirement. There's no "standard" port number defined and servents may use whatever valid port number between 1 and 65535 they wish for Gnutella TCP connections to reach the servent.
2) Gnutella Port number and download Port number:
The Port number advertized in Pong descriptors MAY be different from the port number advertized in QueryHits replies to enable download requests. A servent MAY also use the same TCP port for incoming Gnutella connections and for the incoming HTTP connections used by download requests.
3) Non null Port numbers:
A non null Port number indicates support by the servent for incoming Gnutella TCP connections. Most servents SHOULD provide this field with a default value for the local host, unless the servent is discovered to be firewalled or manually configured to use an acceptable port.
The 0.4 protocol currently does not specify any procedure to check that the advertized TCP port number is accessible from other servents or to discover which port is directed to the local host by the firewall or router.
4) Null Port numbers:
However, if the servent runs on a host whose local IP address is on a private LAN and the currently connected host is on another subnetwork or on the Internet, and if the Port number has not been explicitly configured by the user for that network interface, it is expected that the default Port number will not be accessible; in that case it MAY be preconfigured to 0.
Servents that receive a null Port number in an incoming Pong SHOULD discard this Descriptor and not forward it to other servents, as it indicates that direct Gnutella connection with TCP to the sending host is not possible.
Note however that the presence of Pong Data may change this behavior, as it may provide alternate transport protocols (apart from TCP) to connect to the "firewalled" servent. Such extension is out of scope of the current specification.
5) Firewalled servents:
A firewalled servent that cannot accept incoming TCP connections SHOULD set the Port field to 0, if a Pong has to be sent in reply to a "direct" Ping whose TTL=1 and Hops=0 (this will avoid unsuccessful attempts by other remote servents to connect to the firewalled servent). The neighbor servents that accepted the incoming connection from a firewalled servent and that receive such a Pong are then explicitly informed that the connected servent does not accept incoming TCP connections, so they need not later advertize this firewalled servent in the list of currently connected servents when answering an incoming "browsing" Ping (i.e. with TTL=2).
Note however that the presence of Pong Data may change this behavior, as it may provide alternate transport protocols (apart from TCP) to connect to the "firewalled" servent. Such extension is out of scope of the current specification.
3.2.2.3. IPv4 Addresses in standard Pong descriptors
1) Unroutable IPv4 Addresses:
When sending a Pong descriptor reporting an IPv4 address to a remote servent, the reported address SHOULD be one that the remote servent can safely reach and connect to.
For example, if the remote servent to which a Pong descriptor is sent is connected via the global Internet, the local servent SHOULD NOT give it any private network addresses (i.e. in the 10/8, 172.16/12, 192.168/16 IPv4 address blocks) that are not routable via the Internet, and SHOULD set this field to 0. The same rule SHOULD also apply if both servents are connected via distinct private networks.
If a servent receives such Pongs with an unroutable Address from remote servents on the Internet, the address in these Pongs SHOULD be ignored (as if it was set to 0) even if it matches an accessible address on a local private network, because the reported servents are not accessible or MAY conflict with other hosts on a local private network.
When forwarding those Pongs to any other servent, the unroutable Address field MAY be forced to 0, for example if the other servent is connected from a local private network.
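The check for unroutable addresses can be sketched with Python's ipaddress module, using exactly the three private blocks named above. The names is_unroutable and address_to_advertise, and the peer_is_on_internet flag, are hypothetical.

```python
import ipaddress

# The private IPv4 blocks listed in the text: 10/8, 172.16/12, 192.168/16.
PRIVATE_BLOCKS = [ipaddress.ip_network(n)
                  for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_unroutable(ip):
    """Return True if ip belongs to one of the private blocks."""
    addr = ipaddress.IPv4Address(ip)
    return any(addr in block for block in PRIVATE_BLOCKS)

def address_to_advertise(ip, peer_is_on_internet):
    """Return the address to place in a Pong sent to the given peer;
    0.0.0.0 stands for an IP Address field set to 0."""
    if peer_is_on_internet and is_unroutable(ip):
        return "0.0.0.0"
    return ip
```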
2) Firewalled servents:
A firewalled servent that cannot accept incoming IPv4 connections from the network (for example the Internet) to which it wants to send Pongs SHOULD set this field to 0, if a Pong has to be sent in reply to a "direct" Ping request whose TTL=1 and Hops=0 (this will avoid unsuccessful attempts by other remote servents to connect to the firewalled servent).
Then, the neighbor servents that accepted the incoming connection from a firewalled servent and that receive such a Pong are explicitly informed that the connected servent does not accept incoming connections at an accessible IPv4 address, so they NEED NOT later advertize this firewalled servent in the list of currently connected servents when answering an incoming "browsing" Ping (i.e. with TTL+Hops=2).
Note however that the presence of Pong Data extension field may change this behavior, as Pong Data may provide alternate host addresses (apart from IPv4) to connect to the servent. Such extension is not described in the current specification.
3.2.3. Query (0x80) Descriptor Payload
Fields        Minimum Speed   Search Criteria String   NUL (0x00) Terminator   (Optional) Query Data
Byte offset   0...1           2...N                    N+1                     N+2...L-1
Minimum Speed
The minimum speed (in kbits/second) of servents that should respond to this message. A servent receiving a Query descriptor with a Minimum Speed field of n kb/s SHOULD only respond with a QueryHits if it is able to communicate at a speed >= n kb/s.
Here are some hints on how this field MAY be set in Query descriptors:
0 = will send results regardless of available upload speed (and even if there's no available upload slot);
1 = accept any result that can be transferred at a guaranteed minimum of 1.5 kbps mean speed (i.e. modem servents SHOULD NOT report hits if they don't have available upload slots or as long as this would break a guarantee offered to other downloaders, unless they implement a reliable queueing system);
2 to 32767 = (currently available downlink bandwidth) * 70%.
32768 to 65535 = SHOULD NOT be used in 0.4 Query descriptors. Unaware legacy servents that receive such Query descriptors will VERY PROBABLY never return QueryHits.
To determine which uplink speed can be guaranteed, the replying upload servent MAY compare the Minimum Speed field value n of the Query to:
min[ (maximum uplink bandwidth) * 70%, (total unused uplink bandwidth) ] / [ (uploads in progress) + 2 ]
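The comparison can be sketched numerically as follows. The function and parameter names are hypothetical, and all bandwidths are assumed to be expressed in kbit/s, like the Minimum Speed field.

```python
def guaranteed_uplink_kbps(max_uplink_kbps, unused_uplink_kbps, uploads_in_progress):
    """Evaluate the formula above: the smaller of 70% of the maximum uplink
    and the unused uplink, divided by (uploads in progress + 2)."""
    return min(max_uplink_kbps * 0.70, unused_uplink_kbps) / (uploads_in_progress + 2)

def should_answer(minimum_speed, max_uplink_kbps, unused_uplink_kbps, uploads_in_progress):
    """Return True if the guaranteed speed meets the Query's Minimum Speed."""
    return guaranteed_uplink_kbps(
        max_uplink_kbps, unused_uplink_kbps, uploads_in_progress) >= minimum_speed
```

For example, with a 128 kbit/s uplink, 100 kbit/s unused and 2 uploads in progress, the guaranteed speed is about 22.4 kbit/s, so a Query with Minimum Speed 20 would be answered and one with Minimum Speed 30 would not.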
Because the Speed field in QueryHits descriptors has a restricted range of valid values (below 32768) and its typical value is now considered merely informative, more useful information can now be transported in the Minimum Speed field of Query payloads (see the description of the Speed field of QueryHits in section 3.2.4).
Search Criteria String
A NUL (i.e. 0x00) terminated search string. The maximum length of this string is bounded by the Payload Length field of the descriptor header.
It SHOULD use an ASCII-compatible encoding and charset. This version of the protocol specifies no encoding; most servents use the ISO-8859-1 character set, but other encodings such as UTF-8 MAY also be used (possibly in conjunction with Query Data), as well as other international character sets (ISO-8859-*, KOI-8, S-JIS, Big5, ...).
It MAY consist of a list of search keywords separated by ASCII SPACE (0x20 = 32), optionally terminated by one or more filename extensions (after an ASCII dot, 0x2e = 46).
For interoperability with future revisions of the 0.4 protocol, the search Criteria field SHOULD NOT use the ASCII FS separator (0x1c = 28).
Also a Query that contains an empty Search Criteria is valid if it is followed by the required NUL terminator and by some Query Data (so the Payload Length cannot be lower than 4 bytes).
Query Data
This is an optional field of variable length. It is reserved for extensions of the current version of the protocol. Its maximum length is bounded by the Payload Length field of the header.
When used, this field SHOULD NOT be excessively large and SHOULD be agreed upon with other Gnutella servent implementors, as this field MAY be specified in a further specification of the protocol.
Popular servent implementations use this field to specify extended search requests based on meta-data, encoded as a NUL-terminated string containing additional XML-formatted search criteria. Other extensions may follow this second NUL byte.
Servents SHOULD then forward these optional extensions when they are present.
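The Query payload layout can be sketched as follows. The function names are hypothetical, and the ISO-8859-1 encoding is an assumption of this sketch, chosen because the text reports it as the most common one.

```python
import struct

def pack_query(min_speed, criteria, query_data=b""):
    """Build a Query payload: little-endian Minimum Speed, NUL-terminated
    Search Criteria String, then the optional Query Data."""
    return (struct.pack("<H", min_speed)
            + criteria.encode("iso-8859-1") + b"\x00"
            + query_data)

def unpack_query(payload):
    """Parse a Query payload into (min_speed, criteria, query_data)."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    end = payload.index(b"\x00", 2)   # first NUL after the 2-byte speed field
    return min_speed, payload[2:end].decode("iso-8859-1"), payload[end + 1:]
```

Anything after the first NUL is returned untouched, so that optional extensions can be forwarded as received.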
3.2.4. QueryHits (0x81) Descriptor Payload
Fields        Number of Hits   Port    IP Address   Speed    Result Set   Optional QHD Data   Servent Identifier
Byte offset   0                1...2   3...6        7...10   11...10+N    11+N...L-17         L-16...L-1
Number of Hits
The number of query hits in the Result Set field (see below).
Port
The port number on which the responding host can accept incoming connections.
IP Address
The IPv4 address of the responding host.
This field is in big-endian format.
Speed
The maximum upload speed (in kilobits/second or bits/millisecond, between 0 and 32767) of the responding host.
This legacy semantic is deprecated, because it does not guarantee a good effective upload speed: that depends on the effective workload of the host and on its number of available upload slots (so the speed is only informative and servents should not rely on it). As the value of this field is typically low, the most significant bit of this 16-bit field (sent in little-endian order, as in all other Gnutella messages) is now used as a case selector that allows sending more useful information:
Format            Bit 15   Bits 14 down to 0
Legacy format     0        Maximum upload speed in kbit/s
Extended format   1        Firewalled indicator, XML meta-data, Unassigned bits (set to 0), Reserved bits (set to 0), in that order from the most significant bit

Bit in Speed field   15 14 13 12 11 10  9  8    7  6  5  4  3  2  1  0
Bit in byte           7  6  5  4  3  2  1  0    7  6  5  4  3  2  1  0
Byte offset           1                         0
When the extended format is used (bit 15 set to 1), the other fields are defined as below:
Firewalled indicator
This bit is set to 1 in Query and QueryHits payloads, to indicate that the host emitting the message is firewalled. When both the initial Query source and the receiver are firewalled, the responding servent should not respond to the Query, as its QueryHits won't be accessible, even with the Push mechanism (see section 3.2.5)
XML meta-data
This bit is set to 1 in Query payloads, to indicate the desired preference to receive extended meta-data for results sent in Query Hits, using the LimeWire XML format. Only new LimeWire servents honor this bit, and other servents implementing XML meta-data in their results, should be changed to honor this bit too. Any servent that can interpret XML meta-data should set this bit to 1 in its Query to allow receiving them in the extension field Optional QHD Data (see below). Other servents that cannot interpret XML meta-data, or servents that do not want to receive them in Optional QHD Data should clear this bit (see Appendix A.2.3).
Unassigned bits
Currently unassigned, reserved for future use. Until then, these bits should be set to 0.
Reserved bits
Currently unassigned, reserved for future indication of a maximum number of hits to return. Until specification, these bits should be set to 0.
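Decoding the case selector can be sketched as below. Bit 15 as the format selector comes from the text; the positions of the Firewalled indicator (bit 14) and XML meta-data (bit 13) are assumptions inferred from the order of the fields, and the function name decode_speed is hypothetical.

```python
EXTENDED_FORMAT = 1 << 15   # case selector (from the text)
FIREWALLED = 1 << 14        # assumed bit position
XML_METADATA = 1 << 13      # assumed bit position

def decode_speed(field):
    """Decode a 16-bit speed field (already converted from little-endian)."""
    if not field & EXTENDED_FORMAT:
        return {"extended": False, "max_upload_kbps": field}
    return {
        "extended": True,
        "firewalled": bool(field & FIREWALLED),
        "xml_metadata": bool(field & XML_METADATA),
    }
```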
Result Set
A set of responses to the corresponding Query. This set contains Number of Hits elements, each with the Result structure described below (section 3.2.4.1).
The size of the Result Set field (and of each individual Result structure it contains) is bounded by the size of the Payload Length field in the descriptor header and by the Number of Hits field, minus the size of the required Servent Identifier field at end of this payload.
Optional QHD Data
This is an optional field (the Extended QueryHits Data) of variable length. It is reserved for extensions of the current version of the protocol. Its maximum length is bounded by the Payload Length field of the header (see section 3.2.4.3).
When used, this field SHOULD NOT be excessively large and SHOULD be agreed upon with other Gnutella servent implementors, as this field MAY be specified in a further specification of the protocol.
Some servents use this field to give other collected information about the Query or about the responding host.
Servents SHOULD then forward these optional extensions when they are present. (see Annexes)
Servent Identifier
This 16-byte string uniquely identifies the responding servent on the network. This is typically some function of the servent’s network address.
The Servent Identifier is instrumental in the operation of the Push descriptor (see below).
QueryHits descriptors are only sent in response to an incoming Query descriptor. A servent should only reply to a Query with a QueryHits descriptor if it contains data that strictly meets the Query Search Criteria.
A QueryHits descriptor SHOULD be initially generated with Hops=0 and with the TTL field equal to the number of Hops traversed by the Query descriptor to which it replies.
The Descriptor Id field in the descriptor header of the QueryHits should contain the same value as that of the associated Query descriptor. This allows a servent to identify the QueryHits descriptors associated with Query descriptors it generated.
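The header rules for a reply can be sketched as follows. The function name queryhits_header is hypothetical; it derives the QueryHits header from the 23-byte header of the Query being answered.

```python
import struct

HEADER_FORMAT = "<16sBBBI"  # Descriptor ID, Payload Descriptor, TTL, Hops, Payload Length

def queryhits_header(query_header, payload_length):
    """Build the header of a QueryHits replying to the given Query header:
    same Descriptor ID, TTL = Hops of the Query, Hops = 0."""
    descriptor_id, _type, _ttl, hops, _length = struct.unpack(HEADER_FORMAT, query_header)
    return struct.pack(HEADER_FORMAT, descriptor_id, 0x81, hops, 0, payload_length)
```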
3.2.4.1. Result Structure
Fields        File Index   File Size   Shared File Name   NUL (0x0) Terminator   Optional Result Data   NUL (0x0) Terminator
Byte offset   0...3        4...7       8...7+K            8+K                    9+K...R-2              R-1
File Index
A number, assigned by the responding host, which is used to uniquely identify the file matching the corresponding query.
File Size
The size (in bytes) of the file whose index is File Index.
Shared File Name
The NUL (i.e. 0x00) terminated shared name of the file whose index is File Index.
Optional Result Data
NUL (i.e. 0x00) terminated data about the file whose index is File Index. Some servents MAY use this field to return meta-data, encoded with XML. Care MUST be taken not to include any NUL byte in this field.
A text-only extension (such as XML data) in this field SHOULD be terminated by an ASCII File Separator (FS, 0x1c=28) if there is another extension after it.
When this field is not used, the second NUL terminator MUST still be present.
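The Result layout above can be sketched in code. This is a minimal illustration, not a reference implementation: the function names `build_result` and `parse_result` are hypothetical, and the little-endian byte order of File Index and File Size is an assumption, since this section does not state it.

```python
import struct

def build_result(file_index, file_size, file_name, result_data=b""):
    """Serialize one Result structure (hypothetical helper)."""
    # Neither the name nor the Result Data may contain a NUL byte,
    # because NUL is used as the terminator of both fields.
    if b"\x00" in file_name or b"\x00" in result_data:
        raise ValueError("NUL bytes are not allowed in these fields")
    # Assumption: both 32-bit integers are little-endian.
    fixed = struct.pack("<II", file_index, file_size)
    # The second NUL terminator is emitted even when result_data is empty.
    return fixed + file_name + b"\x00" + result_data + b"\x00"

def parse_result(buf, offset=0):
    """Parse one Result at offset; return (fields, offset of the next byte)."""
    file_index, file_size = struct.unpack_from("<II", buf, offset)
    name_start = offset + 8
    name_end = buf.index(b"\x00", name_start)   # first NUL terminator
    data_start = name_end + 1
    data_end = buf.index(b"\x00", data_start)   # second NUL terminator
    fields = {
        "file_index": file_index,
        "file_size": file_size,
        "file_name": buf[name_start:name_end],
        "result_data": buf[data_start:data_end],
    }
    return fields, data_end + 1
```

Because the structure carries no length field, the only way to find the start of the next Result is to scan for both terminators, which is why `parse_result` returns the next offset.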
3.2.4.2. Total Payload length of QueryHits descriptors.
The QueryHits descriptor, with its complex structure, is the one which may have the longest payload. For efficiency and to allow more concurrent requests, a servent that receives a Query SHOULD limit the volume of the QueryHits descriptor it sends as a reply. When many hits are detected, servents SHOULD divide them into reasonable subsets, with a delay between each QueryHits descriptor sent back to the requester.
Any QueryHits descriptor SHOULD NOT need more than 4 seconds to transmit at an average speed per connection of 4kbps, because the servent needs to be able to reply to other incoming requests from its connected neighbors in a timely way. In practice, this limits the total descriptor size to 2KB, unless more uplink bandwidth is available and there is agreement (or negotiation at connection time) about the maximum descriptor size that can be used between neighbor servents. This can only be guaranteed if replying with a TTL=1 descriptor, which explicitly won't need to be routed across relaying servents, and which has greater priority than other descriptors with larger TTL values.
When a servent replies to an incoming Query descriptor with Hops>0, the QueryHits descriptor with TTL>1 will return to the initial servent that sent the Query, using the same connection path that was used when receiving the Query. Most neighbor servents will forward incoming QueryHits descriptors (with TTL>1) without breaking them into their individual components. Buffer size limits in relaying servents MAY affect delivery of the descriptor, as a relaying agent MAY drop a QueryHits descriptor that is too long.
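The 2KB figure follows directly from the 4-second and 4kbps numbers. As a small worked sketch (the function name is hypothetical, and "kbps" is assumed here to mean 1000 bits per second):

```python
def max_descriptor_bytes(link_kbps, max_seconds=4.0):
    """Largest descriptor, in bytes, that can be sent within max_seconds."""
    # Assumption: 1 kbps = 1000 bits per second; 8 bits per byte.
    return int(link_kbps * 1000 * max_seconds / 8)

# 4 kbps for 4 seconds is 16000 bits, i.e. 2000 bytes (about 2KB).
limit = max_descriptor_bytes(4)
```

A servent with more uplink bandwidth per connection could apply the same calculation to derive a larger limit, subject to the agreement between neighbors described above.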
All servents SHOULD be able to route QueryHits descriptors with total size (including the descriptor header) up to 2KB. Servents SHOULD NOT generate any QueryHits descriptor with more than 64KB total size, unless there's mutual agreement that such large descriptors can be safely exchanged. Between these figures, the maximum descriptor size CAN be reduced in an order of magnitude proportional to the increase of the TTL+Hops value.
3.2.4.3. QHD Data and Result Data Extensions
The QueryHits descriptor allows two kinds of extensions, either per Result or for the whole Result-Set. The choice of placement of these extensions (and their encoding and semantics) is not defined in this document; it's up to the implementers of servents to define and test them. However, several things should be noted:
Important notice: This section gives some guidelines, but the "SHOULD" and "MAY" words found here are still being debated, particularly for the semantics of cacheable extensions that could be split or merged by future Query-caching relaying servents.
1) Extensions encoding:
There MAY be several Result Data extensions for the same Result file. And there may be several QHD Data extensions for the same QueryHits descriptor.
Some extensions are also versatile, i.e. they MAY be used in descriptors with different Payload type.
So each extension MUST start with a distinct identifiable sequence to recognize its type. The following paragraphs give examples for interoperability of some existing extensions.
1.a) XML extensions
start with an ASCII "<" character (0x3c=60), which is part of its actual content and can be a comment, a document type declaration, or the beginning of the document element; an XML extension is terminated by an ASCII FS character if followed by another extension.
Standard XML data uses the UTF-8 encoding by default, but MAY use other explicit encodings. For interoperability, implementers of XML extensions SHOULD produce XML data using an explicit default target namespace and/or a distinctive document element name. Full XML conformance is not required, and relaying servents don't need to validate them.
1.b) URN extensions
start with an ASCII lowercase "u" letter (0x75=117), part of its value, and are terminated by an ASCII FS if followed by another extension.
1.c) Sets of GGEP binary-delimited extensions
(not specified here) are introduced by a magic byte (0xc3=195), and each extension in the set contains a length indicator and an extension-specific signature; however no byte in the GGEP binary-delimited extension may be NUL (0x00) when encoded within a Result Data field (this MAY use a special binary encoding). The whole set of GGEP-compatible binary extensions is terminated by an ASCII FS if followed by another extension.
1.d) BearShareTrailer-type binary extensions
(see Appendix 1) start with a vendor-specific Identifier of 4 ASCII characters (it SHOULD NOT start with a "<", "u" or 0xc3 byte) and specify their internal data length. Such binary extensions are not designed to be used in a Result Data extension field, but only in a QHD Data extension. For newer applications, GGEP-style extensions SHOULD be preferred.
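The leading-byte convention described in 1.a) to 1.d) lends itself to a simple dispatcher. The following is only a sketch: the function names and the returned labels are hypothetical, and splitting on FS assumes that no extension contains a 0x1c byte internally (which a GGEP block with raw binary content might not guarantee).

```python
FS = b"\x1c"  # ASCII File Separator between consecutive extensions

def classify_extension(ext):
    """Guess the extension kind from its first byte."""
    if not ext:
        return "empty"
    first = ext[0]
    if first == 0x3C:      # "<" starts an XML extension
        return "xml"
    if first == 0x75:      # lowercase "u" starts a URN extension
        return "urn"
    if first == 0xC3:      # magic byte of a GGEP extension set
        return "ggep"
    # Anything else may be a 4-character vendor identifier or unknown data.
    return "vendor-or-unknown"

def split_result_data(data):
    """Split a Result Data field on FS and label each non-empty piece."""
    return [(classify_extension(part), part) for part in data.split(FS) if part]
```

For example, `split_result_data(b"<audio/>\x1curn:sha1:XYZ")` yields one piece labelled `"xml"` followed by one labelled `"urn"`. A relaying servent does not need this step; it only forwards the bytes unchanged.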
2) When to use QHD Data extensions:
Servents MAY need to split a large incoming result-set into several distinct QueryHits descriptors, each one transmitted after a time delay, to better manage its outgoing bandwidth and allow responding to other requests.
When a servent needs to do so, it MAY transmit the QHD Data separately in an empty or tiny result set, or MAY have to repeat the QHD Data in each QueryHits descriptor. Servents that send large QHD Data SHOULD design their extension in such a way that this data MAY be transmitted separately (however with the same responding Servent Identifier field), or so that this data MAY be repeated in multiple Query Hits.
So meta-data associated with a single file is better kept out of the QHD Data extension field; the QHD Data is best used either with a single-file Result Set, or to transport small servent-related information.
Also QHD data cannot be parsed without first receiving and parsing the Result Set, because there's no length indicator for the Result Set: each Result structure must be scanned while counting them, until Number of Hits have been scanned. For faster routing purpose, a servent MAY also need to limit the Number of Hits allowed in the same Result Set.
Servents SHOULD also avoid transmitting QueryHits descriptors with an empty Result Set in order to send only QHD Data extensions, as some legacy servents MAY discard such empty descriptors.
However, the 0.4 protocol does not specify that an empty Result Set is invalid. So, new servents SHOULD accept and forward QueryHits descriptors containing an empty Result Set if they contain a QHD Data extension.
Finally, all servents SHOULD discard QueryHits descriptors with both an empty Result Set (i.e. Number of Hits=0) and no QHD Data extension (i.e. Payload Length<=27).
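The scanning requirement and the discard rule above can be sketched as follows. The helper names are hypothetical. The code assumes the QueryHits payload layout defined earlier in this specification: an 11-byte fixed part whose first byte is Number of Hits, then the Result Set, then the optional QHD Data, then the 16-byte Servent Identifier (11+16 gives the 27 bytes mentioned above). Treating a payload shorter than 27 bytes as discardable is an additional assumption.

```python
FIXED_LEN = 11        # assumed: Number of Hits, Port, IP Address, Speed
SERVENT_ID_LEN = 16   # trailing Servent Identifier

def split_queryhits(payload):
    """Return (result_set, qhd_data, servent_id) as byte strings."""
    if len(payload) < FIXED_LEN + SERVENT_ID_LEN:
        raise ValueError("payload too short for a QueryHits descriptor")
    number_of_hits = payload[0]
    end = len(payload) - SERVENT_ID_LEN
    pos = FIXED_LEN
    for _ in range(number_of_hits):
        pos += 8                       # skip File Index and File Size
        for _ in range(2):             # name NUL, then Result Data NUL
            pos = payload.index(b"\x00", pos, end) + 1
    return payload[FIXED_LEN:pos], payload[pos:end], payload[end:]

def should_discard(payload):
    """True for an empty Result Set carrying no QHD Data extension."""
    if len(payload) < FIXED_LEN + SERVENT_ID_LEN:
        return True                    # assumption: malformed, so discard
    return payload[0] == 0 and len(payload) == FIXED_LEN + SERVENT_ID_LEN
```

`split_queryhits` raises `ValueError` when a terminator is missing, which is one way a servent might detect a truncated Result Set. Capping Number of Hits, as suggested above, bounds the cost of this scan.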
3) When to use Result Data extensions:
To avoid such split of related information, meta-data can be encoded, along with the Result with which it is related, within the Result Data extension. However, pure binary format for these extensions is not possible as Result Data extensions MUST NOT contain NUL bytes; additionally it SHOULD NOT contain the ASCII File Separator (FS, 0x1c = 28) used to terminate text-only extensions with no explicit length. This MAY then require a less efficient (larger) encoding for such meta data within the Result Data extension field of a Result structure.
Using a Result Set with several combined hits saves a little output bandwidth compared with an equivalent split Result Set, because of the header overhead. However, servents SHOULD avoid using descriptors with excessive length, as it may cause buffering problems in remote servents.
4) When to anticipate split QueryHits:
If a QueryHits extension is large then it SHOULD be carefully designed to differentiate servent-related information from files-specific meta-data.
Servent-related information SHOULD NOT be sent within multiple QueryHits descriptors associated with the same Query request (identified by the matching Descriptor ID field in the descriptor header), but only with the first Result Set for that Query. It SHOULD be encoded as a QHD Data extension, and this first Result Set MAY need to be reduced to contain the smallest Result structures.
Large file-related meta-data MAY be encoded as a QHD Data extension instead of a Result Data extension to allow better encoding. In such a case, the Result Set MAY need to be reduced to contain only one Result structure.
A QHD Data extension MAY be designed to include a vector of file-related meta-data, one for each file of the Result Set. However, as a Result Set MAY be split by relaying agents, with QHD Data extensions replicated in each QueryHits descriptor, it would be difficult to reassociate the meta-data with the correct file. In that case, the extension may include the File Index within each element of the vector encoded in the QHD Data extension.
Until the semantics of splitting (or merging) a Result Set are standardized in a future version of this specification, servents need to be carefully tested with other popular implementations to determine the appropriate policy, as splitting or merging MAY break the behavior of an existing extension (for example if QueryHits are digitally signed).
3.2.5. Push (0x40) Descriptor Payload
Fields                      Byte offset
Servent Identifier          0...15
File Index                  16...19
IP Address                  20...23
Port                        24...25
Optional Push Data          26...L-1
Servent Identifier
The 16-byte string uniquely identifying the servent on the network who is being requested to push the file with index File Index. The servent initiating the push request should set this field to the Servent Identifier returned in the corresponding QueryHits descriptor. This allows the recipient of a Push request to determine whether or not it is the target of that request.
File Index
The index uniquely identifying the file to be pushed from the target servent. The servent initiating the Push request should set this field to the value of one of the File Index fields from the Result Set field in the corresponding QueryHits descriptor.
IP Address
The IP address of the host to which the file with File Index should be pushed. This field is in big-endian format.
Port
The port to which the file with index File Index should be pushed.
Optional Push Data
This is an optional field of variable length; it is reserved for extensions of the current version of the protocol, to give identifying information about the content to push, or routing and authenticating information collected from previous QueryHits and/or Pong descriptors. Its maximum length is bounded by the Payload Length field of the descriptor header.
When used, this field SHOULD NOT be excessively large, and its format SHOULD be agreed upon with other Gnutella servent implementors, as this field MAY be specified in a further specification of the protocol.
Servents SHOULD then forward these optional extensions when they are present.
A servent may send a Push descriptor if it receives a QueryHits descriptor from a servent that doesn’t support incoming connections. This might occur when the servent sending the QueryHits descriptor is behind a firewall. When a servent receives a Push descriptor, it may act upon the push request if and only if the Servent Identifier field contains the value of its servent identifier.
The Descriptor Id field in the Descriptor Header of the Push descriptor should not contain the same value as that of the associated QueryHits descriptor, but should contain a new value generated by the servent’s Descriptor_Id generation algorithm. See the section below entitled "Firewalled Servents" for further details on the Push process.
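The Push payload layout can be illustrated with a short sketch. The helper names are hypothetical. The IP Address is written in big-endian order as stated above; the little-endian encoding of File Index and Port is an assumption, since this section does not state their byte order.

```python
import socket
import struct

PUSH_FIXED_LEN = 26  # Servent Identifier (16) + File Index (4) + IP (4) + Port (2)

def build_push(servent_id, file_index, ip, port, push_data=b""):
    """Serialize a Push payload (descriptor header not included)."""
    if len(servent_id) != 16:
        raise ValueError("Servent Identifier must be 16 bytes")
    return (servent_id
            + struct.pack("<I", file_index)   # assumed little-endian
            + socket.inet_aton(ip)            # 4 bytes, big-endian
            + struct.pack("<H", port)         # assumed little-endian
            + push_data)

def parse_push(payload):
    """Decode the fixed fields and return them with any Optional Push Data."""
    if len(payload) < PUSH_FIXED_LEN:
        raise ValueError("payload too short for a Push descriptor")
    (file_index,) = struct.unpack_from("<I", payload, 16)
    (port,) = struct.unpack_from("<H", payload, 24)
    return {
        "servent_id": payload[:16],
        "file_index": file_index,
        "ip": socket.inet_ntoa(payload[20:24]),
        "port": port,
        "push_data": payload[PUSH_FIXED_LEN:],
    }

def is_push_target(payload, my_servent_id):
    """A servent acts on a Push only if the identifier is its own."""
    return payload[:16] == my_servent_id
```

A receiving servent would typically call `is_push_target` first and only decode the remaining fields when it returns true.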
3.2.6. QRP (0x30) Extension Descriptor Payload
Fields                      Byte offset
Query Routing Table Data    0...L-1
Query Routing Table Data
This is a required field consisting of a variable number of bytes; it is reserved for extensions of the current version of the protocol, to send compact information about files shared by a servent, so that the recipient can filter incoming Queries.
This field can be large, but the descriptor should be compacted with an algorithm not specified in this document.
This descriptor was not specified in the original 0.4 protocol. Implementing it in servents is optional (but sending it is required to implement the "Leaf node" mode specified in the UltraPeer extension). It should be sent only to servents implementing the UltraPeer protocol, as indicated in their connection headers. Non-QRP aware servents MAY safely ignore this descriptor, as it is completely compatible with all non QRP-aware 0.4 servents that don't use it.
The routing of this descriptor is not defined in this document. Its presence in a reception flow indicates that the recipient should support the QRP mechanism, most probably to implement the UltraPeer topology extension. Its occurrence in a flow sent by a given servent should be paced according to the QRP protocol extension that defines it. Generally, this message is not intended to be dropped by the recipient. Receiving it unsolicited therefore indicates that the sending servent already implements part of the QRP extension, but complies strictly neither with this specification nor with the QRP specification.
3.2.7. Bye (0x02) Extension Descriptor Payload
Fields                      Byte offset
Optional Bye Data           0...L-1
Optional Bye Data
This is an optional field consisting of a variable number of bytes; it is reserved for extensions of the current version of the protocol, to specify filters about expected Pong replies. Its maximum length is bounded by the Payload Length field of the header.
When used, this field SHOULD be small and agreed upon with other Gnutella servent implementors, as this field MAY be specified in a further specification of the protocol.
This descriptor was not specified in the original 0.4 protocol. Implementing it in servents is optional. Servents MAY safely ignore this descriptor, as it is completely compatible with all non Bye-aware 0.4 servents.
However, a Bye-aware servent MUST set TTL=1 and Hops=0 when sending this descriptor, and it then SHOULD NOT send or forward any other descriptor on the same connection path; instead it MAY wait about 30 seconds for the connection to close (if this timeout elapses, it SHOULD close the connection itself). During that period, the servent MAY ignore all other incoming descriptors coming from the same connection path (with the exception of another incoming Bye descriptor, which MAY be interpreted). The semantics of sending a Bye descriptor with Hops<>0 are unknown and not defined in this document.
On reception, a Bye-aware servent MUST NOT forward this message; it MAY interpret the Payload to take further actions, but it SHOULD disconnect immediately from the servent which sent this descriptor. The content of the Payload is not specified in this version of the protocol (it will typically contain a NUL terminated status line that gives the reason why a servent will be disconnected, and other Optional Bye Data extensions).
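A sender-side sketch of the rules above follows. It is illustrative only: `build_bye` is a hypothetical helper, the payload is assumed to be just the NUL terminated status line mentioned above, and the 23-byte descriptor header layout (16-byte Descriptor ID, payload type, TTL, Hops, 32-bit little-endian Payload Length) is assumed from the header definition given earlier in this specification.

```python
import os
import struct

BYE = 0x02  # payload type of the Bye extension descriptor

def build_bye(reason, descriptor_id=None):
    """Return a complete Bye descriptor (header plus payload) as bytes."""
    if descriptor_id is None:
        descriptor_id = os.urandom(16)   # fresh 16-byte Descriptor ID
    if len(descriptor_id) != 16:
        raise ValueError("Descriptor ID must be 16 bytes")
    # Assumed payload: an ASCII status line followed by a NUL terminator.
    payload = reason.encode("ascii") + b"\x00"
    # TTL=1 and Hops=0 are mandatory for a Bye-aware sender.
    header = descriptor_id + struct.pack("<BBBI", BYE, 1, 0, len(payload))
    return header + payload
```

After writing these bytes to the connection, the sender would stop sending or forwarding on that connection and close it once the peer disconnects or the roughly 30-second wait expires.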
3.2.7. Open-Vendor (0x31) Extension Descriptor Payload
Fields                               Byte offset
Vendor ID                            0...3
Sub-Selector                         4...5
Optional Sub-Selector Version        6...7
Optional Vendor Sub-Selector Data    6...L-1
Vendor ID
Case insensitive sequence of 4 characters, identifying the vendor who has authority on the descriptor format and its definition. Vendor ID values are similar to those used in QueryHits (See Appendix 1). The all-zero value is reserved for Vendor support requests and answers. See below.
Sub-Selector
A little-endian 16-bit value specifying a distinct message type defined by that vendor. The 0xFFFF and 0xFFFE values for the Sub-Selector field are reserved for feature request and answers. See below.
Optional Sub-Selector Version
A little-endian 16-bit value specifying a variant for the distinct message type defined by that vendor. The 0x0000 value is assumed if absent. Some Sub-Selectors will be versioned and some won't. The value 0x0001 represents version 0.1.
Optional Vendor Sub-Selector Data
This is an optional field consisting of a variable number of bytes. Its format depends on the Vendor ID, Sub-Selector and Optional Sub-Selector Version fields. Its maximum length is bounded by the Payload Length field of the header.
This descriptor was not specified in the original 0.4 protocol. Implementing it in servents is optional. It allows servents to send experimental messages, and test their scalability and routing strategies for networking enhancements without breaking other existing servent implementations.
However servents that implement this descriptor SHOULD also implement the Open-Vendor Feature request/answer 0.1 mechanism. See below.
Non-aware servents MAY safely ignore this descriptor, as it should be completely compatible with all non-Vendor aware 0.4 servents.
However, an Open-Vendor-aware servent SHOULD set TTL=1 and Hops=0 when sending this descriptor. In that case, the Descriptor ID field may be used for purposes other than identifying the uniqueness of the originator. The semantics of sending an Open-Vendor descriptor with TTL>1, or forwarding it with Hops<>0, are unknown and not defined in this document.
The maximum size of this Descriptor should be below 20 KB for routing purpose, and MUST NOT exceed 64KB with TTL=1 and Hops=0.
On reception, a non-aware servent MUST NOT blindly forward this descriptor; it MAY interpret the Payload to take further actions. The content of the Payload is not specified in this version of the protocol, as it is vendor-specific, and may change over time.
Querying the list of Vendor IDs supported in Open-Vendor descriptors:
A servent A can query which Vendor IDs the remote servent B supports: it sends an Open-Vendor descriptor with the Vendor ID field set to all-zeroes, the Sub-Selector field set to 0xFFFF and the Version field set to 0x0001. If B supports some Open-Vendor descriptors, it will answer by sending back another Open-Vendor descriptor with the Vendor ID field set to all-zeroes, the Sub-Selector field set to 0xFFFE, and the Optional Sub-Selector Version field set to 0x0001; the Optional Vendor Sub-Selector Data field will contain the list of Vendor IDs supported.
Querying if one or more specific Vendor IDs are supported in Open-Vendor descriptors:
This uses the same mechanism, except that servent A will insert one or more Vendor ID values in the Optional Vendor Sub-Selector Data field. Servent B will reply by listing only those Vendor ID values in the requested set that it supports. If B does not support any of these values, it can explicitly reply with an empty list of Vendor ID values in the Optional Vendor Sub-Selector Data field of its answer. This type of request restricts the volume of data exchanged between servents, because servent B may support a large set of Vendor-specific extensions.
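The two Vendor ID queries above can be sketched as follows. The helper names are hypothetical, and the encoding of the Vendor ID list as a plain concatenation of 4-byte values is an assumption, since this document does not define the list format. The parser also assumes that when at least 8 bytes are present, bytes 6...7 hold the version.

```python
import struct

FEATURE_QUERY = 0xFFFF
FEATURE_ANSWER = 0xFFFE
NO_VENDOR = b"\x00\x00\x00\x00"   # all-zero Vendor ID reserved for support queries

def build_vendor_payload(vendor_id, sub_selector, version=1, data=b""):
    """Serialize an Open-Vendor payload with an explicit version field."""
    if len(vendor_id) != 4:
        raise ValueError("Vendor ID must be 4 bytes")
    # Sub-Selector and version are little-endian 16-bit values.
    return vendor_id + struct.pack("<HH", sub_selector, version) + data

def parse_vendor_payload(payload):
    """Return (vendor_id, sub_selector, version, data)."""
    if len(payload) < 6:
        raise ValueError("payload too short for an Open-Vendor descriptor")
    (sub_selector,) = struct.unpack_from("<H", payload, 4)
    if len(payload) >= 8:
        (version,) = struct.unpack_from("<H", payload, 6)
        return payload[:4], sub_selector, version, payload[8:]
    return payload[:4], sub_selector, 0, b""   # version absent: assume 0x0000

def answer_vendor_query(query_payload, supported):
    """Build B's answer listing supported Vendor IDs (all, or those requested)."""
    vendor_id, sub_selector, _version, data = parse_vendor_payload(query_payload)
    if vendor_id != NO_VENDOR or sub_selector != FEATURE_QUERY:
        raise ValueError("not a Vendor ID support query")
    # Vendor IDs are case insensitive, so compare in upper case.
    requested = {data[i:i + 4].upper() for i in range(0, len(data) - 3, 4)}
    if requested:
        listed = [v for v in supported if v.upper() in requested]
    else:
        listed = list(supported)
    return build_vendor_payload(NO_VENDOR, FEATURE_ANSWER, 1, b"".join(listed))
```

An empty request lists everything B supports, while a request naming only unsupported Vendor IDs produces an answer with an empty data field, matching the explicit empty-list reply described above.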
Querying the list of Open-Vendor Sub-selectors supported for a specific Vendor ID:
A servent A can query which Sub-Selectors the remote servent B supports: it sends an Open-Vendor descriptor with the Vendor ID field set to the Vendor ID whose Sub-Selectors are being queried, the Sub-Selector field set to 0xFFFF and the Version field set to 0x0001. If B supports Open-Vendor descriptors for that Vendor ID, it will answer by sending back another Open-Vendor descriptor with that same Vendor ID, the Sub-Selector field set to 0xFFFE, and the Optional Sub-Selector Version field set to 0x0001; the Optional Vendor Sub-Selector Data field will contain the list of Sub-Selectors supported with that Vendor ID.
A more precise query that takes version fields or identification fields into account may be used with Sub-Selector Version field set to 0x0002, in the 0xFFFF "Feature Query" Sub-Selector, or in the 0xFFFE "Feature Answer" Sub-Selector. In that case, each Sub-Selector value listed in the Optional Vendor Sub-Selector Data field will be followed by a length byte and the version information.
3.2.8. Standard-Vendor (0x32) Extension Descriptor Payload
Fields                               Byte offset
Vendor ID                            0...3
Sub-Selector                         4...5
Optional Sub-Selector Version        6...7
Optional Vendor Sub-Selector Data    6...L-1
The structure of this descriptor is completely identical to the structure of the 0x31 descriptor type. In fact it is highly recommended that servents that implement any 0x32 descriptor also accept receiving its 0x31 experimental variant with exactly the same Payload. The feature query mechanism can also be used to see if a legacy servent supports the approved 0x32 variant. However, a servent that implements the approved 0x32 variant should no longer reply with the experimental 0x31 variant of the descriptor.
This descriptor was not specified in the original 0.4 protocol. Implementing it in servents is optional. It allows servents to send experimental vendor-specific messages, for networking enhancements without breaking other existing servent implementations, but it restricts its definition to a stable and documented specification. Experimental Open-Vendor 0x31 descriptors may be safely ignored, but detecting a Standard-Vendor 0x32 message gives a hint to the implementor about which Open-Vendor descriptor they should monitor and implement in their next releases as per the available specification.
This descriptor may have special routing strategies. In that case, this descriptor MUST be sent with TTL=1 and Hops=0, and its unique 128-bit Descriptor ID MAY be used for other purposes. If the Standard-Vendor descriptor uses a standard forwarding strategy, it should include a unique 128-bit Descriptor ID which MUST be preserved while incrementing the Hops field and decrementing the TTL field. Effectively implementing a Standard-Vendor descriptor MAY require complex caching strategies. For testing purposes, and to limit the impact of possible bugs, all tests SHOULD be performed using the experimental Open-Vendor descriptors, so that they won't harm other conforming servents.
Only when the implementation passes the specification compliance tests with other major servent implementations present on the network that implement this Open-Vendor message SHOULD the implementor replace any occurrence of experimental 0x31 descriptors for that Vendor ID and Sub-Selector by 0x32 descriptors in requests and in replies to incoming approved 0x32 descriptors. Moving to the Standard-Vendor state will allow more reachability of this Open-Vendor descriptor on the network.
When answering an incoming Open-Vendor 0x31 descriptor, Standard-Vendor 0x32 descriptors MUST NOT be used, whatever its content, unless the incoming Vendor ID matches the receiving servent implementation and the receiving servent is fully compliant with the Standard-Vendor descriptor specification.
When answering an incoming Standard-Vendor 0x32 descriptor, Open-Vendor 0x31 descriptors SHOULD NOT be used; other Standard-Vendor 0x32 descriptors SHOULD be used instead. Other types of answers MAY include other descriptors such as Ping, Pong, Query, QueryHits, Push and Bye, or any other action defined in the Standard-Vendor descriptor.
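The two answering rules above reduce to a small decision function. This is one possible reading, not a normative rule: the function name and its boolean parameters are hypothetical, and choosing 0x32 whenever it is permitted is a design choice of this sketch.

```python
OPEN_VENDOR = 0x31       # experimental variant
STANDARD_VENDOR = 0x32   # approved variant

def reply_descriptor_type(incoming_type, vendor_is_ours, fully_compliant):
    """Pick the vendor descriptor type to use when answering."""
    if incoming_type == STANDARD_VENDOR:
        # Answers to 0x32 SHOULD themselves be 0x32.
        return STANDARD_VENDOR
    if incoming_type == OPEN_VENDOR:
        # 0x32 is only allowed when the Vendor ID is ours and the
        # implementation is fully compliant with the specification.
        if vendor_is_ours and fully_compliant:
            return STANDARD_VENDOR
        return OPEN_VENDOR
    raise ValueError("not a vendor extension descriptor")
```

A servent that has not yet passed compliance testing would always pass `fully_compliant=False`, which keeps its answers to 0x31 descriptors in the experimental 0x31 space.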