Gnutella Protocol Development
2.4 Standard Message Architecture

Once a servent has connected successfully to the network, it communicates with other servents by sending and receiving Gnutella protocol messages. Each message is preceded by a Message Header with the byte structure given below.

Note 1: One IP packet may contain several Gnutella messages, and one Gnutella message may be split across multiple IP packets. A servent can therefore never assume that a Gnutella message ends where the chunk of data read from the socket ends.

Note 2: All fields in the following structures are in little-endian byte order unless otherwise specified.

Note 3: All IP addresses in the following structures are in IPv4 format. For example, the IPv4 byte array

    0xD0    0x11    0x32    0x04
    byte 0  byte 1  byte 2  byte 3

represents the dotted address 208.17.50.4.

2.4.1 Message Header

The message header is 23 bytes divided into the following fields.

    Bytes:  Description:
    0-15    Message ID/GUID (Globally Unique ID)
    16      Payload Type
    17      TTL (Time To Live)
    18      Hops
    19-22   Payload Length

Message ID
    A 16-byte string (GUID) uniquely identifying the message on the network.

    Servents SHOULD store all 1's (0xff) in byte 8 of the GUID. (Bytes are numbered 0-15, inclusive.) This serves to tag the GUID as being from a modern servent.

    Servents SHOULD initially store all 0's in byte 15 of the GUID. This is reserved for future use.

    The other bytes SHOULD have random values.

Payload Type
    Indicates the type of message:

    0x00 = Ping
    0x01 = Pong
    0x02 = Bye
    0x40 = Push
    0x80 = Query
    0x81 = Query Hit

    Other Gnutella messages can be used, but if so the servent MUST first make sure that the remote host supports this new message type. This can be done using handshaking headers.

TTL
    Time To Live. The number of times the message will be forwarded by Gnutella servents before it is removed from the network. Each servent will decrement the TTL before passing it on to another servent.
    When the TTL reaches 0, the message MUST NOT be forwarded any further.

Hops
    The number of times the message has been forwarded. As a message is passed from servent to servent, the TTL and Hops fields of the header must satisfy the following condition:

        TTL(0) = TTL(i) + Hops(i)

    where TTL(i) and Hops(i) are the values of the TTL and Hops fields of the message, and TTL(0) is the maximum number of hops a message will travel (usually 7).

Payload Length
    The length of the message immediately following this header. The next message header is located exactly this number of bytes from the end of this header, i.e. there are no gaps or pad bytes in the Gnutella data stream. Messages SHOULD NOT be larger than 4 kB.

The Payload Length field is the only reliable way for a servent to find the beginning of the next message in the input stream. Therefore, servents SHOULD rigorously validate the Payload Length field for each message received. If a servent becomes out of synch with its input stream, it SHOULD close the connection associated with the stream, since the upstream servent is either generating or forwarding invalid messages.

Abuse of the TTL field in broadcasted messages (Query) will lead to an unnecessary amount of network traffic and poor network performance. Therefore, servents SHOULD carefully check the TTL fields of received Query messages and lower them as necessary. Assuming the servent's maximum admissible Query message life is 7 hops, then if TTL + Hops > 7, TTL SHOULD be decreased so that TTL + Hops = 7. Broadcasted messages with very high TTL values (>15) SHOULD be dropped.

Immediately following the message header is a payload consisting of one of the following messages.

2.4.2 Ping (0x00)

Ping messages MAY contain a GGEP extension block (see Section 2.3), but no other payload.

2.4.3 Pong (0x01)

Pong messages contain information about a Gnutella host. The message has the following fields:

    Bytes:  Description:
    0-1     Port number.
            The port number on which the responding host can accept
            incoming connections.
    2-5     IP Address. The IP address of the responding host.
            Note: This field is in big-endian format.
    6-9     Number of shared files. The number of files that the
            servent with the given IP address and port is sharing on
            the network.
    10-13   Number of kilobytes shared. The number of kilobytes of data
            that the servent with the given IP address and port is
            sharing on the network.
    14-     OPTIONAL GGEP extension block (see Section 2.3).

Pong messages are only sent in response to an incoming Ping message. It is valid for more than one Pong message to be sent in response to a single Ping message. This enables host caches to send cached servent address information in response to a Ping request.

The Message ID of a Pong message MUST be the Message ID of the Ping message it is sent in reply to.

The fields specifying the number of shared files and the number of kilobytes shared were intended to allow one to measure the amount of data available on the network. With a very large Gnutella network, and minimized Ping and Pong message traffic, this can no longer be done. Still, these fields SHOULD be filled out correctly.

2.4.4 Query (0x80)

Since Query messages are broadcasted to many nodes, the total size of the message SHOULD NOT be larger than 256 bytes. Servents MAY drop Query messages larger than 256 bytes, and SHOULD drop Query messages larger than 4 kB.

A Query message has the following fields:

    Bytes:  Description:
    0-1     Minimum Speed. The minimum speed (in kb/second) of servents
            that should respond to this message. A servent receiving a
            Query message with a Minimum Speed field of n kb/s SHOULD
            only respond with a Query Hit if it is able to communicate
            at a speed >= n kb/s.
    2-      Search Criteria. This field is terminated by a NUL (0x00).
            See Section 2.2.7.3 for rules and information on how to
            interpret the Search Criteria.
    Rest    OPTIONAL extensions block.
            The rest of the Query message is used for extensions to
            the original query format. The allowed extension types are
            GGEP, HUGE and XML (see Section 2.3 and Appendixes 1 and
            2). If two or more of these extension types exist together,
            they are separated by a 0x1C (file separator) byte. Since
            GGEP blocks can contain 0x1C bytes, the GGEP block, if
            present, MUST be located after any HUGE and XML blocks.
            The type of each block can be determined by looking for
            the prefixes "urn:" for a HUGE block, "<" or "{" for XML
            and 0xC3 for GGEP. The extension block SHOULD NOT be
            followed by a null (0x00) byte, but some servents wrongly
            do that.

2.4.4.1 Flags field semantics

The first two bytes of the Query message payload were previously used to signal the minimum speed required of the sharing host. The value was in little-endian format. This use has now been deprecated. The new semantics use big-endian format.

The highest bit in big-endian format (bit 15) is used as a flag to detect queries with the new semantics. This bit MUST be set. If bit 15 is not set, then this is a query with the legacy minimum speed semantics, and the field MAY be ignored, but servents MUST NOT ignore the entire query. If bit 15 is set, then this is a query with the new semantics. Note however that bit 15 in the new semantics was bit 7 in the legacy one (encoding for 128 kbps).

In the new semantics, each bit (except for bit 15) is used as a flag, mostly to indicate compatibility with new Gnutella extensions. The assignment of each bit is as follows:

* Bit 14: Firewalled indicator. The host that sent the query is unable
  to accept incoming connections. This flag can be used by the remote
  servent to avoid returning Query Hits if it is itself firewalled, as
  the requesting servent will not be able to download any files.
* Bit 13: XML Metadata. Set this bit to 1 if you want the sharing
  servent to send XML Metadata in the Query Hit.
  This flag has been assigned to spare bandwidth, returning metadata
  in Query Hits only if the requester asks for it. If this bit is not
  set, the sharing host MUST NOT send XML metadata in returned Query
  Hit messages.
* Bit 12: Leaf Guided Dynamic Query. When this bit is set to 1, the
  query is sent by a leaf that wants to control the dynamic query
  mechanism. This is part of the Leaf guidance of dynamic queries
  proposal. This information is only used by the ultrapeers shielding
  this leaf, and only if they implement leaf guidance of dynamic
  queries. If this bit is set in a Query from a leaf, it indicates
  that the leaf will respond to Vendor Messages from its ultrapeer
  about the status of the search results for the Query.
* Bit 11: GGEP "H" allowed. If this bit is set to 1, then the sender
  is able to parse the GGEP "H" extension, which is a replacement for
  the legacy HUGE GEM extension. This is meant to start replacing the
  GEM mechanism with GGEP extensions, as GEM extensions are now
  deprecated.
* Bit 10: Out of Band Query. This flag is used to recognize a Query
  which was sent using the Out Of Band query extension.
* Bit 9: Reserved for future use.
* Bits 0-8: Indicates the maximum number of query hits expected, 0 if
  no maximum. This does not mean that no more query hits may be
  returned, but that the query should be propagated in a way that
  will cause the specified number of hits.

2.4.5 Query Hit (0x81)

Query Hit messages have the following fields:

    Bytes:  Description:
    0       Number of Hits. The number of query hits in the result set
            (see below).
    1-2     Port. The port number on which the responding host can
            accept incoming HTTP file requests. This is usually the
            same port as is used for Gnutella network traffic, but any
            port MAY be used.
    3-6     IP Address. The IP address of the responding host.
            Note: This field is in big-endian format.
    7-10    Speed. The speed (in kb/second) of the responding host.
    11-     Result Set. A set of responses to the corresponding Query.
            This set contains Number_of_Hits elements, each with the
            following structure:

            Bytes:  Description:
            0-3     File Index. A number, assigned by the responding
                    host, which is used to uniquely identify the file
                    matching the corresponding query.
            4-7     File Size. The size (in bytes) of the file whose
                    index is File_Index.
            8-      File Name. The name of the file whose index is
                    File_Index. Terminated by a null (i.e. 0x00).
            x       Extensions block. Allowed extension types are HUGE,
                    GGEP and plain text metadata. This field is
                    terminated by a null (0x00), even if there are no
                    extensions (resulting in a double null). Also, the
                    extensions block itself MUST NOT contain any null
                    bytes.

                    If two or more of these extension types exist
                    together, they are separated by a 0x1C (file
                    separator) byte. Since GGEP blocks can contain
                    0x1C bytes, the GGEP block, if present, MUST be
                    located after any HUGE and plain text blocks.

                    The type of each block can be determined by
                    looking for the prefixes "urn:" for a HUGE block
                    and 0xC3 for GGEP; anything else is probably plain
                    text metadata.

                    Plain text metadata is intended to be displayed
                    directly to the user. It was first invented by
                    Gnotella (a now discontinued Gnutella servent) to
                    tag MP3 files. Examples:

                        "192 kbps 44 kHz 3:23"
                        "120 kbps(VBR) 44kHz 3:55" (variable bitrate)

                    Other plain text formats MAY be used.

    x       RECOMMENDED extra block. This block is not required, but
            strongly recommended. It is sometimes called EQHD, or
            (incorrectly) just QHD. It has the following format:

            Bytes:  Description:
            0-3     Vendor Code. Four case-insensitive characters
                    representing a vendor code. For example "LIME" for
                    LimeWire. See registered codes and register yours
                    at http://groups.yahoo.com/group/the_gdf/database?method=reportRows&tbl=6
                    (Requires GDF membership)
            4       Open Data Size. Contains the length (in bytes) of
                    the Open Data field. Set to 2 in most current
                    implementations, and 4 in those that support XML
                    metadata outside GGEP (see Section 2.3 and
                    Appendix 2).
                    The Open Data area MAY be larger to allow future
                    extensions.
            x       Open Data. Contains two 1-byte flags fields with
                    the following layout and in the specified order:

                    Bit:    Description:
                    7,6     Reserved for future use
                    5       flagGGEP
                    4       flagUploadSpeed
                    3       flagHaveUploaded
                    2       flagBusy
                    1       Reserved for future use
                    0       flagPush

                    The first flag byte can be viewed as an "enabler"
                    for the flags in the second byte, the "setter".
                    Only those bits that were enabled must be
                    considered by the servent as being valid. This
                    logic is reversed for flagPush, which is set in
                    the first byte and enabled in the second. The
                    enabling byte allows you to know which flags are
                    supported by a given servent.

                    Bits 5, 4, 3 and 2 in the first byte MUST be set
                    if and only if the corresponding flag in the
                    second byte is meaningful. Bit 0 in the second
                    byte MUST be set if and only if the corresponding
                    flag in the first byte is meaningful. Yes, the
                    order is reversed for this flag.

                    flagGGEP is set if and only if the private data
                    block (see below) contains a GGEP block.

                    flagUploadSpeed is set if and only if the Speed
                    field of the Query Hit message contains the
                    highest average transfer rate (in kbps) of the
                    last 10 uploads. Otherwise, the Speed field
                    contains the host's total upload speed as set by
                    the user, and is therefore less reliable.

                    flagHaveUploaded is set if and only if the servent
                    has successfully uploaded at least one file.

                    flagBusy is set if and only if all of the
                    servent's upload slots are currently full.

                    flagPush is set if and only if the servent is
                    firewalled or cannot accept incoming TCP
                    connections for any other reason.

                    The reserved flags MUST NOT be set, unless they
                    are used for a future extension.

                    If XML metadata (Appendix 2) is included in the
                    current Query Hit message, the following 2 bytes
                    of the Open Data area will contain the size of the
                    XML block. The XML block itself is placed in the
                    private area (see below).
            x       Private Data. Undocumented vendor-specific data.
                    This field continues until the Servent Identifier,
                    which uses the last 16 bytes of the message.

                    If flagGGEP in the Open Data block is set, this
                    block contains a GGEP (see Section 2.3) extension
                    block. The GGEP block starts with a 0xC3 byte. Any
                    data before or after the GGEP block is
                    vendor-specific data, and MUST be ignored if not
                    recognized. Servents are NOT RECOMMENDED to use
                    the private data area for vendor-specific data.
                    Servents SHOULD use GGEP extensions instead.

                    If the Open Data area indicates an XML block, it
                    will also be placed in the private area (see
                    Appendix 2). Assuming that the two bytes in the
                    Open Data area specify an XML block of m bytes,
                    that block can be found by extracting the last m
                    bytes of the private area. Both GGEP and XML can
                    exist in the same Private Data area, but XML
                    SHOULD be implemented inside GGEP.

                    [TODO: How about the nul after the XMP block? What
                    is it good for?]

    Last 16 Servent Identifier. A 16-byte string uniquely identifying
            the responding servent on the network. This SHOULD be
            constant for all Query Hit messages emitted by a servent
            and is typically some function of the servent's network
            address. The Servent Identifier is mainly used for routing
            the Push message (see below).

2.4.6 Push (0x40)

A Push message has the following fields:

    Bytes:  Description:
    0-15    Servent Identifier. The 16-byte string uniquely
            identifying the servent on the network that is being
            requested to push the file with index File_Index. The
            servent initiating the push request MUST set this field to
            the Servent_Identifier returned in the corresponding Query
            Hit message. This is used to route the Push message to the
            sender of the Query Hit message.
    16-19   File Index. The index uniquely identifying the file to be
            pushed from the target servent. The servent initiating the
            push request MUST set this field to the value of one of
            the File_Index fields from the Result Set in the
            corresponding Query Hit message.
    20-23   IP Address.
            The IP address of the host to which the file with
            File_Index should be pushed. This field is in big-endian
            format.
    24-25   Port. The port number the receiver of this message should
            push to.
    26-     OPTIONAL GGEP extension block (see Section 4.1).
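
As an illustration only, and not part of the specification, the 23-byte header from Section 2.4.1 and the Push payload above could be decoded roughly as in the following Python sketch. All names (parse_header, clamp_ttl, split_messages, parse_push) are hypothetical. The 7-hop limit is the example value used in Section 2.4.1, and treating the Push port as little-endian is an assumption that follows from Note 2. GGEP data after byte 25 of a Push payload is ignored here.

```python
import socket
import struct
from typing import List, NamedTuple, Optional, Tuple

HEADER_LEN = 23
MAX_PAYLOAD = 4096   # "Messages SHOULD NOT be larger than 4 kB"
MAX_LIFE = 7         # assumed maximum admissible message life, in hops


class Header(NamedTuple):
    guid: bytes
    payload_type: int
    ttl: int
    hops: int
    payload_length: int


class Push(NamedTuple):
    servent_id: bytes
    file_index: int
    ip: str
    port: int


def parse_header(data: bytes) -> Header:
    """Decode the 23-byte header; Payload Length is little-endian."""
    if len(data) < HEADER_LEN:
        raise ValueError("need 23 bytes for a message header")
    guid, ptype, ttl, hops, length = struct.unpack("<16sBBBI", data[:HEADER_LEN])
    if length > MAX_PAYLOAD:
        # A sketch of "rigorous validation": treat this as loss of sync.
        raise ValueError("payload length exceeds 4 kB")
    return Header(guid, ptype, ttl, hops, length)


def clamp_ttl(ttl: int, hops: int, max_life: int = MAX_LIFE) -> Optional[int]:
    """Return the TTL to keep, or None if the message should be dropped."""
    if ttl > 15:
        return None
    if ttl + hops > max_life:
        return max(max_life - hops, 0)
    return ttl


def split_messages(buffer: bytes) -> Tuple[List[Tuple[Header, bytes]], bytes]:
    """Split received bytes into complete (header, payload) pairs.

    Bytes belonging to an incomplete message are returned as leftover,
    since a message may span several reads from the socket.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER_LEN:
        header = parse_header(buffer[offset:offset + HEADER_LEN])
        end = offset + HEADER_LEN + header.payload_length
        if end > len(buffer):
            break
        messages.append((header, buffer[offset + HEADER_LEN:end]))
        offset = end
    return messages, buffer[offset:]


def parse_push(payload: bytes) -> Push:
    """Decode the fixed 26-byte part of a Push payload."""
    if len(payload) < 26:
        raise ValueError("Push payload is at least 26 bytes")
    servent_id, file_index, ip_raw, port = struct.unpack("<16sI4sH", payload[:26])
    # The IP address is kept as 4 raw bytes in network (big-endian) order.
    return Push(servent_id, file_index, socket.inet_ntoa(ip_raw), port)
```

A caller would typically append each chunk read from the socket to the leftover bytes of the previous call and pass the result to split_messages again. If parse_header raises, the stream is out of synch and the connection would be closed.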