Gnutella Protocol Development
4.1 Gnutella Generic Extension Protocol (GGEP)

Source: GGEP proposal v 0.51.

The Gnutella Generic Extension Protocol (GGEP) allows arbitrary extensions in Gnutella messages. A GGEP block is a framework for other extensions. If you wish to implement a new extension to a packet, you MUST do so inside GGEP. Some extensions that were invented before GGEP (XML metadata, for example) are allowed to exist outside GGEP.

Servents are RECOMMENDED to implement GGEP. However, all servents MUST pass on GGEP extension blocks inside Gnutella messages.

Servents that support forwarding all packets that contain GGEP extensions (whether or not they can process them) MUST include a new header in the Gnutella 0.6 connection handshake indicating this support. This allows other servents to know what types of packets this servent can accept. The format of this header is

    GGEP: majorversion.minorversion

As the current version of GGEP was 0.5 when this was written, the header would be

    GGEP: 0.5

Servents SHOULD remove any GGEP blocks from Ping, Pong and Push messages before sending those messages to hosts that have not indicated GGEP support.

For the original GGEP documentation see
http://groups.yahoo.com/group/the_gdf/files/Proposals/GGEP/

4.1.1 GGEP Format

A GGEP block always starts with a magic byte, used to help distinguish GGEP extensions from legacy data that may exist. It must be set to the value 0xC3.

When a GGEP block is used between the nulls in a result in a Query Hits message, it is not allowed to contain any null bytes. This requires some special handling in the field formats described below.

The magic byte is followed by any number of extensions. They SHOULD be processed in the order in which they appear. The following is the format of each extension:

    Bytes used   Field name
    1            Flags
    1-15         ID
    1-3          Data Length
    x            Extension Data

Flags: These are options that describe the encoding of the extension header and data.

    Bit   Name
    7     Last Extension. When set, this is the last extension in the GGEP block.
    6     Encoding. The value of this bit dictates the type of encoding that should be applied to the extension data (after possible compression).
              0 = There is no encoding on the data.
              1 = The data is encoded using the COBS scheme.
          Details about the COBS encoding scheme can be found at http://www.acm.org/sigcomm/sigcomm97/papers/p062.pdf
    5     Compression. The value of this bit dictates the type of compression that should be applied to the extension data.
              0 = The extension data has not been compressed.
              1 = The extension data should be decompressed using the deflate algorithm.
          One should only compress data if doing so makes a material difference in the resulting packet size. Details about the deflate compression scheme can be found at http://www.gzip.org/zlib/ and http://www.faqs.org/rfcs/rfc1951.html
    4     Reserved. This bit is currently reserved for future use. It must be set to 0.
    3-0   ID Len. Values 1-15 can be stored here. Since this value is never zero, it ensures that the Flags byte is never 0x0.

ID: The raw binary data in this field is the extension ID. The length of this field can range between 1 and 15 bytes, and is determined by the ID Len bits of the Flags field. See section 4.1.2 below for suggestions and rules on creating extension IDs. No byte in the extension header may be 0x0.

Data Length: This is the length of the raw extension data. Please note that most Gnutella clients will drop messages, and possibly connections, if the message size is larger than a certain threshold (which varies according to message type). Please pay attention to these limits when creating and bundling new extensions.

This field uses an encoding technique that ensures that 0x0 is never the value of any byte. Steps were also taken to ensure that the encoding is compact. In this technique, a length field is the concatenation of one to three length chunks. The format of each length chunk (which contains 6 bits of length information) is described at bit level below:

    Format:  76543210
             MLxxxxxx

    M      = 1 if there is another length chunk in the sequence, else 0
    L      = 1 if this is the last length chunk in the sequence, else 0
    xxxxxx = 6 bits of data

    01aaaaaa                    ==> aaaaaa              (2^6 values  = 0-63)
    10bbbbbb 01aaaaaa           ==> bbbbbbaaaaaa        (2^12 values = 0-4095)
    10cccccc 10bbbbbb 01aaaaaa  ==> ccccccbbbbbbaaaaaa  (2^18 values = 0-262143)

    Boundary cases:

         0 = 01 000000b                     = 0x40
        63 = 01 111111b                     = 0x7f
        64 = 10 000001 01 000000b           = 0x8140
      4095 = 10 111111 01 111111b           = 0xbf7f
      4096 = 10 000001 10 000000 01 000000b = 0x818040
    262143 = 10 111111 10 111111 01 111111b = 0xbfbf7f

As the examples show, when the data bits are concatenated the number is in big-endian order: the first chunk carries the most significant bits.

Extension Data: The actual extension data. The format of this field varies between extensions. A servent that does not recognize an extension will not be able to parse the extension data, but since the length of this field is specified by Data Length, it can still skip to the next extension. Note that extensions MAY be empty.

4.1.2 Creating Extension IDs

The Extension ID field in the GGEP header is a binary field consisting of between 1 and 15 bytes. It cannot contain the byte 0x0, and one must be able to compare IDs with a simple binary comparison. Aside from those rules, GGEP does not mandate any particular format, but does encourage the creation of short IDs that are free from conflicts. Note also that Extension IDs are meant to be consumed by machines. Still, the following rules apply.

GDF Registered Extensions: Any Extension ID of fewer than 4 bytes MUST be stored in the appropriate GDF database. Any Extension ID of fewer than 3 bytes must also be approved by the GDF. The format of the extension data must also be registered.
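
The ID rules above (1 to 15 bytes, no 0x0 byte, registration thresholds by length) can be checked mechanically. The following Python fragment is a minimal sketch only; the function name check_extension_id and the three tier labels it returns are hypothetical and do not come from the GGEP proposal.

```python
def check_extension_id(ext_id: bytes) -> str:
    """Validate an extension ID and report which registration rule applies.

    The returned labels are illustrative names, not terms from the spec.
    """
    if not 1 <= len(ext_id) <= 15:
        raise ValueError("extension ID must be 1 to 15 bytes long")
    if 0 in ext_id:
        raise ValueError("extension ID must not contain the byte 0x0")
    if len(ext_id) < 3:
        # Fewer than 3 bytes: must be registered and approved by the GDF.
        return "registered and approved"
    if len(ext_id) < 4:
        # Fewer than 4 bytes: must be stored in the GDF database.
        return "registered"
    return "unrestricted"


def same_extension(a: bytes, b: bytes) -> bool:
    """IDs are compared byte for byte, so case differences matter."""
    return a == b
```

Because the comparison is binary, same_extension(b"ABCD.1", b"abcd.1") is False; both IDs here are made up for the example.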
VendorID Extensions: This simple technique allows the creation of Extension IDs based on a vendor ID, using the following format:

    VendorID.BinaryID

The VendorID for a Gnutella servent is a 4-byte value that has been registered in the GDF Peer Codes database. In the QueryHit descriptor, this code is case insensitive. In Extension IDs case matters, because one must be able to perform a binary comparison on the ID. This means that the Extension IDs "SWAP.1" and "swap.1" are different, but both "belong" to the vendor who owns the code "SWAP".

This technique may be good for experimental and strictly vendor-specific extensions, but should be avoided for extensions that may be useful to other vendors as well. Marking an extension with a vendor ID makes it harder for other vendors to use the extension in their servents.

Extension implementers SHOULD publish the ID, format, and expected data size for their extensions in the GDF database called "GGEP Extensions", located at
http://groups.yahoo.com/group/the_gdf/database?method=reportRows&tbl=10
(Requires GDF membership)
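
Returning to the wire format of section 4.1.1, the Data Length encoding and the rule that an unrecognized extension can be skipped by its length can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: all function names are hypothetical, the extension IDs in the usage lines are made up, and rejecting a chunk whose M and L bits are both set or both clear is one reading of the chunk format. The parser returns the extension data raw and leaves COBS decoding and deflate decompression to the caller, who can inspect bits 6 and 5 of the returned flags.

```python
GGEP_MAGIC = 0xC3            # magic byte that opens every GGEP block
MAX_LENGTH = (1 << 18) - 1   # largest value three 6-bit chunks can carry


def encode_length(n: int) -> bytes:
    """Encode n as 1 to 3 length chunks, most significant chunk first."""
    if not 0 <= n <= MAX_LENGTH:
        raise ValueError("length out of range for a Data Length field")
    groups = [n & 0x3F]
    n >>= 6
    while n:
        groups.append(n & 0x3F)
        n >>= 6
    groups.reverse()
    # Every chunk but the last carries M (0x80); the last carries L (0x40).
    return bytes([0x80 | g for g in groups[:-1]] + [0x40 | groups[-1]])


def decode_length(buf: bytes, pos: int):
    """Decode a Data Length field at buf[pos]; return (length, next_pos)."""
    value = 0
    for _ in range(3):
        if pos >= len(buf):
            raise ValueError("truncated Data Length field")
        byte = buf[pos]
        pos += 1
        more, last = bool(byte & 0x80), bool(byte & 0x40)
        if more == last:  # assumption: exactly one of M and L must be set
            raise ValueError("invalid length chunk")
        value = (value << 6) | (byte & 0x3F)
        if last:
            return value, pos
    raise ValueError("Data Length field longer than 3 chunks")


def parse_block(buf: bytes, pos: int = 0):
    """Parse one GGEP block; return ([(id, flags, raw_data), ...], next_pos)."""
    if pos >= len(buf) or buf[pos] != GGEP_MAGIC:
        raise ValueError("missing GGEP magic byte")
    pos += 1
    extensions = []
    while True:
        if pos >= len(buf):
            raise ValueError("truncated GGEP block")
        flags = buf[pos]
        pos += 1
        id_len = flags & 0x0F            # bits 3-0: ID Len
        if id_len == 0 or pos + id_len > len(buf):
            raise ValueError("bad extension ID length")
        ext_id = buf[pos:pos + id_len]
        pos += id_len
        length, pos = decode_length(buf, pos)
        if pos + length > len(buf):
            raise ValueError("truncated extension data")
        # Unknown IDs need no special handling: the length alone is enough
        # to step over the data and reach the next extension.
        extensions.append((ext_id, flags, buf[pos:pos + length]))
        pos += length
        if flags & 0x80:                 # bit 7: Last Extension
            return extensions, pos


def build_extension(ext_id: bytes, data: bytes, last: bool) -> bytes:
    """Serialize one extension with bits 6-4 (encoding, compression) at 0."""
    if not 1 <= len(ext_id) <= 15:
        raise ValueError("extension ID must be 1 to 15 bytes long")
    flags = (0x80 if last else 0x00) | len(ext_id)
    return bytes([flags]) + ext_id + encode_length(len(data)) + data


# Usage: a block with two made-up extensions, the second one empty.
block = (bytes([GGEP_MAGIC])
         + build_extension(b"AB", b"hi", last=False)
         + build_extension(b"XYZ.1", b"", last=True))
extensions, end = parse_block(block)
```

Returning next_pos from parse_block lets a caller continue reading whatever follows the GGEP block in the enclosing message. Note that build_extension does not guarantee a null-free block: data containing 0x0 would still need COBS encoding before being placed between the nulls of a Query Hits result.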