Gnutella Generic Extension Protocol (GGEP)

Document Revision Version 0.51

Protocol Version 0.5

February 4, 2002

Jason Thomas (jason@jasonthomas.com)

 

Change History

Document Revision Protocol Version Date Changes
0.51 0.5 02/4/2002
  • This GGEP document has a revision number, the protocol has a version number.  Both are listed in this table, and the document header.  This distinction will permit future cleanup of the document after the protocol has been approved.
  • Minor documentation cleanup
  • The section about using (not the ability to use) GUIDs as extension IDs was removed.
0.5 0.5 01/27/2002
  • Flags completely redone
  • IDs can only be between 1-15 bytes
  • Connection Handshake Headers
  • Ping and Pong messages may now include GGEP blocks
0.4 N/A 01/15/2002
  • The fixed 2 byte length munging scheme has been replaced with a variable length scheme
  • 1 byte extension IDs (approved ones), no longer use the extension ID length field, the field is omitted and this fact is communicated in the extension flags
  • Extensions with no payload no longer need a data length field.  This state is indicated in the extension flags.
  • The extension prefix no longer contains the length of the GGEP block.  Instead, a bit in the extension prefix indicates the prefix is the last one in the block.
  • Push messages may now include GGEP blocks
0.31 N/A 01/08/2002
  • Changed the GEM field separator from 0xfc to 0x1c in the Peaceful Co-Existence section
0.3 N/A 01/03/2002
  • Length field in the extension prefix is defined to be the total length of each subsequent extension.
  • Length field in the extension is defined to be the length of the raw extension data.
  • Better guidance on the creation of Extension IDs
  • Peaceful Co-Existence
0.2 N/A 12/20/2001
  • Length field munging is done w/o use of bit fields.
  • The extension ID is prefixed with a 1 byte length identifier rather than a postfix of a specified byte.
  • The number-of-extensions field in the extension prefix has been replaced by a field listing the total length of the extensions.
  • Encoding and compression bits within the extension flags have been changed to fields of those types.
0.11 N/A 12/17/2001
  • The extension prefix is shortened (flags and version were eliminated)
  • Base64 was dropped in favor of COBS
  • ExtensionIDs are now variable length.  This changed the extension prefix.
  • Some Extension Flags have been renamed.
0.1 N/A 12/13/2001
  • The first draft of the proposal

Background

The Gnutella 0.4 Protocol is a bare bones protocol for sharing files. Over time, servent implementers have targeted different sections of the protocol for extensions. An example of this is the BearShare trailer that was added to the Query Hit.

Today, proposals exist to pack yet more data into the existing protocol. Unfortunately, many of these were designed for a single purpose and will close off future extensions.

This document describes a standardized format for the creation of arbitrary new extensions. This new standard allows for :

Connection Handshake Headers

Servents that support the forwarding of all packets that contain GGEP extensions (whether or not they can process them) must include a new header in the Gnutella 0.6 connection handshake indicating this support.  This will allow other servents to know what types of packets this servent can accept.  The format of this header is

GGEP : majorversion'.'minorversion
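
As an illustration, the header line could be produced as follows.  This is a minimal sketch: the function name ggep_format_header is hypothetical, and the exact spacing around the colon and the CRLF line ending are assumptions based on the usual "Name: value" style of handshake headers, not something this document specifies.

```c
#include <stdio.h>

/* Hypothetical helper: writes "GGEP: <major>.<minor>\r\n" into buf.
 * The "Name: value" spacing and the CRLF terminator are assumptions.
 * Returns the number of characters written, or -1 if buf is too small. */
int ggep_format_header(char *buf, size_t cap, unsigned major, unsigned minor)
{
    int n = snprintf(buf, cap, "GGEP: %u.%u\r\n", major, minor);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

For protocol version 0.5 this yields the line "GGEP: 0.5".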

Format

Length Encoding

The length field uses an encoding technique that ensures that 0x0 is never the value of any byte.  Steps were also taken to ensure that the encoding is compact. In this technique, a length field is the concatenation of length chunks.  The format of each length chunk (which contains 6 bits of length info) is described at the bit level below:

Format

76543210
MLxxxxxx

M = 1 if there is another length chunk in the sequence, else 0

L = 1 if this is the last length chunk in the sequence, else 0

xxxxxx = 6 bits of data.

01aaaaaa ==> aaaaaa (2^6 values = 0-63)

10bbbbbb 01aaaaaa ==> bbbbbbaaaaaa (2^12 values = 0-4095)

10cccccc 10bbbbbb 01aaaaaa ==> ccccccbbbbbbaaaaaa (2^18 values = 0-262143)

Boundary Cases

As you see, when the bits are concatenated, the number is in big endian format.

Non-Error Checking Parsing Pseudo Code

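int length = 0;
byte b;
do
{
   b = *extensionbuf++;                   /* read the next length chunk */
   length = (length << 6) | (b & 0x3f);   /* append its 6 data bits */
} while (0x40 != (b & 0x40));             /* stop after the chunk with L set */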
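
The reverse direction can be sketched in the same way.  The function name ggep_encode_length and its return convention are hypothetical; the bit layout follows the chunk format described above, emitting the most significant chunk first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes len (0..262143) into out, which must
 * have room for 3 bytes.  Returns the number of bytes written, or 0
 * if len does not fit into three 6-bit chunks. */
size_t ggep_encode_length(uint32_t len, uint8_t *out)
{
    size_t n = 0;

    if (len > 0x3FFFF)
        return 0;
    if (len > 0xFFF)                       /* needs a third chunk */
        out[n++] = (uint8_t)(0x80 | ((len >> 12) & 0x3F));
    if (len > 0x3F)                        /* needs a second chunk */
        out[n++] = (uint8_t)(0x80 | ((len >> 6) & 0x3F));
    out[n++] = (uint8_t)(0x40 | (len & 0x3F));   /* last chunk, L set */
    return n;
}
```

For example, a length of 64 is encoded as the two bytes 0x81 0x40.  Because every chunk carries either the M or the L bit, no encoded byte is ever 0x0.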
		

GGEP Block

GGEP Extension Prefix
GGEP Extension Header 0
GGEP Extension Data 0
.
.
.
GGEP Extension Header N
GGEP Extension Data N

Extension Block

Extension blocks may contain an arbitrary number of GGEP blocks packed one against another.  Although this behavior is allowed, developers are encouraged to merge multiple GGEP blocks into a single GGEP block.  If a newer extension format is created (either a new version of GGEP or another format altogether), it will appear AFTER the last GGEP block of an earlier version.

GGEP Block 0
.
.
.
GGEP Block N

GGEP Extension Prefix

Byte Positions Name Comments
0 Magic This magic number is used to help distinguish GGEP extensions from legacy data which may exist.  It must be set to the value 0xC3.

GGEP Extension Header

Field Order Bytes Required Name Comments
0 1 Flags These are options which describe the encoding of the extension header and data.
1 1-15 ID The raw binary data in this field is the extension ID.  See Appendix A for suggestions on creating extension IDs.  No byte in the extension header may be 0x0.
2 1-3 Data Length This is the length of the raw extension data.  This field is encoded according to the length encoding rules listed above.

GGEP Extension Header Flags

Bit Positions Name Comments
7 Last Extension When set, this is the last extension in the GGEP block.
6 Encoding The value contained in this field dictates the type of encoding which should be applied to the extension data (after possible compression).
5 Compression The value contained in this field dictates the type of compression that should be applied to the extension data.
4 Reserved This field is currently reserved for future use.  It must be set to 0.
3-0 ID Len Value 1-15 can be stored here.  Since this will not be zero, it ensures this byte will not be 0x0.
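
To make the bit layout concrete, the flags byte can be decoded as sketched below.  The struct and function names are hypothetical, and rejecting a byte whose reserved bit is set is an assumption of this sketch rather than a rule stated in this document.

```c
#include <stdint.h>

/* Hypothetical container for the decoded flag fields. */
struct ggep_flags {
    int last_extension; /* bit 7: 1 if this is the last extension in the block */
    int encoding;       /* bit 6: 0 = none, 1 = COBS */
    int compression;    /* bit 5: 0 = none, 1 = deflate */
    int id_len;         /* bits 3-0: length of the ID in bytes, 1-15 */
};

/* Decodes one flags byte.  Returns 0 on success, or -1 if the reserved
 * bit is set or the ID length is zero (treating both as errors is an
 * assumption of this sketch). */
int ggep_parse_flags(uint8_t b, struct ggep_flags *out)
{
    if ((b & 0x10) != 0 || (b & 0x0F) == 0)
        return -1;
    out->last_extension = (b >> 7) & 1;
    out->encoding       = (b >> 6) & 1;
    out->compression    = (b >> 5) & 1;
    out->id_len         = b & 0x0F;
    return 0;
}
```

For example, a flags byte of 0x81 decodes to last_extension = 1 and id_len = 1, with no encoding and no compression.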

Encoding Types

Value Type
0 There is no encoding on the data.
1 The data is encoded using the COBS scheme.
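
This document does not reproduce the COBS algorithm.  The following is a minimal sketch of a standard Consistent Overhead Byte Stuffing encoder without a trailing delimiter byte, on the assumption that this is the scheme the table refers to.  The function name cobs_encode is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes len bytes from in so that the output contains no 0x00 byte.
 * out must have room for len + len/254 + 1 bytes.
 * Returns the number of bytes written. */
size_t cobs_encode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t code_idx = 0;  /* where the pending code byte will be stored */
    size_t w = 1;         /* next write position in out */
    size_t r;
    uint8_t code = 1;     /* distance from the code byte to the next zero */

    for (r = 0; r < len; r++) {
        if (in[r] != 0) {
            out[w++] = in[r];
            code++;
        }
        /* Close the current run at a zero byte or when it is full. */
        if (in[r] == 0 || code == 0xFF) {
            out[code_idx] = code;
            code = 1;
            code_idx = w++;
        }
    }
    out[code_idx] = code;
    return w;
}
```

For example, the three input bytes 0x11 0x00 0x22 are encoded as 0x02 0x11 0x02 0x22.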

Compression Types

Value Type
0 The extension data has not been compressed.
1 The extension data should be decompressed using the deflate algorithm. 

Usage

Ping Message

The Clip2 document states that ping messages have no payloads.  Given this definition, existing servents drop connections that issue pings containing payloads.  As such, developers are advised to allow widespread distribution of GGEP enabled servents before releasing extensions for the ping message.  Similarly, they should only forward ping messages containing GGEP extensions to other servents that have indicated their support of GGEP via the handshake header.

Servents are instructed to forward all ping messages containing GGEP blocks they do not understand regardless of ping/pong reduction schemes.

The payload of the ping message is now defined to be:

Extension Block

Pong Message

Since there are currently no extensions to the pong message and its last field has a fixed length, it is easy to extend this message to include GGEP.  That said, because the pong message currently has a fixed length, existing servents may drop connections if they receive pongs containing extensions.  To this end, developers are advised to include GGEP blocks only in response to ping messages containing GGEP blocks, as that will guarantee that the pathway is GGEP enabled.

The payload of the pong message is now defined to be:

Port
IP Address
Num Files Shared
Num KBytes Shared
Extension Block
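
Because the four leading fields have fixed sizes, the extension block is simply whatever follows them.  The sketch below assumes the Gnutella 0.4 field sizes (2 byte port, 4 byte IP address, 4 byte file count, 4 byte kilobyte count, 14 bytes in total); the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed Gnutella 0.4 pong layout: 2 + 4 + 4 + 4 bytes. */
#define PONG_FIXED_LEN 14

/* Hypothetical helper: returns a pointer to the extension block of a
 * pong payload and stores its length in *ext_len (0 for a legacy pong).
 * Returns NULL if the payload is shorter than the fixed fields. */
const uint8_t *pong_extension_block(const uint8_t *payload, size_t len,
                                    size_t *ext_len)
{
    if (len < PONG_FIXED_LEN)
        return NULL;
    *ext_len = len - PONG_FIXED_LEN;
    return payload + PONG_FIXED_LEN;
}
```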

Query Message

The Query message is redefined in a way that does not break Gnutella 0.4 compatible servents.

MinSpeed
Criteria
0x0
Extension Block
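
A receiver can locate the extension block by skipping past the null terminator of the criteria string.  This is a minimal sketch that assumes the Gnutella 0.4 layout, in which MinSpeed occupies the first 2 bytes of the payload; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the bytes that follow the
 * 0x0 terminator of the criteria string and stores their count in
 * *ext_len.  Returns NULL if the payload has no terminator.
 * Assumes a 2 byte MinSpeed field at the start of the payload. */
const uint8_t *query_extension_block(const uint8_t *payload, size_t len,
                                     size_t *ext_len)
{
    const uint8_t *nul;

    if (len < 3)
        return NULL;
    nul = (const uint8_t *)memchr(payload + 2, 0x00, len - 2);
    if (nul == NULL)
        return NULL;
    *ext_len = (size_t)((payload + len) - (nul + 1));
    return nul + 1;
}
```

A Gnutella 0.4 servent that stops reading at the terminator never sees the extension block, which is why this layout remains compatible.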

QHD Private Data

Until now, servent vendors have been left to define the format of this opaque field.  Those that write to this field only read from it if they recognize their own vendor code.  To this end, we must first indicate to all clients that the private data section contains a GGEP block, so they know to crack open the field.  To do this, we use bit 5, an open data bit, of the Flags and Flag2 fields (the format of these fields is defined in the Clip2 document).  Remember that the ability to crack the GGEP block does not mean that one is able to understand the extensions contained within.

For compatibility with a couple of existing servents that already use this field, it is necessary to search for the GGEP block by looking for the first occurrence of the GGEP magic byte.

Private Data Format for servents that already use the private area and are trying to retain compatibility with older versions of their code.  This will be phased out over time:

Servent Specific Private Data Backwards Data that are guaranteed not to contain the GGEP magic byte 0-XXX bytes (usually 1)
GGEP Block YYY bytes
More Servent Specific Private Data for backwards compatibility section that can completely be ignored. 0-ZZZ bytes

Private Data Format for new servents:

Extension Block

Note that the code necessary to find the GGEP block in both formats is identical.
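
A minimal sketch of that shared lookup is shown below.  The function name ggep_find_block is hypothetical; the magic value 0xC3 is the one defined in the GGEP Extension Prefix section.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GGEP_MAGIC 0xC3

/* Hypothetical helper: returns a pointer to the first GGEP magic byte
 * in the private data area, or NULL if there is none.  For new servents
 * this is the first byte; for legacy layouts it follows the
 * servent specific prefix. */
const uint8_t *ggep_find_block(const uint8_t *priv, size_t len)
{
    if (len == 0)
        return NULL;
    return (const uint8_t *)memchr(priv, GGEP_MAGIC, len);
}
```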

Query Hit Result

Servent vendors must be careful to ensure that 0x0 does not appear in any extension data embedded into the Query Hit Result.  One does this by using any of the available encoding options.

File Index
File Size
File Name
0x0
Extension Block

Push Message

Since there are currently no extensions to the push message and its last field has a fixed length, it is easy to extend this message to include GGEP.  That said, because the push message currently has a fixed length, it is possible that old servents will validate against that length and throw out push messages that include GGEP extensions.  To this end, servents should only send push messages containing extension blocks to other servents that have indicated GGEP support via the connection handshake.

The payload of the push message is now defined to be:

Servent ID
File Index
IP Address
Port
Extension Block

Notes/Issues

Existing Standards

Implementation Notes.

Appendix A - Creating Extension IDs

The Extension ID field in the GGEP header is a binary field consisting of between 1 and 15 bytes.  It cannot contain the byte 0x0, and one must be able to compare IDs with a simple binary comparison.  Aside from those rules, GGEP does not mandate any particular format, but does encourage the creation of short IDs that are free from conflicts.  One should also note that Extension IDs are meant to be consumed by machines.  To this end, the following techniques are recommended.  If one has a strong need to create an alternative format, be sure to avoid conflicts with the following schemes.

GDF Registered Extensions

Any Extension ID of less than 4 bytes must be stored in the appropriate GDF database.  Any Extension ID of less than 3 bytes must also be approved by the GDF.  The format of the extension data must also be registered.

VendorID Extensions

This simple technique allows for the creation of ExtensionIDs using the following format: VendorID|'.'|BinaryData.  The VendorID for a Gnutella servent is a 4 byte value that has been registered in the GDF Peer Codes database.  In the QueryHit Descriptor, this code is case insensitive.  With ExtensionIDs, the case matters, as one must be able to perform a binary comparison on the ID.  This means that ExtensionIDs of "SWAP.1" and "swap.1" are different, but both "belong" to the vendor who owns the code "SWAP."
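
The following sketch builds a VendorID based ExtensionID and compares IDs byte for byte.  Both function names are hypothetical, and the checks performed (total length of at most 15 bytes, no 0x0 byte in the binary data) are taken from the ID rules stated at the start of this appendix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: binary comparison of two ExtensionIDs. */
int ggep_id_equal(const unsigned char *a, size_t alen,
                  const unsigned char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* Hypothetical helper: writes VendorID|'.'|data into out.
 * Returns the ID length, or 0 if the ID would exceed 15 bytes or the
 * data contains a 0x0 byte. */
size_t ggep_make_vendor_id(const char vendor[4], const unsigned char *data,
                           size_t dlen, unsigned char out[15])
{
    size_t i;

    if (4 + 1 + dlen > 15)
        return 0;
    for (i = 0; i < dlen; i++)
        if (data[i] == 0x00)
            return 0;
    memcpy(out, vendor, 4);
    out[4] = '.';
    memcpy(out + 5, data, dlen);
    return 5 + dlen;
}
```

With these helpers, "SWAP.1" and "swap.1" compare as different IDs, since no case folding is applied.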

Appendix B - Peaceful Co-Existence

It is unfortunately the case that GGEP extensions will co-exist with legacy extensions for quite some time.  In some cases, both may exist in the same space allocated for extensions.  The algorithm below for interpreting extension fields will help sort out this co-existence.

  1. Peek 1 byte, if it is the GGEP magic byte goto 2, else goto 6
  2. Read GGEP magic byte
  3. Read and process a single GGEP extension (extension headers, extension data)
  4. If the extension flags do not have the Last_Extension bit set, goto 3
  5. Goto 7
  6. Read until end of extension field, or 0x1c, whichever comes first.  It will be up to you to determine which legacy extension exists in this space.
  7. If no bytes left, quit
  8. Peek 1 byte, if 0x1c or 0x0, advance 1 byte
  9. Goto 1
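
The steps above can be sketched as a single loop.  This is an illustrative outline, not a reference implementation: the names ext_scan, skip_extension and scan_extension_field are hypothetical, the sketch only counts what it finds instead of processing it, and it treats truncated or malformed GGEP data as an error.

```c
#include <stddef.h>
#include <stdint.h>

#define GGEP_MAGIC 0xC3
#define LEGACY_SEP 0x1C

/* Hypothetical result type: what the scan found. */
struct ext_scan { int ggep_extensions; int legacy_fields; };

/* Skips one GGEP extension (flags, ID, data length, data) starting at
 * p[*pos].  Sets *last from the Last Extension flag.
 * Returns 0 on success, -1 on truncated or malformed input. */
static int skip_extension(const uint8_t *p, size_t len, size_t *pos, int *last)
{
    size_t i = *pos;
    uint32_t dlen = 0;
    uint8_t flags, b;
    int chunks = 0;

    if (i >= len)
        return -1;
    flags = p[i++];
    if ((flags & 0x0F) == 0)
        return -1;
    i += flags & 0x0F;                      /* skip the ID bytes */
    do {                                    /* decode the data length */
        if (i >= len || ++chunks > 3)
            return -1;
        b = p[i++];
        dlen = (dlen << 6) | (b & 0x3F);
    } while ((b & 0x40) == 0);
    if (dlen > len - i)
        return -1;
    *pos = i + dlen;                        /* skip the extension data */
    *last = (flags & 0x80) != 0;
    return 0;
}

/* Walks an extension field that may mix GGEP blocks and legacy data. */
int scan_extension_field(const uint8_t *p, size_t len, struct ext_scan *out)
{
    size_t pos = 0;

    out->ggep_extensions = 0;
    out->legacy_fields = 0;
    while (pos < len) {                                  /* step 7 */
        if (p[pos] == GGEP_MAGIC) {                      /* step 1 */
            int last = 0;
            pos++;                                       /* step 2 */
            do {                                         /* steps 3-4 */
                if (skip_extension(p, len, &pos, &last) != 0)
                    return -1;
                out->ggep_extensions++;
            } while (!last);
        } else {                                         /* step 6 */
            while (pos < len && p[pos] != LEGACY_SEP)
                pos++;
            out->legacy_fields++;
        }
        if (pos < len && (p[pos] == LEGACY_SEP || p[pos] == 0x00))
            pos++;                                       /* step 8 */
    }
    return 0;
}
```

A real servent would replace the two counters with calls that hand each GGEP extension and each legacy span to the appropriate handler.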