Gnutella Protocol Development
3.3 The download mesh

3.3.1 Purpose of the download mesh

The purpose of the download mesh is to help users find more sources for the files they are looking for without having to requery the network. These supplementary sources are called alternate locations, or alt-locs in this document. Downloading the same file from multiple sources is now widely deployed (it is sometimes called swarm downloading, although that expression has a more specific meaning), so servents benefit from knowing about several sources for the files they want to download.

There is no original proposal for the download mesh. Its roots can be found in the HUGE proposal: the wide adoption of HUGE led to the creation of the download mesh, which uses URNs to uniquely identify a given file on the GNet. Note that not all parts of the HUGE specification are related to the download mesh; querying by URN, for example, is unrelated to the way the download mesh works.

Basically, the solution chosen to construct the download mesh is to make each servent aware, in a decentralized way, of the other servents that share the same files on the GNet. There are many ways to do that; the solution presented here was chosen as a good compromise between efficiency and low bandwidth usage.

The download mesh is most efficient at finding sources for popular files that have many sources on the GNet. However, the goal is to make the mesh work well for every file, including rare ones. Recent changes to the download mesh should help get closer to this goal.

3.3.2 Headers

The HUGE proposal uses two headers to indicate, respectively, the URN associated with a file and the known alternate locations for this file: X-Gnutella-Content-URN gives the URN of the file, and X-Gnutella-Alternate-Location indicates alternate locations for that file.
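To illustrate the X-Gnutella-Content-URN header, here is a minimal Python sketch that derives a urn:sha1 value from a file's contents. It assumes the SHA-1 URN form used in the examples of this section, which HUGE defines as the Base32 encoding of the file's 20-byte SHA-1 digest (160 bits, hence exactly 32 Base32 characters and no padding). The function names are hypothetical.

```python
import base64
import hashlib


def sha1_urn(data: bytes) -> str:
    """Return the urn:sha1 URN of a file's contents."""
    digest = hashlib.sha1(data).digest()            # 20 bytes = 160 bits
    b32 = base64.b32encode(digest).decode("ascii")  # 160 / 5 = 32 chars, no '=' padding
    return "urn:sha1:" + b32


def content_urn_header(data: bytes) -> str:
    """Build the X-Gnutella-Content-URN header line for a file."""
    return "X-Gnutella-Content-URN: " + sha1_urn(data)


if __name__ == "__main__":
    # Prints: X-Gnutella-Content-URN: urn:sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
    print(content_urn_header(b""))
```

For real shared files, one would feed the hash incrementally with `hashlib.sha1().update()` rather than reading the whole file into memory.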
However, new headers were introduced in 2003 to build a new download mesh, because poorly implemented servents had made the older one mostly inefficient. Another reason for introducing the new, smaller headers was to use less bandwidth.

Legacy format example:

X-Gnutella-Alternate-Location: http://1.2.3.4:6546/uri-res/N2R?urn:sha1:OJUNVQ75FQMZ5RXR3LJUDIQSGSVC5RFE 2002-12-27T12:35:51Z,
    http://1.2.3.5:6461/uri-res/N2R?urn:sha1:OJUNVQ75FQMZ5RXR3LJUDIQSGSVC5RFE 2002-12-27T11:38:51Z
X-Gnutella-Content-URN: urn:sha1:OJUNVQ75FQMZ5RXR3LJUDIQSGSVC5RFE

New concise format example:

X-Alt: 1.2.3.4:6347,1.2.3.5
X-Gnutella-Content-URN: urn:sha1:OJUNVQ75FQMZ5RXR3LJUDIQSGSVC5RFE

The X-Alt header replaces the legacy X-Gnutella-Alternate-Location header. The port number MAY be omitted if it is 6346, as for 1.2.3.5 above. The legacy format is allowed in X-Alt headers, but newer clients SHOULD send only the new concise format. If a servent implements PFSP, it SHOULD both submit and accept the available partial ranges using the PFSP X-Available-Ranges header.

Servents implementing push proxies MAY also use another X-Alt format:

X-Alt: <GUID>;1.2.3.4:6346;1.2.3.5:6347

Again, the port MAY be omitted if it is 6346. <GUID> is the Base32 encoding of the proxied host's 16-byte Gnutella GUID, and the addresses that follow it are that host's push proxies.

Some vendors MAY also mix both concise formats in one header, so the following header is valid:

X-Alt: 1.2.3.4:6347,<GUID>;1.2.3.5;1.2.3.6:6347,1.2.3.7:6348

No agreement was reached on how to maintain compatibility with legacy servents that use the older headers, so the legacy headers may be considered deprecated. Newer servents SHOULD still understand them, to benefit from the alt-locs given by older servents, but the new concise alt-loc headers MUST be used by every servent wishing to participate in the new download mesh.
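The concise formats above can be parsed with a short routine. The following Python sketch is one possible reading of the examples, not a normative grammar: it assumes that entries are separated by commas, that an entry containing semicolons is a push-proxy entry whose first field is the Base32 GUID, and that a missing port means 6346. Legacy URL entries are skipped rather than parsed. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

DEFAULT_PORT = 6346  # the port MAY be omitted when it is 6346


@dataclass(frozen=True)
class DirectLoc:
    ip: str
    port: int


@dataclass(frozen=True)
class PushLoc:
    guid: str                       # Base32-encoded 16-byte GUID, kept as text
    proxies: Tuple[DirectLoc, ...]  # push proxies listed after the GUID


def parse_hostport(token: str) -> DirectLoc:
    """Parse 'ip[:port]', defaulting the port to 6346."""
    host, sep, port = token.strip().partition(":")
    return DirectLoc(host, int(port) if sep else DEFAULT_PORT)


def parse_x_alt(value: str) -> List[Union[DirectLoc, PushLoc]]:
    """Parse the value of an X-Alt header, accepting both concise formats mixed."""
    locs: List[Union[DirectLoc, PushLoc]] = []
    for entry in (e.strip() for e in value.split(",")):
        if not entry:
            continue
        if entry.lower().startswith("http://"):
            continue  # legacy URL entry: not handled in this sketch
        if ";" in entry:
            # <GUID>;proxy;proxy... : first field is the GUID, the rest are push proxies
            guid, *proxies = entry.split(";")
            locs.append(PushLoc(guid.strip(), tuple(parse_hostport(p) for p in proxies if p.strip())))
        else:
            locs.append(parse_hostport(entry))
    return locs
```

Parsing the mixed example header yields a direct location, a push-proxy entry with two proxies (the first on the default port), and a second direct location.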
Servents SHOULD answer hosts that send the old headers with legacy headers, since this implies that the remote host is using the older mesh. There is no harm in submitting alternate locations coming from the older mesh, as they will be checked and dropped if they are not valid.

In addition to the original proposal, a new X-NAlts header was added to indicate bad (expired, false or malicious) alternate locations. Its format is the same as that of the X-Alt header. Example:

X-NAlts: 1.2.3.4:6346, 1.2.3.5:6341

The port MAY be omitted if it is 6346.

An alt-loc SHOULD be considered expired if a 404 HTTP response was received, or if the socket could not connect to the remote host (which probably means that the servent has disconnected). It SHOULD NOT be considered expired when the server is busy (503 response) or when a 416 (Requested Range Not Satisfiable) response is received.

A servent that sends malformed HTTP headers SHOULD also be removed from the mesh: chances are that its download mesh implementation is also bad, so it should be considered a bad alt-loc. If this servent sends alt-locs, they SHOULD be discarded as well.

Servents SHOULD NOT add to the mesh uploaders that queued their download requests, so that those uploaders are not overloaded with even more download requests. However, such uploaders are not added to the bad alt-locs either, since the uploader exists and has the file. Some servents MAY also do the same for busy servents (503 response).

3.3.3 Description

When downloading a file, the downloader SHOULD inform its uploaders about the other locations it knows for this file and from which it has successfully downloaded. The downloader MUST NOT inform an uploader about alternate locations from which it has not actually downloaded yet.
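Before reporting anything, a downloader has to decide what each attempt means under the expiry rules of 3.3.2. This Python sketch condenses them into one classification step with three assumed outcomes: GOOD (data actually received, so the location may go in X-Alt), BAD (report it in X-NAlts) and NEUTRAL (report it in neither header). Treating busy and queued hosts as neutral is one of the options the text allows; treating other unexpected status codes as neutral is an assumption of this sketch. Names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    GOOD = "good"        # data actually downloaded: may be sent in X-Alt
    BAD = "bad"          # expired, false or misbehaving: send in X-NAlts
    NEUTRAL = "neutral"  # host exists but is unproven: send in neither header


def classify(status: Optional[int], headers_ok: bool = True, queued: bool = False) -> Verdict:
    """Classify one download attempt.

    status:     HTTP status code, or None if the socket could not connect
    headers_ok: False if the response carried malformed HTTP headers
    queued:     True if the uploader queued the request instead of serving it
    """
    if status is None or not headers_ok or status == 404:
        return Verdict.BAD      # disconnected, broken implementation, or file gone
    if queued or status in (503, 416):
        return Verdict.NEUTRAL  # not expired, but nothing downloaded yet either
    if 200 <= status < 300:
        return Verdict.GOOD     # e.g. 200 OK or 206 Partial Content
    return Verdict.NEUTRAL      # assumption: other codes are left undecided
```

When a servent is classified as bad because of malformed headers, any alt-locs it sent should also be discarded; that step is left to the caller here.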
Consider, for example, a downloader that knows 10 locations for a file and tries eight of them, of which the first five work and the last three do not. Each of the five good uploaders must be informed that the three failing uploaders are bad, and that the other four good uploaders are good. The downloader says nothing about the two remaining locations: since it has not tried them, it has no way of knowing whether they are good or bad.

If many alt-locs are available, the servent should not submit too many of them, to spare bandwidth. A maximum of 10 alternate locations per file is suggested.

The downloader has two ways to submit alt-locs (good or bad) to an uploader. If it implements downloading by chunks and the download is still in progress, it SHOULD submit the alt-locs when requesting the next chunks of data. Otherwise, it SHOULD implement HEAD requests and send one, including the alt-locs to submit, after the file has been downloaded.

Similarly, the uploader stores the alternate locations given to it by each downloader and sends them back to all other downloaders of the same file. The difference is that the uploader does not send requests to the alternate locations to check their validity. This would cause too much unneeded traffic, as the uploader has no other reason to connect to the alternate locations indicated by the downloaders. Instead, the uploader relies on the downloaders to verify the alt-locs as part of their normal function. Unlike downloaders, uploaders MUST NOT submit bad alternate locations.

The uploader's inability to check the validity of alternate locations was the main flaw of the initial download mesh mechanism, and one of the reasons that led to changing the initial specification, notably to add a way to remove bad alternate locations from the download mesh.
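The ten-location example can be expressed as a small function that, given the outcome of every location the downloader actually tried, computes what each good uploader should be told. This is a hypothetical Python sketch: locations are plain "ip:port" strings, untried locations are simply absent from the input, and the suggested limit of 10 is applied separately to the X-Alt and X-NAlts lists (an assumption). Port 6346 is dropped when formatting, as the concise format allows.

```python
from typing import Dict, List, Tuple

MAX_ALTS = 10  # suggested maximum number of alt-locs per file (applied per header here)


def build_reports(results: Dict[str, bool]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Map each good uploader to the (X-Alt, X-NAlts) entries it should receive.

    `results` holds only the locations actually tried: True if downloading
    from it worked, False if it turned out bad. Untried locations are absent,
    so they are never reported either way.
    """
    good = [loc for loc, ok in results.items() if ok]
    bad = [loc for loc, ok in results.items() if not ok]
    reports = {}
    for uploader in good:  # only working uploaders are contacted again
        others = [loc for loc in good if loc != uploader]
        reports[uploader] = (others[:MAX_ALTS], bad[:MAX_ALTS])
    return reports


def format_locs(locs: List[str]) -> str:
    """Join locations for a concise header, omitting the default port 6346."""
    return ",".join(loc[: -len(":6346")] if loc.endswith(":6346") else loc for loc in locs)


if __name__ == "__main__":
    # Eight tried locations: .1 to .5 worked, .6 to .8 failed; .9 and .10 untried.
    results = {f"10.0.0.{i}:6346": i <= 5 for i in range(1, 9)}
    alt, nalts = build_reports(results)["10.0.0.1:6346"]
    print("X-Alt: " + format_locs(alt))      # X-Alt: 10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5
    print("X-NAlts: " + format_locs(nalts))  # X-NAlts: 10.0.0.6,10.0.0.7,10.0.0.8
```

Each of the five good uploaders receives four X-Alt entries and three X-NAlts entries, and the two untried locations appear nowhere.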
However, a good download mesh implementation can compensate for the uploader's inability to check alt-locs.

Alternate locations can also be obtained from QueryHits replying to Queries submitted by the user, when the hash value is included in the QueryHit. In this case, the host's address MAY be added to the mesh once it has been checked to be a valid alternate location. Also, if the host sending the QueryHit implements GGEP, it SHOULD send an ALT GGEP extension (see 3.3.4). As always, these alt-locs MUST be checked by the downloader before being submitted.

The good practices for keeping a high-quality download mesh are as follows:

1. Test alt-locs before forwarding them. Downloading clients MUST test every alternate location before submitting it to their uploaders, using the X-Alt header for good alt-locs and the X-NAlts header for bad alt-locs. Each known alt-loc (good or bad) SHOULD be submitted to each uploader after the test.
2. Inform uploaders about bad locations. As uploaders have no way to know when an entry expires, a downloader MUST inform the uploader about every bad alt-loc it knows.
3. Clean expired entries. Uploaders MUST in particular remove alt-locs that are submitted using the X-NAlts header. Uploaders SHOULD have some tolerance, though, and not remove a host from their list of alternate locations unless two (or perhaps three) downloaders failed to download from it. This also helps against malicious servents trying to destroy the mesh.
4. Minimize transfers. Alt-locs should be exchanged between servents as often as necessary, but no more often. Hence, a servent SHOULD NOT send the same alt-locs more than once to another servent. Similarly, it should not submit the same bad alt-loc more than once.

Points 1, 2 and 3 are absolute prerequisites for participating in the new download mesh (via the new concise headers).

There are various ways to implement point 3. Some vendors make their entries expire after a given amount of time (for instance, two hours).
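On the uploader side, practices 3 and 4, plus the time-based expiry option just mentioned, can be sketched as a per-file store. This hypothetical Python class drops an alt-loc only after a configurable number of distinct downloaders (two by default) have reported it in X-NAlts, expires entries not reconfirmed within two hours, and never sends the same alt-loc twice to the same downloader. It only ever produces X-Alt entries, since uploaders MUST NOT submit bad alt-locs. Identifying downloaders by their "ip:port" string is an assumption.

```python
from typing import Dict, List, Set


class UploaderMesh:
    """Alt-loc store kept by an uploader for ONE shared file (hypothetical sketch)."""

    def __init__(self, removal_threshold: int = 2, max_age: float = 2 * 3600, max_alts: int = 10):
        self.removal_threshold = removal_threshold  # distinct downloaders that must report a loc bad
        self.max_age = max_age                      # time-based expiry, in seconds
        self.max_alts = max_alts                    # cap per response
        self.last_seen: Dict[str, float] = {}       # alt-loc -> time it was last reported good
        self.bad_reports: Dict[str, Set[str]] = {}  # alt-loc -> downloaders that sent it in X-NAlts
        self.sent: Dict[str, Set[str]] = {}         # downloader -> alt-locs already sent to it

    def on_x_alt(self, locs: List[str], now: float) -> None:
        """Record good alt-locs received in a downloader's X-Alt header."""
        for loc in locs:
            self.last_seen[loc] = now

    def on_x_nalts(self, downloader: str, locs: List[str]) -> None:
        """Record bad alt-locs; drop one only once enough distinct downloaders agree."""
        for loc in locs:
            if loc not in self.last_seen:
                continue  # unknown location: nothing to remove
            reporters = self.bad_reports.setdefault(loc, set())
            reporters.add(downloader)  # a set, so one downloader counts only once
            if len(reporters) >= self.removal_threshold:
                del self.last_seen[loc]
                del self.bad_reports[loc]

    def alts_for(self, downloader: str, now: float) -> List[str]:
        """Alt-locs to put in the X-Alt header of a response to `downloader`."""
        expired = [loc for loc, seen in self.last_seen.items() if now - seen > self.max_age]
        for loc in expired:
            del self.last_seen[loc]
            self.bad_reports.pop(loc, None)
        already = self.sent.setdefault(downloader, set())
        fresh = [loc for loc in self.last_seen if loc not in already and loc != downloader]
        fresh = fresh[: self.max_alts]
        already.update(fresh)  # never send these to the same downloader again
        return fresh
```

In a long-running servent the per-downloader `sent` sets would also need pruning; that is omitted here.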
Other vendors cycle their alt-locs so that each of them is regularly submitted to downloaders, which can in turn notify the uploader when a bad location is found among the submitted alt-locs. The second solution is better, as it ensures that every alt-loc is regularly tested by the downloaders. Both solutions can be combined, notably to account for wrongly implemented downloaders that give no feedback on expired entries. Servents implementing PFSP have some additional requirements; see 3.3.5 below.

Under these rules, alternate locations propagate through the download mesh from uploader (source of the file) to uploader, with the downloaders checking the alt-locs and then submitting them to other uploaders of the same file. Bad alt-locs are removed from the mesh using X-NAlts headers, which allow downloaders to notify each uploader that submitted a bad location. Remember, though, that X-NAlts headers are not propagated the way X-Alt headers are.

The downloaders do most of the maintenance work on the download mesh, while the uploaders blindly trust the downloaders. The advantage of this scheme is that downloaders naturally search for and find new alternate locations while downloading a file from multiple sources, and can therefore maintain the download mesh at a very low bandwidth cost.

3.3.4 GGEP extension

Servents implementing GGEP SHOULD send an ALT GGEP extension to submit alternate locations in QueryHits. If a server has alt-locs for a file whose hash matches the hash in a query it receives, it SHOULD send those alternate locations in the QueryHit using the GGEP extension. See the GGEP ALT extension in appendix C.

3.3.5 Additional requirements

The basic requirement for a servent is to implement the HUGE specification. However, some additional features may benefit the download mesh.
A servent implementing PFSP MUST add itself as an alternate location. It SHOULD do so when requesting the second chunk of data (alternatively, although the first way is preferred, it MAY add itself to the mesh by sending a HEAD request at the end of the download). Servents that are not PFSP-aware are assumed to be unable to use those partial sources, but they will propagate them anyway to other servents (because 503 and 416 responses do not prevent alternate locations from being propagated), which may use them if they implement PFSP.

A servent that does not implement PFSP but does implement HTTP HEAD requests MAY send a HEAD request to its uploaders once it has finished downloading the file, to add itself to the download mesh.

Persistent connections are not required. However, they can help the download mesh logic avoid sending duplicate alternate locations to the same servent.

3.3.6 Sources

- HUGE Proposal v0.94: http://rfc-gnutella.sf.net/src/draft-gdf-huge-0_94.txt
- PFSP v1.0: http://rfc-gnutella.sf.net/src/Partial_File_Sharing_Protocol_1.0.txt
- HTTP/1.1: http://www.w3.org/Protocols/rfc2616/rfc2616.html

3.3.7 Credits

These specifications were written by Mathias Bollaert and Sumeet Thadani (LimeWire LLC), from ideas discussed and agreed upon in the GDF. Andrew Mickish of Freepeers (BearShare) proposed the best practices.