Distributed Tracker
Status: algorithm design done, re-PEX code merged in main branch
Motivation: Traditional television is moving online, according to numerous studies. This means every point of failure needs to be removed from the architecture, resulting in a fully self-organising system with unbounded scalability, called 4th Generation P2P. Tribler is striving for unbounded scalability. Tribler already addresses the problem of decentralized content discovery using BuddyCast, but the current client still lacks a solution for decentralized tracking of swarms. Previous work on decentralized tracking has been done by Jelle Roozenburg, but further research is required. Current research is being conducted by MSc student Raynor Vliegendhart.
Master thesis work
Implement and deploy a fully distributed tracker in Tribler. The algorithm used is the 2-hop torrentsmell algorithm, described in the research assignment document below.
ToDo:
Thesis Chapter Layout
Swarm crawling measurement
Popular torrent, reportedly 2897 seeders and 826 downloaders, on 2009-09-04, from 10:36 to 11:38.
3 Linux ISO swarms
If we crawl the swarm while trying to stay connected to PEX-capable peers until the connection is dropped by the other party, the graph looks quite different (see below). Peers seem to be less likely to be connectable after they have closed the connection. Also, while fewer peers get sent an online check, there is an increasing trend of connectability. This is in line with the results of the "Understanding Churn in Peer-to-Peer Networks" paper, where the authors found the uptime of a peer to be a good indicator of the peer's remaining uptime.
Research Assignment
Ongoing work by Raynor. Large measurements show that The Pirate Bay is the most dominant central tracker. For each tracker we computed the total number of peers that it tracks. This was done by summing up the swarm sizes of all torrents in which the tracker was mentioned in the announce fields. The .torrent files of these swarms were obtained by running the Tribler TorrentCollecting software for roughly a year. Swarm sizes were constantly updated during this time. The top 20 trackers are depicted in the bar chart below:
In this chart, peers that are tracked by only one tracker are marked as "Single Tracker"; these peers come from swarms that are tracked by a single tracker. The remaining peers come from swarms that are tracked by multiple trackers. "Primary Tracker" indicates that the peer came from a multi-tracker swarm which is primarily tracked by this tracker, i.e. the tracker is mentioned in the 'announce' field of the torrent file. If the tracker was not mentioned in the 'announce' field (and hence must have been mentioned in the 'announce-list' field), we label the peers as "Foreign Tracker". The swarms presented in this chart were carefully filtered for spam, fakes, and misreportings.
DHT Papers
PEX, Gossips, Random/Ant Walks
Also of interest: PEXCrawl
BitTorrent Trackers
Unsorted Papers
Previous Work: Little Bird
This section should be merged after research has been completed.
Removing the BitTorrent tracker
In the BitTorrent p2p system, all users that download a certain file profit from each other by bartering file pieces. Using this principle, files can be downloaded faster. A group of users that download the same file at the same moment is called a download swarm. The function of the BitTorrent tracker is to find out which users are in the download swarm of a certain file. A secondary task of the tracker is to administer statistical information about the swarm, for instance which part of the file has been downloaded by which users.
Tracker protocol
These tasks are currently implemented in a centralized fashion. A BitTorrent tracker is a server to which users can send a getPeers() request. The tracker answers this request with a response containing a list of IP addresses and port numbers of peers that are currently in the swarm. Communication with the tracker uses the HTTP protocol, just as with a web server.
Disadvantages of a centralized tracker
The decentralized design of p2p systems is one of their major advantages. It makes the BitTorrent system flexible, scalable and reliable. The BitTorrent tracker is one of the parts of the system that still has a centralized design. This makes a simple implementation possible, but is foremost a disadvantage for BitTorrent. A centralized tracker is a single point of failure in the BitTorrent system. This means that its failure will interrupt the BitTorrent service, namely enabling people to join and use download swarms. While the system provides a reliable download environment despite thousands of users going online and offline at each moment, a dysfunctional tracker immediately stops all new users from downloading. A BitTorrent tracker is also not scalable because of its centralized design.
At this moment it is already noticeable that trackers have long response delays and that peers often have to repeat tracker requests due to time-outs. As the number of BitTorrent users grows, this problem will grow to the point where building a reliable centralized tracker becomes infeasible or at least very expensive. A final disadvantage of the current centralized trackers is that they are an easy target for an attack on the BitTorrent system. A simple distributed denial of service (DDoS) attack can stop a tracker and with it thousands of users. With the current use of BitTorrent for the exchange of copyrighted materials, it is obvious that certain parties have motives for such attacks.
Distributed tracker
For the reasons given in the previous section, the centralized BitTorrent tracker should be replaced by a distributed successor. In this section we explain how this is done and which issues have to be considered when distributing the tracker's functionality over all peers in the swarm.
Find initial tracker peers
The most important functionality of a tracker is to give any peer a list of addresses through which other peers in the swarm can be contacted. With a centralized tracker, the tracker location is saved in the .torrent file so that each peer that starts a download can immediately contact it. When no such central point is present, we first have to find the tracker. In the distributed case, the tracker consists of all peers (or more commonly: all peers in the swarm in question). So the problem is: find initial peers that are in the swarm and are part of the distributed tracker. This problem will be solved using peer gossiping. Using this protocol, each peer that starts to seed a new file over the network or joins an existing swarm pushes a message to the peers in its neighborhood.
This message contains the peer's permanent identifier, a timestamp and a data structure (probably a Bloom filter) with all the info hashes of the files it is currently seeding. These messages are forwarded by other peers according to the probabilistic gossiping protocol. In this way, most peers in the network receive the messages. When a swarm of reasonable size has formed, most peers should have a collection of messages with information about the current members of this swarm. If a peer wants to download the file (and so join the swarm), it can simply connect to the peers in its list, check whether they are still active swarm members and join in.
Get more swarm peers
In the current BitTorrent system, all peers in a swarm periodically reconnect to the tracker to add more swarm members to their peer lists. By doing so, they learn of more peers, and the probability of finding a peer with the needed pieces of the file grows. In the distributed tracker case, once a peer has found some peers using the method of section 'Find initial tracker peers', it will use these peers to enlarge its list of known peers in the swarm. This can be done by asking these peers for their swarm list (all peers in the swarm). Although this sounds very simple, there are different approaches to doing this.
Track just seeders?
Would it make sense to gossip only about seeders? A popular file will have fewer seeders than leechers, and the set of seeders will be more stable. In principle, you only need to find one peer, since you can random walk from there to find others. BitTorrent is normally supposed to work even when the swarm includes only leechers, but would it make sense to give up on that and instead try to get more seeders? Actually, the distinction between seeders and leechers is blurry to me.
In practice, a peer could advertise itself when 1) its policy has been configured to seed files for a sufficiently long period of time after download completion and 2) it has detected that the user leaves it running all the time and not just when downloading a file. --rimey@hiit.fi
DHT and alternatives
Numerous DHT proposals exist to decentralise peer discovery. An advanced proposal uses Skip Graphs to discover content and can be adapted to discover peers. However, security is not fully taken into account and a trivial pollution attack can bring the system down.
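Illustration for 'Find initial tracker peers': the gossip announcement described in that section (a permanent identifier, a timestamp and a Bloom filter of info hashes) could look roughly like the Python sketch below. This is not the Little Bird or Tribler implementation. All names (BloomFilter, Announcement, make_announcement, should_forward, candidate_peers) are hypothetical, and the filter size, number of hash functions and forwarding probability are assumed values chosen only for illustration.

```python
import hashlib
import random
import time
from dataclasses import dataclass


class BloomFilter:
    """Tiny Bloom filter; 1024 bits and 3 hashes are assumed values."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item: bytes):
        # Derive one bit position per hash by salting SHA-1 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes) -> bool:
        # True means "possibly present"; false positives can occur.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


@dataclass
class Announcement:
    peer_id: bytes        # permanent identifier of the announcing peer
    timestamp: float      # creation time, used to discard stale messages
    seeding: BloomFilter  # info hashes the peer is currently seeding


def make_announcement(peer_id, infohashes, now=None):
    """Build the message a peer pushes to its neighborhood."""
    bloom = BloomFilter()
    for infohash in infohashes:
        bloom.add(infohash)
    return Announcement(peer_id, time.time() if now is None else now, bloom)


def should_forward(rng: random.Random, probability: float = 0.5) -> bool:
    """Probabilistic gossip step: forward with a fixed (assumed) probability."""
    return rng.random() < probability


def candidate_peers(announcements, infohash, max_age, now):
    """Peers whose recent announcements may include the given info hash."""
    return [a.peer_id for a in announcements
            if now - a.timestamp <= max_age and infohash in a.seeding]
```

Because a Bloom filter can report false positives, the peers returned by candidate_peers are only candidates. As the text says, a joining peer still has to connect to them and check whether they are active swarm members.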