Swarm Cache Manager

Status: some working code, 2 years of development ahead (Johan)

Abstract

We propose a method for improving the long term performance of BitTorrent peer-to-peer networks by increasing and refining the contribution of individual nodes.

Self organizing systems rely on algorithms with emergent behavior. Current BitTorrent clients use only basic seeding policies, such as seeding until a fixed upload to download ratio is achieved. We propose a system in which the user allocates space for a cache which is automatically managed by the program.

Central to our approach is a health function for swarms which indicates reliability and speed. We apply a seeding policy to the cache that takes account of this health function. This way, the program is doing informed seeding and the long tail is better served.

This mechanism moves the BitTorrent client towards a YouTube-like "play-and-forget" model in which the user doesn't have to actively manage the hard drive space. It also brings direct benefit to the user by boosting his reputation, if deployed in a system in which such a notion exists (e.g. Tribler using BarterCast).

The Cache

The goal is to increase performance by increasing the quality and quantity of user contribution to the system. For this to happen, we need to ensure that the people always seed and that they seed swarms that need seeding.

Always Seed Something - Quantity

Permanent seeding is achieved through a cache system. When a user clicks "play" on a torrent, the downloads starts a usual in a BitTorrent client. The difference is that the destination directory is not exposed to the user (currently state_dir/swarms_cache_dir) and the GUI doesn't offer any management interface for the torrent (starting, stopping, deleting).

We will offer a separate "save" button that takes the torrent out of the cache and puts it under user control.

Informed Seeding - Quality

If the user is to use a client with such a cache, it has to be driven by algorithms that lead to better performance (this depends on BarterCast rewarding users for their contribution). The algorithms refer to the cache retention policy: what swarms to delete when we run out of space.

To make an informed choice, we first need to measure the health of a swarm, i.e. quantify its need for seeding. We then sort the swarms in the cache based on the results and delete the healthiest one.

Here are some of the strategies for sorting the cache contents, a.k.a. health functions:

  • Peer dynamics

(S / L) * (S + L) * (1 / RS) * (1 / RL)

S is the number of seeders, L is the number of leechers, RS the rate of departure for seeders, RL the rate of arrival for leechers.

  • Availability

min_p(sum_i(Ap(i)))

Ap(i) is 1 if piece p is present at peer i.

  • Local traffic

1 / Ut

Ut is the upload in the last t minutes.

  • Global traffic

sum_i(Dt(i)) / N

Dt(i) is the download of leecher i in the last t minutes and N is the number of leechers whose speed we measured.

Some of the strategies can also use normalization with respect to file size.

Detect fake swarms

This does not refer to detecting fake content.

Tracker reports fake peers in peer list

Malicious tracker gathers peer information from regular trackers (just a guess) and uses it to populate its peers list. (What if the peers spam the tracker?)

Statistics needed
  • number of initiated TCP connections (N1)
  • number of failed TCP handshakes (N2)
  • number of failed BitTorrent handshakes (N3)
Criteria for marking as fake
  • less than 1% of peers are connectable
  • less than 50% of the connected peers respond to the BitTorrent handshake

What criterion to consider for calculating the percentage? Time, number of peers?

What if there are multiple trackers? Is it not better to delete just the polluting tracker? If so, the peers must be analyzed according to source. FUTURE: possible problem with PEX.

Observations

DownloadState is created by DownloadImplementation.network_get_state() with data from SingleDownload.get_stats(). SingleDownload.get_stats() calls SingleDownload._getstatsfunc() which is actually BT1Download.startStats(). BT1Download.startStats() calls DownloaderFeedback.gather() which uses Statistics.update().

Tribler/Core/BitTornado/BT1/Statistics only covers Connecter. If we want data about pre-handshake events, we need to add logging to Encoder.

Fake statistics information

Tracker advertises a high number of peers, but only provides a small number of peers when actual peer list is requested.

Problem

The tracker may return less than 50 peers when numwant is missing or 0. How can we detect that it is doing it for malicious reasons?

Observations
  • swarm statistics

The complete and incomplete keys are optional. The only mandatory keys in the tracker announce response are interval and peers (BEP 3).

libtorrent checks for complete and incomplete but does not take any action when they are missing (http://libtorrent.rakshasa.no/browser/trunk/libtorrent/src/tracker/tracker_http.cc#L212). LimeWire does the same (https://www.limewire.org/fisheye/browse/limecvs/core/com/limegroup/bittorrent/TrackerResponse.java?r=bittorrent-branch#l101). tvtorrents does not use these keys.

The GUI uses TorrentChecking to report total number of peers. TorrentChecking uses TrackerChecking.trackerChecking().

  • peer list

It is not mandatory for the tracker to return 50 peers if it has at least 50 peers. tvtorrents returns less (a variable number, depending on the size of the swarm), even in big swarms.

The numwant parameter in announces is optional, so the tracker can not be expected to respect it. tvtorrents does not respect it.

Code

See the following SVN branch:
https://svn.tribler.org/abc/branches/mihai/SwarmsCacheManager-from-release-4.5-r10057