Swarm Cache ManagerStatus: some working code, 2 years of development ahead (Johan) AbstractWe propose a method for improving the long term performance of BitTorrent peer-to-peer networks by increasing and refining the contribution of individual nodes. Self organizing systems rely on algorithms with emergent behavior. Current BitTorrent clients use only basic seeding policies, such as seeding until a fixed upload to download ratio is achieved. We propose a system in which the user allocates space for a cache which is automatically managed by the program. Central to our approach is a health function for swarms which indicates reliability and speed. We apply a seeding policy to the cache that takes account of this health function. This way, the program is doing informed seeding and the long tail is better served. This mechanism moves the BitTorrent client towards a YouTube-like "play-and-forget" model in which the user doesn't have to actively manage the hard drive space. It also brings direct benefit to the user by boosting his reputation, if deployed in a system in which such a notion exists (e.g. Tribler using BarterCast). The CacheThe goal is to increase performance by increasing the quality and quantity of user contribution to the system. For this to happen, we need to ensure that the people always seed and that they seed swarms that need seeding. Always Seed Something - QuantityPermanent seeding is achieved through a cache system. When a user clicks "play" on a torrent, the downloads starts a usual in a BitTorrent client. The difference is that the destination directory is not exposed to the user (currently state_dir/swarms_cache_dir) and the GUI doesn't offer any management interface for the torrent (starting, stopping, deleting). We will offer a separate "save" button that takes the torrent out of the cache and puts it under user control. Informed Seeding - QualityIf the user is to use a client with such a cache, it has to be driven by algorithms that lead to better performance (this depends on BarterCast rewarding users for their contribution). The algorithms refer to the cache retention policy: what swarms to delete when we run out of space. To make an informed choice, we first need to measure the health of a swarm, i.e. quantify its need for seeding. We then sort the swarms in the cache based on the results and delete the healthiest one. Here are some of the strategies for sorting the cache contents, a.k.a. health functions:
Some of the strategies can also use normalization with respect to file size. Detect fake swarmsThis does not refer to detecting fake content. Tracker reports fake peers in peer listMalicious tracker gathers peer information from regular trackers (just a guess) and uses it to populate its peers list. (What if the peers spam the tracker?) Statistics needed
Criteria for marking as fake
What criterion to consider for calculating the percentage? Time, number of peers? What if there are multiple trackers? Is it not better to delete just the polluting tracker? If so, the peers must be analyzed according to source. FUTURE: possible problem with PEX. ObservationsDownloadState is created by DownloadImplementation.network_get_state() with data from SingleDownload.get_stats(). SingleDownload.get_stats() calls SingleDownload._getstatsfunc() which is actually BT1Download.startStats(). BT1Download.startStats() calls DownloaderFeedback.gather() which uses Statistics.update(). Tribler/Core/BitTornado/BT1/Statistics only covers Connecter. If we want data about pre-handshake events, we need to add logging to Encoder. Fake statistics informationTracker advertises a high number of peers, but only provides a small number of peers when actual peer list is requested. ProblemThe tracker may return less than 50 peers when numwant is missing or 0. How can we detect that it is doing it for malicious reasons? Observations
Code
See the following SVN branch: |