P2P Simulator

We are designing and implementing a full-scale, trace-based P2P simulator that models BitTorrent and Tribler in detail. The goal is to provide a platform on which protocol changes and new ideas can be tested and measured extensively without having to wait for data from real users.

The core of the design is a detailed BitTorrent simulator. On top of this, the Tribler overlay is simulated with the BuddyCast epidemic protocol and the BarterCast statistics exchange. Higher-level mechanisms that build on these protocols, such as moderation, are being integrated as well.

The simulator is built with a trace-based framework. Given a trace of peers (e.g., up/down timestamps + connectability + file requests), a full simulation of the network can be executed. In the future, automatic trace generation will be supported based on given popularity distributions.
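
As a rough illustration of what trace-driven execution means here, the sketch below replays a tiny hand-written trace as a time-ordered event queue. The record fields (`peer_id`, `up`, `down`, `connectable`, `requests`) and the function `replay` are assumptions made for this example; they are not the simulator's actual trace format or API.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PeerTrace:
    """One hypothetical trace record: a single online session of a peer."""
    peer_id: str
    up: float          # time at which the peer comes online
    down: float        # time at which the peer goes offline
    connectable: bool  # whether the peer accepts incoming connections
    requests: List[Tuple[float, str]] = field(default_factory=list)  # (time, file)


def replay(trace):
    """Convert trace records into events and process them in time order.

    Returns a log of (time, event, peer_id, detail, peers_online) tuples.
    """
    counter = itertools.count()  # unique tie-breaker for the heap
    queue = []
    for rec in trace:
        heapq.heappush(queue, (rec.up, 0, next(counter), "join", rec.peer_id, ""))
        for t, name in rec.requests:
            # Requests outside the online session are dropped in this sketch.
            if rec.up <= t <= rec.down:
                heapq.heappush(queue, (t, 1, next(counter), "request", rec.peer_id, name))
        heapq.heappush(queue, (rec.down, 2, next(counter), "leave", rec.peer_id, ""))

    online = set()
    log = []
    while queue:
        t, _prio, _n, kind, peer, detail = heapq.heappop(queue)
        if kind == "join":
            online.add(peer)
        elif kind == "leave":
            online.discard(peer)
        log.append((t, kind, peer, detail, len(online)))
    return log


example_trace = [
    PeerTrace("A", 0.0, 50.0, True, [(5.0, "file1")]),
    PeerTrace("B", 10.0, 30.0, False, [(12.0, "file1"), (40.0, "file2")]),
]

for entry in replay(example_trace):
    print(entry)
```

In a full simulation, the `request` events would start or join a swarm rather than just being logged. The priority field in each heap entry orders simultaneous events as join, request, leave.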

Currently implemented

  • BitTorrent piece-level swarm simulator
    • Unchoking + optimistic unchoking
    • Rarest first piece-picking
    • Piece + subpiece support
    • Multiple tracker support
  • Trace-based BuddyCast support
    • Full semantic overlay simulation
    • Taste exchange
  • BarterCast support
    • Bandwidth statistics exchange
    • Maxflow-based reputation system
    • Sharing-ratio enforcement based on reputation
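
To make the rarest-first item in the list above concrete, here is a minimal sketch of the general technique. It is a simplification and not the simulator's code: the function name `pick_rarest_first` and its parameters are hypothetical, pieces are plain integers, and refinements found in real clients (such as random selection of the first pieces or endgame mode) are left out.

```python
import random
from collections import Counter


def pick_rarest_first(have, neighbor_bitfields, uploader_has, rng=random):
    """Choose the next piece to request from one uploader.

    have               -- set of piece indices we already hold
    neighbor_bitfields -- dict mapping neighbor id to the set of pieces it holds
    uploader_has       -- set of pieces held by the peer we are requesting from
    Returns a piece index, or None if the uploader has nothing we need.
    """
    # Count how many neighbors hold each piece.
    availability = Counter()
    for pieces in neighbor_bitfields.values():
        availability.update(pieces)

    candidates = [p for p in uploader_has if p not in have]
    if not candidates:
        return None

    # Keep only the least-replicated candidates and break ties at random.
    rarest = min(availability[p] for p in candidates)
    return rng.choice(sorted(p for p in candidates if availability[p] == rarest))


neighbors = {"A": {0, 1, 2}, "B": {0, 1}, "C": {0, 3}}
print(pick_rarest_first({0}, neighbors, neighbors["A"]))
```

Piece 2 is held by only one neighbor while piece 1 is held by two, so requesting from `A` selects piece 2.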

Under implementation

  • Logging facilities
  • Documentation
  • Convenient configuration facilities
  • ModerationCast integration
  • Probability-based trace generation

ToDo

Reduce memory usage so that more than 100,000 peers can be simulated in a semantically clustered network. Open question: is the Python dictionary the bottleneck?
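
One way to start investigating the dictionary question is to measure the per-peer cost of two representations with the standard `tracemalloc` module. The sketch below is only an assumption about what a peer record might contain (`peer_id`, `uploaded`, `downloaded`). It compares a plain dictionary per peer against a class using `__slots__`, and says nothing about the simulator's real data structures.

```python
import tracemalloc


class SlotPeer:
    """Hypothetical peer record without a per-instance __dict__."""
    __slots__ = ("peer_id", "uploaded", "downloaded")

    def __init__(self, peer_id, uploaded, downloaded):
        self.peer_id = peer_id
        self.uploaded = uploaded
        self.downloaded = downloaded


def make_dict_peers(n):
    """Build n peer records as plain dictionaries."""
    return [{"peer_id": i, "uploaded": 0.0, "downloaded": 0.0} for i in range(n)]


def make_slot_peers(n):
    """Build n peer records as __slots__ objects."""
    return [SlotPeer(i, 0.0, 0.0) for i in range(n)]


def measure(factory, n):
    """Return the number of bytes allocated while building n peer records."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    peers = factory(n)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del peers
    return after - before


n = 10000
dict_bytes = measure(make_dict_peers, n)
slot_bytes = measure(make_slot_peers, n)
print("dict per peer:  %.1f bytes" % (dict_bytes / n))
print("slots per peer: %.1f bytes" % (slot_bytes / n))
```

The absolute numbers depend on the Python version and platform, so the comparison should be repeated with the simulator's own peer state before drawing conclusions.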

Datasets

Over the past years we have collected numerous TBytes of data from P2P systems and web 2.0 sites. These (anonymized) traces give us insight into how real people behave on real systems. Our special interests are software usage, Internet connection capacity, session duration, social networks, taste clustering, group formation, and tagging. We believe our collection constitutes one of the largest accurate non-profit collections of P2P datasets.

  • SuprNova.org
    Download behavior for 18 months with popularity of files. Includes data on system failures on trackers, web servers, .torrent servers, and load balancers. From early 2003 onwards.
  • Piratebay.org
    Content availability and popularity. Includes early data from 2004.
  • Filelist.org
    Download behavior of 91745 peers with connectability, BitTorrent client name, download speed, online/offline time, etc.
  • Youtube.com
    For 750,000+ users we collected public user profile information, published clips, social network information, and favorite clip information (144 GByte from summer 2006; ST3:/data/dataset)
  • Del.icio.us
    Tagging behavior, taste diversity, and user activity (2.3 GByte; 4311 users)
  • Flickr.com
    Tagging behavior, taste diversity, and user activity (0.5 GByte)
  • CiteUlike.org
    Tagging behavior, taste diversity, ratings, and user activity (0.5 GByte; 3592 users)
  • Librarything.com
    Tagging behavior, ratings, and taste diversity
  • Vuze.com
    BitTorrent piece exchange

More dataset details: