Firewall Measurement Data sets

This page discusses the data sets used in

J.J.D. Mol, J.A. Pouwelse, D.H.J. Epema, and H.J. Sips (2008). Free-riding, Fairness, and Firewalls in P2P File-Sharing. In Proc. of the 8th IEEE International Conference on Peer-to-Peer Computing, pp. 301-310.

which contain information on the behaviour of peers in thousands of BitTorrent swarms. More specifically, two data sets are used:

  1. The data set by Alexandru Iosup, Pawel Garbacki, et al., constructed in May 2005, which tracked peers in thousands of BitTorrent swarms over the course of several days. It records how long each peer stayed in each swarm, and whether the peer was firewalled.
  2. Data pulled from the websites of BitTorrent communities in January 2008, covering several public and private trackers and tens of thousands of BitTorrent swarms. It contains the number of seeders and leechers in each swarm. For one community (TVTorrents), it also records which peers were in which swarm, and whether each peer was firewalled.

Storage

The data sets are stored at

superstorage3:/data/fff_datasets

and the first data set is a reprocessing of the data set stored in

superstorage1:/home/pawel/p2ptraces/bt_piratebay/active_measurement/

Data set 1

The following files are offered, in order of generation:

analyse_bt.py
    Generates p2ptrace_arrdeps.log on stdout, from the *.torrent.smallres and *.torrent.error files found in Iosup's data set. Input files are expected to be in the current directory.
p2ptraces.arrdeps
    Input data set with one row per peer:
        swarmnr arrival departure firewalled
    arrival/departure: timestamps, in seconds
    firewalled: 0 for reachable, 1 for firewalled/NATed
p2ptraces.filesizes
    Input data set with one row per swarm:
        swarm_filename size
    swarm_filename: filename of the .torrent file in Iosup's data set (on superstorage1)
    size: size of the data represented by the .torrent, in bytes
p2ptrace.log
    Input data set with one row per swarm:
        swarmnr #seeders #peers fwperc
    fwperc: the percentage of firewalled peers
p2ptrace2k.log
    The 2000 largest swarms in p2ptrace.log
p2ptrace2k_sorted.log
    The same file, sorted on fwperc
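
The per-peer rows described above can be reduced to per-swarm statistics along the following lines. This is only a minimal sketch and not the script used to build the data set: it assumes the columns are whitespace-separated, that swarmnr is an integer, and that fwperc is computed over all rows of a swarm. The function names (parse_arrdeps, swarm_stats, largest_sorted_by_fwperc) and the sample rows are hypothetical.

```python
from collections import defaultdict


def parse_arrdeps(lines):
    """Yield (swarmnr, arrival, departure, firewalled) tuples from text rows.

    Assumes whitespace-separated columns: swarmnr arrival departure firewalled.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        swarmnr, arrival, departure, firewalled = fields
        yield int(swarmnr), float(arrival), float(departure), firewalled == "1"


def swarm_stats(rows):
    """Return {swarmnr: (peer_count, fwperc)} aggregated over peer rows."""
    peers = defaultdict(int)
    firewalled_peers = defaultdict(int)
    for swarmnr, _arrival, _departure, firewalled in rows:
        peers[swarmnr] += 1
        if firewalled:
            firewalled_peers[swarmnr] += 1
    return {s: (n, 100.0 * firewalled_peers[s] / n) for s, n in peers.items()}


def largest_sorted_by_fwperc(stats, count):
    """Keep the `count` largest swarms, then sort them by fwperc (ascending)."""
    largest = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)[:count]
    return sorted(largest, key=lambda kv: kv[1][1])


# Made-up sample rows for illustration; not taken from the data set.
sample = [
    "1 100 400 0",
    "1 150 300 1",
    "2 100 200 1",
    "2 120 220 1",
    "2 130 500 0",
    "3 100 150 0",
]
stats = swarm_stats(parse_arrdeps(sample))
for swarmnr, (count, fwperc) in largest_sorted_by_fwperc(stats, 2):
    print(swarmnr, count, round(fwperc, 1))
```

With a real file, the rows could be passed in as `parse_arrdeps(open("p2ptraces.arrdeps"))`, and `count=2000` would mirror the selection described for p2ptrace2k.log. The #seeders column cannot be derived from the per-peer rows shown here, so the sketch omits it.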

Data set 2

Two directories are included, final and sources: final contains the data used to generate the graphs, and sources contains the more extensive source data. The source directory holds several tarballs containing the web pages fetched to produce the final data set, as well as the scripts to generate them. See the README file in the source directory for a more thorough explanation. The final directory contains:

*.ratio
    seeder/leecher ratio files
tvtorrents.fwratio
    fractions of peers firewalled

All *.ratio files measure:

  • Website reports about trackers serving TV shows
  • Private or public trackers (all4nothin, mininova, btjunkie, thebox, tvtorrents, piratebay, swetv)
  • Measured on the same day in January 2008

All *.ratio files contain:

  • only swarms with >= 10 peers
  • one line per swarm
  • fraction of seeders per swarm
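
As an illustration of the filtering and the per-swarm value listed above, the sketch below derives seeder fractions from raw seeder and leecher counts. It rests on two assumptions that the page does not state: that the number of peers in a swarm is seeders plus leechers, and that the fraction of seeders is seeders divided by that total. The function name seeder_fractions and the example counts are hypothetical.

```python
def seeder_fractions(swarms, min_peers=10):
    """Return seeders / (seeders + leechers) for each swarm that is large enough.

    `swarms` is an iterable of (seeders, leechers) pairs. Swarms with fewer
    than `min_peers` peers in total are dropped.
    """
    fractions = []
    for seeders, leechers in swarms:
        total = seeders + leechers  # assumption: peers = seeders + leechers
        if total >= min_peers:
            fractions.append(seeders / total)
    return fractions


# Hypothetical (seeders, leechers) counts; not taken from the data set.
example = [(8, 2), (3, 4), (5, 15)]
print(seeder_fractions(example))
```

The second swarm has only 7 peers and is dropped, which corresponds to the 10-peer threshold above.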

tvtorrents.fwratio measures:

  • January 2008, private tracker, TV shows
  • only swarms with >= 10 peers
  • one line per swarm
  • fraction of firewalled peers per swarm (as reported by tvtorrents website)
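
One way to summarise a file of per-swarm fractions is an empirical cumulative distribution. The sketch below assumes that each non-empty line holds a single floating-point fraction, which this page does not confirm, so the actual layout should be checked first. The function names read_fractions and empirical_cdf and the sample lines are hypothetical.

```python
def read_fractions(lines):
    """Parse one floating-point fraction per non-empty line (assumed layout)."""
    return [float(line) for line in lines if line.strip()]


def empirical_cdf(values):
    """Return sorted (value, cumulative share of swarms) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Made-up sample lines for illustration; not taken from tvtorrents.fwratio.
sample_lines = ["0.50\n", "0.25\n", "\n", "0.75\n", "0.50\n"]
for value, share in empirical_cdf(read_fractions(sample_lines)):
    print(value, share)
```

Each output pair gives a fraction and the share of swarms whose fraction is at or below it, so the last pair always has a share of 1.0.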