Firewall Measurement Data sets
This page discusses the data sets used in
J.J.D. Mol and J.A. Pouwelse and D.H.J. Epema and H.J. Sips (2008). Free-riding, Fairness, and Firewalls in P2P File-Sharing.
In Proc. of the 8th IEEE International Conference on Peer-to-Peer Computing, pp. 301-310.
These data sets contain information on the behaviour of peers in thousands of BitTorrent swarms. More specifically, two data sets are used:
- The data set by Alexandru Iosup, Pawel Garbacki, et al., constructed in May 2005, which tracked peers in thousands of BitTorrent swarms over the course of several days. It records how long each peer stayed in each swarm, and whether the peer was firewalled.
- Data pulled from the websites of BitTorrent communities in January 2008, covering several public and private trackers and tens of thousands of BitTorrent swarms. It contains the number of seeders and leechers in each swarm. For one community (TVTorrents), it also records which peers were in which swarm, and whether each peer was firewalled.
Storage
The data sets are stored at
superstorage3:/data/fff_datasets
and the first data set is a reprocessing of the data set stored in
superstorage1:/home/pawel/p2ptraces/bt_piratebay/active_measurement/
Data set 1
The following files are offered, in order of generation:
analyse_bt.py | generates p2ptrace_arrdeps.log on stdout from the *.torrent.smallres and *.torrent.error files found in Iosup's data set. Input files are expected to be in the current directory.
| p2ptraces.arrdeps | input data set with one row per peer:
| | swarmnr arrival departure firewalled
| | arrival/departure are timestamps (in seconds)
| | firewalled is 0 for reachable and 1 for firewalled/NATed
| p2ptraces.filesizes | input data set with one row per swarm:
| | swarm_filename size
| | swarm_filename is the filename of the .torrent file in Iosup's data set (on superstorage1).
| | size is the size of the data represented by the .torrent file, in bytes.
| p2ptrace.log | input data set with one row per swarm:
| | swarmnr #seeders #peers fwperc
| | where fwperc is the percentage of firewalled peers.
| p2ptrace2k.log | the 2000 largest swarms in p2ptrace.log
| p2ptrace2k_sorted.log | the same file, sorted on fwperc
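The column layouts listed above can be read with a few lines of Python. The following is a minimal sketch, not the analyse_bt.py script itself: all function and type names are hypothetical, fields are assumed to be whitespace-separated, swarmnr is kept as a string, and "largest" is assumed to mean the highest #peers value. The per-swarm percentage is computed over rows of the arrivals/departures file, which may differ from how fwperc in p2ptrace.log was actually derived.

```python
from collections import defaultdict, namedtuple

# Field names follow the column descriptions in the table above.
PeerSession = namedtuple("PeerSession", "swarmnr arrival departure firewalled")
SwarmStats = namedtuple("SwarmStats", "swarmnr seeders peers fwperc")


def parse_arrdeps(lines):
    """Parse rows of the form 'swarmnr arrival departure firewalled'."""
    sessions = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        swarmnr, arrival, departure, firewalled = fields
        sessions.append(
            PeerSession(swarmnr, float(arrival), float(departure), firewalled == "1")
        )
    return sessions


def firewalled_percentage(sessions):
    """Return {swarmnr: percentage of rows flagged as firewalled/NATed}."""
    total = defaultdict(int)
    blocked = defaultdict(int)
    for session in sessions:
        total[session.swarmnr] += 1
        blocked[session.swarmnr] += session.firewalled  # True counts as 1
    return {swarm: 100.0 * blocked[swarm] / total[swarm] for swarm in total}


def parse_swarm_log(lines):
    """Parse rows of the form 'swarmnr #seeders #peers fwperc'."""
    stats = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        swarmnr, seeders, peers, fwperc = fields
        stats.append(SwarmStats(swarmnr, int(seeders), int(peers), float(fwperc)))
    return stats


def largest_swarms(stats, n=2000):
    """Keep the n swarms with the most peers (assumed meaning of 'largest')."""
    return sorted(stats, key=lambda s: s.peers, reverse=True)[:n]


def sorted_on_fwperc(stats):
    """Order swarms by ascending percentage of firewalled peers."""
    return sorted(stats, key=lambda s: s.fwperc)
```

A session length is then simply departure minus arrival, in seconds. To process a real file, pass an open file object, for example parse_arrdeps(open("p2ptraces.arrdeps")).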
Data set 2
Two directories are included, final and sources: final contains the data used to generate the graphs, and sources contains the more extensive source data. The source directory holds several tarballs containing the web pages fetched to produce the final data set, as well as scripts to generate them. See the README file in the source directory for a more thorough explanation. The final directory contains:
*.ratio | seeder/leecher ratio files
| tvtorrents.fwratio | fractions of peers firewalled
All *.ratio files measure:
- Website reports about trackers serving TV shows
- Private or public trackers (all4nothin, mininova, btjunkie, thebox, tvtorrents, piratebay, swetv)
- Measured on the same day in January 2008
All *.ratio files contain:
- only swarms with >= 10 peers
- one line per swarm
- fraction of seeders per swarm
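The files are named after the seeder/leecher ratio but store a fraction of seeders, so the two quantities are easy to confuse. The sketch below shows how they relate, assuming that every peer in a swarm is either a seeder or a leecher, so that the fraction is seeders / (seeders + leechers). The function names are hypothetical and not part of the data set's scripts.

```python
def seeder_fraction(seeders, leechers):
    """Fraction of peers in a swarm that are seeders."""
    total = seeders + leechers
    if total == 0:
        raise ValueError("swarm has no peers")
    return seeders / total


def seeder_leecher_ratio(fraction):
    """Convert a seeder fraction into a seeders-per-leecher ratio."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction == 1.0:
        return float("inf")  # a swarm with seeders only has no finite ratio
    return fraction / (1.0 - fraction)
```

For example, a swarm with 30 seeders and 10 leechers has a seeder fraction of 0.75, which corresponds to 3 seeders per leecher.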
tvtorrents.fwratio measures:
- January 2008, private tracker, TV shows
- only swarms with >= 10 peers
- one line per swarm
- fraction of firewalled peers per swarm (as reported by tvtorrents website)
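One way to summarise a one-value-per-swarm file such as tvtorrents.fwratio is an empirical distribution over swarms. The sketch below assumes that each non-empty line holds a single number between 0 and 1. The exact line format is not specified here, so check the README before relying on it. All function names are hypothetical.

```python
def parse_fractions(lines):
    """Read one fraction per line, skipping blank lines."""
    values = []
    for line in lines:
        text = line.strip()
        if text:
            values.append(float(text))
    return values


def empirical_cdf(values):
    """Return sorted (value, cumulative share of swarms) pairs."""
    ordered = sorted(values)
    count = len(ordered)
    return [(value, (index + 1) / count) for index, value in enumerate(ordered)]


def share_at_or_above(values, threshold):
    """Share of swarms whose fraction is at least the given threshold."""
    return sum(value >= threshold for value in values) / len(values)
```

With the fractions loaded via parse_fractions(open("tvtorrents.fwratio")), share_at_or_above(values, 0.5) gives the share of swarms in which at least half of the peers are firewalled.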