Swarm sizeThe popularity of content. Problem description
Information at handAfter exchanging of BuddyCast message, followed by metadata exchange (mentioned in the protocol), each peer has below information that can use to calculate swarm size:
There is another set of information that is obtained during periodic connections to the tracker:
Also information from tracker seems to be reliable (may be!!) and easy to get, but the challenge is to ask tracker as less as possible and to have appropriate measure of the swarm size. Problem AnalysisTo provide a solution, first we should know, how the swarm size is changed?
How is the reliability of information obtained from tracker?According to the BT protocol, the tracker should have the most recent info about the swarm size, but in reality it is not. There are some factors that temporarily degrades the accuracy of tracker's informations. For example, If a peer leaves the swarm without informing the tracker (sending stop message) the tracker will have inaccurate info until next check that may happen after interval time units. How is the reliability of information obtained through gossip message?Without strict security conditions information obtained through gossip protools are not reliable. For measuring swarm size below issues may arise:
SolutionMain pointsIn crowd based popularity measurement systems there are two important trends that should be considered and balanced when policy designing:
Solution DescriptionAt first step, we consider the case that there is no malicious peer in the system and all peers follow the protocol (They send what they know about swarm size without deliberately changing it) and later we can elaborate it. To make a correct measurement each node should keep a log( or popularity table ) of the swarm size info that it has received from others. The Popularity table contain these items:
Beside above table each peer has access to this set of info:
What we do is that, by extending BuddyCast protocl along with torrents infohash that it sends( fetched from MyPreference and Preference list), it sends latest information about swarm size to the receiver. In this implementation it sends
If peer does not have any info about torrent size, not checked yet or rejected by tracker, it send specified values that later discarded by receiver.
Some HeuristicsIn the case of swarm size measurement some heuristics can be applied:
For the first version we just looking for the most recent value from tracker asked by peer or one of its neighbors. Dealing with time differenceBecause peers are not synchronized so direct comparing of peers' tracker checking time is not logical. For example, suppose that P1 and P2 ask tracker for the torrent T1 and send ( t1, s1, l1) and (t2, s2, l2) to peer P0 respectively. Now peer P0 has two sets of information but because peers P1 and P2 have different local times, comparing reports and deciding which one is newer is impossible. To resolve this problem, peers send the age of tracker asking. Before sending their tracker checking report, they subtract the tracker checking time from their current time and send it along with the message. When a peer receives a record, it logs the message arrival time. At any time the age of the report can be find by this formula:
In this way the reports from different peers are comparable and peers can choose the youngest report easily. Avoiding Spread of ContaminationTo avoid fast spread of contamination we avoid to send second hand information to other peers. It is possible that a peer receive incorrect value from other peer. If peer relay it to other peers, in next gossip, after a while that wrong value will be spread in a large group of peers. To avoid this problem, peers just send first hand information, those information that are obtained directly from tracker. But peers can use received size reports to show on the GUI, or may be send it in reply to remote search method. Avoiding of unlimited table growingTo avoid growing of Popularity table, we put a maximum limit for the number of reports for a specified torrent and reports for a specified torrent from a specified peer. When a new report is obtained and it the number of records exceed the maximum limit, old records are removed from the table. The maximum number of reports for a torrent and also the maximum number of reports for the combination of torrent and peer are set to 3 and 5 respectively. This values are Global values and can be changed very easily. Some fingure calculation ( just estimation, correct values need simulation)
In the current version of the Tribler, each node obtains swarm size by asking tracker or metadata request. The interval for asking tracker is 30 seconds. About metadata request, after receive of a BC
message, the peer selects of the received torrents and ask sender about metadata and swarm size. The interval of the tracker asking and BC message are 30 and 15 seconds respectively. According to the configuration each peer needs to keep track of 5000 torrent. With this assumptions, in the best case , non-overlaping, each peer can update 6 torrents per minute. Four of them using BC message and the other two by asking tracker. We consider the best case that peers use a round robin method to ask tracker and also they receives different set of torrents in each round of BC. With these assumptions the refresh time is:
Now suppose that in each round a peers sends the size of the 50 swarms to another peer. The interval of asking tracker is same, each 30 seconds, and again each peer keeps track of 5000 torrents. We analyze the best case and worst case.
Now suppose the case that the number of swarms is higher than the number of peers and the received swarm sets have overlap with the rate of 0.5( the probability of receiving same torrent in different rounds is 0.5). If there be 2000 peers and 100000 swarms, at least 50 peers will be responsible for a single swarm and each 5000/50=100 can manage 5000 swarms cooperatively. So these 50 peers divide 5000 between themselves and ask tracker, each one needs to ask for just 100 swarms that it takes 50 minutes. After that they can send swarm info to each other via BC message. Suppose that half of the swarms have overlap so in each round each peer receives 25 new swarm size info from others, 100 per minute. So the refresh time is:
Notice that the 50 minutes, before starting gossip protocol, is very pessimistic and nodes does not need to wait for 50 minutes, they can start BC message as soon as they get some swarm size info. But again with these pessimistic assumption the refresh time of the proposed method is much less than the current applied method. Implementing of the proposed solution
In the first version we think that all of peers are benevolent and does not send incorrect values to others. With this assumption measuring of the swarm is very easy, the last received
value is the most correct value. So lets continue with this assumption and complete it later. Adding Popularity table
At first step we need to create a table to keep received swarm size info from different peers. We name this table "Popularity":
Updating Popularity tableWhen a new record should be added/removed to/from the Popularity table:
Sql scripts for creation and altering tablesCREATE TABLE Popularity ( torrent_id INTEGER, peer_id INTEGER, msg_receive_time NUMERIC, size_calc_age NUMERIC, num_seeders INTEGER DEFAULT 0, num_leechers INTEGER DEFAULT 0, num_of_sources INTEGER DEFAULT 0 ); CREATE INDEX Message_receive_time_idx ON Popularity (msg_receive_time); CREATE INDEX Size_calc_age_idx ON Popularity (size_calc_age); CREATE INDEX Number_of_seeders_idx ON Popularity (num_seeders); CREATE INDEX Number_of_leechers_idx ON Popularity (num_leechers); CREATE UNIQUE INDEX Popularity_idx ON Popularity (torrent_id, peer_id, msg_receive_time);
Tracker down scenario (Johan's point)When a tracker goes down permanently the desired outcome is that after some time all Tribler peers will remove this dead .torrent from their hard disk. One possible policy is that old popularity info will cause a .torrent to be scheduled for deletion.
However, care is taken that coming online after 1 month of downtime does not cause deletion of all old .torrents in the cache. Rahim: At the current implementation popularity values are not removed otherwise new values be reported, but old values are reliable until a new value be reported. Testing and evaluation
|