== Swarm-based Reputation Consensus ==

In a nutshell: a mechanism for improving BarterCast with swarm-based information diffusion. Trusted peers start torrents to disseminate their current view of the system, and other peers join such torrents to update their views.

[Rahim: How does a peer know that it is trusted by others?]

[Todo: the discussion on the feasibility of keeping track of the behavior of one million peers is relevant but somewhat orthogonal to how information is propagated. Where should we put it then?]

=== Relationship with BarterCast ===

BarterCast is a reputation mechanism based on a contribution network built by each peer from information it gathers from the system. Peers propagate information about other peers' behavior through gossiping and use this information to build a directed graph of contributions. This graph then serves as input for a ranking algorithm such as maxflow (of contributions to the peer calculating the reputations) or PageRank. An issue with this mechanism, however, is that information might take a considerable amount of time to propagate, and therefore the opinion of the community about a peer might diverge for long periods.

[Johan: more importantly, it facilitates a secure and fast bootstrap. The reputation system at a newly installed peer is fully operational after downloading just a single file.]

If information could be passed around in the form of signed certificates of contribution, it would be more trustworthy. However, this would cause potentially large convergence times. This can be circumvented if some peers merge this information and are trusted to propagate the result of such merging.

[Todo: I was unsure whether transaction certificates are sent around in BarterCast.]

=== Synchronizing views with trusted torrents ===
Consider a network in which the reputation network has converged at all peers. The central idea of swarm-based reputation consensus is that the centroids of the reputation graph periodically (perhaps once every few days) initiate torrents to propagate the contribution network they currently hold. After a (small) number of these peers perform this operation, other peers join the torrents and update their current reputation network by merging their view of the system with that of the trusted peer that initiated the torrent.
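The merge step can be sketched in a few lines. This is a minimal sketch under assumptions not in the original: a view is a dict mapping (from_peer, to_peer) edges to transferred byte counts, and merging keeps the larger counter per edge, since transfer counters only grow.

```python
from collections import defaultdict

# Minimal sketch of the merge step, assuming a hypothetical layout where
# a view maps (from_peer, to_peer) edges to transferred byte counts.
def merge_views(local_view, trusted_view):
    # keep the larger counter per edge: transfer counters only grow,
    # so the larger value is the more recent one
    merged = defaultdict(int, local_view)
    for edge, transferred in trusted_view.items():
        merged[edge] = max(merged[edge], transferred)
    return dict(merged)

local = {("A", "B"): 100, ("B", "C"): 50}
trusted = {("A", "B"): 300, ("C", "D"): 70}
print(merge_views(local, trusted))  # {('A', 'B'): 300, ('B', 'C'): 50, ('C', 'D'): 70}
```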
If convergence on the contribution network is not perfect, some peers will not correctly identify the trusted publishers of the contribution network. These peers can use the popularity of the current swarms distributing contribution networks to estimate which are the most trusted ones. [Todo: This assumes trusted swarm-size information.] The rationale for this approach is that if peers synchronize often with the same highly-trusted peers, information will be disseminated more rapidly than through gossiping alone. [Todo: if we know who the most trusted peers are, maybe it is also a good idea to prioritize them when gossiping new info.]

Open issues:
== Analysis of BarterCast ==

To answer some of the questions above (whether gossiping is enough for spreading information, or whether there are strong peers whose local history is rich enough), we decided to evaluate the currently running BarterCast deployment and measure some parameters. Some of these parameters and their descriptions are listed below:
=== How to measure ===

Requirements:
The BIG question is how: simulation or an actual experiment?

=== Using a crawler ===

By using modified peers we can crawl the local history of the nodes and have them log and send the changes to their local history. By following the log of changes we can trace the evolution of a local history from beginning to end. Below are the tasks that should be done:
=== Bartercast graph visualization ===

To get a proper picture of the layout and connectivity of the bartercast graph, the collected bartercast records have been visualized in one graph. Using the [Graphviz] and [ZGRViewer] tools, a directed graph was drawn that shows data transfers between Tribler nodes. We started collecting data on 20 June, and the graph shown is for 6 August. During this period each online node was contacted every hour and asked for new bartercast records. In the graph below:
=== Bartercast graph (for the first 20000 records), max_degree=65 ===

=== Bartercast graph zoomed in (for the first 20000 records) ===

=== Bartercast graph, all records until 2009-08-06, max_degree=189 ===
Note: a higher-resolution picture is also attached, but showing it on the wiki messes up the text. You can download it and view it with other tools.

=== Initial Results ===

After collecting data for almost a month and a half (from 20 June until 6 August) and processing it, we obtained the following results. During this period the crawler contacted 2017 nodes. Some of them did not reply at all, but almost 1800 nodes replied at least once. Some peers were very active and the crawler was able to collect their full local history, but most of them just come and go. The cycle (interval) length is set to 4 hours and the peers are polled every hour. Below is the analysis for the mentioned data set.

=== Average of replicas ===

Average number of replicas in different cycles. Records enter the system at different times, so the first cycle for one record may be a different cycle for another record. To resolve this issue, we shift all records to a common start cycle and measure the average number of replicas relative to it. The same processing is done for the number of peers in different cycles. In this way, calculations for the later periods are less accurate (fewer records have been in the system long enough to contribute to them), but this is acceptable. The figure below shows this.
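The shift-and-average computation described above can be sketched as follows. The data layout is hypothetical: `replicas[record_id]` maps an absolute cycle number to the number of peers holding that record in that cycle.

```python
from collections import defaultdict

# Sketch of the cycle alignment described above; the data layout is
# hypothetical: replicas[record_id] maps an absolute cycle number to the
# number of peers holding that record in that cycle.
def average_replicas(replicas):
    sums, counts = defaultdict(int), defaultdict(int)
    for series in replicas.values():
        first = min(series)          # cycle in which the record appeared
        for cycle, n in series.items():
            rel = cycle - first      # shift every record to a common start
            sums[rel] += n
            counts[rel] += 1
    # later relative cycles average over fewer records, hence the lower
    # accuracy for the last periods noted above
    return {rel: sums[rel] / counts[rel] for rel in sorted(sums)}

replicas = {"r1": {3: 1, 4: 2, 5: 4}, "r2": {10: 1, 11: 4}}
print(average_replicas(replicas))    # {0: 1.0, 1: 3.0, 2: 4.0}
```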
The graph below shows just the average number of replicas.

=== Coverage ===
By definition, coverage is the fraction of peers that have a given record in their cache. The figure below shows the average coverage of different records in different cycles.

=== Convergence ===

By convergence we mean how well the local history of a node represents the union of the local histories of all nodes in the system. As a formula: in a given cycle, the number of records in a local history divided by the number of records in the union of all local histories of all nodes. We measured convergence over the period and categorized local histories into five groups:
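Both metrics follow directly from the definitions above. In this sketch, `caches[peer]` is a hypothetical layout holding the set of bartercast record ids in that peer's local history.

```python
# Sketch of the two metrics defined above; caches[peer] is a
# hypothetical layout holding the set of bartercast record ids in that
# peer's local history.
def coverage(caches, record):
    # fraction of peers that hold the record in their cache
    holders = sum(1 for cache in caches.values() if record in cache)
    return holders / len(caches)

def convergence(caches, peer):
    # size of one local history over the size of the union of all of them
    union = set().union(*caches.values())
    return len(caches[peer]) / len(union)

caches = {"p1": {"r1", "r2"}, "p2": {"r1"}, "p3": {"r1", "r3"}}
print(coverage(caches, "r1"))     # 1.0: every peer holds r1
print(convergence(caches, "p1"))  # 2/3: p1 holds 2 of the 3 known records
```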
The figure and table below show the obtained result.

=== Peer cache richness, accuracy, and onliness ===
The system is dynamic: peers join and leave. To differentiate peers from each other we used two fuzzy parameters: richness and accuracy. As the name indicates, richness means how many records are available in a peer's cache, and accuracy gives an indication of the synchronization between a peer's cache and the information about that cache collected by the crawler. If the crawler cannot connect to a node for a long time, the accuracy level of that node decreases. The richness and accuracy of the peers have been categorized into five different groups.
The figure and table below show the obtained result. It is possible that some peers are rich in cache but do not reply to the crawler for a long time. The metric used to measure this phenomenon is accuracy. Accurate peers are those that replied to the crawler very close, in time, to the end of the measurement. For example, if the last record collected in the system was logged at time T, then peers that replied to the crawler after T - threshold_level are very accurate. The list below shows the different accuracy levels and their meaning:
For the calculation, the threshold is set to eight hours, and a slack of four hours is also deducted from the last reply time to center the values around the middle. The figure below shows the result. Another parameter about peers that is worth measuring is their onliness: the ratio of their online time to the time passed since their discovery by the crawler. Because of the way the crawler contacts peers, their online time is not exact, but it is a close approximation. Again we used fuzzy values to categorize onliness:
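The basic accuracy check and the onliness ratio described above can be sketched as follows. Times are in seconds; the eight-hour threshold comes from the text, while the function names and example values are assumptions. The four-hour slack would simply shift the reply time before the comparison.

```python
# Sketch of the accuracy and onliness measures described above. Times
# are in seconds; the eight-hour threshold is from the text, everything
# else (names, example values) is illustrative.
EIGHT_HOURS = 8 * 3600

def is_accurate(last_record_time, last_reply_time, threshold=EIGHT_HOURS):
    # a peer is "very accurate" if its last reply came after
    # T - threshold, where T is the time of the last record seen
    return last_reply_time >= last_record_time - threshold

def onliness(online_time, time_since_discovery):
    # ratio of a peer's online time to the time since the crawler
    # discovered it
    return online_time / time_since_discovery
```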
=== Correlation between parameters ===
Accuracy, richness, and onliness are correlated to some degree. Peers that stay online for a long time collect many records, so there should be a positive correlation between onliness and richness. Also, peers with lower accuracy will have lower richness and onliness. The following graphs show the correlations between the different parameters.
The same story applies to the other correlations shown below.
Correlation between accuracy and onliness
Correlation between richness and accuracy
Correlation between richness and onliness

=== Discovery of peers by the crawler ===

The crawler finds new permids in the system in three ways: they are found and handed over by Buddycast, or they appear in the from_id or the to_id field of a bartercast record. If a peer is given to the crawler by Buddycast, the crawler tries to connect to that peer and fetch some records from it. To evaluate how the system behaves, the number of permids discovered in each cycle was counted. The first graph shows the cumulative number of discovered peers in different steps, without overlap between methods: if a permid was seen for the first time in the from_id field, it is not counted for the other methods. The second graph shows the same data with overlap, so a permid may be counted for all three methods. The last graph shows the daily discovery of new permids. As expected, many are found at the beginning, followed by a more or less decreasing trend.

Cumulative number of discovered peers without overlap

Cumulative number of discovered peers with overlap

Daily discovery of new peers

Cumulative number of connected peers

=== Record update statistics ===
Another interesting statistic we measured is the update ratio of bartercast records. By an update we mean a record with the same from_id and to_id but different downloaded or uploaded values. For example, record (A, B, d, u) is an update of record (M, N, o, p) if A = M, B = N, and d >= o or u >= p. In the current system, records are overwritten by newly updated ones. We measured the update ratio of bartercast records in two cases: first, considering all records, and second, discarding records whose from_id equals their to_id (control records). Control records usually see a lot of updates, but real records very few. The figures below show the histogram and also the cumulative number of updates for these two sets of records. The red graph is for all records (real and control records) and the blue one is for real records only.
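The update relation and the control-record filter can be transcribed directly from the definitions above; the (from_id, to_id, downloaded, uploaded) 4-tuple layout is an assumption.

```python
# Direct transcription of the update relation defined above; the
# (from_id, to_id, downloaded, uploaded) 4-tuple layout is an assumption.
def is_update(new, old):
    a, b, d, u = new
    m, n, o, p = old
    return a == m and b == n and (d >= o or u >= p)

def is_control(record):
    # control records have from_id equal to to_id
    from_id, to_id, _, _ = record
    return from_id == to_id

print(is_update(("A", "B", 500, 20), ("A", "B", 300, 20)))  # True
print(is_control(("A", "A", 10, 10)))                       # True
```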
Histogram for records with different updates (all records)
Cumulative sum of records with different updates
Histogram for records (distinct from_id and to_id) with different numbers of updates

Cumulative sum of records (distinct from_id and to_id) with different numbers of updates
As can be seen from the graphs, most records never receive a second update.

== Evaluation of maxflow and BarterCast ==
To answer the question of whether BarterCast and maxflow are able to make a good estimate of peers' reputations, we calculated the average of the Subjective Reputations (SRs) and plotted it against the net contribution to Tribler peers.
We also separated the amount of upload/download between Tribler peers from that between all peers (Tribler and non-Tribler). Some facts about these values:
For some nodes (974 of them) we are able to calculate both the total and the Tribler-only contribution; the result is very interesting as well:
The interesting point, however, is the ratio of Tribler upload and download to total upload and download. In both cases it is almost 0.03, which means that only 3% of data transfers happen between two Tribler peers; in 97% of cases one side is non-Tribler. It is not clear how much of this data is control data, such as Buddycast or Bartercast messages.

=== Correlation between Tribler net contribution and average subjective reputation ===

The following graphs show the correlation between the average of the SRs and the net contribution. SRs are calculated from the viewpoint of the local history holder. For example, if peer Q is inside the local history of peer P (P being the holder of the cache), then within this cache the 2-hop maxflow is calculated from P to Q and from Q to P, and the reputation of Q from the point of view of P is computed with the formula arctan(flow(Q->P) - flow(P->Q)) / (PI/2). The figures below show the average of these SRs, over the peers for which an SR could be calculated:

Correlation graph without zooming

Correlation graph with [-1000, 1000] on the x-axis and [-0.2, 0.2] on the y-axis

Correlation graph with [-500, 500] on the x-axis and [-0.1, 0.1] on the y-axis

Correlation graph with [-200, 200] on the x-axis and [-0.05, 0.05] on the y-axis

Correlation graph with [-100, 100] on the x-axis and [-0.2, 0.1] on the y-axis

Correlation graph with [-100, 100] on the x-axis and [-0.02, 0.02] on the y-axis
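The 2-hop maxflow and the arctan formula can be sketched as follows. Since every 2-hop path P -> mid -> Q uses its own pair of edges, the maxflow restricted to at most two hops reduces to the direct edge plus a sum of bottleneck minima over intermediaries; the dict-based capacity layout is an assumption.

```python
import math

# Sketch of the subjective reputation (SR) described above. The
# contribution graph is a hypothetical dict: cap[(x, y)] holds the bytes
# transferred from x to y.
def two_hop_flow(cap, src, dst):
    # maxflow restricted to paths of at most 2 hops: each 2-hop path
    # uses its own pair of edges, so the flow is the direct edge plus
    # the bottleneck of every src -> mid -> dst path
    flow = cap.get((src, dst), 0)
    mids = {x for (a, x) in cap if a == src} & {x for (x, b) in cap if b == dst}
    for m in mids - {src, dst}:
        flow += min(cap[(src, m)], cap[(m, dst)])
    return flow

def subjective_reputation(cap, p, q):
    # reputation of q from p's point of view, in (-1, 1):
    # arctan(flow(q->p) - flow(p->q)) / (pi/2)
    return math.atan(two_hop_flow(cap, q, p) - two_hop_flow(cap, p, q)) / (math.pi / 2)

cap = {("Q", "P"): 400, ("P", "Q"): 100, ("Q", "X"): 50, ("X", "P"): 30}
print(two_hop_flow(cap, "Q", "P"))           # 430 (direct 400 + min(50, 30))
print(subjective_reputation(cap, "P", "Q"))  # close to 1: Q uploaded far more to P
```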
As the graphs show, there are few dots in the upper-left and lower-right quadrants, which indicates a positive correlation between the average of the SRs and the net contribution; from the scattering of the dots, however, it does not seem to be a strong correlation. To give a better answer we calculated the correlation coefficient between these two parameters. Because of the non-linear function used in calculating the reputation (arctan), we do not expect a linear correlation between them, and the Pearson correlation confirms this. Before calculating, we also removed outliers that might unexpectedly skew the result. Below is the result for the different models with 2-hop maxflow:
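A Pearson coefficient can be computed in a few lines; the sample values below are illustrative, not measured data. Given the arctan non-linearity, a rank-based coefficient such as Spearman's may describe the relation better.

```python
import math

# Minimal Pearson correlation sketch; the sample values are
# illustrative, not measured data.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

net_contribution = [-500, -100, 0, 200, 800]
avg_sr = [-0.9, -0.3, 0.05, 0.4, 0.95]
print(round(pearson(net_contribution, avg_sr), 2))
```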
We also calculated the maxflow with a 6-hop distance; below are the correlation coefficients:
Compared to the 2-hop maxflow there is a slight improvement, but it is not spectacular.
As expected, there is no strong correlation between the two parameters, but at least it is positive, and we can conclude that BarterCast and maxflow can differentiate between very bad and very good users, but not more.
== Research problems ==
== Related work / Alternative approaches ==
Attachments