Moderation and rich metadata

Status: operational code, waiting for deployment and improvements

Currently every peer can publish .torrent files using Buddycast3. No moderation mechanism exists yet. An unsolved problem is how to create a metadata infrastructure for Video on Demand that facilitates user-generated metadata.

Research question: design, implementation, and evaluation of an efficient epidemic protocol for distribution of rich metadata with pollution prevention measures.

We have worked out a roadmap for addressing this research question in incremental stages until June 2009. Each stage represents an improvement in both moderation sophistication and growth of the moderator community. Key is that the software and the community's use of the new features stay in balance. Measurements on related projects, such as the MusicBrainz statistics or Wikipedia traces, show that it takes many months or even years to grow a community of moderators. Robustness to vandalism and fraud will also grow over time, as we observe normal usage patterns and become better at detecting anomalies. Furthermore, the software will give the growing moderator community increasingly more power over the metadata, until they are in full control.

5-stage roadmap

Research questions for each stage:
General research challenges: scalability, robustness to fraud, acceptable propagation speed, low bandwidth usage for moderators.

Our proof-of-principle Python code
Related Gossip work

Draft outline of architecture

We have chosen to design and implement the simple moderation protocol using gossiping (based on BuddyCast). Peers can create and receive moderations, which contain extra data for a given torrent:
We have chosen not to enable peers to change the title of a torrent, because that would make it very difficult to find a torrent whose title has been badly moderated. Furthermore, we have decided not to use majority voting as a tie-breaker between moderations, but simply to keep the last moderation. This is far more scalable when there is no trust mechanism: determining the majority in a non-secure environment requires either trust or every peer gathering all the moderations itself.

As a pollution prevention measure we use blacklisting. Users can block moderators that send bad moderations; this is done for the PermID and also for the IP address of the peer. To further prevent the propagation of bad moderations we do not automatically forward them: peers have to indicate that they are willing to forward moderations for certain moderators. Moderations are signed using the Elliptic Curve Digital Signature Algorithm (ECDSA), which is also used to verify the PermIDs. This enables peers to verify the authenticity of a message even if it is forwarded by a third party (a sketch of signature verification and blacklist filtering is given after the message descriptions below). The protocol allows for rate control to minimize bandwidth consumption.

The above design is implemented as a proof of principle. Several simulations have been conducted to determine scalability and robustness to fraud. Please read the following documents for more details: --- by Vincent, --- the ModerationCast design document of 7 pages with the message format, and finally the very extensive --- MSc thesis on P2P moderation.

Initial Implementation Design

(Dave and Rameez are currently working on this.) For an initial implementation we decided to simplify Vincent's design where possible and also make a few additions. Mainly, we have excluded many of the metadata fields, keeping only the essentials. ModerationCast is a prerequisite for VoteCast, which will allow users to rate (vote for) moderators. ModerationCast produces three kinds of messages:
Moderation_Have message:
(record repeated up to 100 times)

After BuddyCast returns a peer from the overlay (approximately every 15 seconds), a Moderation_Have message is pushed to that peer, containing a list of hashes and time_stamps of locally stored moderations. The hash identifies a .torrent and the time_stamp indicates the creation time of the stored moderation. Up to 100 such (hash, time_stamp) pairs may be sent in one message. The local node selects moderations to include in the list based on a 50:50 policy: 50% of the moderations are selected randomly and 50% are selected based on time_stamp recency.
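As an illustration, the selection policy could look roughly like the Python sketch below. The store layout (a dictionary mapping infohash to a moderation record with a time_stamp field) and all names are assumptions made for illustration, not the actual implementation.

```python
# Sketch of the 50:50 selection policy for Moderation_Have messages.
# Assumption: local_moderations maps infohash -> moderation dict carrying
# a 'time_stamp' field (illustrative layout, not the real local store).
import random

MAX_HAVE_RECORDS = 100

def build_have_message(local_moderations):
    """Select up to 100 (infohash, time_stamp) pairs: half recent, half random."""
    items = list(local_moderations.items())
    if len(items) <= MAX_HAVE_RECORDS:
        selected = items
    else:
        half = MAX_HAVE_RECORDS // 2
        by_recency = sorted(items, key=lambda kv: kv[1]['time_stamp'],
                            reverse=True)
        # 50% most recent moderations ...
        selected = by_recency[:half]
        # ... and 50% picked at random from the remainder.
        selected += random.sample(by_recency[half:], half)
    return [(infohash, mod['time_stamp']) for infohash, mod in selected]
```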
(entry repeated up to 100 times)

Any node that receives a Moderation_Have message examines it to determine whether it wishes to request any of the advertised moderations. A node asks for any new (previously unseen) moderation and for any more up-to-date moderation (based on time_stamp); more up-to-date moderations overwrite older ones. The node sends back a Moderation_Request message containing a list of the hashes of the required moderations.
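On the receiving side, the decision about which moderations to ask for might look like the following sketch, using the same illustrative store layout as above.

```python
# Sketch of building a Moderation_Request from a received Moderation_Have.
# Assumption: local_moderations is the same illustrative store as above.
MAX_REQUEST_ENTRIES = 100

def build_request_message(have_records, local_moderations):
    """Ask for moderations that are unseen or advertised with a newer time_stamp."""
    wanted = []
    for infohash, remote_time_stamp in have_records:
        local = local_moderations.get(infohash)
        if local is None or remote_time_stamp > local['time_stamp']:
            wanted.append(infohash)
            if len(wanted) == MAX_REQUEST_ENTRIES:
                break
    return wanted
```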
(record repeated up to 100 times)

The Moderation_Reply message contains the actual moderation metadata requested by the remote peer. The local peer extracts the requested moderations from its local database and sends them back.

Moderation table: stored in the local megacache; for each moderator encountered it records the moderator's PermID and a vote (0, + or -).
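The sketch referred to above is given here: it combines answering a Moderation_Request, handling a received Moderation_Reply with the blacklist check and ECDSA signature verification, the last-moderation-wins rule, and the moderator vote table. It is a minimal sketch under stated assumptions, not the actual ModerationCast code: the third-party ecdsa package stands in for Tribler's PermID elliptic-curve code, and the field names, the JSON-based canonical encoding, and the SQLite schema are all illustrative.

```python
# Minimal sketch, not the actual ModerationCast implementation. Assumptions:
# the third-party "ecdsa" package (pip install ecdsa) stands in for Tribler's
# PermID elliptic-curve code; the moderation field names, the JSON-based
# canonical encoding, and the SQLite schema are illustrative only.
import json
import sqlite3
from ecdsa import SigningKey, VerifyingKey, NIST192p, BadSignatureError

def canonical_bytes(moderation):
    # Canonical encoding of every field except the signature itself.
    payload = {k: v for k, v in moderation.items() if k != 'signature'}
    return json.dumps(payload, sort_keys=True).encode()

def signature_valid(moderation):
    # The signature is made with the moderator's own key, so it stays
    # verifiable even when the moderation is forwarded by a third party.
    vk = VerifyingKey.from_string(bytes.fromhex(moderation['mod_permid']),
                                  curve=NIST192p)
    try:
        vk.verify(bytes.fromhex(moderation['signature']),
                  canonical_bytes(moderation))
        return True
    except BadSignatureError:
        return False

def init_megacache(db_path=':memory:'):
    conn = sqlite3.connect(db_path)
    # One row per moderator encountered: PermID plus our vote, where
    # 0 = no opinion, +1 = approve, -1 = block (feeds the blacklist).
    conn.execute('CREATE TABLE IF NOT EXISTS Moderators ('
                 ' mod_permid TEXT PRIMARY KEY,'
                 ' vote INTEGER NOT NULL DEFAULT 0)')
    return conn

def build_reply(requested_hashes, local_moderations):
    # Sending side: return the full records the remote peer asked for.
    return [local_moderations[h] for h in requested_hashes
            if h in local_moderations]

def handle_reply(reply_records, local_moderations, megacache, sender_ip,
                 blacklisted_ips):
    """Receiving side: store each moderation that passes the pollution checks."""
    blocked = {row[0] for row in megacache.execute(
        'SELECT mod_permid FROM Moderators WHERE vote < 0')}
    for mod in reply_records:
        if mod['mod_permid'] in blocked or sender_ip in blacklisted_ips:
            continue                      # blacklisted moderator or peer
        if not signature_valid(mod):
            continue                      # forged or corrupted moderation
        current = local_moderations.get(mod['infohash'])
        # Last (most recent) moderation wins; no majority voting.
        if current is None or mod['time_stamp'] > current['time_stamp']:
            local_moderations[mod['infohash']] = mod

# Illustrative use: a moderator signs a moderation, a receiving peer accepts it.
sk = SigningKey.generate(curve=NIST192p)
mod = {'infohash': '00' * 20,
       'metadata': {'description': 'correct episode description'},
       'time_stamp': 1234567890,
       'mod_permid': sk.get_verifying_key().to_string().hex()}
mod['signature'] = sk.sign(canonical_bytes(mod)).hex()

store, cache = {}, init_megacache()
handle_reply(build_reply(['00' * 20], {mod['infohash']: mod}), store, cache,
             sender_ip='10.0.0.1', blacklisted_ips=set())
assert store['00' * 20] is mod
```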