Content Search
Status: operational Latest news: As of October 2009 we believe that Tribler includes the fastest known zero-server P2P implementation of keyword search: median response time is: 325 milliseconds. This means that search results in Tribler V5.2 often appear faster on your screen then in Google, Twitter or Youtube. Disclaimer: Tribler V5.2 will be released in Feb 2010 and is currently only available in source code form at svn.tribler.org Finding content is the prime functionality of a P2P system. No matter how fast downloads are, everything starts with finding something interesting. Finding content should be as easy as using Google. Ideally, the first search result is exactly what you want and is a fast-downloading healthy swarm. Protection against spam is another key ingredient. If you were not looking for V-i-a-g-r-a or a special home video it should not appear in your search results. In user-generated video communities such as Youtube and Tribler this is a difficult problem. The key requirements of P2P search are thus: fast results, correct results and spam protection. Search speedCurrent median search speed of the latest Tribler Python code is 325 ms. The first search results thus quickly appear on your screen. For several years we have improved the search feature using our own overlay network. Search speed is being improved in every release by using more sophisticated algorithms. In the first years our search network was still small, slow and vulnerable to Tribler-specific spam attacks. The workings of the underlying P2P algorithms are described in various pages on this wiki: decentralised content discovery, DecentralizedRecommendation, SimilarityFunction, ClickLog, AdversarialContentSearch and Moderation (in progress). To measure our search response time we conducted an experiment using our live search overlay. Using similar keywords that our users use we send queries into the P2P network. We recorded the response time for numerous queries by measuring the time between sending the query request and first incoming results. The graph below shows two lines, one for the old algorithm and the improved approach of 5.2 version. We are very please that now with 95% probability search results arrive within 1 second. Thus we achieved an important milestone: sub-second search in a full self-organising system. Unbounded scalability and semantic clusteringDue to the lack of servers in Tribler architecture we believe our algorithms should be able to scale to many millions of users. This is important aim for the Tribler team as we aim for using Tribler in television hardware. With unbounded scalability every future Internet-connected television can be connected to a single global network. In the past security, speed and search quality where in conflict. With Tribler we moved to a new generation: semantic clustering and full-replication of swarm info. Semantic clustering means that the peers in the network form connections with peers of similar TV viewing behavior. Or at a technical level: peers which download content with the same info_hash. We also exploit the fact that hard disks of today are big and cheap. We integrate full replication of metadata: every peer in the network stores search info for 50.000 swarms (.torrent files only). The formula above is used to calculate how similar another peer in our overlay is. Assuming that peer u has created a peer-item matrix with received Buddycast messages and its own preference list, the similarity of peer u' to u is calculated by this formula :
P2P Content Search AlgorithmsNew in upcoming V5.2 are P2P RSS feeds and moderations. Keyword search was the main improvement of V4.1. We had many complaints that the core P2P features of search & download where poor in Tribler V4.0. ContentSearch improvements :
Version V4.0 of Tribler keeps all .torrent information in main memory. A Google-like keyword search window will replace the browse all feature as the default opening screen For V4.1 we no longer keep all .torrent information in main memory. Like many other parts of Tribler this info needs to be loaded on-demand. Partial word matches are no longer shown and results with 1 matching keyword of a 2 keyword search are also no longer shown (aka default AND). A search of look 2007 does not return The.Outlook.2007.HDTV.
Note that even the most basic in information retrieval algorithms are not yet included: Keyword frequency, BM25 in Tribler. On the V4.1 : Enable remote queries of your MegaCache
(partly?) Implemented by Arno... For Tribler verion 4.2 : Keyword searches are more powerfull
User InterfaceCanceled Tribler user interface design: Mixing user-genrated video results and social network search has proven to be difficult for people to understand. long-term Research ideasMapping of the user desires with the available content. Craft a iterative cycle of re-search using results feedback. Feedback on keyword search convergence. Real-time search keyword suggestion. pairs of keywords & click-log. Implicit tagging. Adversarial content search. Usage of click-log approach in P2P search with privacy preservation and securityArjen: click-log approach with maximum privacy. Spam and adversarial behavior is the key challenge of this research. Understanding the trade-off between performance, relevance, scalability, robustness, etc. Exploit the usage of social networks and friends for security & privacy. Use of aggregation to combat privacy leakage. Johan requirements: running experimental code and zero-servers. Attachments |