IPv8 Datasync

We want to design an efficient data synchronization algorithm for our wild and wacky IPv8 idea. Very preliminary running Python code is now available, plus detailed documentation.

Simulation results

IPv8 Database Tables

The IPv8 database consists of five tables.

PermIds: Table to hold just the PermIds

id | PermId
  • id: Autoincrement number
  • PermId: Peer PermId

IPv8: Table to hold the IPv8 to IPv4/IPv6 relation with the following structure

PermId | IPv4/IPv6 | Port | Certificate | Timestamp | OurTimestamp
  • PermId: Relation to PermIds table
  • IPv4 / IPv6: Last seen IPv4 or IPv6 address
  • Port: Last seen port
  • Certificate: Signed by the PermId to which the IPv4 address belongs, or by the PermId which created the entry?
  • Timestamp: Time the entry was created on the original peer
  • OurTimestamp: Time the entry was created on our node

RelationTypes: Table to hold relation types

RelationId | PeerRelationType
  • RelationId: Integer identifier of a relation type
  • PeerRelationType: Text describing the relation type

PeerRelations: Table to hold peer relations

PermId1 | PermId2 | RelationId | RelationStrengthFactor | Certificate | Timestamp
  • PermId1: First peer's PermId, relation to the PermIds table
  • PermId2: Second peer's PermId, relation to the PermIds table
  • RelationId: Relation id from the relation types table
  • RelationStrengthFactor: Somehow we calculate this using betweenness centrality
  • Certificate: PermId1's signature of the unique identifier of the entry
  • Timestamp: Time the entry was created on the original peer

SyncLog: Table to hold the last successful sync timestamp

PermId | Id
  • PermId: Peer's PermId with which we synced
  • Id: Row id sent by the Node we successfully synced with
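
For concreteness, here is a minimal SQLite sketch of the five tables; the column types and constraints are assumptions, not an agreed-upon IPv8 schema.

    import sqlite3

    # Assumed column types; the real schema may differ.
    schema = """
    CREATE TABLE PermIds (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        PermId  TEXT UNIQUE NOT NULL
    );
    CREATE TABLE IPv8 (
        PermId       INTEGER REFERENCES PermIds(id),
        Address      TEXT,     -- last seen IPv4 or IPv6 address
        Port         INTEGER,  -- last seen port
        Certificate  BLOB,
        Timestamp    INTEGER,  -- creation time on the original peer
        OurTimestamp INTEGER   -- creation time on our node
    );
    CREATE TABLE RelationTypes (
        RelationId       INTEGER PRIMARY KEY,
        PeerRelationType TEXT
    );
    CREATE TABLE PeerRelations (
        PermId1                INTEGER REFERENCES PermIds(id),
        PermId2                INTEGER REFERENCES PermIds(id),
        RelationId             INTEGER REFERENCES RelationTypes(RelationId),
        RelationStrengthFactor REAL,
        Certificate            BLOB,
        Timestamp              INTEGER
    );
    CREATE TABLE SyncLog (
        PermId INTEGER REFERENCES PermIds(id),
        Id     INTEGER  -- row id sent by the node we synced with
    );
    """

    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)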

Q: Are the PermIds always of size 113 chars?

  • Yes, but there are 3 entries (out of 700k) in the permID file from SuperpeerLogs which are not; these are probably an error made while generating the file

Database Maintenance

Not agreed: To keep the database clean and tidy and prevent it from growing huge, all entries should auto-expire after a predefined period. We can add an extra column to the RelationTypes table defining the grace period of each relation type. By auto-expiring we do not need to transfer database deletions across the network. Entries that need to be retained should be regenerated by the original peer. A sketch of such an expiry pass follows.
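
A minimal sketch of this expiry pass, assuming a hypothetical GracePeriod column (in seconds) added to the RelationTypes table, Unix timestamps, and the connection from the schema sketch above; like the scheme itself, none of this is agreed upon.

    # GracePeriod is a hypothetical column, not part of the tables above.
    conn.execute("""
        DELETE FROM PeerRelations
        WHERE Timestamp < strftime('%s', 'now') - (
            SELECT GracePeriod FROM RelationTypes
            WHERE RelationTypes.RelationId = PeerRelations.RelationId)
    """)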

Syncing Algorithm (1)

Syncing using Bloom filters


Node A wants to sync with Node B. A new node in the network directly enters live mode by contacting a peer.

live mode

  1. Node A sends SYNC to Node B
  2. Node A sends a Bloom filter with the entries in its database of the last 4/24/168 hours (depending on how we define the LIVE mode)
  3. Node B checks its entries of the last 4/24/168 hours against the received Bloom filter
  4. Node B sends back to Node A all the entries that did not match the Bloom filter
  5. Node A receives the entries and stores them into the database (this exchange is sketched after the list)
  6. Node A goes to history mode
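
A toy local simulation of steps 2-5: the BloomFilter class below is a minimal illustration (a production implementation would need to be much faster, as noted in the algorithm notes), and the entry ids are hypothetical.

    import hashlib
    import math

    class BloomFilter:
        # Toy Bloom filter using the standard sizing formulas:
        #   m = -n * ln(p) / (ln 2)^2 bits,  k = (m / n) * ln 2 hash functions
        def __init__(self, n_items, error_rate=0.05):
            self.m = max(1, int(-n_items * math.log(error_rate) / math.log(2) ** 2))
            self.k = max(1, round(self.m / n_items * math.log(2)))
            self.bits = [False] * self.m

        def _positions(self, item):
            # k hash functions simulated by salting SHA-1 with the index
            for i in range(self.k):
                digest = hashlib.sha1(b"%d:%s" % (i, item)).digest()
                yield int.from_bytes(digest[:8], "big") % self.m

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def __contains__(self, item):
            return all(self.bits[pos] for pos in self._positions(item))

    recent_a = [b"entry-1", b"entry-2"]              # Node A's recent entries
    recent_b = [b"entry-1", b"entry-2", b"entry-3"]  # Node B's recent entries

    bf = BloomFilter(len(recent_a))                  # step 2: Node A builds the filter
    for entry in recent_a:
        bf.add(entry)
    missing = [e for e in recent_b if e not in bf]   # steps 3-4 on Node B
    print(missing)  # [b'entry-3'], unless a false positive hides it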

history mode

  1. Node A checks if Node B's PermId is in the FULLY SYNCED Bloom filter (the decision logic is sketched after this list)
    • If Node B is not in the FULLY SYNCED Bloom filter
      1. Node A reads from the SYNCLOG the last synced timestamp for Node B
        • If there is no entry in the SYNCLOG
          • If Node B is LIVE, create a new entry in the SYNCLOG with a timestamp at the last 4/24/168 hours
          • If Node B is not LIVE, terminate the connection
      2. Node A creates a Bloom filter with entries starting from the timestamp and going backwards for 4/24/168 hours
      3. Node A sends the timestamp to Node B
      4. Node A sends the Bloom filter to Node B
      5. Node B checks the selected entries against the received Bloom filter
      6. Node B sends back to Node A all the entries from the selected set that did not match the Bloom filter
      7. Node A saves the oldest timestamp in the SYNCLOG for Node B
    • If Node B is in the FULLY SYNCED Bloom filter
      1. Do nothing, we are already fully synced with this node. GOTO catch up mode
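
A sketch of Node A's decision at the top of history mode; all names (the synclog dict, the fully_synced set, the WINDOW constant) are illustrative stand-ins for the structures described above.

    import time

    WINDOW = 4 * 3600  # the 4/24/168-hour window is a protocol parameter

    def history_mode_step(synclog, fully_synced, peer_permid, peer_is_live):
        # synclog: dict PermId -> last synced timestamp (stand-in for SYNCLOG)
        # fully_synced: set of PermIds (stand-in for the FULLY SYNCED Bloom filter)
        if peer_permid in fully_synced:
            return ("catch-up", None)          # already fully synced with this node
        last = synclog.get(peer_permid)
        if last is None:
            if not peer_is_live:
                return ("terminate", None)     # no SYNCLOG entry and peer not LIVE
            last = time.time() - WINDOW        # seed SYNCLOG at the live window
            synclog[peer_permid] = last
        # Node A builds a Bloom filter over entries in this range (steps 2-4)
        return ("sync", (last - WINDOW, last))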

catch up mode

  1. If Node A has not received updates in the last 4/24/168 hours
  2. Record two timestamps
    • Now
    • The last update minus 4/24/168 hours
  3. Create a Bloom filter which will store the PermIds of the nodes we used to fill this gap (the per-gap state is sketched after this list)
  4. Send to Node B the two timestamps
  5. Node B selects SIZESLICE entries (or up to the older timestamp, whichever comes first) from its database, starting from the newest received timestamp
  6. Node B sends
    • the oldest and the newest timestamp of the selected entries back to Node A to prepare a Bloom filter
    • '-1' if there are no entries between these timestamps
  7. Node A
    • If timestamps received: creates a Bloom filter with the entries between these two timestamps
    • If '-1' received: saves Node B in the Bloom filter for that gap.
  8. Node A sends the Bloom filter to Node B
  9. Node B checks the selected entries against the received Bloom filter
  10. Node B sends back to Node A all the entries from the selected set that did not match the Bloom filter
  11. Node A saves the oldest timestamp in the SYNCLOG for Node B
  12. If Node A has synced this gap with more than CATCHUPNODES nodes, we can delete the timestamps and the Bloom filter and consider the gap synced
  13. Do the same for the other gaps
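
A sketch of the per-gap bookkeeping Node A would keep in catch-up mode; for readability a plain set stands in for the per-gap Bloom filter, and CATCHUPNODES is an assumed value.

    CATCHUPNODES = 3  # assumed threshold; the actual value is a design parameter

    class Gap:
        # State for one gap (steps 2-3): the two timestamps plus the nodes
        # that already served this gap (a set stands in for the Bloom filter).
        def __init__(self, newest, oldest):
            self.newest = newest      # "now" when the gap was detected
            self.oldest = oldest      # last update minus the 4/24/168-hour window
            self.helpers = set()

        def record_helper(self, permid):
            self.helpers.add(permid)  # step 7 ('-1' case) and step 12

        def is_synced(self):
            # step 12: enough nodes served this gap, so consider it closed
            return len(self.helpers) >= CATCHUPNODES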

Algorithm notes

  • Since we know beforehand how many entries the Bloom filter will hold, we can size it for a predefined error rate, leading to very few false positives (see the sizing sketch after this list)
  • Bloom filter false positives come at the cost of Node A never getting some database entries, because Node B thinks they are already there
  • Note that the syncing is one way only: Node A gets updates from Node B
  • Since the Bloom filter is generated on-the-fly, we need a really fast implementation
  • We can further reduce the bandwidth consumption by compressing the returned database rows and the Bloom filter (implemented)
  • The algorithm successfully synchronizes nodes whether or not they have synchronized before
  • SIZESLICE is a number that defines the maximum number of entries checked on Node B and therefore the maximum number of entries transferred from Node B to Node A
  • The CPU time consumed and traffic sent by Node B are upper-bounded by SIZESLICE
  • A node gets at most SIZESLICE updates from another node. A bootstrapping node can run the update procedure at a shorter interval to get more updates faster
  • We use a Bloom filter with a 0.05 error rate. Test runs show that you get false positives (therefore Node A does not receive all the updates), but the Bloom filter is a lot smaller than with a 0.001 error rate. Besides, when Node A later syncs with Node C it may get the missed updates from there
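
The size/error-rate trade-off follows directly from the standard Bloom filter sizing formula; a quick check with an assumed n = 1000 entries:

    import math

    def bloom_bits(n, p):
        # optimal number of bits for n items at false-positive rate p
        return math.ceil(-n * math.log(p) / math.log(2) ** 2)

    for p in (0.05, 0.001):
        print(p, bloom_bits(1000, p))
    # 0.05  -> 6236 bits  (~0.8 KB)
    # 0.001 -> 14378 bits (~1.8 KB), more than twice the size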

Algorithm issues

  • Unclear algorithm - solved with version 2
  • If Node B has more than 1k entries to check against the Bloom filter sent by Node A, the CPU time is very high (statistics to be added) - enhanced with version 2 but the problem still persists - solved with version 2.5 (problem now in client only)
  • If Node B's entries >> Node A's Bloom filter, then we get many false positives - not a problem with version 2.5, since Node B's entries are upper-bounded by SIZESLICE

Proof of concept code

Go to the IPv8Datasync/TestSuite

Benchmarks

All timing tests were run using Python's timeit

Test machine: Core 2 Duo 2 GHz, 2 GB RAM, Linux

Bloom filter creation

Entries | Time (μsec)
    100 |     5476.28
   1000 |    53933.61
  10000 |   525702.40
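
A sketch of how such numbers can be reproduced with timeit; build_filter is a stand-in workload, not the project's actual Bloom filter code, so the absolute numbers will differ.

    import timeit

    def build_filter(entries):
        return set(entries)  # stand-in workload for Bloom filter creation

    for n in (100, 1000, 10000):
        entries = [b"entry-%d" % i for i in range(n)]
        per_run = timeit.timeit(lambda: build_filter(entries), number=100) / 100
        print(n, "%.2f usec" % (per_run * 1e6))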

Discussion

Johan: so the challenge is to 1) select, from say 1 million items, the 100,000 newly discovered items since a last given sync timestamp, then 2) create a Bloom filter which includes those new items, 3) advertise those fresh items as available for sync and 4) transfer the most interesting items?

  • Giorgos: Yes, that's the idea. We create a unique one-time Bloom filter per node we sync with. I guess we could include all the newly discovered items since the last sync, but this depends on how many new items we have.

Giorgos: Which data is more important to reach the bootstrapping node first? The most recent or the oldest?


Syncing Algorithm (2)

Sync using Set Reconciliation

Polynomial encoding of sets. Each peer evaluates the characteristic polynomial of its set at several points; this information is used to compute the ratio of the two characteristic polynomials and hence the set differences. Tractable computational complexity and nearly optimal communication complexity. Multiple rounds of communication. The upper bound on the size of the set difference needs to be known. Exact reconciliation.
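
The core identity behind this (a sketch, using the characteristic-polynomial notation of the set reconciliation literature): elements common to both sets cancel in the ratio of the characteristic polynomials, so only the differences survive.

    \chi_S(z) = \prod_{s \in S} (z - s),
    \qquad
    \frac{\chi_A(z)}{\chi_B(z)} = \frac{\chi_{A \setminus B}(z)}{\chi_{B \setminus A}(z)}

Evaluating both polynomials at slightly more sample points than the known upper bound on the number of differences lets each peer interpolate this rational function and factor it to recover the exact set differences.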

Proof of concept code (1)

  1. Download the proof of concept code provided by the authors on their website
  2. Use dos2unix to convert mainfile.cpp and prob_recon_support.cpp
  3. Apply the patch provided by Gertjan to compile successfully (Note: for some strange reason it still does not compile on some machines, e.g. Ubuntu 9.10 with gcc 4.4.1)
  4. Run ./configure && make

Proof of concept code (2)

There is CPISync, a program to synchronize the Notes database of a PalmPilot with the computer. Full source code, a project report and a research paper are available.



Syncing Algorithm (3)

Sync using Approximate Set Reconciliation

Approximate reconciliation. Less computation than exact reconciliation. Single round of communication. Utilizes Merkle trees, Bloom filters and Patricia tries. Combines the best of Bloom filters and Merkle trees, namely quicker searches for small numbers of differences without the complications of managing the tree structures.

Proof of concept code

None



Syncing Algorithm (4)

Optimized Union of Non-disjoint Distributed Data Sets

A different approach from the 1-to-1 scheme used in the previous algorithms. In this case a node sends a sync signal to many other nodes. The sync receivers communicate privately and decide, using special algorithms, who is going to send what. The nodes do not have to share all the same items. The transfer is done in a way that bandwidth is utilized at a maximum level. The node requesting the sync does not receive duplicate items from the sync nodes.



Syncing Algorithm (5)

Efficient Reconciliation of Unordered Databases

The paper introduces two ways to reconcile unordered databases in peer-to-peer networks. One method is an optimization of the other. Basically, hosts exchange hashes of their databases. If the hashes do not match, they split the database in half (or at the median value, in the optimized version) and check again whether the halves match. If the halves do not match, they split and try again. If the size of a split is <= 1, they exchange the actual values. See Algorithms 3.1 and 3.2 (pseudo-code) in the paper; a toy illustration follows below.

This is also older work by Minsky, the author of the practical set reconciliation scheme which uses the polynomial representation of the set.
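
A toy local simulation of the split-and-hash recursion (in the spirit of Algorithm 3.1; Algorithm 3.2 would split at the median instead of the midpoint), with integer keys standing in for database values:

    import hashlib

    def digest(values):
        return hashlib.sha1(repr(sorted(values)).encode()).hexdigest()

    def reconcile(a, b, lo, hi, diff):
        # a, b: the two hosts' sets of integer keys; [lo, hi) is the current range
        sa = {x for x in a if lo <= x < hi}
        sb = {x for x in b if lo <= x < hi}
        if digest(sa) == digest(sb):       # hashes match: ranges assumed equal
            return
        if len(sa) <= 1 and len(sb) <= 1:  # split size <= 1: exchange actual values
            diff |= sa ^ sb
            return
        mid = (lo + hi) // 2               # halve the range and try again
        reconcile(a, b, lo, mid, diff)
        reconcile(a, b, mid, hi, diff)

    a, b = {1, 4, 9, 12}, {1, 4, 10, 12}
    diff = set()
    reconcile(a, b, 0, 16, diff)
    print(diff)  # {9, 10}: present on one host but not the other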

Proof of concept code

None



Possible future work: Secure and scalable peer discovery tool

  • Build a database of 100,000+ peers around you
  • Integral security mechanism of long-term relationships for indirect reciprocity and web of trust
  • Scalability by selecting similar peers with a generic similarity or clustering function

Research questions: how complex is such a tool, and how can we reduce it to be as simple as possible, yet powerful? How does it behave? How can we balance finding similar peers with finding trustworthy peers? Does this conflict, or can a balance be struck, and how?