
Comments (7)

**yiannisbot** commented on July 29, 2024

Thanks for raising this issue @davidd8! It's indeed a worthwhile experiment to run, which we will prioritise as team capacity allows. Unfortunately, we do not currently have results for this specific ask. However, I'm including below some results comparing content retrieval through go-ipfs (a controlled experiment, where we create CIDs in one geographic location and retrieve them from another) with content retrieval through the gateway (using PL gateway logs). The results are not directly comparable, though, i.e., we do not request the same CIDs through the different resolution paths, so take them with a pinch of salt.

The first figure shows the CDF of the go-ipfs retrieval latency broken down into steps. Notice that Fig. d) includes the Bitswap discovery process, which times out at 1 sec and always comes back empty in this experiment, because the CIDs we retrieve were artificially created, i.e., no other node can have this content. Fig. e) shows the DHT lookup only (i.e., excluding the Bitswap step and its 1 sec timeout), while f) shows the Bitswap content fetch.
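For readers less familiar with these plots: a CDF point (x, y) says that a fraction y of retrievals finished within x seconds. A minimal sketch of how such curves are computed, with made-up latencies standing in for the measurement data:

```python
import random

def empirical_cdf(samples):
    """Return (value, cumulative fraction) pairs, i.e., the points of the CDF."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

# Hypothetical per-step latencies in seconds (the real data comes from the
# controlled go-ipfs experiment and is not reproduced here).
random.seed(42)
dht_walk = [random.uniform(0.2, 1.5) for _ in range(1000)]
content_fetch = [random.uniform(0.3, 1.2) for _ in range(1000)]
total = [d + f for d, f in zip(dht_walk, content_fetch)]

cdf = empirical_cdf(total)
# Fraction of retrievals completing within 2 seconds:
within_2s = sum(1 for t in total if t <= 2.0) / len(total)
```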

[Screenshot 2022-04-06 at 09:22:55: go-ipfs retrieval latency CDFs]
[Screenshot 2022-03-10 at 14:39:28]

The next figure shows results from the logs of one of PL's gateways, albeit from a small sample.

[Screenshot 2022-04-06 at 09:27:30: PL gateway latency CDF]


One other note related to your request: points 1 and 2, i.e., retrieval that includes a Hydra vs. retrieval that doesn't, cannot be configured at experiment setup time, as Hydras are integrated into the system and are expected (if spread correctly across the keyspace) to always be present. So, to figure out the difference between these points, we'd have to carry out retrievals and then filter and analyse the requests that did vs. those that did not hit a Hydra peer. Alternatively, one would have to connect to and direct a request at a Hydra peer explicitly - is that what you're looking for? Let me know if you have any other thoughts on how the experiment could be set up.
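The retrospective-filtering idea can be sketched as follows. The record format, field names, and PeerIDs below are all hypothetical; a real analysis would parse the actual retrieval logs and the published list of Hydra heads:

```python
# Hypothetical set of known Hydra head PeerIDs (placeholder strings).
hydra_peers = {"12D3KooWHydraA", "12D3KooWHydraB"}

# Hypothetical retrieval records: which peers were contacted during each lookup.
retrievals = [
    {"cid": "cid-1", "peers": ["12D3KooWabc", "12D3KooWHydraA"], "latency_s": 0.9},
    {"cid": "cid-2", "peers": ["12D3KooWdef", "12D3KooWghi"], "latency_s": 1.4},
]

# Split the sample by whether any contacted peer was a Hydra head.
with_hydra = [r for r in retrievals if hydra_peers & set(r["peers"])]
without_hydra = [r for r in retrievals if not hydra_peers & set(r["peers"])]
```

Comparing the latency distributions of `with_hydra` vs. `without_hydra` would then answer points (1) and (2).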

from network-measurements.

**davidd8** commented on July 29, 2024

Thanks @yiannisbot, these graphs are great. If I'm understanding them correctly, the PL gateway has a p80 TTFB of < 75ms, whereas the fastest go-ipfs path (yellow) has a p80 TTFB of ~1.2s (500ms DHT walk + 700ms content fetch). Is there a way to see the breakdown of response latency for the PL gateway based on whether it hits the cache, hits the node store, or looks up a peer / the DHT?

Your suggestion to retrospectively filter the results and compare requests that do and do not hit a Hydra peer makes sense as a way to answer (1) and (2).

For peering (3), an experiment with known CIDs could help show the difference in TTFB between go-ipfs nodes and the ipfs.io gateway. The tricky part may be controlling for IPFS nodes' proximity to the content. It may also be interesting to see whether there's a significant difference between IPFS clusters, e.g., comparing response times for a Pinata CID vs. a web3.storage CID.


**yiannisbot** commented on July 29, 2024

These are very interesting things to find out. The best approach will be to set up an experiment to benchmark these different content-fetch avenues. I'll try to do that ASAP.

@davidd8 clarifying a few things relating to the graphs and your comments above:

> the PL gateway has a p80 TTFB of < 75ms

Yes, but this includes the cached content, which is everything to the left of the dashed line in the bottom figure above. So it's not a "clean" p80; we'd have to exclude the cached responses to get a more realistic figure. Intuitively, a clean p80 (non-cached content) through the gateway would have to be at least as high as the DHT path, no?

> Is there a way to see the breakdown of response latency for the PL gateway based on whether it hits the cache, node store, or looks up a peer / DHT?

See above regarding the dashed line. The figure shows this, but it's hard to tell from this plot what the CDF looks like for cached vs. non-cached content. We'd probably have to reprocess the data and produce a different plot.

> the fastest go-ipfs TTFB (yellow) has a p80 TTFB of ~1.2s (500ms DHT walk + 700ms content fetch)

A few notes here:

  • The overall latency, if you include the initial Bitswap discovery, is closer to 2.1s (see the leftmost top figure). But this includes the Bitswap discovery step, which takes 1s by itself. It's debatable whether it's worth waiting that long, but that's the default behaviour of go-ipfs. Of course, if content is found through Bitswap, the fetch completes faster. We're looking into this with this RFM: https://github.com/protocol/network-measurements/blob/master/RFMs.md#rfm-16--effectiveness-of-bitswap-discovery-process
  • The fastest DHT walk indeed takes 500ms.
  • Content fetch for eu_central (yellow) indeed takes about 700ms. But that's not TTFB; it's the time to fetch the whole content, so more like TTLB :) The file size we experimented with is 0.5 MB, so Bitswap seems to take quite a while to complete even for such a small file. This needs looking into too.

I'll put together an experiment description and come back to you for a review.


**yiannisbot** commented on July 29, 2024

> Intuitively, a clean p80 (non-cached content) through the gateway would have to be at least as high as the DHT way, no?

Correction: this holds unless the content is found (through Bitswap) at one of the clustered peers, i.e., not cached locally, but found in one of the directly connected storage clusters, which are queried through Bitswap before getting to the DHT step.


**yiannisbot** commented on July 29, 2024

Grant to support this experiment: https://www.dgm.xyz/grants/g5riWRq4BkhDvl9vsjda


**dennis-tra** commented on July 29, 2024

A quick update on the results of DHT lookup times with and without considering Hydra peers.

The graphs below come from two parallel DHT experiments with six nodes each. Those nodes ran in different AWS regions, similar to the experimental setup in our SIGCOMM paper. The first set of six nodes was operated without any special modifications. The second set of six nodes was fed a list of Hydra head PeerIDs that were excluded from any DHT operation; e.g., if another DHT server node serves us a Hydra head as a closer peer to the desired CID, we don't take that response into account. Unlike the experiment in the SIGCOMM paper, we bumped the kubo version to 0.16.0.
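In pseudocode, the exclusion rule amounts to dropping denylisted peers from every "closer peers" response before the walk continues. A sketch in Python (the actual modification lives in the Go DHT client; the PeerIDs below are placeholders):

```python
# Placeholder Hydra head PeerIDs fed to the modified nodes.
hydra_denylist = {"12D3KooWHydra1", "12D3KooWHydra2"}

def filter_closer_peers(closer_peers, denylist=hydra_denylist):
    """Drop denylisted (Hydra) peers from a DHT 'closer peers' response."""
    return [p for p in closer_peers if p not in denylist]

# A response naming a Hydra head and a regular peer: only the regular
# peer is considered for the next step of the walk.
next_step = filter_closer_peers(["12D3KooWHydra1", "12D3KooWRegular"])
```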

With that being said, these are the results:

Hydras Excluded:

[chart]

Hydras Included:

[chart]

Some numbers:

| Region | Retrievals | Publications |
| --- | --- | --- |
| me_south_1 | 2155 | 587 |
| ap_southeast_2 | 2779 | 538 |
| af_south_1 | 2086 | 587 |
| us_west_1 | 2787 | 540 |
| eu_central_1 | 2652 | 565 |
| sa_east_1 | 2762 | 562 |

Retrieval latency percentiles (seconds):

| Region | 50th | 90th | 95th |
| --- | --- | --- | --- |
| af_south_1 | 3.434 | 4.663 | 5.533 |
| ap_southeast_2 | 3.649 | 7.128 | 11.181 |
| eu_central_1 | 2.389 | 6.216 | 9.010 |
| me_south_1 | 2.829 | 3.690 | 4.336 |
| sa_east_1 | 3.407 | 7.264 | 9.795 |
| us_west_1 | 2.879 | 8.585 | 11.558 |
| ALL | 3.125 | 6.048 | 9.298 |
| Both DHT walks | 0.637 | 1.374 | 2.240 |
| One DHT walk | 0.319 | 0.687 | 1.120 |
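For reference, percentile summaries like the ones above can be reproduced from raw latency samples with Python's standard library. A sketch (the actual analysis pipeline isn't shown in this thread):

```python
import statistics

def summarize(latencies):
    """Return the 50th/90th/95th percentiles of a list of latencies (seconds)."""
    # statistics.quantiles with n=100 returns 99 cut points,
    # where q[i] is the (i+1)-th percentile.
    q = statistics.quantiles(latencies, n=100)
    return {"p50": q[49], "p90": q[89], "p95": q[94]}
```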

As these graphs are hard to compare, @WYiluo went ahead and produced these comparison charts:

50th percentile:

[chart]

90th percentile:

[chart]

95th percentile:

[chart]

Note: these charts compare not just the DHT walks but the overall retrieval latencies!




**yiannisbot** commented on July 29, 2024

Closing this issue, as Hydras are no longer part of the network. We're gearing up to produce comparative performance results between the DHT and IPNI. Initial thinking is outlined here: #47, with separate issues or RFMs to follow.

If more help is needed on this, please re-open or create a new issue. Thanks.

