jbellis commented on May 24, 2024

Let's use the following grid for the existing parameters:

        var mGrid = List.of(16, 24, 32, 48);
        var efConstructionGrid = List.of(100, 200, 400);
        var efSearchFactor = List.of(1, 2);
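
For a sense of how much work this grid implies, here is a minimal sketch that expands it into individual runs. The `Config` record and the `expand` helper are hypothetical names used only for illustration; they are not part of jvector or of the benchmark code.

```java
import java.util.ArrayList;
import java.util.List;

public class GridSketch {
    // Hypothetical holder for one benchmark configuration; not a jvector type.
    record Config(int m, int efConstruction, int efSearchFactor) {}

    // Cartesian product of the three parameter lists.
    static List<Config> expand(List<Integer> mGrid,
                               List<Integer> efConstructionGrid,
                               List<Integer> efSearchFactor) {
        var configs = new ArrayList<Config>();
        for (int m : mGrid) {
            for (int efConstruction : efConstructionGrid) {
                for (int factor : efSearchFactor) {
                    configs.add(new Config(m, efConstruction, factor));
                }
            }
        }
        return configs;
    }

    public static void main(String[] args) {
        var configs = expand(List.of(16, 24, 32, 48), List.of(100, 200, 400), List.of(1, 2));
        // 4 * 3 * 2 = 24 configurations per dataset
        System.out.println(configs.size() + " configurations");
        configs.forEach(System.out::println);
    }
}
```

Each additional dimension in the grid (such as a list of PQ settings) multiplies this count again.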

from jvector.

dlg99 commented on May 24, 2024

Raw output and plots for:

        var files = List.of(
                "../ivec/pages_ada_002",
                "../hdf5/nytimes-256-angular.hdf5",
                "../hdf5/glove-100-angular.hdf5",
                "../hdf5/glove-200-angular.hdf5",
                "../hdf5/fashion-mnist-784-euclidean.hdf5",
                "../hdf5/sift-128-euclidean.hdf5");

        var mGrid = List.of(16, 24, 32, 48);
        var efConstructionGrid = List.of(100, 200, 400);
        var efSearchFactor = List.of(1, 2);
        var diskOptions = List.of(true, false);

Raw output: bench_ada_hft5.txt

Plots (attached images):
hdf5fashion-mnist-784-euclideanhdf5_plot
hdf5glove-100-angularhdf5_plot
hdf5glove-200-angularhdf5_plot
hdf5nytimes-256-angularhdf5_plot
hdf5sift-128-euclideanhdf5_plot
ivecpages_ada_002_plot

jbellis commented on May 24, 2024

The first graph includes PQ=0; is that a bug in the plot?

jbellis commented on May 24, 2024

I think something is broken with the last one (possibly with the inputs); a recall of 0.003 is way too low.

jbellis commented on May 24, 2024

I would like to see plots constrained to PQ at 1/8 of the original size (the current one-size-fits-all setting) and at 1/16; if recall does not drop significantly at overquery=2, then also include 1/32.

dlg99 commented on May 24, 2024

PQ=0 is plotted on the chart when PQ is not used (diskOptions == false).

dlg99 commented on May 24, 2024

This is ada 100k, with the dataset downloaded from S3 (the previous one was from gdrive+slack) and the PQ dimensions chosen as follows:

        List<Integer> pqDimensions = new ArrayList<>();
        int dims = ds.baseVectors.get(0).length;
        for (int i = 2; i <= 32; i *= 2) {
            if (dims / i > 1) {
                pqDimensions.add(dims / i);
            }
        }

Raw output: bench_ada_100k.txt

Plot (attached image): ivec2100kpages_ada_002_100k_plot
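
To make the PQ selection concrete, the sketch below runs the same loop for 1536-dimensional vectors (the Ada002 dimensionality) and prints the compression each setting would give. The compression figures rest on two assumptions that are not stated in this thread: the raw vectors are stored as 4-byte floats, and each PQ subspace is encoded in a single byte. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class PqDimsSketch {
    // Same loop as in the comment above, taking the dimension count directly.
    static List<Integer> pqDimensions(int dims) {
        List<Integer> pqDimensions = new ArrayList<>();
        for (int i = 2; i <= 32; i *= 2) {
            if (dims / i > 1) {
                pqDimensions.add(dims / i);
            }
        }
        return pqDimensions;
    }

    // ASSUMPTION: float32 components (4 bytes each) and one byte per PQ subspace.
    static int compressionFactor(int dims, int pqDims) {
        return (dims * Float.BYTES) / pqDims;
    }

    public static void main(String[] args) {
        int dims = 1536;
        for (int pq : pqDimensions(dims)) {
            System.out.printf("PQ@%d -> 1/%d of the raw vector size%n",
                    pq, compressionFactor(dims, pq));
        }
    }
}
```

Under those assumptions PQ@768 corresponds to 1/8 of the original size, PQ@384 to 1/16, and PQ@192 to 1/32, so PQ@192 should not be read as the 1/8 setting.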

dlg99 commented on May 24, 2024

Ada 1M does not work because "MappedRandomAccessReader doesn't support large files".

jbellis commented on May 24, 2024

That's addressed in the main branch, but 100k is fine for now.

dlg99 commented on May 24, 2024

Other datasets with the same selection of PQ:

Raw output: bench_others.txt

Plots (attached images):
hdf5fashion-mnist-784-euclideanhdf5_plot
hdf5glove-100-angularhdf5_plot
hdf5glove-200-angularhdf5_plot
hdf5nytimes-256-angularhdf5_plot
hdf5sift-128-euclideanhdf5_plot

jbellis commented on May 24, 2024

Going back to the original goal of evaluating PQ on Ada002 embeddings -- it looks like this is the first dataset we've found where even at 1/8 the size the recall at 16/100/OQ=2 is worse than the recall at 16/100/OQ=1/PQ=Off. So being even more aggressive with PQ is not warranted.

jbellis commented on May 24, 2024

PQ@768 build 63.70s,
PQ encode 4.85s,
Build M=16 ef=100 in 15.08s with 0.40 short edges
  Query PQ=false top 101/1 recall 0.9434 in 11.77s after 146464790 nodes visited
  Query PQ=true top 101/1 recall 0.9201 in 72.69s after 147004020 nodes visited
  Query PQ=false top 101/2 recall 0.9702 in 22.05s after 250168950 nodes visited
  Query PQ=true top 101/2 recall 0.9703 in 123.74s after 251120730 nodes visited

I was looking at the wrong numbers in your graph (seduced by 1536/8=192).

jbellis commented on May 24, 2024

PQ@384 build 35.77s,
PQ encode 3.04s,
Build M=16 ef=100 in 15.49s with 0.40 short edges
  Query PQ=false top 101/1 recall 0.9463 in 2.60s after 29129956 nodes visited
  Query PQ=true top 101/1 recall 0.8293 in 7.76s after 29300004 nodes visited
  Query PQ=false top 101/2 recall 0.9719 in 4.34s after 49809210 nodes visited
  Query PQ=true top 101/2 recall 0.9629 in 12.88s after 50224696 nodes visited
PQ@192 build 22.40s,
PQ encode 1.45s,
Build M=16 ef=100 in 15.10s with 0.40 short edges
  Query PQ=false top 101/1 recall 0.9427 in 2.68s after 30315300 nodes visited
  Query PQ=true top 101/1 recall 0.6806 in 4.19s after 31296206 nodes visited
  Query PQ=false top 101/2 recall 0.9691 in 4.69s after 51024144 nodes visited
  Query PQ=true top 101/2 recall 0.8682 in 6.49s after 51692598 nodes visited

From this very small sample, it looks like it's okay to reduce to 384 if you're doing OQ=2, but not to 192.
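
The comparison behind that reading can be reproduced mechanically from the log lines. The sketch below is only an illustration: it assumes the second number in `top 101/N` is the overquery factor and that each `PQ@...` line starts a new block, and `RecallCheck` is a hypothetical helper, not part of the benchmark. For each PQ size it checks whether recall with PQ at overquery 2 is at least the recall without PQ at overquery 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecallCheck {
    private static final Pattern PQ_HEADER = Pattern.compile("PQ@(\\d+) build");
    private static final Pattern QUERY =
            Pattern.compile("Query PQ=(true|false) top \\d+/(\\d+) recall ([0-9.]+)");

    // For each PQ size: does PQ at overquery 2 match or beat PQ-off at overquery 1?
    static Map<Integer, Boolean> pqHoldsUp(String log) {
        Map<Integer, Boolean> result = new LinkedHashMap<>();
        int pq = -1;
        double baseline = Double.NaN; // recall for PQ=false, overquery 1, in the current block
        for (String line : log.split("\n")) {
            Matcher header = PQ_HEADER.matcher(line);
            if (header.find()) {
                pq = Integer.parseInt(header.group(1));
                baseline = Double.NaN;
                continue;
            }
            Matcher query = QUERY.matcher(line);
            if (!query.find()) continue;
            boolean usesPq = Boolean.parseBoolean(query.group(1));
            int overquery = Integer.parseInt(query.group(2));
            double recall = Double.parseDouble(query.group(3));
            if (!usesPq && overquery == 1) baseline = recall;
            if (usesPq && overquery == 2) result.put(pq, recall >= baseline);
        }
        return result;
    }

    public static void main(String[] args) {
        // Lines copied from the logs posted in this thread.
        String log = """
                PQ@768 build 63.70s,
                  Query PQ=false top 101/1 recall 0.9434 in 11.77s after 146464790 nodes visited
                  Query PQ=true top 101/2 recall 0.9703 in 123.74s after 251120730 nodes visited
                PQ@384 build 35.77s,
                  Query PQ=false top 101/1 recall 0.9463 in 2.60s after 29129956 nodes visited
                  Query PQ=true top 101/2 recall 0.9629 in 12.88s after 50224696 nodes visited
                PQ@192 build 22.40s,
                  Query PQ=false top 101/1 recall 0.9427 in 2.68s after 30315300 nodes visited
                  Query PQ=true top 101/2 recall 0.8682 in 6.49s after 51692598 nodes visited
                """;
        System.out.println(pqHoldsUp(log)); // {768=true, 384=true, 192=false}
    }
}
```

This only compares recall; it says nothing about the query-time cost of PQ, which the same logs also report.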

jbellis commented on May 24, 2024

[I switched from numruns=10 to numruns=2, which is why all the query times got much smaller.]
