
Comments (10)

gernest commented on June 20, 2024

I took another look: we already have arrowutils.ReorderRecord, which takes care of all the cases I mentioned. We can go back to @thorfour's sample and build an *array.Int32 instead of an *array.Boolean from the bitmap. Calling (*Bitmap).ToArray() will pretty much give you the indices, or we can avoid the allocation and iterate over the bitmap to build the indices array.

Something like

	b := array.NewInt32Builder(...)
	b.Reserve(int(bitmap.GetCardinality()))
	bitmap.Iterate(func(x uint32) bool {
		b.Append(int32(x))
		return true
	})
	indices := b.NewArray()

So steps will be

  • *Bitmap -> *array.Int32
  • call arrowutils.ReorderRecord with the result
  • Profit
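
The first step can be illustrated without any dependencies. This is only a minimal sketch: a plain []uint64 bitset stands in for the roaring *Bitmap, a []int32 slice stands in for *array.Int32, and the names bitsetIndices and takeInt64 are hypothetical (they are not frostdb or Arrow APIs).

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitsetIndices returns the positions of all set bits in ascending order.
// The []uint64 bitset is a hypothetical stand-in for a roaring bitmap.
func bitsetIndices(words []uint64) []int32 {
	n := 0
	for _, w := range words {
		n += bits.OnesCount64(w)
	}
	// Reserve once up front, similar in spirit to b.Reserve(cardinality).
	out := make([]int32, 0, n)
	for i, w := range words {
		for w != 0 {
			tz := bits.TrailingZeros64(w)
			out = append(out, int32(i*64+tz))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return out
}

// takeInt64 picks the rows named by indices out of a single column.
// It is a simplified analogue of a "take" operation, not Arrow's kernel.
func takeInt64(col []int64, indices []int32) []int64 {
	out := make([]int64, len(indices))
	for i, idx := range indices {
		out[i] = col[idx]
	}
	return out
}

func main() {
	// Bits 1 and 3 are set, i.e. rows 1 and 3 matched the filter.
	indices := bitsetIndices([]uint64{0b1010})
	fmt.Println(indices)                                     // prints [1 3]
	fmt.Println(takeInt64([]int64{10, 20, 30, 40}, indices)) // prints [20 40]
}
```

Counting the set bits first lets the output slice be sized once, which is the same motivation as reserving capacity on the builder before iterating.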

from frostdb.

thorfour commented on June 20, 2024

Sorry, apparently fat fingered the close button.

I played around with this today

	func filter(ctx context.Context, pool memory.Allocator, filterExpr BooleanExpression, ar arrow.Record) (arrow.Record, bool, error) {
		bitmap, err := filterExpr.Eval(ar)
		if err != nil {
			return nil, true, err
		}

		if bitmap.IsEmpty() {
			return nil, true, nil
		}

		// Construct filter array
		// NOTE: this is intermediary right now. The Eval function should return a boolean array instead so we can directly pass it in.
		bldr := array.NewBooleanBuilder(pool)
		defer bldr.Release()
		for i := 0; i < int(ar.NumRows()); i++ {
			bldr.Append(bitmap.Contains(uint32(i)))
		}
		filterArr := bldr.NewArray()
		defer filterArr.Release()

		result, err := compute.FilterRecordBatch(ctx, ar, filterArr, compute.DefaultFilterOptions())
		if err != nil {
			return nil, true, err
		}
		return result, false, nil
	}

And ended up with a test failure:

--- FAIL: Test_DB_All (0.00s)
    db_test.go:2085:
        	Error Trace:	/Users/thor/go/src/github.com/polarsignals/frostdb/db_test.go:2085
        	Error:      	Received unexpected error:
        	            	not implemented: function 'array_take' has no kernel matching input types (dictionary<values=utf8, indices=uint32, ordered=false>, uint16)
        	Test:       	Test_DB_All

Looks like array_take is not implemented for all types; dictionary columns in particular have no matching kernel.


garrensmith commented on June 20, 2024

That is unfortunate. Should we open an issue on Apache Arrow for this?


thorfour commented on June 20, 2024

Yea and link it here so we know if/when we can actually implement this.


gernest commented on June 20, 2024

> Yea and link it here so we know if/when we can actually implement this.

No need for this. We can in fact implement it now. Basically, this is how a filter on records works:

  • build an array of indices containing the rows you want to choose
  • for each record column use array_take to select rows of interest
  • assemble result of previous step (concurrently)
  • build a new record with the result.
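
The steps above can be sketched in plain Go. This is a simplified illustration under stated assumptions, not Arrow's implementation: the record type and the takeColumn and filterRecord functions are hypothetical stand-ins for arrow.Record and the array_take kernel, and every column is assumed to be []int64.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a hypothetical stand-in for arrow.Record:
// a list of named, equally long columns.
type record struct {
	names   []string
	columns [][]int64
}

// takeColumn selects the rows named by indices from one column.
func takeColumn(col []int64, indices []int32) []int64 {
	out := make([]int64, len(indices))
	for i, idx := range indices {
		out[i] = col[idx]
	}
	return out
}

// filterRecord runs takeColumn on every column concurrently and
// assembles the results into a new record.
func filterRecord(r record, indices []int32) record {
	result := record{names: r.names, columns: make([][]int64, len(r.columns))}
	var wg sync.WaitGroup
	for i, col := range r.columns {
		wg.Add(1)
		go func(i int, col []int64) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			result.columns[i] = takeColumn(col, indices)
		}(i, col)
	}
	wg.Wait()
	return result
}

func main() {
	r := record{
		names:   []string{"timestamp", "value"},
		columns: [][]int64{{1, 2, 3, 4}, {10, 20, 30, 40}},
	}
	out := filterRecord(r, []int32{0, 2})
	fmt.Println(out.columns) // prints [[1 3] [10 30]]
}
```

Because the same index array is applied to every column, the columns stay row-aligned in the assembled result.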

So @thorfour's solution is actually correct; we just need to be smart with array.Dictionary columns. When we know we have an *array.Dictionary column we can take and assemble the result ourselves, and delegate to compute for the rest.
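
To make the dictionary case concrete, here is a hedged sketch of what "taking" from a dictionary-encoded column could look like. The dictColumn type and takeDict function are hypothetical; they only mirror the dictionary<values=utf8, indices=uint32> layout from the error message using []string and []uint32, and say nothing about how Arrow or frostdb actually implement it.

```go
package main

import "fmt"

// dictColumn is a hypothetical, simplified model of a dictionary-encoded
// column: row i holds the value dict[indices[i]].
type dictColumn struct {
	dict    []string
	indices []uint32
}

// value decodes row i.
func (c dictColumn) value(i int) string {
	return c.dict[c.indices[i]]
}

// takeDict selects rows by position. Only the index array is rewritten;
// the dictionary values are shared with the input rather than copied.
func takeDict(c dictColumn, rows []int32) dictColumn {
	out := make([]uint32, len(rows))
	for i, r := range rows {
		out[i] = c.indices[r]
	}
	return dictColumn{dict: c.dict, indices: out}
}

func main() {
	col := dictColumn{
		dict:    []string{"cpu", "memory"},
		indices: []uint32{0, 1, 1, 0, 1},
	}
	out := takeDict(col, []int32{1, 3, 4})
	for i := range out.indices {
		fmt.Println(out.value(i)) // prints memory, cpu, memory on separate lines
	}
}
```

The point of the sketch is that selecting rows only touches the small integer index array, so the string values never need to be decoded or copied.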

So, @thorfour, can you please open a PR with your changes? I will help and make sure we massage it until it works for every case we currently have. We will probably need to run benchmarks as well to make sure we don't introduce regressions.


gernest commented on June 20, 2024

A quick check says it is not a simple change. I'm taking this task and will submit supplementary patches to make it possible.


gernest commented on June 20, 2024

😓 Finally I have this working. I will wait for the supplementary patches to land, then I will open the PR.


gernest commented on June 20, 2024

Done in #697. I wish someone with access to a production-like workload would run some benchmarks and give us numbers.


github-actions commented on June 20, 2024

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

