Comments (10)
I took another look: we already have arrowutils.ReorderRecord, which takes care of all the cases I mentioned. We can go back to @thorfour's sample and build an array.Int32 instead of an *array.Boolean from the bitmap. Calling (*Bitmap).ToArray() will give you the indices, or we can avoid the allocation and iterate over the bitmap to build the indices array.
Something like:
b := array.NewInt32Builder(...)
b.Reserve(int(bitmap.GetCardinality()))
bitmap.Iterate(func(x uint32) bool {
	b.Append(int32(x))
	return true
})
indices := b.NewArray()
So the steps will be:
- *Bitmap -> *array.Int32
- call arrowutils.ReorderRecord with the result
- Profit
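To illustrate the first step without pulling in any dependencies, here is a minimal sketch that turns a bitmap into an ascending list of row indices and uses it to pick rows from a column. The wordBitmap type and the take helper are hypothetical stand-ins invented for this example; they are not the roaring bitmap or the arrowutils.ReorderRecord API discussed above.

```go
package main

import (
	"fmt"
	"math/bits"
)

// wordBitmap is a hypothetical stand-in for the bitmap in the thread:
// bit i of the set lives in word i/64.
type wordBitmap []uint64

// indices returns the positions of all set bits in ascending order.
// The capacity is computed up front from the population count, which
// mirrors the Reserve(GetCardinality()) call in the snippet above.
func (b wordBitmap) indices() []int32 {
	n := 0
	for _, w := range b {
		n += bits.OnesCount64(w)
	}
	out := make([]int32, 0, n)
	for wi, w := range b {
		for w != 0 {
			tz := bits.TrailingZeros64(w)
			out = append(out, int32(wi*64+tz))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return out
}

// take picks the values at the given row indices from one column.
func take(col []string, idx []int32) []string {
	out := make([]string, len(idx))
	for i, j := range idx {
		out[i] = col[j]
	}
	return out
}

func main() {
	bm := wordBitmap{0b1010} // rows 1 and 3 matched the filter
	idx := bm.indices()
	fmt.Println(idx, take([]string{"a", "b", "c", "d"}, idx))
}
```

The point of the sketch is only that an indices array is proportional to the number of matching rows, whereas a boolean mask is proportional to the total number of rows.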
from frostdb.
Sorry, apparently fat fingered the close button.
I played around with this today:
func filter(ctx context.Context, pool memory.Allocator, filterExpr BooleanExpression, ar arrow.Record) (arrow.Record, bool, error) {
	bitmap, err := filterExpr.Eval(ar)
	if err != nil {
		return nil, true, err
	}
	if bitmap.IsEmpty() {
		return nil, true, nil
	}
	// Construct filter array
	// NOTE: this is intermediary right now. The Eval function should return a boolean array instead so we can directly pass it in.
	bldr := array.NewBooleanBuilder(pool)
	defer bldr.Release()
	for i := 0; i < int(ar.NumRows()); i++ {
		bldr.Append(bitmap.Contains(uint32(i)))
	}
	filterArr := bldr.NewArray()
	defer filterArr.Release()
	result, err := compute.FilterRecordBatch(ctx, ar, filterArr, compute.DefaultFilterOptions())
	if err != nil {
		return nil, true, err
	}
	return result, false, nil
}
And ended up with a panic:
--- FAIL: Test_DB_All (0.00s)
    db_test.go:2085:
        Error Trace: /Users/thor/go/src/github.com/polarsignals/frostdb/db_test.go:2085
        Error: Received unexpected error:
               not implemented: function 'array_take' has no kernel matching input types (dictionary<values=utf8, indices=uint32, ordered=false>, uint16)
        Test: Test_DB_All
Looks like it's not implemented for all types.
from frostdb.
That is unfortunate. Should we open an issue on apache arrow for this?
from frostdb.
Yea and link it here so we know if/when we can actually implement this.
from frostdb.
> Yea and link it here so we know if/when we can actually implement this.
No need for this. We can in fact implement it now. Basically, this is how filtering on records works:
- build an array of indices containing the rows you want to choose
- for each record column, use array_take to select the rows of interest
- assemble the results of the previous step (concurrently)
- build a new record with the result.
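The steps above can be sketched in plain Go as follows. This is a simplified illustration only: the column type is a hypothetical stand-in for an Arrow array, takeColumn plays the role of array_take, and null handling and memory management are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// column is a hypothetical stand-in for an Arrow array: just int64 values.
type column []int64

// takeColumn selects the rows listed in idx from a single column.
func takeColumn(col column, idx []int32) column {
	out := make(column, len(idx))
	for i, j := range idx {
		out[i] = col[j]
	}
	return out
}

// takeRecord applies takeColumn to every column concurrently and
// assembles the results in the original column order.
func takeRecord(cols []column, idx []int32) []column {
	out := make([]column, len(cols))
	var wg sync.WaitGroup
	for i := range cols {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			out[i] = takeColumn(cols[i], idx)
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	rec := []column{{10, 20, 30, 40}, {1, 2, 3, 4}}
	fmt.Println(takeRecord(rec, []int32{0, 2}))
}
```

Because every column is taken with the same indices, the resulting columns all have the same length and can be assembled into a new record directly.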
So @thorfour's solution is actually correct; we just need to be smart with array.Dictionary columns. We can take and assemble ourselves when we know we have an *array.Dictionary column, and delegate to compute for the rest.
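A rough sketch of what "take ourselves" could mean for a dictionary-encoded column, assuming a simplified layout where row i holds values[indices[i]]. The dictColumn type and takeDict function are hypothetical names for illustration and do not reflect the real *array.Dictionary API; nulls are ignored.

```go
package main

import "fmt"

// dictColumn is a hypothetical, simplified dictionary-encoded column:
// row i holds values[indices[i]].
type dictColumn struct {
	indices []uint32
	values  []string
}

// takeDict selects rows by position. Only the indices slice is rewritten;
// the dictionary values are shared with the input unchanged.
func takeDict(c dictColumn, rows []int32) dictColumn {
	out := make([]uint32, len(rows))
	for i, r := range rows {
		out[i] = c.indices[r]
	}
	return dictColumn{indices: out, values: c.values}
}

// decode expands the column back to plain strings, for inspection only.
func (c dictColumn) decode() []string {
	out := make([]string, len(c.indices))
	for i, k := range c.indices {
		out[i] = c.values[k]
	}
	return out
}

func main() {
	col := dictColumn{
		indices: []uint32{0, 1, 0, 2},
		values:  []string{"cpu", "mem", "disk"},
	}
	fmt.Println(takeDict(col, []int32{1, 3}).decode())
}
```

The design choice here is that the take only touches the integer indices, so the missing dictionary kernel is sidestepped and the dictionary values never need to be copied.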
So, @thorfour, can you please open a PR with your changes? I will help and make sure we massage it until it works for every case we currently have. We will probably need to run benchmarks as well to make sure we don't introduce regressions.
from frostdb.
Quick check says it is not a simple change. I'm taking this task. Will submit supplementary patches to make it possible.
from frostdb.
Finally I have this working. I will wait for the supplementary patches to land, then I will drop the PR.
from frostdb.
Done: #697. I wish someone with access to a production-like workload would run some benchmarks and give us numbers.
from frostdb.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
from frostdb.