kxsystems / arrowkdb

kdb+ integration with Apache Arrow and Parquet

Home Page: https://code.kx.com/q/interfaces

License: Apache License 2.0

CMake 1.07% Batchfile 0.22% Shell 1.11% q 2.10% C++ 68.79% C 5.26% Raku 15.38% Perl 6.08%
arrow kdb parquet q

arrowkdb's People

Contributors

nmcdonnell-kx, nugend, vgrechin-kx

arrowkdb's Issues

Infinite handling

Hi,

I see that the last release of the arrowkdb package contained a new feature that supports null handling. Are there any plans to introduce a similar feature for infinities? Ideally, we would like any infinity values in our table to be set to nulls when writing to Parquet files using .arrowkdb.pq.writeParquetFromTable.

Thanks,
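
Until such an option exists, one workaround is to replace infinities with nulls in the table before it is handed to the writer. The sketch below illustrates only that pre-processing step, in plain Python rather than q. The function names and the dict-of-lists table layout are hypothetical and are not part of arrowkdb.

```python
import math


def infinities_to_nulls(column):
    """Return a copy of `column` with +/-infinity replaced by None (null).

    Only float infinities are touched; NaN and all other values pass through.
    """
    return [
        None if isinstance(value, float) and math.isinf(value) else value
        for value in column
    ]


def clean_table(table):
    """Apply infinities_to_nulls to every column of a dict-of-lists table."""
    return {name: infinities_to_nulls(values) for name, values in table.items()}


# Hypothetical table: one float column containing infinities, one int column.
table = {
    "price": [1.5, float("inf"), 2.0, float("-inf")],
    "size": [10, 20, 30, 40],
}
cleaned = clean_table(table)
```

In q the same idea would have to cover each typed infinity separately, since every numeric and temporal type has its own infinity value.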

Any way to support dictionary encoded columns with ARROW_CHUNK_ROWS?

At the moment, passing the dictionary and indices in for a column represented that way throws an "unequal length arrays" error.

The functionality is definitely supported by the Arrow stream format. It seems like the issue is that the MakeDictionary function and the MakeChunkedArray function don’t play nicely together. I’m not sure what the preferred solution is. I’m happy to handle preparing the value array manually and passing the indices in with an explicit reference if that’s what’s needed.

If you want to handle it in the library, my guess is you could handle the values and indices in separate passes?
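
The "separate passes" idea can be sketched as follows: build the dictionary values once over the whole column, then split only the indices into chunks that all refer to that one dictionary. This is a plain-Python illustration of the data layout, not the library's C++ code. `encode_chunked`, `decode_chunked` and `chunk_rows` are hypothetical names, with `chunk_rows` standing in for the ARROW_CHUNK_ROWS setting.

```python
def encode_chunked(column, chunk_rows):
    """Dictionary-encode `column`, chunking the indices but not the values."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")

    # Pass 1: collect the distinct values in first-seen order.
    positions = {}
    for value in column:
        positions.setdefault(value, len(positions))
    values = list(positions)

    # Pass 2: map every row to its dictionary index, then chunk the indices.
    indices = [positions[value] for value in column]
    chunks = [
        indices[start:start + chunk_rows]
        for start in range(0, len(indices), chunk_rows)
    ]
    return values, chunks


def decode_chunked(values, chunks):
    """Rebuild the original column from the shared dictionary and its chunks."""
    return [values[index] for chunk in chunks for index in chunk]


column = ["a", "b", "c", "a", "a", "c"]
values, chunks = encode_chunked(column, chunk_rows=4)
```

Here the values list has 3 entries while the indices cover 6 rows, so any chunking step that treats the two as parallel arrays of equal length would fail. That may be what the error above reflects, though this has not been confirmed against the library source.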

Docs should be included with release

The new docs folder won't currently be added to a release build. It should be included in future releases for as long as the docs live in the repository.
The relevant area of .travis.yml, e.g.:
elif [[ $TRAVIS_OS_NAME == "windows" ]]; then
  7z a -tzip $FILE_NAME README.md install.bat LICENSE q examples;
elif [[ $TRAVIS_OS_NAME == "linux" || $TRAVIS_OS_NAME == "osx" ]]; then
  tar -zcvf $FILE_NAME README.md install.sh LICENSE q examples;

Separate nulls for nested char arrays and symbols.

I don’t believe this is currently possible. I poked around the current mappings, and it seems tractable given that the writing code paths for symbols and nested char arrays are separate.

Would be nice!

How can we get symbols to be written as dictionary-encoded strings?

It seems the mechanism for doing this is already in the library.
For instance, given a symbol vector:

sym:`a`b`c`a`a`c
dvalues:distinct sym
indices:dvalues?sym
/ideally we would use the smallest type that can support the number of distinct symbols:
mt:(.arrowkdb.dt[`int8`int16`int32`int64])!im:floor 2 xexp 0 7 15 31
mkt:4 5 6 7h!im
indextype:mt bin c:count dvalues
indexktype:mkt bin c
datatype_symbol:.arrowkdb.dt.dictionary[.arrowkdb.dt.utf8[];indextype[]]
/we can even pretty print the type we want:
.arrowkdb.ar.prettyPrintArray[datatype_symbol;(dvalues;indexktype$indices);::]

What is not clear is how to enhance the current inferSchema to do this calculation. As a result, tables that contain symbols are not the same after a round trip: all the symbols are cast to strings.
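
The calculation a schema-inference step would need can be sketched in plain Python: dictionary-encode the symbols, then pick the narrowest signed integer width that can hold the largest index. This is an illustration only and does not use or describe the arrowkdb API. All names are hypothetical, and the width boundaries are this sketch's own choice (a width is selected when the largest index, count minus one, fits in it).

```python
from bisect import bisect_left

INDEX_TYPES = ["int8", "int16", "int32", "int64"]
# Largest number of distinct values each signed width can index (0..2^(bits-1)-1).
CAPACITY = [2**7, 2**15, 2**31, 2**63]


def smallest_index_type(distinct_count):
    """Name the narrowest signed integer type able to index `distinct_count` values."""
    position = bisect_left(CAPACITY, distinct_count)
    if position == len(INDEX_TYPES):
        raise OverflowError("too many distinct values for a 64-bit index")
    return INDEX_TYPES[position]


def dictionary_encode(symbols):
    """Return (distinct values in first-seen order, index of each symbol)."""
    positions = {}
    for symbol in symbols:
        positions.setdefault(symbol, len(positions))
    return list(positions), [positions[symbol] for symbol in symbols]


sym = ["a", "b", "c", "a", "a", "c"]
dvalues, indices = dictionary_encode(sym)
index_type = smallest_index_type(len(dvalues))
round_trip = [dvalues[i] for i in indices]
```

Reading such a column back as symbols rather than strings additionally requires the reader to map the dictionary type to a symbol vector, which this sketch does not address.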
