Comments (4)
I would add a dependency on PooledArrays.jl, as it is lightweight, and do as @dmbates suggests. In particular, this benefits DataFrames.jl, which has a fast path for `PooledVector`.
from arrow-julia.
Hmmm, yes, we should have better `copy` definitions for Arrow column types. But we should also wrap `Arrow.Table` in `Tables.CopiedColumns` so DataFrames doesn't make a copy by default; the point of Arrow.jl is to very efficiently load data and be able to do DataFrames stuff on them, so we want to ensure the default avoids the copying. We should also highlight in the README/docs that the arrow format is very much immutable; you can run great analytics/queries over the data, but for mutating, it's an explicit step to convert away from the arrow formatted columns.
> We should have better `copy` definitions for Arrow column types

Agreed.

> But we should also wrap `Arrow.Table` in `Tables.CopiedColumns` so DataFrames doesn't make a copy by default

This would be OK given the objective of the package, but as you note, we should then be very clear that this is the case. Note that with the old CSV.jl, 90% of the issues/questions raised in DataFrames.jl were about the immutability of the returned columns (back when they were immutable). We should also add instructions on how to make a copy somewhere in the docs.
Ok, fix is up: #21