Comments (5)
Definitely interested in contributions here. We have read-only support for a subset of this (lists, I believe). Let me know if you want to discuss details of the design or anything else. If not, and you are fine just researching and implementing, that works for us too.
from tech.ml.dataset.
@archaic, is this still an issue for you? I would rather see this contribution in tmducken, but Arrow also makes sense.
I wasn't able to come up with a good solution. I would love to have this functionality, as most of the datasets I work with have small segments of nested data that are difficult to wrangle into columns. I think it is a difficult problem to handle generically: should the schema be inferred? That is hard in itself.
I ended up falling back to metosin/malli to define schemas for the columns with complex structure, then used the raw Java Arrow library to convert the malli schemas to Arrow datatypes and column writers. However, this feels like a step backwards compared with simply passing a dataset to arrow/write! and having it handled automatically.