Comments (4)
I see your point @tmtmtmtm.
I'll add an option to explicitly turn off boolean inference. I'll also fine-tune the inference heuristic to further reduce false positives:
- if there are only two records, turn it off
- add more conditions to the inferencing logic
from qsv.
Yes. This is by design @tmtmtmtm
If the cardinality of a column is 2 and the two unique values are "truthy" (i.e. the first character is 1/0, t/f, or y/n, case-insensitive), then it's inferred to be a boolean field.
In the real world, this heuristic should be good enough.
Did you find this behavior in a "real-world" dataset?
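For readers following along, here is a minimal Python sketch of the heuristic as described above. It is not qsv's actual code (qsv is written in Rust); the function name `looks_boolean` is hypothetical, and it assumes the two values must form one of the pairs 1/0, t/f, or y/n rather than any mix of those characters.

```python
def looks_boolean(values):
    """Return True if a column would be inferred as boolean under the
    heuristic described in this comment (illustrative sketch only)."""
    unique = set(values)
    # Cardinality must be exactly 2; treating empty strings as
    # disqualifying is an assumption of this sketch.
    if len(unique) != 2 or "" in unique:
        return False
    # Compare only the first character, case-insensitively.
    firsts = {v[0].lower() for v in unique}
    return firsts in ({"1", "0"}, {"t", "f"}, {"y", "n"})


# Genuine boolean-like columns match...
print(looks_boolean(["true", "false", "true"]))
print(looks_boolean(["Yes", "no"]))
# ...but so does a column holding two names that happen to start with T and F.
print(looks_boolean(["Tom", "Fred"]))
```

The last call shows the kind of false positive under discussion: only the first character is checked, so any two distinct values beginning with a matching pair qualify.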
from qsv.
I improved it a bit to reduce false positives when inferring True/False booleans.
Lemme know how it works for you. Feel free to reopen.
from qsv.
> Did you find this behavior in a "real-world" dataset?
Yes. I don't think I'd have stumbled across it otherwise :)
A large part of my use of qsv is in a pipeline of handling changes to data. In the case in question here, exactly two records had changed, but this time the names triggered this false inference, so when the CSV was turned into JSON and passed through to the next stage of the pipeline, everything went badly awry.
Am I reading the code correctly that the new failure case would be if the names started with "Tr" and "Fa" rather than just "T" and "F"? If so, that's definitely better, but it's still not one I'm particularly comfortable presuming would never happen.
Perhaps there could be an option to turn this behaviour off? It smells too much like all those scenarios where Excel silently mangles some dates on import etc.
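To make the concern concrete, here is a rough Python sketch. It assumes my reading above is right, i.e. that the tightened check compares a two-character prefix; that is not confirmed in this thread. The names `looks_boolean_strict` and `to_json_rows` and the `infer_boolean` switch are all hypothetical, and none of this is qsv's real code or CLI.

```python
import json


def looks_boolean_strict(values):
    """Assumed stricter check: exactly two unique values whose first two
    characters are "tr" and "fa", case-insensitively."""
    unique = set(values)
    if len(unique) != 2:
        return False
    return {v[:2].lower() for v in unique} == {"tr", "fa"}


def to_json_rows(names, infer_boolean=True):
    """Convert a 'name' column to JSON, applying boolean inference unless
    the (hypothetical) infer_boolean switch is turned off."""
    if infer_boolean and looks_boolean_strict(names):
        converted = [n[:1].lower() == "t" for n in names]
    else:
        converted = list(names)
    return json.dumps([{"name": v} for v in converted])


# Two changed records whose names happen to start with "Tr" and "Fa":
print(to_json_rows(["Travis", "Fatima"]))
# With an explicit opt-out, the names survive as strings:
print(to_json_rows(["Travis", "Fatima"], infer_boolean=False))
```

With inference on, the two names come out as `true` and `false` in the JSON, which is what breaks the next stage of the pipeline; with the opt-out they pass through unchanged.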
from qsv.