Comments (6)
Thanks for the report and for trying the tools.
The only tool explicitly supporting Windows line-endings on Linux is csv2tsv. You are correct, tsv-append works fine, and a couple other tools as well, but it's more accidental than by design.
What's going on is that the tools use D standard library functions for reading lines, and these functions assume Unix line endings on Unix platforms. If the file has Windows line endings (a \r\n pair), each line is left with an extraneous \r character at the end of the last field. If this extraneous \r interferes with processing, the tool doesn't work.
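To make the leftover \r concrete, here is a minimal shell sketch (not the tools' D code). It assumes a reader that strips only the trailing \n; the variable names are illustrative.

```shell
# Build the characters portably; command substitution keeps tab and CR.
tab=$(printf '\t')
cr=$(printf '\r')

# What remains of the line "AA\tXX\tYY\r\n" once only the "\n" is stripped.
line="AA${tab}XX${tab}YY${cr}"

# Take the text after the final tab, i.e. the last field.
last_field=${line##*"$tab"}

# The field is "YY" plus the carriage return, so its length is 3, not 2.
echo "last field length: ${#last_field}"
```

The last field looks like YY on a terminal, but it is one character longer than expected.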
In the case of tsv-summarize and tsv-filter, if they try to interpret the last field as a numeric value, the conversion will fail. However, even if they don't perform a conversion, the tools are not necessarily working correctly. tsv-select, for example, isn't really preserving Windows line endings.
As an example:
$ # This outputs the Windows line endings
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | grep $'YY\r'
AA XX YY
BB XX YY
$ # Select the last field.
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | tsv-select -f 3
YY
YY
$ # Grep shows the Windows line ending
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | tsv-select -f 3 | grep $'YY\r'
YY
YY
$ # Select the second field
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | tsv-select -f 2
XX
XX
$ # Grep shows it's not a Windows line ending.
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | tsv-select -f 2 | grep $'XX\r'
$
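The numeric conversion failure mentioned earlier can be sketched the same way. The helper is_unsigned_integer below is hypothetical and only stands in for a numeric parse; it is not how tsv-summarize or tsv-filter convert values.

```shell
cr=$(printf '\r')

# Hypothetical helper: succeeds only if $1 consists solely of decimal digits.
is_unsigned_integer() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit such as \r
        *) return 0 ;;
    esac
}

# The second value is what a last field looks like with a Windows line ending.
for value in "10" "10${cr}"; do
    if is_unsigned_integer "$value"; then
        echo "numeric"
    else
        echo "not numeric"
    fi
done
```

The first value is accepted and the second is rejected, even though both display as 10.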
I'm not inclined to add support for Windows line endings on Unix platforms. The dos2unix tool is a good tool for this and fits the pipeline approach being used by the tools.
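As a rough illustration of the pipeline approach, the sketch below strips carriage returns before the data reaches the next stage. It makes two substitutions so that it runs anywhere: tr -d '\r' stands in for dos2unix (note that tr removes every \r, not only those at line ends), and cut stands in for a tsv-utils tool.

```shell
# Two-line TSV input with Windows line endings.
# tr removes the carriage returns; cut then selects the third field.
result=$(printf 'AA\tXX\t10\r\nBB\tXX\t20\r\n' | tr -d '\r' | cut -f 3)

# The selected fields no longer carry a trailing \r.
printf '%s\n' "$result"
```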
However, something is going wrong with the error message formatting, and the error message should identify a Windows line ending as a likely problem. The documentation for the tools should also discuss line endings. I'll have to look into both of these.
Regarding csv2tsv - This tool explicitly supports Windows line-endings because they are commonly used in many programs that generate CSV files.
from tsv-utils.
The badly formatted error message is due to the \r character being included in the error message. I'll have to fix it.
Thanks for your explanation and the great tools.
As you said, I used dos2unix to convert the line endings, so this issue wasn't a deterrent to my work. Updating the documentation and especially the error message is more than enough to solve this issue.
Current plan: On Unix builds, check for Windows/DOS line endings when processing the first line of a file. That should handle most cases prior to ever hitting the error message. Regarding the poor error message format: There's an open D bug for it: https://issues.dlang.org/show_bug.cgi?id=17708
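The first-line check in this plan can be sketched in shell. This is only an illustration of the idea, not the D implementation; the helper name first_line_has_cr and the warning text are made up for the example.

```shell
# Hypothetical helper: succeeds if the first line of file $1 ends with a carriage return.
first_line_has_cr() {
    cr=$(printf '\r')
    # Command substitution drops the trailing "\n" but keeps a preceding "\r".
    first_line=$(head -n 1 "$1")
    case $first_line in
        *"$cr") return 0 ;;
        *) return 1 ;;
    esac
}

# Try it on a temporary file written with Windows line endings.
sample=$(mktemp)
printf 'AA\tXX\tYY\r\nBB\tXX\tYY\r\n' > "$sample"
if first_line_has_cr "$sample"; then
    echo "Windows/DOS line ending detected; consider running dos2unix first" >&2
fi
rm -f "$sample"
```

Looking at only the first line keeps the check cheap, at the cost of missing files whose line endings are mixed.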
Addressed by PR #103, merged to master. Will be included in the next release.
Included in release v.1.1.16.