mithrandie / csvq
SQL-like query language for csv
Home Page: https://mithrandie.github.io/csvq
License: MIT License
The command csvq --format CSV --without-header outputs an unnecessary blank line (newline) when the input CSV file contains only a header (no data rows).
$ printf "date,source_currency_code,target_currency_code,exchange_rate" |
csvq --format CSV --without-header "SELECT * FROM STDIN"
$
By contrast, the command csvq --format CSV correctly outputs only the header:
$ printf "date,source_currency_code,target_currency_code,exchange_rate" |
csvq --format CSV "SELECT * FROM STDIN"
date,source_currency_code,target_currency_code,exchange_rate
$
This is a minor but rather annoying bug: when appending the output of csvq --format CSV --without-header for a CSV file that contains no data rows to another CSV file, csvq appends a blank line.
Please make csvq available as a Debian package and submit it for inclusion in the Debian package repository.
Hi,
the one thing I am missing in this awesome piece of code is a UNION clause. Could you consider adding it?
el
I have found that a quote character at the start of a field can break the reader, if the quote is an ordinary quote rather than field quoting. Tested with v1.13.1.
The source data is:
Код формы;Наименование показателя;Единицы измерения;Комментарий
1-МТС;"""░░░░░░ - ░░░░"", ░ ░░░ ░░░░░ ░░░░░░░░░░░░░ ░░░ ░░ ░░░░░░░░░░ ░░░░░░(░░░. ░░░.)";░░░. ░░░.;
1-МТС;"""░░░░░░ - ░░░░"", ░ ░░░ ░░░░░ ░░░░░░░░░░░░░ ░░░ ░░ ░░░░░░░░░░ ░░░░░░(░░.)";░░.;
1-МТС;"""░░░░░░ - ░░░░""; ░░░░░░░░░░ (░░░░░░░░░░) ░░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░ ░░░░░░░, (░░░. ░░░.)";░░░. ░░░.;
1-МТС;"""░░░░░░ - ░░░░""; ░░░░░░░ ░░░░░ ░ ░░░ (░░░░░░░░░) ░░ ░░░░░ ░░░░, (░░.)";░░.;
1-МТС;"""░░░░"" ░ ░░░░░░ ░░░░░░░░░░░░░, ░ ░░░ ░░░░░ ░░░░░░░░░░░░░ ░░░ ░░ ░░░░░░░░░░ ░░░░░░(░░░. ░░░.)";░░░. ░░░.;
1-МТС;"""░░░░"" ░ ░░░░░░ ░░░░░░░░░░░░░, ░ ░░░ ░░░░░ ░░░░░░░░░░░░░ ░░░ ░░ ░░░░░░░░░░ ░░░░░░(░░.)";░░.;
1-МТС;"""░░░░"" ░ ░░░░░░ ░░░░░░░░░░░░░; ░░░░░░░░░░ (░░░░░░░░░░) ░░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░ ░░░░░░░, (░░░. ░░░.)";░░░. ░░░.;
1-МТС;"""░░░░"" ░ ░░░░░░ ░░░░░░░░░░░░░; ░░░░░░░ ░░░░░ ░ ░░░ (░░░░░░░░░) ░░ ░░░░░ ░░░░, (░░.)";░░.;
This is a semicolon-separated CSV; I've replaced the text with block characters for convenience.
Let's update this file test_trgt.csv (note the shell quoting: the semicolons must be quoted, and the statement is single-quoted so that the backquoted identifiers survive, with double quotes used for the string literals):
csvq -Q -d ';' -D ';' 'update test_trgt set `Комментарий` = "+++" where `Код формы` = "1-МТС"'
The file will become this:
Код формы;Наименование показателя;Единицы измерения;Комментарий
1-МТС;"░░░░░░ - ░░░░", ░ ░░░ ░░░░░ ░░░░░░░░░░░░░ ░░░ ░░ ░░░░░░░░░░ ░░░░░░(░░░. ░░░.);░░░. ░░░.;+++
1-МТС;"░░░░░░ - ░░░░", ░ ░░░ ░░░░░ ░░░░░░░░░░░░░ ░░░ ░░ ░░░░░░░░░░ ░░░░░░(░░.);░░.;+++
1-МТС;"""░░░░░░ - ░░░░""; ░░░░░░░░░░ (░░░░░░░░░░) ░░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░ ░░░░░░░, (░░░. ░░░.)";░░░. ░░░.;+++
1-МТС;"""░░░░░░ - ░░░░""; ░░░░░░░ ░░░░░ ░ ░░░ (░░░░░░░░░) ░░ ░░░░░ ░░░░, (░░.)";░░.;+++
1-МТС;"░░░░" ░ ░░░░░░ ░░░░░░░░░░░░░, ░ ░░░ ░░░░░ ░░░░░░░░░░░░░ ░░░ ░░ ░░░░░░░░░░ ░░░░░░(░░░. ░░░.);░░░. ░░░.;+++
1-МТС;"░░░░" ░ ░░░░░░ ░░░░░░░░░░░░░, ░ ░░░ ░░░░░ ░░░░░░░░░░░░░ ░░░ ░░ ░░░░░░░░░░ ░░░░░░(░░.);░░.;+++
1-МТС;"""░░░░"" ░ ░░░░░░ ░░░░░░░░░░░░░; ░░░░░░░░░░ (░░░░░░░░░░) ░░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░ ░░░░░░░, (░░░. ░░░.)";░░░. ░░░.;+++
1-МТС;"""░░░░"" ░ ░░░░░░ ░░░░░░░░░░░░░; ░░░░░░░ ░░░░░ ░ ░░░ (░░░░░░░░░) ░░ ░░░░░ ░░░░, (░░.)";░░.;+++
As you can see, on the second line the field quotes and double quotes (as quoted quotes) have been removed.
Let's try to read this resulting file:
csvq -d ';' 'select * from test_trgt'
This results in an error: [L:1 C:15] data parse error in file … test_trgt.csv: line 2, column 21: unexpected " in field.
I have tried all variants of the -k and -Q options, but cannot make it read correctly again. The file is correct and the field does not need to be quoted, but a quote mark as the first character of a field seems to break it.
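For comparison, here is how Python's csv module (an RFC 4180-style implementation) handles the same situation: a field whose value begins with a double quote can only round-trip if the whole field is quoted and the embedded quotes are doubled, which is exactly the quoting that csvq's UPDATE appears to have stripped. This is an illustrative sketch, not csvq code.

```python
import csv
import io

# A field value that begins with a double quote: to survive a round
# trip, the writer must quote the whole field and double the inner
# quotes, producing """...""..." on disk.
field = '"block - block", with more text (units)'
buf = io.StringIO()
csv.writer(buf, delimiter=';').writerow(['1-MTS', field, 'units'])
print(buf.getvalue())  # 1-MTS;"""block - block"", with more text (units)";units

# Reading it back recovers the original value exactly.
rows = list(csv.reader(io.StringIO(buf.getvalue()), delimiter=';'))
assert rows[0][1] == field
```

If the writer drops that quoting, as in the UPDATE output above, the file becomes ambiguous and readers can legitimately refuse it.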
In most database and statistics software, we use stddev instead of stdev.
I love csvq and use it daily in my workflows. I'd really like to give back somehow. Do you perhaps have a Patreon account or similar with which I could compensate you for all the terrific work?
The following SQL works on SQLite:
SELECT sellerid, qty, round(1.0 * qty / (SUM(qty) OVER()),2) AS ratio_to_report FROM winsales
but fails on csvq with the error
[L:1 C:51] syntax error: unexpected token "OVER"
so I had to rewrite it as:
with stat as (
select NULLIF(SUM(qty),0) as qty_ttl from winsales
)
SELECT sellerid, qty, round(1.0 * qty / qty_ttl,2) AS ratio_to_report
FROM winsales, stat
salesid,dateid,sellerid,buyerid,qty,qty_shipped
30001,8/2/2003,3,b,10,10
10001,12/24/2003,1,c,10,10
10005,12/24/2003,1,a,30,
40001,1/9/2004,4,a,40,
10006,1/18/2004,1,c,10,
20001,2/12/2004,2,b,20,20
40005,2/12/2004,4,a,10,10
20002,2/16/2004,2,c,20,20
30003,4/18/2004,3,b,15,
30004,4/18/2004,3,b,20,
30007,9/7/2004,3,c,30,
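For reference, the ratio the window-function query computes can be sketched as two passes over the data, which is effectively what the CTE workaround does (illustrative Python, not csvq):

```python
import csv
import io

# Two-pass ratio-to-total over a few winsales-style rows: pass 1
# computes SUM(qty) OVER (), pass 2 divides each row's qty by it.
data = "sellerid,qty\n3,10\n1,10\n1,30\n"
rows = list(csv.DictReader(io.StringIO(data)))
total = sum(int(r["qty"]) for r in rows)                 # 50
ratios = [round(int(r["qty"]) / total, 2) for r in rows]
print(ratios)  # [0.2, 0.2, 0.6]
```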
Downloaded the tarball of the latest release: csvq-1.15.2.tar.gz
Go version on openSuse 15.3:
localhost:/public/linux/tmp # go version
go version go1.15.12 linux/amd64
-> Should be OK, since Go 1.14 is listed as the requirement
localhost:/public/linux/tmp/csvq-1.15.2 # ls
BINARY_CODE_LICENSE CHANGELOG.md csvq_env.json docs .gitignore go.mod go.sum help.go init_windows.go lib LICENSE main.go Makefile README.md testdata .travis.yml
-> Note: No build directory yet
localhost:/public/linux/tmp/csvq-1.15.2 # make
go build -trimpath -o /public/linux/tmp/csvq-1.15.2/build/bin/
go build github.com/mithrandie/csvq: build output "/public/linux/tmp/csvq-1.15.2/build/bin/" already exists and is a directory
make: *** [Makefile:22: build] Error 1
-> Checked that build/bin does exist and is empty. (A guess: go build -o seems to accept an existing directory only from Go 1.16 onward, so go1.15.12 may simply be too old for this Makefile despite the stated Go 1.14 requirement.)
One of the reasons CSV/TSV are so widely used is that these are "line-oriented" formats, so that very often an unboundedly large file can be processed without taxing computer memory, either by processing one line at a time, or by crafting a suitable "reduction" framework.
There are some tools that allow a stream of JSON entities (as envisioned by the JSONLines specification) to be processed in much the same way as sed or awk process lines of text.
At present, csvq's -f option does not support JSONLines or any similar streaming option. Thus the utility of csvq for converting very large CSV/TSV files to a JSON format is severely limited.
(Other streaming options for JSON are described on the wikipedia page: https://en.wikipedia.org/wiki/JSON_streaming )
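As a sketch of what such streaming could look like, a JSON Lines conversion only ever needs one row in memory (illustrative Python; the function name csv_to_jsonl is hypothetical):

```python
import csv
import json
import io

# Hypothetical csv_to_jsonl: emit one JSON object per input row,
# never holding more than a single row in memory.
def csv_to_jsonl(src, dst):
    for row in csv.DictReader(src):
        dst.write(json.dumps(row) + "\n")

src = io.StringIO("a,b\n1,2\n3,4\n")
dst = io.StringIO()
csv_to_jsonl(src, dst)
print(dst.getvalue(), end="")
# {"a": "1", "b": "2"}
# {"a": "3", "b": "4"}
```

Because the output is newline-delimited, downstream tools can process it with the same line-at-a-time discipline as sed or awk.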
csvq is a useful tool and I am very grateful for it, but I cannot solve the following problem. Because of the behavior of the relational operators, comparisons of single characters ('1', 'T' or '0', 'F') do not work well. Consider the table (sample.txt) below:
flg,value
0,value1
F,value2
K,value3
I run this query:
csvq "select * from sample where flg<>'F'"
The result does not contain the row with flg = '0':
+-----+---------+
| flg | value |
+-----+---------+
| K | value3 |
+-----+---------+
Is there any solution or workaround?
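My guess (unverified) is that csvq auto-converts '0' to a number, and that comparing a number with a non-numeric string yields UNKNOWN under SQL three-valued logic, which WHERE then filters out. The effect can be sketched like this:

```python
# Sketch of SQL three-valued '<>' under the assumption that comparing
# values of different types yields UNKNOWN (None here), and that WHERE
# keeps only rows where the condition is strictly TRUE.
def ternary_ne(a, b):
    if type(a) is not type(b):
        return None  # UNKNOWN
    return a != b

rows = [(0, "value1"), ("F", "value2"), ("K", "value3")]  # '0' auto-typed to 0
kept = [r for r in rows if ternary_ne(r[0], "F") is True]
print(kept)  # [('K', 'value3')]; the flg = '0' row is silently dropped
```

If csvq offers a string-cast function, casting flg to a string before comparing might avoid the mixed-type comparison (untested).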
I couldn't use csvq in interactive mode on git-bash because it isn't a real shell, so I went ahead and ran it on powershell.
But I ran into a few problems, like...
[Some of these problems seem really weird and I hope it might be isolated to just my computer, but looking for a confirmation]
1> Navigating using the arrow keys. Going backwards is not possible without using backspace.
2> Typing single or double quotes autocompleted the closing quote and I couldn't navigate my cursor out of the quote without deleting the closing quote using Ctrl+D and manually closing it myself later.
3> Not being able to run the previous command run in the interactive mode using the up arrow key
The following three SQL statements do not work:
select year(SalesDate) as yr, count(*) from sales group by 1
select year(SalesDate) as yr, count(*) from sales group by yr
select year(SalesDate) as yr, count(*) from sales group by year(SalesDate)
This one works:
select yr, count(*) as cnt from (select year(SalesDate) as yr from sales) group by yr
By the way, regarding whether csvq should be Postgres-, MySQL-, or SQLite-compatible: Postgres would be the better target, since most big-data systems support it.
Is the top command supported in any way?
E.g. select top 10 * from myfile
When pasting an SQL query containing open parentheses into the interactive shell, csvq automatically closes these parentheses, causing syntax errors. For example, when pasting the following query into the interactive shell, csvq appends a trailing closing parenthesis.
WITH RECURSIVE t (n)
AS (
SELECT 1
UNION ALL
SELECT n + 1
FROM t
WHERE n < 5
)
SELECT n FROM t;
After pasting into interactive shell:
/home/derek > WITH RECURSIVE t (n))
[L:1 C:21] syntax error: unexpected token ")"
/home/derek > AS ()
[L:1 C:3] syntax error: unexpected token "AS"
/home/derek > SELECT 1
+---+
| 1 |
+---+
| 1 |
+---+
/home/derek > UNION ALL
[L:1 C:5] syntax error: unexpected token "UNION"
/home/derek > SELECT n + 1
[L:1 C:12] field n does not exist
/home/derek > FROM t
[L:1 C:7] syntax error: unexpected token "FROM"
/home/derek > WHERE n < 5
[L:1 C:6] syntax error: unexpected token "WHERE"
/home/derek > )
[L:1 C:3] syntax error: unexpected token ")"
/home/derek > SELECT n FROM t;
[L:1 C:15] file t does not exist
As a minimal example, notice that after pasting the string "()" into the interactive shell, csvq expands it into "())":
/home/derek > ())
[OS] CentOS 7.7.1908
[go version] go1.13.3 linux/amd64
input file is sample.txt
Cases 1, 2, and 3 should produce the same result, but case 2 does not.
case 1 (expected): $ cat sample.txt|csvq -n "select * where c1 like '%i686'"
+--------------------------------+
| c1 |
+--------------------------------+
| glibc-2.28-9.fc29.i686 |
| glibc-devel-2.28-9.fc29.i686 |
| glibc-headers-2.28-9.fc29.i686 |
| gpm-devel-1.20.7-6.el7.i686 |
| gpm-libs-1.20.7-6.el7.i686 |
+--------------------------------+
case 2 (unexpected): $ cat sample.txt|csvq -n "select * where c1 like '%i_86'"
+------------------------------+
| c1 |
+------------------------------+
| gpm-devel-1.20.7-6.el7.i686 |
+------------------------------+
case 3 (expected): $ cat sample.txt|csvq -n "select * where c1 like '%.i_86'"
+--------------------------------+
| c1 |
+--------------------------------+
| glibc-2.28-9.fc29.i686 |
| glibc-devel-2.28-9.fc29.i686 |
| glibc-headers-2.28-9.fc29.i686 |
| gpm-devel-1.20.7-6.el7.i686 |
| gpm-libs-1.20.7-6.el7.i686 |
+--------------------------------+
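Under the standard reading of LIKE ('%' matches any run of characters, '_' matches exactly one), every '%i686' match is also a '%i_86' match, so case 2 looks like a bug. A quick regex translation confirms the expectation (a sketch; it does not escape regex metacharacters that may appear in the pattern):

```python
import re

# Translate a SQL LIKE pattern into a regex: '%' -> '.*', '_' -> '.'.
def like_to_regex(pattern):
    return "^" + pattern.replace("%", ".*").replace("_", ".") + "$"

names = [
    "glibc-2.28-9.fc29.i686",
    "gpm-devel-1.20.7-6.el7.i686",
]
for pat in ("%i686", "%i_86", "%.i_86"):
    matches = [n for n in names if re.match(like_to_regex(pat), n)]
    assert matches == names  # all three patterns match every name
```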
When receiving multiple CSV files with the same column headers from standard input, csvq should automatically concatenate them into a single CSV, so that the query result is the union of all the input data rows. This would simplify importing multiple CSV files: instead of a complicated for-loop that strips all header rows but the first, a single cat command would suffice.
If csvq encounters a row in the input stream that differs from the first (header) row in at least one column but not all columns, csvq would fail and report an error. Any row that differs from the header in all columns and is type-compatible with the second row in the stream, csvq would treat as a data row.
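A sketch of the proposed behavior (simplified: it requires repeated headers to be identical rather than applying the partial-overlap rule above):

```python
import csv
import io

# Concatenate CSV streams that share a header: keep the first header,
# skip subsequent identical headers, and error on a mismatch.
def concat_csv(streams):
    first = None
    for s in streams:
        reader = csv.reader(s)
        header = next(reader)
        if first is None:
            first = header
            yield header
        elif header != first:
            raise ValueError("header mismatch: %r" % (header,))
        yield from reader

a = io.StringIO("a,b\n1,2\n")
b = io.StringIO("a,b\n3,4\n")
print(list(concat_csv([a, b])))  # [['a', 'b'], ['1', '2'], ['3', '4']]
```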
In the README.md, your example
csvq "select id, name from `user.csv`"
does not work in bash, because the backticks are evaluated (bash tries to run user.csv and capture its output).
My suggestion is to use single quotes:
csvq 'select id, name from `user.csv`'
Hello all!
I just discovered this tool and was poking around with it.
Seems like some queries are not yet implemented?
Say I have this input file:
$ cat input.txt
count,string
77766,in the
73529,for the
56800,of the
48059,on the
46528,to be
43250,to the
41130,thanks for
36993,at the
34185,i love
34015,going to
If I try to select only those rows with count greater than or equal to 56800, I get this error:
$ cat input.txt | csvq -f csv "SELECT * FROM stdin WHERE count >= 56800"
[L:1 C:33] syntax error: unexpected token ">="
However, if I just grab everything, I do get the rows:
$ cat input.txt | csvq -f csv "SELECT * FROM stdin"
count,string
77766,in the
73529,for the
56800,of the
48059,on the
46528,to be
43250,to the
41130,thanks for
36993,at the
34185,i love
34015,going to
Am I doing something wrong, or is it just that this sort of query has not been implemented yet?
Version 1.13.3
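For what it's worth, the intended filter itself is straightforward; my guess (unconfirmed) is that the parser reads the column name count as the COUNT aggregate, so quoting the identifier with backquotes in the query might help. The filter, sketched in Python:

```python
import csv
import io

# Keep only rows whose count column is >= 56800.
data = "count,string\n77766,in the\n73529,for the\n56800,of the\n48059,on the\n"
rows = [r for r in csv.DictReader(io.StringIO(data)) if int(r["count"]) >= 56800]
print([r["string"] for r in rows])  # ['in the', 'for the', 'of the']
```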
create table winsales(
salesid,
dateid,
sellerid,
buyerid,
qty,
qty_shipped);
insert into winsales values(30001, '8/2/2003', 3, 'b', 10, 10);
I get the error
[L:1 C:49] field b does not exist
insert into winsales values(30001, '8/2/2003', 3, "b", 10, 10);
I get
30001,0.00199700449326011,3,b,10,10
insert into winsales values(30001, "8/2/2003", 3, "b", 10, 10);
gives me what I want.
Both of these are accepted:
select "3+5"
select '3+5'
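Incidentally, the puzzling number in the dateid column is exactly the date evaluated as arithmetic, which suggests the quotes around 8/2/2003 were lost somewhere (most likely consumed by the shell) and the bare token was then parsed as a division expression:

```python
# 8/2/2003 read as arithmetic: (8 / 2) / 2003
value = 8 / 2 / 2003
print(value)  # 0.00199700449326011, matching the stored dateid above
```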
Please add SQL as an output format, to serve as input to SQLite and possibly other relational database engines like PostgreSQL and MySQL. The output for a given CSV file would begin with a CREATE TABLE statement, followed by one or more INSERT INTO statements, to recreate the contents of the CSV file as a database table in SQLite.
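A minimal sketch of such an exporter (naive: every value is emitted as a quoted string; a real implementation would need type inference and proper identifier quoting):

```python
import csv
import io

# Turn a CSV stream into a CREATE TABLE plus INSERT INTO statements.
def csv_to_sql(src, table):
    reader = csv.reader(src)
    yield "CREATE TABLE %s (%s);" % (table, ", ".join(next(reader)))
    for row in reader:
        vals = ", ".join("'" + v.replace("'", "''") + "'" for v in row)
        yield "INSERT INTO %s VALUES (%s);" % (table, vals)

src = io.StringIO("id,name\n1,alice\n2,bob\n")
print("\n".join(csv_to_sql(src, "users")))
# CREATE TABLE users (id, name);
# INSERT INTO users VALUES ('1', 'alice');
# INSERT INTO users VALUES ('2', 'bob');
```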
It would be nice to have a --dry-run option, which would show results but not commit changes to the actual file.
Please add a datetime function that can parse a date represented in a specified string format. This function would be the inverse of DATETIME_FORMAT; it might accept a date string and a format string that describes the format of the date string, and return a datetime string or an integer that represents seconds or nanoseconds since January 1, 1970 UTC.
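Sketched with Python's strptime (UTC is assumed for the epoch variant here; the real function would presumably honor csvq's timezone setting):

```python
from datetime import datetime, timezone

# Inverse of a DATETIME_FORMAT-style function: parse a string in the
# given format and return epoch seconds (UTC assumed).
def parse_datetime(s, fmt):
    dt = datetime.strptime(s, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

print(parse_datetime("2021/03/01 12:00:00", "%Y/%m/%d %H:%M:%S"))  # 1614600000
```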
As I stated very briefly at the start of my LATERAL JOIN test, my test slightly modified the example queries in PostgreSQL's LATERAL JOIN. Here, I repeat my test using the original, unaltered queries:
$ cat orders.csv | csvq "SELECT * FROM STDIN"
+----+---------+----------------------------+
| id | user_id | created_at |
+----+---------+----------------------------+
| 1 | 1 | 2017-06-20 04:35:03.582895 |
| 2 | 2 | 2017-06-20 04:35:07.564973 |
| 3 | 3 | 2017-06-20 04:35:10.986712 |
| 4 | 1 | 2017-06-20 04:58:10.137503 |
| 5 | 3 | 2017-06-20 04:58:17.905277 |
| 6 | 3 | 2017-06-20 04:58:25.289122 |
+----+---------+----------------------------+
-- lateral_query_original1.sql
SELECT user_id, first_order_time, next_order_time, id FROM
(SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
LEFT JOIN LATERAL
(SELECT id, created_at AS next_order_time
FROM orders
WHERE user_id = o1.user_id AND created_at > o1.first_order_time
ORDER BY created_at ASC LIMIT 1)
o2 ON true;
-- lateral_query_original2.sql
SELECT user_id, first_order_time, next_order_time, id FROM
(SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
INNER JOIN LATERAL
(SELECT id, created_at AS next_order_time
FROM orders
WHERE user_id = o1.user_id AND created_at > o1.first_order_time
ORDER BY created_at ASC LIMIT 1)
o2 ON true;
$ cat orders.csv | csvq --source lateral_query_original1.sql
lateral_query_original1.sql [L:6 C:10] field user_id is ambiguous
$ cat orders.csv | csvq --source lateral_query_original2.sql
lateral_query_original2.sql [L:6 C:10] field user_id is ambiguous
In both cases, csvq complains that the unqualified reference to user_id in the second (lateral) subquery of the FROM clause is ambiguous. Assuming, possibly incorrectly, that the PostgreSQL interpreter follows the SQL standard and the csvq error is incorrect: when the same column appears in multiple adjacent subqueries in the same FROM clause, csvq should resolve the unqualified user_id in the second subquery to the instance from the immediately enclosing scope.
It's my understanding that the -f JSON option should output JSON, but consider:
csvq -i tsv "select * from g"
+-------------+----+------+----+-------+-----+-------------+----------+-------------+---------------+----------+
| # | ST | YR | CH | DNAME | DID | uid | NAME | DetailedPID | SimplifiedPID | |
+-------------+----+------+----+-------+-----+-------------+----------+-------------+---------------+----------+
| ak1995l001r | AK | 1995 | 9 | NULL | 004 | AK1995L001 | Robinson | NULL | D | K001803 |
+-------------+----+------+----+-------+-----+-------------+----------+-------------+---------------+----------+
$ csvq -f JSON -i tsv "select * from g"
panic: interface conversion: json.PathExpression is nil, not json.ObjectPath
Adding the -a command-line option makes no difference.
Adding the missing column name resolves the issue, but it would be more helpful if the error message was more informative, and it would be even more helpful if a default column name were (or could be) provided.
My apologies in advance if I've missed something in the documentation.
Thank you.
The --allow-uneven-fields command-line option does not handle jagged input properly, as illustrated by:
cat test.tsv | csvq --allow-uneven-fields -i tsv 'select * from STDIN LIMIT 5;'
panic: runtime error: slice bounds out of range [:2] with capacity 1
where
$ cat test.tsv
a b c
1 2
3
4
The last two lines are both missing two fields.
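What the option presumably intends, sketched in Python: pad short rows to the header's width rather than crash.

```python
import csv
import io

# Pad rows that have fewer fields than the header with empty strings.
def read_uneven(src, delimiter="\t"):
    rows = list(csv.reader(src, delimiter=delimiter))
    width = len(rows[0])
    return [row + [""] * (width - len(row)) for row in rows]

src = io.StringIO("a\tb\tc\n1\t2\n3\n")
print(read_uneven(src))
# [['a', 'b', 'c'], ['1', '2', ''], ['3', '', '']]
```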
Please implement LATERAL subqueries. According to "Add LATERAL Joins or CROSS APPLY to Your SQL Tool Chain", SQL:1999 introduced "lateral derived tables", and many popular relational database systems support lateral subqueries (in some of them, APPLY implements lateral subqueries). In PostgreSQL's LATERAL JOIN, Krzysztof Kempiński explains how LATERAL subqueries work in PostgreSQL 9.3.
For the following recursive CTE query and CSV input file, csvq incorrectly (or inconsistently) right-justifies column TotalQuantity in all but the first row of the result set, but correctly (or consistently) left-justifies the numbers in the first two columns. Should csvq left-justify or right-justify numbers in the columns of a result set?
Query select-running-total-quantity.sql:
WITH ordered_quantities AS (
SELECT *
FROM quantities
ORDER BY Timestamp
),
RECURSIVE total_quantities(Timestamp, Quantity, TotalQuantity) AS (
SELECT
Timestamp,
Quantity,
Quantity
FROM ordered_quantities
WHERE Timestamp = 1
UNION ALL
SELECT
q.Timestamp,
q.Quantity,
q.Quantity + tq.TotalQuantity
FROM quantities q
JOIN total_quantities tq
ON tq.Timestamp = q.Timestamp - 1
)
SELECT *
FROM total_quantities;
Input file quantities.csv:
Timestamp,Quantity
1,123
2,67
3,23
4,6
5,89
6,27
7,105
Result set:
$ csvq --source select-running-total-quantity.sql
+-----------+----------+---------------+
| Timestamp | Quantity | TotalQuantity |
+-----------+----------+---------------+
| 1 | 123 | 123 |
| 2 | 67 | 190 |
| 3 | 23 | 213 |
| 4 | 6 | 219 |
| 5 | 89 | 308 |
| 6 | 27 | 335 |
| 7 | 105 | 440 |
+-----------+----------+---------------+
Notice that in column TotalQuantity only the first row is left-justified, as are all of the numbers in columns Timestamp and Quantity. More importantly, however, the good news is that the result of the query is correct!
Hi,
I have two files: one with a variable number of fields, no header, and delimited by '|' (psv); the other with two fields, delimited by ',', and with a header (csv).
I would like to join them, and I have experimented a little with
SELECT psv.c17 AS Code
, csv.c2 as Reason
FROM CSV('|',unl.psv
, AUTO, no_header) psv,
CSV(',',rejections.csv
) csv
WHERE psv.c17 = csv.c1
but it complains:
invalid argument for CSV: cannot be converted as a no-header value: no_header
I can of course use --no-header on the command line, and it works in my example because csv.c2 cannot equal psv.c17, but I wonder whether this is possible per table.
greetings, el
This is an enhancement request.
The documentation says, "The stdin table loads data from pipe or redirection as a csv data."
I would like to specify the import format for standard input.
$ cat sample.tsv # TSV File
col1 col2
a b
aaaaaaaaaa b
$ od -t x1 sample.tsv
0000000 63 6f 6c 31 09 63 6f 6c 32 0a 61 09 62 0a 61 61
0000020 61 61 61 61 61 61 61 61 09 62 0a
0000033
$ csvq -f FIXED 'SELECT * FROM sample' # The format to be loaded is automatically determined by the file extension
col1 col2
a b
aaaaaaaaaa b
$ cat sample.tsv | csvq -i TSV -f FIXED 'SELECT * FROM STDIN' # For STDIN, the -i option is ignored.
col1 col2
a b
aaaaaaaaaa b
$ cat sample.tsv | tr "\t" , | csvq -f FIXED 'SELECT * FROM STDIN' # STDIN is loaded as a csv data
col1 col2
a b
aaaaaaaaaa b
$
Is it possible to have the (interactive) prompt in color?
Currently CSVQ:/Users/user> is not visibly different from ~/Users/user, but if one could put the CSVQ part of the prompt into a (bold) color, it would be helpful in my view.
greetings, el
This bug will appear in v1.6.2.
There are the following two ways to avoid it:
Specify an option "--color=false".
Run the SET FLAG statement in the pre-load file.
https://mithrandie.github.io/csvq/reference/command.html#configurations
e.g.
$ echo 'SET @@COLOR = false;' > $HOME/.csvqrc
Please implement a datetime function LOCALTIME that accepts a DATETIME and an optional time zone, and translates the DATETIME (which should include a time zone suffix) to the corresponding datetime in the given time zone, or in the default time zone if the time zone argument is absent. If the DATETIME does not include a time zone suffix, LOCALTIME would assume the default time zone.
For example, given the table of DATETIME values
+---------------------------+
| date_and_time |
+---------------------------+
| 2019-01-17T16:00:00Z |
| 2019-01-29T03:57:00-05:00 |
+---------------------------+
the query
SELECT LOCALTIME(date_and_time, 'Asia/Tokyo')
would produce the result
+---------------------------+
| date_and_time |
+---------------------------+
| 2019-01-18T01:00:00+09:00 |
| 2019-01-29T17:57:00+09:00 |
+---------------------------+
SELECT LOCALTIME(date_and_time) with the command-line option --timezone Asia/Tokyo or the statement SET @@TIMEZONE TO "Asia/Tokyo"; would produce the same result. Note that SELECT LOCALTIME(date_and_time, 'UTC') would be equivalent to SELECT UTC(date_and_time).
You can verify these datetime conversions using the Unix date command:
$ TZ=Asia/Tokyo date -f %Y-%m-%dT%H:%M:%S%z -j +%Y-%m-%dT%H:%M:%S%z 2019-01-17T16:00:00+0000
2019-01-18T01:00:00+0900
$ TZ=Asia/Tokyo date -f %Y-%m-%dT%H:%M:%S%z -j +%Y-%m-%dT%H:%M:%S%z 2019-01-29T03:57:00-0500
2019-01-29T17:57:00+0900
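The same conversions, sketched with Python's zoneinfo (assuming the system has tz data installed):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Proposed LOCALTIME: shift a zone-suffixed datetime into another zone.
def localtime(s, tz):
    return datetime.fromisoformat(s).astimezone(ZoneInfo(tz)).isoformat()

print(localtime("2019-01-17T16:00:00+00:00", "Asia/Tokyo"))
# 2019-01-18T01:00:00+09:00
print(localtime("2019-01-29T03:57:00-05:00", "Asia/Tokyo"))
# 2019-01-29T17:57:00+09:00
```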
# github.com/mithrandie/csvq/lib/query
go/src/github.com/mithrandie/csvq/lib/query/encode.go:77:9: assignment mismatch: 2 variables but csv.NewWriter returns 1 values
go/src/github.com/mithrandie/csvq/lib/query/encode.go:149:10: assignment mismatch: 2 variables but fixedlen.NewWriter returns 1 values
go/src/github.com/mithrandie/csvq/lib/query/encode.go:166:10: assignment mismatch: 2 variables but fixedlen.NewWriter returns 1 values
Please enable Discussions in the mithrandie/csvq GitHub site. This would allow discussion of topics that are not issues or to ask questions.
Csvq is a very interesting piece of software!
I'd like to report that from release 1.4.3 to 1.5.0, with the introduction of the new parsing routine for the delimiter character, it became impossible to use a delimiter definition like "\t"
on Windows. This might be because the Windows shell does not use backslash escaping - it does not even have a means of giving a tab character as a process argument other than entering the literal character itself (which can't be done on the command line, and which does not work with 1.5.0 either). Would it be possible to let csvq interpret backslash escaping again, like in 1.4.3, without breaking anything else?
Both types of calling csvq work fine with 1.4.3 on a tab-separated file:
csvq -d "\t" "select * from t limit 10"
csvq -d "<literal tab character>" "select * from t limit 10"
With 1.5.0, both fail.
Best regards - Werner.
Why does SELECT '' result in NULL rather than ''?
$ csvq "SELECT '' AS col1"
+------+
| col1 |
+------+
| NULL |
+------+
For comparison, in both MySQL 5.7 and PostgreSQL 13, SELECT '' produces ''.
Please add a csvq option for setting an SQL variable on the command line. This would be particularly useful for SQL queries specified using the --source option. For example, the command
$ csvq --source select-query.sql --var a=1 --var b=2
would define SQL variables @a and @b and assign them the values 1 and 2, respectively; it would be equivalent to csvq executing the statement VAR @a := 1, @b := 2; before executing SQL source select-query.sql. The source file select-query.sql could then refer to these variables:
SELECT @a AS a, @b AS b;
The earlier csvq example command would thus be equivalent to the query
VAR @a := 1, @b := 2;
SELECT @a AS a, @b AS b;
I could not find anything. While I really like the tool this is really a pain point for me.
Is there support planned?
In the CSV file that it creates, the csvq SQL statement CREATE TABLE does not append a newline character to the header that it generates in the first row:
$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ cat /tmp/test
a,b,c$ rm /tmp/test
Note that the $ prompt immediately follows the last column name in the header instead of appearing on the next line, because CREATE TABLE does not append a newline character to the header row.
As a consequence, should a program other than csvq append a data row to the new table without first appending a newline to the header (or prepending a newline to the data row), the data row will immediately follow the header on the same line:
$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ printf "1,2,3\n" >> /tmp/test
$ cat /tmp/test
a,b,c1,2,3
$ rm /tmp/test
A simple workaround is to append a newline to the new CSV file after CREATE TABLE creates it, but before appending any data rows:
$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ echo >> /tmp/test
$ printf "1,2,3\n" >> /tmp/test
$ cat /tmp/test
a,b,c
1,2,3
$ rm /tmp/test
Another solution would be to prepend each new data row with a newline, but omit the trailing newline:
$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ printf "\n1,2,3" >> /tmp/test
$ printf "\n4,5,6" >> /tmp/test
$ cat /tmp/test
a,b,c
1,2,3
4,5,6$ rm /tmp/test
Consistent with CREATE TABLE, the SQL statement INSERT INTO also follows this convention of prepending each data row with a newline:
$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ csvq --repository /tmp "INSERT INTO test VALUES (1,2,3)"
1 record inserted on "/tmp/test".
Commit: file "/tmp/test" is updated.
$ csvq --repository /tmp "INSERT INTO test VALUES (4,5,6)"
1 record inserted on "/tmp/test".
Commit: file "/tmp/test" is updated.
$ cat /tmp/test
a,b,c
1,2,3
4,5,6$ rm /tmp/test
To avoid this issue altogether, instead of using CREATE TABLE to create the CSV file and write the header row, we could "manually" create the file and write the header row with a trailing newline:
$ printf "a,b,c\n" > /tmp/test
$ printf "1,2,3\n" >> /tmp/test
$ cat /tmp/test
a,b,c
1,2,3
$ rm /tmp/test
There are several disadvantages to this manual approach: notably, the newline convention that the csvq SQL statements CREATE TABLE and INSERT INTO follow, while internally consistent, is not consistent with external shell commands.
In a SELECT UNION query, shouldn't csvq allow the use of the same table name in any or all of its component queries? For example, instead of reporting a duplicate table name error, shouldn't the following query calculate the union of the input rows?
Actual result:
$ printf "a,b\n1,2\n3,4\n" | csvq "SELECT * FROM STDIN UNION ALL SELECT * FROM STDIN"
[L:1 C:45] table name STDIN is a duplicate
Expected result:
$ printf "a,b\n1,2\n3,4\n" | csvq "SELECT * FROM STDIN UNION ALL SELECT * FROM STDIN"
+---+---+
| a | b |
+---+---+
| 1 | 2 |
| 3 | 4 |
| 1 | 2 |
| 3 | 4 |
+---+---+
$ csvq --version
csvq version 1.15.0
Note that this query worked in an earlier version of csvq, but I don't recall exactly which version.
In a simple table like this (filename m100a):
+----+----------+----------+----------+---------+------------+
| Nr | Name | Eintritt | Guthaben | Bezahlt | Kontostand |
+----+----------+----------+----------+---------+------------+
| 1 | Müller | 2005 | Ja | Ja | 1200 |
| 2 | Meier | 2004 | Nein | Ja | 800 |
| 3 | Hansen | 2005 | Ja | Nein | 123.45 |
| 4 | Lee | 2004 | Nein | Nein | -40.05 |
| 5 | Kubert | 2003 | Ja | Ja | 21.85 |
| 6 | Linder | 2005 | Nein | Nein | 81852 |
| 7 | Hubertus | 2005 | Ja | Ja | 1200 |
+----+----------+----------+----------+---------+------------+
I do
csvq "select * from m100a where Guthaben like ('N%') "
+----+--------+----------+----------+---------+------------+
| Nr | Name | Eintritt | Guthaben | Bezahlt | Kontostand |
+----+--------+----------+----------+---------+------------+
| 2 | Meier | 2004 | Nein | Ja | 800 |
| 4 | Lee | 2004 | Nein | Nein | -40.05 |
| 6 | Linder | 2005 | Nein | Nein | 81852 |
+----+--------+----------+----------+---------+------------+
which works. However, if I add the same condition (or another one) with an 'or', then it does not:
csvq "select * from m100a where Guthaben like ('N%') or Guthaben like ('N%')"
Empty RecordSet
I was running csvq and got an out-of-memory error. Is there a way to avoid this? This was the command:
./csvq -f csv -o santafe.csv 'select * from
datos_nomivac_covid19.csv where jurisdiccion_residencia="Santa Fe"'
The file was 1.6 GB.
I was using the latest build, https://github.com/mithrandie/csvq/releases/download/v1.15.1/csvq-v1.15.1-linux-386.tar.gz (note that linux-386 is the 32-bit build, which caps the address space at about 4 GiB; the trace below shows roughly 3.8 GB already in use, so the 64-bit linux-amd64 build might fare better).
Error trace:
runtime: out of memory: cannot allocate 4194304-byte block (3778379776 in use)
fatal error: out of memory
goroutine 33 [running]:
runtime.throw(0x84f84b4, 0xd)
runtime/panic.go:1117 +0x6a fp=0xc15e46a0 sp=0xc15e468c pc=0x807a5ba
runtime.(*mcache).refill(0xf1867538, 0x402)
runtime/mcache.go:164 +0x271 fp=0xc15e46c8 sp=0xc15e46a0 pc=0x805c4e1
runtime.(*mcache).nextFree(0xf1867538, 0x2, 0xffffffff, 0x7f, 0x0)
runtime/malloc.go:882 +0x79 fp=0xc15e46e4 sp=0xc15e46c8 pc=0x8053089
runtime.mallocgc(0x8, 0x84af820, 0x80b0a01, 0x7)
runtime/malloc.go:1069 +0x706 fp=0xc15e472c sp=0xc15e46e4 pc=0x8053926
runtime.newobject(0x84af820, 0x886019c)
runtime/malloc.go:1177 +0x2c fp=0xc15e4740 sp=0xc15e472c pc=0x8053abc
github.com/mithrandie/csvq/lib/value.glob..func1(0x8860190, 0x2)
github.com/mithrandie/csvq/lib/value/pool.go:9 +0x23 fp=0xc15e474c sp=0xc15e4740 pc=0x816ff63
sync.(*Pool).Get(0x8860190, 0xfc299ca8, 0xa)
sync/pool.go:148 +0x85 fp=0xc15e476c sp=0xc15e474c pc=0x80b0815
github.com/mithrandie/csvq/lib/value.getString(...)
github.com/mithrandie/csvq/lib/value/pool.go:32
github.com/mithrandie/csvq/lib/value.NewString(...)
github.com/mithrandie/csvq/lib/value/type.go:51
github.com/mithrandie/csvq/lib/query.readRecordSet.func1(0xc15ba280, 0x660949a9, 0x0, 0xc158c790, 0xc159a1fc, 0xc159a210)
github.com/mithrandie/csvq/lib/query/view.go:1082 +0x2d1 fp=0xc15e47d8 sp=0xc15e476c pc=0x83d1801
runtime.goexit()
runtime/asm_386.s:1315 +0x1 fp=0xc15e47dc sp=0xc15e47d8 pc=0x80a91c1
created by github.com/mithrandie/csvq/lib/query.readRecordSet
github.com/mithrandie/csvq/lib/query/view.go:1071 +0x129
goroutine 1 [semacquire, 1 minutes]:
sync.runtime_Semacquire(0xc159a218)
runtime/sema.go:56 +0x36
sync.(*WaitGroup).Wait(0xc159a210)
sync/waitgroup.go:130 +0x7c
github.com/mithrandie/csvq/lib/query.readRecordSet(0x85a531c, 0xc1427560, 0x85a1a24, 0xc15a0190, 0x660949a9, 0x0, 0x305, 0x305, 0xc142e8c8, 0x1, ...)
github.com/mithrandie/csvq/lib/query/view.go:1129 +0x1af
github.com/mithrandie/csvq/lib/query.loadViewFromCSVFile(0x85a531c, 0xc1427560, 0x85a3f18, 0xc159c2d8, 0xc158e180, 0x8050000, 0x85a6f14, 0xc158c770, 0xc15b5d0c, 0xc158c770, ...)
github.com/mithrandie/csvq/lib/query/view.go:960 +0x15f
github.com/mithrandie/csvq/lib/query.loadViewFromFile(0x85a531c, 0xc1427560, 0xc14a8320, 0x85a3f18, 0xc159c2d8, 0xc158e180, 0xffffffff, 0x2c, 0x0, 0x0, ...)
github.com/mithrandie/csvq/lib/query/view.go:860 +0xec
github.com/mithrandie/csvq/lib/query.cacheViewFromFile(0x85a531c, 0xc1427560, 0xc15a00a0, 0x85a6f14, 0xc158c710, 0x0, 0xffffffff, 0x2c, 0x0, 0x0, ...)
github.com/mithrandie/csvq/lib/query/view.go:834 +0x747
github.com/mithrandie/csvq/lib/query.loadObject(0x85a531c, 0xc1427560, 0xc15a00a0, 0x85a6f14, 0xc158c710, 0xc158c700, 0xc1584080, 0x15, 0x0, 0x0, ...)
github.com/mithrandie/csvq/lib/query/view.go:665 +0x2c7
github.com/mithrandie/csvq/lib/query.loadView(0x85a531c, 0xc1427560, 0xc15a00a0, 0x85a7374, 0xc158e120, 0x0, 0x0, 0x0, 0x0)
github.com/mithrandie/csvq/lib/query/view.go:337 +0xf4d
github.com/mithrandie/csvq/lib/query.LoadView(0x85a531c, 0xc1427560, 0xc15a00a0, 0xc159c258, 0x1, 0x1, 0x0, 0x0, 0x0, 0xffffffff)
github.com/mithrandie/csvq/lib/query/view.go:117 +0x16d
github.com/mithrandie/csvq/lib/query.selectEntity(0x85a531c, 0xc1427560, 0xc15a00a0, 0x85a72ac, 0xc15ba0c0, 0x0, 0x0, 0x0, 0x0)
github.com/mithrandie/csvq/lib/query/query.go:198 +0xe3
github.com/mithrandie/csvq/lib/query.Select(0x85a531c, 0xc1427560, 0xc148c460, 0x0, 0x0, 0x0, 0x85a72ac, 0xc15ba0c0, 0x0, 0x0, ...)
github.com/mithrandie/csvq/lib/query/query.go:126 +0x2c8
github.com/mithrandie/csvq/lib/query.(*Processor).ExecuteStatement(0xc1416ae0, 0x85a531c, 0xc1427560, 0x84df7a0, 0xc15a0050, 0x8479740, 0x859ab10, 0x0)
github.com/mithrandie/csvq/lib/query/processor.go:189 +0x8c8
github.com/mithrandie/csvq/lib/query.(*Processor).execute(0xc1416ae0, 0x85a531c, 0xc1427560, 0xc159c288, 0x1, 0x1, 0xffffffff, 0x0, 0x839e4bb)
github.com/mithrandie/csvq/lib/query/processor.go:94 +0x60
github.com/mithrandie/csvq/lib/query.(*Processor).Execute(0xc1416ae0, 0x85a531c, 0xc1427560, 0xc159c288, 0x1, 0x1, 0x0, 0x1, 0x0)
github.com/mithrandie/csvq/lib/query/processor.go:83 +0xd6
github.com/mithrandie/csvq/lib/action.Run(0x85a531c, 0xc1427560, 0xc1416ae0, 0xfc63e3db, 0x52, 0x0, 0x0, 0xc15840a0, 0x1e, 0x0, ...)
github.com/mithrandie/csvq/lib/action/run.go:59 +0x26d
main.main.func5(0x85a531c, 0xc1427560, 0xc1482160, 0xc1416ae0, 0x0, 0x0)
github.com/mithrandie/csvq/main.go:259 +0x112
main.commandAction.func1(0xc1482160, 0x0, 0x0)
github.com/mithrandie/csvq/main.go:332 +0x24e
github.com/urfave/cli.HandleAction(0x84703e0, 0xc140e770, 0xc1482160, 0x0, 0x0)
github.com/urfave/[email protected]/app.go:490 +0x63
github.com/urfave/cli.(*App).Run(0xc1401420, 0xc1416060, 0x6, 0x6, 0x0, 0x0)
github.com/urfave/[email protected]/app.go:264 +0x51c
main.main()
github.com/mithrandie/csvq/main.go:265 +0x1509
goroutine 7 [syscall, 1 minutes]:
os/signal.signal_recv(0x0)
runtime/sigqueue.go:168 +0x132
os/signal.loop()
os/signal/signal_unix.go:23 +0x1a
created by os/signal.Notify.func1.1
os/signal/signal.go:151 +0x33
goroutine 17 [chan receive, 1 minutes]:
main.commandAction.func1.2(0xc1456240, 0xc1514000, 0xc140e8f0)
github.com/mithrandie/csvq/main.go:317 +0x39
created by main.commandAction.func1
github.com/mithrandie/csvq/main.go:316 +0x1c1
goroutine 34 [chan send]:
github.com/mithrandie/csvq/lib/query.readRecordSet.func2(0x85a531c, 0xc1427560, 0xc159c2e0, 0x85a1a24, 0xc15a0190, 0x660949a9, 0x0, 0xc159a1fc, 0xc15ba280, 0xc159a210)
github.com/mithrandie/csvq/lib/query/view.go:1122 +0x69
created by github.com/mithrandie/csvq/lib/query.readRecordSet
github.com/mithrandie/csvq/lib/query/view.go:1099 +0x1a3
Does anyone know whether csvq is provided via an rpm package?
Is there an rpm spec file available?
Similar to #28, but for rpm-based systems.
Please make csvq available as a Nix package.
Would it be difficult to add another output format similar to TEXT but using Unicode box-drawing characters, i.e. what the "-box" parameter does for sqlite3?
It's just for aesthetics :-)-O
greetings, el
CSVs I get typically have field names with spaces in them. Delimiting the field references with double quotes avoids errors, but I always get an Empty RecordSet. I also tried backticks (à la MySQL), square brackets (à la SQL Server), and single quotes; double quotes were the only option that didn't generate an error, but the SELECT query I tried still always fails. If I replace the spaces in the CSV header and in the query with underscores, everything works as expected. If underscores are a hard requirement, then why don't double quotes generate an error? Or is this just a bug? BTW, love this tool! THANK YOU for creating it! 😃
Hello,
thank you for csvq!
Is it possible to add a feature to ignore unequal column lengths in a CSV file?
csvq -d ',' fields "test.csv"
results in
data parse error in file test.csv: line 2, column 0: wrong number of fields in line
It seems to be described here:
https://stackoverflow.com/questions/34008600/handle-a-specific-error-in-golang
Thank you and best regards,
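For reference, Go's standard library CSV reader exposes exactly this switch (csvq ships its own reader, so this is only an analogy, not csvq's code). A sketch with a hypothetical `readLoose` helper, assuming comma-separated input:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readLoose reads CSV data while tolerating records whose field count
// differs from the first record. Setting FieldsPerRecord to -1 disables
// the field-count check that otherwise raises "wrong number of fields".
func readLoose(data string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // accept a varying number of fields per record
	return r.ReadAll()
}

func main() {
	// Three records with 3, 2, and 4 fields — no parse error.
	recs, err := readLoose("a,b,c\n1,2\n3,4,5,6\n")
	fmt.Println(len(recs), err) // 3 <nil>
}
```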
The csvq 1.11.6 date function DATETIME reports the incorrect time zone "-04:00" instead of "-05:00" when casting the string "2016-04-14 09:03:00" to a datetime:
$ date --iso-8601=seconds
2019-11-13T00:02:35-05:00
$ date '+%z'
-0500
$ csvq "SELECT DATETIME('2016-04-14 09:03:00') AS DateTime"
+----------------------------+
| DateTime |
+----------------------------+
| 2016-04-14T09:03:00-04:00 |
+----------------------------+
$ csvq --datetime-format "%Y-%m-%d %H:%i" "SELECT DATETIME('2016-04-14 09:03') AS DateTime"
+----------------------------+
| DateTime |
+----------------------------+
| 2016-04-14T09:03:00-04:00 |
+----------------------------+
However, the date function NOW reports the correct time zone:
$ csvq "SELECT NOW() AS Now"
+-----------------------------------+
| Now |
+-----------------------------------+
| 2019-11-13T00:09:01.273649-05:00 |
+-----------------------------------+
DATETIME does not seem to recognize that during Daylight Saving Time (DST), the time zone in Montreal is Eastern Daylight Time (EDT), or "-04:00", but outside DST, the time zone is Eastern Standard Time (EST), or "-05:00".
Note that I tested this on Ubuntu 18.04.3 LTS in Windows Subsystem for Linux on Windows 10 and Ubuntu Server 18.04.3 LTS in a virtual machine in VirtualBox:
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
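For reference, Go's standard time package (which csvq builds on) resolves the UTC offset from the zone database for the specific date being parsed, not for the current date; 2016-04-14 falls inside DST, so it resolves to the daylight offset. A sketch with a hypothetical `offsetFor` helper:

```go
package main

import (
	"fmt"
	"time"
)

// offsetFor parses a wall-clock datetime string in the given IANA time
// zone and returns the UTC offset the zone database assigns to that
// specific date.
func offsetFor(s, zone string) (string, error) {
	loc, err := time.LoadLocation(zone)
	if err != nil {
		return "", err
	}
	t, err := time.ParseInLocation("2006-01-02 15:04:05", s, loc)
	if err != nil {
		return "", err
	}
	return t.Format("-07:00"), nil
}

func main() {
	// 2016-04-14 falls inside DST, so the daylight offset applies.
	edt, _ := offsetFor("2016-04-14 09:03:00", "America/Montreal")
	fmt.Println(edt) // -04:00
	// 2016-01-14 falls outside DST, so the standard offset applies.
	est, _ := offsetFor("2016-01-14 09:03:00", "America/Montreal")
	fmt.Println(est) // -05:00
}
```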
Given a string that contains multiple lower case names (words), how might I capitalize only the initial letter of these names?
I think that the existing set of string functions in csvq cannot perform this operation. However, this can be done by applying a PCRE2 regular expression substitution which converts all initials to upper case.
Unfortunately, according to Regular Expressions 101, Golang does not support the PCRE2 substitution operator \u, so I presume that the new regular expression function REGEXP_REPLACE in csvq v1.14.1-pr.1 likewise cannot perform this operation. If this is the case, would you consider implementing a specific string function, or a generic set of string functions, to capitalize all words in a string?
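The operation itself is straightforward in plain Go (csvq's implementation language); this is only a sketch of the logic, with a hypothetical `titleWords` helper, not csvq syntax:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// titleWords upper-cases the first letter of each space-separated word,
// the operation the existing csvq string functions cannot perform.
func titleWords(s string) string {
	words := strings.Fields(s)
	for i, w := range words {
		r := []rune(w)
		r[0] = unicode.ToUpper(r[0]) // Fields never yields empty strings
		words[i] = string(r)
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(titleWords("jane de la cruz")) // Jane De La Cruz
}
```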
The ORDER BY sort is not stable, meaning that for rows that share the same sort key, ORDER BY in csvq does not preserve the input order of those rows in the output. This is a problem, for example, for applications involving financial transactions, where the temporal order of transactions must be preserved.
$ cat test.csv | sort --field-separator=, --key=1g > test-sort-unstable-unix.csv
$ cat test.csv | sort --field-separator=, --key=1g --stable > test-sort-stable-unix.csv
$ cat test.csv | mlr --csv sort -f timestamp > test-sort-mlr.csv
$ cat test.csv | csvq --format CSV "SELECT * FROM STDIN ORDER BY timestamp" > test-sort-csvq.csv
$ diff --brief --report-identical-files test-sort-stable-unix.csv test-sort-unstable-unix.csv
Files test-sort-stable-unix.csv and test-sort-unstable-unix.csv differ
$ diff --side-by-side --brief --report-identical-files test-sort-stable-unix.csv test-sort-mlr.csv
Files test-sort-stable-unix.csv and test-sort-mlr.csv are identical
$ diff --side-by-side --brief --report-identical-files test-sort-stable-unix.csv test-sort-csvq.csv
Files test-sort-stable-unix.csv and test-sort-csvq.csv differ
$ diff --side-by-side test-sort-stable-unix.csv test-sort-unstable-unix.csv
.
.
.
1463273940,B,0.04629056 <
1463273940,B,0.0462813 1463273940,B,0.0462813
> 1463273940,B,0.04629056
.
.
.
$ diff --side-by-side test-sort-stable-unix.csv test-sort-unstable-unix.csv
.
.
.
1454175635.753,J,-0.17188999 <
1454175635.753,J,-0.35 1454175635.753,J,-0.35
1454175635.753,J,-0.5 <
1454175635.753,J,-0.64773132 1454175635.753,J,-0.64773132
> 1454175635.753,J,-0.5
> 1454175635.753,J,-0.17188999
.
.
.
1463273940,B,0.04629056 <
1463273940,B,0.0462813 1463273940,B,0.0462813
> 1463273940,B,0.04629056
.
.
.
$
Note that sort --stable from GNU coreutils and mlr sort produce stable sort results, but ORDER BY in csvq changes the order of rows that share the same key.
(The input data is available in test.csv.)
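For reference, Go's sort package provides both behaviours: a stable sort keeps equal-keyed rows in their input order. A minimal sketch with a hypothetical row type and `sortStable` helper (plain string comparison on the key column, not csvq's actual comparison logic):

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical two-column record: a sort key and a payload.
type row struct {
	timestamp string
	amount    string
}

// sortStable orders rows by timestamp while preserving the input order
// of rows that share the same timestamp — the behaviour this issue
// asks ORDER BY to have (sort.Slice would not guarantee it).
func sortStable(rows []row) {
	sort.SliceStable(rows, func(i, j int) bool {
		return rows[i].timestamp < rows[j].timestamp
	})
}

func main() {
	rows := []row{
		{"1463273940", "0.04629056"},
		{"1463273940", "0.0462813"},
	}
	sortStable(rows)
	// Equal keys keep their input order: 0.04629056 stays before 0.0462813.
	fmt.Println(rows)
}
```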