CSV Handling on Linux
echo "column_a,column_b,column_c,column_d
1,foo,10,bar
2,baz,20,qux
3,quux,30,corge
4,grault,40,garply" > output.csv
column --separator "," --table output.csv
Command Pipeline
# For each row, sum the values of column_a and column_c, drop column_b, and output the result.
# The pipeline skips the CSV header, sums `column_a` and `column_c` per row alongside `column_d`, and prepends a new header row. It streams line by line with no intermediate files, so it handles large CSV files efficiently.
tail --lines=+2 output.csv | awk --field-separator ',' '{sum=$1+$3; print sum","$4}' OFS="," | sed '1i\sum_a_plus_c,column_d'
Pretty-Printed Output
tail --lines=+2 output.csv | awk --field-separator ',' '{sum=$1+$3; print sum","$4}' OFS="," | sed '1i\sum_a_plus_c,column_d' | column --separator "," --table
Here's a breakdown of what each command does:

- `tail --lines=+2 output.csv`: skips the header row of the CSV file; `--lines=+2` (equivalent to `-n +2`) starts reading from the second line.
- `awk --field-separator ',' '{sum=$1+$3; print sum","$4}' OFS=","`: processes each line. `--field-separator ','` (equivalent to `-F,`) sets the input field separator to a comma; `{sum=$1+$3; print sum","$4}` calculates the sum of `column_a` (`$1`) and `column_c` (`$3`), and prints the result along with `column_d` (`$4`). `OFS=","` sets the Output Field Separator to a comma; it is redundant here since the comma is printed explicitly, but it keeps the output in CSV form if the print statement is later changed to use field separators.
- `sed '1i\sum_a_plus_c,column_d'`: inserts a new header row; the `1i` command inserts the given text before the first line.
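The three-stage pipeline above can also be collapsed into a single awk invocation; here is a sketch in which `NR==1` stands in for both the `tail` header skip and the `sed` header insertion:

```shell
# sample input (same content as output.csv above)
printf 'column_a,column_b,column_c,column_d\n1,foo,10,bar\n2,baz,20,qux\n3,quux,30,corge\n4,grault,40,garply\n' > output.csv

# one-pass equivalent of the tail | awk | sed pipeline:
# on line 1 print the new header; on data rows print $1+$3 and $4
awk -F, 'NR==1 {print "sum_a_plus_c,column_d"; next} {print $1+$3 "," $4}' output.csv
```

Against the sample data this prints the new header followed by `11,bar`, `22,qux`, `33,corge`, and `44,garply`.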
Compare CSV Files
# e.g. naive line-by-line comparison (order-sensitive)
diff file1.csv file2.csv
# e.g. compare after sorting both files on the first field
diff <(sort -t, -k1,1 file1.csv) <(sort -t, -k1,1 file2.csv)
# or show only the lines unique to each file
comm -3 <(sort -t, -k1,1 file1.csv) <(sort -t, -k1,1 file2.csv)
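As a quick illustration on two small hypothetical files (`f1.csv` and `f2.csv` are made up here), `comm -3` prints lines unique to the first file flush left and lines unique to the second file indented by a tab:

```shell
# two hypothetical files differing in one row
printf 'a,1\nb,2\nc,3\n' > f1.csv
printf 'a,1\nb,9\nc,3\n' > f2.csv

# comm requires sorted input; -3 suppresses lines common to both files
comm -3 <(sort f1.csv) <(sort f2.csv)
# prints "b,2" (only in f1.csv) and a tab-indented "b,9" (only in f2.csv)
```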
Transform each file, sort the results, and compare the sorted results:
comm -3 <(tail --lines=+2 file1.csv | awk -F ',' '{sum=$1+$3; print sum","$4}' | sed '1i\sum_a_plus_c,column_d' | sort) \
<(tail --lines=+2 file2.csv | awk -F ',' '{sum=$1+$3; print sum","$4}' | sed '1i\sum_a_plus_c,column_d' | sort)
- `tail --lines=+2 file1.csv`: skip the header row in `file1.csv`.
- `awk -F ',' '{sum=$1+$3; print sum","$4}'`: compute the sum of the first and third columns, then print the sum and the fourth column, separated by a comma.
- `sed '1i\sum_a_plus_c,column_d'`: add a new header row with the specified column names.
- `sort`: sort the output so `comm` can compare it. Note that the inserted header is sorted in with the data; since both files get the identical header line, `comm -3` suppresses it.
- Repeat the same transformation for `file2.csv`.
- `comm -3`: compare the sorted results, showing lines unique to each file.
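Putting it together, here is a sketch on two hypothetical inputs that differ in a single `column_c` value (the header-insertion step is dropped for brevity, since identical headers would be suppressed by `comm -3` anyway, and `LC_ALL=C` pins the sort order):

```shell
# hypothetical inputs: file2.csv has 25 instead of 20 in column_c
printf 'column_a,column_b,column_c,column_d\n1,foo,10,bar\n2,baz,20,qux\n' > file1.csv
printf 'column_a,column_b,column_c,column_d\n1,foo,10,bar\n2,baz,25,qux\n' > file2.csv

# transform: skip the header, sum column_a+column_c, keep column_d, sort for comm
transform() {
  tail --lines=+2 "$1" | awk -F ',' '{print $1+$3 "," $4}' | LC_ALL=C sort
}

# lines unique to file1 appear flush left; lines unique to file2 are tab-indented
LC_ALL=C comm -3 <(transform file1.csv) <(transform file2.csv)
# prints "22,qux" (from file1) and a tab-indented "27,qux" (from file2)
```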
CREATE TABLE information (
    id SERIAL PRIMARY KEY,
    sum_a_plus_c INT,
    column_d TEXT
);
# data.csv is assumed to hold the transformed CSV (a header row plus sum_a_plus_c,column_d rows)
psql --dbname=postgres://postgres:123456@localhost:5432/test \
    --command "COPY information(sum_a_plus_c, column_d) FROM STDIN WITH (FORMAT csv, DELIMITER ',', HEADER true);" < data.csv
psql --dbname=postgres://postgres:123456@localhost:5432/test \
--command "select * from information where id >= 1;" --csv
or
psql --dbname=postgres://postgres:123456@localhost:5432/test \
--command "COPY (select * from information where id >= 1) TO STDOUT WITH CSV DELIMITER ','"