
artyom-poptsov / guile-dsv

Stars: 16, watchers: 4, forks: 3, size: 509 KB

Delimiter-separated values (DSV) format parser for GNU Guile.

License: GNU General Public License v3.0

Emacs Lisp 0.33% Makefile 7.97% Shell 6.16% Scheme 70.93% M4 14.61%
guile scheme lisp dsv csv rfc-4180 parser

guile-dsv's Introduction

Guile-DSV


Guile-DSV is a GNU Guile module for working with the delimiter-separated values (DSV) data format.

Guile-DSV supports both the Unix-style DSV format and the RFC 4180 (CSV) format.
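To illustrate the Unix-style format, here is a minimal sketch of splitting one line into fields. It assumes the common Unix convention in which a backslash escapes the next character (so "\:" is a literal colon rather than a separator). split-unix-dsv-line is a hypothetical helper written for this example, not part of the Guile-DSV API.

```scheme
;; Hypothetical helper (not Guile-DSV API): split one Unix-style DSV line.
;; Assumption: a backslash escapes the following character.
(define (split-unix-dsv-line line delim)
  (let loop ((chars (string->list line))
             (field '())    ; characters of the current field, reversed
             (fields '()))  ; completed fields, reversed
    (cond
     ((null? chars)
      (reverse (cons (list->string (reverse field)) fields)))
     ((and (char=? (car chars) #\\) (pair? (cdr chars)))
      ;; Escaped character: keep it literally, drop the backslash.
      (loop (cddr chars) (cons (cadr chars) field) fields))
     ((char=? (car chars) delim)
      ;; Unescaped delimiter: close the current field.
      (loop (cdr chars) '() (cons (list->string (reverse field)) fields)))
     (else
      (loop (cdr chars) (cons (car chars) field) fields)))))

(split-unix-dsv-line "root:x:0:0:root:/root:/bin/bash" #\:)
;; => ("root" "x" "0" "0" "root" "/root" "/bin/bash")
(split-unix-dsv-line "a\\:b:c" #\:)
;; => ("a:b" "c")
```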

Guile-DSV also ships with a program named dsv (source code: utils/dsv.in) that reads and processes DSV data, including changing delimiters and converting from one format to another.

Note that if you want to use Guile-DSV in an environment where syslog is unavailable, you must set the log-driver option of dsv->scm to “file” or “none” to prevent it from trying to log messages to syslog. See the Texinfo documentation for details.

Requirements

Build dependencies

  • Texinfo (provides the makeinfo tool required to build the documentation in Texinfo format)
  • TeX Live (also needed for building the documentation)
  • help2man

Installation

GNU Guix

$ guix install guile-dsv

Manual

$ git clone https://github.com/artyom-poptsov/guile-dsv.git
$ cd guile-dsv
$ autoreconf -vif
$ ./configure --prefix=/usr
$ make -j$(nproc)
$ sudo make install

For a basic explanation of the installation of the package, see the INSTALL file.

Please note that you will need Automake 1.12 or later to run the self-tests with make check (the library itself can be built with an older Automake version such as 1.11).

Important: you probably want to call configure with the --with-guilesitedir option so that this package is installed in Guile’s default load path. If you don’t know where your Guile site directory is, run configure without the option and it will suggest one.
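Guile can also report its site directory itself through the built-in %site-dir procedure. Using that path as the value of --with-guilesitedir is an assumption about what the option expects, so compare it with the suggestion configure prints.

```scheme
;; Print the directory where this Guile installation keeps site-specific
;; Scheme modules.  From a shell: guile -c '(display (%site-dir)) (newline)'
(display (%site-dir))
(newline)
```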

dsv tool

Options

$ dsv --help
Usage: dsv [options] [file]

The default behavior of the program is to print a formatted table from a
<file> to stdout.  The options listed below can be used to change or modify
this behavior.

When no <file> is provided, dsv reads data from stdin.

Options:
  --help, -h                 Print this message and exit.
  --summary, -s              Print summary information for a file.
  --delimiter, -D <delim>    Set a delimiter.
  --guess-delimiter, -d      Guess a file delimiter and print the result.
  --number, -n               Number rows and columns.
  --width, -w <width>        Wrap long lines of text inside cells to fit the table
                             into the specified width.  If width is specified as
                             "auto" (default value) then current terminal width
                             is used.
                             When the required width is too small for the table
                             wrapping, an error will be issued.
                             Zero width means no wrapping so the table might not
                             fit into the screen.
  --map-cell, -m <code>      Apply an arbitrary Scheme code on each cell value
                             before printing.
                             There are three variables that can be used in the code:
                             - $value -- current cell value.
                             - $row   -- current row number
                             - $col   -- current column number.

                             Code examples:
                             '(if (> $value 0) $value 0)'
                             '(string-append "\"" $value "\"")'

                             Note that the code must return a string, that in turn
                             will be printed in a cell.

  --filter-row, -f <code>    Keep only rows for which CODE returns #t.
                             There are two variables that can be used in the code:
                             - $value -- current row content.
                             - $row   -- current row number.

                             For example with this code Guile-DSV keeps only rows
                             that are 5 columns in length:
                             '(= (length $value) 5)'

  --filter-column, -c <procedure>
                             Keep only columns for which PROCEDURE returns #t.
                             There are two variables that can be used in the code:
                             - $value -- current column content as a list.
                             - $col   -- current column number.

                             For example with this code Guile-DSV keeps only the 2nd
                             column from the input data:
                              '(= $col 2)'

  --file-format, -F <fmt>    Set a file format.  Possible formats are:
                             "unix" (default), "rfc4180"
  --with-header, -H          Use the first row of a table as a header when
                             printing the table to the screen.
  --table-borders, -b <spec> Set table borders for printing.  The value can be
                             either a borders specification or a preset name.

                             Spec can be a comma-separated list of key=value
                             pairs that specify the table style.  The list of
                             possible keys can be found below
                             (see "Table parameters".)

                             Also a table preset name can be used as the value.
                             See "Table presets" below.

                             Table preset parameters can be overridden by specifying
                             extra parameters after the preset name.  E.g.:
                               "graphic,bs=3;31"

                             Example values:
                               - "v=|,h=-,j=+"
                               - org

  --table-presets-path <path>
                             Set the table preset path.
                             This option can be also set by
                              "GUILE_DSV_TABLE_PRESETS_PATH" environment
                             variable.
                             Default value: /gnu/store/448pzfcwaaa8smrrdbn1shmk45s7agwx-guile-dsv-git/share/guile-dsv/presets/
  --to, -t <fmt>             Convert a file to a specified format, write
                             the result to stdout.
  --to-delimiter, -T <delim> Convert delimiters to the specified variant.
                             When this option is not used, default delimiters
                             for the chosen output format will be used.
  --version                  Print information about Guile-DSV version.
  --debug                    Enable state machine debugging.

Table parameters:
  bt   border-top                The top border.
  btl  border-top-left           The top left corner.
  btr  border-top-right          The top right corner.
  btj  border-top-joint          The top border joint.
  bl   border-left               The left table border.
  blj  border-left-joint         The left table border joint.
  br   border-right              The right table border.
  brj  border-right-joint        The right table border joint.
  bb   border-bottom             The bottom border.
  bbl  border-bottom-left        The left corner of the bottom border.
  bbr  border-bottom-right       The right corner of the bottom border.
  bbj  border-bottom-joint       The bottom border joint.
  bs   border-style              The style of the borders ("fg;bg".)
  ts   text-style                The text style ("fg;bg".)
  s    shadow                    The table shadow.
  so   shadow-offset             The table shadow offset in format "x;y" (e.g. "2;2".)
  ss   shadow-style              The style of the shadow ("fg;bg".)
  rs   row-separator             The table row separator.
  rj   row-joint                 The row joint.
  cs   column-separator          The table column separator
  hs   header-style              The header style ("fg;bg".)
  ht   header-top                The header top border.
  htl  header-top-left           The header top left border.
  htr  header-top-right          The header top right border.
  htj  header-top-joint          The header top joint.
  hl   header-left               The header left border.
  hr   header-right              The header right border.
  hcs  header-column-separator   The header column separator.
  hb   header-bottom             The header bottom border.
  hbl  header-bottom-left        The header bottom left corner.
  hbr  header-bottom-right       The header bottom right border.
  hbj  header-bottom-joint       The header bottom joint.

Table presets:
  ascii
  graphic-bold
  graphic-double
  graphic
  graphic-with-shadow
  markdown
  org

Print DSV files

To show Unix-style DSV files in a human-readable form, just invoke the tool like this:

$ head -4 /etc/passwd | dsv
 root    x  0  0  root    /root      /bin/bash         
 daemon  x  1  1  daemon  /usr/sbin  /usr/sbin/nologin 
 bin     x  2  2  bin     /bin       /usr/sbin/nologin 
 sys     x  3  3  sys     /dev       /usr/sbin/nologin

Show a DSV file as a fancy table with custom borders:

$ head -4 /etc/passwd | dsv -b "rs=-,cs=|,rj=+"
 root   | x | 0 | 0 | root   | /root     | /bin/bash         
--------+---+---+---+--------+-----------+-------------------
 daemon | x | 1 | 1 | daemon | /usr/sbin | /usr/sbin/nologin 
--------+---+---+---+--------+-----------+-------------------
 bin    | x | 2 | 2 | bin    | /bin      | /usr/sbin/nologin 
--------+---+---+---+--------+-----------+-------------------
 sys    | x | 3 | 3 | sys    | /dev      | /usr/sbin/nologin

The same output but with box-drawing characters:

$ head -4 /etc/passwd | dsv -b "rs=─,cs=│,rj=┼"
 root   │ x │ 0 │ 0 │ root   │ /root     │ /bin/bash         
────────┼───┼───┼───┼────────┼───────────┼───────────────────
 daemon │ x │ 1 │ 1 │ daemon │ /usr/sbin │ /usr/sbin/nologin 
────────┼───┼───┼───┼────────┼───────────┼───────────────────
 bin    │ x │ 2 │ 2 │ bin    │ /bin      │ /usr/sbin/nologin 
────────┼───┼───┼───┼────────┼───────────┼───────────────────
 sys    │ x │ 3 │ 3 │ sys    │ /dev      │ /usr/sbin/nologin
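A border spec such as "rs=-,cs=|,rj=+" is a comma-separated list of key=value pairs, optionally led by a preset name. The sketch below shows one way to turn such a spec into an association list; parse-border-spec is a hypothetical helper, not the parser dsv uses, and it does not handle values that themselves contain a comma.

```scheme
;; Hypothetical sketch (not the dsv parser): turn "rs=-,cs=|,rj=+" into
;; ((rs . "-") (cs . "|") (rj . "+")).  A bare word without "=" is
;; treated as a preset name, e.g. "graphic,bs=3;31".
(define (parse-border-spec spec)
  (map (lambda (item)
         (let ((i (string-index item #\=)))
           (if i
               (cons (string->symbol (substring item 0 i))
                     (substring item (+ i 1)))
               (cons 'preset item))))
       (string-split spec #\,)))

(parse-border-spec "rs=-,cs=|,rj=+")
;; => ((rs . "-") (cs . "|") (rj . "+"))
```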

Table presets

Table presets draw tables with predefined border styles. Some examples:

ascii

$ echo -e "a,b,c\na1,b1,c1\na2,b2,c2\n" | dsv -b "ascii"
.--------------.
| a  | b  | c  |
|----+----+----|
| a1 | b1 | c1 |
|----+----+----|
| a2 | b2 | c2 |
'--------------'

$ echo -e "a,b,c\na1,b1,c1\na2,b2,c2\n" | dsv -b "ascii" --with-header
.--------------.
| a  | b  | c  |
|====+====+====|
| a1 | b1 | c1 |
|----+----+----|
| a2 | b2 | c2 |
'--------------'

graphic

$ echo -e "a,b,c\na1,b1,c1\na2,b2,c2\n" | dsv -b "graphic"
┌────┬────┬────┐
│ a  │ b  │ c  │
├────┼────┼────┤
│ a1 │ b1 │ c1 │
├────┼────┼────┤
│ a2 │ b2 │ c2 │
└────┴────┴────┘
$ echo -e "a,b,c\na1,b1,c1\na2,b2,c2\n" | dsv -b "graphic-bold"
┏━━━━┳━━━━┳━━━━┓
┃ a  ┃ b  ┃ c  ┃
┣━━━━╋━━━━╋━━━━┫
┃ a1 ┃ b1 ┃ c1 ┃
┣━━━━╋━━━━╋━━━━┫
┃ a2 ┃ b2 ┃ c2 ┃
┗━━━━┻━━━━┻━━━━┛
$ echo -e "a,b,c\na1,b1,c1\na2,b2,c2\n" | dsv -b "graphic-double"
╔════╦════╦════╗
║ a  ║ b  ║ c  ║
╠════╬════╬════╣
║ a1 ║ b1 ║ c1 ║
╠════╬════╬════╣
║ a2 ║ b2 ║ c2 ║
╚════╩════╩════╝

org

This preset generates Org mode tables from CSV/DSV data.

$ echo -e "a,b,c\na1,b1,c1\na2,b2,c2\n" | dsv -b "org"
| a  | b  | c  |
| a1 | b1 | c1 |
| a2 | b2 | c2 |
$ echo -e "a,b,c\na1,b1,c1\na2,b2,c2\n" | dsv -b "org" --with-header
| a  | b  | c  |
|----+----+----|
| a1 | b1 | c1 |
| a2 | b2 | c2 |
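The org output above is simple enough to reproduce by hand: pad each cell to its column width, join cells with " | ", and, when a header is requested, add a separator made of dashes joined by "+". The sketch below is a hypothetical reimplementation for illustration, not the dsv renderer, and assumes a non-empty table whose rows all have the same number of cells.

```scheme
;; Hypothetical sketch of the org preset (not the dsv renderer).
(define (org-widths rows)
  ;; Width of each column = length of its longest cell.
  (apply map (lambda cells (apply max (map string-length cells))) rows))

(define (pad-right s width)
  (string-append s (make-string (- width (string-length s)) #\space)))

(define (org-row cells widths)
  (string-append "| " (string-join (map pad-right cells widths) " | ") " |"))

(define (org-separator widths)
  ;; Each segment covers the cell plus its two padding spaces.
  (string-append "|"
                 (string-join (map (lambda (w) (make-string (+ w 2) #\-)) widths)
                              "+")
                 "|"))

(define (rows->org rows with-header?)
  (let* ((widths (org-widths rows))
         (render (lambda (row) (org-row row widths))))
    (string-join
     (if with-header?
         (cons (render (car rows))
               (cons (org-separator widths) (map render (cdr rows))))
         (map render rows))
     "\n")))

(display (rows->org '(("a" "b" "c") ("a1" "b1" "c1") ("a2" "b2" "c2")) #t))
(newline)
```

Run with the header flag, this prints the same four lines as the --with-header example above.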

Guessing the delimiter for a file

$ dsv -d /etc/passwd
:
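One simple way to guess a delimiter is to look for a character that occurs the same, non-zero number of times on every line. The sketch below uses that heuristic; it is an assumption made for illustration, not the algorithm behind dsv -d, and guess-delimiter-naive and its candidate list are hypothetical.

```scheme
(use-modules (srfi srfi-1))   ; find, every

;; Hypothetical heuristic (not the dsv -d algorithm): return the first
;; candidate that occurs the same non-zero number of times on every line,
;; or #f when no candidate qualifies.
(define candidate-delimiters '(#\: #\, #\; #\tab #\|))

(define (guess-delimiter-naive lines)
  (find (lambda (delim)
          (let ((counts (map (lambda (line) (string-count line delim))
                             lines)))
            (and (pair? counts)
                 (> (car counts) 0)
                 (every (lambda (n) (= n (car counts))) counts))))
        candidate-delimiters))

(guess-delimiter-naive
 '("root:x:0:0:root:/root:/bin/bash"
   "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin"))
;; => #\:
```

A practical guesser also has to cope with quoted fields and lines of uneven length, which this sketch ignores.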

Getting the summary for a CSV/DSV file

$ dsv -s /etc/passwd
File:      /etc/passwd
Format:    unix
Delimiter: ':' (0x3a)
Records:   50

column       width       
1            19          
2            1           
3            5           
4            5           
5            34          
6            26          
7            17
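The "width" column of the summary is the length of the longest cell in each column. A minimal sketch of that computation, tolerating rows of different lengths, could look like this; summary-widths and print-summary are hypothetical names, and the output layout only imitates the summary above.

```scheme
(use-modules (srfi srfi-1)     ; fold
             (ice-9 format))   ; padded ~13a directive

;; Hypothetical sketch: a column's width is the length of its longest
;; cell.  Rows may differ in length; a missing cell counts as width 0.
(define (summary-widths rows)
  (fold (lambda (row widths)
          (let* ((lens (map string-length row))
                 (n (max (length lens) (length widths)))
                 (extend (lambda (l) (append l (make-list (- n (length l)) 0)))))
            (map max (extend lens) (extend widths))))
        '()
        rows))

(define (print-summary rows)
  (format #t "Records:   ~a~%~%" (length rows))
  (format #t "~13a~a~%" "column" "width")
  (let loop ((widths (summary-widths rows)) (i 1))
    (unless (null? widths)
      (format #t "~13a~a~%" i (car widths))
      (loop (cdr widths) (+ i 1)))))

(print-summary '(("root" "x" "0") ("daemon" "x" "1") ("bin" "x" "2")))
```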

Converting files between formats

From Unix DSV to RFC4180:

$ dsv -t rfc4180 /etc/passwd | head -4
root,x,0,0,root,/root,/bin/bash
daemon,x,1,1,daemon,/usr/sbin,/usr/sbin/nologin
bin,x,2,2,bin,/bin,/usr/sbin/nologin
sys,x,3,3,sys,/dev,/usr/sbin/nologin
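None of the /etc/passwd fields above need quoting, but RFC 4180 requires a field to be enclosed in double quotes when it contains a comma, a double quote, or a line break, and an embedded double quote to be written twice. The following is a minimal sketch of that rule; the helper names are hypothetical and this is not the Guile-DSV writer.

```scheme
;; Hypothetical sketch of RFC 4180 field quoting (not the Guile-DSV writer).
(define (needs-quoting? field)
  (string-any (lambda (c) (memv c '(#\, #\" #\return #\newline))) field))

(define (double-quotes field)
  ;; Every embedded double quote becomes two double quotes.
  (string-concatenate
   (map (lambda (c) (if (char=? c #\") "\"\"" (string c)))
        (string->list field))))

(define (rfc4180-field field)
  (if (needs-quoting? field)
      (string-append "\"" (double-quotes field) "\"")
      field))

;; Join one record's fields with commas.  RFC 4180 ends each record with
;; CRLF; the line terminator is left to the caller here.
(define (rfc4180-record fields)
  (string-join (map rfc4180-field fields) ","))

(rfc4180-record '("root" "x" "0" "0" "root" "/root" "/bin/bash"))
;; => "root,x,0,0,root,/root,/bin/bash"
(rfc4180-record '("say \"hi\"" "a,b"))
;; => "\"say \"\"hi\"\"\",\"a,b\""
```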

Convert delimiters:

$ dsv -t unix -T "|" /etc/passwd | head -4
root|x|0|0|root|/root|/bin/bash
daemon|x|1|1|daemon|/usr/sbin|/usr/sbin/nologin
bin|x|2|2|bin|/bin|/usr/sbin/nologin
sys|x|3|3|sys|/dev|/usr/sbin/nologin

Apply arbitrary Scheme code to each cell of a table

Wrap each table value in double quotes:

$ dsv -m '(string-append "\"" $value "\"")' /etc/group
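Conceptually, -m evaluates the given code once per cell with $value, $row and $col bound. The sketch below expresses that idea with the code written as a procedure of those three arguments; map-cells is a hypothetical helper, and numbering rows and columns from 0 is an assumption rather than documented dsv behaviour.

```scheme
(use-modules (srfi srfi-1))   ; iota

;; Hypothetical sketch: call PROC on every cell with its value, row number
;; and column number (both counted from 0 here, an assumption).
(define (map-cells proc table)
  (map (lambda (row r)
         (map (lambda (cell c) (proc cell r c))
              row
              (iota (length row))))
       table
       (iota (length table))))

;; The quoting transformation from the dsv -m example, as a procedure.
(map-cells (lambda ($value $row $col) (string-append "\"" $value "\""))
           '(("root" "x") ("bin" "x")))
;; => (("\"root\"" "\"x\"") ("\"bin\"" "\"x\""))
```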

Table filtering

Remove the 2nd row from a table:

$ dsv -f '(not (= $row 1))' /etc/passwd

Remove the 2nd column from a table:

$ dsv -c '(not (= $col 1))' /etc/passwd
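A rough sketch of what the two filters do, with the filter code written as a procedure of (content index). filter-rows, filter-columns and transpose are hypothetical helpers; counting from 0 follows the row example above and is assumed for columns as well. Column filtering transposes the table, so rows are assumed to have equal length.

```scheme
(use-modules (srfi srfi-1))   ; filter-map, iota

;; Keep the rows for which (keep? row index) returns true.
(define (filter-rows keep? table)
  (filter-map (lambda (row r) (and (keep? row r) row))
              table
              (iota (length table))))

;; Turn rows into columns and back; an empty table stays empty.
(define (transpose table)
  (if (null? table) '() (apply map list table)))

;; Keep the columns for which (keep? column index) returns true.
(define (filter-columns keep? table)
  (transpose (filter-rows keep? (transpose table))))

(filter-rows (lambda ($value $row) (not (= $row 1)))
             '(("a" "b" "c") ("1" "2" "3")))
;; => (("a" "b" "c"))
(filter-columns (lambda ($value $col) (not (= $col 1)))
                '(("a" "b" "c") ("1" "2" "3")))
;; => (("a" "c") ("1" "3"))
```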

guile-dsv's People

Contributors

a-sassmannshausen, artyom-poptsov


guile-dsv's Issues

If a rfc4180 formatted CSV file has lines that end with a '"' (quotation mark), the parser goes funny

To reproduce:

  • last field quotation with closing new line makes parser explode
    (dsv-string->scm "test,hello,\"blah\"\r\n" #:format 'rfc4180)

  • last field quotation with no closing newline works fine
    (dsv-string->scm "test,hello,\"blah\"" #:format 'rfc4180)

  • last field in any non-final line with quotation causes parser errors
    (dsv-string->scm "test,hello,\"blah\"\r\nincorrect,field,parsing" #:format 'rfc4180)

I imagine it is an issue to do with end of line parsing…

Best wishes,

Alex

dsv-0.7.0 breaks tests when using guix build

Hi,

I recently started to use your library in one of my applications to parse csv files. It's been working flawlessly until recently when I did a guix pull and got 0.7.0. After that my tests for my application started failing when running them via guix build -f guix.scm. They work fine if I just run make check manually.

I checked the log for when the tests fail and this is the backtrace it gave me:

/gnu/store/7rscxhk9gzshkn6bq4nfrl9l6bp67w18-inetutils-2.3/bin/logger: cannot connect: No such file or directory
Backtrace:
           7 (primitive-load-path "tests/transaction.scm")
In ice-9/eval.scm:
   293:34  6 (_ #(#(#<directory (tests grade-transaction) 7ffff7…>) …))
In grade/transaction.scm:
    50:14  5 (parse-rbc-transactions #<input: string 7fffee8ac380> # _)
In dsv/rfc4180.scm:
   114:19  4 (dsv->scm _ #:debug-mode? _ #:delimiter _)
In smc/fsm.scm:
   465:31  3 (_ #<fsm current-state: read_first_field_first_char st…> …)
In ice-9/boot-9.scm:
   260:13  2 (for-each #<procedure 7ffff7533c30 at smc/core/log.scm…> …)
   260:13  1 (for-each #<procedure 7ffff7533c00 at smc/core/log.scm…> …)
In smc/core/log.scm:
    165:6  0 (_ _ _ _ _)

smc/core/log.scm:165:6: Could not log a message

In my parse-rbc-transaction procedure I'm using dsv as follows: (dsv->scm port #:format 'rfc4180)

If I understand it correctly guile-smc (which is a new dependency in 0.7.0) is trying to write to the syslog and failing. Which is expected as it is not setup when running via guix build. I checked the source code and dsv should only setup the logger if key argument #:log-driver for dsv->scm is not #f (making this assumption from this line in dsv.scm). And it is #f by default.

Is this a bug or am I missing something? Expected behavior is to not write anything to syslog or any logger for that matter.

Thanks

Distribution in Guix

Hi Artyom,

I'm keen to get guile-dsv to be part of the Guix package manager. I would like to propose a patch to it including the changes I've proposed to you in my 2 pull requests.

I'm happy to wait for a bit until you've had a chance to check things out and for us to have this discussion. Alternatively, if you have moved on from this project, I'd be happy to create a recipe from my fork of the project.

What do you think?

Best wishes,

Alex (atheia)

When a DSV field in 'rfc4180 format contains escaped quotes followed by a ',' the parser errors out.

Hi Artyom,

Love the library; I use it all the time and it has proved incredibly robust. I have finally spotted what I think is a bug: when a DSV string or file contains a field with an escaped quotation mark immediately followed by a ',', the parser errors out with:

Throw to key `dsv-parser-error' with args `("A field contains unescaped double-quotes" (#<dsv-parser port: #<input: string 7f598e30da80> type: rfc4180 delim: #\, 7f598d41c0c0> "\"test \"\"This contains double parens\"\""))'.

In dsv/rfc4180.scm:
    298:4  0 (fsm-validate #:dsv-list _ #:buffer _ #:field-buffer _ #:record _)

The test case to generate the above was:

(dsv-string->scm
 "test,\" hello\",\"this is, a comma test\",\"test \"\"This contains double parens\"\" , edurneadu derunin\""
 #:format 'rfc4180)

The minimal test case seems to be:

;; An rfc4180 row with a single field consisting of a single quotation mark
;; followed immediately by a comma
(dsv-string->scm "\"\"\",\"" #:format 'rfc4180)
;; Adding a single space causes correct parsing:
(dsv-string->scm "\"\"\" ,\"" #:format 'rfc4180)

Hope this is a helpful bug report — any idea what might be going on here?

Best wishes,
Alex
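For reference, RFC 4180 treats a pair of double quotes inside a quoted field as one literal quote, so the minimal input above should yield a single field consisting of a double quote followed by a comma. The sketch below is a lenient, hypothetical single-record parser (not Guile-DSV's state machine) that produces that result; parse-rfc4180-record is an illustrative name, and the record is assumed to be passed without its line break.

```scheme
;; Hypothetical sketch (not Guile-DSV's parser): split one RFC 4180 record,
;; given without its CRLF, into fields.  Lenient: a quote anywhere outside
;; a quoted section opens one.
(define (parse-rfc4180-record line)
  (let loop ((chars (string->list line))
             (field '())       ; current field, reversed
             (fields '())      ; finished fields, reversed
             (in-quotes? #f))
    (define (finish) (list->string (reverse field)))
    (cond
     ((null? chars)
      (if in-quotes?
          (error "unterminated quoted field" line)
          (reverse (cons (finish) fields))))
     (in-quotes?
      (let ((c (car chars)))
        (cond
         ((and (char=? c #\") (pair? (cdr chars)) (char=? (cadr chars) #\"))
          ;; "" inside quotes is an escaped double quote.
          (loop (cddr chars) (cons #\" field) fields #t))
         ((char=? c #\")
          ;; Closing quote.
          (loop (cdr chars) field fields #f))
         (else
          ;; Commas are ordinary characters inside quotes.
          (loop (cdr chars) (cons c field) fields #t)))))
     ((char=? (car chars) #\,)
      (loop (cdr chars) '() (cons (finish) fields) #f))
     ((char=? (car chars) #\")
      (loop (cdr chars) field fields #t))
     (else
      (loop (cdr chars) (cons (car chars) field) fields #f)))))

(parse-rfc4180-record "\"\"\",\"")
;; => ("\",")
```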
