Coder Social home page Coder Social logo

jonike / rainbow_csv Goto Github PK

View Code? Open in Web Editor NEW

This project forked from mechatroner/rainbow_csv

0.0 3.0 0.0 1.42 MB

rainbow_csv: vim plugin for highlighting columns in csv/tsv files and executing SELECT and UPDATE queries in SQL-like language

License: MIT License

Vim Script 24.38% Python 70.44% JavaScript 2.84% Shell 2.33%

rainbow_csv's Introduction

Overview

Rainbow CSV has 2 main features:

  • Highlights csv columns in different rainbow colors.
  • Provides SELECT and UPDATE queries in RBQL: SQL-like transprogramming query language.

There are 2 ways to enable csv columns highlighting:

  1. CSV autodetection based on file content. File extension doesn't have to be .csv or .tsv
  2. Manual CSV delimiter selection with :RainbowDelim command (So you can use rainbow_csv for non-table files, e.g. to highlight function arguments in different colors)

To run an RBQL query either press F5 or enter the query in vim command line e.g. :Select a1, a2

Demonstration of rainbow_csv highlighting and RBQL queries

demo_screencast

The demo table is demo/movies.tsv. There are also some other test datasets in python/test_datasets. In this demo python expressions were used, but JavaScript is also available.

RBQL (RainBow Query Language) Description

RBQL is a technology which provides SQL-like language that supports SELECT and UPDATE queries with Python or JavaScript expressions.

Main Features

  • Use Python or Java Script expressions inside SELECT, UPDATE, WHERE and ORDER BY statements
  • Output entries appear in the same order as in input unless ORDER BY is provided.
  • Input csv/tsv table may contain varying number of entries (but select query must be written in a way that prevents output of missing values)
  • Result set of any query immediately becomes a first-class table on it's own.

Supported SQL Keywords (Keywords are case insensitive)

  • SELECT [ TOP N ] [ DISTINCT [ COUNT ] ]
  • UPDATE [ SET ]
  • WHERE
  • ORDER BY ... [ DESC | ASC ]
  • [ [ STRICT ] LEFT | INNER ] JOIN

Keywords rules

All keywords have the same meaning as in SQL queries. You can check them online But there are also two new keywords: DISTINCT COUNT and STRICT LEFT JOIN:

  • DISTINCT COUNT is like DISTINCT, but adds a new column to the "distinct" result set: number of occurences of the entry, similar to uniq -c unix command.
  • STRICT LEFT JOIN is like LEFT JOIN, but generates an error if any key in left table "A" doesn't have exactly one matching key in the right table "B".

Some other rules:

  • UPDATE SET is synonym to UPDATE, because in RBQL there is no need to specify the source table.
  • UPDATE has the same semantic as in SQL, but it is actually a special type of SELECT query.
  • JOIN statements must have the following form: <JOIN_KEYWORD> (/path/to/table.tsv | table_name ) ON ai == bj

Special variables

Variable Name Variable Type Variable Description
a1, a2,..., a{N} string Value of i-th column
b1, b2,..., b{N} string Value of i-th column in join table B
NR integer Line number (1-based)
NF integer Number of fields in line

Examples of RBQL queries

With Python expressions

  • select top 100 a1, int(a2) * 10, len(a4) where a1 == "Buy" order by int(a2)
  • select * order by random.random() - random sort, this is an equivalent of bash command sort -R

With JavaScript expressions

  • select top 100 a1, a2 * 10, a4.length where a1 == "Buy" order by parseInt(a2)
  • select * order by Math.random() - random sort, this is an equivalent of bash command sort -R

Plugin description

Rainbow highlighting for non-table files

You can use rainbow highlighting and RBQL even for non-csv/tsv files. E.g. you can highlight records in log files, one-line xmls and other delimited records. You can even highlight function arguments in your programming language using comma as a delimiter for :RainbowDelim command. And you can always turn off the rainbow highlighting using :NoRainbowDelim command.

Here is an example of how to extract some fields from a bunch of uniform single-line xmls:

demo_xml_screencast

Mappings

Key Action
<Leader>d (\d) Print info about current column (under the cursor)
F5 Start query editing for the current csv file
F5 Execute currently edited query

Commands

:Select ...

Allows to enter RBQL select query as vim command. e.g. :Select a1, a2 order by a1

:Update ...

Allows to enter RBQL update query as vim command. e.g. :Update a1 = a1 + " " + a2

:RainbowDelim

Mark current file as a table and highlight it's columns in rainbow colors. Character under the cursor will be used as a delimiter. The delimiter will be saved in a config file for future vim sessions.

You can also use this command for non-csv files, e.g. to highlight function arguments in source code in different colors. To return back to original syntax highlighting run :NoRainbowDelim

:RainbowDelimQuoted

Same as :RainbowDelim but allows delimiters inside fields if the field is double quoted by rules of Excel / RFC 4180

:RainbowMonoColumn

Mark the current file as rainbow table with a single column without delimiters. You will be able to run RBQL queries on it using a1 column variable.

:NoRainbowDelim

This command will disable rainbow columns highlighting for the current file. Use it to cancel :RainbowDelim, :RainbowDelimQuoted and :RainbowMonoColumn effects or when autodection mechanism has failed and marked non-table file as a table

:RainbowName <name>

Assign any name to the current table. You can use this name in join operation instead of the table path. E.g.

JOIN customers ON a1 == b1

intead of:

JOIN /path/to/my/customers/table ON a1 == b1

Configuration

g:rbql_output_format

Default: tsv Allowed values: tsv, csv

Format of RBQL result set tables.

  • tsv format doesn't allow quoted tabs inside fields.
  • csv is Excel-compatible and allows quoted commas.

Essentially format here is a pair: delimiter + quoting policy. This setting for example can be used to convert files between tsv and csv format:

  • To convert csv to tsv: 1. open csv file. 2. :let g:rbql_output_format='tsv' 3. :Select *
  • To convert tsv to csv: 1. open tsv file. 2. :let g:rbql_output_format='csv' 3. :Select *

g:rbql_meta_language

Default: python

Scripting language to use in RBQL expression. Either 'js' or 'python' To use JavaScript add let g:rbql_meta_language = 'js' to .vimrc

g:rcsv_delimiters

Default: ["\t", ","]

By default plugin checks only TAB and comma characters during autodetection stage. You can override this variable to autodetect tables with other separators. e.g. let g:rcsv_delimiters = ["\t", ",", ";"]

g:disable_rainbow_csv_autodetect

csv autodetection mechanism can be disabled by setting this variable value to 1. Manual delimiter selection would still be possible.

g:rcsv_max_columns

Default: 30

Autodetection will fail if buffer has more than g:rcsv_max_columns columns. You can increase or decrease this limit.

Optional "Header" file feature

Rainbow csv allows you to create a special "header" file for any of your table files. It must have the same name as the table file but with ".header" suffix (e.g. for "table.tsv" table the header file is "table.tsv.header"). The only purpose of header file is to provide csv column names for \d key. It is also possible to use :RainbowSetHeader <file_name> command to set a differently named file as a header for the current table.

Installation

Install with your favorite plugin manager.

If you want to use RBQL with JavaScript expressions, make sure you have Node.js installed

Other

How does it work?

Python module rbql.py parses RBQL query, creates a new python worker module, then imports and executes it.

Some more examples of RBQL queries:

With Python expressions

  • select top 20 len(a1) / 10, a2 where a2 in ["car", "plane", "boat"] - use Python's "in" to emulate SQL's "in"
  • update set a3 = 'US' where a3.find('of America') != -1
  • select * where NR <= 10 - this is an equivalent of bash command "head -n 10", NR is 1-based')
  • select a1, a4 - this is an equivalent of bash command "cut -f 1,4"
  • select * order by int(a2) desc - this is an equivalent of bash command "sort -k2,2 -r -n"
  • select NR, * - enumerate lines, NR is 1-based
  • select * where re.match(".*ab.*", a1) is not None - select entries where first column has "ab" pattern
  • select a1, b1, b2 inner join ./countries.txt on a2 == b1 order by a1, a3 - an example of join query
  • select distinct count len(a1) where a2 != 'US'

With JavaScript expressions

  • select top 20 a1.length / 10, a2 where ["car", "plane", "boat"].indexOf(a2) > -1
  • update set a3 = 'US' where a3.indexOf('of America') != -1
  • select * where NR <= 10 - this is an equivalent of bash command "head -n 10", NR is 1-based')
  • select a1, a4 - this is an equivalent of bash command "cut -f 1,4"
  • select * order by parseInt(a2) desc - this is an equivalent of bash command "sort -k2,2 -r -n"
  • select * order by Math.random() - random sort, this is an equivalent of bash command "sort -R"
  • select NR, * - enumerate lines, NR is 1-based
  • select a1, b1, b2 inner join ./countries.txt on a2 == b1 order by a1, a3 - an example of join query
  • select distinct count a1.length where a2 != 'US'

cli_rbql.py script

rainbow_csv comes with cli_rbql.py script which is located in ~/.vim extension folder.
You can use it in standalone mode to execute RBQL queries from command line. Example:

./cli_rbql.py --query "select a1, a2 order by a1" < input.tsv

To find out more about cli_rbql.py and available options, execute:

./cli_rbql.py -h

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.