column-grep's People

Contributors

qualiaa

column-grep's Issues

Negate match

Users should be able to negate a match. Ideally, this should work and feel consistent for any index, range, or name.

Probably the nicest way to do this is to follow awk and allow a leading ! before a particular match value to negate it. For example, /^type-/,!3:5 will match any column starting with "type-" except columns 3-5.

There is an open question whether these should override each other in order of appearance; for now, a negated match should override any other match.
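The override rule can be sketched in Haskell as follows. This is only an illustration: the `Matcher`, `Spec`, `matches` and `selected` names are hypothetical and not taken from column-grep's source, and a plain predicate stands in for a compiled regex. It also assumes that a column must be hit by at least one positive match to be kept, which the issue does not state.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical model of a match spec (not column-grep's real types).
data Matcher
  = ByIndex Int               -- a single 1-based column index
  | ByRange Int Int           -- an inclusive range of 1-based indices
  | ByName (String -> Bool)   -- predicate on the header name (stands in for a regex)

data Spec = Spec
  { negated :: Bool
  , matcher :: Matcher
  }

-- Does a matcher accept the column with this index and header name?
matches :: Matcher -> Int -> String -> Bool
matches (ByIndex i)   idx _    = idx == i
matches (ByRange a b) idx _    = a <= idx && idx <= b
matches (ByName p)    _   name = p name

-- A column is kept if some positive spec matches it and no negated spec
-- does, regardless of the order in which the specs were written.
selected :: [Spec] -> Int -> String -> Bool
selected specs idx name = any hit positives && not (any hit negatives)
  where
    hit s     = matches (matcher s) idx name
    positives = filter (not . negated) specs
    negatives = filter negated specs

-- Models the example /^type-/,!3:5 from the issue text.
exampleSpecs :: [Spec]
exampleSpecs =
  [ Spec False (ByName ("type-" `isPrefixOf`))
  , Spec True  (ByRange 3 5)
  ]
```

With `exampleSpecs`, a column named "type-a" at index 1 is kept, while a column named "type-b" at index 4 is dropped because the negated range wins.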

(Input|Output) (Field|Record) separators

Currently, column-grep assumes that the input is a CSV. However, following many other tools, most notably awk, it should allow the user to describe both input and output separators for records and fields.

Long names should be descriptive:

  • -f --input-field-separator
  • -F --output-field-separator
  • -r --input-record-separator
  • -R --output-record-separator

If only the input version is provided for a particular option, the output should default to the provided value.
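A minimal sketch of that defaulting rule in Haskell. The `SepOpts`, `Seps` and `resolveSeps` names are hypothetical, and the fallback values `","` and `"\n"` are assumptions based on the current CSV behaviour rather than anything the issue specifies.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical parsed options: Nothing means the flag was not given.
data SepOpts = SepOpts
  { inputFieldSep   :: Maybe String  -- -f / --input-field-separator
  , outputFieldSep  :: Maybe String  -- -F / --output-field-separator
  , inputRecordSep  :: Maybe String  -- -r / --input-record-separator
  , outputRecordSep :: Maybe String  -- -R / --output-record-separator
  }

-- Fully resolved separators used by the rest of the program.
data Seps = Seps
  { inField, outField, inRecord, outRecord :: String }
  deriving (Eq, Show)

-- Each output separator falls back to the resolved input separator.
resolveSeps :: SepOpts -> Seps
resolveSeps o = Seps
  { inField   = inF
  , outField  = fromMaybe inF (outputFieldSep o)
  , inRecord  = inR
  , outRecord = fromMaybe inR (outputRecordSep o)
  }
  where
    inF = fromMaybe ","  (inputFieldSep o)   -- assumed CSV default
    inR = fromMaybe "\n" (inputRecordSep o)  -- assumed newline default
```

For instance, giving only `-f` with a tab yields a tab for both the input and output field separators, while the record separators keep their defaults.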

File IO options

Options to do input and output from files rather than standard streams.

  • -i<file> --input <file>: read input from file, disable stdin
  • -o<file> --output <file>: output to file, disable stdout (allow multiple times)

If <file> is - then it is interpreted as the appropriate standard stream.
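One way the `-` convention could be handled, sketched in Haskell with only `System.IO`. The function names are hypothetical, and falling back to stdout when no `-o` is given is an assumption.

```haskell
import System.IO (Handle, IOMode (..), openFile, stdin, stdout)

-- Open an input path, treating "-" as standard input.
openInput :: FilePath -> IO Handle
openInput "-"  = pure stdin
openInput path = openFile path ReadMode

-- Open an output path, treating "-" as standard output.
openOutput :: FilePath -> IO Handle
openOutput "-"  = pure stdout
openOutput path = openFile path WriteMode

-- -o may be given several times; with no -o at all, fall back to stdout
-- (assumed default, matching the current behaviour).
openOutputs :: [FilePath] -> IO [Handle]
openOutputs []    = pure [stdout]
openOutputs paths = mapM openOutput paths
```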

Use ByteString.Builder

Currently the output is rebuilt from fields and records using ByteString.intercalate. The documentation highlights that this is slow and that ByteString.Builder is 20% faster.
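A rough sketch of what a Builder-based replacement might look like, keeping the same "separator between items" semantics as intercalate. The `joinWith` and `renderRecords` names are hypothetical, and the sketch makes no claim about the actual speed-up.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.List (intersperse)

-- Join builders with a separator between consecutive items, without
-- materialising intermediate strict ByteStrings.
joinWith :: BC.ByteString -> [BB.Builder] -> BB.Builder
joinWith sep = mconcat . intersperse (BB.byteString sep)

-- Fields are joined by fieldSep, records by recordSep.
renderRecords :: BC.ByteString -> BC.ByteString -> [[BC.ByteString]] -> BB.Builder
renderRecords fieldSep recordSep =
  joinWith recordSep . map (joinWith fieldSep . map BB.byteString)

-- Two records rendered as CSV.
example :: BL.ByteString
example =
  BB.toLazyByteString
    (renderRecords (BC.pack ",") (BC.pack "\n")
       [[BC.pack "a", BC.pack "b"], [BC.pack "c"]])
```

In a real program the builder would more likely be written straight to a handle with `BB.hPutBuilder` rather than converted to a lazy ByteString first.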

Case insensitive matching

All regexes are currently case-sensitive. PCRE has a compilation flag for case-insensitivity, which we should expose as a flag, following grep:

  • -i, --ignore-case for case-insensitive matching
  • --no-ignore-case for case-sensitive (default).

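We could also put flags on the end of regexes, in the style of GNU sed: /re/i for case-insensitive, /re/I for case-sensitive. Note that GNU sed itself treats I as case-insensitive, so the /re/I meaning proposed here would be specific to column-grep.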

Further regex configuration

The following compilation options from pcre_compile are available through Text.Regex.PCRE.CompOpts and should be toggle-able:

PCRE flag             Description
PCRE_CASELESS         Do caseless matching
PCRE_MULTILINE        ^ and $ match newlines within data
PCRE_DOTALL           . matches anything including NL
PCRE_NEWLINE_ANY      Recognize any Unicode newline sequence
PCRE_NEWLINE_ANYCRLF  Recognize CR, LF, and CRLF as newline sequences
PCRE_NEWLINE_CR       Set CR as the newline sequence
PCRE_NEWLINE_CRLF     Set CRLF as the newline sequence
PCRE_NEWLINE_LF       Set LF as the newline sequence
PCRE_UTF8             Run in pcre_compile() UTF-8 mode
PCRE_NO_UTF8_CHECK    Do not check the pattern for UTF-8 validity (only relevant if PCRE_UTF8 is set)
PCRE_UCP              Use Unicode properties for \d, \w, etc.
PCRE_UNGREEDY         Invert greediness of quantifiers

PCRE_UTF8 and PCRE_UNGREEDY should be on by default; PCRE_CASELESS pertains to #2.

PCRE_NEWLINE_* and PCRE_MULTILINE pertain to #5, as certain values of field and record separators modify their interpretation.

Output-spec ordering and duplicates

Currently, the order in which output-specs appear is ignored, and duplicate columns are ignored. This is a sensible default, but it would be nice to support intentional column duplication and output-ordering.

For example, the following would produce column 1 multiple times:

column-grep --duplicates 1 1 1

This would produce columns 1, then 3, then 2:

column-grep --preserve-order 1 3 2

This would produce 1, then 3, then 2, then 3 again:

column-grep --preserve-order 1 3 2 3
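The three examples above suggest the following resolution rule, sketched here in Haskell. The `OrderOpts` and `resolveColumns` names are hypothetical. The sketch assumes that the default output follows ascending column order, and that --preserve-order keeps duplicates as written (as the 1 3 2 3 example implies).

```haskell
import Data.List (nub, sort)

-- Hypothetical flags controlling how the requested columns are resolved.
data OrderOpts = OrderOpts
  { preserveOrder :: Bool  -- --preserve-order
  , duplicates    :: Bool  -- --duplicates
  }

-- Turn the column indices as written on the command line into the
-- list of columns to emit.
resolveColumns :: OrderOpts -> [Int] -> [Int]
resolveColumns opts cols
  | preserveOrder opts = cols             -- keep order and repeats as written
  | duplicates opts    = sort cols        -- keep repeats, in column order
  | otherwise          = nub (sort cols)  -- default: column order, no repeats
```

Under these assumptions, `--preserve-order 1 3 2 3` resolves to columns 1, 3, 2, 3, and `--duplicates 1 1 1` resolves to column 1 three times.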

`--no-headers` flag

If --no-headers is given, do not treat the first line specially. Produce an error for any attempts to match by name.

Shorthand could be -n?
