beyondgrep / ack3

ack is a grep-like search tool optimized for source code.

License: Other

Perl 96.52% PHP 0.01% Python 0.01% Ruby 0.07% Shell 0.01% CMake 0.01% Makefile 0.03% C 0.06% Fortran 0.02% Rebol 0.01% HTML 0.16% JavaScript 0.01% Lua 0.01% CSS 0.01% Prolog 3.09% Dockerfile 0.01% ASP.NET 0.01% Classic ASP 0.01% Vim Script 0.01%

ack cli grep perl

ack3's Introduction

beyondgrep.com

This is the source code for the website https://beyondgrep.com, the home of the ack project. If you are looking for ack, see https://github.com/petdance/ack2

If there are changes to be made in content on the site, you can fork this repo, make changes to the source files in the tt/ directory, and send me a pull request.

Support

ack and beyondgrep.com are supported by DigitalOcean.

ack3's Issues

Add tests for the various variables that people expect to be able to use with --output

$_
$.
$1 and friends
$&
$`
$'

Fix -w

Redo the -w flag to properly handle words.

Update documentation that this affects.

Ack with --follow will follow cyclic symlinks

I'm on Arch Linux, using ack built from 9cc2407. I haven't written up a test for this yet, but this reproduces the behavior:

mkdir faux-dir
touch faux-dir/test.c
ln -s faux-dir faux-dir/self
ack --follow -f faux-dir/

ack does stop at around 40 traversals of the self symlink, but I haven't investigated why.
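One common way to make a --follow style traversal terminate is to remember which real directories have already been visited. The sketch below is in Python rather than ack's Perl, and it is not how ack or File::Next is implemented; the function name walk_following_symlinks is hypothetical. It identifies each directory by its (device, inode) pair and skips any directory it has already seen.

```python
import os

def walk_following_symlinks(root):
    """Yield file paths under root, following symlinks but visiting each real directory once."""
    seen = set()
    stack = [root]
    while stack:
        directory = stack.pop()
        info = os.stat(directory)  # os.stat follows symlinks
        key = (info.st_dev, info.st_ino)
        if key in seen:
            continue  # this real directory was already traversed: a cycle or a duplicate
        seen.add(key)
        with os.scandir(directory) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=True):
                    yield entry.path
```

With a directory that contains a symlink back to itself, each real file is reported once and the traversal ends.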

Allow --output and --match only on the command line, not in an ackrc

beyondgrep/ack2#414

Make it easy for us to see what ack is outputting in failing tests

It would be nice if the Util test module stored ack's output in temporary files, and on failure, saved these to a location where the user could easily view/share them.

--ignore-case does not work for ą, ę, ś, ć, ń, ó, ł, ż, ź

Not sure what level of support for unicode is expected here, but since it is supposed to be a "better grep", I'd like to be able to search for Polish words :)

$ cat test.txt
e
E
ę
Ę
a
A
ą
Ą
$ ack -i a test.txt
a
A
$ ack -i ą test.txt
ą
$ ack -i Ą test.txt
Ą

Calling it with ack -i '\p{Uppercase_Letter}' seems to match every line, but the output contains a lot of corrupted characters, and I do not know whether copying and pasting it here makes any sense.
Similarly for ack -i '\p{L}'.

I am using the latest version of ack:

ack --version
ack 2.12
Running under Perl 5.10.1 at /usr/bin/perl
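
The transcript is consistent with the pattern and the file contents being compared as UTF-8 bytes rather than as decoded characters, although the report does not confirm the cause. The following Python sketch (used for illustration only; ack itself is Perl, and the helper name is hypothetical) shows the general effect: case-insensitive matching on raw bytes folds only ASCII letters, while matching on decoded text also folds ą and Ą.

```python
import re

def matches_ignoring_case(pattern, text, decode=True):
    """Case-insensitive search on decoded text (decode=True) or on raw UTF-8 bytes."""
    if decode:
        return re.search(pattern, text, re.IGNORECASE) is not None
    # Bytes patterns in Python only apply ASCII case folding.
    return re.search(pattern.encode("utf-8"), text.encode("utf-8"), re.IGNORECASE) is not None
```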

--color-match 'BOLD blue' works, but not in .ackrc

If I run

ack --color-match 'BOLD blue' findme

It works and highlights the code in bold blue. If I put

--color-match 'BOLD blue'

in my .ackrc and try and find some code, I get an error that says

Invalid attribute name 'bold at /usr/bin/ack line 2198

New switch -I for --no-smart-case

-I is the negation of -i and also negates --smart-case.

Error with implied --filter

Given -f, when stdin is a pipe (or --filter makes it look like one), ack emits this error, even though it doesn't need to read stdin. I may have missed something trying to narrow this down. Test case:

$ ls  # starting in an empty directory
$ touch some-file
$ ack -f
some-file
$ ack -f <some-file
some-file
$ ack -f </dev/null
some-file
$ cat /dev/null | ack -f
ack: No regular expression found.
$ ack -f --filter
ack: No regular expression found.
$

(After diving through the code, it seems the test case can eliminate all except the first and last invocations of ack, but I'll leave it here in case it's useful to write a test.)

Expected behavior is what is shown, except for the last two error messages. I have ack 2.12 installed locally using the normal Ubuntu 14.04 package, but I get the same results with http://beyondgrep.com/ack-2.14-single-file.

The change below seems to do the right thing (edit: it doesn't), but I failed writing a test case (to be included in the repo tests), more because I don't know perl than anything else. This change does pass all the included tests and fixes the above error.

--- a/ack
+++ b/ack
@@ -1036,7 +1036,7 @@ sub main {
     }

     my $resources;
-    if ( $App::Ack::is_filter_mode && !$opt->{files_from} ) { # probably -x
+    if ( $App::Ack::is_filter_mode && !$opt->{files_from} && !$opt_f ) { # probably -x
         $resources    = App::Ack::Resources->from_stdin( $opt );
         $opt_regex = shift @ARGV if not defined $opt_regex;
         $opt_regex = $opt->{regex} = build_regex( $opt_regex, $opt );

Fixing how match highlighting works

ack2 has at least seven bugs related to match highlighting: https://github.com/petdance/ack2/issues?q=is%3Aissue+is%3Aopen+label%3Ahighlighting

Either don't strip a leading ./ from path name or provide an option to turn off stripping a leading ./

The following has been reported in Debian as #798180 against ack 2.14 and I can still reproduce it with 2.15.01:

When ack prints filenames, it strips leading "./". This is harmful when
you pass these filenames to another program, because a filename starting
with "--" might be interpreted as an option:
$ ls
--interactive=.html
$ ack -f --html --print0 ./ | xargs -0 rm -f
rm: invalid argument ‘.html’ for ‘--interactive’
Valid arguments are:
 - ‘never’, ‘no’, ‘none’
 - ‘once’
 - ‘always’, ‘yes’
Try 'rm --help' for more information.

The given example can be solved with the GNU-ish -- as Debian ships GNU Coreutils' rm command:

$ ack -f --html --print0 ./ | xargs -0 rm -f --

But this may not work with other commands not providing a GNU-ish -- option or with a different rm implementation, e.g. a BSD-ish one.

I do see that stripping a leading ./ from a file's path is a neat feature for human use (I actually like it :-), so I understand if that's considered a feature. In that case, please provide an option to turn that feature off. So far (as of 2.15.01) I haven't found such an option in ack's man page.
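
Until such an option exists, a program that consumes ack's output can protect itself by restoring the prefix. A minimal Python sketch follows; the helper name protect_leading_dash is hypothetical, and this is a workaround on the consuming side, not something ack provides.

```python
def protect_leading_dash(path):
    """Prefix "./" to relative paths that a command could mistake for an option."""
    return "./" + path if path.startswith("-") else path
```

Paths that do not start with a dash pass through unchanged, so the helper can be applied to every filename before it is handed to another command.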

Remove the docs about differences from ack 1

ack 2 is littered with references to how it's different from ack 1. It's been four years since ack 2 came out, and I don't think these are necessary any more. Move them into a separate compatibility document.

Allow output in JSON

http://thomashunter.name/blog/linux-cli-apps-should-have-a-json-flag/

I like this idea, and I think Andy does too.

Look into making all options globals

If we don't pass around the $opt hash, or don't need to, look into making all of the $opt_X variables package variables. It will be safer because we can't mess up hash keys, and it will be faster.

Update CONTRIBUTING.md and DEVELOPERS.md

Update to reflect changes in ack 3.

- not treated as stdin

Because File::Next only returns existing files, there's no way to use "-" interpreted as stdin -- it gets filtered out before ack gets to see it (except see notes 1-2 below).

Example:

$ cat file
xbyz
$ ack b. - <file  # bug
$ # expected was to read stdin and find a match
$ touch ./-  # watch what happens
$ ack b. - <file  # as expected, but for wrong reasons, note-1
xbyz
$ ack b. ./-  # as expected, note-2
$

Notes 1-2 show that ack b. - <file is testing ./- for file existence, but reading stdin.

Possible fixes are changing the sort order from that specified on the command line (so that you can pull out "-" before File::Next sees it), which is undesirable, or changing File::Next (because of possible name sorting, I couldn't see how to do this otherwise), or maybe something much more exotic with how File::Next is invoked (I looked hard at this first).

Tested against http://beyondgrep.com/ack-2.14-single-file on Ubuntu 14.04.

Revisit all XXXes, SKIPs and TODOs

Every XXX in the code base and every TODO test needs to either be eliminated because it's not a problem, or turned into a ticket here in the Issues.

Arguments silently ignored in filter mode

Ack silently ignores given filenames in filter mode. Expected behavior is to either ignore stdin (this is what grep does), search the specified arguments in addition to stdin, or warn/error. Actual behavior is to silently ignore the arguments. Examples:

$ cat file
xbyz
$ ack b. file  # as expected
xbyz
$ echo abcd | ack b. file  # bug-1
abcd
$ ack b. file --filter  # bug-2
$ echo abcd | ack b. file --filter  # bug-3
$ echo abcd | ack b. file /dev/stdin --nofilter  # as expected
file
1:xbyz

/dev/stdin
1:abcd
$

Tested against http://beyondgrep.com/ack-2.14-single-file on Ubuntu 14.04.

The last example, with /dev/stdin, doesn't make sense in isolation. I started with "-" instead, then found out ack doesn't treat "-" to mean "read stdin". I didn't think that warranted a separate issue, but any fix to this bug could address that too. (edit: it was a deeper problem than I expected, see #269)

$ ack b. - <file
ack: -: No such file or directory

Add $_ and $. to list of variables you can use with --output

We should probably note that $_ is unchomped, or change the behavior so that $_ is passed to --output as a chomped string.

Implement new --ignore-file and --ignore-dir

ack 3 needs to redo --ignore-file and --ignore-dir.

Add a sprintf-like version of --output

We currently use eval for implementing --output. This is fraught with peril.

Redo --output without eval, and maybe with a sprintf-like formatting system.
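
One possible shape for an eval-free --output is a fixed table of variables that are substituted by lookup. The sketch below is in Python rather than ack's Perl and is only an illustration: the function name expand_output, the choice to expand missing capture groups to an empty string, and the exact variable set ($1 and friends, $_, $., $&, $`, $') are assumptions taken from the variables mentioned in these issues, not ack's actual design.

```python
import re

# Variables this sketch understands: $1, $2, ... and $_ $. $& $` $'
_VARIABLE = re.compile(r"\$([1-9][0-9]*|[_.&`'])")

def expand_output(template, match, line, line_number):
    """Expand Perl-style variables in template by table lookup, never by eval.

    match must be a match object obtained by searching line.
    """
    def substitute(variable):
        name = variable.group(1)
        if name.isdigit():
            index = int(name)
            if index > match.re.groups:
                return ""  # assumption: a missing capture group expands to nothing
            return match.group(index) or ""
        return {
            "_": line,                  # the whole input line
            ".": str(line_number),      # the current line number
            "&": match.group(0),        # the matched text
            "`": line[:match.start()],  # text before the match
            "'": line[match.end():],    # text after the match
        }[name]
    return _VARIABLE.sub(substitute, template)
```

Because the template is never executed, arbitrary code in an --output string has no effect; anything that is not a recognized variable is copied through literally.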

Document regex in --ignore-dir

--ignore-dir=foo takes a regex, not a glob.

--ignore-dir=abc* will ignore directory ab as well as abc and abcccccc. This is not what someone would expect if they thought that was a glob.
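
The difference between the two readings of abc* can be shown with a short Python sketch. It is illustrative only: the helper names are hypothetical, and matching the regex against the whole directory name is an assumption made to reproduce the behavior described above.

```python
import fnmatch
import re

def ignored_as_regex(pattern, dirname):
    """Treat pattern as a regex that must match the whole directory name."""
    return re.fullmatch(pattern, dirname) is not None

def ignored_as_glob(pattern, dirname):
    """Treat pattern as a shell-style glob."""
    return fnmatch.fnmatchcase(dirname, pattern)
```

As a regex, abc* means "ab followed by zero or more c", so it matches ab but not abcdef; as a glob it means "abc followed by anything", so it matches abcdef but not ab.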

Output of non-ASCII chars garbled if they match inverted character class

I have a file containing one non-ASCII character, e.g. the German umlaut "ö". When I match the "ö" directly, all output is just fine. However, the output is garbled when the "ö" is matched by an inverted character class.

My use case is that I'm searching for files that still use encodings other than UTF-8, and for that I use a character class that excludes all "known good" characters. However, this problem also occurs with UTF-8 encoded files.

Here's an example (copy & paste from the console):

[0 mbunkus@chai-latte ~] ack ö hallo.txt
Hallöle
[0 mbunkus@chai-latte ~] ack -i '[^a-z]' hallo.txt
Hall[0m�le
[0 mbunkus@chai-latte ~] cat hallo.txt
Hallöle
[0 mbunkus@chai-latte ~] locale
LANG=en_US.UTF-8
LC_CTYPE=de_DE.UTF-8
LC_NUMERIC=de_DE.UTF-8
LC_TIME=en_DK.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Note that the colorization includes only the "ö" in the good case and "[0m�" in the bad case. Meaning the colorization is correct regarding which characters are highlighted and which aren't; just the characters output are wrong.
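
One explanation consistent with this output, though not confirmed in the report, is that the pattern is applied to undecoded UTF-8 bytes, so [^a-z] matches only the first byte of the two-byte "ö" and the color reset sequence lands in the middle of the character. The Python sketch below (for illustration only; ack is written in Perl, and the helper name is hypothetical) shows how the matched text differs between bytes and decoded characters.

```python
import re

def first_match(pattern, text, decode=True):
    """Return the first case-insensitive match, as characters or as raw UTF-8 bytes."""
    if decode:
        found = re.search(pattern, text, re.IGNORECASE)
    else:
        found = re.search(pattern.encode("utf-8"), text.encode("utf-8"), re.IGNORECASE)
    return found.group() if found else None
```

On decoded text the match is the whole character "ö"; on bytes it is only the first of its two bytes, which is not valid UTF-8 on its own.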

This happens both with ack 2.04 release and git at 3e498f7.

BTW: I accidentally filed this issue against the wrong ack repo (the old one) as it's not really easy to find a link to this repo on the home page.

Handle ack2 issues

My current thinking is that we create a handful of ack3 issues that point back to their ack2 parents for the things that we're actually wanting to move forward on.

Googling around found this issue importing tool, but I'm not sure that wholesale migration of the issues is a good thing. https://github.com/IQAndreas/github-issues-import

Explain how searching/selection happens

The manual should describe how Ack selects files and how it searches them. It should probably describe the process as multiple steps:

Option Processing

The options are processed from the command line, environment, and configuration files. See "ACKRC LOCATION SEMANTICS" for details.

The most important thing to know about option processing is that it defines various filetypes (like --perl or --ruby) via --type-add and --type-set.

File selection

After option processing, ack traverses the file list. The file list is the list of arguments following the regex to search for. Any directories in the file list are expanded to their contents recursively (unless -n/--no-recurse is used). If no explicit file list is provided, the current working directory is assumed.

While traversing the file list, each file provided directly on the command line is selected for search. Each file in the file list that came from recursively traversing a directory is run against filters found in the configuration (ex. --perl, --ignore-file=ext:.bak). Append a --dump to the end of your command line and look for --ignore- and other filters to see what rules ack is applying. (--dump is your friend! It's like EXPLAIN for ack!)

To reiterate: non-directory files specified on the command line are always searched by ack. In other words, if you do ack --perl *, the --perl option will not filter out any files in the current working directory!

Any file not matching an ignore rule and matching any (or all? VERIFY) of the specified file types (ex. --perl) is selected for search. If no file type is specified (either on the command line, in the environment, or in one of the configuration files), ack will select all non-ignored files in the file list ( VERIFY ).

File searching

Every file that passes selection is searched using the regex provided on the command line. ( FLESH THIS OUT )

Result display (see also beyondgrep/ack2#66)

The results are displayed. ( FLESH THIS OUT )

Talk about -f, -g, and -l?
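
The file selection step described above can be summarized in a few lines. This is a Python sketch of the described behavior, not ack's code: the function name and its two predicate parameters are hypothetical, and the open "any or all" question marked VERIFY is deliberately left inside the type_matches predicate.

```python
def select_files(command_line_files, traversed_files, is_ignored, type_matches):
    """Return the files to search.

    Files named directly on the command line are always selected; files found
    by directory traversal must pass the ignore rules and the type filter.
    """
    selected = list(command_line_files)  # never filtered
    for path in traversed_files:
        if not is_ignored(path) and type_matches(path):
            selected.append(path)
    return selected
```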

Add flag to force line number on output

Add --smart-spacing

Create a --smart-spacing option that replaces any whitespace in the pattern with \s+, so insert into table becomes insert\s+into\s+table.
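
The proposed rewrite of the pattern, following the insert\s+into\s+table example, is small enough to sketch. The Python function below is an illustration with a hypothetical name, not ack's implementation.

```python
import re

def smart_spacing(pattern):
    """Replace each run of whitespace in pattern with the regex \\s+."""
    # A function is used as the replacement so the backslash is inserted literally.
    return re.sub(r"\s+", lambda _: r"\s+", pattern)
```

The rewritten pattern then matches the words separated by any amount of whitespace, including tabs.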

Add glossary entry for "project ackrc"

We should explain exactly what "project ackrc" means. It can mean a set of rules specific to some project, but the rules could also be specific to a subdirectory of a project as well (I can imagine having t/.ackrc for Perl projects). We should explain this, or perhaps even reword "project ackrc".

Add proper Unicode support

Ack should support Unicode properly on perls that can handle it.

Needs to run on perl 5.14 or later (I think this is the minimum version, verify)
- This isn't entirely true; we can handle encoding/decoding and normalization with 5.8. We can't, however, have Unicode-aware regular expressions, nor can we use the shiny new stuff that is included in 5.14. More details later.
-g patterns should work properly with Unicode filenames or regexes containing Unicode characters (or stuff from charnames)
- This applies when the composition/decomposition of the regular expression and the source vary.
The same rules apply to the file searching patterns.
We need to make sure we properly encode/decode files (this could be tough)
- How do we determine files' encodings? Do we assume UTF-8? Do we provide an option for use in ackrc?
The output stream should probably be UTF-8 encoded.
Additional options for collation level should probably be provided.
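
The composition/decomposition point can be illustrated with Unicode normalization. The sketch below is in Python, not Perl, and the function name is hypothetical; it normalizes both the pattern and the text to the same form before matching. Normalizing a full regular expression is a simplification that is safe for literal patterns like the one shown, and is an assumption of this sketch rather than a recommendation for arbitrary regexes.

```python
import re
import unicodedata

def normalized_search(pattern, text, form="NFC"):
    """Search after bringing both pattern and text to the same normalization form."""
    pattern = unicodedata.normalize(form, pattern)
    text = unicodedata.normalize(form, text)
    return re.search(pattern, text) is not None
```

Without normalization, a precomposed "é" (U+00E9) in the pattern does not match "e" followed by a combining acute accent (U+0301) in the text, even though both render identically.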

No output when --pager is a missing environment variable

Ack isn't giving me any output when invoked directly with sudo:

$ echo bang | ack ^
bang
$ echo bang | sudo ack ^
$ sudo -s
# echo bang | ack ^

As you can see, it works with a root shell (even when sudo was used to get that), just not directly with sudo.

I can make the output appear under sudo by hacking App::Ack to print to stdout rather than $fh, changing this:

sub print { print {$fh} @_; return; }

to:

sub print { print @_; return; }

In the Perl debugger I get the same value for $fh both with and without sudo, namely:

  DB<7> x $fh
0  GLOB(0x20a7fd8)
   -> *App::Ack::$pager
         FileHandle({*App::Ack::$pager}) => fileno(3)

I've found this with ack 2.04 and 2.12, on Ubuntu 10.04 and 13.10.

Sorry I don't have time to dig into this further right now. If you can't reproduce this, let me know and I'll try installing from Git.

Add feature to group consecutive lines: the clumping feature

--clump would have to be mutually exclusive with any of the context flags. Non-consecutive lines get a blank line between them.

$ ack foo

100: foo
101: foo
102: foo

110: foo

168: foo

189: foo
190: foo

202: foo
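
The grouping behind the output above can be sketched as follows. This is a Python illustration with a hypothetical function name; it assumes the matching line numbers arrive in ascending order.

```python
def clump(line_numbers):
    """Group ascending line numbers into runs of consecutive lines."""
    groups = []
    for number in line_numbers:
        if groups and number == groups[-1][-1] + 1:
            groups[-1].append(number)  # extends the current run
        else:
            groups.append([number])    # starts a new run
    return groups
```

Printing a blank line between groups gives the layout shown in the example.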

Make a test file to test all the .pm files

ack2 had a set of t/lib/*.t files that I've deleted. Replace it with a single test to test them all.

Don't treat command line arguments beginning with '+' as options

https://groups.google.com/forum/?fromgroups#!topic/ack-users/YQyIe1Y3BH0

Add mutex option check for -f + -i (and friends)

This includes -i, --smart-case, and others.

Silent hanging on encountering a ./- file

If ack encounters a file called - in the current directory then it silently hangs for ever.

It seems that the file - is being treated as though it were a command-line argument -, which would indicate to read from standard input (and would indeed correctly cause ack to hang if there weren't anything on stdin). But a file called ./- is unambiguously a file; it isn't a command-line argument, and it isn't stdin.

Steps to reproduce. This works as expected:

$ mkdir bug_551
$ cd bug_551
$ echo aiieee > Batman.txt
$ ack i
Batman.txt
1:aiieee

Creating ./- causes the hang:

$ touch ./-
$ ack i

Explicitly listing the files to search avoids ./- getting in the way:

$ ack i *txt
aiieee

Bug #269 is sort-of the opposite of this, but they may overlap: both are influenced by ./-.

output is truncated near the end of a line

One way to reproduce:

Open urxvt and make it 10 characters wide.
echo 12345678 > foo.txt
ack 1

This results in:

foo.txt
1:1234567

which, as you can see, is missing an 8! I say "one way to reproduce" above because some bits of the instructions are needlessly specific. You can use any pattern that matches the line in question, not just "1". Additionally, what really matters about the line being matched is that it be two characters shorter than the terminal's width -- so if you have a convenient way to open an 80-character wide terminal, for example, you could just write a file with a 78-character line. However, which terminal you use does seem to matter -- urxvt and xterm show this behavior, while gnome-terminal seems not to.

This issue was originally raised at https://groups.google.com/forum/#!topic/ack-users/SfZ7biJAEnU which spawned these additional comments:

Reproducible. On my Ubuntu LTS(12.04), i can reproduce with Xterm and Byobu terminal, with both Ubuntu/Debian ack-grep 1.92 (-a flag required) and ack2 standalone 2.13_06 (installed as ~/bin/ack2), both under Perl 5.14, and confirm that Gnome terminal does not reproduce.
https://groups.google.com/group/ack-users/attach/2542904a22f873cb/image.png?part=5&authuser=0

Andy> What happens if you use grep instead of ack in your example? How about if you try ack --nocolor?

ack2 --nocolor does indeed show the 8, as does ack1, confirming Andy's hypothesis.

grep -n 1 foo.txt shows
1:12345678
colorized properly in 10 char Xterm.
https://groups.google.com/group/ack-users/attach/2542904a22f873cb/image.png?part=4&authuser=0

ack --color 1 | cat exhibits the same truncation effect if Xterm is snugged to where the 8 will be

foo.txt:1:12345678<- put margin here
foo.txt:1:1234567

My tests on --pager='less -r' are inconclusive; it doesn't help in Xterm, not sure what Byobu is doing.

Instructions could be "snug the terminal to hug the line:text return, redo command, observe space at end replacing last character". Should include "in a clean directory" of course, since you only want one file; or call by name, ack 1 foo.txt. Since Xterm is most ubiquitous, recommend that being the Terminal of record for this bug.

Workaround - if full line contents is important, don't use color.

Unknowns. What we don't know for sure is that we're not sending the 8. It could be we're sending the 8 but a bug in the Term emulation is eating it. Select full line does NOT copy an 8, so i suspect we can discount this being a XTERM display bug, but that's hardly certain.

TBD. Unclear if there's any way to reproduce this under Test::Harness control? Before attempting fix, an automated test that fails would be desirable. Do we need to find what ENV or TERMCAP or equiv elicits this bug ? ugh.

DX. Assuming it's us, it's a squirrelly off-by-one in the fit-buffer-to-TERM calc, influenced by different Term's ENV/TERMCAP/etc.

Probably worth trying a 78-char line in an 80-wide terminal on Windows and OS X too (unless they can do a 10-char wide terminal, which is convenient but ugly!)

Word boundaries with non-ASCII character

I am trying to look for the presence of a word containing non-ASCII characters, and this is not possible:

ack '\büber'
ack -w über

The first should find me exactly the lines containing a word starting with über, the second should find exactly the lines where über is a single word, shouldn't it? Texts are UTF-8, and dropping the boundaries gives thousands of results, as does searching with pcregrep.

The first line also returns lines containing words that contain über (such as darüberhinaus) and the second one also those containing words ending in über (such as darüber), which seems to suggest that the boundary matches before ü, i.e. ü is not counted as a word character (but should be).

Locale is set to "de_DE.UTF-8", but unsetting it does not change anything.

(ack 2.12 / perl 5.18.2 with Ubuntu 14.04, and ack 2.14 / perl 5.22 on Mac OS X 10.11)

Matches can be made correct, as far as I can see, by adding

use feature 'unicode_strings'; # optional?
use re "/u";

to the beginning of the ack script (probably also if added to the library). Maybe this can be made an option for non-ASCII-ists?

Switching to Unicode processing would probably also help to attack #262.
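
The effect described in this report, a word boundary being found before ü, can be reproduced in any regex engine that is run over undecoded bytes. The Python sketch below is an illustration only (ack is Perl, and the helper name is hypothetical): with bytes, the two bytes of ü are not word characters, so \b matches between "r" and "ü" in darüberhinaus; with decoded text it does not.

```python
import re

def starts_word(word, text, decode=True):
    """True if word occurs in text at a word boundary, on characters or on UTF-8 bytes."""
    pattern = r"\b" + re.escape(word)
    if decode:
        return re.search(pattern, text) is not None
    return re.search(pattern.encode("utf-8"), text.encode("utf-8")) is not None
```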

Add a timer so we can see what takes a long time to search

I'm running the new 2.0 on a big code repo, and I think it is dragging on certain files that 1.x skipped. I'm wondering what they would be.

What if we had a --profiling or --timer mode that would tell me which files were being ack hogs? This is on a file-by-file basis, not profiling like under Devel::NYTProf (which we still need to do).
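
A file-by-file timer of the kind proposed here could look roughly like this. The sketch is in Python with hypothetical names; search stands for whatever function searches a single file, and nothing here reflects an existing ack option.

```python
import time

def time_per_file(paths, search):
    """Run search(path) for each path and return (seconds, path) pairs, slowest first."""
    timings = []
    for path in paths:
        started = time.perf_counter()
        search(path)
        timings.append((time.perf_counter() - started, path))
    return sorted(timings, reverse=True)
```

Reporting the first few entries of the result would show which files are the "ack hogs".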

Add test for --ignore-dir=FILTER:FILTERARGS

Currently, we only test the --ignore-dir=DIRNAME form of the --ignore-dir option. Tests for the new syntax must be added. Since non-is filters don't work at the moment, these tests should be implemented but marked as TODO.

Redirecting ack's output can result in an infinite loop

If you redirect ack's output to a file under a search location (ex. under the current working directory, and using no targets with ack), and the search filters would pick up that file, there exists the possibility that ack will search that file while also writing to it, resulting in an infinite loop.

I say "possibly" because sometimes directory traversal order and userspace buffering can keep this from happening.
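
One way to avoid the loop is to skip any candidate file that is the same file as standard output. The Python sketch below shows the comparison; it is an assumption about a possible fix with a hypothetical function name, not the patch mentioned in this issue.

```python
import os

def is_same_file(path, output_fd):
    """True if path and the open descriptor output_fd refer to the same file."""
    try:
        # samestat compares device and inode numbers.
        return os.path.samestat(os.stat(path), os.fstat(output_fd))
    except OSError:
        return False  # unreadable or vanished files are not treated as the output
```

A search loop could call is_same_file(path, sys.stdout.fileno()) before opening each file and skip the file when it returns True.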

Possibly related bug: beyondgrep/ack2#393

Corresponding Google groups discussion: https://groups.google.com/forum/#!topic/ack-users/7qlUu1CXlIE

Patch coming later.

Verify that --foo --nobar works

The documentation states that if a file is classified as both type 'foo' and type 'bar', specifying --foo and --nobar will cause that file to not be selected. Make sure that there is a test for this.
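
The documented rule can be written down as a small predicate, which may help when designing the test. This Python sketch uses hypothetical names, and its treatment of the case where no type is requested (every non-excluded file is selected) is an assumption.

```python
def is_selected(file_types, wanted, unwanted):
    """A file is selected if it has a wanted type and no unwanted type."""
    file_types = set(file_types)
    if file_types & set(unwanted):
        return False  # an excluded type always wins
    # Assumption: with no wanted types, every non-excluded file is selected.
    return not wanted or bool(file_types & set(wanted))
```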

Do we have a test for when --ackrc is in an ackrc file?

Fix line counting

ack 2 has a number of bugs related to line counting.

Fix them.

Optimization Ideas

Optimization isn't a big priority right now, but I thought I'd at least gather my thoughts on how it can be done. Not every idea here is necessarily a good one; consider this a brain dump:

Profiling

Devel::NYTProf
Using strace -e trace=open can show how many files ack is opening on Linux.

Potential Places to Optimize

Implement barfly testing

In the word branch in ack2, there's an implementation of barfly testing. It needs to:

Be callable from a line of code, such as

use Barfly;
barfly_tests( *DATA );

use Barfly;
barfly_tests( 't/barfly/ack-w.txt' );

Implement the code that does the YESLINES sections, which doesn't work yet. This will rely on the --underline getting implemented.

Break out docs into new structure, and add --man=faq and --man=cookbook options

I've found it very difficult to maintain docs in ack. We also need more granular docs.

Create a new doc structure App::Ack::Docs::*. To start with, we'll have

App::Ack::Docs::Man -- current manual
App::Ack::Docs::FAQ
App::Ack::Docs::Cookbook -- A list of examples and recipes for using ack (I'm not set on the name "Cookbook")

All these docs will need to get put into the ack standalone. They'll get put at the end of the file during squashing.

The ack --man switch will now take arguments, so you can say ack --man=faq and ack --man=cookbook.

Make -x and --type mutually exclusive

The --type (and related --perl, --php, etc) flag has no effect when the -x flag is in use. We should make ack warn about this.

Here's the use case:

I was searching for files that use Test::Deep but don't actually use the cmp_bag function. So I used this:

ack Test::Deep -l | ack -x -L cmp_bag

Works just fine, but it gave me hits on plain text files, so I ran it again as

ack Test::Deep -l | ack -x -L cmp_bag --perl

The --perl has no effect because -x is effectively like specifying files on the command line, and the type limiters have no effect in that case. Instead, ack should have warned saying "Too bad, that won't work."

(The way to do this was to add --perl to the first ack invocation.)

ack Test::Deep -l --perl | ack -x -L cmp_bag