livegrep's Introduction

Livegrep

Livegrep is a tool, partially inspired by Google Code Search, for interactive regex search of ~gigabyte-scale source repositories. You can see a running instance at http://livegrep.com/.

Building

livegrep builds using bazel. You will need to install a version of Bazel matching the one in .bazelversion. Running bazel via bazelisk will download the right version automatically.
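For example, a minimal sketch of the bazelisk route (assuming a Go toolchain is available; prebuilt bazelisk binaries from its releases page work just as well):

# bazelisk reads .bazelversion and fetches the matching bazel release
go install github.com/bazelbuild/bazelisk@latest
bazelisk build //...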

livegrep vendors and/or fetches all of its dependencies using bazel, and so should only require a relatively recent C++ compiler to build.

Once you have those dependencies, you can build using

bazel build //...

Note that the initial build will download around 100MB of dependencies. These are cached once downloaded.

Invoking

To run livegrep, you need to invoke both the codesearch backend index/search process, and the livegrep web interface.

To run the sample web interface over livegrep itself, once you have built both codesearch and livegrep:

In one terminal, start the codesearch server like so:

bazel-bin/src/tools/codesearch -grpc localhost:9999 doc/examples/livegrep/index.json

In another, run livegrep:

bazel-bin/cmd/livegrep/livegrep_/livegrep

Now visit http://localhost:8910/ in a browser, and you should see a working livegrep.

Using Index Files

The codesearch binary is responsible for reading source code, maintaining an index, and handling searches. livegrep is stateless and relies only on its TCP connection to codesearch.

By default, codesearch will build an in-memory index over the repositories specified in its configuration file. You can, however, also instruct it to save the index to a file on disk. This has the dual advantages of allowing indexes that are too large to fit in RAM, and of allowing an index file to be reused. You instruct codesearch to generate an index file via the -dump_index flag and to not launch a search server via the -index_only flag:

bazel-bin/src/tools/codesearch -index_only -dump_index livegrep.idx doc/examples/livegrep/index.json

Once codesearch has built the index, the index file can be reused on future runs. Index files are standalone: once an index has been built, you no longer need access to the source code repositories, or even a configuration file. You can just launch a search server like so:

bazel-bin/src/tools/codesearch -load_index livegrep.idx -grpc localhost:9999

The schema for the codesearch configuration file is defined using protobuf in src/proto/config.proto.

livegrep

The livegrep frontend accepts an optional positional argument naming a JSON configuration file; see doc/examples/livegrep/server.json for an example, and server/config/config.go for documentation of the available options.
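For example, to run the frontend against the sample config shipped in this repo (a sketch):

bazel-bin/cmd/livegrep/livegrep_/livegrep doc/examples/livegrep/server.json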

By default, livegrep will connect to a single local codesearch instance on port 9999, and listen for HTTP connections on port 8910.

github integration

livegrep includes a helper driver, livegrep-github-reindex, which can automatically update and index selected github repositories. To download and index all of my repositories (except for forks), storing the repos in repos/ and writing nelhage.idx, you might run:

bazel-bin/cmd/livegrep-github-reindex/livegrep-github-reindex -user=nelhage -forks=false -name=github.com/nelhage -out nelhage.idx

You can now use nelhage.idx as an argument to codesearch -load_index.
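For example, reusing the invocation from the earlier examples:

bazel-bin/src/tools/codesearch -load_index nelhage.idx -grpc localhost:9999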

Local repository browser

livegrep provides the ability to view source files directly in livegrep, as an alternative to linking files to external viewers. This was initially implemented by @jboning. There are a few ways to enable this; the most important steps are to:

  1. Generate a config file that livegrep can use to figure out where your source files are (locally).
  2. Pass this config file as an argument to the frontend (-index-config)

Generating index manually

See doc/examples/livegrep/server.json for an example config file, and server/config/config.go for documentation on available options. To enable the file viewer, you must include an IndexConfig block inside of the config file. An example IndexConfig block can be seen at doc/examples/livegrep/index.json.

Tip: For each repository included in your IndexConfig, make sure to include metadata.url_pattern if you would like the file viewer to be able to link out to the external host. You'll see a warning in your browser console if you don't do this.
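For illustration only, a hypothetical minimal IndexConfig with a metadata.url_pattern entry. The field names and the {version}/{path}/{lno} placeholders are assumptions modeled on the example files, so treat doc/examples/livegrep/index.json as the authoritative reference:

{
  "name": "example",
  "repositories": [
    {
      "name": "github.com/xvandish/repo1",
      "path": "repos/xvandish/repo1",
      "revisions": ["HEAD"],
      "metadata": {
        "url_pattern": "https://github.com/xvandish/repo1/blob/{version}/{path}#L{lno}"
      }
    }
  ]
}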

Generating index with livegrep-github-reindex

If you are already using the livegrep-github-reindex tool, an IndexConfig index file is generated for you, by default named "livegrep.json".

Run the indexer:

bazel-bin/cmd/livegrep-github-reindex/livegrep-github-reindex_/livegrep-github-reindex -user=xvandish -forks=false -name=github.com/xvandish -out xvandish.idx

The indexer does three main things:

  1. Clone (or update) all repositories for user=xvandish into repos/xvandish
  2. Create an IndexConfig file: repos/livegrep.json
  3. Create the code index that is used to serve searches: ./xvandish.idx

Here's an abbreviated version of what your directory might look like after running the indexer.

livegrep
│   xvandish.idx
└───repos
    │   livegrep.json
    └───xvandish
        └───repo1
        └───repo2
        └───repo3

Using your generated index

Now that you've generated an index file, it's time to run livegrep with it.

Run the backend:

bazel-bin/src/tools/codesearch -load_index xvandish.idx -grpc localhost:9999

In another shell, run the frontend, passing the path to the IndexConfig file at repos/livegrep.json:

bazel-bin/cmd/livegrep/livegrep_/livegrep -index-config ./repos/livegrep.json

Now visit http://localhost:8910 in a browser, and you should see a working livegrep. Search for something, and once you get a result, click on the file name or a line number. You should be taken to the file browser!

Docker images

Livegrep's CI publishes Docker images to the livegrep organization's Docker repository on every merge to main. They should be generally usable. For instance, to build and run a livegrep index of this repository, you could run:

docker run -v $(pwd):/data ghcr.io/livegrep/livegrep/indexer /livegrep/bin/livegrep-github-reindex -repo livegrep/livegrep -http -dir /data
docker network create livegrep
docker run -d --rm -v $(pwd):/data --network livegrep --name livegrep-backend ghcr.io/livegrep/livegrep/base /livegrep/bin/codesearch -load_index /data/livegrep.idx -grpc 0.0.0.0:9999
docker run -d --rm --network livegrep --publish 8910:8910 ghcr.io/livegrep/livegrep/base /livegrep/bin/livegrep -docroot /livegrep/web -listen=0.0.0.0:8910 --connect livegrep-backend:9999

And then access http://localhost:8910/
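A quick sanity check that both containers came up (a sketch assuming the commands above were used verbatim):

docker ps --filter name=livegrep-backend
curl -fsS http://localhost:8910/ >/dev/null && echo "frontend is up"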

You can also find the docker-compose config powering livegrep.com in the livegrep/livegrep.com repository.

Resource Usage

livegrep builds an index file of your source code, and then works entirely out of that index, with no further access to the original git repositories.

The index file will vary somewhat in size, but will usually be 3-5x the size of the indexed text. livegrep memory-maps the index file into RAM, so it can work out of index files larger than (available) RAM, but will perform better if the file can be loaded entirely into memory. Barring that, keeping the disk on fast SSDs is recommended for optimal performance.
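As a rough sizing check before serving an index, you can compare the index size against available memory (a sketch using standard Linux tools):

ls -lh livegrep.idx
free -h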

Regex Support

Livegrep uses Google's re2 regular expression engine, and inherits its supported syntax.

RE2 is mostly PCRE-compatible, but with some mostly-deliberate exceptions.
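For example (a hedged illustration of well-known RE2 behavior, not livegrep-specific syntax):

# Supported by RE2: character classes, alternation, non-capturing groups
#   (?i)read_?file        case-insensitive, optional underscore
#   src/(foo|bar)\.cc     alternation with escaping
# Deliberately unsupported, to guarantee linear-time matching:
#   (\w+) \1              backreferences
#   foo(?=bar)            lookahead assertions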

LICENSE

Livegrep is open source. See COPYING for more information.

livegrep's People

Contributors: afourneau, ains-stripe, alexdebrie, alexmv, andrewbuss, areitz, benjaminp, brandon-rhodes, clint-stripe, clintharrison, danwang, daviddoran-stripe, fowles, isker, ivanchgeek, j3parker, jboning, juliannadeau-stripe, kejadlen, linkiwi, miikka, nelhage, ptarjan, samertm, staktrace, thumphries, tpcwang, wfrisch, xvandish, zawataki

livegrep's Issues

Read and take ideas from a Google study on their internal code-search tool

OK, this isn't really a great fit for a GitHub issue, but this seems like the best medium for it: a while ago, in the context of our internal Livegrep instance at Dropbox, someone pointed at the paper "How Developers Search for Code: A Case Study", which comes from studying the internal code-search tool at Google. What I hear from Googlers about their internal tool sounds awesome, so maybe this paper will contain some good ideas from there.

I haven't yet read the paper; ran across it in my email while looking for past feature requests just now. So this is a note for myself to go back and read it and report back with ideas, and maybe @nelhage and @jboning and others here will be interested in doing so too.

Literal search by default, regex optional

This is a change we made in our internal livegrep instance at Dropbox, and I've been quite happy with it.

In my experience it's been much more common that I'm searching for a literal string that involves a paren or bracket or backslash or something and needs to be escaped than that I actually want to search for a regex. Regex search is irreplaceable when it's needed, but flipping the default has made the experience smoother.

The way I'd imagine doing this is with a re: colon-operator, the antonym of lit:. Then search clauses with no colon-operator become implicitly lit: instead of re:. (Our internal instance is a fork from 2014, predating colon-operators, so we do it with a checkbox labeled "regex" next to the search box.)
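To make that concrete (the re: operator is hypothetical; lit: exists today):

foo(bar)          proposed default: literal, parentheses matched as-is
lit:foo(bar)      today's explicit literal form
re:foo(bar)?      proposed explicit regex, with (bar) as an optional group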

Maybe this default should be instance-configurable -- the fast regex search is a sweet demo, so maybe livegrep.com should keep it front and center by keeping regex as the default.

livegrep.com returns server error 500

Talking to backend: rpc error: code = 4 desc = context deadline exceeded

I think I got this error before locally when the backend isn't using grpc.

Query URL should not include `&fold_case=auto&regex=false`

These get appended to the URL even though they're the default, which makes it harder to skim when copy/pasted somewhere.

The downside to not always including that information is that it will break those links if we ever change the defaults, or make their values default based on a cookie.

Soft wrap long lines of code

I just got this working on a project of mine and I really like it, good work!

I did notice, however, that when I had a long single line of code, it wasn't soft-wrapped and instead pushed off the screen.

Would it be possible to investigate introducing a soft wrap, like in some text editors?

Permalinks for specific version on file-browse page

Another followup feature to #55. I think @jboning is already thinking about this one, though we don't have it in our existing internal instance.

I'd love to be able to send links around that will continue to mean exactly what I see on my screen, so that e.g. what I say about them in an email or a commit message continues to make sense into the future. Especially useful when the link is to line 623 or so in the file and there's bound to be churn above it changing the line numbering.

GitHub's browser even has a handy keyboard shortcut to switch to a permalink: y. It replaces the commitish (like master) in the current URL with the commit ID of the commit it currently refers to, so you can just copy-paste the link you're now at. I'd like to find a more discoverable way of providing this feature, but that'd be a fine start.

Search filenames

Doable now with . file:FILENAME$, but it'd be nice if it didn't require the . trick (and thus matching on every line of the file).
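That is, roughly (the second form is the proposed behavior, not current syntax):

. file:FILENAME$      today: match any character, then filter by filename
file:FILENAME$        proposed: return filename matches directly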

(I know, I know, just send in a pr. Consider this a reminder to myself to work on it when I have time, unless someone else feels like taking a stab at it first.)

Improve web UI performance

As the match limit gets larger, I observe that browser-side processing time outstrips the server-side search time (perceptible as interactive lag, but not reported explicitly in the UI; see #93).

There are probably some small wins to be had in tuning the JS (for example, we trigger renders a bit overzealously), but browser layout time is the bigger problem.

It seems likely that fixing this will involve revisiting #51's decisions to try to make our DOM more browser-friendly.

Switching to React as a frontend framework may also help us minimize the amount of DOM manipulation/invalidation performed.

Highlight query matches on clicking through to browse

This is a followup feature that becomes possible with #55 -- on clicking through a search result to a browse page, it'd be neat to highlight the query match there too. (Probably with a parameter in the query-part of the URL or something.)

Feature request: add repo support for other VCS through fs_paths

My use case is that I want to index source code on various VCS with other source browsers that I have already set up. I don't think livegrep should add support for those VCS, but it would be nice to indirectly support them through the file indexer. I think the following tasks will be necessary:

  • Expand fs_paths from arrays of strings to arrays of objects with a similar structure to repositories
  • Support rendering links using a custom format (possibly plumbing metadata through the layers)

What are your thoughts? I'll look to see if I can implement it if you are okay with this approach.

When browsing a directory, render README files

This is another feature that'd be a nice followup to #55; we've heard it requested for our internal instance at Dropbox.

GitHub does this, and it goes back at least to the '90s in the browse-and-download-files views of popular web servers. It's a handy feature.

  • MVP: Just look for a file named README and spit out its contents in fixed-width font. This'd already be pretty useful, not to mention a fun way to relive those '90s web servers.
  • Easy bonus points: do the same for README.md, README.rst, maybe just README*. (If several such files, I'd show the first one lexicographically, so in particular plain README if it exists.)
  • Bigger bonus points: match GitHub's array of formats (at least .md) and render them appropriately. To do that comprehensively the only reasonable way is probably to use GitHub's own implementation. If deploying a blob of Ruby (and all its friends) sounds like too much of a pain (*), just supporting Markdown (and leaving the rest as raw text in fixed-width font) would be about 70% of the value and can probably be much simpler.

(*) It sure does to me but I'd love someone to prove that assessment wrong.

Allow an end-user-configurable soft-limit on the number of match results

There have been a couple of times when I wished livegrep could return more than 50 results. I know that the backend currently supports defining the maximum number of match results as a command-line argument, but I think it could be beneficial to have a configurable soft-limit.

For example, I might pass max_matches=200 to the backend, and set the default soft-limit on the frontend to be 50. But if I want to look at more results, I can explicitly change the soft-limit up to 200 to see more results. This could be presented as a hidden url parameter or a drop-down box.
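Sketched as an invocation (the -max_matches flag name is an assumption based on the issue text, not a verified option):

# backend hard cap of 200; the frontend would default its soft-limit to 50
bazel-bin/src/tools/codesearch -load_index livegrep.idx -grpc localhost:9999 -max_matches 200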

What are your thoughts?

Syntax highlighting

This is a handy feature we have in our internal livegrep instance at Dropbox, as a followup to file browsing #55.

(Come to think of it, we probably should have it for search results too, not only the browse pages. We don't currently, though, and I haven't heard anyone ask for it, so not a big deal.)

I get a warning for a 93MB ASCII text file

I have written a small script based on src/tools/codesearch.cc to index a single file, without any changes to the defaults in codesearch.cc. I'm able to index small files (<10MB) and search works as expected.

Test Script Pseudo code:

// Pseudocode: build an in-memory index over a single file.
search->set_alloc(make_mem_allocator());
fs_indexer indexer(search, path, name, 0);
indexer.read_file(path);
search->finalize();

The problem is with slightly larger files (~90MB). After stepping through the code, I found that it was returning at this point in the code_searcher::index_file function:

if (memchr(p, 0, len) != NULL) {
    // The file contains a NUL byte; treat it as binary and skip it.
    return;
}

After commenting out this check, I was able to index the 90MB file with a warning that the file is too big. Following are the metrics.

Metrics:

Number of lines: 627941 <-- I added this line
WARN: test:: is too large to be indexed.
indexed in 1.750006s
== begin metrics ==
index.bytes 97395291
index.bytes.dedup 80669472
index.content.chunks 1
index.content.ranges 0
index.data.chunks 1
index.files 1
index.lines 627941
index.lines.dedup 499911
timer.git.walk 0
timer.index.dedup.hash 315
timer.index.divsufsort 7977
timer.index.fixupnl 118
timer.index.index_file 1505
== end metrics ==
Result Size: 0

I wasn't able to perform any searches on this even though the given string was present in the file.
Do I have to tweak some parameters to index big files?

Only the first match per line is highlighted

E.g., this query:
https://livegrep.com/search/linux?q=rx_length_errors+file%3Artnetl
matches a line

	a->rx_length_errors = b->rx_length_errors;

but only the first rx_length_errors is highlighted, not the other one.

This is potentially pretty confusing if e.g. you're searching for places where some field (like rx_length_errors) is read from, and you don't see any -- worse, you see some but not all, so you don't immediately realize anything is wrong -- because when you scan the results page for the matches, the one you're looking for isn't highlighted. In a quick test, GNU grep, git grep, ack, ag, and rg all highlight both matches, so that's also what users will expect.

I think to fix this I'd probably extend the SearchResult type in src/proto/livegrep.proto and the corresponding Result type in server/api/types.go so that their respective Bounds/bounds members represented not one but a sequence of intervals within the line. Then I think it'll be pretty straightforward how each layer in turn adapts to provide/transmit/consume that data.

The alternative I'd consider is to instead send multiple Result / SearchResult messages per line when there are multiple matches. This feels less good to me partly because it's more overhead especially when there are a lot of matches on a line, and maybe more fundamentally because it opens up the possibility of getting a message for an arbitrary subset of the matches on a line, which feels off. @nelhage may have other thoughts or another idea entirely, though.

Thanks to @jhurwitz for pointing this out on our internal instance at Dropbox :)

codesearch crashes on empty repo

If you run codesearch on an empty git repo (e.g., one with no branches or commits yet), it crashes:

(gdb) run repos/fail.json
Starting program: /home/bnewbold/.cache/bazel/_bazel_bnewbold/9bebf2ad52920ae55fb82a2d3094cf32/execroot/livegrep/bazel-out/local-fastbuild/bin/src/tools/codesearch repos/fail.json
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6f80700 (LWP 11764)]
[New Thread 0x7ffff677f700 (LWP 11765)]
[New Thread 0x7ffff5f7e700 (LWP 11766)]
[New Thread 0x7ffff577d700 (LWP 11767)]
Walking repo_spec name=internetarchive/community_tools, path=repos/internetarchive/community_tools
walking HEAD... codesearch: ./src/smart_git.h:69: smart_object<T>::operator T*() [with T = git_commit]: Assertion `obj_' failed.

Thread 1 "codesearch" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff6fb53fa in __GI_abort () at abort.c:89
#2  0x00007ffff6face37 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x555555a17d05 "obj_",
    file=file@entry=0x555555a17ce9 "./src/smart_git.h", line=line@entry=69,
    function=function@entry=0x555555a17e80 <smart_object<git_commit>::operator git_commit*()::__PRETTY_FUNCTION__> "smart_object<T>::operator T*() [with T = git_commit]") at assert.c:92
#3  0x00007ffff6facee2 in __GI___assert_fail (assertion=0x555555a17d05 "obj_",
    file=0x555555a17ce9 "./src/smart_git.h", line=69,
    function=0x555555a17e80 <smart_object<git_commit>::operator git_commit*()::__PRETTY_FUNCTION__> "smart_object<T>::operator T*() [with T = git_commit]") at assert.c:101
#4  0x00005555555cb88b in smart_object<git_commit>::operator git_commit*() ()
#5  0x00005555555cabbb in git_indexer::walk(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#6  0x0000555555596bf3 in build_index(code_searcher*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) ()
#7  0x0000555555596e9b in initialize_search(code_searcher*, code_searcher*, int, char**) ()
#8  0x0000555555597307 in main ()

A trivial example of this is github.com/internetarchive/community_tools.

The return value of libgit2's git_revparse_single() is not checked at src/git_indexer.cc:46. PR forthcoming.

Highlight definitions in search results, and/or rank by relevance

This is another frequently-requested feature for our internal Livegrep instance at Dropbox -- one of the features that the other, entirely-in-house code-search tool has and that users often say they miss when using our Livegrep instance instead.

A big fraction of the time when I do a search, what I really want is the definition of the thing, not all its uses. If there are 30 uses, I may have to skim through a bunch of them, and if there are 300 the definition may not make it onto the results page at all.

Of course I can try to write the query narrowly to find the definition -- like def foo for Python, though it gets dicier in some other languages like C. (^foo? ^\S.*foo? \S[^(]*foo?) Even in Python, it's more typing, enough to add some friction when what I really have is a navigational query and I'm hoping to get the answer fast.

So it'd be super handy to have the indexer run some tool to locate definitions, heuristically -- exuberant-ctags? some newer awesomer tool? -- and use that to help me find the definition. Minimally, this might highlight the likely-definition in some way. A fancier option would be, in a post-#51 world, to pull the file(s) with a likely-definition to the top of the results page. The fancier option is actually pretty easy, given the definitions data in the first place, and I don't think a lexicographic filename order is something I'm particularly attached to as a user, so I'd go for that.

The main question for implementing this is to pick an appropriate source for the what's-a-definition data, and add to the build process a step to run that tool and store the metadata it produces.

For an MVP of this feature, I think it's OK if when there are like 3000 non-definition results, more than we want to search through, the definition can go missing from the search results page. With that, I think we don't need any heavy-duty indexing on the what's-a-definition data, and can just query it for decoration to add to each search result we find. In a fancier version, we'd have some kind of index for that that we search in parallel, and guarantee that the definitions take priority before we abort the search for too many results.

Request: Make it more obvious which repo you're searching in

Livegrep saves the last repo you searched in now, which is cool! One thing that's frustrating is that I'll sometimes forget to check which repo I have saved, so I end up searching in the wrong repo. I get 0 results and assume that my search term can't be found, when it actually exists.

Maybe when 0 results are found, there could be a message saying "0 results found in [repository]" to help people notice that mistake.

Search across several repos, or in one user-chosen repo

At Dropbox like at probably most organizations, we have a number of different repos -- a few big ones (one for most server code, one for the desktop client, etc.), and various miscellaneous ones that don't belong in those or we just haven't gotten around to merging in.

Our internal livegrep instance searches across all of our major repos and some minor ones. There's another, entirely in-house, code-search tool that searches across just one repo at a time, with a big ol' dropdown to choose which one you want.

I'm often searching to convince myself that something isn't used anymore or some bad pattern doesn't exist, or to find out where something is referred to from, like when some code is clearly being invoked and I can't tell how. So I really like having comprehensive search across all the repos related to a project, which is why I built our internal livegrep instance that way.

But it can also get kind of noisy, especially in navigational searches when you're trying to type just a minimal quick pattern (with the power of search-as-you-type!) to find the thing you have in mind, and you know perfectly well which repo it's in. That's especially true if your work is almost always in a particular repo -- which for many people it is, especially at a company that does a good job of the "monorepo" pattern that Google and Facebook are well known for and that Dropbox successfully mostly moved to in 2014 and 2015, where e.g. all your web apps and backend services (and maybe more things) are seamlessly source-controlled together. So the desire to search just one repo is a pretty common reason that users give for preferring the other tool. And we're about evenly split between users who prefer that tool and users who prefer our livegrep instance, so that's a lot of users with that preference.

Crucially, if the last time you used that other tool you searched repository X, then the next time you go to its front page and just type a quick query, you'll get a search over repository X.

I'm not sure how to integrate this well with Livegrep's UI and the all-repos search that I also value. So this is a less concrete feature request than usual. Ideas?

Query modifier UI

Inspired by #71, filing a separate issue to discuss query modifier UI in general.

The query parser currently understands these tags:

  • file:
  • repo:
  • tags:
  • case:
  • lit:
  • max_matches:

I'd group these as follows:

  • The file: and repo: tags are like extra search terms, filtering query results. (tags: seems also to fall into this category, but additionally slightly modifies treatment of the main content query?)
  • case: and lit: are modifiers for the meaning of the main query.
  • max_matches: is a modifier on the overall search behaviour.

For the second group (case and lit), there are a few rough edges:

  • The default behaviour/setting is not obvious (though it's documented on the help page).
  • If a user typically uses a particular mode, the setting isn't sticky.
  • It's not clear how these modifiers interact with file and repo terms--whether you can ask for file:lit:foo.c. Since you can't, there's no way to control the case-sensitivity of a file field (after #72).

One possibility is to move the query modifiers into separate UI controls, something like this:

Case:  (x) auto   ( ) match   ( ) ignore
Query: (x) regex  ( ) literal

This:

  • makes the query interpretation explicit in the UI (especially the "auto" case-sensitivity detection, which I didn't even realize existed!)
  • lets us store the user's case/query preferences in a cookie and apply them again at next page load

With this UI, I think it would be least surprising to apply the settings to both the main query and file: terms, which would be a partial reversal of #72.

Built-in file browsing

GitHub is pretty OK as a file browser for livegrep search results to link to, but it has a few limitations:

  • It's a little slow -- I think of it as quite slow, though based on some trials just now it's not bad, so maybe it's improved -- but it's still not as snappy as livegrep itself;
  • Because it's not livegrep, we can't add UI for exploring the codebase with further livegrep searches, cross-references, etc.;
  • Some of us have a different, slower backend -- e.g. at Dropbox we use Phabricator, which I think is a decidedly better code-review tool but the code browsing is not as fast as GitHub let alone livegrep.

This used to be a frequent feature request on our internal instance at Dropbox, and then @christoffer built it :). @jboning plans to send a rebased version of that implementation upstream here soon.

js loading strategy

Forking discussion from #60 since this is largely a separate issue.

I spent a while last week fiddling with https://github.com/dropbox/rules_node, and it seems to work OK. This gives us a reasonable way to make webpack create a bundle of javascript (and other static resources) for the browser to load.

So now the question arises: how do we actually want to organize our bundles? Webpack is wildly flexible about how we can do things, so there are a bazillion options. Here's my rough summary of the options....

Bundling:

  • Create one big bundle containing everything.
  • Create separate bundles per page.
  • Create separate bundles per page, but automatically extract common subset into a separate bundle?

CSS: include in the bundle or not?

Big, broadly-used third-party libraries (jquery, etc):

  • Roll into our own JS bundles?
  • Combine into one distinct "third-party" bundle?
  • Load as standalone resources via <script> tags (as currently)? (Variation: host all of our resources ourselves, or load from an external CDN?)

Continuous re-indexing

This is more of an improvement suggestion. It seems like a shortcoming to have to manually launch a re-indexing every time files change.

Since repos get updated all the time, most likely several times a day, it would be really cool to watch the filesystem or listen for changes and index new or updated files automatically.

Compilation failure

Error message

src/indexer.cc:261:37: error: variable length array of non-POD element type 'intrusive_ptr<IndexKey>'
        intrusive_ptr<IndexKey> keys[nrunes];
                                    ^

Compiler version

[~/m/livegrep]─[⎇ master]─(1)-> gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin16.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
All logs
[~/m/livegrep]─[⎇ master]─> bazel build //...
INFO: Found 32 targets...
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/compiler/java/java_file.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/compiler/java/java_file.cc:159:6: warning: unused function 'CompareFieldsByName' [-Wunused-function]
bool CompareFieldsByName(const FieldDescriptor *a, const FieldDescriptor *b) {
     ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/compiler/cpp/cpp_message.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/compiler/cpp/cpp_message.cc:376:8: warning: unused function 'MessageTypeProtoName' [-Wunused-function]
string MessageTypeProtoName(const FieldDescriptor* field) {
       ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/util/internal/field_mask_utility.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/util/internal/field_mask_utility.cc:47:14: warning: unused function 'CreatePublicError' [-Wunused-function]
util::Status CreatePublicError(util::error::Code code,
             ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/util/internal/utility.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/util/internal/utility.cc:52:19: warning: unused function 'SkipWhiteSpace' [-Wunused-function]
const StringPiece SkipWhiteSpace(StringPiece str) {
                  ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/compiler/java/java_enum_lite.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/compiler/java/java_enum_lite.cc:53:6: warning: unused function 'EnumHasCustomOptions' [-Wunused-function]
bool EnumHasCustomOptions(const EnumDescriptor* descriptor) {
     ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/compiler/js/js_generator.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/compiler/js/js_generator.cc:249:8: warning: unused function 'GetPath' [-Wunused-function]
string GetPath(const GeneratorOptions& options,
       ^
external/com_github_google_protobuf/src/google/protobuf/compiler/js/js_generator.cc:266:8: warning: unused function 'GetPath' [-Wunused-function]
string GetPath(const GeneratorOptions& options,
       ^
external/com_github_google_protobuf/src/google/protobuf/compiler/js/js_generator.cc:547:8: warning: unused function 'JSMapGetterName' [-Wunused-function]
string JSMapGetterName(const GeneratorOptions& options,
       ^
3 warnings generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/compiler/java/java_message_builder_lite.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/compiler/java/java_message_builder_lite.cc:71:8: warning: unused function 'MapValueImmutableClassdName' [-Wunused-function]
string MapValueImmutableClassdName(const Descriptor* descriptor,
       ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/compiler/java/java_message_lite.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/compiler/java/java_message_lite.cc:1032:6: warning: unused function 'CheckHasBitsForEqualsAndHashCode' [-Wunused-function]
bool CheckHasBitsForEqualsAndHashCode(const FieldDescriptor* field) {
     ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/text_format.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/text_format.cc:77:13: warning: unused function 'GetAnyFieldDescriptors' [-Wunused-function]
inline bool GetAnyFieldDescriptors(const Message& message,
            ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc:52:18: warning: unused variable 'kMicrosPerMillisecond' [-Wunused-const-variable]
static const int kMicrosPerMillisecond = 1000;
                 ^
external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc:56:19: warning: unused variable 'kTimestampFormat' [-Wunused-const-variable]
static const char kTimestampFormat[] = "%E4Y-%m-%dT%H:%M:%S";
                  ^
external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc:380:6: warning: unused function 'ToUint128' [-Wunused-function]
void ToUint128(const Timestamp& value, uint128* result, bool* negative) {
     ^
external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc:405:6: warning: unused function 'ToTimestamp' [-Wunused-function]
void ToTimestamp(const uint128& value, bool negative, Timestamp* timestamp) {
     ^
4 warnings generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc:90:13: warning: unused function 'IsHighSurrogate' [-Wunused-function]
inline bool IsHighSurrogate(uint16 c) {
            ^
external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc:102:13: warning: unused function 'IsLowSurrogate' [-Wunused-function]
inline bool IsLowSurrogate(uint16 c) {
            ^
external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc:122:13: warning: unused function 'IsSupplementalCodePoint' [-Wunused-function]
inline bool IsSupplementalCodePoint(uint32 cp) {
            ^
external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc:138:15: warning: unused function 'ToCodePoint' [-Wunused-function]
inline uint32 ToCodePoint(uint16 high, uint16 low) {
              ^
4 warnings generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/generated_message_reflection.cc [for host]:
external/com_github_google_protobuf/src/google/protobuf/generated_message_reflection.cc:76:13: warning: unused function 'SupportsArenas' [-Wunused-function]
inline bool SupportsArenas(const Descriptor* descriptor) {
            ^
1 warning generated.
INFO: From Compiling external/com_github_libgit2/deps/zlib/inflate.c:
external/com_github_libgit2/deps/zlib/inflate.c:1507:61: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
    if (strm == Z_NULL || strm->state == Z_NULL) return -1L << 16;
                                                        ~~~ ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc:
external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc:90:13: warning: unused function 'IsHighSurrogate' [-Wunused-function]
inline bool IsHighSurrogate(uint16 c) {
            ^
external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc:102:13: warning: unused function 'IsLowSurrogate' [-Wunused-function]
inline bool IsLowSurrogate(uint16 c) {
            ^
external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc:122:13: warning: unused function 'IsSupplementalCodePoint' [-Wunused-function]
inline bool IsSupplementalCodePoint(uint32 cp) {
            ^
external/com_github_google_protobuf/src/google/protobuf/util/internal/json_escaping.cc:138:15: warning: unused function 'ToCodePoint' [-Wunused-function]
inline uint32 ToCodePoint(uint16 high, uint16 low) {
              ^
4 warnings generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/util/internal/utility.cc:
external/com_github_google_protobuf/src/google/protobuf/util/internal/utility.cc:52:19: warning: unused function 'SkipWhiteSpace' [-Wunused-function]
const StringPiece SkipWhiteSpace(StringPiece str) {
                  ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/generated_message_reflection.cc:
external/com_github_google_protobuf/src/google/protobuf/generated_message_reflection.cc:76:13: warning: unused function 'SupportsArenas' [-Wunused-function]
inline bool SupportsArenas(const Descriptor* descriptor) {
            ^
1 warning generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc:
external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc:52:18: warning: unused variable 'kMicrosPerMillisecond' [-Wunused-const-variable]
static const int kMicrosPerMillisecond = 1000;
                 ^
external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc:56:19: warning: unused variable 'kTimestampFormat' [-Wunused-const-variable]
static const char kTimestampFormat[] = "%E4Y-%m-%dT%H:%M:%S";
                  ^
external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc:380:6: warning: unused function 'ToUint128' [-Wunused-function]
void ToUint128(const Timestamp& value, uint128* result, bool* negative) {
     ^
external/com_github_google_protobuf/src/google/protobuf/util/time_util.cc:405:6: warning: unused function 'ToTimestamp' [-Wunused-function]
void ToTimestamp(const uint128& value, bool negative, Timestamp* timestamp) {
     ^
4 warnings generated.
INFO: From Compiling external/com_github_google_protobuf/src/google/protobuf/util/internal/field_mask_utility.cc:
external/com_github_google_protobuf/src/google/protobuf/util/internal/field_mask_utility.cc:47:14: warning: unused function 'CreatePublicError' [-Wunused-function]
util::Status CreatePublicError(util::error::Code code,
             ^
1 warning generated.
INFO: From Linking external/com_googlesource_code_re2/libre2.so:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking external/boost/libsmart_ptr.a:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/local-fastbuild/bin/external/boost/_objs/smart_ptr/external/boost/libs/smart_ptr/src/sp_collector.pic.o has no symbols
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/local-fastbuild/bin/external/boost/_objs/smart_ptr/external/boost/libs/smart_ptr/src/sp_debug_hooks.pic.o has no symbols
warning: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning for library: bazel-out/local-fastbuild/bin/external/boost/libsmart_ptr.a the table of contents is empty (no object file members in the library define global symbols)
INFO: From Linking external/com_github_google_protobuf/libprotobuf_lite.a [for host]:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/host/bin/external/com_github_google_protobuf/_objs/protobuf_lite/external/com_github_google_protobuf/src/google/protobuf/stubs/atomicops_internals_x86_gcc.o has no symbols
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/host/bin/external/com_github_google_protobuf/_objs/protobuf_lite/external/com_github_google_protobuf/src/google/protobuf/stubs/atomicops_internals_x86_msvc.o has no symbols
INFO: From Linking external/com_github_json_c/libjson.a:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/local-fastbuild/bin/external/com_github_json_c/_objs/json/external/com_github_json_c/json-c/libjson.pic.o has no symbols
INFO: From Linking external/com_github_libgit2/liblibgit2.a:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/local-fastbuild/bin/external/com_github_libgit2/_objs/libgit2/external/com_github_libgit2/src/stransport_stream.pic.o has no symbols
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/local-fastbuild/bin/external/com_github_libgit2/_objs/libgit2/external/com_github_libgit2/src/transports/auth_negotiate.pic.o has no symbols
INFO: From Linking external/boost/libfilesystem.a:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/local-fastbuild/bin/external/boost/_objs/filesystem/external/boost/libs/filesystem/src/windows_file_codecvt.pic.o has no symbols
INFO: From Linking external/com_github_google_protobuf/libprotobuf.a [for host]:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/host/bin/external/com_github_google_protobuf/_objs/protobuf/external/com_github_google_protobuf/src/google/protobuf/io/gzip_stream.o has no symbols
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: file: bazel-out/host/bin/external/com_github_google_protobuf/_objs/protobuf/external/com_github_google_protobuf/src/google/protobuf/util/internal/error_listener.o has no symbols
ERROR: /Users/joel/me/livegrep/src/BUILD:1:1: C++ compilation of rule '//src:codesearch' failed: Process exited with status 1 [sandboxed].
src/indexer.cc:71:24: warning: expression result unused [-Wunused-value]
    edges_.insert(val).first;
    ~~~~~~~~~~~~~~~~~~ ^~~~~
src/indexer.cc:261:37: error: variable length array of non-POD element type 'intrusive_ptr<IndexKey>'
        intrusive_ptr<IndexKey> keys[nrunes];
                                    ^
1 warning and 1 error generated.
Use --strategy=CppCompile=standalone to disable sandboxing for the failing actions.
INFO: Elapsed time: 685.630s, Critical Path: 130.93s

Attempting build results in "no such remote ref"

For some reason, Bazel's attempt to fetch the pinned revision of com_github_nelhage_boost resulted in an error, even though I think that ref does exist on the remote:

$ bazel build //...
ERROR: error loading package '': Encountered error while reading extension file 'boost/boost.bzl': no such package '@com_github_nelhage_boost//': Traceback (most recent call last):
        File "/home/brhodes/.cache/bazel/_bazel_brhodes/7f6357a54daf9c6f2234f2479b3b3fc6/external/bazel_tools/tools/build_defs/repo/git.bzl", line 69
                _clone_or_update(ctx)
        File "/home/brhodes/.cache/bazel/_bazel_brhodes/7f6357a54daf9c6f2234f2479b3b3fc6/external/bazel_tools/tools/build_defs/repo/git.bzl", line 44, in _clone_or_update
                fail(("error cloning %s:\n%s" % (ctx....)))
error cloning com_github_nelhage_boost:
+ cd /home/brhodes/.cache/bazel/_bazel_brhodes/7f6357a54daf9c6f2234f2479b3b3fc6/external
+ rm -rf /home/brhodes/.cache/bazel/_bazel_brhodes/7f6357a54daf9c6f2234f2479b3b3fc6/external/com_github_nelhage_boost
+ git clone --depth=1 https://github.com/nelhage/rules_boost /home/brhodes/.cache/bazel/_bazel_brhodes/7f6357a54daf9c6f2234f2479b3b3fc6/external/com_github_nelhage_boost
Cloning into '/home/brhodes/.cache/bazel/_bazel_brhodes/7f6357a54daf9c6f2234f2479b3b3fc6/external/com_github_nelhage_boost'...
+ cd /home/brhodes/.cache/bazel/_bazel_brhodes/7f6357a54daf9c6f2234f2479b3b3fc6/external/com_github_nelhage_boost
+ git reset --hard d6446dc9de6e43b039af07482a9361bdc6da5237
fatal: Could not parse object 'd6446dc9de6e43b039af07482a9361bdc6da5237'.
+ git fetch --depth=1 origin d6446dc9de6e43b039af07482a9361bdc6da5237:d6446dc9de6e43b039af07482a9361bdc6da5237
error: no such remote ref d6446dc9de6e43b039af07482a9361bdc6da5237

Filter by file type/language (as sugar for filename extensions)

Small feature that I think would be quite handy: pretty often I want to search only, say, Python files. I can type file:\.py$, which works fine, but it's a little cumbersome.

A simple variation that could be handy: ext:py. This would be very simple sugar meaning the same as file:\.py$. (Or maybe "the file extension is .py", with some subtler notion of "file extension"? E.g., I think I'd say that a file named .gitignore has no extension, not extension .gitignore. But this probably doesn't matter.)

Alternatively, type:py, meaning "a file which from its name looks like a Python file". This becomes extra handy in cases like type:cpp where you want to match not only one extension cpp but also h and probably also cc, C, and maybe others. Even for Python, these days it should probably include pyi as well as py.

The main ingredient in building this feature is probably finding a good, well-maintained external list of file "types" and corresponding lists of extensions -- no sense maintaining our own here. Maybe from another code-search tool, probably a command-line one? Maybe ripgrep:

$ rg --type-list | grep ^cpp
cpp: *.C, *.H, *.cc, *.cpp, *.cxx, *.h, *.hh, *.hpp

Yep, OK, that's about four more that should be on that list and I hadn't thought of. 😛 And it's pretty actively maintained. I think ripgrep may win -- as a strawman implementation, I'd build-depend on it and just shell out to ripgrep --type-list at build time to get this list.
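A strawman version of that build step (assumes ripgrep is installed; parses the rg --type-list output shown above):

# emit "type<TAB>glob,glob,..." lines for a generated extension table
rg --type-list | awk -F': ' '{print $1 "\t" $2}' > filetypes.tsv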

Option for less context and way more results

When I use grep or the like locally, I often use -C but also often don't. Skipping context is good if I have a lot of results and want to take in a comprehensive view of where there are matches and/or what the matches look like.

Currently livegrep always does the equivalent of -C3. It'd be great to have the option of -C0. (Maybe also other options like -C1 if doing so fits naturally in the UI.)

Relatedly, sometimes I want a comprehensive picture of where some pattern appears even though there are a few hundred of them. Thanks to #49, all the infrastructure is in place for the front end to offer more matches. Could be a UI widget, maybe, or perhaps just request more when it knows it's showing less context and can handle more.

Livegrep web interface dependencies not installed

After issuing make, the livegrep dependencies won't get installed:

go get -t -d github.com/livegrep/livegrep/client \
                        github.com/livegrep/livegrep/server \
                        github.com/livegrep/livegrep/livegrep \
                        github.com/livegrep/livegrep/lg
flag provided but not defined: -t
usage: get [-a] [-d] [-fix] [-n] [-p n] [-u] [-v] [-x] [packages]

Go version:

$ go version
go version go1

OS: Ubuntu 12.04

Git annotate/blame information

This is another of our most-requested features for our internal livegrep instance at Dropbox. It's a follow-up feature to #55 -- one we haven't yet built, though we do link to Diffusion, which provides it (and is just slower).

The basic idea is, in the file browser (i.e. #55), to identify next to each line the commit that last touched it, a la git blame or svn annotate.

E.g., browse some code in Diffusion, and hit "Enable Blame" in the options box at the right. I think Diffusion does this pretty well, though it's a little slower than I'd like. Should be cacheable to make it really fast.


Some digression:

Personally, I tend to see reliance on blame information as an antipattern -- a bad heuristic for seeing who’s responsible for code, that actually becomes a cultural problem when people rely on it (and talk about relying on it) so much. Among other issues, it often finds the last person who did a refactoring or a cleanup, rather than the people who actually wrote the code — which is noise, and which moreover creates a disincentive for people to improve the codebase because of a perception that “you touch it, you own it."

But what it's really pointing at is that people want to understand the history of where the code comes from, and they don't have better tools to do that. (Git has some really excellent functionality for this in things like git log -S -- but they're not well documented or discoverable and most people don't know them.) And git blame is way better than nothing.

So it'd be great to build a high-quality code-history-exploring experience that gives everyone the same power that folks like @nelhage and me can get with a series of git log commands with a bunch of different arguments; a good fast version of this would be something I'd switch to most of the time, because a web UI could do it in a much more conveniently interlinked way than the CLI. Building the equivalent of git blame will be a good start that includes a lot of the needed underlying functionality, and I can file followup issues if/when someone builds it. :-)
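For reference, these are the kinds of history-exploring commands being described (standard git, with illustrative arguments):

# find commits whose diffs add or remove a given string
git log -S rx_length_errors --oneline
# follow one file's history across renames
git log --follow --oneline -- src/git_indexer.cc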

livegrep for documents?

I was amazed by livegrep's speed. Has anyone used it for indexing and searching documents (pdf, docx, etc.)?

Uncaught TypeError: Cannot read property 'q' of undefined

Very cool project. While using search I encountered this error in the JS console:

Uncaught TypeError: Cannot read property 'q' of undefined

related to this line:

if (this.model.search_map[this.model.get('displaying')].q === '' ||
    this.model.get('error')) {
  this.$el.hide();
  return this;
}

It looks like you only want to read .q when the search_map entry is defined. Or, you may want to change how you return data to the client.

Unable to install node dependencies using bazel

I get this error:
ERROR: /Users/raymond/code/livegrep/web/npm/css-loader/BUILD:7:1: installing node modules from web/npm/css-loader/npm-shrinkwrap.json failed (Exit 1).
/private/var/tmp/_bazel_raymond/b8234531df81349f097554a459f6baeb/bazel-sandbox/8119839809326862884/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/nodejs/bin/node: /private/var/tmp/_bazel_raymond/b8234531df81349f097554a459f6baeb/bazel-sandbox/8119839809326862884/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/nodejs/bin/node: cannot execute binary file
Traceback (most recent call last):
File "/private/var/tmp/_bazel_raymond/b8234531df81349f097554a459f6baeb/bazel-sandbox/8119839809326862884/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/com_github_livegrep_livegrep/../org_dropbox_rules_node/node/tools/npm/install.py", line 49, in
main()
File "/private/var/tmp/_bazel_raymond/b8234531df81349f097554a459f6baeb/bazel-sandbox/8119839809326862884/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/com_github_livegrep_livegrep/../org_dropbox_rules_node/node/tools/npm/install.py", line 45, in main
npm_install(args.shrinkwrap, args.output)
File "/private/var/tmp/_bazel_raymond/b8234531df81349f097554a459f6baeb/bazel-sandbox/8119839809326862884/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/com_github_livegrep_livegrep/../org_dropbox_rules_node/node/tools/npm/install.py", line 27, in npm_install
run_npm(['install'], env=env, cwd=output)
File "/private/var/tmp/_bazel_raymond/b8234531df81349f097554a459f6baeb/bazel-sandbox/8119839809326862884/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/org_dropbox_rules_node/node/tools/npm/utils.py", line 93, in run_npm
full_cmd, env=full_env, cwd=cwd, stderr=subprocess.STDOUT
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 573, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['/private/var/tmp/_bazel_raymond/b8234531df81349f097554a459f6baeb/bazel-sandbox/8119839809326862884/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/nodejs/bin/npm', 'install']' returned non-zero exit status 126
INFO: Elapsed time: 101.550s, Critical Path: 10.35s

Bazel version:
Build label: 0.5.4-homebrew
Build target: bazel-out/darwin_x86_64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Aug 25 16:54:42 2017 (1503680082)
Build timestamp: 1503680082
Build timestamp as int: 1503680082

Livegrep server won't build

I get the following error when trying to build the livegrep binary:

$ go build -o bin/livegrep ./livegrep
# _/home/b/tools/livegrep/server
server/server.go:120: method srv.ServeRoot is not an expression, must be called
server/server.go:121: method srv.ServeSearch is not an expression, must be called
server/server.go:122: method srv.ServeSearch is not an expression, must be called
server/server.go:123: method srv.ServeAbout is not an expression, must be called
server/server.go:124: method srv.ServeOpensearch is not an expression, must be called
server/server.go:126: method srv.ServeAPISearch is not an expression, must be called
server/server.go:127: method srv.ServeAPISearch is not an expression, must be called
server/server.go:131: method srv.HandleWebsocket is not an expression, must be called

Rank results by relevance

Once #51 is in, we'll have the power to conveniently rank some files above others in the results based on whatever we think will best help people find what they want.

#64 contains one idea for that (rank heuristically-identified definitions highly). That one will be great, but it requires some infrastructure to make it work.

Here are a couple of other ranking features that would be easy to build:

  • Rank files that sound like test files to the bottom. A fairly simple filename pattern (just a little subtler than /test/i) should do a good job here; see the sketch after this list.
  • Maybe rank files with lots of hits to the top. I'm not 100% sold on this as a good feature, but I've heard the suggestion and it seems plausible.

Bonus feature: provide the same is-this-a-test pattern as an option in queries, e.g. spelled is:test (or, as people will probably use it more often, -is:test).
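
Here is a minimal sketch of the kind of filename heuristic I mean (the pattern and helper name are illustrative only, not anything livegrep actually ships):

  import re

  # Heuristic: flag paths under test directories or with common test-file
  # naming affixes, rather than any path merely containing "test".
  _TEST_PATH_RE = re.compile(
      r"(^|/)(tests?|testing|spec)(/|$)"   # tests/ or spec/ directories
      r"|(^|/)(test|spec)_[^/]*$"          # test_foo.py
      r"|_(test|spec)s?\.[^/.]+$"          # foo_test.go, foo_tests.cc
      r"|\.(test|spec)\.[^/.]+$",          # foo.spec.js
      re.IGNORECASE,
  )

  def looks_like_test_file(path):
      """Return True if the path is probably a test file."""
      return _TEST_PATH_RE.search(path) is not None

Ranking would then sort files for which looks_like_test_file() returns True below everything else.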

Problems with Go while building with bazel

I'm doing bazel build //... and getting:


____Downloading https://codeload.github.com/golang/tools/zip/3d92dd60033c312e3ae7cac319c792271cf67e37: 2,432,110 bytes
ERROR: /livegrep/cmd/livegrep-github-reindex/BUILD:3:1: no such package '@org_golang_x_oauth2//': failed to fetch org_golang_x_oauth2: 2017/11/20 12:21:21 get "golang.org/x/oauth2": found meta tag vcs.metaImport{Prefix:"golang.org/x/oauth2", VCS:"git", RepoRoot:"https://go.googlesource.com/oauth2"} at https://golang.org/x/oauth2?go-get=1
# cd .; git clone https://go.googlesource.com/oauth2 /root/.cache/bazel/_bazel_root/c6085061b636a555ead049ebc2588e2d/external/org_golang_x_oauth2
Cloning into '/root/.cache/bazel/_bazel_root/c6085061b636a555ead049ebc2588e2d/external/org_golang_x_oauth2'...
fatal: unable to access 'https://go.googlesource.com/oauth2/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
2017/11/20 12:21:36 exit status 128
 and referenced by '//cmd/livegrep-github-reindex:go_default_library'.
ERROR: Analysis of target '//cmd/livegrep-github-reindex:livegrep-github-reindex' failed; build aborted.

Pie-in-the-sky feature request: searching for definitions using ctags results

Let me first say that this possibly falls outside of what livegrep should reasonably do, but I'll throw it out there and see what you think.

It is great that livegrep can search for any text in the code, just like plain old grep, but I often have a symbol and I just want to go to the definition quickly. This can be troublesome for symbols that are common, such as open. It is arguable that the user should craft a better regex to narrow down the search, such as int.*open( file:.c, but I wonder if we can do better than that.

One possibility is to implement an indexer that uses the output of ctags and adds only the tag names to the suffix array instead of entire lines. If we can maintain the line context, then users who want to search for symbols can search in the "tags" repo, and the results will only contain definitions.
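
As a rough sketch of that first approach, assuming the default ctags file format (tab-separated tag name, file name, and ex command, with !_TAG_ metadata lines at the top), the indexer could read something like the following and feed only the tag names, plus their file context, into the index. This is illustrative, not livegrep's actual code:

  def parse_ctags(tags_path):
      """Yield (tag, filename, excmd) tuples from a ctags file.

      Assumes the default ctags output: tab-separated fields, with
      metadata lines prefixed by "!_TAG_" that are skipped.
      """
      with open(tags_path, encoding="utf-8", errors="replace") as f:
          for line in f:
              if line.startswith("!_TAG_"):
                  continue
              fields = line.rstrip("\n").split("\t")
              if len(fields) >= 3:
                  yield fields[0], fields[1], fields[2]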

Another possibility is to implement a different backend that implements its own mechanism using the tags file for returning the results to the frontend.

Is this outside the scope of livegrep?

Unable to build project due to TypeError on npm-shrinkwrap node module

I get this error when installing node modules via npm-shrinkwrap:

>> bazel version     
Build label: 0.8.0- (@non-git)
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Nov 30 21:03:56 2017 (1512075836)
Build timestamp: 1512075836
Build timestamp as int: 1512075836
>> bazel build //... 
INFO: Analysed 47 targets (96 packages loaded).
INFO: Found 47 targets...
ERROR: /home/franco/codes/livegrep/web/npm/style-loader/BUILD:7:1: installing node modules from web/npm/style-loader/npm-shrinkwrap.json failed (Exit 1)
Traceback (most recent call last):
  File "/home/franco/.cache/bazel/_bazel_franco/4a7657ede77e0c03e21187f1da75c41c/bazel-sandbox/5265276621200780402/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/com_github_livegrep_livegrep/../org_dropbox_rules_node/node/tools/npm/install.py", line 49, in <module>
    main()
  File "/home/franco/.cache/bazel/_bazel_franco/4a7657ede77e0c03e21187f1da75c41c/bazel-sandbox/5265276621200780402/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/com_github_livegrep_livegrep/../org_dropbox_rules_node/node/tools/npm/install.py", line 45, in main
    npm_install(args.shrinkwrap, args.output)
  File "/home/franco/.cache/bazel/_bazel_franco/4a7657ede77e0c03e21187f1da75c41c/bazel-sandbox/5265276621200780402/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/com_github_livegrep_livegrep/../org_dropbox_rules_node/node/tools/npm/install.py", line 27, in npm_install
    run_npm(['install'], env=env, cwd=output)
  File "/home/franco/.cache/bazel/_bazel_franco/4a7657ede77e0c03e21187f1da75c41c/bazel-sandbox/5265276621200780402/execroot/com_github_livegrep_livegrep/bazel-out/host/bin/external/org_dropbox_rules_node/node/tools/npm/install.runfiles/org_dropbox_rules_node/node/tools/npm/utils.py", line 84, in run_npm
    full_env = dict(full_env.items() + env.items())
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'
INFO: Elapsed time: 7.924s, Critical Path: 1.01s
FAILED: Build did NOT complete successfully

Looks like a Python version issue, but I couldn't find a way around it.
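
For what it's worth, that line in utils.py is Python 2-only: in Python 3, dict.items() returns view objects that can't be concatenated with +. A version-agnostic rewrite of just that line might look like this (a sketch, not the actual upstream fix):

  # Works on both Python 2 and 3: copy, then merge, instead of
  # concatenating the .items() results.
  full_env = dict(full_env)
  full_env.update(env)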

Bazel 0.6 build only works with deprecation flag

Building livegrep with the current stable upstream bazel:

bnewbold@bnewbold-dev$ bazel version
Build label: 0.6.1
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Oct 5 21:54:59 2017 (1507240499)
Build timestamp: 1507240499
Build timestamp as int: 1507240499

I get:

ERROR: /1/livegrep/src/proto/BUILD:6:1: Traceback (most recent call last):
        File "/1/livegrep/src/proto/BUILD", line 6
                go_proto_library(name = "go_proto", protos = ["live..."], ...)
        File "/1/tmp/bazel/_bazel_bnewbold/d91485bda7a9c297f0372930fcc1e422/external/org_pubref_rules_protobuf/go/rules.bzl", line 97, in go_proto_library
                go_library(name = name, srcs = (srcs + [(name...")]), <2 more arguments>)
        File "/1/tmp/bazel/_bazel_bnewbold/d91485bda7a9c297f0372930fcc1e422/external/org_pubref_rules_protobuf/go/rules.bzl", line 100, in go_library
                list(set(((deps + proto_deps) + go_pr...)))
        File "/1/tmp/bazel/_bazel_bnewbold/d91485bda7a9c297f0372930fcc1e422/external/org_pubref_rules_protobuf/go/rules.bzl", line 100, in list
                set(((deps + proto_deps) + go_proto_...))
The `set` constructor for depsets is deprecated and will be removed. Please use the `depset` constructor instead. You can temporarily enable the deprecated `set` constructor by passing the flag --incompatible_disallow_set_constructor=false
ERROR: /1/livegrep/src/proto/BUILD:12:1: Traceback (most recent call last):
        File "/1/livegrep/src/proto/BUILD", line 12
                cc_proto_library(name = "cc_proto", protos = ["live..."], ...)
        File "/1/tmp/bazel/_bazel_bnewbold/d91485bda7a9c297f0372930fcc1e422/external/org_pubref_rules_protobuf/cpp/rules.bzl", line 87, in cc_proto_library
                native.cc_library(name = name, srcs = (srcs + [(name...")]), <2 more arguments>)
        File "/1/tmp/bazel/_bazel_bnewbold/d91485bda7a9c297f0372930fcc1e422/external/org_pubref_rules_protobuf/cpp/rules.bzl", line 90, in native.cc_library
                list(set(((deps + proto_deps) + compi...)))
        File "/1/tmp/bazel/_bazel_bnewbold/d91485bda7a9c297f0372930fcc1e422/external/org_pubref_rules_protobuf/cpp/rules.bzl", line 90, in list
                set(((deps + proto_deps) + compile_d...))
The `set` constructor for depsets is deprecated and will be removed. Please use the `depset` constructor instead. You can temporarily enable the deprecated `set` constructor by passing the flag --incompatible_disallow_set_constructor=false
ERROR: package contains errors: src/proto
ERROR: error loading package 'src/proto': Package 'src/proto' contains errors
INFO: Elapsed time: 0.161s
FAILED: Build did NOT complete successfully (0 packages loaded)

Passing the --incompatible_disallow_set_constructor=false flag to the bazel command does work. I don't know enough about bazel to recommend a fix, or to say whether this is just to be expected.
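
Concretely, this is the invocation that gets past the error:

bazel build //... --incompatible_disallow_set_constructor=false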

How to do multiline searches?

Single-line searches work as expected, but multiline searches don't seem to work.
Is this supported?
For example, given this text:

"hello




world"

and a regex like hello.{1,10}world.

I have enabled dot_nl (dot matches newline) in the RE2 options.
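
For reference, the pattern itself behaves as intended once dot can match newlines; here's a standalone Python check (re.DOTALL playing the role of RE2's dot_nl), which makes me suspect the limitation is in how the index handles line boundaries rather than in the regex:

  import re

  # re.DOTALL is the analogue of RE2's dot_nl: "." also matches "\n".
  text = '"hello\n\n\n\n\nworld"'  # five newlines between hello and world
  pattern = re.compile(r"hello.{1,10}world", re.DOTALL)
  print(bool(pattern.search(text)))  # True: 5 newlines fit within .{1,10}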
