Document indexer

A simple way to index all your development-related files in a single Elasticsearch instance, which can then be searched with a set of command-line utilities.

Installation

Install the Haskell Platform

Debian-based:

apt-get install haskell-platform

Nix-based:

nix-env -i -A nixpkgs.haskellPlatform

Add ~/.cabal/bin to your PATH, using the startup file of your shell:

echo 'export PATH="$HOME/.cabal/bin:$PATH"' >> ~/.bash_profile

or

echo 'export PATH="$HOME/.cabal/bin:$PATH"' >> ~/.zshenv

or

echo 'export PATH="$HOME/.cabal/bin:$PATH"' >> ~/.shrc

Run the following commands in the build directory:

cabal configure
cabal build
cabal install

Install Elasticsearch

nix-env -i -A nixpkgs.elasticsearch

Start the Elasticsearch instance from the source directory (in the non-beta version it will run as a system service):

start.sh

And initialize it:

init_elasticsearch_document_indexer

Usage

Index all your man pages. The RTS option -N8 runs the indexer on 8 cores, so the pages are indexed in parallel:

index_man_pages /usr/share/man /home/eklerks/.nix-profile/share/man /usr/local/share/man/man1 +RTS -N8

Search through the content of the man pages:

search_man_pages unsigned long *num_items_return

It returns results like this:

Title                                                        Section                Highlighting
XFilterEvent                                                 3                    , **Window** Specifies the **event** to filter. Specifies the **window** for which the filter is to be
XButtonEvent                                                 3                     request */         Display *display;       /* Display the **event** was read from */         **Window** **window**
XMotionEvent                                                 3                     request */         Display *display;       /* Display the **event** was read from */         **Window** **window**
XKeyEvent                                                    3                     request */         Display *display;       /* Display the **event** was read from */         **Window** **window**
XSelectInput                                                 3                    , unless the do_not_propagate mask prohibits it.  Setting the **event**-mask attribute of a **window** over
XCrossingEvent                                               3                    ;       /* Display the **event** was read from */         **Window** **window**;  /* ``**event**'' **window** reported
XDestroySubwindows                                           3                     DestroyNotify **event** for each **window**.  The **window** should never be referenced again.  If the
XReparentWindow                                              3                     override_redirect member returned in this **event** is set to the window's correspond- ing attribute.  **Window** manager
XGravityEvent                                                3                     */         **Window** **event**;         **Window** **window**;         int x, y; } XGravityEvent; When you receive this **event**

Index the tags and the content of your projects:

index_project_dir ~/sources/my-project my-project

Then search through the tags:

search_source_files --tags void initialize_window

or search for a file:

search_source_files --files window.py

or through the content:

search_source_files --content fuck

Keep a list of your sources for rebuild

(Not working yet)

You can create a conf file that lists every source you want in your index, so you can easily re-index them all later.

# Add the nix store path
[nix]
nixpath = ["/nix"]

# Add manpages
[man]
man = ["/usr/share/man", "/home/eklerks/.nix-profile/share/man"]

# Add projects, the label is the project name
[projects]
contlib = ["/home/eklerks/sources/sanoma/content-library"]
home-conf = ["/home/eklerks/sources/vim-zsh-vimperator-xmonad-configuration"]
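Since the rebuild feature is not working yet, the following is only a rough sketch of how such a file could be read, written in Python rather than the project's Haskell. It assumes that the bracketed names ([nix], [man], [projects]) are section headers on their own lines and that every value is a JSON-style list of paths. The function name load_sources and the shortened sample are illustrative and not part of the project.

```python
import configparser
import json

# Shortened sample in the assumed layout: "[section]" headers, "label = [paths]" entries.
SAMPLE_CONF = """\
# Add the nix store path
[nix]
nixpath = ["/nix"]

# Add manpages
[man]
man = ["/usr/share/man", "/home/eklerks/.nix-profile/share/man"]

# Add projects, the label is the project name
[projects]
contlib = ["/home/eklerks/sources/sanoma/content-library"]
"""


def load_sources(text):
    """Return {section: {label: [paths]}} parsed from an INI-style sources file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    sources = {}
    for section in parser.sections():
        # Each value is assumed to be a JSON list of path strings.
        sources[section] = {
            label: json.loads(value) for label, value in parser[section].items()
        }
    return sources


if __name__ == "__main__":
    for section, entries in load_sources(SAMPLE_CONF).items():
        for label, paths in entries.items():
            print(section, label, paths)
```

A rebuild command could walk this structure and call the matching indexer for each section, for example the man page indexer for every path under [man].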

Creating your own searchers

It is quite easy to create your own searchers, similar to search_man_pages. See SearchAllManPages for an example. For building a query, refer to the Elasticsearch documentation. You can change how the Elasticsearch instance is set up by editing Init.hs.
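To give an idea of what a searcher has to send to Elasticsearch, here is a hedged sketch of a query body, written in Python for brevity rather than in the project's Haskell. It uses the standard match query and highlight options from the Elasticsearch query DSL. The field name content, the function name build_man_page_query and the ** highlight markers are assumptions chosen to resemble the sample output above; the real field names are defined by the project's own index setup.

```python
import json


def build_man_page_query(terms, size=10):
    """Build an Elasticsearch request body that matches terms and highlights hits."""
    return {
        "size": size,
        # Full-text match on an assumed "content" field.
        "query": {"match": {"content": " ".join(terms)}},
        # Wrap matched words in ** so they stand out in terminal output.
        "highlight": {
            "pre_tags": ["**"],
            "post_tags": ["**"],
            "fields": {"content": {}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_man_page_query(["window", "event"]), indent=2))
```

The resulting JSON would be posted to the _search endpoint of the index that holds the man pages.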

Errata

Some things are still not working correctly. I want to search through the nix store, but I still have to decide how to analyze the nix store directory and how to store the result so that it is useful.

When files change, the index has to be rebuilt. There is no update strategy yet. A file's path is practically unique, so it is a good candidate for identifying documents.
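One possible update strategy, sketched here in Python as an illustration and not as what the project does, is to derive the document ID from the file's absolute path. Elasticsearch replaces an existing document when a new one is indexed under the same ID, so re-indexing a changed file would overwrite its old entry instead of adding a duplicate. The function name document_id and the choice of SHA-1 are assumptions.

```python
import hashlib
import os


def document_id(path):
    """Return a stable ID for a file, derived from its normalized absolute path."""
    normalized = os.path.abspath(path)  # also collapses "." and ".." components
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(document_id("/usr/share/man/man1/ls.1"))
```

Hashing keeps the ID free of slashes and at a fixed length, which is convenient when the ID ends up in a request URL.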

Man page titles are not searched. This is not a big issue, because the title also appears in the body of the man page.

When source files are searched, all tags in a matching file are returned, while only the relevant tags should be shown.
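A simple way to narrow the output, shown here as a Python sketch under the assumption that the tags of a matching file are available as a list of names, is to keep only the tags that contain one of the query terms. The function name relevant_tags is illustrative.

```python
def relevant_tags(tags, query_terms):
    """Keep only the tags that contain at least one query term (case-insensitive)."""
    terms = [term.lower() for term in query_terms]
    return [tag for tag in tags if any(term in tag.lower() for term in terms)]


if __name__ == "__main__":
    file_tags = ["initialize_window", "close_window", "main"]
    print(relevant_tags(file_tags, ["void", "initialize_window"]))
```

Substring matching is deliberately crude; using the highlight information returned by Elasticsearch would probably be a better long-term solution.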

More analysis strategies are needed for source files. Comments should be recognized and indexed. Types should be determined for statically typed languages. Perhaps vulnerabilities should be detected and indexed as well.

The rebuild command is not working at the moment.

Haskell files cannot be tagged yet; hasktags has to be installed for that. I would also like to switch to Exuberant Ctags, but its tags file is more complex to parse than the etags format, so I haven't done that yet.
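For reference, the etags format is simple enough to parse in a few lines. The Python sketch below is an illustration and not the project's parser. It relies on the usual layout of an etags TAGS file: each file section starts with a form feed and a "filename,size" line, and each tag line has the form "definition text, DEL (0x7f), optional tag name followed by SOH (0x01), line number, comma, byte offset". When no explicit tag name is present, the sketch falls back to the definition text, which is a simplification. The name parse_etags and the sample data are made up for the example.

```python
DEL = "\x7f"  # separates the definition text from the rest of a tag line
SOH = "\x01"  # separates an explicit tag name from the position
FF = "\x0c"   # starts a new file section


def parse_etags(text):
    """Return a list of (filename, tag_name, line_number) from etags-format text."""
    tags = []
    for section in text.split(FF + "\n")[1:]:
        lines = section.split("\n")
        # Section header is "filename,size"; the size is not needed here.
        filename = lines[0].rsplit(",", 1)[0]
        for line in lines[1:]:
            if DEL not in line:
                continue
            definition, rest = line.split(DEL, 1)
            if SOH in rest:
                name, position = rest.split(SOH, 1)
            else:
                # No explicit name: fall back to the definition text (simplification).
                name, position = definition.strip(), rest
            line_part = position.split(",", 1)[0]
            line_number = int(line_part) if line_part.isdigit() else None
            tags.append((filename, name, line_number))
    return tags


# Made-up sample with one explicit and one implicit tag name.
SAMPLE_TAGS = (
    FF + "\n"
    + "window.py,60\n"
    + "def initialize_window(" + DEL + "initialize_window" + SOH + "1,0\n"
    + "class Window:" + DEL + "12,240\n"
)

if __name__ == "__main__":
    for entry in parse_etags(SAMPLE_TAGS):
        print(entry)
```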

document-indexer's People

Contributors

edgarklerks
