andreteixeira1998 / ambar

This project forked from pleasemarkdarkly/ambar


:mag: Ambar: Document Search Engine

Home Page: https://ambar.cloud/

License: MIT License

JavaScript 71.06% CSS 11.19% HTML 0.15% Shell 2.08% Python 13.48% Dockerfile 2.04%

🔍 Ambar: Document Search Engine

Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.

Ambar defines a new way to implement full-text document search in your workflow:

  • Easily deploy Ambar with a single docker-compose file
  • Perform a Google-like search through the contents of your documents and images
  • Ambar supports all popular document formats and performs OCR if needed
  • Tag your documents
  • Use a simple REST API to integrate Ambar into your workflow

Install

Increase Docker's available memory (with Docker Desktop, raise the memory limit in its resource settings).
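To check how much memory the Docker daemon currently has, you can query `docker info`. This is a minimal sketch that assumes a bash shell with awk available. The function name `docker_mem_gib` is hypothetical, and no particular minimum for Ambar is implied:

```shell
# Print the total memory available to the Docker daemon, in GiB.
# Hypothetical helper: relies on the MemTotal field reported by `docker info`.
docker_mem_gib() {
  local bytes
  bytes=$(docker info --format '{{.MemTotal}}') || return 1
  # Convert bytes to GiB with one decimal place
  awk -v b="$bytes" 'BEGIN { printf "%.1f GiB\n", b / 1024 / 1024 / 1024 }'
}
```

Run `docker_mem_gib` before and after changing the Docker Desktop setting to confirm that the new limit took effect.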

Run the following docker command

docker run -it --privileged --name rd17-ambar -v ~/docker/rd17-ambar:/root/docker -p 20080:20080 -p 20022:20022 land007/rd17-ambar:latest

Or run the following docker-compose command

# Step 1: download the compose file
wget https://raw.githubusercontent.com/land007/ambar/master/docker-compose.yml

# Step 2: clear Docker environment variables (bash expands ${!DOCKER_*} to the names of all variables starting with DOCKER_)
unset ${!DOCKER_*}

# Step 3: start the containers in the background
docker-compose up -d
# Expected output:
#Creating ambar_serviceapi_1 ... done
#Creating ambar_pipeline0_1  ... done
#Creating ambar_webapi_1     ... done
#Creating ambar_crawler2_1   ... done
#Creating ambar_frontend_1   ... done
#Creating ambar_node-http-proxy_1   ... done
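`docker-compose up -d` returns as soon as the containers are created, and the web UI may need more time before it answers. The sketch below polls the UI until it responds. It assumes bash and curl. The function name `wait_for_ambar`, its default URL `http://localhost:20080`, and the 300-second timeout are all hypothetical choices. Any HTTP response counts as "up", because the front proxy may ask for credentials first:

```shell
# Poll the Ambar web UI until it answers or the timeout (in seconds) expires.
# Hypothetical helper: needs curl; defaults assume the UI is on port 20080.
wait_for_ambar() {
  local url="${1:-http://localhost:20080}"
  local timeout="${2:-300}"
  local waited=0
  # Without -f, curl exits 0 on any HTTP response (including 401 from a proxy)
  until curl -s --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Ambar did not respond at $url within ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "Ambar is up at $url"
}
```

For example, `wait_for_ambar "http://192.168.1.10:20080" 600` waits up to ten minutes. The address in that example is a placeholder.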

Open the following address in your browser:

http://<your-ip-address>:20080

Log in with username admin and password 1234567.

Updates

  • Added a build check for nvm and set the Node.js version to 8.10
  • Consolidated container mounted volumes under Container_Data
  • Updated the build to default to English
  • Added node_http_proxy to the project and included it in the build process
  • Changed the proxy username; to change it, see the Dockerfile
  • Docker images are available from https://hub.docker.com/u/pleasemarkdarkly

Features

Search

Tutorial: Mastering Ambar Search Queries

  • Fuzzy Search (John~3)
  • Phrase Search ("John Smith")
  • Search By Author (author:John)
  • Search By File Path (filename:*.txt)
  • Search By Date (when: yesterday, today, lastweek, etc.)
  • Search By Size (size>1M)
  • Search By Tags (tags:ocr)
  • Search As You Type
  • Supported language analyzers: English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk

Crawling

Ambar 2.0 only supports local file system crawling. If you need to crawl an SMB share or an FTP location, mount it using standard Linux tools. Crawling is automatic: no schedule is needed, because the crawler monitors file system events and automatically processes new files.
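An SMB share, for example, can be mounted with the standard `mount -t cifs` command from the cifs-utils package. This is a minimal sketch that assumes bash and sudo rights. The function `mount_smb_for_ambar`, the share `//fileserver/docs`, the mount point, and the read-only option are hypothetical. The mount point must be inside whichever host directory your crawler container watches, and that directory depends on your docker-compose.yml:

```shell
# Mount an SMB share read-only so Ambar's local crawler can index it.
# Hypothetical helper: requires cifs-utils; mount.cifs prompts for the
# password because none is passed on the command line.
mount_smb_for_ambar() {
  local share="$1"   # e.g. //fileserver/docs (hypothetical)
  local target="$2"  # a directory inside the host path the crawler watches
  local user="$3"    # SMB account name
  sudo mkdir -p "$target" || return 1
  sudo mount -t cifs "$share" "$target" -o "username=$user,ro"
}
```

Usage: `mount_smb_for_ambar //fileserver/docs /mnt/ambar/docs ambar`. Mounting read-only keeps the crawler from modifying the source share.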

Content Extraction

  • Ambar supports large files (>30MB)
  • ZIP archives
  • Mail archives (PST)
  • MS Office documents (Word, Excel, Powerpoint, Visio, Publisher)
  • OCR over images
  • Email messages with attachments
  • Adobe PDF (with OCR)
  • OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
  • OpenOffice documents
  • RTF, Plaintext
  • HTML / XHTML
  • Multithread processing

Installation

Notice: Ambar requires Docker to run; it cannot run without Docker.

You can build the Docker images yourself or buy prebuilt Docker images for $50 here.

  • Installation instructions for the prebuilt images can be found here
  • For a tutorial on building the images from scratch, see below

If you want to see how Ambar works without installing it, try our live demo. No signup required.

Building the images yourself

All of the images required to run Ambar can be built by the user. In general, each image can be built by navigating into the directory of the component in question, performing any compilation steps required, then building the image like so:

# From project root
$ cd FrontEnd
$ docker build . -t <image_name>

The resulting image can be referred to by the name specified, and run by the containerization tooling of your choice.

To use a local Dockerfile with docker-compose, add a build option to the service (in place of, or alongside, image) and set its value to the relative path of the directory containing the Dockerfile. Then run docker-compose build to build the relevant images. For example:

# docker-compose.yml from project root, referencing local Dockerfiles
# (service entries under the top-level services: key)
pipeline0:
  build: ./Pipeline/
  image: chazu/ambar-pipeline
localcrawler:
  build: ./LocalCrawler/

Note that some of the components require compilation or other build steps to be performed on the host before their Docker images can be built. For example, FrontEnd:

# From project root, assuming a suitable version of Node.js is installed (the Docker build uses 8.10)
$ cd FrontEnd
$ npm install
$ npm run compile
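The per-component steps above can be combined into a single script. This is a minimal sketch that assumes bash and is run from the project root. The function `build_ambar_images` and its `ambar-<directory>` tag scheme are hypothetical. It also assumes that only FrontEnd needs a host compile, although other components may have build steps of their own:

```shell
# Build a Docker image for each component directory given as an argument.
# Hypothetical helper: tags each image "ambar-<lowercased directory name>".
build_ambar_images() {
  local dir tag
  for dir in "$@"; do
    if [ "$dir" = "FrontEnd" ]; then
      # FrontEnd must be compiled on the host before its image is built
      (cd "$dir" && npm install && npm run compile) || return 1
    fi
    tag="ambar-$(printf '%s' "$dir" | tr '[:upper:]' '[:lower:]')"
    docker build "./$dir" -t "$tag" || return 1
  done
}
```

Usage: `build_ambar_images FrontEnd Pipeline LocalCrawler`. The function stops at the first failing step, so a broken build is not silently followed by the rest.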

FAQ

Is it open-source?

Yes, it's fully open-source.

Is it free?

Yes, it is forever free and open-source.

Does it perform OCR?

Yes, it performs OCR on images (JPG, TIFF, BMP, etc.) and PDFs. OCR is performed by the well-known open-source library Tesseract, which we tuned for the best performance and quality on scanned documents. You can find all files on which OCR was performed with the tags:ocr query.

Which languages are supported for OCR?

Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld. If your language is missing, please contact us at [email protected].

Does it support tagging?

Yes!

What about searching in PDF?

Yes, it can search through any PDF, even badly encoded ones or ones containing scans. We did our best to make searching any kind of PDF document smooth.

What is the maximum file size it can handle?

It is limited by the amount of RAM on your machine; typically the limit is 500 MB. For comparison, typical document management systems process files of at most 30 MB.

I have a problem. What should I do?

Request a dedicated support session by emailing us at [email protected]

Sponsors

Change Log

Privacy Policy

License

MIT License

