svanteschubert / odf-re-isearch

This project provides Re-ISearch with an ODF adapter (based on ODFDOM from the ODF Toolkit)

License: Apache License 2.0

ODF Re-ISearch

Purpose

This project adapts the ODFDOM JAR from the ODF Toolkit into an extractor of search-relevant data for Re-ISearch, a project led by Edward Zimmermann.

Build

Build the ODFDOM module with Maven by running mvn clean install on JDK 9 or later (builds on JDK 8 still show problems with dependencies). The build was successfully tested with the LTS releases JDK 11 and 17. You may also compile the Java code into a native binary using GraalVM; this was tested successfully on Linux, with no visible performance gain or loss.
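Because the build needs JDK 9 or later, it can help to check the active JDK before calling Maven. The following is a minimal sketch and not part of this repository: the function names java_major_version, detected_java_version and build_if_supported are hypothetical, and the sketch assumes the usual version strings printed by java -version (for example "1.8.0_292" for JDK 8 or "17.0.4" for JDK 17).

```shell
# Hypothetical helper (not part of this repository): extract the major
# version from a Java version string such as "1.8.0_292", "11.0.16" or "17".
java_major_version() {
  version="$1"
  # Legacy scheme up to JDK 8 is "1.<major>.<minor>", so drop the leading "1."
  case "$version" in
    1.*) version="${version#1.}" ;;
  esac
  # Keep only the leading digits
  echo "${version%%[!0-9]*}"
}

# Read the version string of the JDK found on the PATH.
# Assumption: the output contains a part like: version "17.0.4"
detected_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Run the Maven build only if the JDK is new enough (JDK >= 9, see above).
build_if_supported() {
  major="$(java_major_version "$(detected_java_version)")"
  if [ -n "$major" ] && [ "$major" -ge 9 ]; then
    mvn clean install
  else
    echo "JDK 9 or later is required, found: ${major:-none}" >&2
    return 1
  fi
}
```

Sourcing the snippet only defines the functions; calling build_if_supported from the repository root would then start the build.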

Usage

Test the JAR by running it from the command line without parameters:

java -jar odfdom/target/odfdom-search-1.0.0-jar-with-dependencies.jar

This will return something like:

Re-ISearch ODF extractor (build 2022-09-30T23:10:18)
from https://github.com/svanteschubert/odf-re-isearch supporting ODF 1.2

Run the JAR from the command line with an ODT file as parameter:

java -jar odfdom/target/odfdom-search-1.0.0-jar-with-dependencies.jar <ODT_PATH>

For example, passing the URL of the OASIS ODF 1.3 specification, https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.odt, as the ODT writes all relevant search data to standard output. Redirected into a file, the data can be used by the Re-ISearch engine.
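Redirecting the extractor's standard output into a file could be sketched as follows. The wrapper name extract_odt, the JAVA_CMD override and the output file argument are assumptions made for illustration; only the JAR path and the single ODT argument come from the usage shown above.

```shell
# Path of the JAR as produced by the build; override JAR if yours differs.
JAR="${JAR:-odfdom/target/odfdom-search-1.0.0-jar-with-dependencies.jar}"
# Hypothetical override so another Java launcher can be used.
JAVA_CMD="${JAVA_CMD:-java}"

# Hypothetical wrapper: run the extractor on one ODT and save what it
# prints on standard output into the given file.
extract_odt() {
  odt_path="$1"
  out_file="$2"
  if [ -z "$odt_path" ] || [ -z "$out_file" ]; then
    echo "usage: extract_odt <ODT_PATH> <OUTPUT_FILE>" >&2
    return 2
  fi
  "$JAVA_CMD" -jar "$JAR" "$odt_path" > "$out_file"
}
```

A call such as extract_odt <ODT_PATH> extracted.txt would then leave the search data in extracted.txt, where extracted.txt is only an example name.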

Installing Re-ISearch for ODT on Linux (using this JAR as bundled plugin)

The search engine Re-ISearch bundles the deliverable of this repository as a plugin to extract search information from OpenDocument Text documents. The following steps describe how to install Re-ISearch on your Linux machine and use this ODT plugin to search within ODT documents.

  1. Copy the Re-ISearch sources via
git clone https://github.com/re-Isearch/re-Isearch

     There is a helpful INSTALLATION file for the Re-ISearch engine.
  2. It is suggested to manually add the directory for the plugin, which is assumed by default, and make it writable: /opt/nonmonotonic/ib/lib/plugins/
  3. The script <RE_ISEARCH_ROOT>/bin/odt-search still needs to be added to the $PATH so that Isearch can find it (known usability issue).
  4. In the <RE_ISEARCH_ROOT>/build directory, build the search engine (for compilation details, see the INSTALLATION cheat file or the full handbook), e.g.
cd re-Isearch/build
make -j4
  5. In the <RE_ISEARCH_ROOT>/build directory, build the plugins:
make plugins -j4
  6. As the plugin is new and not yet used by default, choose it explicitly for indexing via:
../bin/Iindex -d <INDEX_DIRECTORY> -recursive -t odt2: -include "*.odt"  <ODT_DIR_INPUT_PATH>
  7. The validity of the new index and the index structure can be checked via
../bin/Iutil -d <INDEX_DIRECTORY> -vf
  8. Finally, any search can be executed, for example returning all sentences containing the <SEARCH_STRING>:
../bin/Isearch -show -d <INDEX_DIRECTORY> -P PAGE\PARAGRAPH\SENTENCE <SEARCH_STRING>
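
Steps 2 and 3 above are described without commands. A minimal sketch of how they might look is given below. The function names add_to_path and prepare_plugin_dir are hypothetical, and both the use of sudo and the mode a+w are assumptions: the text only says the directory should be write accessible, so a narrower mode or a change of owner may suit your system better.

```shell
# Hypothetical helper: append a directory to PATH unless it is already there.
add_to_path() {
  dir="$1"
  case ":$PATH:" in
    *":$dir:"*) ;;                          # already present, nothing to do
    *) PATH="$PATH:$dir"; export PATH ;;
  esac
}

# Hypothetical helper: create the plugin directory and make it writable.
# $1: directory (defaults to the path the text names as the default)
# $2: optional prefix command such as "sudo" (assumption: /opt needs it)
prepare_plugin_dir() {
  plugin_dir="${1:-/opt/nonmonotonic/ib/lib/plugins}"
  run_as="$2"
  $run_as mkdir -p "$plugin_dir" && $run_as chmod a+w "$plugin_dir"
}

# Possible usage, left commented out so that sourcing this changes nothing:
# prepare_plugin_dir /opt/nonmonotonic/ib/lib/plugins sudo
# add_to_path "<RE_ISEARCH_ROOT>/bin"
```

To keep the PATH change across sessions, the add_to_path call would have to go into a shell startup file of your choice.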

Known Issue(s) of Re-ISearch

  • The ./bin/odt-search script has to be added explicitly to the user's PATH variable (and an error message still appears even if you do)

Support

This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.
