ezpaarse-project / ezpaarse

ezPAARSE can ingest your (proxy) log files and show how users access subscribed electronic resources.

Home Page: http://www.ezpaarse.org

License: Other

JavaScript 66.61% Shell 3.89% Makefile 0.91% HTML 0.53% Dockerfile 0.14% Vue 27.01% EJS 0.90%
ezpaarse javascript logfile nodejs docker

ezpaarse's Introduction

ezPAARSE

ezPAARSE is open-source software that ingests your (proxy) log files and shows how users access subscribed electronic resources. It filters, extracts and enriches the consultation events it detects and produces a CSV file following the COUNTER codes of practice. This document describes how to install and run ezPAARSE on your computer.

You can also try the ezPAARSE demo, which provides a user interface where you can register and process your own proxy logs.

The built-in supported proxy log formats are: ezproxy, bibliopam, and squid.

Recommended system requirements

  • a Linux machine or VM (e.g. Ubuntu)
  • 50 GB of disk space (to be adjusted, depending on the number and size of the logfiles to be processed simultaneously)
  • 2 CPU cores
  • 2 to 4 GB of RAM

Prerequisites

The tools you need to let ezPAARSE run are:

  • Linux OS: See the prerequisites for those OSes
  • Standard Linux tools: bash, make, grep, sed ...
  • python
  • gcc and g++
  • curl (used by nvm)
  • git >= 1.7.10 (required to clone ezpaarse from github and to keep your ezpaarse copy up to date)
  • MongoDB >= 3.2

ezPAARSE comes with all the other elements it needs to run. Once the prerequisites are met, you can run the make command (see below), which performs all the installation steps.

Installation quickstart

If you are a Windows user, you can install ezPAARSE on your computer as a docker image. Please refer to the docker section below.

To install the latest stable version of ezPAARSE on a Unix-type system, open a terminal and type:

git clone https://github.com/ezpaarse-project/ezpaarse.git
cd ezpaarse
git checkout `git describe --tags --abbrev=0`
make

If you want to install the version in development (unstable), open a terminal and type:

git clone https://github.com/ezpaarse-project/ezpaarse.git
cd ezpaarse
make

Test the installation

This step allows you to validate that your install is working.

make start
make test

Usage

Anonymised example logfiles are made available in the repositories of ezPAARSE.

You need to make sure that ezPAARSE is started. To do so, type the following command:

make start

If you are not computer-savvy, the easiest way to work with ezPAARSE is to use its HTML form: open the following URL in your favorite web browser: http://localhost:59599/

If you are computer-savvy, you can use an HTTP client (like curl) to send a logfile (for this example, we will use ./test/dataset/sd.2012-11-30.300.log) to ezPAARSE's Web service and get a CSV stream of consultation events as a response.

curl -X POST http://127.0.0.1:59599 \
             -v --proxy "" --no-buffer \
             --data-binary @./test/dataset/sd.2012-11-30.300.log

Alternatively, you can use the ./bin/loginjector command provided with ezPAARSE to send the logfile to the web service more simply:

. ./bin/env
cat ./test/dataset/sd.2012-11-30.300.log | ./bin/loginjector

You can also get quick counts on your data by appending the ./bin/csvtotalizer command to the end of the command line. This gives you an overview of the consultation events that ezPAARSE extracted from your logs:

. ./bin/env
cat ./test/dataset/sd.2012-11-30.300.log | ./bin/loginjector | ./bin/csvtotalizer

To stop ezPAARSE, you have to type the following command:

make stop

Go further

To go further, you can consult the full documentation.

Advanced parameters

The default ezPAARSE parameters can be found in the config.json file. All these parameters can be changed. A good practice is to define a new file called config.local.json containing just the parameters you need to override.

For example, to change the ezPAARSE listening port (59599 by default), you can override the EZPAARSE_NODEJS_PORT by defining a new config.local.json file this way:

{
  "EZPAARSE_NODEJS_PORT": 45000
}
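
As a rough illustration of the override principle described above, the sketch below merges a local override object on top of default values. It is not ezPAARSE's actual configuration loader: `mergeConfig` is a hypothetical helper, and the shallow-merge behaviour is an assumption.

```javascript
// Hypothetical helper: shallow-merge local overrides on top of default settings.
// This only illustrates the override principle; ezPAARSE's real loader may differ.
function mergeConfig(defaults, overrides) {
  return { ...defaults, ...overrides };
}

// The default port (59599) comes from the text above;
// the override mirrors the config.local.json example.
const defaults = { EZPAARSE_NODEJS_PORT: 59599 };
const local = { EZPAARSE_NODEJS_PORT: 45000 };

const config = mergeConfig(defaults, local);
console.log(config.EZPAARSE_NODEJS_PORT); // 45000
```

Keeping only the overridden keys in config.local.json means that the defaults shipped in config.json can change between versions without touching your local file.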

Use with docker

ezPAARSE is available as a docker image.

You need Docker and Docker Compose.

Then, you can run the dockerized ezpaarse this way:

mkdir ezpaarse/
cd ezpaarse/
wget --no-check-certificate https://raw.githubusercontent.com/ezpaarse-project/ezpaarse/master/docker-compose.yml
test -f config.local.json || echo '{}' > config.local.json

# compose v1
docker-compose pull
docker-compose up -d

# compose v2
docker compose pull
docker compose up -d

Then ezpaarse is available at this URL: http://127.0.0.1:59599

To view the ezPAARSE system logs, you can run: docker logs -f ezpaarse

ezpaarse's People

Contributors

aloukili, cecifabry, claussi, crow-eh, ctgraham, dominique-r, dzoladz, felixleo22, francksylvie, fredericinist, gitter-badger, joemontibello, kerphi, loukil, mberkowski, nojhamster, oxypomme, pseudom, scooper54, tjouneau, tporquet, wilmouths

ezpaarse's Issues

Install error on a new lubuntu system

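Trying to install ezpaarse based on the Readme instructions, I encounter an error message:

ln: impossible de créer le lien symbolique '/home/ezpaarse/test/ezpaarse/build/nvm/bin/latest': Aucun fichier ou dossier de ce type
Makefile:98 : la recette pour la cible « nodejs » a échouée

(In English: "ln: failed to create symbolic link '/home/ezpaarse/test/ezpaarse/build/nvm/bin/latest': No such file or directory" and "Makefile:98: recipe for target 'nodejs' failed".)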

It can be fixed by manually running the command:

mkdir build/nvm/bin/

from the install directory but it should be done by the makefile. Sorry for just opening the issue and not submitting a patch but my Makefile skills are below the required level :)

Use the first N lines to test the log format

As of now, only the first line is used to test the log format. But sometimes the very first line contains anomalies, causing the job to fail. ezPAARSE should try to parse the first N lines. This number would be customizable via a header (something like ezPAARSE-Max-Parse-Fails) with a default value around 10-20.
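
The proposal can be sketched as follows. This is a minimal illustration, not ezPAARSE code: `detectFormat` and `toyParser` are hypothetical names, the parser is a toy stand-in, and the default of 10 is taken from the range suggested above.

```javascript
// Hypothetical sketch: accept the log format as soon as one line parses,
// tolerating up to `maxFails` unparsable lines before giving up.
// `parseLine` stands in for a real line parser and returns null on failure.
function detectFormat(lines, parseLine, maxFails = 10) {
  let fails = 0;
  for (const line of lines) {
    if (parseLine(line) !== null) {
      return { recognized: true, fails };
    }
    fails += 1;
    if (fails >= maxFails) { break; }
  }
  return { recognized: false, fails };
}

// Toy parser (assumption): a valid line starts with an IPv4-like address.
const toyParser = (line) => (
  /^\d{1,3}(\.\d{1,3}){3} /.test(line) ? { host: line.split(' ')[0] } : null
);

const sample = [
  '### corrupted first line',
  '192.168.0.1 - - [30/Nov/2012] "GET /article"',
];
console.log(detectFormat(sample, toyParser, 10)); // { recognized: true, fails: 1 }
```

With `maxFails` set to 1, the same input is rejected, which corresponds to the current behaviour where only the first line is tested.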

Docker builds failing

Docker builds are failing at Docker Hub, breaking docker-compose.yml which is pointing to an image that never built.

Travis build fails since node v6.6.0

Travis didn't build the commit dedicated to the migration to node 6, so it's not clear whether the problem was introduced by node 6 or by the module updates. I can't reproduce the problem on my machine, so it seems to be specific to the Travis environment.

Graylog geolocation question

Hi,
thanks for this great tool, very useful!
We use Graylog https://www.graylog.org to ingest and visualize the data produced by ezpaarse.
Is it possible to specify the format and fields for geolocation?
Graylog natively provides a map widget based on latitude and longitude, but they must be combined in a single field containing "latitude" + "," + "longitude".
Is it possible to do this?
Thanks in advance !

Johan
Université de Rennes 1
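
One way to obtain such a field is a post-processing step outside ezPAARSE, sketched below. The field names `geoip-latitude` and `geoip-longitude` and the output field `geo-point` are assumptions chosen for illustration, and `addGeoPoint` is a hypothetical helper, not an ezPAARSE option.

```javascript
// Hypothetical post-processing step: add a combined "latitude,longitude" field.
// All field names used here are assumptions chosen for illustration.
function addGeoPoint(event, latField = 'geoip-latitude', lonField = 'geoip-longitude') {
  const lat = event[latField];
  const lon = event[lonField];
  // Return an unchanged copy when either coordinate is missing.
  if (lat === undefined || lat === '' || lon === undefined || lon === '') {
    return { ...event };
  }
  return { ...event, 'geo-point': `${lat},${lon}` };
}

const enriched = addGeoPoint({ 'geoip-latitude': '48.11', 'geoip-longitude': '-1.68' });
console.log(enriched['geo-point']); // 48.11,-1.68
```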

Error not immediately fired on client-side

When the format of the first line can't be recognized, the server ends the response but the client doesn't do anything, and the progress bar freezes at its current progress.

The other errors are correctly handled on client side, probably because the response is immediate in those cases. In case of unknown format, the request has to be read a bit before the error can be sent.

4003 : Line format was not recognized

Hi,
While processing the sample file available for download, I am getting the error below:

4003 : Line format was not recognized

Can you kindly assist. Thank you.

In case of unknown format, 1/2 requests freeze

When sending a file with an unknown format several times in a row, every other request behaves correctly, while the others just freeze and produce no output on the server side. Closing ezpaarse does terminate the request, however.

Reproduced in chrome and firefox.

The admin.md doc is partly obsolete

The routes described in the Knowledge bases management section have changed.
/parsers/status and /pkb/status now lead to a 404 or to "Cannot PUT /pkb/status" and "Cannot PUT /parsers/status" messages.

There is now only one route available: /platforms/status, which accepts GET or POST.
For example:
curl -X GET -u "admin:password" http://localhost:59599/platforms/status
yields the following response:
{"current":"b87ecaa","head":"b87ecaa","tag":"","from-head":"uptodate","from-tag":"uptodate","local-commits":false,"local-changes":false}
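
A client could interpret this response along the lines of the sketch below. The response body is copied from the example above; `isUpToDate` is a hypothetical helper, and reading the fields this way is an assumption based on their names rather than on documented semantics.

```javascript
// Response body copied from the example above.
const body = '{"current":"b87ecaa","head":"b87ecaa","tag":"","from-head":"uptodate","from-tag":"uptodate","local-commits":false,"local-changes":false}';

// Hypothetical helper: report whether the status describes an up-to-date,
// unmodified checkout. The interpretation of each field is an assumption.
function isUpToDate(status) {
  return status['from-head'] === 'uptodate'
    && !status['local-commits']
    && !status['local-changes'];
}

console.log(isUpToDate(JSON.parse(body))); // true
```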

Dissociate ezpaarse unit tests from ezpaarse-platform unit tests

This would make it easier to get quick feedback when a parser is changed or when a pkb is added or updated.
Moreover, the ezpaarse unit tests are huge, and a refactoring like this could be a small step towards improving their readability and usability.

slow git commit

Because of the lint hook, each git commit is very slow (about 5 seconds on my machine).
I like to make many atomic commits, but this slow step pushes you to bundle many changes into a single commit.

Moreover, it also checks non-ezpaarse code, for example the platform code (which is not lint-checked on its side), so it is frustrating to be unable to complete your own commit because someone else did not respect the lint rules in the platform code.

Optimize the paquet handling in the crossref middleware

Currently, we browse the crossref results once for each EC until we find a matching DOI, which means that we loop paquetSize / 2 times on average.

A better implementation would be to iterate once over the results to make a DOI<->Result map, and then iterate once over the ECs and use this map to merge the results. This would greatly decrease the number of loops.
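
The proposed approach could look like the sketch below. It is an illustration under assumptions, not the middleware's code: `mergeResults` is a hypothetical function, and the object shapes (`doi` on ECs, `DOI` and `title` on results, `publication_title` on the output) are chosen for the example. DOIs are compared in lowercase because they are case-insensitive.

```javascript
// Sketch of the proposed approach: index the results by DOI once,
// then merge them into the ECs in a single pass.
// Object shapes and field names are assumptions for illustration.
function mergeResults(ecs, results) {
  const byDoi = new Map();
  for (const result of results) {
    if (result.DOI) { byDoi.set(result.DOI.toLowerCase(), result); }
  }
  return ecs.map((ec) => {
    const match = ec.doi ? byDoi.get(ec.doi.toLowerCase()) : undefined;
    return match ? { ...ec, publication_title: match.title } : ec;
  });
}

const ecs = [{ doi: '10.1000/ABC' }, { doi: '10.1000/zzz' }];
const results = [{ DOI: '10.1000/abc', title: 'Example article' }];
console.log(mergeResults(ecs, results));
```

For a packet of n ECs, this costs one pass over the results plus one pass over the ECs, instead of a scan of the results for every EC.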

Support for R5

Are there any plans for ezpaarse to support the new R5 spec ?

French Rtype fields.json descriptions in English

Currently the rtypes from fields.json are pulled into the ezPaarse documentation. This is great and I support this mirroring, but I'm wondering if we can update the rtype descriptions in fields.json to support English descriptions. This would make it easier for English speaking users of ezPaarse. These are basic descriptions of the rtype fields, I may be able to get away with translating these programmatically and creating a pull request. Should I leave in the French rtype descriptions as well, or just convert it to English? I would like to find a way to describe rtypes in fields.json in both English and French if possible, unless it would make the fields.json file difficult to read.

Can it use system node?

This project downloads nvm (?) and uses node v8.6.0.

Is this a hard dependency, or can it run on more recent versions of node?
If it will work on more recent versions of node, how to configure this?

I see relevant-looking stuff in config.json and bin/buildnode but would appreciate advice.
