
nlppln / nlppln


NLP pipeline software using common workflow language

Home Page: http://nlppln.readthedocs.io/en/latest/

License: Apache License 2.0

Python 64.31% Common Workflow Language 35.56% Dockerfile 0.13%
nlp cwl workflow pipeline text-mining

nlppln's Introduction

NLP Pipeline


nlppln is a Python package for creating NLP pipelines using Common Workflow Language (CWL). It provides steps for (generic) NLP functionality, such as tokenization, lemmatization, and part-of-speech tagging, and helps users construct workflows from these steps.

A text processing step consists of a (Python) command line tool and a CWL specification to use this tool. Most tools provided by nlppln wrap existing NLP functionality. The command line tools are made with Click, a Python package for creating command line interfaces.

To create a workflow, you have to write a Python script:

from nlppln import WorkflowGenerator

with WorkflowGenerator() as wf:
  txt_dir = wf.add_input(txt_dir='Directory')

  frogout = wf.frog_dir(in_dir=txt_dir)
  saf = wf.frog_to_saf(in_files=frogout)
  ner_stats = wf.save_ner_data(in_files=saf)
  new_saf = wf.replace_ner(metadata=ner_stats, in_files=saf)
  txt = wf.saf_to_txt(in_files=new_saf)

  wf.add_outputs(ner_stats=ner_stats, txt=txt)

  wf.save('anonymize.cwl')

The resulting workflow can be run using a CWL runner, such as cwltool:

cwltool anonymize.cwl --txt_dir /path/to/directory/with/txt/files/

For creating new (e.g., project specific) NLP functionality, you can use nlppln-gen to generate boilerplate (i.e., empty) command line tools and CWL specifications.

The full documentation can be found on Read the Docs.

Installation

Install nlppln using pip:

pip install nlppln

Please check the installation guidelines for additional required software.

License

Copyright (c) 2016-2018, Netherlands eScience Center, University of Twente

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

nlppln's Issues

Make CWL types constants for tab-completion

When adding inputs, it would be nice if there was a list of CWL types available in nlppln, so that I could use tab-completion when entering them, e.g.

with nlppln.WorkflowGenerator() as wf:
  something = wf.add_input(something=nlppln.cwl_type. [TAB TAB TAB]

This way, I don't have to know all the CWL types by heart.
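A minimal sketch of what such constants could look like. The class name `CwlType` and its helper methods are hypothetical and are not part of nlppln; the values are the primitive type names from the CWL specification, plus the `[]` and `?` shorthands for arrays and optional values.

```python
class CwlType:
    """Hypothetical namespace of CWL type names (not part of nlppln).

    Attribute access on a class gives tab-completion in most shells and
    editors, so the type names do not have to be remembered.
    """
    NULL = 'null'
    BOOLEAN = 'boolean'
    INT = 'int'
    LONG = 'long'
    FLOAT = 'float'
    DOUBLE = 'double'
    STRING = 'string'
    FILE = 'File'
    DIRECTORY = 'Directory'

    @staticmethod
    def array_of(item_type):
        """Return the CWL shorthand for an array of item_type, e.g. 'File[]'."""
        return item_type + '[]'

    @staticmethod
    def optional(item_type):
        """Return the CWL shorthand for an optional value, e.g. 'string?'."""
        return item_type + '?'


# Intended use (assumes nlppln is installed; shown as a comment only):
#   with nlppln.WorkflowGenerator() as wf:
#       txt_dir = wf.add_input(txt_dir=CwlType.DIRECTORY)
print(CwlType.DIRECTORY)
print(CwlType.array_of(CwlType.FILE))
```

Plain string constants keep the values interchangeable with the strings that `add_input` already accepts, such as `'Directory'`.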

Installation requirements failed on pyjq

Running "pip install -r requirements.txt" failed on package pyjq.
Apparently this package has its own additional dependencies, which were not yet installed on my computer.
After running "sudo apt install -y autoconf automake build-essential libtool python-dev", installation was successful.

Give python commands meaningful name

Currently, all commands have the (function) name 'command'. It would be better to give them more meaningful names, which would be useful for automatically generating CWL steps from Click commands.

Make output directory an option for all commands

For each command, output files should be written to the current directory by default. The output directory can be changed by adding an -o option.

This is much more transparent than the current situation.
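The proposed behaviour could be sketched as follows. This is an illustration only: it uses the standard library's argparse as a stand-in for Click so that it runs without third-party packages, and the function names, the `--out_dir` long form, and the `.lower.txt` suffix are assumptions made for the example, not nlppln code.

```python
import argparse
import os


def build_parser():
    """Build a parser for a hypothetical single-file command."""
    parser = argparse.ArgumentParser(description='Lowercase a text file.')
    parser.add_argument('in_file', help='input text file')
    # Output goes to the current directory unless -o is given.
    parser.add_argument('-o', '--out_dir', default=os.getcwd(),
                        help='directory to write output to')
    return parser


def output_path(in_file, out_dir):
    """Create out_dir if needed and return the output file path in it."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(in_file))[0]
    return os.path.join(out_dir, stem + '.lower.txt')


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Read the input completely before opening the output file.
    with open(args.in_file, encoding='utf-8') as f:
        text = f.read()
    out_file = output_path(args.in_file, args.out_dir)
    with open(out_file, 'w', encoding='utf-8') as f:
        f.write(text.lower())
    return out_file
```

Calling `main(['input.txt'])` writes to the current directory, and `main(['input.txt', '-o', 'results'])` writes to `results/`, creating it if it does not exist.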

Error when running commands in Python 3

When I try running any command in Python 3.6, I get the following error:

python -m nlppln.commands.ls .
Traceback (most recent call last):
  File "/home/dafne/anaconda2/envs/adh/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/dafne/anaconda2/envs/adh/lib/python3.6/runpy.py", line 153, in _get_module_details
    code = loader.get_code(mod_name)
AttributeError: 'Py2Fixer' object has no attribute 'get_code'

How to create OCR post-correction from the beginning?

I am an intern at a company and was given this OCR post-correction project, but I am new to machine learning and Python. Your project does exactly what I want, but I am getting lost working out where to start and which steps to follow. I know this takes up your time, but it is very important for me to understand and implement this project. I hope you can help. Thanks.

Remove option out_dir from cwl files

Setting a different output directory with the --out_dir option in the CWL specifications doesn't work. The output directory should be changed using cwltool (--outdir).

The option --out_dir in the Python commands does work. So, it shouldn't be removed from the Python commands.
