
smartdataanalytics / horus-ner

HORUS: A framework to boost NLP tasks

License: Apache License 2.0

C++ 0.69% Python 97.91% PLpgSQL 0.58% HTML 0.81%
ner named-entity-recognition microblog twitter noise information-retrieval machine-learning computer-vision text-mining horus

horus-ner's People

Contributors

diegoesteves, duartejulio, geraltofrivia, hady44, khattaksaad, nilesh-c, pijusch, rafaelperes

horus-ner's Issues

Installation: conda env

I executed: conda env create -f environment.yml

And I get:

Traceback (most recent call last):
  File "/home/rspeck/anaconda2/lib/python2.7/site-packages/conda/exceptions.py", line 626, in conda_exception_handler
    return_value = func(*args, **kwargs)
  File "/home/rspeck/anaconda2/lib/python2.7/site-packages/conda_env/cli/main_create.py", line 78, in execute
    directory=os.getcwd())
  File "/home/rspeck/anaconda2/lib/python2.7/site-packages/conda_env/specs/__init__.py", line 23, in detect
    raise SpecNotFound(build_message(specs))
SpecNotFound: Runtime error: Can't process without a name
Conda Env Exception: environment.yml file not found
There is no requirements.txt

Some hints.

Index Flickr

horus/core/search_engines.py and horus/core/service.py files.

Stanford NER

  • check preliminary results using Stanford NER (to be integrated into horus_matrix after the Twitter NLP merge)

Architecture

  • architecture does not support package installation, check that for portability and horus as a service

d_theta and y' (correction): final annotation function()

sentence: coca/NOUN/NN/LOC cola/NOUN/NN/LOC has/VERB/VBZ/0 a/DET/DT/0 strange/ADJ/JJ/0 flavor/NOUN/NN/LOC

  • distance_theta: flavor should not be annotated as LOC, since dtheta is set to 1 and flavor got 2,2,0, so its dtheta = 0!

  • coca cola should be updated, since the compound correctly returned 2,3,0 with a high bias towards not(LOC) => -40!

These errors propagate and directly affect the performance measures. This sentence, for instance, should have 100% accuracy and is getting 0 👎

POS encoding

Each POS tagging model may use a different tag set, so the encoder fails whenever an unseen POS tag is used as a feature. Technically the solution is simple (a vector containing all possible tags), but it can lead to a worse predictor, since it increases the number of dimensions (unnecessarily) without adding extra value. This needs to be checked carefully later.
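A minimal sketch of the "vector containing all possibilities" idea, not the HORUS encoder itself. The tag sets below, the names `build_tag_index` and `one_hot`, and the extra `<UNK>` column for tags outside the merged vocabulary are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

UNK = "<UNK>"  # hypothetical bucket for tags that no known tag set contains


def build_tag_index(tagsets: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign one column to every tag found in any tag set; column 0 is reserved for unseen tags."""
    index = {UNK: 0}
    for tagset in tagsets:
        for tag in tagset:
            if tag not in index:
                index[tag] = len(index)
    return index


def one_hot(tag: str, index: Dict[str, int]) -> List[int]:
    """Encode a tag as a one-hot vector, falling back to the UNK column instead of raising."""
    vec = [0] * len(index)
    vec[index.get(tag, index[UNK])] = 1
    return vec


# Illustrative tag sets standing in for two different POS taggers.
index = build_tag_index([["NN", "VBZ", "DT", "JJ"], ["NOUN", "VERB", "DET", "ADJ"]])
known = one_hot("NN", index)
unseen = one_hot("XYZ", index)
```

The vector length equals the size of the merged vocabulary plus one, which makes the dimensionality cost mentioned above directly visible.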

Issues on a fresh clone

I tried to clone the repository and I have some problems getting it up and running.

Initially, I copied horus_dist.ini to ~/horus.ini and set the parameters.

Next, I tried to install the module with:
python setup.py install --record files.txt
And it failed with the following error message:
error: package directory 'src/horus/sift' does not exist

The next thing I tried was to run the source code directly without having the whole module installed. I had two issues:

  1. The horus.ini file that I copied from src/horus/resource/horus_dist.ini has some missing keys. I just added an empty value for each of those keys, but I'm not sure whether that will resolve all the issues.

  2. I did not have the database in advance. I tried to create the database with ./src/horus/components/util/script_db.py, but the db model in this file is obsolete. I then tried to run ./src/horus/components/webservice/rest.py, and the model needed some columns that did not exist in my database.

CRF layer performance

Compare the standard feature function with the dynamic feature builder. There is still a 0.1 difference in F1. Check why! Notebook: src/training/notebooks/horus_v1/03-horus-training-ner-crf.ipynb

Cache

  • At some point we could also cache the visual features (extractors) in order to speed up the pipeline
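One way such a cache could look, as a rough sketch under stated assumptions: features are keyed by a hash of the image bytes, and the store is a plain in-memory dict. `FeatureCache` and `toy_extractor` are hypothetical names; the real extractors and any persistent storage are not shown in this issue.

```python
import hashlib
from typing import Callable, Dict, List


class FeatureCache:
    """Memoize extractor output per image, keyed by the SHA-256 of the image bytes."""

    def __init__(self, extractor: Callable[[bytes], List[float]]) -> None:
        self._extractor = extractor
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the extractor actually ran

    def get(self, image_bytes: bytes) -> List[float]:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._extractor(image_bytes)
        return self._store[key]


def toy_extractor(image_bytes: bytes) -> List[float]:
    # Placeholder standing in for an expensive visual feature extractor.
    return [float(len(image_bytes)), float(sum(image_bytes) % 256)]


cache = FeatureCache(toy_extractor)
first = cache.get(b"fake-image")
second = cache.get(b"fake-image")  # served from the cache, extractor not called again
```

Hashing the content rather than the file name means the same image downloaded under two names is only processed once.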

Noun Phrase Parsing

Some basic sequences like {[Tom Jobim] was born in [Rio de Janeiro]} received only partial annotations, e.g. {[Tom Jobim] was born in Rio de Janeiro}.

*.none

Errors when saving some images lead to a *.None extension.

Dimensionality Reduction

What is a reasonable threshold when applying PCA/SVD techniques to improve the processing time of HORUS (CV mod)? Getting a reasonable approximation of the current data without having to store everything might be a valid workaround...

Index Wikipedia

horus/core/search_engines.py and horus/core/service.py files.

Last Features Checking

media_mod1 0.58243437863
media_mod2 0.620822320117 --> RandomForest / 20 estimators
media_mod3 0.520944537582
media_mod4 0.547179384203
media_mod5 0.646284496886 --> ensemble.Voting (1,2,3)
media_mod6 0.527850685331
media_mod7 0.462354638825

Search Engine (others)

Integrate as many datasets as possible (e.g. MIRFLICKR), then use word2vec + stemmer + WordNet + possibly further similarity functions to obtain related terms and reduce the sparsity of the main dataset.

HORUS_MODEL_EXP_000 - Baseline

  • baseline experiment (no extra features)
  • ritter dataset (test)
  • no tuning parameters, max always wins (greedy strategy)
  • stanford POS tagger (no optimised solution)

HORUS_MODEL_EXP_003_POS_Tagger

New Experimental Idea:

  • Try to minimize the error propagation (POS Tagger) by considering 2 (or more) sequences of POS annotations.
  • Obtain a new set of NOUNS Ns

If hypothesis 3 is correct, this will improve the overall classification.
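The idea above could be sketched as follows. This is only an illustration: the issue does not say how the annotation sequences are combined, so taking the union (a token counts as a noun if any tagger says so) is an assumption, as are the function name `noun_candidates`, the `NOUN_TAGS` set, and the second tagger's output.

```python
from typing import List, Set, Tuple

Tagged = List[Tuple[str, str]]  # (token, POS tag) pairs for one sentence

# Assumed noun labels, covering Penn Treebank and universal-style tags.
NOUN_TAGS = {"NOUN", "PROPN", "NN", "NNS", "NNP", "NNPS"}


def noun_candidates(annotations: List[Tagged]) -> Set[int]:
    """Return the token positions tagged as a noun by at least one tagger."""
    positions: Set[int] = set()
    for tagged in annotations:
        for i, (_, tag) in enumerate(tagged):
            if tag in NOUN_TAGS:
                positions.add(i)
    return positions


# Two hypothetical tagger outputs for the same sentence.
tagger_a = [("coca", "NN"), ("cola", "NN"), ("has", "VBZ"),
            ("a", "DT"), ("strange", "JJ"), ("flavor", "NN")]
tagger_b = [("coca", "NNP"), ("cola", "NNP"), ("has", "VBZ"),
            ("a", "DT"), ("strange", "NN"), ("flavor", "NN")]

ns = noun_candidates([tagger_a, tagger_b])
```

A union raises recall for noun candidates at the cost of more false positives; an intersection or majority vote would trade in the other direction.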
