
anukat2015 / sentence-boundary-detection-nn


This project forked from knub/sentence-boundary-detection-nn


Kognetics -> Sentence Boundary Detection using Deep Neural Networks.

Makefile 0.59% Gnuplot 1.78% Shell 5.39% Python 61.94% TeX 22.86% CSS 0.53% JavaScript 3.19% HTML 3.72%


Sentence Boundary Detection using Deep Neural Networks

We try to detect sentence boundaries using deep learning. The project was created as part of the "Practical Applications of Multimedia Retrieval" seminar at the Hasso-Plattner-Institute, Potsdam, Germany.

Setup Demo

The demo is written in Python and uses Caffe.

##### Prerequisites:

  1. Clone this repository
  2. Install Python 2.7 and the packages listed in requirements.txt:

pip install -r requirements.txt

  3. Use the NLTK downloader to download the averaged_perceptron_tagger and punkt models:

python -m nltk.downloader averaged_perceptron_tagger punkt

  4. Set up Caffe, following its installation instructions
  5. Add the repository's python directory to your Python path:

export PYTHONPATH=/path/to/sentence-boundary-detection-nn/python:$PYTHONPATH

  6. Download the Google word vectors (GoogleNews-vectors-negative300.bin.gz) and extract the archive into the sentence-boundary-detection-nn/python/demo_data directory
  7. Place your trained models in a demo data folder, for example sentence-boundary-detection-nn/python/demo_data, with the following structure:
  • lexical_models: contains all pretrained lexical models you want to use, each in a separate directory. Each model needs:
    • an .ini file
    • a .caffemodel file
    • a net.prototxt file
  • text_data: contains the text files that can be used as prediction input
  • audio_models: contains all pretrained audio models, each in a separate directory. Each model needs the same files as described for lexical models
  • audio_examples: contains the audio examples that should be available during the demo, each in a separate directory holding its ctm, energy and pitch files
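The folder layout above is easy to get wrong, so a quick check before starting the demo can help. The following is a minimal sketch and not part of the repository: the function names `missing_model_files` and `check_demo_data` are hypothetical, and it assumes that each required model file can be recognised by the end of its file name (`.ini`, `.caffemodel`, `net.prototxt`). It does not inspect the contents of audio_examples, since the naming of the ctm, energy and pitch files is not specified here.

```python
import os

# Name endings every model directory is expected to contain
# (assumption: files are identified by how their names end).
REQUIRED_ENDINGS = (".ini", ".caffemodel", "net.prototxt")


def missing_model_files(model_dir):
    """Return the required name endings that no file in model_dir matches."""
    names = os.listdir(model_dir)
    return [ending for ending in REQUIRED_ENDINGS
            if not any(name.endswith(ending) for name in names)]


def check_demo_data(demo_data):
    """Map each problematic path under demo_data to a list of what is missing."""
    problems = {}
    # Folders that hold one sub-directory per model.
    for group in ("lexical_models", "audio_models"):
        group_dir = os.path.join(demo_data, group)
        if not os.path.isdir(group_dir):
            problems[group_dir] = ["directory"]
            continue
        for name in sorted(os.listdir(group_dir)):
            model_dir = os.path.join(group_dir, name)
            if os.path.isdir(model_dir):
                missing = missing_model_files(model_dir)
                if missing:
                    problems[model_dir] = missing
    # Folders that only need to exist for this check.
    for group in ("text_data", "audio_examples"):
        group_dir = os.path.join(demo_data, group)
        if not os.path.isdir(group_dir):
            problems[group_dir] = ["directory"]
    return problems
```

Calling `check_demo_data("python/demo_data")` from the repository root returns an empty dictionary when every expected folder and model file is present; otherwise each key is the path with a problem and each value lists what is missing.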

##### Start up

Change into the repository directory and run the command below. It should work out of the box unless you are using a custom demo_data folder:

python web_demo/web.py

Optionally, you can specify the locations of the word vector file and the demo data; otherwise, default values are used. For further information, run:

python web_demo/web.py -h

Contributors

tabergma, knub, ricarda-schueler, jopyth
