emdaniels / character-extraction

Extracts character names from a text file and performs analysis of text sentences containing the names.

Language: Python
Topics: character, natural-language-processing, nltk, analysis, character-extraction, gutenberg

Character Extraction

This program extracts the names of fictional characters from a novel and analyzes the sentences in which each character appears or is referenced, in order to build a profile of data specific to each character. It was created with the 32-bit version of Python 2.7, using the Natural Language Toolkit (NLTK) 2.0.4 and Pattern 2.6 libraries.
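The README does not describe how a profile is assembled. As a rough illustration of the sentence-grouping idea only, the sketch below assumes the character names have already been extracted and the novel has already been split into sentences, and simply collects every sentence that mentions each name. The helper name `collect_character_sentences` is hypothetical; this is not the program's own code, and it is written for Python 3 rather than the Python 2.7 the program was created with.

```python
import re
from collections import defaultdict


def collect_character_sentences(sentences, names):
    """Map each character name to the sentences that mention it.

    Hypothetical helper: assumes `names` was produced by an earlier
    extraction step and `sentences` is a list of sentence strings.
    Names that never appear are left out of the result.
    """
    profiles = defaultdict(list)
    for sentence in sentences:
        for name in names:
            # Whole-word match, with the name escaped so punctuation
            # such as the period in "Mr. Bumble" is matched literally.
            if re.search(r"\b" + re.escape(name) + r"\b", sentence):
                profiles[name].append(sentence)
    return dict(profiles)
```

Whole-word matching (`\b`) keeps a short name such as "Ann" from matching inside "Annie". A fuller profile would also need to handle references that do not repeat the name, such as pronouns, which this sketch ignores.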

To analyze a different book, save it as a plain-text file in the same directory as the program, change the file name on line 25 of characterExtraction.py to match, and rerun the program. The book can also live in a different directory, in which case line 25 should reference the full path to the file instead.
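A bare file name passed to `open()` is resolved against the current working directory, which matches the program's directory only when the script is run from there. One way to make "same directory as the program" hold regardless of where the script is launched is to resolve the path relative to the script itself. This is a sketch under stated assumptions: `resolve_book_path` and `load_book` are hypothetical helpers, the book is assumed to be UTF-8 encoded, and the code targets Python 3.

```python
from pathlib import Path


def resolve_book_path(book, script_dir):
    """Return an absolute path to the book file (hypothetical helper).

    A bare file name or relative path is looked up next to the program
    (script_dir); an absolute path is used as given.
    """
    path = Path(book)
    if not path.is_absolute():
        path = Path(script_dir) / path
    return path.resolve()


def load_book(book, script_dir):
    """Read the whole novel as one string (encoding assumed to be UTF-8)."""
    return resolve_book_path(book, script_dir).read_text(encoding="utf-8")
```

In the program itself this might be called as `text = load_book("book.txt", Path(__file__).parent)`, where `book.txt` stands in for whatever file name line 25 uses; passing an absolute path as the first argument covers the case of a book stored elsewhere.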

References

Oliver Twist

This and all associated files of various formats will be found in: http://www.gutenberg.org/7/3/730/

Produced by Peggy Gaugy and Leigh Little. HTML version by Al Haines. This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.net

NLTK

Bird, Steven, Edward Loper and Ewan Klein (2009). Natural Language Processing with Python. O'Reilly Media Inc.

NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets and tutorials supporting research and development in Natural Language Processing.

NLTK source code is distributed under the Apache 2.0 License. NLTK documentation is distributed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States license. NLTK corpora are provided under the terms given in the README file for each corpus; all are redistributable, and available for non-commercial use. NLTK may be freely redistributed, subject to the provisions of these licenses.

https://github.com/nltk/nltk/blob/develop/LICENSE.txt

Pattern

De Smedt, T., Daelemans, W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2031–2035.

Pattern is a web mining module for Python. It has tools for data mining (web services for Google, Twitter and Wikipedia, web crawler, HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, classification using KNN, SVM, Perceptron) and network analysis (graph centrality and visualization). It is well documented and bundled with 50+ examples and 350+ unit tests. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern.

https://github.com/clips/pattern/blob/master/README.txt

License

Copyright 2014-2015 Emily Daniels

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contributors

emdaniels, endolith

Issues

Download resources automatically?

λ python characterExtraction.py
Traceback (most recent call last):
  File "C:\Users\endolith\Documents\Engineering documents\Machine learning neural networks\Language models\character-extraction\characterExtraction.py", line 195, in <module>
    chunkedSentences = chunkSentences(text)
  File "C:\Users\endolith\Documents\Engineering documents\Machine learning neural networks\Language models\character-extraction\characterExtraction.py", line 44, in chunkSentences
    chunkedSentences = nltk.ne_chunk_sents(taggedSentences, binary=True)
  File "C:\Users\endolith\anaconda3\lib\site-packages\nltk\chunk\__init__.py", line 196, in ne_chunk_sents
    chunker = load(chunker_pickle)
  File "C:\Users\endolith\anaconda3\lib\site-packages\nltk\data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "C:\Users\endolith\anaconda3\lib\site-packages\nltk\data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "C:\Users\endolith\anaconda3\lib\site-packages\nltk\data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource maxent_ne_chunker not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('maxent_ne_chunker')

  For more information see: https://www.nltk.org/data.html

  Attempted to load chunkers/maxent_ne_chunker/english_ace_binary.pickle

  Searched in:
    - 'C:\\Users\\endolith/nltk_data'
    - 'C:\\Users\\endolith\\anaconda3\\nltk_data'
    - 'C:\\Users\\endolith\\anaconda3\\share\\nltk_data'
    - 'C:\\Users\\endolith\\anaconda3\\lib\\nltk_data'
    - 'C:\\Users\\endolith\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''
**********************************************************************

It says the same for several resources, so I have to install one, restart the script, hit the next missing-resource error, and so on.
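One way to address this request, sketched here rather than taken from the repository, is to check for every required NLTK resource up front and download whatever is missing in a single pass. The resource list is an assumption based on a sentence-tokenize, POS-tag, `ne_chunk` pipeline like the one in the traceback, and package names can differ between NLTK releases. `ensure_resources` and `NLTK_RESOURCES` are hypothetical names; the helper relies on `nltk.data.find` raising `LookupError` for a missing resource and on `nltk.download(..., quiet=True)`.

```python
def ensure_resources(resources, find=None, download=None):
    """Download any missing NLTK data packages in one pass (hypothetical helper).

    resources: iterable of (resource_path, package_name) pairs.
    find/download default to nltk.data.find and nltk.download; tests can
    pass stand-in callables with the same behaviour instead.
    Returns the package names that had to be downloaded.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be exercised without NLTK
        find = find or nltk.data.find
        download = download or nltk.download
    downloaded = []
    for resource_path, package in resources:
        try:
            find(resource_path)  # raises LookupError when the resource is absent
        except LookupError:
            download(package, quiet=True)
            downloaded.append(package)
    return downloaded


# Assumed resources for tokenizing, POS tagging and named-entity chunking;
# adjust the names to match the installed NLTK version.
NLTK_RESOURCES = [
    ("tokenizers/punkt", "punkt"),
    ("taggers/averaged_perceptron_tagger", "averaged_perceptron_tagger"),
    ("chunkers/maxent_ne_chunker", "maxent_ne_chunker"),
    ("corpora/words", "words"),
]
```

Calling `ensure_resources(NLTK_RESOURCES)` near the top of characterExtraction.py, before any tokenizing or chunking, would fetch everything missing in one run instead of failing on each resource in turn.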
