jonathandunn / text_analytics

Basic text analytics and natural language processing in Python

License: GNU General Public License v3.0

Python 100.00%
computational-linguistics natural-language-processing nlp teaching-tools text-analytics text-classification nlp-teaching

text_analytics's People

Contributors

damiansastre, jonathandunn

text_analytics's Issues

'import text_analytics' fails with ValueError on macOS

I installed the text_analytics package based on the instructions from Lab 1.2:
pip3 install git+https://github.com/jonathandunn/text_analytics.git
and tried to run the first part of the lab code locally. It does not work, failing with a ValueError.

The following code reproduces the bug for me. I'm using an Intel Mac running macOS 11.5:

#!/usr/bin/env python3
from text_analytics import text_analytics
print('Done')

A copy of the full error message is here: error.txt

TF-IDF errors

Hello, I've never been able to use the TF-IDF function. All of the other functions I've used worked fine.
When I run the code
ai.fit_tfidf(df)
I get this error:

 TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_20356/3821690321.py in <module>
----> 1 ai.fit_tfidf(df, non_eng = False)
      2 print("Done!")

~\odrive\slechnoGDrive\MachineLearning\TextAnalyticsCourse\text_analytics.py in fit_tfidf(self, df, min_count, non_eng)
     85 
     86                 #Get MWEs
---> 87                 self.fit_phrases(df, min_count = min_count, non_eng = non_eng)
     88                 print("Finished finding phrases.")
     89 

~\odrive\slechnoGDrive\MachineLearning\TextAnalyticsCourse\text_analytics.py in fit_phrases(self, df, min_count, non_eng)
    141                         common_terms = []
    142 
--> 143 		phrases = Phrases(
    144                                                 sentences = self.read_clean(df),
    145                                                 min_count = min_count,

TypeError: __init__() got an unexpected keyword argument 'common_terms'

I'm running the notebooks locally. Using Python version 3.9 with anaconda.
The file I'm currently testing is "Twitter_by_Country.gz"
Thanks a lot,

Sigal
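
The keyword in the TypeError matches a rename in Gensim 4, where the `common_terms` argument of `Phrases` became `connector_words`. The sketch below picks whichever keyword the installed class accepts. `connector_kwarg` and `build_phrases` are hypothetical helper names, not part of text_analytics, and whether the package should be patched this way or Gensim pinned to a 3.x release is an assumption left to the maintainer.

```python
import inspect


def connector_kwarg(phrases_cls):
    """Return the keyword that phrases_cls.__init__ accepts for connector words.

    Gensim 3.x used `common_terms`; Gensim 4.x uses `connector_words`.
    Returns None when neither keyword is accepted.
    """
    params = inspect.signature(phrases_cls.__init__).parameters
    for name in ("connector_words", "common_terms"):
        if name in params:
            return name
    return None


def build_phrases(phrases_cls, sentences, min_count, terms):
    """Construct phrases_cls with the connector-word keyword it supports."""
    kwargs = {"sentences": sentences, "min_count": min_count}
    key = connector_kwarg(phrases_cls)
    if key is not None:
        kwargs[key] = terms
    return phrases_cls(**kwargs)


# Hypothetical usage, assuming gensim is installed:
#   from gensim.models.phrases import Phrases
#   phrases = build_phrases(Phrases, sentences, min_count=5, terms=frozenset())
```

Inspecting the signature avoids hard-coding a version check, so the same call works with either Gensim release.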

edX assessment 3: fit_tfidf() fails with memory error

In assessment 3 of the edX course, fit_tfidf() failed with a memory error. There's a report from another student in the edX discussion, so I am not the only one having this problem.

import os
import pandas as pd
from text_analytics import text_analytics

ai = text_analytics()

raw_data = os.path.join(ai.data_dir, 'Twitter_by_Country.gz')
data = pd.read_csv(raw_data, index_col=0)

# Memory error happens here
ai.fit_tfidf(data)

As a work-around, I used an existing model, following the approach from lab 3.2:

# imports
import os
import pandas as pd
from text_analytics import text_analytics, load

ai_state = load('tf-idf.Twitter_by_Country')

# open file
ai = text_analytics()
ai.phrases = ai_state.phrases
ai.tfidf_vectorizer = ai_state.tfidf_vectorizer

raw_data = os.path.join(ai.data_dir, 'Twitter_by_Country.gz')
data = pd.read_csv(raw_data, index_col=0)

# classify based on country
ai.svm(data, labels='Country', features='content')

This code ran, but generated some warnings:

/usr/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.1 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/usr/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator TfidfTransformer from version 0.23.1 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/usr/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator TfidfVectorizer from version 0.23.1 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
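
The warnings say the saved model was pickled with scikit-learn 0.23.1 but loaded under 0.22. A minimal sketch for checking this before loading the model follows. `PICKLED_WITH`, `installed_version`, and `version_advice` are illustrative names, not part of text_analytics. `importlib.metadata` needs Python 3.8 or later; on the Python 3.7 shown in the paths above, the `importlib_metadata` backport would be needed instead.

```python
from importlib import metadata

# Version named in the UserWarning text above.
PICKLED_WITH = "0.23.1"


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_advice(installed, pickled=PICKLED_WITH):
    """Return a hint when the installed version differs from the pickled one."""
    if installed is None:
        return "scikit-learn is not installed"
    if installed != pickled:
        return (f"installed {installed}, model pickled with {pickled}: "
                f"pip install scikit-learn=={pickled}")
    return None


if __name__ == "__main__":
    print(version_advice(installed_version("scikit-learn")) or "versions match")
```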

Trouble running first notebook, 1.2

I wanted to run the notebooks locally, so I downloaded "1.2. Accessing the Data" and tried running it from the Jupyter that was installed with Anaconda3, but I ran into this error:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

I bailed on Anaconda and went for installing everything in a virtualenv:

% python --version
Python 3.9.1

% pip install jupyterlab
Collecting jupyterlab
  Downloading jupyterlab-3.1.10-py3-none-any.whl (8.5 MB)
...

% pip install textanalytics
...
tensorflow 2.6.0 requires numpy~=1.19.2, but you'll have numpy 1.21.2 which is incompatible.
numba 0.54.0 requires numpy<1.21,>=1.17, but you'll have numpy 1.21.2 which is incompatible.
...

so...

% pip install numpy==1.19.2
...
h5py 3.1.0 requires numpy>=1.19.3; python_version >= "3.9", but you'll have numpy 1.19.2 which is incompatible.

so...

% pip install numpy==1.19.3

I re-ran Jupyter and the 1.2 notebook, and I'm back to:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Is there a prescribed setup to use text_analytics?
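
The three pip messages above can be checked by hand. The sketch below encodes each quoted requirement as a version-tuple comparison and lists which ones a given NumPy version violates. The names `CONSTRAINTS`, `parse`, and `violated` are illustrative only, and the tuple comparisons are a simplification of pip's real specifier logic. NumPy 1.19.3 satisfies all three constraints, so the remaining ValueError is probably not a pin conflict. This message typically appears when a compiled extension was built against a newer NumPy than the one installed, though which package is responsible here is not shown in the report.

```python
def parse(version):
    """Turn '1.19.3' into (1, 19, 3) for tuple comparison."""
    return tuple(int(part) for part in version.split("."))


# Requirements quoted in the pip output above, written as tuple comparisons.
CONSTRAINTS = {
    "tensorflow 2.6.0 (numpy~=1.19.2)": lambda v: (1, 19, 2) <= v < (1, 20),
    "numba 0.54.0 (numpy>=1.17,<1.21)": lambda v: (1, 17) <= v < (1, 21),
    "h5py 3.1.0 (numpy>=1.19.3 on Python 3.9)": lambda v: v >= (1, 19, 3),
}


def violated(numpy_version):
    """Return the names of the requirements that numpy_version does not satisfy."""
    v = parse(numpy_version)
    return [name for name, ok in CONSTRAINTS.items() if not ok(v)]


for candidate in ("1.21.2", "1.19.2", "1.19.3"):
    print(candidate, violated(candidate))
# 1.21.2 violates the tensorflow and numba requirements
# 1.19.2 violates the h5py requirement
# 1.19.3 violates none of them
```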

Failed to load the native TensorFlow runtime.

Hi Jonathan!

When I run the following code:

from text_analytics import text_analytics
import os
import pandas as pd
print("Done!")

the following message shows up.

How can I solve this?

Thank you!

ImportError Traceback (most recent call last)
File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/__init__.py:61, in <module>
60 sys.setdlopenflags(_default_dlopen_flags | ctypes.RTLD_GLOBAL)
---> 61 from tensorflow.python import pywrap_tensorflow
62 sys.setdlopenflags(_default_dlopen_flags)

File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py:28, in <module>
27 return _mod
---> 28 _pywrap_tensorflow = swig_import_helper()
29 del swig_import_helper

File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py:24, in swig_import_helper()
23 try:
---> 24 _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
25 finally:

File /opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py:242, in load_module(name, file, filename, details)
241 else:
--> 242 return load_dynamic(name, filename, file)
243 elif type_ == PKG_DIRECTORY:

File /opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py:342, in load_dynamic(name, path, file)
340 spec = importlib.machinery.ModuleSpec(
341 name=name, loader=loader, origin=path)
--> 342 return _load(spec)

ImportError: dlopen(/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so, 0x000A): tried: '/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
Input In [1], in <cell line: 1>()
----> 1 from text_analytics import text_analytics
2 import os
3 import pandas as pd

File /opt/homebrew/lib/python3.9/site-packages/text_analytics/text_analytics.py:27, in <module>
25 from matplotlib import pyplot as plt
26 from wordcloud import WordCloud
---> 27 import tensorflow as tf
28 import spacy
30 #---------------------------------

File /opt/homebrew/lib/python3.9/site-packages/tensorflow/__init__.py:24, in <module>
21 from __future__ import print_function
23 # pylint: disable=wildcard-import
---> 24 from tensorflow.python import *
25 # pylint: enable=wildcard-import
26
27 # Lazily import the tf.contrib module. This avoids loading all of the
28 # dependencies of tf.contrib at import tensorflow time.
29 class _LazyContribLoader(object):

File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/__init__.py:72, in <module>
67 except ImportError:
68 msg = """%s\n\nFailed to load the native TensorFlow runtime.\n
69 See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error\n
70 for some common reasons and solutions. Include the entire stack trace
71 above this error message when asking for help.""" % traceback.format_exc()
---> 72 raise ImportError(msg)
74 # Protocol buffers
75 from tensorflow.core.framework.graph_pb2 import *

ImportError: Traceback (most recent call last):
File "/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: dlopen(/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so, 0x000A): tried: '/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))

Failed to load the native TensorFlow runtime.

See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
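
The dlopen message reports that the installed TensorFlow binary is x86_64 while the running interpreter needs arm64e, which suggests an Intel build of TensorFlow installed under an Apple Silicon Python. A small diagnostic sketch for confirming what the interpreter itself reports follows. `runtime_report` and `is_apple_silicon` are illustrative names, and how to obtain a matching TensorFlow build is not covered here.

```python
import platform
import sys


def runtime_report():
    """Collect the facts needed to match a binary wheel to this interpreter."""
    return {
        "system": platform.system(),        # 'Darwin' on macOS
        "machine": platform.machine(),      # 'arm64' natively on Apple Silicon
        "python": platform.python_version(),
        "executable": sys.executable,
    }


def is_apple_silicon(system, machine):
    """True when the interpreter runs natively on an Apple Silicon Mac."""
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    report = runtime_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    if is_apple_silicon(report["system"], report["machine"]):
        print("Native arm64 Python: x86_64-only extension modules cannot be loaded.")
```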

metaclass conflict error

Hello all,

After I ran the first step of Lab 1.2:

from text_analytics import text_analytics
import os
import pandas as pd
print("Done!")

I got the following error: "TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases".

I'm new to Python, so how can I solve this?

In addition, is there a way to download and inspect the structure of the data we have to work on (e.g. "Wordclouds.Business_Insider.gz")? I cannot find it in this package: https://github.com/jonathandunn/text_analytics.

Thank you
Alessandra
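
For readers unfamiliar with the message, here is a generic illustration of how Python raises this TypeError. It is not taken from text_analytics or its dependencies, and all class names are made up. When it appears during an import, as here, it usually comes from inside an installed library, so mismatched package versions are a plausible but unconfirmed cause.

```python
class MetaA(type):
    """First, unrelated metaclass."""


class MetaB(type):
    """Second, unrelated metaclass."""


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


def try_combine():
    """Try to derive from A and B; return the error text, or None on success."""
    try:
        type("C", (A, B), {})
    except TypeError as exc:
        return str(exc)
    return None


# The conflict disappears once a metaclass deriving from both is supplied.
class MetaAB(MetaA, MetaB):
    pass


class C(A, B, metaclass=MetaAB):
    pass


print(try_combine())
```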

ModuleNotFoundError: No module named 'swagger_client'

Hello:

I installed text_analytics for use within Anaconda3, but I'm running into a slight problem. I have used the import line recommended in the README, and it produces the error below. Can someone provide any insight?

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/var/folders/6k/vnb8jggn53b911h224fkq4w80000gn/T/ipykernel_83448/2800784288.py in <module>
----> 1 from text_analytics import TextAnalytics

/opt/anaconda3/lib/python3.8/site-packages/text_analytics/__init__.py in <module>
     17 
     18 # import apis into sdk package
---> 19 from swagger_client.api.default_api import DefaultApi
     20 
     21 # import ApiClient

ModuleNotFoundError: No module named 'swagger_client'
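
The traceback shows `text_analytics/__init__.py` importing `swagger_client`, whereas the tracebacks in the other issues show this repository's code living in `text_analytics/text_analytics.py`. One possible explanation is that a different distribution providing a top-level `text_analytics` module is installed. The sketch below reports which file an import name resolves to, so the installed copy can be inspected. `module_origin` is an illustrative helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for name in ("text_analytics", "swagger_client"):
        print(name, "->", module_origin(name))
```

If the reported path does not belong to the copy installed from the repository, uninstalling the other distribution and reinstalling with the pip command from Lab 1.2 may be worth trying.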

Resources

Would it be possible to include the /resources that go with the text analytics course? Jupyter only allows downloading the files one by one.
