jonathandunn / text_analytics

Basic text analytics and natural language processing in Python

License: GNU General Public License v3.0

Python 100.00%

computational-linguistics natural-language-processing nlp teaching-tools text-analytics text-classification nlp-teaching

text_analytics's People

annaviper hmsameera manueldiaz50 araceli252 fanyak debaratisj arindam-halder yazabat fasemoreakinyemi abdelouahabf mady1258 tensorady hlhnguyen rgabeflores satishkbe athithiyaraj earlpanda sssiva tararavi bsort95 yassermustfa shraddhakh prendkola jkarlhos stupidkiddy mroap nelly-elly baltazaralexis3 fuyinglin leticb peacelovingng zenosenshi xueyingzhao8 stewardwht rkoonireddy burakozturkdot alisonzhu vmnakano mbouchaqour arundavidp shiva-mss salz-schneider princedede avoutsas67 barliant sofiafigueiredosantos tianhaofu tgr2uk berserker006 cristinnebr nredondo pablobec93

text_analytics's Issues

'import text_analytics' fails with ValueError on macOS

I installed the text_analytics package based on the instructions from Lab 1.2:
pip3 install git+https://github.com/jonathandunn/text_analytics.git
and tried to run the first part of the lab code locally. It does not work, failing with a ValueError.

The following code reproduces the bug for me. I'm using an Intel Mac running macOS 11.5:

#!/usr/bin/env python3
from text_analytics import text_analytics
print('Done')

A copy of the full error message is here: error.txt
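Since the full error is only in the attachment, it may help to post the environment details alongside it. Below is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The package list is an assumption based on the imports visible in the tracebacks on this page (TensorFlow, spaCy, gensim's `Phrases`, scikit-learn); adjust it as needed.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed list of heavy dependencies relevant to these import errors; adjust as needed.
DEFAULT_PACKAGES = ("numpy", "pandas", "scikit-learn", "gensim", "tensorflow", "spacy")


def dependency_versions(packages: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Interpreter, OS and CPU architecture (e.g. x86_64 vs arm64 on macOS).
    print(f"Python {platform.python_version()} ({sys.executable})")
    print(f"{platform.platform()} / {platform.machine()}")
    for name, version in dependency_versions().items():
        print(f"{name}: {version or 'not installed'}")
```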

TF-IDF errors

Hello, I've never been able to use the TF-IDF function, although all the other functions I've used work fine.
When running the code:
ai.fit_tfidf(df)
I get this error:

 TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_20356/3821690321.py in <module>
----> 1 ai.fit_tfidf(df, non_eng = False)
      2 print("Done!")

~\odrive\slechnoGDrive\MachineLearning\TextAnalyticsCourse\text_analytics.py in fit_tfidf(self, df, min_count, non_eng)
     85 
     86                 #Get MWEs
---> 87                 self.fit_phrases(df, min_count = min_count, non_eng = non_eng)
     88                 print("Finished finding phrases.")
     89 

~\odrive\slechnoGDrive\MachineLearning\TextAnalyticsCourse\text_analytics.py in fit_phrases(self, df, min_count, non_eng)
    141                         common_terms = []
    142 
--> 143                 phrases = Phrases(
    144                                                 sentences = self.read_clean(df),
    145                                                 min_count = min_count,

TypeError: __init__() got an unexpected keyword argument 'common_terms'

I'm running the notebooks locally, using Python 3.9 with Anaconda.
The file I'm currently testing with is "Twitter_by_Country.gz".
Thanks a lot,

Sigal
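The `TypeError` comes from gensim: version 4.0 renamed the `common_terms` parameter of `Phrases` to `connector_words`, so code written for gensim 3.x fails this way under 4.x. The quickest workaround is probably pinning the older release (`pip install "gensim<4.0"`). Alternatively, the call can pick the keyword based on the installed version. The sketch below shows that idea; `phrases_term_kwarg` and `build_phrases` are hypothetical helpers, not part of text_analytics, and wiring them into the package's `fit_phrases` is left as an assumption.

```python
from importlib import metadata
from typing import Iterable


def phrases_term_kwarg(gensim_version: str, terms: Iterable[str]) -> dict:
    """Keyword argument for gensim's Phrases stop/connector words, by gensim major version.

    gensim 4.0 renamed Phrases' `common_terms` parameter to `connector_words`.
    """
    major = int(gensim_version.split(".")[0])
    key = "connector_words" if major >= 4 else "common_terms"
    return {key: frozenset(terms)}


def build_phrases(sentences, min_count: int = 5, terms: Iterable[str] = ()):
    """Construct a Phrases model that should work on both gensim 3.x and 4.x."""
    # Imported here so this file can be loaded even where gensim is not installed.
    from gensim.models.phrases import Phrases

    kwargs = phrases_term_kwarg(metadata.version("gensim"), terms)
    return Phrases(sentences=sentences, min_count=min_count, **kwargs)
```

gensim 4.x also ships a ready-made English list, `ENGLISH_CONNECTOR_WORDS`, in `gensim.models.phrases`, which could be passed as `terms`.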

edX assessment 3: fit_tfidf() fails with memory error

In assessment 3 of the edX course, fit_tfidf() failed with a memory error. There's a report from another student in the edX discussion, so I am not the only one having this problem.

import os
import pandas as pd
from text_analytics import text_analytics

ai = text_analytics()

raw_data = os.path.join(ai.data_dir, 'Twitter_by_Country.gz')
data = pd.read_csv(raw_data, index_col=0)

# Memory error happens here
ai.fit_tfidf(data)
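Another possible workaround, untested on this dataset, is to fit on a random subsample of the rows. `subsample` below is a hypothetical helper built on `pandas.DataFrame.sample`; the row cap is arbitrary, and whether a given cap avoids the memory error depends on the machine. Note that the TF-IDF vocabulary would then reflect only the sampled tweets.

```python
import pandas as pd


def subsample(df: pd.DataFrame, max_rows: int, seed: int = 42) -> pd.DataFrame:
    """Return at most `max_rows` randomly chosen rows; the whole frame if it is already small."""
    if len(df) <= max_rows:
        return df
    # random_state makes the sample reproducible between runs.
    return df.sample(n=max_rows, random_state=seed)
```

Usage would look like `ai.fit_tfidf(subsample(data, 50_000))`, lowering the cap further if the error persists.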

As a workaround, I used an existing model, following the approach from Lab 3.2:

# imports
import os
import pandas as pd
from text_analytics import text_analytics, load

ai_state = load('tf-idf.Twitter_by_Country')

# open file
ai = text_analytics()
ai.phrases = ai_state.phrases
ai.tfidf_vectorizer = ai_state.tfidf_vectorizer

raw_data = os.path.join(ai.data_dir, 'Twitter_by_Country.gz')
data = pd.read_csv(raw_data, index_col=0)

# classify based on country
ai.svm(data, labels='Country', features='content')

This code ran, but generated some warnings:

/usr/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.1 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/usr/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator TfidfTransformer from version 0.23.1 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/usr/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator TfidfVectorizer from version 0.23.1 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
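These warnings say the saved model was pickled with scikit-learn 0.23.1 but loaded with 0.22. Installing the matching version (for example `pip install scikit-learn==0.23.1`, assuming a build exists for your Python) should make them go away. A small sketch for checking the installed version against the one named in the warning; `check_sklearn` is a hypothetical helper:

```python
from importlib import metadata
from typing import Optional, Tuple

# Version reported in the UserWarning above as the one the model was pickled with.
PICKLED_WITH = "0.23.1"


def check_sklearn(expected: str = PICKLED_WITH) -> Tuple[bool, Optional[str]]:
    """Return (matches, installed_version); installed_version is None if sklearn is absent."""
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return False, None
    return installed == expected, installed


if __name__ == "__main__":
    ok, installed = check_sklearn()
    if ok:
        print("scikit-learn matches the pickled model")
    else:
        print(f"installed {installed or 'nothing'}, model was saved with {PICKLED_WITH}")
```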

Trouble running first notebook, 1.2

I wanted to try running the notebooks locally, so I downloaded "1.2. Accessing the Data" and tried running it from the Jupyter that was installed with Anaconda3, but I ran into this error:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

I bailed on Anaconda and went for installing everything in a virtualenv:

% python --version
Python 3.9.1

% pip install jupyterlab
Collecting jupyterlab
  Downloading jupyterlab-3.1.10-py3-none-any.whl (8.5 MB)
...

% pip install textanalytics
...
tensorflow 2.6.0 requires numpy~=1.19.2, but you'll have numpy 1.21.2 which is incompatible.
numba 0.54.0 requires numpy<1.21,>=1.17, but you'll have numpy 1.21.2 which is incompatible.
...

so...

% pip install numpy==1.19.2
...
h5py 3.1.0 requires numpy>=1.19.3; python_version >= "3.9", but you'll have numpy 1.19.2 which is incompatible.

so...

% pip install numpy==1.19.3

I re-ran Jupyter and the 1.2 notebook, and I'm back to:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Is there a prescribed setup to use text_analytics?
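This `ValueError` typically means a compiled extension was built against a different (usually newer) NumPy than the one actually imported at runtime. A common way to end up there is a Jupyter kernel that runs a different interpreter than the one `pip` installed into. A minimal diagnostic sketch, to run both in a notebook cell and in the terminal; `module_origin` is a hypothetical helper using `importlib.util.find_spec`:

```python
import importlib.util
import sys
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """File a top-level module would be imported from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name in ("numpy", "text_analytics"):
        print(f"{name}: {module_origin(name) or 'not found'}")
```

If the interpreter paths differ, installing from inside the notebook with the `%pip install ...` magic targets the kernel's own environment.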

Failed to load the native TensorFlow runtime.

Hi Jonathan!

When I run the following code:

from text_analytics import text_analytics
import os
import pandas as pd
print("Done!")

the following message shows up.

How can I solve this?

Thank you!

ImportError Traceback (most recent call last)
File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/__init__.py:61, in <module>
60 sys.setdlopenflags(_default_dlopen_flags | ctypes.RTLD_GLOBAL)
---> 61 from tensorflow.python import pywrap_tensorflow
62 sys.setdlopenflags(_default_dlopen_flags)

File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py:28, in <module>
27 return _mod
---> 28 _pywrap_tensorflow = swig_import_helper()
29 del swig_import_helper

File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py:24, in swig_import_helper()
23 try:
---> 24 _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
25 finally:

File /opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py:242, in load_module(name, file, filename, details)
241 else:
--> 242 return load_dynamic(name, filename, file)
243 elif type_ == PKG_DIRECTORY:

File /opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py:342, in load_dynamic(name, path, file)
340 spec = importlib.machinery.ModuleSpec(
341 name=name, loader=loader, origin=path)
--> 342 return _load(spec)

ImportError: dlopen(/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so, 0x000A): tried: '/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
Input In [1], in <cell line: 1>()
----> 1 from text_analytics import text_analytics
2 import os
3 import pandas as pd

File /opt/homebrew/lib/python3.9/site-packages/text_analytics/text_analytics.py:27, in <module>
25 from matplotlib import pyplot as plt
26 from wordcloud import WordCloud
---> 27 import tensorflow as tf
28 import spacy
30 #---------------------------------

File /opt/homebrew/lib/python3.9/site-packages/tensorflow/__init__.py:24, in <module>
21 from __future__ import print_function
23 # pylint: disable=wildcard-import
---> 24 from tensorflow.python import *
25 # pylint: enable=wildcard-import
26
27 # Lazily import the tf.contrib module. This avoids loading all of the
28 # dependencies of tf.contrib at import tensorflow time.
29 class _LazyContribLoader(object):

File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/__init__.py:72, in <module>
67 except ImportError:
68 msg = """%s\n\nFailed to load the native TensorFlow runtime.\n
69 See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error\n
70 for some common reasons and solutions. Include the entire stack trace
71 above this error message when asking for help.""" % traceback.format_exc()
---> 72 raise ImportError(msg)
74 # Protocol buffers
75 from tensorflow.core.framework.graph_pb2 import *

ImportError: Traceback (most recent call last):
File "/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: dlopen(/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so, 0x000A): tried: '/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))

Failed to load the native TensorFlow runtime.

See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
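The key detail is in the `dlopen` line: the TensorFlow extension is an x86_64 binary, while the Homebrew Python under `/opt/homebrew` runs natively on Apple Silicon (arm64), so it cannot load it. The usual remedies are a TensorFlow build for arm64, or an x86_64 Python running under Rosetta. As a rough diagnostic, the sketch below compares the interpreter's architecture (`platform.machine()`) with the architectures recorded in a compiled file's Mach-O header. The header reader is hand-rolled for illustration, is not part of TensorFlow or text_analytics, and only names the x86_64 and arm64 CPU types (arm64e is reported as arm64).

```python
import platform
import struct
from typing import List

MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O, stored little-endian on x86_64/arm64
FAT_MAGIC = 0xCAFEBABE    # universal ("fat") header, always big-endian
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def macho_archs_from_bytes(data: bytes) -> List[str]:
    """Return the CPU architectures a Mach-O image (thin or fat) was built for."""
    if len(data) < 8:
        return []
    (magic_le,) = struct.unpack_from("<I", data, 0)
    if magic_le == MH_MAGIC_64:
        (cputype,) = struct.unpack_from("<I", data, 4)
        return [CPU_TYPES.get(cputype, hex(cputype))]
    magic_be, nfat = struct.unpack_from(">II", data, 0)
    if magic_be == FAT_MAGIC:
        archs = []
        for i in range(nfat):
            offset = 8 + 20 * i  # each fat_arch entry is five big-endian uint32s
            if offset + 20 > len(data):
                break
            (cputype,) = struct.unpack_from(">I", data, offset)
            archs.append(CPU_TYPES.get(cputype, hex(cputype)))
        return archs
    return []  # not a Mach-O file (or a variant this sketch does not handle)


def macho_archs(path: str) -> List[str]:
    """Read just enough of the file header to list its architectures."""
    with open(path, "rb") as f:
        return macho_archs_from_bytes(f.read(4096))


if __name__ == "__main__":
    print("interpreter architecture:", platform.machine())
```

Calling `macho_archs("/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so")` should, going by the error above, list only x86_64, while `platform.machine()` reports arm64.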

metaclass conflict error

Hello all,

After I ran the first step of Lab 1.2:

from text_analytics import text_analytics
import os
import pandas as pd
print("Done!")

I got the following error "TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases".

I'm new to Python, so how can I solve this?

In addition, is there a way I can download and see the structure of the data we have to work with (e.g. "Wordclouds.Business_Insider.gz")? I cannot find it in this repository: https://github.com/jonathandunn/text_analytics.

Thank you
Alessandra
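The message itself is Python's generic complaint when a class inherits from bases whose metaclasses are unrelated, so Python cannot pick one metaclass for the new class. The self-contained illustration below uses made-up classes (`MetaA`, `MetaB`, `A`, `B` are hypothetical stand-ins) to show when it is raised and how a combined metaclass resolves it. In the reported case the conflicting classes live inside installed libraries rather than in the lab code, so the practical fix is more likely aligning dependency versions; without the full traceback it is not possible to say which package is involved.

```python
class MetaA(type):
    """Stand-in for one library's metaclass (hypothetical)."""


class MetaB(type):
    """Stand-in for another library's, unrelated metaclass (hypothetical)."""


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


def conflict_message():
    """Try to inherit from both A and B; return the TypeError text Python raises."""
    try:
        class C(A, B):  # neither MetaA nor MetaB derives from the other
            pass
    except TypeError as exc:
        return str(exc)
    return None


class MetaAB(MetaA, MetaB):
    """A metaclass deriving from both resolves the conflict."""


class Fixed(A, B, metaclass=MetaAB):
    pass
```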

Not able to import textanalytics in the edX course notebooks

ModuleNotFoundError: No module named 'swagger_client'

Hello:

I installed text_analytics for use within Anaconda3, but I'm running into a slight problem. I have used the import line recommended in the README, and it produces the error below. Can someone provide any insight?

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/var/folders/6k/vnb8jggn53b911h224fkq4w80000gn/T/ipykernel_83448/2800784288.py in <module>
----> 1 from text_analytics import TextAnalytics

/opt/anaconda3/lib/python3.8/site-packages/text_analytics/__init__.py in <module>
     17 
     18 # import apis into sdk package
---> 19 from swagger_client.api.default_api import DefaultApi
     20 
     21 # import ApiClient

ModuleNotFoundError: No module named 'swagger_client'
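The traceback shows `site-packages/text_analytics/__init__.py` importing `swagger_client` and an `ApiClient`, which does not look like the course package (the labs use `from text_analytics import text_analytics`). One plausible, unconfirmed explanation is that a different distribution is installed under the same import name. The sketch below lists installed distributions whose `top_level.txt` claims a given import name; `distributions_providing` is a hypothetical helper, and because not every distribution ships `top_level.txt`, an empty result is not conclusive.

```python
from importlib import metadata
from typing import List


def distributions_providing(import_name: str) -> List[str]:
    """Installed distributions whose top_level.txt declares `import_name`."""
    found = set()
    for dist in metadata.distributions():
        top_level = dist.read_text("top_level.txt") or ""
        name = dist.metadata["Name"]
        if name and import_name in top_level.split():
            found.add(name)
    return sorted(found)
```

If `distributions_providing("text_analytics")` names something other than the course package, uninstalling it with `pip uninstall <name>` and reinstalling with `pip3 install git+https://github.com/jonathandunn/text_analytics.git` may resolve the import.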

Resources

Would it be possible to include the /resources folder that goes with the text analytics course? Jupyter only allows downloading files one at a time.
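Until the folder is published, one possible workaround is to zip it from a notebook cell and download the single archive. This assumes the course's Jupyter environment lets you run Python cells and that the folder is named `resources` in the notebook's working directory; `zip_folder` and `archive_members` are hypothetical helpers built on `shutil.make_archive` and `zipfile`.

```python
import shutil
import zipfile
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def zip_folder(folder: PathLike, dest_dir: PathLike = ".") -> Path:
    """Zip `folder` (the folder itself plus its contents) into dest_dir/<folder name>.zip."""
    folder = Path(folder).resolve()
    base_name = Path(dest_dir).resolve() / folder.name  # make_archive appends ".zip"
    archive = shutil.make_archive(
        str(base_name), "zip", root_dir=str(folder.parent), base_dir=folder.name
    )
    return Path(archive)


def archive_members(archive: PathLike) -> List[str]:
    """List the entries stored in a zip file, for a quick sanity check."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(zf.namelist())
```

Running `zip_folder("resources")` in a cell should leave a `resources.zip` next to the notebook, which can then be downloaded in one step.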
