jonathandunn / text_analytics
Basic text analytics and natural language processing in Python
License: GNU General Public License v3.0
I installed the text_analytics package based on the instructions from Lab 1.2:
pip3 install git+https://github.com/jonathandunn/text_analytics.git
and tried to run the first part of the lab code locally. It does not work, failing with a ValueError.
The following code reproduces the bug for me. I'm using an Intel Mac running macOS 11.5:
#!/usr/bin/env python3
from text_analytics import text_analytics
print('Done')
A copy of the full error message is here: error.txt
Hello, I've never been able to use the TF-IDF function. All of the other functions I've used work fine.
When running the code:
ai.fit_tfidf(df)
I get this error:
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_20356/3821690321.py in <module>
----> 1 ai.fit_tfidf(df, non_eng = False)
2 print("Done!")
~\odrive\slechnoGDrive\MachineLearning\TextAnalyticsCourse\text_analytics.py in fit_tfidf(self, df, min_count, non_eng)
85
86 #Get MWEs
---> 87 self.fit_phrases(df, min_count = min_count, non_eng = non_eng)
88 print("Finished finding phrases.")
89
~\odrive\slechnoGDrive\MachineLearning\TextAnalyticsCourse\text_analytics.py in fit_phrases(self, df, min_count, non_eng)
141 common_terms = []
142
--> 143 phrases = Phrases(
144 sentences = self.read_clean(df),
145 min_count = min_count,
TypeError: __init__() got an unexpected keyword argument 'common_terms'
I'm running the notebooks locally. Using Python version 3.9 with anaconda.
The file I'm currently testing is "Twitter_by_Country.gz"
Thanks a lot,
Sigal
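A possible workaround sketch for the `common_terms` error above. It assumes the cause is a gensim version difference: in gensim 4 the `Phrases` keyword `common_terms` was renamed to `connector_words`. The helper name `phrases_kwargs` is hypothetical and is not part of text_analytics; it only picks whichever keyword the installed `Phrases` class accepts.

```python
import inspect


def phrases_kwargs(phrases_cls, terms):
    """Return the keyword argument that phrases_cls accepts for common terms.

    Hypothetical helper: inspects the constructor signature instead of
    checking the gensim version number.
    """
    params = inspect.signature(phrases_cls.__init__).parameters
    if "connector_words" in params:   # name used by gensim 4
        return {"connector_words": frozenset(terms)}
    if "common_terms" in params:      # name used by gensim 3
        return {"common_terms": frozenset(terms)}
    return {}                         # neither keyword is supported


# Intended use inside fit_phrases (left as comments because it needs gensim):
# from gensim.models.phrases import Phrases
# phrases = Phrases(sentences, min_count=min_count,
#                   **phrases_kwargs(Phrases, common_terms))
```

Inspecting the signature keeps the call working with either keyword name, so the same notebook can run against an older or newer gensim install.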
In assessment 3 of the edX course, fit_tfidf() failed with a memory error. There's a report from another student in the edX discussion, so I am not the only one having this problem.
import os
import pandas as pd
from text_analytics import text_analytics
ai = text_analytics()
raw_data = os.path.join(ai.data_dir, 'Twitter_by_Country.gz')
data = pd.read_csv(raw_data, index_col=0)
# Memory error happens here
ai.fit_tfidf(data)
As a work-around, I used an existing model, following the approach from lab 3.2:
# imports
import os
import pandas as pd
from text_analytics import text_analytics, load
ai_state = load('tf-idf.Twitter_by_Country')
# open file
ai = text_analytics()
ai.phrases = ai_state.phrases
ai.tfidf_vectorizer = ai_state.tfidf_vectorizer
raw_data = os.path.join(ai.data_dir, 'Twitter_by_Country.gz')
data = pd.read_csv(raw_data, index_col=0)
# classify based on country
ai.svm(data, labels='Country', features='content')
This code ran, but generated some warnings:
/usr/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.1 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
/usr/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator TfidfTransformer from version 0.23.1 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
/usr/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator TfidfVectorizer from version 0.23.1 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
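The warnings above say the saved model was pickled with scikit-learn 0.23.1 but loaded with 0.22. A minimal sketch for checking this before loading a model follows; the function names are hypothetical, and comparing only the major and minor version components is an assumption about what counts as "close enough".

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def same_release(installed, pickled):
    """True when the major.minor components of both version strings agree."""
    if installed is None:
        return False
    return installed.split(".")[:2] == pickled.split(".")[:2]


PICKLED_WITH = "0.23.1"  # version named in the warning text
current = installed_version("scikit-learn")
if not same_release(current, PICKLED_WITH):
    print(f"scikit-learn {current} differs from {PICKLED_WITH}; "
          "the unpickled model may not behave as expected.")
```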
I wanted to try and run the notebooks locally, so I downloaded 1.2. Accessing the Data and tried running from Jupyter that was installed with Anaconda3, but I ran into this error:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
I bailed on Anaconda and went for installing everything in a virtualenv:
% python --version
Python 3.9.1
% pip install jupyterlab
Collecting jupyterlab
Downloading jupyterlab-3.1.10-py3-none-any.whl (8.5 MB)
...
% pip install textanalytics
...
tensorflow 2.6.0 requires numpy~=1.19.2, but you'll have numpy 1.21.2 which is incompatible.
numba 0.54.0 requires numpy<1.21,>=1.17, but you'll have numpy 1.21.2 which is incompatible.
...
so...
% pip install numpy==1.19.2
...
h5py 3.1.0 requires numpy>=1.19.3; python_version >= "3.9", but you'll have numpy 1.19.2 which is incompatible.
so...
% pip install numpy==1.19.3
I re-ran Jupyter and the 1.2 notebook, and I'm back to:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
Is there a prescribed setup to use text_analytics?
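When reporting a problem like this, it may help to list exactly which versions the failing environment contains. The sketch below uses only the standard library; the package list is taken from the pip messages in this post, and the function name `version_report` is hypothetical.

```python
from importlib import metadata

# Packages named in the pip conflict messages above.
PACKAGES = ["numpy", "tensorflow", "numba", "h5py"]


def version_report(packages):
    """Map each package name to its installed version or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, version in version_report(PACKAGES).items():
    print(f"{name:12s} {version}")
```

Running this inside the same kernel that raises the `ValueError` shows whether Jupyter is actually using the virtualenv's packages.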
Hi Jonathan!
When I run the following code:
from text_analytics import text_analytics
import os
import pandas as pd
print("Done!")
the following message shows up.
How can I solve this?
ImportError Traceback (most recent call last)
File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/__init__.py:61, in <module>
60 sys.setdlopenflags(_default_dlopen_flags | ctypes.RTLD_GLOBAL)
---> 61 from tensorflow.python import pywrap_tensorflow
62 sys.setdlopenflags(_default_dlopen_flags)
File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py:28, in <module>
27 return _mod
---> 28 _pywrap_tensorflow = swig_import_helper()
29 del swig_import_helper
File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py:24, in swig_import_helper()
23 try:
---> 24 _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
25 finally:
File /opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py:242, in load_module(name, file, filename, details)
241 else:
--> 242 return load_dynamic(name, filename, file)
243 elif type_ == PKG_DIRECTORY:
File /opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py:342, in load_dynamic(name, path, file)
340 spec = importlib.machinery.ModuleSpec(
341 name=name, loader=loader, origin=path)
--> 342 return _load(spec)
ImportError: dlopen(/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so, 0x000A): tried: '/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
During handling of the above exception, another exception occurred:
ImportError Traceback (most recent call last)
Input In [1], in <cell line: 1>()
----> 1 from text_analytics import text_analytics
2 import os
3 import pandas as pd
File /opt/homebrew/lib/python3.9/site-packages/text_analytics/text_analytics.py:27, in <module>
25 from matplotlib import pyplot as plt
26 from wordcloud import WordCloud
---> 27 import tensorflow as tf
28 import spacy
30 #---------------------------------
File /opt/homebrew/lib/python3.9/site-packages/tensorflow/__init__.py:24, in <module>
21 from __future__ import print_function
23 # pylint: disable=wildcard-import
---> 24 from tensorflow.python import *
25 # pylint: enable=wildcard-import
26
27 # Lazily import the `tf.contrib` module. This avoids loading all of the
28 # dependencies of `tf.contrib` at `import tensorflow` time.
29 class _LazyContribLoader(object):
File /opt/homebrew/lib/python3.9/site-packages/tensorflow/python/__init__.py:72, in <module>
67 except ImportError:
68 msg = """%s\n\nFailed to load the native TensorFlow runtime.\n
69 See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error\n
70 for some common reasons and solutions. Include the entire stack trace
71 above this error message when asking for help.""" % traceback.format_exc()
---> 72 raise ImportError(msg)
74 # Protocol buffers
75 from tensorflow.core.framework.graph_pb2 import *
ImportError: Traceback (most recent call last):
File "/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/opt/homebrew/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: dlopen(/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so, 0x000A): tried: '/opt/homebrew/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
Failed to load the native TensorFlow runtime.
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
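The final `ImportError` above reports that the TensorFlow binary is built for x86_64 while the running interpreter needs arm64e, i.e. the compiled package and the Python process target different CPU architectures. A small sketch for confirming what the interpreter itself reports, using only the standard library (the function name `describe_interpreter` is hypothetical):

```python
import platform


def describe_interpreter():
    """Collect the CPU architecture, OS, and Python version of this process."""
    return {
        "machine": platform.machine(),        # e.g. 'arm64' or 'x86_64'
        "system": platform.system(),          # e.g. 'Darwin'
        "python": platform.python_version(),
    }


info = describe_interpreter()
print(info)
if info["system"] == "Darwin" and info["machine"] == "arm64":
    # An arm64 interpreter cannot load x86_64 extension modules.
    print("This Python runs as arm64; compiled packages must be arm64 builds.")
```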
Hello all,
After I run the first step of Lab 1.2:
from text_analytics import text_analytics
import os
import pandas as pd
print("Done!")
I get the following error: "TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases".
I'm new to Python, so how can I solve this?
In addition, is there a way to download and inspect the structure of the data we have to work on (e.g. "Wordclouds.Business_Insider.gz")? I cannot find it in this package: https://github.com/jonathandunn/text_analytics.
Thank you
Alessandra
Hello:
I installed text_analytics for use within Anaconda3, but I'm running into a slight problem. I have used the import line recommended in the README, and it produces the error below. Can someone provide any insight?
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
/var/folders/6k/vnb8jggn53b911h224fkq4w80000gn/T/ipykernel_83448/2800784288.py in <module>
----> 1 from text_analytics import TextAnalytics
/opt/anaconda3/lib/python3.8/site-packages/text_analytics/__init__.py in <module>
17
18 # import apis into sdk package
---> 19 from swagger_client.api.default_api import DefaultApi
20
21 # import ApiClient
ModuleNotFoundError: No module named 'swagger_client'
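One thing that may be worth checking here: the traceback shows `text_analytics/__init__.py` importing `swagger_client`, which could mean a different distribution with the same import name is installed. That is an assumption based only on the traceback. The sketch below prints where Python would load a module from without importing it; the function name `module_origin` is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Shows the path of the installed text_analytics package, if any.
print(module_origin("text_analytics"))
```

Comparing the printed path and the contents of that `__init__.py` with the repository at https://github.com/jonathandunn/text_analytics would show whether the installed package is the one the course expects.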
Would it be possible to include the /resources that go with the text analytics course? Jupyter only allows downloading files one by one.