hse-aml / natural-language-processing
Resources for the "Natural Language Processing" Coursera course.
Home Page: https://www.coursera.org/learn/language-processing
In the Week 1 module, during the multiclass training step, scikit-learn raises this kind of exception:
"Scikit Learn Multilabel Classification: ValueError: You appear to be using a legacy multi-label data representation..."
So I've found out that we should use MultiLabelBinarizer to preprocess the labels; done.
But when we evaluate the "val" dataset on the trained classifiers, a variable "mlb" is referenced that was never instantiated. I assume it refers to a MultiLabelBinarizer instance. As you can see, there is an inconsistency here that currently has to be fixed manually.
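A minimal sketch of the fix, assuming tag lists like the assignment's y_train/y_val (the tag values below are illustrative): fit the binarizer once and reuse the same `mlb` instance when evaluating on the validation set, which resolves the uninstantiated-variable inconsistency.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Illustrative tag lists standing in for y_train / y_val:
y_train = [['python', 'pandas'], ['c++']]
y_val = [['python']]

# Fix the class order up front so train and val share one encoding:
mlb = MultiLabelBinarizer(classes=sorted({t for tags in y_train + y_val for t in tags}))
y_train_bin = mlb.fit_transform(y_train)
y_val_bin = mlb.transform(y_val)  # reuse the same `mlb` instance here

print(mlb.classes_)   # ['c++' 'pandas' 'python']
print(y_train_bin)
```

The key point is that the object fitted on the training tags is the same one referenced later; re-creating it for "val" would silently change the column order.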
Can you please add week 5 support?
You need to sign into a Google account first, or you won't see the GitHub tab mentioned in the README instructions.
Hello
tf.nn.rnn_cell.BasicLSTMCell is deprecated and has been replaced by tf.nn.rnn_cell.LSTMCell.
There is also a note about optimization on GPU:
"Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU"
Hi,
I'm having issues getting access working with Docker and would really appreciate some straightforward advice, as I have never used Docker before. My installation is the Docker Toolbox version on Windows 10 (not Pro).
I have reached the point where I have the Docker Quickstart Terminal and a Jupyter Notebook session running. Note that the Docker tutorial on GitHub fails at this point:
David@DESKTOP-TLE6KHC MINGW64 /c/Program Files/Docker Toolbox
$ docker run -it -p 8080:8080 --name coursera-aml-nlp --user root -v /C:/Users/David/natural-language-processing-master
/week3/data:/root/coursera
"docker run" requires at least 1 argument.
See 'docker run --help'.
Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
Run a command in a new container
So, as I am using the Toolbox, I followed the instructions from Dr Shahin Rostami at:
https://shahinrostami.com/posts/tools/docker/docker-toolbox-windows-7-and-shared-volumes/
Which brings me to the point in the instructions:
Sharing Folders with a Docker Container
To create a Docker container from the jupyter/scipy-notebook image, type the following command and wait for it to complete execution: docker run --name="scipy" --user root -v /h/work:/home/jovyan -d -e GRANT_SUDO=yes -p 8888:8888 jupyter/scipy-notebook start-notebook.sh --NotebookApp.token=''
This may take some time, as it will need to download and extract the image. Once it's finished, you should be able to access the Jupyter notebook using 127.0.0.1:8888. I hope this helps you get up and running with Docker Toolbox and shared folders. Of course, the process is typically easier when using the non-legacy Docker solutions.
I'm really not sure what the token or password represent here. Access to the folder is all I'm after, but I don't know what to try next.
Thanks,
David
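For what it's worth, the `"docker run" requires at least 1 argument` error means the command is missing its IMAGE argument: the line ends after the `-v` volume mapping. A hypothetical corrected command, assuming the course image is akashin/coursera-aml-nlp (the image pulled elsewhere in this thread) and that Docker Toolbox expects Windows drive paths in the `/c/...` form:

```shell
# Sketch only: image name and host path are assumptions based on this thread.
docker run -it -p 8080:8080 --name coursera-aml-nlp --user root \
  -v /c/Users/David/natural-language-processing-master/week3/data:/root/coursera \
  akashin/coursera-aml-nlp
```

The trailing image name is the required argument; everything before it is options.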
tf.nn.softmax_cross_entropy_with_logits is deprecated and has been replaced by tf.nn.softmax_cross_entropy_with_logits_v2.
I guess the task description should say the 12th row (index 11) instead of the 11th row (index 10) as the row whose number of non-zero elements has to be determined. Or am I misunderstanding something about indexing CSR matrices?
Currently my code works when I use
row = X_train_mybag[11].toarray()[0]
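A quick sketch of the indexing point, with a stand-in matrix `X` in place of X_train_mybag (names are illustrative): with 0-based indexing, `X[11]` is the 12th row, and CSR rows expose their non-zero count directly via `.nnz` without densifying.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Stand-in for X_train_mybag: identity matrix, one non-zero per row.
X = csr_matrix(np.eye(20))

row = X[11].toarray()[0]   # the dense workaround from this issue (12th row)
print((row > 0).sum())     # count via densifying
print(X[11].nnz)           # same count, straight from the CSR structure
```

Both counts agree; `.nnz` just avoids materializing the dense row.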
If your name contains any unusual characters in Telegram, the bot will crash.
Ready to talk!
An update received.
Traceback (most recent call last):
File "main_bot.py", line 111, in <module>
main()
File "main_bot.py", line 103, in main
print("Update content: {}".format(update))
UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 153: ordinal not in range(128)
Although it adds some computational overhead, adding the following function
def cast_to_utf_8(old_dict):
    """
    Encodes the string content of a dict to UTF-8.

    Parameters
    ----------
    old_dict : dict
        The dict to encode

    Returns
    -------
    new_dict : dict
        The encoded dict
    """
    def walk(node):
        """
        Recursively traverses a node and encodes all strings to UTF-8.

        Parameters
        ----------
        node : dict
            The node to traverse

        Returns
        -------
        node : dict
            The node with its strings encoded to UTF-8
        """
        for key, item in node.items():
            if type(item) == dict:
                walk(item)
            elif type(item) == list:
                for i, elem in enumerate(item):
                    if type(elem) == str:
                        node[key][i] = elem.encode('utf-8')
            elif type(item) == str:
                node[key] = item.encode('utf-8')
        return node

    new_dict = walk(old_dict)
    return new_dict
and calling it like this in main():
if is_unicode(text):
    update = cast_to_utf_8(update)
    print("Update content: {}".format(update))
    bot.send_message(chat_id, bot.get_answer(update["message"]["text"]))
else:
    bot.send_message(chat_id, "Hmm, you are sending some weird characters to me...")
was a remedy for me
Hi,
The module common is not found. I tried on Colab and in the Docker container as well.
Here is the traceback:
`ImportError Traceback (most recent call last)
in ()
1 import sys
2 sys.path.append("..")
----> 3 from common.download_utils import download_week1_resources
4
5 download_week1_resources()
ImportError: No module named 'common'`
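A possible workaround, assuming the problem is only that the repo root (the directory containing common/) is not on sys.path: in Colab that means cloning the repo and appending its root. The snippet below fakes the repo layout with a throwaway package just to show why appending the root directory makes the import work.

```python
import os
import sys
import tempfile

# In Colab the real fix would be (hypothetical commands):
#   !git clone https://github.com/hse-aml/natural-language-processing
#   sys.path.append("natural-language-processing")
# Here we simulate that layout to demonstrate the mechanism:
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "common"))
open(os.path.join(root, "common", "__init__.py"), "w").close()
with open(os.path.join(root, "common", "download_utils.py"), "w") as f:
    f.write("def download_week1_resources():\n    return 'ok'\n")

sys.path.append(root)  # once the parent of common/ is importable...
from common.download_utils import download_week1_resources  # ...this works
print(download_week1_resources())
```

The notebooks' own `sys.path.append("..")` only works when the notebook is launched from inside a weekN/ folder, which is not the case on Colab.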
In the function test_my_bag_of_words, answers is defined as a list of lists while it should be just a list.
Original:
answers = [[1, 1, 0, 1]]
Should be:
answers = [1, 1, 0, 1]
since it is compared against the return value of the my_bag_of_words function, which takes a text as input and returns a NumPy array.
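A minimal sketch of the contract this assumes (an illustrative reimplementation of my_bag_of_words with a toy vocabulary, not the course's exact code): the function returns a flat NumPy array, so the expected answer must be a flat list for an element-wise comparison to make sense.

```python
import numpy as np

def my_bag_of_words(text, words_to_index, dict_size):
    """Illustrative bag-of-words: count vector over a fixed vocabulary."""
    result = np.zeros(dict_size)
    for word in text.split():
        if word in words_to_index:
            result[words_to_index[word]] += 1
    return result

words_to_index = {'hi': 0, 'you': 1, 'me': 2, 'are': 3}
vec = my_bag_of_words('hi how are you', words_to_index, 4)
print(vec.tolist())  # [1.0, 1.0, 0.0, 1.0] -- a flat vector, like answers = [1, 1, 0, 1]
```

Comparing that array against the nested `[[1, 1, 0, 1]]` would broadcast to a 2-D comparison instead of the intended element-wise one.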
Hi,
Can we have Colab environment support?
Thanks
I am not able to download the dependencies using Google Colab. Please help me solve this issue.
https://github.com/hse-aml/natural-language-processing/blob/master/week1/lemmatization_demo.ipynb
If you look at cell 5, the string is text = "operates operative operating", but in cell 6, where the Porter stemmer is applied to the string from cell 5, the stored output is unrelated to the input: there are no common characters between the input and output words, most likely because the output was cached from a previously run string: u'feet cat wolv talk'. The same goes for the lemmatized string in cell 7.
The intent of the code is clear, but the stored outputs should be regenerated in the future.
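For reference, a fresh run (assuming NLTK's PorterStemmer, as the notebook uses) stems all three words of the cell-5 string to one common root, which is clearly unrelated to the cached 'feet cat wolv talk' output:

```python
from nltk.stem import PorterStemmer

text = "operates operative operating"
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in text.split()]
# All three inflections collapse to the same stem:
print(" ".join(stems))
```

Re-running cells 6 and 7 on the current `text` would regenerate consistent outputs.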
Running the Week 3 notebook on Google Colab (after previously encountering #33), I see
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-7-e70e92d32c6e> in <module>()
----> 1 import gensim
3 frames
/usr/local/lib/python3.6/dist-packages/gensim/models/ldamodel.py in <module>()
49
50 # log(sum(exp(x))) that tries to avoid overflow
---> 51 from scipy.misc import logsumexp
52
53
ImportError: cannot import name 'logsumexp'
---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------
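The import fails because `logsumexp` was removed from `scipy.misc` in newer SciPy releases; it now lives in `scipy.special`, and recent gensim versions import it from there. Upgrading gensim (e.g. `!pip install --upgrade gensim` in Colab) is one likely fix; the relocated function itself:

```python
import math

from scipy.special import logsumexp  # new home of the old scipy.misc name

# log(sum(exp(x))) computed stably, as gensim's ldamodel uses it:
val = logsumexp([0.0, 0.0])
print(val)  # log(2), about 0.6931
```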
Hello,
This code returns an "Invalid Syntax" error.
`REPLACE_BY_SPACE_RE = re.compile('[/(){}[]|@,;]')
BAD_SYMBOLS_RE = re.compile('[^0-9a-z #+_]')
STOPWORDS = set(stopwords.words('english'))
def text_prepare(text):
"""
text: a string
return: modified initial string
"""
text = # lowercase text
text = # replace REPLACE_BY_SPACE_RE symbols by space in text
text = # delete symbols which are in BAD_SYMBOLS_RE from text
text = # delete stopwords from text
return text`
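The SyntaxError is not about the regexes: the placeholder lines like `text = # lowercase text` leave the assignment with no right-hand side, so Python rejects the file until you fill them in. A filled-in sketch (note that the unescaped brackets in the first character class also need escaping; the stopword set below is a small stand-in for NLTK's English stopwords):

```python
import re

REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,;]')  # brackets escaped
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z #+_]')
STOPWORDS = {'a', 'the', 'is'}  # stand-in for set(stopwords.words('english'))

def text_prepare(text):
    """
    text: a string
    return: modified initial string
    """
    text = text.lower()                          # lowercase text
    text = REPLACE_BY_SPACE_RE.sub(' ', text)    # replace listed symbols by space
    text = BAD_SYMBOLS_RE.sub('', text)          # delete bad symbols
    text = ' '.join(w for w in text.split() if w not in STOPWORDS)  # drop stopwords
    return text

print(text_prepare("SQL Server - how?"))  # sql server how
```

Each `text = ...` line must assign an actual expression; the comments alone are what triggers "Invalid Syntax".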
Kindly take a look at the Docker tutorial link provided in the README; it isn't working: https://github.com/hse-aml/natural-language-processing/blob/master/(Docker-tutorial.md)
The following link in the Week 2 assignment notebook appears to be broken:
I believe it should point instead to https://www.tensorflow.org/api_docs/python/tf/compat/v1/placeholder
Hello, this piece of code returns the following error:
print(test_text_prepare())
`---------------------------------------------------------------------------
NameError Traceback (most recent call last)
in ()
----> 1 print(test_text_prepare())
in test_text_prepare()
5 "free c++ memory vectorint arr"]
6 for ex, ans in zip(examples, answers):
----> 7 if text_prepare(ex) != ans:
8 return "Wrong answer for the case: '%s'" % ex
9 return 'Basic tests are passed.'
NameError: name 'text_prepare' is not defined`
Sorry, stupid question, but how do I open week 1 in Colab? Usually for my own files there is always an "Open in Colab" button, but there is none for the week 1 task?
The link to "Clipping" in the perform_optimization function documentation is broken.
I am using Google Colab for the week 3 assignment, and at the end, when I finally submit it, the grader is not recognizing my e-mail ID. Please tell me the solution.
----> 1 STUDENT_EMAIL = [email protected]# EMAIL
2 STUDENT_TOKEN = AT5ZyzLxuQfnEhNg# TOKEN
3 grader.status()
NameError: name 'mandloi19faraday96' is not defined
This is the error i am getting how should i submit my assignment please help.
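The `NameError: name 'mandloi19faraday96' is not defined` suggests the address and token were pasted without quotes, so Python tries to evaluate them as variable names. Both must be string literals (the values below are placeholders, not real credentials):

```python
# Placeholders only -- substitute your own Coursera e-mail and token,
# keeping the quotation marks:
STUDENT_EMAIL = "student@example.com"
STUDENT_TOKEN = "your-token-here"
print(type(STUDENT_EMAIL).__name__)  # str
```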
Would it be possible to migrate it to TF 2.0?
Using the Docker container environment, I am getting a UnicodeDecodeError. More specifically:
prepared_questions = []
for line in open('data/text_prepare_tests.tsv'):
line = text_prepare(line.strip())
prepared_questions.append(line)
text_prepare_results = '\n'.join(prepared_questions)
grader.submit_tag('TextPrepare', text_prepare_results)
This gives the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 79: ordinal not in range(128)
In order to run it I had to change it to:
prepared_questions = []
for line in open('data/text_prepare_tests.tsv', encoding='utf-8'):
line = text_prepare(line.strip())
prepared_questions.append(line)
text_prepare_results = '\n'.join(prepared_questions)
grader.submit_tag('TextPrepare', text_prepare_results)
It can also be solved by using pd.read_csv.
Is this error reproducible for anyone else?
When I open one of the notebooks in Colab, specifically week1-MultilabelClassification.ipynb:
https://colab.research.google.com/github/hse-aml/natural-language-processing/blob/master/week1/week1-MultilabelClassification.ipynb
and try to run it, I get a ModuleNotFoundError on this line:
from common.download_utils import download_week1_resources
I have never run code from GitHub in Colab and am not sure whether I need to do something so that it can find the common module.
I retrieved the Docker image like so:
# docker pull akashin/coursera-aml-nlp
# python3 --version
shows that this image has Python 3.5 installed.
Unfortunately, Python 3.5 has this bug: dict insertion order is not maintained, so test_my_bag_of_words() fails. It works correctly in Colab because Colab's Python version is 3.6.
I tried upgrading Python 3.5 on the Docker image to Python 3.7 using this post. The upgrade seems to work, and I also upgraded the Jupyter notebook, but then the notebook doesn't work properly.
Is it possible to provide a Docker image with an upgraded Python version?
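For context on the assumed failure mode: CPython made dict insertion order an implementation detail in 3.6 and a language-level guarantee only in 3.7, so a vocabulary built as a dict can come out in a different order on Python 3.5, shuffling the bag-of-words vector layout. On 3.7+:

```python
# Insertion order of dicts is guaranteed on Python 3.7+ (and holds in
# CPython 3.6), but not on 3.5 -- which is why a dict-built vocabulary
# can reorder there and make the word->index mapping unstable.
vocab = {}
for word in ["free", "c++", "memory"]:
    vocab[word] = len(vocab)
print(list(vocab))  # ['free', 'c++', 'memory']
```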