mmihaltz / word2vec-GoogleNews-vectors
word2vec Google News model
git clone https://github.com/mmihaltz/word2vec-GoogleNews-vectors
Cloning into 'word2vec-GoogleNews-vectors'...
remote: Enumerating objects: 20, done.
remote: Total 20 (delta 0), reused 0 (delta 0), pack-reused 20
Unpacking objects: 100% (20/20), done.
Downloading GoogleNews-vectors-negative300.bin.gz (1.6 GB)
Error downloading object: GoogleNews-vectors-negative300.bin.gz (21c05ae): Smudge error: Error downloading GoogleNews-vectors-negative300.bin.gz (21c05ae916a67a4da59b1d006903355cced7de7da1e42bff9f0504198c748da8): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Errors logged to D:\data\subjects\datascience\nlp\pretrained_embeddings\word2vec-GoogleNews-vectors\.git\lfs\logs\20201123T202346.0092006.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: GoogleNews-vectors-negative300.bin.gz: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
It seems GitHub sets a bandwidth quota on all Git LFS downloads.
$ git lfs fetch
Fetching master
Git LFS: (0 of 1 files) 0 B / 1.53 GB
batch response: http: This repository is over its data quota. Purchase more data packs to restore access.
Docs: https://help.github.com/articles/purchasing-additional-storage-and-bandwidth-for-a-personal-account/
Warning: errors occurred
Hello,
I am relatively new to this and I am trying to extract the .bin.gz file (which I already have on my computer). Running gunzip -k (unknown) gave me an error, so I searched online and was told to simply remove the .gz extension, which doesn't seem right. How do I get at the data I'm after?
Thanks!
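Renaming the file will not decompress it; the .gz archive needs to be gunzipped. A minimal Python sketch using only the standard library is below (the file names are placeholders for wherever you saved the download). Note that many tools, e.g. gensim's KeyedVectors.load_word2vec_format, can read the .bin.gz directly, so decompressing first is often unnecessary.

```python
import gzip
import shutil

def gunzip_file(src: str, dst: str) -> None:
    """Stream-decompress a .gz archive to a plain file (keeps memory use low)."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

# Example (paths are placeholders for your local download):
# gunzip_file("GoogleNews-vectors-negative300.bin.gz",
#             "GoogleNews-vectors-negative300.bin")
```

This is equivalent to `gunzip -k GoogleNews-vectors-negative300.bin.gz`; if gunzip reported an error, the download may have been truncated by the quota failure, so check that the archive is the full ~1.6 GB first.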
I get the following message when running git lfs clone:
batch response: http: This repository is over its data quota. Purchase more data packs to restore access.
Docs: https://help.github.com/articles/purchasing-additional-storage-and-bandwidth-for-a-personal-account/
How can I train word2vec on the Google News corpus dataset?
Is there a way I can donate to get this project some more data packs to restore access?
In case you run out of quota again, I uploaded the same file on this repo:
https://github.com/dataf3l/word2vec-GoogleNews-vectors-negative300.bin
I hope this helps people. Thank you @mmihaltz for creating this repository; it has personally helped me a lot in furthering my studies.
Please continue creating cool stuff! :)
It tells me that the repository is over its data quota:
git-lfs clone https://github.com/mmihaltz/word2vec-GoogleNews-vectors
WARNING: 'git lfs clone' is deprecated and will not be updated
with new flags from 'git clone'
'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into 'word2vec-GoogleNews-vectors'...
remote: Enumerating objects: 20, done.
remote: Total 20 (delta 0), reused 0 (delta 0), pack-reused 20
Unpacking objects: 100% (20/20), done.
batch response: This repository is over its data quota. Purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/mmihaltz/word2vec-GoogleNews-vectors.git/info/lfs'
Can you help me?
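One way to get past the failed checkout (though not the quota itself) is to clone without fetching the LFS objects; Git LFS honours the GIT_LFS_SKIP_SMUDGE environment variable for this. The working tree then contains small pointer files instead of the real .bin.gz, which you can replace later from a mirror such as the one linked above:

```shell
# Clone without downloading LFS objects; avoids the smudge/checkout failure.
# The working tree will contain LFS pointer files, not the 1.6 GB archive.
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/mmihaltz/word2vec-GoogleNews-vectors
```
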
git clone https://github.com/mmihaltz/word2vec-GoogleNews-vectors
Cloning into 'word2vec-GoogleNews-vectors'...
remote: Enumerating objects: 20, done.
remote: Total 20 (delta 0), reused 0 (delta 0), pack-reused 20
Unpacking objects: 100% (20/20), done.
Downloading GoogleNews-vectors-negative300.bin.gz (1.6 GB)
Error downloading object: GoogleNews-vectors-negative300.bin.gz (21c05ae): Smudge error: Error downloading GoogleNews-vectors-negative300.bin.gz (21c05ae916a67a4da59b1d006903355cced7de7da1e42bff9f0504198c748da8): batch response: This repository is over its data quota. Purchase more data packs to restore access.
Errors logged to D:\data\datasets\word2vec-GoogleNews-vectors.git\lfs\logs\20190430T014012.9958653.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: GoogleNews-vectors-negative300.bin.gz: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'
I have the GoogleNews-vectors-negative300.bin and I wonder how to get the word2vec.txt used with the cifar10 dataset.
If I want to know the frequency of each word, do I have to count over the Google News corpus myself? Is there another way? Do you know where we can get those counts?
Thank you.