
word2vecpy's People

Contributors

deborausujono


word2vecpy's Issues

About Multiprocessing

Hi, I ran into some difficulties with multiprocessing. When num_processes is set to more than 1, something goes wrong with how the tasks are assigned to the workers. Do you have any idea what causes this?
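One thing worth checking is how the training file is split into per-process ranges. Google's C code gives each thread a contiguous byte range of the file. The sketch below assumes that scheme. byte_ranges is a hypothetical helper, not a function from word2vecpy. It computes integer offsets that tile the file exactly with no gaps or overlaps.

```python
def byte_ranges(file_size, num_processes):
    """Split [0, file_size) into num_processes contiguous (start, end) ranges.

    Floor division keeps every offset an int, which file.seek() requires;
    true division (/) would produce floats in Python 3.
    """
    return [
        (file_size * i // num_processes, file_size * (i + 1) // num_processes)
        for i in range(num_processes)
    ]


# Example: a 10-byte file split across 3 hypothetical workers
for start, end in byte_ranges(10, 3):
    print(start, end)
```

Each worker would then seek to its start and stop at its end. Ranges computed this way always meet end to end, so every byte is assigned to exactly one worker.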

missing MAX_SEN_LEN and EPOCH

Compared to the original C code released by Google, MAX_SEN_LEN and EPOCH are missing, which causes these two problems.

[1] In the training subprocess, each process reads lines from the file between the offsets start and end. If the input file contains only one line (for example, the text8 corpus), the following code snippet causes a bug:

    while fi.tell() < end:
        line = fi.readline().strip()
        # Skip blank lines
        if not line:
            continue

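Here line = fi.readline().strip() reads every token from start to the end of the file in a single call, because the one line has no newline before the end of the file, so end is ignored.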

[2] An EPOCH setting would allow multiple passes over the corpus, so the embeddings would be trained on more samples.
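Below is a minimal sketch of both missing pieces. It assumes a whitespace-separated corpus opened in binary mode and per-process ranges that begin and end on whitespace.

- iter_sentences is a hypothetical helper. It reads the byte range in fixed-size blocks and cuts it into sentences of at most max_sen_len words. A single-line file such as text8 is therefore never loaded as one line. The default of 1000 matches MAX_SENTENCE_LENGTH in word2vec.c.
- learning_rate is also hypothetical. It follows the linear decay in Google's word2vec.c, where the schedule spans epochs × train_words words and the rate is floored at starting_alpha × 0.0001.

```python
import io


def iter_sentences(fi, start, end, max_sen_len=1000, chunk_size=1 << 16):
    """Yield lists of at most max_sen_len words from bytes [start, end) of fi.

    fi must be opened in binary mode. Reading fixed-size blocks instead of
    calling readline() keeps memory bounded even when the corpus is one line.
    """
    fi.seek(start)
    pos, leftover, sentence = start, b"", []
    while pos < end:
        chunk = fi.read(min(chunk_size, end - pos))
        if not chunk:  # hit EOF before reaching `end`
            break
        pos += len(chunk)
        data = leftover + chunk
        words = data.split()
        # If the block stops mid-word, carry the partial word into the next block
        if pos < end and not data[-1:].isspace():
            leftover = words.pop()
        else:
            leftover = b""
        for w in words:
            sentence.append(w.decode("utf-8"))
            if len(sentence) == max_sen_len:
                yield sentence
                sentence = []
    if leftover:  # EOF arrived while a partial word was pending
        sentence.append(leftover.decode("utf-8"))
    if sentence:
        yield sentence


def learning_rate(starting_alpha, words_processed, epochs, train_words):
    """Decay alpha linearly over all epochs, floored as in word2vec.c."""
    alpha = starting_alpha * (1 - words_processed / (epochs * train_words + 1))
    return max(alpha, starting_alpha * 0.0001)


corpus = io.BytesIO(b"the quick brown fox jumps over the lazy dog")
size = len(corpus.getvalue())
for epoch in range(2):  # EPOCH: repeat passes over the same byte range
    for sent in iter_sentences(corpus, 0, size, max_sen_len=4, chunk_size=8):
        print(epoch, sent)
```

Block reads keep each call bounded by chunk_size. The sentence cap keeps each training window bounded, regardless of how the corpus is laid out in lines.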

TypeError: 'float' object cannot be interpreted as an index

Hi, I tried running your code and got this error message:

Reading word 11690000
Unknown vocab size: 68558
Total words in training file: 11690125
Total bytes in training file: 85775698
Vocab size: 45151
Initializing unigram table
Traceback (most recent call last):
  File "word2vec.py", line 388, in <module>
    args.min_count, args.num_processes, bool(args.binary))
  File "word2vec.py", line 354, in train
    table = UnigramTable(vocab)
  File "word2vec.py", line 175, in __init__
    table = np.zeros(table_size, dtype=np.uint32)
TypeError: 'float' object cannot be interpreted as an index

I can't figure out how to fix it. Could you look into it? Thanks.
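The traceback shows np.zeros(table_size, ...) receiving a float. A literal such as 1e8 is a float in Python, and newer NumPy releases reject floats as array sizes, so converting the size with int() should resolve the error.

Below is a hedged sketch of a unigram table builder with that fix. It assumes the word counts are available as a list of ints. build_unigram_table is a hypothetical name, not word2vecpy's UnigramTable. The 0.75 exponent follows the word2vec noise distribution.

```python
import numpy as np


def build_unigram_table(counts, table_size=int(1e8), power=0.75):
    """Return a uint32 table in which word i fills a share of slots
    proportional to counts[i] ** power (the word2vec noise distribution)."""
    table_size = int(table_size)  # array sizes must be ints; 1e8 is a float
    weights = np.asarray(counts, dtype=np.float64) ** power
    cum = np.cumsum(weights) / weights.sum()
    # Cumulative slot boundaries; force the last one to land exactly on table_size
    bounds = np.round(cum * table_size).astype(np.int64)
    bounds[-1] = table_size
    slots = np.diff(np.concatenate(([0], bounds)))
    return np.repeat(np.arange(len(counts), dtype=np.uint32), slots)
```

This uses np.repeat instead of filling a preallocated np.zeros array in a Python loop. That avoids a per-slot loop over 10^8 entries. The int() conversion is the part that addresses the reported TypeError.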

About negative sampling method

I found that a negative sample drawn by the negative sampling method may be the same word as the positive example. In this line:
indices = np.random.randint(low=0, high=len(self.table), size=count)
the generated index values may point to the same word as the target token.
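One way to keep the positive word out of the negative samples is to redraw any sample that collides with it. This is a minimal sketch, assuming the unigram table is a NumPy integer array and the positive word is given by its vocabulary index. sample_negatives is a hypothetical helper.

```python
import numpy as np


def sample_negatives(table, count, positive, rng=None):
    """Draw `count` word indices from the unigram table, never returning `positive`."""
    table = np.asarray(table)
    if np.all(table == positive):
        raise ValueError("table contains only the positive word")
    rng = np.random.default_rng() if rng is None else rng
    negatives = []
    while len(negatives) < count:
        # Draw a batch of table slots and drop any that hit the positive word
        draws = table[rng.integers(0, len(table), size=count)]
        negatives.extend(int(w) for w in draws if w != positive)
    return negatives[:count]
```

Google's word2vec.c handles this differently: when a drawn negative equals the target word, it skips that sample, so the step trains on fewer negatives. Redrawing, as above, keeps the number of negatives fixed at count.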

numpy.ctypeslib.c_double_Array_100

Hi, I downloaded the code and ran it. However, it fails with the following error:
pickle.PicklingError: Can't pickle <class 'numpy.ctypeslib.c_double_Array_100'>: it's not found as numpy.ctypeslib.c_double_Array_100
It happens when execution reaches the line "pool = Pool(processes=num_processes, initializer=__init_process, initargs=(vocab, syn0, syn1, table, cbow, neg, dim, alpha, win, num_processes, global_word_count, fi))". Can you give me some information? Thank you.
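When a shared ctypes array is sent to a worker started with the spawn start method (e.g. on Windows), multiprocessing pickles it by its element type. A row type like c_double_Array_100 created by numpy.ctypeslib cannot be found by name, which appears to be what the error above reports.

A hedged workaround, assuming the weight matrices only need to be shared as float64: allocate a flat RawArray of plain c_double and reshape it on each side with np.frombuffer. The helper names (make_shared_matrix, as_numpy, _init_worker, run_pool_demo) are hypothetical.

```python
import ctypes
import multiprocessing as mp

import numpy as np


def make_shared_matrix(rows, cols):
    """Allocate a flat shared buffer whose element type is plain c_double.

    Its element type pickles cleanly when sent to a spawned worker, unlike
    a dynamically created row type such as c_double_Array_100.
    """
    return mp.RawArray(ctypes.c_double, rows * cols)


def as_numpy(shared, rows, cols):
    """View the shared buffer as a (rows, cols) float64 array without copying."""
    return np.frombuffer(shared, dtype=np.float64).reshape(rows, cols)


_syn0 = None  # per-worker view, set by the Pool initializer


def _init_worker(shared, rows, cols):
    global _syn0
    _syn0 = as_numpy(shared, rows, cols)


def _row_sum(i):
    return float(_syn0[i].sum())


def run_pool_demo(rows=4, cols=100, processes=2):
    shared = make_shared_matrix(rows, cols)
    as_numpy(shared, rows, cols)[:] = 1.0  # writes go straight into shared memory
    with mp.Pool(processes=processes, initializer=_init_worker,
                 initargs=(shared, rows, cols)) as pool:
        return pool.map(_row_sum, range(rows))
```

Call run_pool_demo() from inside an if __name__ == "__main__": guard, which spawn-based platforms require. With the defaults, every row holds 100 ones, so each worker reports a row sum of 100.0.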
