mmourafiq / philo2vec

An implementation of word2vec applied to the [Stanford Encyclopedia of Philosophy](http://plato.stanford.edu/)

License: MIT License

Python 100.00%
word2vec vector-representations deep-learning tensorflow philosophy-encyclopedia negative-samples skips embeddings crawled-data

philo2vec's Introduction

philo2vec

A TensorFlow implementation of word2vec applied to the Stanford Encyclopedia of Philosophy. The implementation supports both CBOW and skip-gram.

For more background, please have a look at these papers:

After training, the model returns some interesting results; see the "Some interesting results" section.

Evaluating hume - empiricist + rationalist:

descartes
malebranche
spinoza
hobbes
herder

Some interesting results

Similarities

Similar words to death:

untimely
ravages
grief
torment

Similar words to god:

divine
De Providentia
christ
Hesiod

Similar words to love:

friendship
affection
christ
reverence

Similar words to life:

career
live
lifetime
community
society

Similar words to brain:

neurological
senile
nerve
nervous

Operations

Evaluating hume - empiricist + rationalist:

descartes
malebranche
spinoza
hobbes
herder

Evaluating ethics - rational:

hiroshima

Evaluating ethic - reason:

inegalitarian
anti-naturalist
austere

Evaluating moral - rational:

commonsense

Evaluating life - death + love:

self-positing
friendship
care
harmony

Evaluating death + choice:

regret
agony
misfortune
impending

Evaluating god + human:

divine
inviolable
yahweh
god-like
man

Evaluating god + religion:

amida
torah
scripture
buddha
sokushinbutsu

Evaluating politic + moral:

rights-oriented
normative
ethics
integrity

The repo contains:

  • PlatoData: an object that crawls data from the philosophy encyclopedia.
  • VocabBuilder: an object that builds the vocabulary from the crawled data.
  • Philo2Vec: the model that computes the continuous distributed representations of words.

Installation

The dependencies used for this module can be easily installed with pip:

> pip install -r requirements.txt

The params for the VocabBuilder:

  • min_frequency: the minimum number of occurrences a word needs in order to be used in the model.
  • size: the size of the vocabulary; the model then uses only the `size` most frequent words.
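
As an illustration of how these two parameters can interact, here is a minimal sketch of a frequency-filtered vocabulary. The function `build_vocab`, the `<UNK>` token, and the toy sentence are assumptions made for this example; they are not taken from the repo's VocabBuilder.

```python
from collections import Counter


def build_vocab(tokens, min_frequency=1, size=None):
    """Map words to integer ids, keeping only frequent words.

    Hypothetical helper for illustration, not the repo's VocabBuilder.
    """
    counts = Counter(tokens)
    # Drop words seen fewer than min_frequency times.
    kept = [(w, c) for w, c in counts.most_common() if c >= min_frequency]
    # Keep only the `size` most frequent of the remaining words.
    if size is not None:
        kept = kept[:size]
    # Id 0 is reserved for out-of-vocabulary words (an assumption of this sketch).
    word_to_id = {"<UNK>": 0}
    for word, _ in kept:
        word_to_id[word] = len(word_to_id)
    return word_to_id


tokens = "the mind and the body and the soul".split()
vocab = build_vocab(tokens, min_frequency=2, size=10)
# Words below the frequency threshold fall back to the <UNK> id.
ids = [vocab.get(w, 0) for w in tokens]
```

With `min_frequency=2`, only "the" and "and" survive in this toy sentence; every other word is mapped to the out-of-vocabulary id.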

The hyperparams of the model:

  • optimizer: an instance of tensorflow Optimizer, such as GradientDescentOptimizer, AdagradOptimizer, or MomentumOptimizer.
  • model: the model to use to create the vectorized representation, possible values: CBOW, SKIP_GRAM.
  • loss_fct: the loss function used to calculate the error, possible values: SOFTMAX, NCE.
  • embedding_size: dimensionality of word embeddings.
  • neg_sample_size: number of negative samples for each positive sample.
  • num_skips: number of skips for a SKIP_GRAM model.
  • context_window: the window size; the window is used to create the context for calculating the vector representations, laid out as [ window target window ].
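
To make `context_window` and `num_skips` more concrete, the sketch below generates training examples for both model types from a list of word ids. It assumes that `num_skips` means the number of context words sampled per target word, as in the classic TensorFlow word2vec tutorial; the function names are hypothetical and the repo's own batching code may differ.

```python
import random


def cbow_examples(ids, context_window):
    """Yield (context_ids, target_id): the whole context predicts the target."""
    for i in range(context_window, len(ids) - context_window):
        left = ids[i - context_window:i]
        right = ids[i + 1:i + 1 + context_window]
        yield left + right, ids[i]


def skip_gram_examples(ids, context_window, num_skips, rng=random):
    """Yield (target_id, context_id): the target predicts sampled context words."""
    # Cannot sample more context words than the window contains.
    assert num_skips <= 2 * context_window
    for i in range(context_window, len(ids) - context_window):
        positions = [j for j in range(i - context_window, i + context_window + 1)
                     if j != i]
        # Pick num_skips distinct context positions for this target.
        for j in rng.sample(positions, num_skips):
            yield ids[i], ids[j]


ids = [10, 11, 12, 13, 14]
cbow = list(cbow_examples(ids, context_window=1))
skips = list(skip_gram_examples(ids, context_window=1, num_skips=2,
                                rng=random.Random(0)))
```

CBOW produces one example per target position, while skip-gram produces `num_skips` examples per target, which is why the two models see differently shaped batches.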

Quick usage:

# CBOW model trained with the NCE loss
params = {
    'model': Philo2Vec.CBOW,
    'loss_fct': Philo2Vec.NCE,
    'context_window': 5,
}
x_train = get_data()
validation_words = ['kant', 'descartes', 'human', 'natural']
x_validation = [StemmingLookup.stem(w) for w in validation_words]
vb = VocabBuilder(x_train, min_frequency=5)
pv = Philo2Vec(vb, **params)
pv.fit(epochs=30, validation_data=x_validation)

# Skip-gram model trained with the softmax loss
params = {
    'model': Philo2Vec.SKIP_GRAM,
    'loss_fct': Philo2Vec.SOFTMAX,
    'context_window': 2,
    'num_skips': 4,
    'neg_sample_size': 2,
}
x_train = get_data()
validation_words = ['kant', 'descartes', 'human', 'natural']
x_validation = [StemmingLookup.stem(w) for w in validation_words]
vb = VocabBuilder(x_train, min_frequency=5)
pv = Philo2Vec(vb, **params)
pv.fit(epochs=30, validation_data=x_validation)

About stemming

Since the words are stemmed as part of the preprocessing, it is sometimes necessary to convert between a word and its stem:

StemmingLookup.stem('religious')  # returns "religi"

StemmingLookup.original_form('religi')  # returns "religion"
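
One way such a two-way lookup could work is to record, for each stem, the original words that produced it, and to return the most frequent one on request. The sketch below is an assumption about the general technique, not the repo's StemmingLookup: `toy_stem` is a deliberately crude stand-in for a real stemmer, and `ToyStemmingLookup` is a hypothetical class.

```python
from collections import Counter, defaultdict


def toy_stem(word):
    """Strip a few suffixes. A crude stand-in for a real stemmer."""
    for suffix in ("ous", "ons", "on", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


class ToyStemmingLookup:
    """Remember which original words produced each stem."""

    def __init__(self):
        self._forms = defaultdict(Counter)

    def stem(self, word):
        stemmed = toy_stem(word)
        # Count every original form seen for this stem.
        self._forms[stemmed][word] += 1
        return stemmed

    def original_form(self, stemmed):
        forms = self._forms.get(stemmed)
        # Return the most frequent original form, or the stem itself if unseen.
        return forms.most_common(1)[0][0] if forms else stemmed


lookup = ToyStemmingLookup()
stems = [lookup.stem(w) for w in ["religion", "religion", "religious", "religions"]]
```

Because several words share one stem, the reverse mapping is necessarily a choice; this sketch picks the most frequent form.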

Getting similarities

pv.get_similar_words(['rationalist', 'empiricist'])
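
Similar-word queries over embeddings are commonly answered by ranking words by cosine similarity. The sketch below shows that technique on made-up 2-d vectors; `most_similar` and `toy_embeddings` are hypothetical names, and this is not necessarily how `get_similar_words` is implemented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(embeddings, word, top_k=3):
    """Rank all other words by cosine similarity to `word`."""
    query = embeddings[word]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_k]]


# Made-up 2-d vectors, for illustration only.
toy_embeddings = {
    "god": [1.0, 0.1],
    "divine": [0.9, 0.2],
    "brain": [0.1, 1.0],
    "nerve": [0.2, 0.9],
}
neighbours = most_similar(toy_embeddings, "god", top_k=2)
```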

Evaluating operations

pv.evaluate_operation('moral - rational')
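
An expression such as 'hume - empiricist + rationalist' can be read as vector arithmetic followed by a nearest-neighbour search. The following sketch illustrates that idea with made-up 3-d vectors; `combine`, `evaluate`, and the `toy` dictionary are hypothetical, and the repo's `evaluate_operation` may parse and rank differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combine(expression, embeddings):
    """Turn 'a - b + c' into the vector a - b + c and the set of words used."""
    tokens = expression.split()
    dim = len(next(iter(embeddings.values())))
    result = [0.0] * dim
    sign = 1.0
    for token in tokens:
        if token == "+":
            sign = 1.0
        elif token == "-":
            sign = -1.0
        else:
            result = [r + sign * x for r, x in zip(result, embeddings[token])]
    return result, {t for t in tokens if t not in ("+", "-")}


def evaluate(expression, embeddings, top_k=1):
    """Return the words closest to the combined vector, excluding the operands."""
    target, used = combine(expression, embeddings)
    candidates = [(cosine(target, vec), word)
                  for word, vec in embeddings.items() if word not in used]
    candidates.sort(reverse=True)
    return [word for _, word in candidates[:top_k]]


# Made-up vectors: axes loosely mean (philosopher, empiricism, rationalism).
toy = {
    "hume": [1.0, 1.0, 0.0],
    "empiricist": [0.0, 1.0, 0.0],
    "rationalist": [0.0, 0.0, 1.0],
    "descartes": [1.0, 0.0, 1.0],
    "death": [-1.0, 0.0, 0.0],
}
answer = evaluate("hume - empiricist + rationalist", toy)
```

Splitting on whitespace keeps hyphenated words such as anti-naturalist intact, since only a standalone "-" is treated as an operator.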

Plotting vectorized words

pv.plot(['hume', 'empiricist', 'descart', 'rationalist'])

Training details

skip_gram: [figures: skip_gram_loss, skip_gram_embeddings, skip_gram_w, skip_gram_b]

cbow: [figures: cbow_loss, cbow_embedding, cbow_w, cbow_b]

philo2vec's People

Contributors

mmourafiq

philo2vec's Issues

Any plans for an update?

Hi,
So I was trying to update your repository to be compatible with Python 3.7. I got the crawler and vocab builder working, but I don't understand the TensorFlow implementation of word2vec. Specifically, the requirement for TensorFlow version 0.9.0 seems to cause the trouble, as this version is no longer available and current versions aren't compatible. Any chance you're planning to update the repository? Or is there some way to get it working?

Sorry if my question seems oddly phrased, I'm still learning to program.
