tkm's People

Contributors

johntailor, sdalaee

tkm's Issues

Error when running setup file

Hi,
I tried to run your code, but it raises an error that I could not resolve.
I could find very little about a workaround online.

/home/saria/anaconda3/envs/condaenv27/bin/python2.7 /home/saria/Downloads/tkm-master/setup.py
Warning: Extension name '_topicAssign' does not match fully qualified name 'tkm-master._topicAssign' of '_topicAssign.pyx'
Compiling _topicAssign.pyx because it changed.
[1/1] Cythonizing _topicAssign.pyx

Error compiling Cython file:

...
#cython: language_level=3
^

_topicAssign.pyx:1:0: 'tkm-master._topicAssign' is not a valid module name
Traceback (most recent call last):
File "/home/saria/Downloads/tkm-master/setup.py", line 22, in
ext_modules = cythonize(extensions),
File "/home/saria/anaconda3/envs/condaenv27/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 1027, in cythonize
cythonize_one(*args)
File "/home/saria/anaconda3/envs/condaenv27/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 1149, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: _topicAssign.pyx

Process finished with exit code 1

Would you please have a look at this?
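
The message `'tkm-master._topicAssign' is not a valid module name` suggests that the folder name was used as part of the dotted module name. `tkm-master` contains a hyphen, which is not allowed in a Python identifier. A minimal sketch of that check follows; `is_valid_module_name` is a hypothetical helper written for illustration, not part of Cython or TKM.

```python
import keyword
import re

# Hypothetical helper: every dot-separated part must be a plain identifier.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_module_name(name):
    parts = name.split(".")
    return all(bool(_IDENT.match(p)) and not keyword.iskeyword(p) for p in parts)

print(is_valid_module_name("tkm-master._topicAssign"))  # False
print(is_valid_module_name("tkm_master._topicAssign"))  # True
```

If this is indeed the cause, renaming the folder so that it has no hyphen (for example `tkm_master`) before running setup.py may avoid the error.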

output does not make sense

Hello John,

I ran your code on my dataset, a large one of about 500k text documents.
The result does not make sense: most words have a weight of 0, and only 4 topics were output.
Could you please explain why this happens?

Thanks


0:    cvasoci 0.00000, mcireview 0.00000, panhypopituitarismreview 0.00000, calltrig 0.00000, orderslength 0.00000, showmiss 0.00000, gaitpt 0.00000, notepsychotherapi 0.00000, ucohl 0.00000, pulmonarydismiss 0.00000, wellnl 0.00000, utiresult 0.00000, welldc 0.00000, mammogramneg 0.00000, boquet 0.00000, 
1:    cvasoci 0.00000, mcireview 0.00000, panhypopituitarismreview 0.00000, calltrig 0.00000, orderslength 0.00000, showmiss 0.00000, gaitpt 0.00000, notepsychotherapi 0.00000, ucohl 0.00000, pulmonarydismiss 0.00000, wellnl 0.00000, utiresult 0.00000, welldc 0.00000, mammogramneg 0.00000, boquet 0.00000, 
2:    prescript 0.100, mouth 0.098, tablet 0.095, skin 0.088, clear 0.053, site 0.051, joint 0.050, lesion 0.046, neg 0.044, mild 0.043, deni 0.042, sleep 0.041, procedur 0.040, bowel 0.040, think 0.039, 
3:    baalmann 0.896, damian 0.669, cuffautomat 0.648, rachelen 0.442, balkcom 0.431, schwarz 0.368, klusmann 0.296, deceas 0.253, calv 0.250, rightarm 0.235, scholz 0.229, sztajnkrycer 0.211, sadosti 0.201, bernic 0.188, negativegu 0.162, 
4:    patient 0.207, time 0.155, mg 0.151, tablet 0.143, daili 0.092, mouth 0.091, pain 0.085, medic 0.068, dai 0.067, instruct 0.064, left 0.060, right 0.058, indic 0.058, dr 0.056, need 0.055, 

I have not changed your code except for the way the dataset is read:

import glob
import re

def read_docs():  # the def line is missing from the posted snippet; this name is a placeholder
    documents = glob.glob("/infodev1/phi-data/sohn/biobank/saria/biobank_65up_CI_dx_CN_cp/*.txt")
    docs = []
    for fi in documents:
        with open(fi, 'r') as myfile:
            # Read the whole file; line breaks are removed without inserting a space.
            d = myfile.read().replace('\n', '')
            # Strip POS-style tags that are followed by a space, e.g. "The/at Fulton/np-tl " -> "The Fulton ".
            d = re.sub(r"/[A-Za-z0-9_-]+ ", " ", d)
            docs.append(d)
    return docs
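
One thing worth checking in the reading code: `replace('\n', '')` joins the last word of each line to the first word of the next line, which could explain fused tokens such as `mammogramneg` in the output. A minimal sketch of the difference, using a made-up two-line string rather than anything from the dataset:

```python
text = "mammogram\nneg for malignancy"  # made-up sample, not from the dataset

fused = text.replace("\n", "")    # drops the line break entirely
spaced = text.replace("\n", " ")  # keeps the word boundary

print(fused.split())   # ['mammogramneg', 'for', 'malignancy']
print(spaced.split())  # ['mammogram', 'neg', 'for', 'malignancy']
```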

function for calculating the score

Thanks again for all your answers.

I would like to use some of your ideas in my research.
I don't need the topic assignment part, only the way you calculate the score for each word.
Could you help me find the right part of your code? Do I only need computeKeywordScore, or are there other dependencies I should consider?

Thank you so much for your time :)

evaluation part

Hi John,

Could you please share the code for your evaluation part?
Have you tried to visualize your data?
Also, which implementation of LDA did you compare your results with? If you have a link, I would appreciate it if you could share it.

Thanks for your time :)

visualization

Hi,

Is there any way to save the model so that the result can be visualized, similar to what PyLDA does?

Thanks.

word distributions with 0 weight

Hi again,

I applied this model to news data scraped from the web. It printed about 18 to 20 topics for each subject. However, for some subjects, such as "drugs", it printed 18 topics and one of the topic clusters has a weight of 0 for every word.
Do you have any idea why this is happening?
Here is an example of such a topic cluster, from the "Obesity" subject:
1: faerch 0.00000, perrier 0.00000, mottola 0.00000, augustin 0.00000, kadono 0.00000, hamstr 0.00000, flexor 0.00000, paperboard 0.00000, tanz 0.00000, lenihan 0.00000, aa 0.00000, dysphoria 0.00000, christel 0.00000, yukiko 0.00000, sveikata 0.00000,

Thanks!

2.1 Modeling keywords

I have a question about the paper rather than the code, and I hope you can answer it.
It concerns Section 2.1 of the paper, column 2, formula (7).

Ignoring B, the main difference between the two formulas is that the first one has a log, while the second one, which you argue gives broader topics, does not.

Could you explain how you arrived at this?
That is, how can you conclude that the first formula gives more specific topics and the second gives broader ones when the only difference is the log?

Thank you so much, and sorry if my question is not a coding issue and perhaps a little naive.

Python 2.x or 3.x?

I have inherited some code that uses TKM and I'm in the process of running it. It compiled OK, but it now seems to be failing because it was designed for Python 2.7 and I'm running 3.5.

Should I roll back and try to run it with Python 2.7, or does the current TKM code here work with Python 3.5? I can't find anything in the documentation about which Python version to use.

Thanks

Reverse of entropy

Hello,

I find your project very interesting, but I am confused about the reverse-of-entropy idea.
My question is about the implementation:
Suppose we have a 10 x 2 matrix representing 10 words and 2 topics.
From your code, I understand that you apply the entropy to each row, so in this case we end up with 10 entropy values. How do you then decide how distinctive the topic clusters are?
To put it another way, how can you conclude which words are distinctive for each of the two available topics?

Thank you
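
To make the setup in this question concrete, here is a minimal sketch of row-wise entropy on a small word-by-topic matrix. It is not taken from the TKM code; the matrix values and the helper name `row_entropy` are made up for illustration. Under this reading, a word whose weight sits entirely in one topic gets entropy 0, and a word spread evenly over two topics gets the maximum value, log 2.

```python
import math

def row_entropy(row):
    """Shannon entropy (natural log) of one row after normalizing it to sum to 1."""
    total = sum(row)
    probs = [x / total for x in row]
    # Terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Three words, two topics (illustrative numbers only).
word_topic = [
    [1.0, 0.0],  # all weight in topic 0
    [0.9, 0.1],  # mostly topic 0
    [0.5, 0.5],  # evenly split
]
entropies = [row_entropy(r) for r in word_topic]
print(entropies)
```

Each row yields one number, so ranking words by low entropy is one possible way to read off which words are concentrated in a single topic; the topic itself is then the column with the largest weight in that row.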

How did you track the context

Hi :),

I have a question about the way you create the corpus. As stated in the paper, the algorithm considers the context when assigning a topic to a word, that is, it looks at a window before and after the target word.

I may have misunderstood, but how did you track the context of the words?
I looked at the methods for creating the corpus in AlgTools, and it seems the corpus is created without regard to the position of the words in the documents.
So how were you able to see which words are around the target word?
Sorry for the many questions; I need to understand your approach to this part.

Many thanks :)
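
For reference, looking at a window around a target word requires keeping the token order, which a plain bag-of-words count discards. A minimal sketch of the general idea follows. It is not the TKM implementation; `context_window`, the window size, and the sample sentence are assumptions for illustration.

```python
def context_window(tokens, index, size=2):
    """Return up to `size` tokens before and after tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left, right

tokens = "the patient takes one tablet by mouth daily".split()
window = context_window(tokens, tokens.index("tablet"))
print(window)  # (['takes', 'one'], ['by', 'mouth'])
```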

PMI evaluation

I need to evaluate your approach on my dataset.
To obtain the PMI you reported in the paper, do I need to calculate the median of all the PMI values within each topic and then take the mean of those per-topic values?
Could you please let me know whether this is what you did in the paper?
I got good results and only need to evaluate the model, so I would appreciate it if you could clarify this.

Thanks :)
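
The reading proposed in this question (median PMI within a topic, then the mean over topics) can be sketched as follows. Whether this matches the paper is exactly what is being asked, so treat it as an assumption. The function names, the document-level co-occurrence counting, and the toy data are also made up for illustration.

```python
import math
from itertools import combinations
from statistics import median

def pmi(w1, w2, docs):
    """PMI from document-level co-occurrence; None if any count is zero."""
    n = len(docs)
    c1 = sum(w1 in d for d in docs)
    c2 = sum(w2 in d for d in docs)
    c12 = sum(w1 in d and w2 in d for d in docs)
    if min(c1, c2, c12) == 0:
        return None
    return math.log((c12 / n) / ((c1 / n) * (c2 / n)))

def topic_score(top_words, docs):
    """Median PMI over all pairs of a topic's top words (undefined pairs skipped)."""
    vals = [pmi(a, b, docs) for a, b in combinations(top_words, 2)]
    vals = [v for v in vals if v is not None]
    return median(vals) if vals else None

def model_score(topics, docs):
    """Mean of the per-topic median PMI values."""
    scores = [s for s in (topic_score(t, docs) for t in topics) if s is not None]
    return sum(scores) / len(scores) if scores else None

# Toy corpus: each document is a set of words (illustrative only).
docs = [{"tablet", "mouth", "daily"}, {"tablet", "mouth"},
        {"pain", "left"}, {"pain", "daily"}]
topics = [["tablet", "mouth"], ["pain", "left"]]
print(model_score(topics, docs))
```

Pairs that never co-occur have an undefined PMI; this sketch simply skips them, which is one of several possible conventions.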
