

angelosalatino commented on May 30, 2024

Hi,
we wrote an article explaining how you can use the CSO Classifier in other domains: https://infernusweb.altervista.org/wp/how-to-use-the-cso-classifier-in-other-domains/

Please do let us know if you need further information.


angelosalatino commented on May 30, 2024

Hi, these are very good questions. I will soon write an article (a tutorial/guide) on my blog about how to apply the classifier to other domains of science. Stay tuned!


innerop commented on May 30, 2024

@angelosalatino

I looked at the code you shared in the article for generating the file.

I'd like to point out a discrepancy I see between that code and the description given in the article.

The description says:

"To generate this dictionary/file, we collected all the different words available within the vocabulary of the model. Then iterating on each word, we retrieved its top 10 similar words from the model, and we computed their Levenshtein similarity against all CSO topics. If the similarity was above 0.7, we created a record which stored all CSO topics triggered by the initial word."
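The procedure as the article describes it can be sketched in Python. This is a minimal illustration, not the project's script: the helper names are hypothetical, the list of similar words is passed in directly instead of being queried from the word-embedding model, and Levenshtein similarity is assumed to be normalized as 1 - distance / length of the longer string (the library the project actually uses may normalize differently).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower as edits grow."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def topics_triggered(word, similar_words, topics, threshold=0.7):
    """Article's version: compare every similar word against every CSO topic."""
    triggered = set()
    for candidate in similar_words:
        for topic in topics:
            if levenshtein_similarity(candidate, topic) > threshold:
                triggered.add(topic)
    # Only words that trigger at least one topic get a record.
    return {word: sorted(triggered)} if triggered else {}


print(topics_triggered("deep_learning",
                       ["neural networks", "ontology"],
                       ["neural network", "semantic web"]))
```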

But I believe the code does this instead:

"To generate this dictionary/file, we collected all the different words available within the vocabulary of the model. Then, iterating on each word, we retrieved its top 10 similar words from the model and put them in a list, which we iterated over. For each word in the list whose cosine similarity was equal to or greater than 0.7, we computed its Levenshtein similarity against all CSO topics, and wherever that was equal to or above 0.94, we added the topic to a record (creating the record if it didn't exist) that stored all CSO topics triggered by the initial word from our model."
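The two-threshold procedure described in this comment can be sketched as follows. It is a hypothetical illustration under stated assumptions: `build_token_to_topics` and the toy `neighbours` dictionary are made up; with gensim, the neighbour list for a word would typically come from something like `model.wv.most_similar(word, topn=10)`, which returns (word, cosine similarity) pairs; replacing `_` with spaces assumes the model joins n-grams with underscores; and Levenshtein similarity is again assumed to be 1 - distance / length of the longer string.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for the embedding model: word -> [(neighbour, cosine), ...]
Neighbours = Dict[str, List[Tuple[str, float]]]


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    # Assumed normalization of the edit distance to [0, 1].
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)


def build_token_to_topics(neighbours: Neighbours, topics: Iterable[str],
                          topn: int = 10, min_cosine: float = 0.7,
                          min_lev: float = 0.94) -> Dict[str, List[str]]:
    topics = list(topics)
    records: Dict[str, set] = defaultdict(set)
    for word, similar in neighbours.items():
        for neighbour, cosine in similar[:topn]:
            if cosine < min_cosine:
                continue  # first filter: embedding (cosine) similarity
            candidate = neighbour.replace("_", " ")  # assumption: n-grams joined by "_"
            for topic in topics:
                if similarity(candidate, topic) >= min_lev:
                    records[word].add(topic)  # creates the record on first hit
    return {word: sorted(found) for word, found in records.items()}


neighbours = {
    "deep_learning": [("neural_networks", 0.81), ("neural_network", 0.78), ("ontology", 0.40)],
    "rdf": [("semantic_web", 0.65)],
}
topics = ["neural network", "semantic web", "ontologies"]
print(build_token_to_topics(neighbours, topics))
```

With these toy inputs, "neural networks" scores about 0.93 against the topic "neural network", so the 0.94 threshold filters it out and only the exact match is recorded; `ontology` and `semantic_web` never reach the Levenshtein step because their cosine similarity is below 0.7.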


innerop commented on May 30, 2024

@angelosalatino

That would help greatly in adopting and adapting this work.

For now, however, could you please provide the script that generates the token-to-cso-combined file?

The README is clear on what is involved, but looking at the CSO, I have no clue what constitutes a "topic". The "words" (1-, 2- and 3-gram entities) show up in so many places. I don't even know how to query the CSO properly. Do I use SPARQL? Is this RDF? RDFS? I'm completely new to the format.

Referring to this passage in README.md:

To generate this file, we collected all the set of words available within the vocabulary of the model. Then iterating on each word, we retrieved its top 10 similar words from the model, and we computed their Levenshtein similarity against all CSO topics. If the similarity was above 0.7, we created a record which stored all CSO topics triggered by the initial word.
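On the question above of what a "topic" is and how to read the ontology: CSO is an RDF ontology, so one option is to load it with an RDF library such as rdflib and query it with SPARQL. A dependency-free alternative is to treat it as a flat list of subject, predicate, object triples. The sketch below takes that route under loudly simplified assumptions: the inline sample, the short predicate names `label` and `superTopicOf` (standing in for `rdfs:label` and CSO's super-topic relation), and the rule that a topic is any resource carrying a label are all illustrative, not the exact format of the official dump.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, simplified triple-per-row export: subject, predicate, object.
SAMPLE = """\
semantic_web,label,semantic web
ontology,label,ontology
ontology_matching,label,ontology matching
semantic_web,superTopicOf,ontology
ontology,superTopicOf,ontology_matching
"""


def load_cso(lines):
    labels = {}                  # topic id -> human-readable label
    narrower = defaultdict(set)  # topic id -> its direct sub-topics
    for subject, predicate, obj in csv.reader(lines):
        if predicate == "label":
            labels[subject] = obj  # assumption: a topic is any resource with a label
        elif predicate == "superTopicOf":
            narrower[subject].add(obj)
    return labels, narrower


def descendants(topic, narrower):
    """All topics reachable through superTopicOf (iterative DFS, safe on cycles)."""
    seen, stack = set(), [topic]
    while stack:
        for child in narrower.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


labels, narrower = load_cso(io.StringIO(SAMPLE))
print(sorted(labels.values()))
print(sorted(descendants("semantic_web", narrower)))
```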


innerop commented on May 30, 2024

Thank you! I'll keep you in the loop on how I'm using it, and on any improvements I can think of or further questions I have.

I managed to find an older version from before you added the cache, and I could see how you match against the ontology using the embeddings, which was very educational. One note, however: the older version only works on Python 3.6, not 3.7 or later. It throws a StopIteration exception from NLTK's util module. That's an issue with Python and NLTK, not your codebase.
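A common cause of code breaking between Python 3.6 and 3.7 is PEP 479: from 3.7 on, a `StopIteration` that escapes from inside a generator body is converted into `RuntimeError: generator raised StopIteration`. Whether this is exactly what happens in the NLTK version mentioned here is an assumption; the sketch only reproduces the general pattern with a hypothetical `first_two` generator and shows the usual fix.

```python
def first_two(iterable):
    """Unguarded next() calls: the pattern PEP 479 turned into an error."""
    it = iter(iterable)
    yield next(it)
    yield next(it)  # if the input is too short, StopIteration escapes the generator body


def first_two_fixed(iterable):
    """Same behaviour, but the generator ends cleanly when the input runs out."""
    it = iter(iterable)
    for _ in range(2):
        try:
            yield next(it)
        except StopIteration:
            return


try:
    list(first_two([1]))
except RuntimeError as exc:  # Python 3.7+; on 3.6 this returned [1] with only a deprecation warning
    print(exc, "| caused by:", type(exc.__cause__).__name__)

print(list(first_two_fixed([1])))
```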

Thank you 🙏


angelosalatino commented on May 30, 2024

Hi, yes. Your explanation is very detailed. We left some details out for the sake of the narrative and referred the reader to the code for the specifics. But definitely: your description matches the actual process 100%.

Thanks
