Comments (6)
Hi,
we wrote an article explaining how you can adapt the CSO Classifier to other domains: https://infernusweb.altervista.org/wp/how-to-use-the-cso-classifier-in-other-domains/
Please do let us know if you need further information.
from cso-classifier.
Hi, these are very good questions. I will soon write an article/tutorial on my blog about how to adapt the classifier to other domains of science. Stay tuned!
I looked at the code for generating the file which you shared in the article.
I'd like to point out a discrepancy I see between the code and the description given in the article.
The description says:
"To generate this dictionary/file, we collected all the different words available within the vocabulary of the model. Then iterating on each word, we retrieved its top 10 similar words from the model, and we computed their Levenshtein similarity against all CSO topics. If the similarity was above 0.7, we created a record which stored all CSO topics triggered by the initial word."
But I believe the code does this instead:
"To generate this dictionary/file, we collected all the different words available within the vocabulary of the model. Then, iterating on each word, we retrieved its top 10 similar words from the model and put them in a list, which we iterated over. If the cosine similarity of a word in the list was equal to or greater than 0.7, we computed its Levenshtein similarity against all CSO topics, and wherever that was equal to or above 0.94, we added the topic to a record (creating the record if it did not exist) that stored all CSO topics triggered by the initial word from our model."
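The procedure in the quote above (which the maintainers confirm further down the thread) can be sketched in Python as follows. This is a minimal sketch, not the repository's actual script. The `most_similar` callable is a hypothetical stand-in for a word2vec nearest-neighbour lookup (with gensim this would typically wrap `model.wv.most_similar(word, topn=10)`). `levenshtein_similarity` is normalised here as 1 minus distance divided by the longer length, which may differ from the normalisation the classifier really uses. Replacing underscores with spaces before comparing is likewise an assumption about how model tokens map to CSO topic labels.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical signature: returns up to `topn` (neighbour, cosine_similarity) pairs.
MostSimilar = Callable[[str, int], List[Tuple[str, float]]]


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / max(len(a), len(b))


def build_token_to_topics(
    vocabulary: Iterable[str],
    most_similar: MostSimilar,
    cso_topics: List[str],
    topn: int = 10,
    cosine_threshold: float = 0.7,
    levenshtein_threshold: float = 0.94,
) -> Dict[str, List[str]]:
    """Map each vocabulary word to the CSO topics matched by its close neighbours."""
    records: Dict[str, List[str]] = {}
    for word in vocabulary:
        for neighbour, cosine in most_similar(word, topn):
            if cosine < cosine_threshold:
                continue  # neighbour not semantically close enough
            # Assumption: model tokens join n-grams with "_", CSO labels use spaces.
            candidate = neighbour.replace("_", " ")
            for topic in cso_topics:
                if levenshtein_similarity(candidate, topic) >= levenshtein_threshold:
                    topics = records.setdefault(word, [])  # create record if missing
                    if topic not in topics:
                        topics.append(topic)
    return records


# Toy usage with made-up neighbours instead of a real model.
neighbours = {
    "neural_network": [("neural_networks", 0.85), ("deep_learning", 0.75), ("banana", 0.2)]
}
result = build_token_to_topics(
    ["neural_network"],
    lambda w, n: neighbours.get(w, [])[:n],
    ["neural networks", "deep learning", "semantic web"],
)
print(result)  # {'neural_network': ['neural networks', 'deep learning']}
```

Separating the two thresholds makes the design visible: the cosine check keeps only semantically close neighbours, while the strict 0.94 string check only lets a neighbour through when it is essentially a spelling of an existing CSO label.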
That would help greatly in adopting and adapting this work.
For now, however, could you please provide the script that generates the token-to-cso-combined file?
The README is clear about what is involved, but looking at the CSO I have no idea what constitutes a "topic". The "words" (1-, 2-, and 3-gram entities) show up in so many places. How do I even query the CSO properly? Do I use SPARQL? Is this RDF? RDFS? I'm completely new to the format.
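On the question of what counts as a "topic" and how to query the CSO: the ontology is published as RDF, so SPARQL or an RDF library would work, but a plain pass over its triples is often enough. The sketch below rests on assumptions that are not verified here and should be checked against the file you download: that the CSV dump stores one triple per row as `"<subject>","<predicate>","<object>"`, that topic URIs live under `https://cso.kmi.open.ac.uk/topics/`, and that the hierarchy uses a `cso#superTopicOf` predicate. Under those assumptions a topic is any URI in the topics namespace, and its label is the URI suffix with underscores turned into spaces. `load_cso` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, Tuple
from urllib.parse import unquote

# Assumed namespaces; verify against the CSO release you download.
TOPIC_PREFIX = "https://cso.kmi.open.ac.uk/topics/"
SUPER_TOPIC_OF = "http://cso.kmi.open.ac.uk/schema/cso#superTopicOf"


def strip_brackets(term: str) -> str:
    """Turn '<https://...>' into 'https://...'."""
    return term.strip().lstrip("<").rstrip(">")


def topic_label(uri: str) -> str:
    """'https://.../topics/machine_learning' -> 'machine learning' (percent-decoded)."""
    return unquote(uri[len(TOPIC_PREFIX):]).replace("_", " ")


def load_cso(csv_text: str) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (all topic labels, map of topic -> direct sub-topics)."""
    topics: Set[str] = set()
    children: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip malformed rows
        subj, pred, obj = (strip_brackets(t) for t in row)
        for term in (subj, obj):
            if term.startswith(TOPIC_PREFIX):
                topics.add(topic_label(term))
        if (pred == SUPER_TOPIC_OF and subj.startswith(TOPIC_PREFIX)
                and obj.startswith(TOPIC_PREFIX)):
            children[topic_label(subj)].add(topic_label(obj))
    return topics, dict(children)


# Hypothetical two-row sample in the assumed format.
SAMPLE = (
    '"<https://cso.kmi.open.ac.uk/topics/computer_science>",'
    '"<http://cso.kmi.open.ac.uk/schema/cso#superTopicOf>",'
    '"<https://cso.kmi.open.ac.uk/topics/artificial_intelligence>"\n'
    '"<https://cso.kmi.open.ac.uk/topics/artificial_intelligence>",'
    '"<http://cso.kmi.open.ac.uk/schema/cso#superTopicOf>",'
    '"<https://cso.kmi.open.ac.uk/topics/machine_learning>"\n'
)

topics, children = load_cso(SAMPLE)
print(sorted(topics))  # ['artificial intelligence', 'computer science', 'machine learning']
```

Matching then happens against the label strings, which is why the same 1-, 2-, and 3-gram words appear in many places: every label that shares a word with another shows up wherever that word occurs.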
Referring to this passage in README.MD:
To generate this file, we collected all the set of words available within the vocabulary of the model. Then iterating on each word, we retrieved its top 10 similar words from the model, and we computed their Levenshtein similarity against all CSO topics. If the similarity was above 0.7, we created a record which stored all CSO topics triggered by the initial word.
Thank you, and I’ll keep you in the loop on how I’m using it, on any improvements I can think of, and with any further questions.
I managed to find an older version from before you added the cache, and I could see how you match against the ontology using the embeddings, which was very educational. One note, however: the older version only works on Python 3.6, not on 3.7 or later, where it throws a StopIteration exception from NLTK’s util module. That’s an issue with Python and NLTK, not with your codebase.
Thank you 🙏.
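The Python 3.6 versus 3.7 difference mentioned above is consistent with PEP 479, which became the default behaviour in Python 3.7: a `StopIteration` raised inside a generator body is converted into `RuntimeError("generator raised StopIteration")` instead of quietly ending the generator. The simplified n-gram generator below illustrates the pattern. It is not NLTK's actual code, and `ngrams_legacy` and `ngrams_fixed` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple


def ngrams_legacy(sequence: Iterable[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Pre-PEP 479 style: relies on next() ending the generator early."""
    it = iter(sequence)
    history: List[str] = []
    while n > 1:
        # On a too-short input this raises StopIteration inside the generator:
        # Python 3.6 just ends the generator, Python 3.7+ raises RuntimeError.
        history.append(next(it))
        n -= 1
    for item in it:
        history.append(item)
        yield tuple(history)
        del history[0]


def ngrams_fixed(sequence: Iterable[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Same logic, but ends the generator explicitly with return."""
    it = iter(sequence)
    history: List[str] = []
    while n > 1:
        try:
            history.append(next(it))
        except StopIteration:
            return  # input shorter than n - 1 items: no n-grams
        n -= 1
    for item in it:
        history.append(item)
        yield tuple(history)
        del history[0]


print(list(ngrams_fixed("abc", 2)))  # [('a', 'b'), ('b', 'c')]
print(list(ngrams_fixed("a", 3)))    # []
try:
    list(ngrams_legacy("a", 3))
except RuntimeError as exc:
    print(exc)  # generator raised StopIteration (on Python 3.7+)
```

Both versions behave identically on long enough inputs; they only diverge on inputs shorter than the n-gram size, which is why such failures tend to surface only on particular texts.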
Hi, yes. Your explanation is very detailed. We left some details out for the sake of the narrative and referred the reader to the code for the rest. But definitely, your description matches the actual process 100%.
Thanks
Related Issues (11)
- Issue installing requirements
- change namespace of the classifier
- docs[windows]: Problems installing on Windows
- running issue with collections package
- How do I use the klink-2 algorithm?
- installation error
- Bump `igraph` version
- Error in installing dependencies
- Error while generating package metadata/metadata-generation-failed
- Getting requirements to build wheel ... error >>> Error compiling Cython file: spacy/vocab.pxd:28:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.