Comments (1)
It depends on several things, including the tokenization scheme and the training data, but in general these models are quite capable of creating word embeddings despite not having contextual information at inference time. As you might notice, this already produces quite good results, especially when combined with MMR (which, to a certain extent, does take the relationships between words into account).
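To make the MMR point concrete, here is a minimal pure-Python sketch of Maximal Marginal Relevance over toy 2-D vectors. It is an illustration under stated assumptions, not KeyBERT's actual code: the `mmr` and `cosine` helpers, the candidate phrases, and the vectors are all made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mmr(doc_vec, cand_vecs, candidates, top_n=2, diversity=0.5):
    """Pick top_n candidates, trading relevance to the document
    against redundancy with candidates that were already picked."""
    doc_sim = [cosine(doc_vec, v) for v in cand_vecs]

    # Start with the candidate that is most similar to the document.
    selected = [max(range(len(candidates)), key=lambda i: doc_sim[i])]
    remaining = [i for i in range(len(candidates)) if i != selected[0]]

    while remaining and len(selected) < top_n:
        def score(i):
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(cosine(cand_vecs[i], cand_vecs[j]) for j in selected)
            return (1 - diversity) * doc_sim[i] - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return [candidates[i] for i in selected]


# Hypothetical toy data: two near-duplicate phrases and one distinct phrase.
doc_vec = [1.0, 0.2]
candidates = ["supervised learning", "supervised learning algorithm", "training data"]
cand_vecs = [[1.0, 0.25], [1.0, 0.3], [0.4, 1.0]]

diverse = mmr(doc_vec, cand_vecs, candidates, top_n=2, diversity=0.7)
relevant_only = mmr(doc_vec, cand_vecs, candidates, top_n=2, diversity=0.0)
```

With a high `diversity`, the second pick skips the near-duplicate phrase in favor of the distinct one; with `diversity=0.0`, the selection falls back to plain similarity to the document. This is the sense in which MMR accounts for the relationship between candidates.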
The BaseEmbedder indeed started out with an additional option to pass a word embedding model, but since both models need to be in the same dimensional space for their outputs to be comparable, this turned out to be something that could not easily be implemented. You can't really (or easily) compare the output embeddings of two different embedding models using distance functions. What has been on the list for a while is extracting the token embeddings from sentence-transformers before aggregation, but that again depends on the underlying model.
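As a toy illustration of why distances across two embedding models are not meaningful, the sketch below simulates a hypothetical "model B" by rotating the vectors of a hypothetical "model A" by 90 degrees. Real models differ by far more than a rotation (and often in dimensionality as well), so treat this as a simplified assumption rather than a description of any actual model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rotate(v, degrees):
    """Rotate a 2-D vector counter-clockwise by the given angle."""
    t = math.radians(degrees)
    x, y = v
    return [x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t)]


# Hypothetical "model A" embeddings (made-up 2-D vectors).
model_a = {"cat": [1.0, 0.1], "kitten": [0.9, 0.2], "car": [0.1, 1.0]}

# Hypothetical "model B": the same geometry, expressed in a rotated space.
model_b = {word: rotate(vec, 90) for word, vec in model_a.items()}

# Within each space, the similarity structure is identical.
within_a = cosine(model_a["cat"], model_a["kitten"])
within_b = cosine(model_b["cat"], model_b["kitten"])

# Across spaces, even the same word no longer looks similar to itself.
cross = cosine(model_a["cat"], model_b["cat"])
```

Both spaces agree that "cat" and "kitten" are close, yet the cross-space similarity of "cat" with itself is roughly zero. Comparing a document embedding from one model with word embeddings from another runs into the same problem unless the two spaces are explicitly aligned.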
Any suggestions for implementations are appreciated!
from keybert.