
How to get the keywords' embedding? (keybert issue, closed)

GengYuIsland commented on July 23, 2024

How do I get the keywords' embeddings? The shape of word_embeddings is different from that of the keywords.


Comments (5)

MaartenGr commented on July 23, 2024

> The shape of word_embeddings is different from that of the keywords.

That's correct and intended behavior! They differ because .extract_embeddings extracts the embeddings of all words in the documents. These are then fed to .extract_keywords, which selects the subset of those words that will serve as keywords.
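To see why the sizes differ, here is a simplified sketch of the selection step: every candidate word has a row in word_embeddings, but only the top_n words most similar to the document are returned. It ranks by plain cosine similarity only, whereas KeyBERT itself also offers diversification options such as MMR, so treat select_keywords and the toy vectors as hypothetical illustrations rather than the library's code.

```python
import numpy as np


def select_keywords(doc_embedding, word_embeddings, words, top_n=5):
    """Return the top_n (word, score) pairs ranked by cosine similarity.

    word_embeddings holds one row per candidate word (n_words x dim), while
    the returned list holds only top_n entries, so the two sizes differ.
    """
    # Normalize so that a dot product equals cosine similarity.
    doc = doc_embedding / np.linalg.norm(doc_embedding)
    cands = word_embeddings / np.linalg.norm(word_embeddings, axis=1, keepdims=True)
    scores = cands @ doc  # one similarity score per candidate word
    best = np.argsort(scores)[::-1][:top_n]  # indices of the highest scores
    return [(words[i], float(scores[i])) for i in best]


words = ["apple", "banana", "car"]
word_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
doc_embedding = np.array([1.0, 0.0])
keywords = select_keywords(doc_embedding, word_embeddings, words, top_n=2)
print(word_embeddings.shape, len(keywords))  # (3, 2) 2
print([kw for kw, _ in keywords])  # ['apple', 'banana']
```

Three candidate rows go in, two keywords come out, which mirrors the mismatch described in the question.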

As such, if you want the embeddings of the keywords, you would have to generate them yourself.
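A minimal sketch of generating the keyword embeddings yourself, assuming keywords is the list of (keyword, score) pairs that .extract_keywords returns for a single document. embed_keywords and toy_encode are hypothetical names; toy_encode is only a stand-in so the example runs without downloading a model.

```python
import numpy as np


def embed_keywords(keywords, encode):
    """Map each keyword string to its own embedding vector.

    keywords: (keyword, score) pairs, as returned for a single document.
    encode:   any callable that turns a list of strings into an array of
              shape (len(strings), dim).
    """
    phrases = [kw for kw, _ in keywords]
    vectors = np.asarray(encode(phrases))
    return dict(zip(phrases, vectors))


# Stand-in encoder for illustration only; swap in a real embedding model.
def toy_encode(texts):
    return [[float(len(t)), 1.0] for t in texts]


keyword_embeddings = embed_keywords([("apple", 0.9), ("car", 0.5)], toy_encode)
print(keyword_embeddings["apple"])  # [5. 1.]
```

In practice, encode could be the encode method of a sentence-transformers model, for example SentenceTransformer("all-MiniLM-L6-v2").encode. Using the same model that KeyBERT was created with keeps the keyword vectors comparable with the document and word embeddings.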


GengYuIsland commented on July 23, 2024

How do I match the keywords to the vectors in word_embeddings? word_embeddings doesn't contain a vector for every word; I'm guessing that stop words are removed. As a result, I can't locate the corresponding vectors in word_embeddings based on the order of the keywords in the sentence. That is the core of my problem.
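If you would rather reuse word_embeddings than re-embed, one option is to match by word string instead of by position in the sentence, so removed stop words no longer shift anything. This sketch assumes (not confirmed in this thread) that row i of word_embeddings is the embedding of vocabulary[i], where vocabulary is the list a scikit-learn CountVectorizer fitted with the same keyphrase_ngram_range and stop_words settings returns from get_feature_names_out(); verify that alignment for your KeyBERT version. lookup_keyword_rows is a hypothetical helper.

```python
import numpy as np


def lookup_keyword_rows(keywords, vocabulary, word_embeddings):
    """Pick the rows of word_embeddings that belong to the given keywords.

    Assumes row i of word_embeddings is the embedding of vocabulary[i]
    (hypothetical alignment; verify it for your setup). Matching by string,
    not by position in the sentence, sidesteps removed stop words.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    found = {}
    for kw, _score in keywords:
        if kw in index:  # skip keywords missing from the vocabulary
            found[kw] = word_embeddings[index[kw]]
    return found


vocabulary = ["apple", "banana", "car"]  # toy candidate words, stop words already removed
word_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
rows = lookup_keyword_rows([("car", 0.7), ("apple", 0.6)], vocabulary, word_embeddings)
print(rows["car"])  # [0. 1.]
```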


GengYuIsland commented on July 23, 2024

I ended up using Sentence-BERT to embed the keywords because I saw it used at the bottom of your code. Does this approach make sense? After all, as far as I know, Sentence-BERT embeds sentences, not words.


MaartenGr commented on July 23, 2024

> I ended up using Sentence-BERT to embed the keywords because I saw it used at the bottom of your code. Does this approach make sense? After all, as far as I know, Sentence-BERT embeds sentences, not words.

It does. First, sentence-transformers is not a single model but a framework that can use many different models. Although these models are designed to generate embeddings for sentences and paragraphs, that does not mean they cannot or should not embed words. Internally, they typically generate contextual word/token embeddings and then pool them, often with a simple procedure such as averaging the token embeddings. As such, they can definitely generate word embeddings, and they do so quite well.
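The averaging step mentioned above can be sketched in NumPy as generic mean pooling over a padded batch with an attention mask. This is an illustration under those assumptions, not the code of any particular model; the pooling a given sentence-transformers model uses depends on its configuration. It shows that a single word, even when split into several sub-word tokens, still ends up as one fixed-size vector.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average contextual token embeddings into one vector per input text.

    token_embeddings: array of shape (batch, seq_len, dim)
    attention_mask:   array of shape (batch, seq_len); 1 = real token, 0 = padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)  # padding positions contribute 0
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


# One "word" split into two sub-word tokens, plus one padding position.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 3.]]
```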


GengYuIsland commented on July 23, 2024

Thank you, MaartenGr, you solved my problem!

