Comments (3)
Thanks, let me try that out and see what effect it has overall; I would then update the sample output, too. I can do this sometime soon.
Update: I am more pleased with the results. I am getting better summaries this way, since the singular and plural forms of a word are now "equal" to the algorithm and carry more weight together, instead of each carrying a separate, weaker weight. I will test some more and then propose a few updates to the sample page.
Hi @0dB
Thanks for bringing this to our attention.
The way occurrences of "sentences" are grouped is working as per the scrubber code. Since the scrubber function returns span.text in the example code, occurrences of "sentences" are grouped as one, while occurrences of "sentence" are grouped together separately.
We can get the desired behaviour by changing the example code from
return span.text
to
return span.lemma_
This will group all occurrences of "sentence" and "sentences" together.
Please feel free to make this change in the example notebook in your existing PR #233.
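To make the difference concrete, here is a toy sketch of how the choice of scrubber key changes the grouping. It is not pytextrank's internal code: `FakeSpan` and `group_counts` are hypothetical stand-ins invented for illustration, and the lemma values are assigned by hand rather than produced by a spaCy pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a spaCy Span; only the two attributes used here."""
    text: str
    lemma_: str


def scrub_text(span):
    # Behaviour of the current example code: key on the surface form.
    return span.text


def scrub_lemma(span):
    # Proposed change: key on the lemma, so inflected forms collapse together.
    return span.lemma_


def group_counts(spans, scrubber):
    """Count how many spans fall under each scrubbed key (illustrative helper)."""
    counts = defaultdict(int)
    for span in spans:
        counts[scrubber(span)] += 1
    return dict(counts)


# Hand-built sample: one singular and two plural occurrences.
spans = [
    FakeSpan("sentence", "sentence"),
    FakeSpan("sentences", "sentence"),
    FakeSpan("sentences", "sentence"),
]

by_text = group_counts(spans, scrub_text)
by_lemma = group_counts(spans, scrub_lemma)
print(by_text)   # singular and plural are counted under separate keys
print(by_lemma)  # all three occurrences share one key
```

With span.text the singular and plural forms end up under two keys; with span.lemma_ they share one, which is why their weights combine.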
Many thanks @0dB and @Ankush-Chander!
It would help to have examples/sample.ipynb updated to illustrate the behaviors discussed here.
@0dB, the changes in your PR #233 look good.
We're having issues with our CI pipeline (see #235), and as soon as I get that cleared (hopefully tonight) I'll accept/merge the PR.
I also noticed the typo "toekn" in that same notebook :) FWIW, these notebooks get rendered as Markdown to build portions of our docs, so the docs will be updated by the same fix.