Comments (3)
Hello @quantarb, we've had such issues before. In this case, I first use a regular tokenizer and then additionally split all tokens at the offset positions to get the final tokenization. There is no helper function in Flair for this, so you would need to write your own tokenization code.
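The two-step approach described above could be sketched roughly as follows. This is a minimal, library-independent illustration rather than Flair code: `tokenize_with_offsets` and `split_on_offsets` are hypothetical helper names, a plain whitespace split stands in for the regular tokenizer, and the example text and entity offsets are made up.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (token text, start offset, end offset)


def tokenize_with_offsets(text: str) -> List[Span]:
    """Stand-in for a regular tokenizer: whitespace split that keeps character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def split_on_offsets(tokens: List[Span], boundaries: Set[int]) -> List[Span]:
    """Split every token at each boundary offset that falls strictly inside it."""
    result = []
    for tok_text, start, end in tokens:
        cuts = sorted(b for b in boundaries if start < b < end)
        pieces = [start] + cuts + [end]
        for left, right in zip(pieces, pieces[1:]):
            result.append((tok_text[left - start:right - start], left, right))
    return result


text = "Pay $100USD now"
# Hypothetical annotation: the entity "100" covers characters 5 to 8.
entity_start, entity_end = 5, 8
tokens = split_on_offsets(tokenize_with_offsets(text), {entity_start, entity_end})
print([t[0] for t in tokens])
```

The resulting token strings can then be handed to `Sentence` as an already tokenized list, so that the annotated span lines up with whole tokens.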
from flair.
Hi @alanakbik, thank you for your quick response. I tried to split up the tokens based on the offset positions, but I'm having problems reconstructing my original Flair sentence from the tokens. What is the best way to reconstruct a Flair sentence from tokens?
I tried several different approaches but my new_sentence never matches the original sentence.
from flair.data import Sentence, Token

# Attempt 1: rebuild each token from its text
text = """ BLAH BLAH BLAH BLAH"""
old_sentence = Sentence(text)
tokens = [Token(token.text) for token in old_sentence]
new_sentence = Sentence(tokens)
# Attempt 2: rebuild each token from its character offsets in the original text
text = """ BLAH BLAH BLAH BLAH"""
old_sentence = Sentence(text)
tokens = [Token(text[token.start_position:token.end_position]) for token in old_sentence]
new_sentence = Sentence(tokens)
from flair.
It might have something to do with the fact that some (all?) tokenizers are lossy. You can try a different tokenizer:
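# your_tokenizer, raw and tagger are placeholders for your own tokenizer, input text and loaded model
tokenized = your_tokenizer.tokenize(raw)  # expected to return a list of token strings
# print(tokenized)
sentence = Sentence(tokenized)  # a list of strings is treated as already tokenized
tagger.predict(sentence)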
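To illustrate what "lossy" could mean here: a token list usually keeps the token strings but not the exact whitespace between them, so rebuilding the text by joining tokens with single spaces may not give back the original string. The sketch below is library-independent and only an assumption about the cause. `rebuild_from_offsets` is a hypothetical helper, a whitespace split stands in for a real tokenizer, and the padding trick only works when the original whitespace consists of plain spaces.

```python
import re

# Leading space and a double space, neither of which survives a naive join.
text = " BLAH  BLAH BLAH BLAH"

# Whitespace tokenization that remembers where each token started.
spans = [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]

# Joining the token strings loses the original spacing.
joined = " ".join(tok for tok, _ in spans)
print(joined == text)


def rebuild_from_offsets(spans, length):
    """Place each token back at its start offset, padding the gaps with spaces."""
    chars = [" "] * length
    for tok, start in spans:
        chars[start:start + len(tok)] = tok
    return "".join(chars)


# With the start offsets kept, the original layout can be restored.
rebuilt = rebuild_from_offsets(spans, len(text))
print(rebuilt == text)
```

If the comparison is made on token texts only, the two sentences may still agree even though their underlying strings differ, so it helps to decide up front which of the two is meant by "matches".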
from flair.