Comments (3)
Hey @tomaarsen,
I may have a possible alternative solution: in Flair we can also construct sentences from user-provided text and predict on them. For this tokenization problem we use the v1 version of segtok. You could split the input into sentences, but for plain tokenization the `word_tokenizer` can be used:
https://github.com/fnl/segtok/blob/master/segtok/tokenizer.py#L210
I think this could easily be added to the Model Hub inference logic, so that `inputs` is first tokenized by `word_tokenizer`. I think segtok would be a great alternative, and more lightweight than spaCy.
Another alternative: rather than implementing it only on the Model Hub side, maybe it can be implemented in `model.predict` directly 🤔
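To illustrate the suggestion, here is a minimal sketch of a pre-tokenization step placed in front of prediction. It makes several assumptions: `simple_word_tokenizer` is a regex stand-in used only so the example runs without extra dependencies (it is not segtok's `word_tokenizer`, whose rules are more elaborate), and `predict_pretokenized` and its `predict` argument are hypothetical names, not part of SpanMarker or the Model Hub.

```python
import re
from typing import Any, Callable, List

# Regex stand-in for a word tokenizer (assumption, not segtok's rules):
# runs of word characters, or single non-space punctuation characters.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_word_tokenizer(text: str) -> List[str]:
    """Split text into word and punctuation tokens."""
    return _TOKEN_RE.findall(text)


def predict_pretokenized(
    predict: Callable[[List[str]], Any],
    text: str,
    tokenizer: Callable[[str], List[str]] = simple_word_tokenizer,
) -> Any:
    """Tokenize raw text first, then hand the token list to `predict`.

    `predict` is a hypothetical callable that accepts a list of tokens.
    """
    tokens = tokenizer(text)
    return predict(tokens)


if __name__ == "__main__":
    # A dummy predictor that just echoes the tokens it receives.
    print(predict_pretokenized(lambda toks: toks, "Hello, there."))
```

Keeping the tokenizer as a parameter means segtok, spaCy, or NLTK could be swapped in without touching the prediction code.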
from spanmarkerner.
I'll certainly consider this approach, whether with segtok, spaCy or NLTK. The spaCy version is already implemented.
By default, perhaps I can apply the tokenization only if `Hello, there.` tokenizes differently than `Hello , there .`?
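One way to read this check, sketched under assumptions: run the model's tokenizer on both spellings and only enable pre-tokenization when the outputs differ. The function `is_spacing_sensitive` and the two toy tokenizers below are hypothetical stand-ins for illustration; they are not real subword tokenizers and not SpanMarker code.

```python
import re
from typing import Callable, List


def is_spacing_sensitive(tokenize: Callable[[str], List[str]]) -> bool:
    """Return True if spacing around punctuation changes the tokenizer output."""
    return tokenize("Hello, there.") != tokenize("Hello , there .")


def punctuation_splitting(text: str) -> List[str]:
    """Toy tokenizer (assumption): always splits punctuation from words."""
    return re.findall(r"\w+|[^\w\s]", text)


def whitespace_only(text: str) -> List[str]:
    """Toy tokenizer (assumption): splits on whitespace and nothing else."""
    return text.split()


if __name__ == "__main__":
    for name, tok in [("punctuation_splitting", punctuation_splitting),
                      ("whitespace_only", whitespace_only)]:
        print(name, "needs pre-tokenization:", is_spacing_sensitive(tok))
```

With a check like this, tokenizers that already separate punctuation would skip the extra step, and only spacing-sensitive ones would pay for it.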
I've discovered that the issue only persisted for XLM-RoBERTa, and I've been able to tackle it in f2edd06!