Comments (8)
@Alexixu Possibly your use case can be solved by handling empty instances in the (custom) `DatasetReader` you're using. If not, please share more details on what dataset reader you are running this with.
from allennlp.
@AkshitaB The dataset reader is a custom class that inherits from DatasetReader. Empty instances are fine if the data loader can handle them, and discarding them is the most direct way to do that, but the default DataLoader implementation has no such logic. In my view, it would be more suitable for the reader to throw an exception for a corrupt example and for the DataLoader to catch that exception and discard the example.
@Alexixu, you can do this in the `DatasetReader` if you override how `_read()` works. You can return something like `None` from `DatasetReader.text_to_instance()`, and then do the right thing in `_read()`.
@dirkgr I have tried this, but the default DataLoader implementation cannot handle a None object; it throws an exception along the lines of "None type has no index function".
I suggest handling this explicitly by defining a concrete exception and adding try/catch logic to the DataLoader implementation.
What I'm saying is, you can change this behavior in your own `DatasetReader`, where you override the `_read()` method to throw away the `None` objects.
This issue is being closed due to lack of activity. If you think it still needs to be addressed, please comment on this thread.
@dirkgr I have tried exactly that, by implementing the _read function so that it returns a None object. But the DataLoader (not the DatasetReader), which calls the text_to_instance function, cannot handle a None object.
The `_read()` function should not return `None`. The `_read()` function is where you detect `None` and throw it away (instead of returning it).
Think of it this way: From `_read()` you have to return an iterable of instances. AllenNLP does not care how you do this. It only cares that `_read()` returns an iterable of instances. So you can do whatever you want inside of `_read()`, including skipping instances.
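A minimal sketch of the pattern described above, for illustration only. To stay runnable on its own it does not import AllenNLP: `SkippingReader` is a hypothetical stand-in for a custom `DatasetReader` subclass, plain dicts stand in for instances, and `_read()` takes an iterable of lines here, whereas a real reader would receive a file path and yield real instances. The point is only that `text_to_instance()` may return `None`, and `_read()` filters those out before anything downstream sees them.

```python
from typing import Dict, Iterable, Iterator, List, Optional


class SkippingReader:
    """Hypothetical stand-in for a custom DatasetReader subclass."""

    def text_to_instance(self, line: str) -> Optional[Dict[str, List[str]]]:
        # Return None for an empty or whitespace-only example
        # instead of building an instance from it.
        tokens = line.split()
        if not tokens:
            return None
        return {"tokens": tokens}

    def _read(self, lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
        # _read() yields only valid instances; None is detected here
        # and skipped, never yielded to the caller.
        for line in lines:
            instance = self.text_to_instance(line)
            if instance is None:
                continue
            yield instance


reader = SkippingReader()
instances = list(reader._read(["a b", "", "   ", "c"]))
```

With this structure the consumer of `_read()` only ever receives valid instances, so it needs no special handling for `None`.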