Comments (7)
Hey @guggio, thanks for reporting the issue and for tracking down the problematic code.
We have already fixed this bug and are merging a number of changes into master right now. If you installed through "pip install --editable .", please update your farm installation with a git pull. Let me know whether that resolves your issue. Thanks!
from farm.
Hi @Timoeller, thanks a lot for the fix! I can run my BIO-classification model now. However, I ran into an additional issue:
My NERProcessor settings from above do not work anymore, since setting dev_split != 0.0 throws the following error message:
/usr/local/lib/python3.6/dist-packages/farm/data_handler/data_silo.py in _get_dataset(self, filename)
65 dicts = random.shuffle(dicts)
66
---> 67 dict_batches_to_process = int(len(dicts) / self.multiprocessing_chunk_size)
68 num_cpus = min(mp.cpu_count(), self.max_processes, dict_batches_to_process) or 1
69
TypeError: object of type 'NoneType' has no len()
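A note on reading this traceback: line 65 assigns the return value of random.shuffle() back to dicts. In Python's standard library, random.shuffle() reorders the list in place and returns None, so that assignment alone would be enough to produce the "NoneType has no len()" error on line 67. The sketch below only illustrates that behaviour with made-up data; the variable names are hypothetical and it is not taken from data_silo.py.

```python
import random

# Made-up sample data standing in for the dicts produced by a processor.
dicts = [{"text": f"sentence {i}"} for i in range(5)]

# random.shuffle reorders its argument in place and returns None,
# so binding the result to a name gives None, not a shuffled list.
returned = random.shuffle(dicts)

# Shuffling without reassigning keeps the list intact.
shuffled = list(dicts)      # copy, so the original order is left alone
random.shuffle(shuffled)    # in-place shuffle; the return value is ignored
```

If a new list is preferred over in-place shuffling, random.sample(dicts, len(dicts)) returns a shuffled copy instead.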
from farm.
We changed the way the dev set is split off from the train set, so there might be some issues there that didn't come up in our test pipeline. I will look into that next.
There also seems to be a small mistake in the classification report. I am currently working on that, too.
from farm.
Ok, we fixed the bug with the classification report.
The issue you reported with dev_split != 0.0 seems to come from an error in _file_to_dicts() in processor.py, because the dicts returned from that function are None.
_file_to_dicts() calls a function named read_ner_file(), which is quite limited in what it can read. Please have a look here for examples of what the input format should look like.
Otherwise you could adjust _file_to_dicts() and create your own data processor.
Maybe you can post the input format of your NER data, so we can help you out.
from farm.
Thanks a lot for the bugfix and your help :) I had to write two text files based on my dataframe (probably not the most efficient solution).
I was able to run the model on my data and it seems like I am achieving very promising results.
Thanks again for your great work!
from farm.
I guess the problem lies in the eval.py file. Since I am doing "per_token" classification, it uses seqeval.metrics.classification_report, which does not take target_names as input.
from seqeval.metrics import classification_report as token_classification_report
from sklearn.metrics import classification_report
...
if self.classification_report:
    if head.ph_output_type == "per_token":
        report_fn = token_classification_report
    elif head.ph_output_type == "per_sequence":
        report_fn = classification_report
    elif head.ph_output_type == "per_token_squad":
        report_fn = lambda *args, **kwargs: "not Implemented"
    elif head.ph_output_type == "per_sequence_continuous":
        report_fn = r2_score
    else:
        raise NotImplementedError
    # CHANGE PARAMETERS, not all report_fn accept digits
    if head.ph_output_type == "per_sequence_continuous":
        result["report"] = report_fn(
            label_all[head_num], preds_all[head_num]
        )
    else:
        result["report"] = report_fn(
            label_all[head_num], preds_all[head_num], digits=4, target_names=head.label_list)
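One possible way around the mismatch described above is to pass each report function only the keyword arguments its signature declares. The following is a minimal sketch under that assumption, not the fix that was merged into FARM: call_with_supported_kwargs, token_report and sequence_report are hypothetical names, and the two report functions are simple stand-ins rather than the seqeval or sklearn implementations.

```python
import inspect


def call_with_supported_kwargs(fn, *args, **kwargs):
    """Call fn, dropping keyword arguments that its signature does not declare."""
    params = inspect.signature(fn).parameters
    # If fn declares **kwargs itself, it can take anything, so filter nothing.
    takes_any_kwarg = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not takes_any_kwarg:
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return fn(*args, **kwargs)


# Hypothetical stand-in for a report function without a target_names parameter.
def token_report(y_true, y_pred, digits=2):
    return f"token report, digits={digits}"


# Hypothetical stand-in for a report function that does accept target_names.
def sequence_report(y_true, y_pred, digits=2, target_names=None):
    return f"sequence report, digits={digits}, target_names={target_names}"


labels, preds = ["B-PER", "O"], ["B-PER", "O"]
token_result = call_with_supported_kwargs(
    token_report, labels, preds, digits=4, target_names=["B-PER", "O"]
)
sequence_result = call_with_supported_kwargs(
    sequence_report, labels, preds, digits=4, target_names=["B-PER", "O"]
)
```

A simpler alternative would be an explicit branch for "per_token" that omits target_names; the signature check just avoids hard-coding which function accepts which argument.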
from farm.
Nice! If you have good results to share we are more than happy to celebrate with you : )
from farm.