Comments (2)
I ran into the same problem. Another possible solution is to run the script as a module from the repository's main directory:
python -m ulmfit.postprocess_wikitext --path="data/wiki/wikitext-2" --lang="en"
I am currently trying to reproduce all the steps, starting with training on wikitext, and writing down all the commands along the way.
I'm not sure what the intended approach is, though.
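For anyone wondering why the `-m` form can help: the sketch below assumes the problem is an import error when the script is started by file path (the thread does not spell out the exact error). It builds a throwaway package, `demo_pkg` (a hypothetical name, unrelated to this repo), and runs its script both ways. Started by path, Python puts the script's own folder on `sys.path`, so the package import fails; started with `-m` from the parent directory, the current directory is on `sys.path` and the import resolves.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Build a throwaway package and run its script by path and with -m."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_pkg"  # hypothetical package, for illustration only
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helper.py").write_text("VALUE = 42\n")
        # The script imports a sibling module through the package name.
        (pkg / "script.py").write_text(
            "from demo_pkg.helper import VALUE\nprint(VALUE)\n"
        )

        # 1) By file path: sys.path[0] is the demo_pkg folder itself,
        #    so "import demo_pkg" cannot be resolved.
        by_path = subprocess.run(
            [sys.executable, str(pkg / "script.py")],
            cwd=root, capture_output=True, text=True,
        )

        # 2) As a module from the parent directory: the current directory
        #    is on sys.path, so the package import works.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.script"],
            cwd=root, capture_output=True, text=True,
        )
    return by_path, as_module


if __name__ == "__main__":
    by_path, as_module = run_both_ways()
    print("by path   -> exit code", by_path.returncode)
    print("as module -> exit code", as_module.returncode)
```

If the failure you see is something other than an import error, this sketch does not apply.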
from multifit.
I'm using the approach @tpietruszka showed.
There are .sh scripts that should deal with this. But guys, there is a bug in postprocess_wikitext: it removes the information about where the Wikipedia articles are split, which basically breaks the downstream tasks. So if you want to give it a try, fix postprocess_wikitext.py so that it includes a way to split the Wikipedia text by articles; see #25. This is an easy fix, but I'm away today, so if you want to start training now, implement the fix yourself. If you aren't in a hurry, just wait with your experiments until #25 is fixed.
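Until #25 lands, here is a minimal sketch of what "split by articles" could look like. It is not the actual fix and does not use any code from postprocess_wikitext.py. It assumes the usual WikiText layout, where an article title is a line like ` = Title = ` and section headings use more equals signs (` = = Section = = `). The names `TITLE_RE` and `split_by_articles` are made up for illustration.

```python
import re

# Assumption: article titles look like " = Title = ", while section headings
# carry extra "=" signs (" = = Section = = ") and must not start a new article.
TITLE_RE = re.compile(r"^\s*= [^=]+ =\s*$")


def split_by_articles(lines):
    """Group an iterable of text lines into one list of lines per article."""
    articles = []
    current = []
    for line in lines:
        # Start a new article at each title line, unless everything collected
        # so far is blank (e.g. the empty line that precedes the first title).
        if TITLE_RE.match(line) and any(l.strip() for l in current):
            articles.append(current)
            current = []
        current.append(line)
    if any(l.strip() for l in current):
        articles.append(current)
    return articles


if __name__ == "__main__":
    sample = [
        " \n",
        " = First Article = \n",
        " Some text . \n",
        " = = A Section = = \n",
        " More text . \n",
        " = Second Article = \n",
        " Other text . \n",
    ]
    for article in split_by_articles(sample):
        print(len(article), "lines, starting with", repr(article[0]))
```

A title that itself contains an equals sign would be missed by this pattern, so check the result against your dump before relying on it.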