pip install -r requirement.txt
python3 utama.py --file {path file.txt} --model {path model question_answer_ind}
python3 utama.py --help
Because the files are larger than 25 MB, download the pre-trained model here
Because the files are larger than 25 MB, download the datasets here
- Import the required modules.
- Load the datasets.
- Split the dataset into training data and test (validation) data.
- Plot the data with a library such as matplotlib or seaborn.
- Perform text preprocessing, which converts string data into token matrices that the model can process.
- Run predictions with the trained model to inspect the results of training.
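
The splitting and preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: `split_dataset`, `build_vocab`, and `encode` are hypothetical helpers, the 90/10 split ratio is an assumption, and whitespace tokenization stands in for the subword tokenizer that a real question-answering model would use.

```python
import random


def split_dataset(rows, test_ratio=0.1, seed=42):
    """Shuffle rows reproducibly and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def build_vocab(texts):
    """Map each distinct lowercase token to an integer ID (0 = padding, 1 = unknown)."""
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(texts, vocab, max_len=8):
    """Turn strings into a fixed-width matrix of token IDs (truncate, then pad)."""
    matrix = []
    for text in texts:
        ids = [vocab.get(tok, vocab["[UNK]"]) for tok in text.lower().split()]
        ids = ids[:max_len]
        matrix.append(ids + [vocab["[PAD]"]] * (max_len - len(ids)))
    return matrix


# Toy data standing in for the real dataset (assumption for illustration only).
questions = [f"question number {i}" for i in range(10)]
train, test = split_dataset(questions)
vocab = build_vocab(train)             # the vocabulary is built from training data only
token_matrix = encode(test, vocab, max_len=4)
```

Building the vocabulary from the training split alone means tokens that appear only in the test split map to the unknown ID, which mirrors how a trained tokenizer treats unseen text.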
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
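
To illustrate what `lr_scheduler_type: linear` does with these settings, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (none are listed above) and uses the 24468 total steps reported in the training results; `linear_lr` is a hypothetical helper, not a library function.

```python
def linear_lr(step, base_lr=2e-5, total_steps=24468, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start and at the end of each of the 3 epochs
# (8156 steps per epoch).
for step in (0, 8156, 16312, 24468):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05, falls by one third of that value per epoch, and reaches zero at the final step.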
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.5927 | 1.0 | 8156 | 1.7891 |
1.3008 | 2.0 | 16312 | 1.7875 |
1.0979 | 3.0 | 24468 | 1.8589 |
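
As a quick reading aid for the table above, the sketch below picks the epoch with the lowest validation loss and estimates how many training examples one epoch covers. The estimate assumes a single device and no gradient accumulation, neither of which is stated here, so treat it as approximate.

```python
# (epoch, step, training loss, validation loss) copied from the table above
results = [
    (1, 8156, 1.5927, 1.7891),
    (2, 16312, 1.3008, 1.7875),
    (3, 24468, 1.0979, 1.8589),
]
TRAIN_BATCH_SIZE = 16  # from the hyperparameter list

# Epoch whose checkpoint has the lowest validation loss.
best = min(results, key=lambda row: row[3])

# Steps in one epoch times the batch size approximates the examples per epoch.
steps_per_epoch = results[0][1]
approx_examples = steps_per_epoch * TRAIN_BATCH_SIZE

print(f"lowest validation loss: {best[3]} at epoch {best[0]}")
print(f"about {approx_examples} training examples per epoch")
```

Training loss keeps falling while validation loss rises after epoch 2, which often indicates the start of overfitting, so the epoch-2 checkpoint may generalize slightly better than the final one.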
- Transformers 4.33.0.dev0
- PyTorch 2.0.0
- Datasets 2.1.0
- Tokenizers 0.13.3
- Total training data: 130+ thousand lines
- Total test (validation) data: 118+ thousand lines
- Total dataset size: 148+ thousand lines
- Total training time: 5.30 hours
- Loss = 1.431