Text is taken from Malaysian and Indonesian news websites. The Tamil text was generated with ChatGPT, since I couldn't find Tamil text online written in Latin characters. ChatGPT was also used for the two "other" languages: Na'vi (from Avatar) and Klingon (from Star Trek).
generate.py: source code for generating the dataset.
input.custom_test.txt: unlabelled dataset for prediction.
input.custom_correct.txt: labelled dataset for checking predictions.
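One way to check predictions on input.custom_test.txt against input.custom_correct.txt is to compare the predicted labels one line at a time and report the accuracy. The sketch below is a hypothetical illustration, not part of generate.py. It assumes a file layout that this description does not specify: each line of input.custom_correct.txt holds the label as its last tab-separated field (a line with no tab is treated as a bare label), and predictions are listed in the same order. The helper names `parse_label`, `accuracy`, and `mismatches` are made up for this example.

```python
from collections import Counter


def parse_label(line: str, sep: str = "\t") -> str:
    # ASSUMPTION: the label is the last `sep`-separated field on the line.
    # A line containing no separator is treated as a bare label.
    return line.rstrip("\n").rsplit(sep, 1)[-1].strip()


def load_lines(path: str) -> list[str]:
    # Read a text file as UTF-8 and skip blank lines.
    with open(path, encoding="utf-8") as f:
        return [line for line in f if line.strip()]


def accuracy(predicted: list[str], correct_lines: list[str]) -> float:
    # Compare predictions against labelled lines position by position.
    if len(predicted) != len(correct_lines):
        raise ValueError("prediction count does not match labelled line count")
    if not predicted:
        raise ValueError("no predictions to score")
    gold = [parse_label(line) for line in correct_lines]
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


def mismatches(predicted: list[str], correct_lines: list[str]) -> Counter:
    # Count each (true label, predicted label) pair that disagrees.
    gold = [parse_label(line) for line in correct_lines]
    return Counter((g, p) for p, g in zip(predicted, gold) if p != g)
```

For example, `accuracy(my_predictions, load_lines("input.custom_correct.txt"))` would score a list of predicted labels. Here `my_predictions` is a placeholder for your model's output. `mismatches` shows which language pairs get confused, such as Malay and Indonesian.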