This is a pre-processed version of the PAN19 dataset. Read about it here.
We built the dataset by combining 4 instances into one sample. There are 5 samples per author and 25 authors, for a total of 125 samples (500 instances). The 25 authors comprise 9 bots, 8 males, and 8 females; our work did not require these labels, so they have been dropped from this data. See CreateFiles.py for details on how the samples were created.
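As a rough illustration of the grouping arithmetic above (4 instances per sample, 5 samples per author, 25 authors), here is a minimal sketch. It is not the code in CreateFiles.py: the function name build_samples, the (author_id, text) input shape, joining texts with a newline, and dropping leftover instances are all assumptions made for the example, and the input data is synthetic.

```python
from collections import defaultdict

INSTANCES_PER_SAMPLE = 4  # taken from the description above


def build_samples(instances, per_sample=INSTANCES_PER_SAMPLE):
    """Group (author_id, text) pairs into samples of `per_sample` texts per author.

    Hypothetical helper for illustration only; the real logic lives in CreateFiles.py.
    """
    by_author = defaultdict(list)
    for author_id, text in instances:
        by_author[author_id].append(text)

    samples = []
    for author_id, texts in by_author.items():
        # Assumption: texts that do not fill a whole sample are dropped.
        for start in range(0, len(texts) - per_sample + 1, per_sample):
            chunk = texts[start:start + per_sample]
            # Assumption: instances are concatenated with a newline separator.
            samples.append((author_id, "\n".join(chunk)))
    return samples


# Synthetic stand-in data: 25 authors x 20 instances = 500 instances.
instances = [
    (f"author{a:02d}", f"text {a}-{i}")
    for a in range(25)
    for i in range(20)
]
samples = build_samples(instances)
print(len(samples))  # 125
```

With 20 instances per author, each author yields 20 / 4 = 5 samples, which matches the 125 samples from 500 instances stated above.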
To use this data, you must first register here.