Trainer.java
Takes an unprocessed data set and converts it into a processed data set in the file format Mahout expects. Responsible for training the Complementary Naive Bayes algorithm and building a statistical model.
Classifier.java
Takes a directory of unclassified documents and classifies each document. Creates a separate subdirectory for each category and writes each document into the subdirectory of its assigned category.
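The output layout described for Classifier.java can be sketched as follows. This is only an illustration of the "one subdirectory per category" step, not the project's actual code: the class name CategoryWriterSketch, the methods writeToCategory and demo, the file name review1.txt, and the hard-coded label "pos" are all assumptions made for the example. In the real Classifier the label would come from the trained model.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: not part of the project sources.
public class CategoryWriterSketch {

    /** Copies one document into outputDir/<category>/, creating the subdirectory if needed. */
    static Path writeToCategory(Path document, String category, Path outputDir) {
        try {
            Path categoryDir = outputDir.resolve(category);
            Files.createDirectories(categoryDir);
            Path target = categoryDir.resolve(document.getFileName());
            return Files.copy(document, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a throwaway input file and files it under an assumed label. */
    static Path demo() {
        try {
            Path input = Files.createTempDirectory("unclassified");
            Path output = Files.createTempDirectory("classified");
            Path doc = Files.write(input.resolve("review1.txt"),
                    "great film".getBytes(StandardCharsets.UTF_8));
            // Assumed label; a real run would ask the model for the category.
            return writeToCategory(doc, "pos", output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Wrote " + demo());
    }
}
```

Calling Files.createDirectories before each copy keeps the sketch safe to run repeatedly, since it does nothing when the category directory already exists.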
Setting Up Parameters in the settings.properties file
Bayes parameters
Gramsize=2 // N-gram size
Algorithm=cbayes // Our classification algorithm
DefaultCategory=unknown // Default category
DataSource=hdfs // Hadoop file system
Encoding=UTF-8 // Unicode
Alpha=1.0 // Smoothing parameter
For Trainer.java
TrainSet=/home/developer/dataset_rev/freshrevs/train/ // Training set location, containing one subdirectory per category
ProcessedSet=/home/developer/dataset_rev/freshrevs/processedTrain/ // Processed output directory
For Classifier.java
ModelPath=/home/developer/dataset_rev/freshrevs/model/ // Path to store and retrieve the model
IpDirPath=/home/developer/dataset_rev/freshrevs/test/pos/ // Unclassified data set
OpDirPath=/home/developer/dataset_rev/freshrevs/classified/ // Path to store classified documents
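As a rough illustration of how these settings might be read, the sketch below loads key=value pairs with java.util.Properties. It is not the project's loader. The class name SettingsSketch and the helper stripAnnotation are hypothetical. One caveat drives the design: java.util.Properties treats only lines starting with # or ! as comments, so a trailing "// ..." note on a value line would be read as part of the value. The sketch therefore assumes the notes shown above may be present in the file and strips them.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: not part of the project sources.
public class SettingsSketch {

    /** Loads key=value pairs and drops a trailing " // note" from each value. */
    static Properties load(Reader reader) throws IOException {
        Properties raw = new Properties();
        raw.load(reader);
        Properties cleaned = new Properties();
        for (String key : raw.stringPropertyNames()) {
            cleaned.setProperty(key, stripAnnotation(raw.getProperty(key)));
        }
        return cleaned;
    }

    /** Cuts the value at the first whitespace followed by "//"; values without that pattern are only trimmed. */
    static String stripAnnotation(String value) {
        return value.replaceFirst("\\s+//.*$", "").trim();
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample using keys from this README; a real run would open settings.properties.
        String sample = String.join("\n",
                "Gramsize=2 // Ngram size",
                "Algorithm=cbayes // our classification algorithm",
                "Alpha=1.0 //Smoothing parameter",
                "ModelPath=/home/developer/dataset_rev/freshrevs/model/ // Path to store and retrieve Model");
        Properties settings = load(new StringReader(sample));

        int gramSize = Integer.parseInt(settings.getProperty("Gramsize"));
        double alpha = Double.parseDouble(settings.getProperty("Alpha"));
        System.out.println("Gramsize=" + gramSize);
        System.out.println("Algorithm=" + settings.getProperty("Algorithm"));
        System.out.println("Alpha=" + alpha);
        System.out.println("ModelPath=" + settings.getProperty("ModelPath"));
    }
}
```

Because the helper only cuts at whitespace followed by "//", a value such as a URI containing "://" with no preceding space is left intact.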