This program demonstrates the Pipe-and-Filter architectural pattern by implementing a text processing platform. The platform determines the 10 most frequent significant words in the input text.
Using the included Maven build tool, run from the root directory:
> mvn package
A guide to using Maven is available in the official Apache Maven documentation.
Run the included jar file:
> java -jar PipeFilter-0.0.1-SNAPSHOT.jar
The program then prompts for the relative path of the text file:
> Enter path of the text file:
> text_files/kjbible.txt
DataPump
- Reads the text file and injects it into the pipeline.
FilterRemoveNonAlpha
- Removes all non-alphabetic characters.
FilterRemoveUpper
- Converts all words to lowercase.
FilterRemoveStopWords
- Removes all stop words (i.e., non-significant words/terms).
FilterRootForms
- Reduces words to their root forms.
DataSink
- Counts filtered words and displays the top 10 occurrences.
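
The stages above can be sketched as a chain of plain functions, each consuming the previous stage's output. This is a minimal illustration and not the project's actual code: the class name PipelineSketch, the method names, the short stop-word list, and the trailing-"s" rule standing in for root-form reduction are all assumptions made for the example. The sketch also takes its text from a string rather than reading a file as DataPump does.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of the filter chain; names do not come from the project.
class PipelineSketch {
    // Assumed stop-word list; the list used by the real filter is not shown here.
    static final Set<String> STOP_WORDS = Set.of("the", "and", "of", "a", "to", "in");

    // FilterRemoveNonAlpha: replace every run of non-letters with one space.
    static String removeNonAlpha(String text) {
        return text.replaceAll("[^A-Za-z]+", " ");
    }

    // FilterRemoveUpper: convert the text to lowercase.
    static String toLowerCase(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // FilterRemoveStopWords: split into words and drop the stop words.
    static List<String> removeStopWords(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(w -> !w.isEmpty() && !STOP_WORDS.contains(w))
                .collect(Collectors.toList());
    }

    // FilterRootForms: toy rule that strips a trailing "s" from longer words.
    // A real implementation would use a proper stemming algorithm.
    static List<String> rootForms(List<String> words) {
        return words.stream()
                .map(w -> w.length() > 3 && w.endsWith("s")
                        ? w.substring(0, w.length() - 1) : w)
                .collect(Collectors.toList());
    }

    // DataSink: count words and keep the most frequent ones.
    // The TreeMap plus a stable sort breaks ties alphabetically.
    static List<Map.Entry<String, Long>> topWords(List<String> words, int limit) {
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Connect the filters in the order listed above.
    static List<Map.Entry<String, Long>> run(String text, int limit) {
        String cleaned = toLowerCase(removeNonAlpha(text));
        return topWords(rootForms(removeStopWords(cleaned)), limit);
    }

    public static void main(String[] args) {
        String sample = "The kings and the king: KING of Kings, horses!";
        for (Map.Entry<String, Long> entry : run(sample, 10)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Because each filter depends only on its input, filters can be reordered, replaced, or tested in isolation, which is the main appeal of the pattern.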