This application provides a graphical user interface (GUI) for topic modeling and analysis of textual data. It lets users convert DOCX files to CSV, select files for analysis, set the number of topics, view detailed topic information alongside hierarchy and bar chart visualizations, and compare subsets of data for duplicate entries.
- DOCX to CSV conversion
- File selection for analysis
- Topic modeling with adjustable number of topics
- Detailed topic information display
- Hierarchy visualization of topics
- Bar chart representation of topics
- Comparison of subsets for duplicate entries
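As a rough illustration of the last feature, the sketch below shows one way duplicate entries across two subsets could be detected using only the standard library. The helper name `find_duplicates`, the `text` column, and the case/whitespace normalization are assumptions made for this example; the application's actual comparison logic may differ.

```python
import csv
import io


def find_duplicates(subset_a, subset_b, column="text"):
    """Return values of `column` that occur in both subsets (hypothetical helper).

    Each subset is a list of dict rows, e.g. from csv.DictReader.
    Matching ignores case and surrounding whitespace (an assumption).
    """
    def normalize(value):
        return value.strip().lower()

    seen_in_a = {normalize(row[column]) for row in subset_a}
    duplicates = []
    already_reported = set()
    for row in subset_b:
        key = normalize(row[column])
        # Report each shared entry once, in the order it appears in subset_b.
        if key in seen_in_a and key not in already_reported:
            duplicates.append(row[column].strip())
            already_reported.add(key)
    return duplicates


# Two small in-memory CSV "files" stand in for real subsets.
csv_a = io.StringIO("text\nTopic models are useful\nAnother entry\n")
csv_b = io.StringIO("text\nanother entry \nSomething new\n")

rows_a = list(csv.DictReader(csv_a))
rows_b = list(csv.DictReader(csv_b))
print(find_duplicates(rows_a, rows_b))
```

Normalizing before comparison is a design choice: it treats entries that differ only in letter case or stray whitespace as the same record, which may or may not match what you want for your data.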
Before you begin, ensure you have met the following requirements:
- Python 3.12 or later installed
- Conda (Anaconda or Miniconda) installed
To set up the Topic Modeler application, follow these steps:
- Clone the repository:
    git clone <repository-url>
    cd <repository-folder>
- Create and activate a Conda environment, replacing <env_name> with your desired environment name:
    conda create --name <env_name> python=3.12
    conda activate <env_name>
- Install required packages. Ensure you are in the project root directory and run:
    conda install --file requirements.txt -c conda-forge
  The application dependencies are listed in requirements.txt. This file includes all necessary Python packages, such as pandas, plotly, tkinter, and loguru.
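After installation, a quick check can confirm that the environment matches the requirements above. The following is a minimal sketch, not part of the project: the function name `check_environment` is hypothetical, and the module list simply mirrors the packages named in this README.

```python
import importlib.util
import sys

# Packages named in this README (assumed to be importable under these names).
REQUIRED_MODULES = ("pandas", "plotly", "tkinter", "loguru")
MIN_PYTHON = (3, 12)


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks ready." if not issues else "\n".join(issues))
```

Run it inside the activated Conda environment; any reported problem points to a package that still needs installing.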
To run the Topic Modeler, execute the following command in the terminal:
python src/app.py
Ensure your working directory is the project's root folder. The GUI should launch, allowing you to interact with the application's features.
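If the launch fails because of the working directory, a small wrapper can make the error explicit. This is a sketch under assumptions: `find_entry_point` and `launch` are hypothetical helpers, and only the `src/app.py` path comes from this README.

```python
import subprocess
import sys
from pathlib import Path


def find_entry_point(root=None):
    """Return the path to src/app.py under `root` (default: current directory)."""
    root = Path(root) if root is not None else Path.cwd()
    entry = root / "src" / "app.py"
    if not entry.is_file():
        raise FileNotFoundError(
            f"{entry} not found; run this from the project's root folder"
        )
    return entry


def launch(root=None):
    """Start the GUI with the same interpreter that runs this script."""
    entry = find_entry_point(root)
    # check=False: return the exit code instead of raising on failure.
    return subprocess.run([sys.executable, str(entry)], check=False).returncode
```

Using `sys.executable` keeps the launch inside the currently active Conda environment rather than whichever `python` happens to be first on the PATH.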
Logs are saved to app.log, with rotation set to one week. This can help with troubleshooting and tracking the application's operations over time.
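This README does not show how logging is configured. Assuming it uses loguru (listed among the dependencies), a weekly-rotating file sink could look like the sketch below; the helper name `configure_logging` is hypothetical.

```python
LOG_FILE = "app.log"   # file name stated in this README
ROTATION = "1 week"    # loguru accepts human-readable durations like this


def configure_logging(log_file=LOG_FILE, rotation=ROTATION):
    """Attach a rotating file sink and return its handler id (hypothetical helper)."""
    # Imported here so this module can be loaded even where loguru is absent.
    from loguru import logger

    # A new log file is started once the current one is older than `rotation`.
    return logger.add(log_file, rotation=rotation)
```

Calling `configure_logging()` once at startup is enough; later `logger.info(...)` calls from loguru are then written to the file as well as to the default sink.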
Contributions to this project are welcome. Please follow the project's coding standards and submit pull requests for review.
This project is licensed under the MIT License. See the LICENSE.md file for details.
Thank you to all contributors who have helped with the development and improvement of this application.