
ms1034 / document-classification-using-knn


Document classification using the KNN algorithm: a graph-based approach, with scraped data.

Python 100.00%
document-classification graph knn-classification maximum-common-subgraph mcs mongodb mongodb-database nlp pymongo python scraping text-preprocessing


Graph-Based Document Classification using KNN

Description

This project aims to implement a document classification system based on graph theory. Documents are represented as graphs, and graph-based features are used to categorize them into predefined topics, with the goal of improving accuracy over traditional vector-based models.


Installation

To install and set up the project, follow these steps:

  1. Clone the repository to your local machine.

Features

  • Representation of documents as directed graphs.
  • Extraction of graph-based features using common subgraph identification techniques.
  • Classification of documents using the K-Nearest Neighbors (KNN) algorithm based on graph similarity measures.

Contributing

Contributions to the project are welcome! If you'd like to contribute:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them with clear messages.
  4. Push your changes to your fork.
  5. Submit a pull request, clearly describing the changes implemented.

License

This project is licensed under the MIT License.

Credits

Sir Waqas Ali

Contact

For any inquiries or feedback, please contact:


Contributors

ms1034, shahzaib-rafi789


Issues

Evaluation Metrics

  • Assess classification performance using accuracy, precision, recall, and F1-score.
  • Plot a confusion matrix to visualize classification results.
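The metrics listed above can be computed without extra dependencies. This is a minimal sketch, not the project's code: the function names `confusion_matrix` and `classification_metrics` are hypothetical, matrix rows are assumed to be true labels and columns predicted labels, and a per-class score defaults to 0.0 when its denominator is zero. The labels in the usage lines are toy values, not results from this project.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def classification_metrics(y_true, y_pred, labels):
    """Return (accuracy, per-class precision/recall/F1, confusion matrix)."""
    cm = confusion_matrix(y_true, y_pred, labels)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(labels)))
    per_class = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum: predicted as label
        actual = sum(cm[i])                    # row sum: truly label
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = correct / total if total else 0.0
    return accuracy, per_class, cm


# Toy labels for illustration only (not results from this project).
labels = ["Sports", "Food", "Business & Finance"]
y_true = ["Sports", "Sports", "Food", "Business & Finance"]
y_pred = ["Sports", "Food", "Food", "Business & Finance"]
accuracy, per_class, cm = classification_metrics(y_true, y_pred, labels)
print(accuracy)  # 0.75
print(cm)        # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
```

To plot the matrix, one option is to pass `cm` to matplotlib's `imshow` and use `labels` as the tick labels on both axes.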

KNN Implementation

Implement the KNN algorithm.
Define a distance measure based on the maximum common subgraph (MCS).
Classify test documents based on their k nearest neighbors.
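One way to realise the distance described above is the commonly used MCS-based graph distance d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), where |G| counts nodes plus edges. This is a sketch under stated assumptions, not the repository's implementation: graphs are assumed to be pairs of frozensets (nodes, directed edges), every node is assumed to carry a unique term label, which reduces the MCS to a set intersection, and `graph_size`, `mcs_size`, `mcs_distance` and `knn_classify` are hypothetical names.

```python
from collections import Counter

# Assumed graph representation: (frozenset of term nodes, frozenset of (a, b) edges).


def graph_size(graph):
    """|G| = number of nodes plus number of edges."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_size(g1, g2):
    """Size of the maximum common subgraph.

    Assumes every node carries a unique term label, so the MCS is just the
    shared nodes plus the shared edges (no general subgraph search needed).
    """
    return len(g1[0] & g2[0]) + len(g1[1] & g2[1])


def mcs_distance(g1, g2):
    """d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), a value in [0, 1]."""
    largest = max(graph_size(g1), graph_size(g2))
    if largest == 0:
        return 0.0
    return 1.0 - mcs_size(g1, g2) / largest


def knn_classify(test_graph, training, k=3):
    """Majority vote over the k training graphs closest to test_graph.

    training is a list of (graph, label) pairs. On a tied vote, the label
    that appears first among the sorted neighbours wins.
    """
    neighbours = sorted(training, key=lambda item: mcs_distance(test_graph, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

The unique-label assumption is what keeps this cheap: computing a maximum common subgraph is NP-hard for general graphs, but with one node per distinct term it reduces to two set intersections.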

Report Preparation

Compile a comprehensive report detailing methodology, results, and reflections.
Discuss challenges encountered and potential improvements.

Graph Representation

Represent each document as a directed graph.
Define nodes and edges based on term relationships.
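A minimal sketch of one possible term-relationship rule, assuming each document has already been preprocessed into a list of terms. The adjacency rule (an edge from each term to the term that immediately follows it) and the name `build_document_graph` are assumptions for illustration; the repository may define edges differently, for example with a wider window or labelled edges.

```python
def build_document_graph(terms):
    """Build a directed graph from a list of preprocessed terms.

    Nodes: one per unique term.
    Edges: a -> b whenever term b immediately follows term a in the text
    (self-loops from a repeated term are skipped; repeated edges collapse).
    """
    nodes = frozenset(terms)
    edges = frozenset((a, b) for a, b in zip(terms, terms[1:]) if a != b)
    return nodes, edges


nodes, edges = build_document_graph(["team", "win", "match", "team", "win"])
print(sorted(nodes))  # ['match', 'team', 'win']
print(sorted(edges))  # [('match', 'team'), ('team', 'win'), ('win', 'match')]
```

Returning plain frozensets keeps each graph hashable and makes set intersection cheap; networkx's `DiGraph` would be an alternative if richer graph operations are needed.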

Data Collection

Collect or create text data for each assigned topic.

  • Sports
  • Food
  • Business & Finance

Note: ensure each page contains approximately 300 words.

Dataset Preparation

  • Preprocess the data (tokenization, stop-word removal, stemming).
  • Divide the dataset into training and test sets.
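The preprocessing steps listed above could look roughly like this. Everything here is an assumption for illustration: the regular-expression tokenizer, the short `STOP_WORDS` set, and `crude_stem`, a hypothetical suffix stripper standing in for a real stemmer such as NLTK's `PorterStemmer`. The split shuffles with a fixed seed so runs are reproducible; the 80/20 ratio is likewise only an example.

```python
import random
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
              "in", "is", "it", "of", "on", "or", "that", "the", "to", "was",
              "were", "with"}


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy with a fixed seed and return (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


print(preprocess("The teams are playing matches"))  # ['team', 'play', 'match']
train, test = train_test_split(list(range(10)), test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

With three topics, shuffling and splitting each topic separately (a stratified split) would keep topic proportions equal in both sets; this sketch does not do that.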

Subgraph Identification

Mine frequent subgraphs within the training set graphs.
Identify common subgraphs as features for classification.
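Full frequent-subgraph mining (for example with gSpan) grows larger connected patterns and is considerably more involved. As a deliberately simplified stand-in, the sketch below mines only single-edge subgraphs: an edge is kept for a topic when it appears in at least `min_support` (a fraction) of that topic's training graphs. Graphs are assumed to be (nodes, edges) pairs of frozensets, and `frequent_edges` and `topic_features` are hypothetical names.

```python
from collections import Counter


def frequent_edges(graphs, min_support=0.5):
    """Edges present in at least min_support (a fraction) of the given graphs.

    Simplification: only single-edge subgraphs are mined; a full miner
    would also extend frequent edges into larger connected patterns.
    """
    counts = Counter()
    for _, edges in graphs:
        counts.update(set(edges))  # count each edge at most once per graph
    threshold = min_support * len(graphs)
    return {edge for edge, count in counts.items() if count >= threshold}


def topic_features(training, min_support=0.5):
    """training: list of (graph, label) pairs -> {label: frequent edges}."""
    by_label = {}
    for graph, label in training:
        by_label.setdefault(label, []).append(graph)
    return {label: frequent_edges(graphs, min_support) for label, graphs in by_label.items()}


sports_1 = (frozenset({"team", "match", "goal"}), frozenset({("team", "match"), ("match", "goal")}))
sports_2 = (frozenset({"team", "match", "win"}), frozenset({("team", "match"), ("match", "win")}))
food_1 = (frozenset({"bake", "bread"}), frozenset({("bake", "bread")}))
features = topic_features([(sports_1, "Sports"), (sports_2, "Sports"), (food_1, "Food")], min_support=1.0)
print(features["Sports"])  # {('team', 'match')}
```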
