A python project for checking plagiarism of documents based on cosine similarity

Home Page: https://kalebujordan.dev/how-to-detect-plagiarism-in-text-using-python/

Plagiarism-checker-Python

This repo contains the source code of a Python script that detects plagiarism in text documents using cosine similarity.

How is it Done?

You might be wondering how plagiarism detection on textual data is done. Well, it ain't as complicated as you may think.

We all know that computers are good with numbers, so to compute the similarity between two text documents, the raw text is first transformed into vectors (arrays of numbers). From there, basic vector math gives the similarity between them: here, the cosine of the angle between the two vectors.
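As a rough illustration of the idea, here is a minimal standard-library sketch. It assumes plain word counts as the vector components, which is simpler than a TF-IDF weighting, and the helper names `to_vector` and `cosine_similarity` are made up for this example rather than taken from the script.

```python
import math
import re
from collections import Counter


def to_vector(text: str) -> Counter:
    """Turn raw text into a sparse vector of lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


first = to_vector("the cat sat on the mat")
second = to_vector("the cat sat on the sofa")
print(cosine_similarity(first, second))
```

A score near 1 means the two documents use almost the same words in the same proportions, while a score of 0 means they share no words at all.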

This repo contains a basic example of how to do that.

Getting Started

To get started with the code in this repo, clone or download it to your machine as shown below:

git clone https://github.com/Kalebu/Plagiarism-checker-Python

Dependencies

Before you begin playing with the source code, install the dependencies as shown below:

pip3 install -r requirements.txt

Running the App

To run this code, your text documents must be in the project directory and have the .txt extension. When you run the script, it automatically loads every document with that extension and computes the similarity between each pair, as shown below:

$ cd Plagiarism-checker-Python
$ python3 app.py
('john.txt', 'juma.txt', 0.5465972177348937)
('fatma.txt', 'john.txt', 0.14806887549598566)
('fatma.txt', 'juma.txt', 0.18643448370323362)
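
The load-and-compare flow described above could be sketched roughly as follows. This is not the script itself: `load_documents`, `similarity` and `score_pairs` are hypothetical names, the files are assumed to be UTF-8, and the score uses raw word counts as a stand-in for the TF-IDF vectors the script builds, so the numbers will differ from the output shown.

```python
import math
import re
from collections import Counter
from itertools import combinations
from pathlib import Path


def load_documents(directory: str) -> dict[str, str]:
    """Read every .txt file in `directory`, keyed by file name."""
    return {
        path.name: path.read_text(encoding="utf-8")  # assumption: UTF-8 files
        for path in sorted(Path(directory).glob("*.txt"))
    }


def similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity over raw word counts (a stand-in for TF-IDF)."""
    a = Counter(re.findall(r"\w+", text_a.lower()))
    b = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(n * b[w] for w, n in a.items())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(
        sum(n * n for n in b.values())
    )
    return dot / norm if norm else 0.0


def score_pairs(directory: str) -> list[tuple[str, str, float]]:
    """Score each unordered pair of documents exactly once."""
    documents = load_documents(directory)
    return [
        (name_a, name_b, similarity(documents[name_a], documents[name_b]))
        for name_a, name_b in combinations(documents, 2)
    ]
```

Calling `score_pairs(".")` from the project directory would return one `(file, file, score)` tuple per pair, in the same shape as the output above.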

A Python Library?

If you would rather use a Python library to compare strings and documents without writing the vectorizers yourself, take a look at Pysimilar.

Explore it

Explore it and twist it to your own use case. In case of any questions, feel free to reach me directly at [email protected].

Issues

In case you have any difficulties or issues while trying to run the script, you can raise an issue.

Pull Requests

If you have something to add, I welcome pull requests with improvements; helpful contributions will be merged as soon as possible.

Give it a Star

If you find this repo useful, give it a star so that many people can get to know it.

Credits

All the credit goes to kalebu.

Contributors

favour-olumese, kalebu, kalebujordan, naereen, tr1ms

plagiarism-checker-python's Issues

Error while running app.py

I am facing the following error:

Traceback (most recent call last):
  File "/content/Plagiarism-checker-Python/app.py", line 14, in
    vectors = vectorize(student_notes)
  File "/content/Plagiarism-checker-Python/app.py", line 10, in vectorize
    def vectorize(Text): return TfidfVectorizer().fit_transform(Text).toarray()
  File "/usr/local/lib/python3.10/dist-packages/sklearn/feature_extraction/text.py", line 1846, in fit_transform
    X = super().fit_transform(raw_documents)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/feature_extraction/text.py", line 1202, in fit_transform
    vocabulary, X = self._count_vocab(raw_documents,
  File "/usr/local/lib/python3.10/dist-packages/sklearn/feature_extraction/text.py", line 1133, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
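
scikit-learn raises this error when it finds no tokens at all in the input. One way that can happen is an empty document list, for instance when the script runs from a directory that holds no .txt files; whether that is the cause here is an assumption. A hedged sketch of a pre-check that fails earlier with a clearer message is shown below. `require_tokens` is a hypothetical helper, and its pattern follows the shape of scikit-learn's default token pattern (words of two or more characters).

```python
import re

# Same shape as scikit-learn's default token pattern: words of 2+ characters.
TOKEN = re.compile(r"\b\w\w+\b")


def require_tokens(documents: list[str]) -> list[str]:
    """Return `documents` unchanged, or raise a clearer error up front."""
    if not documents:
        raise ValueError(
            "no documents found; are the .txt files in the working directory?"
        )
    if not any(TOKEN.search(text) for text in documents):
        raise ValueError("documents contain no words to vectorize")
    return documents
```

Passing the loaded texts through `require_tokens(...)` before vectorizing would show which of the two cases applies.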

Cannot handle big txt files

in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 832: character maps to
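
The 'charmap' codec in this message suggests the file was decoded with a platform default code page, not UTF-8, so the problem is likely the encoding, not the file size. A minimal sketch of a more tolerant reader follows, assuming the documents are UTF-8; `read_document` is a hypothetical helper, not part of the script.

```python
from pathlib import Path


def read_document(path: str) -> str:
    """Read a text file as UTF-8, replacing undecodable bytes with U+FFFD."""
    # errors="replace" keeps the run going when a file has a few stray bytes.
    return Path(path).read_text(encoding="utf-8", errors="replace")
```

If the files turn out to use another encoding, passing that encoding name explicitly is the safer choice, since replaced characters can lower the similarity score slightly.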
