Data Visualization using Python and Matplotlib
This project requires pip to run.
Clone the project and install the Python requirements using pip.
$ git clone https://github.com/abhishtagatya/zipflaw
$ cd zipflaw
$ pip install -r requirements.txt
Project directory
.
├── data
│   └── wordcount.json
├── docs
│   └── img
│       └── zipf_py.png
├── essay.txt
├── LICENSE
├── README.md
├── requirements.txt
├── src
│   ├── __init__.py
│   └── visual.py
└── zipf
Before running 'zipf', add some text to 'essay.txt'; otherwise it will throw an error. Make sure to check out 'data/wordcount.json' for the full set of word frequencies.
The project code is in 'src/visual.py'; tweak it to your liking.
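For reference, here is a minimal sketch of how word counts like those in 'data/wordcount.json' could be produced from a text file. It is not the code in 'src/visual.py'. The helper names `count_words` and `save_counts`, the tokenizing regex, and the JSON layout (each word mapped to its count) are assumptions made for illustration.

```python
import json
import re
from collections import Counter


def count_words(text):
    """Return a Counter of the lowercase words in text (hypothetical helper)."""
    # Assumption: a word is a run of letters, optionally joined by apostrophes
    # (so "don't" stays one word and punctuation is dropped).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


def save_counts(counts, path):
    """Write the counts to path as JSON, most frequent word first."""
    # Assumed layout: {"word": count, ...}; the real file may differ.
    ordered = dict(counts.most_common())
    with open(path, "w") as handle:
        json.dump(ordered, handle, indent=2)


# Example usage, with paths taken from the project directory listing:
#
#     with open("essay.txt") as handle:
#         counts = count_words(handle.read())
#     if not counts:
#         raise SystemExit("essay.txt is empty; add some text first.")
#     save_counts(counts, "data/wordcount.json")
```

Checking for an empty result before writing is one way to turn the error on an empty 'essay.txt' into a readable message.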
The sole purpose of this project was to learn about Zipf's Law on the frequency distribution of words. The hypothesis is that words such as "the", "of", "and", "to", and "a" are high-frequency words, while words such as "accordion", "catamaran", and "ravioli" are low-frequency words (in English, of course).
So I decided to test my programming skills by checking whether this was true. You can definitely try this yourself and modify it as you see fit.
Although it works, it still lacks accuracy and good data visualization. For some reason Matplotlib messed up my arrangement, which is why the words may not appear in the correct order when displayed.
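One possible cause of the ordering problem (an assumption, since the plotting code is not shown here) is that the words reach Matplotlib in something other than frequency order, for example when iterating over a plain dict in a Python version that does not preserve insertion order. The sketch below sorts the counts explicitly and plots against integer positions. The names `sort_by_frequency` and `plot_top_words`, the `limit` parameter, and the output filename are hypothetical.

```python
def sort_by_frequency(counts):
    """Return (words, frequencies) ordered from most to least frequent."""
    # Sort by descending count; ties are broken alphabetically so the
    # result is the same on every run.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    words = [word for word, _ in ordered]
    frequencies = [count for _, count in ordered]
    return words, frequencies


def plot_top_words(counts, limit=20, output="zipf_sorted.png"):
    """Save a bar chart of the `limit` most frequent words (hypothetical helper)."""
    # Imported here so sort_by_frequency stays usable without Matplotlib.
    import matplotlib.pyplot as plt

    words, frequencies = sort_by_frequency(counts)
    words, frequencies = words[:limit], frequencies[:limit]

    # Plot against integer positions, then label the ticks, so the bars
    # keep the sorted order no matter how the words themselves would sort.
    positions = list(range(len(words)))
    fig, ax = plt.subplots()
    ax.bar(positions, frequencies)
    ax.set_xticks(positions)
    ax.set_xticklabels(words, rotation=90)
    ax.set_xlabel("Word (ranked by frequency)")
    ax.set_ylabel("Occurrences")
    fig.tight_layout()
    fig.savefig(output)
    plt.close(fig)
```

For example, `sort_by_frequency({"of": 3, "the": 5, "and": 3, "ravioli": 1})` returns the words as `["the", "and", "of", "ravioli"]` with frequencies `[5, 3, 3, 1]`.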
Note that you should put enough text in 'essay.txt', or whichever text file you feed it. The results seem to be more accurate with more text, so the longer the text, the more accurate the results.
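To put a number on "accurate", one option is to compare the observed counts with what an ideal Zipf curve would predict. In its simplest form, Zipf's Law says the frequency of the word at rank r is roughly proportional to 1 / r. The sketch below assumes that simplest form (exponent 1) and scales the curve to the most frequent word. The function name `zipf_comparison` is hypothetical and not part of this project.

```python
def zipf_comparison(frequencies):
    """Pair each observed count with the value an ideal Zipf curve predicts.

    frequencies: counts already sorted from most to least frequent.
    Returns a list of (rank, observed, predicted) tuples.
    """
    if not frequencies:
        return []
    # float() keeps the division exact on both Python 2 and Python 3.
    top = float(frequencies[0])
    rows = []
    for rank, observed in enumerate(frequencies, start=1):
        # Assumption: exponent 1, so the prediction is top / rank.
        predicted = top / rank
        rows.append((rank, observed, predicted))
    return rows
```

With a short text the observed and predicted columns can differ a lot; with a longer text they would be expected to move closer together if the text follows Zipf's Law.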
- Python 2/3 - Python Software Foundation
- Matplotlib - Data Visualization Library
- Abhishta Gatya - Computer Science Student