Coder Social home page Coder Social logo

codezonediitj / pydatastructs Goto Github PK

View Code? Open in Web Editor NEW
199.0 5.0 269.0 644 KB

A python package for data structures and algorithms

Home Page: https://pydatastructs.readthedocs.io/en/stable/

License: Other

Python 66.70% C++ 33.30%
data-structures algorithms algorithms-and-data-structures sports-programming computer-science python3 hacktoberfest closember

pydatastructs's Introduction

PyDataStructs

Build Status Discord Join the chat at https://gitter.im/codezonediitj/pydatastructs Discuss at pydatastructs@googlegroups.com contributions welcome codecov

About

  • PyDataStructs project aims to be a Python package for various data structures and algorithms (including their parallel implementations).

  • We are also working on providing C++ backend via Python C-API for high performance use cases.

Why PyDataStructs?

  • Single package for all your data structures and algorithms

  • Consistent and Clean Interface - The APIs we have provided are consistent with each other, clean, and easy to use. We make sure of that before adding any new data structure or algorithm.

  • Well Tested - We thoroughly test our code before making any new addition to PyDataStructs. 99 percent lines of our code have already been tested by us.

Installation

If you are using Anaconda/Mamba, you can setup your development environment by executing the following commands,

conda env create --file environment.yml
conda activate pyds-env

You can install the library by running the following command,

python scripts/build/install.py

For development purposes i.e., if you intend to be a contributor,

python scripts/build/develop.py

Make sure you change your working directory to pydatastructs before executing any of the above commands. Also, your python version should be at least 3.8.

Testing

For testing your patch locally follow the steps given below,

  1. Install pytest-cov. Skip this step if you are already having the package.
  2. Run, python3 -m pytest --doctest-modules --cov=./ --cov-report=html. Look for, htmlcov/index.html and open it in your browser, which will show the coverage report. Try to ensure that the coverage is not decreasing by more than 1% for your patch.

For a good visualisation of the different data structures and algorithms, refer the following websites:

You can use the examples given in the following book as tests for your code:

Light weighted testing (without benchmarks)

Make sure you have activated the conda environment: pyds-env and your working directory is ../pydatastructs.

In the terminal, run: python -c "from pydatastructs.utils.testing_util import test; test()".

This will run all the test files, except benchmark tests. This should be used if benchmark tests are computationally too heavy to be run on your local machine.

Why do we use Python?

  • As we know Python is an interpreted language and hence executing programs in it is slower as compared to C++.

  • We still decided to use Python because the software development can happen at a much faster pace and it is much easier to test various software designs and APIs as coding them out takes no time in Python.

  • However, keeping the need of the users in mind, we are also working on providing a C++ backend, which will happen quickly as we would be required to just translate the tested code rather than writing it from scratch.

How to contribute?

Follow the steps given below,

  1. Fork, https://github.com/codezonediitj/pydatastructs/
  2. Execute, git clone https://github.com/codezonediitj/pydatastructs/
  3. Change your working directory to ../pydatastructs.
  4. Execute, git remote add origin_user https://github.com/<your-github-username>/pydatastructs/
  5. Execute, git checkout -b <your-new-branch-for-working>.
  6. Make changes to the code.
  7. Add your name and email to the AUTHORS, if you wish to.
  8. Execute, git add ..
  9. Execute, git commit -m "your-commit-message".
  10. Execute, git push origin_user <your-current-branch>.
  11. Make PR.

That's it, 10 easy steps for your first contribution. For future contributions just follow steps 5 to 10. Make sure that before starting work, always checkout to master and pull the recent changes using the remote origin and then start following steps 5 to 10.

See you soon with your first PR.

It is recommended to go through the following links before you start working.

Guidelines

We recommend you to join our discord channel for discussing anything related to the project.

Please follow the rules and guidelines given below,

  1. Follow the numpydoc docstring guide.
  2. If you are planning to contribute a new data structure then first raise an issue for discussing the API, rather than directly making a PR. Please go through Plan of Action for Adding New Data Structures.
  3. For the first-time contributors we recommend not to take a complex data structure, rather start with beginner or easy.
  4. We don't assign issues to any individual. Instead, we follow First Come First Serve for taking over issues, i.e., if one contributor has already shown interest then no comment should be made after that as it won't be considered. Anyone willing to work on an issue can comment on the thread that he/she is working on and raise a PR for the same.
  5. Any open PR must be provided with some updates after being reviewed. If it is stalled for more than 4 days, it will be labeled as Please take over, meaning that anyone willing to continue that PR can start working on it.
  6. PRs that are not related to the project or don't follow any guidelines will be labeled as Could Close, meaning that the PR is not necessary at the moment.

The following parameters are to be followed to pass the code quality tests for your Pull Requests,

  1. There should not be any trailing white spaces at any line of code.
  2. Each .py file should end with exactly one new line.
  3. Comparisons involving True, False, and None should be done by reference (using is, is not) and not by value(==, !=).

Keep contributing!!

Thanks to these wonderful people ✨✨:

https://github.com/codezonediitj/pydatastructs/graphs/contributors

pydatastructs's People

Contributors

abekaesh avatar aimaanhasan avatar akshatahuja1 avatar anirudhsai20 avatar arathyrose avatar arvind-raj06 avatar carolluca avatar czgdp1807 avatar din1881 avatar harshcasper avatar harsheetkakar avatar henokb avatar iamrajiv avatar jaythorat avatar jeanpierremr avatar jprillaman avatar kajalsinghbaghel avatar kiran-venkatesh avatar kishan-ved avatar manaswinidas avatar pratikgl avatar prshnt19 avatar rupesh-darimisetti avatar sak-codes avatar saptashrungi avatar shivangi821 avatar sjathin avatar smit-create avatar taruntomar122 avatar youknowqyh avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

pydatastructs's Issues

Use DynamicOneDimensionalArray in ArrayStack

Description of the problem

Currently we use fixed size arrays in ArrayStack which is nothing but wastage of memory. To reduce that problem, DynamicOneDimensionalArrayshould be used inside the methods of ArrayStack.

The way of using is left to the one who will fix the issue.

Example of the problem

References/Other comments

Add m-ary trees

Description of the problem

Adding m-ary trees will be a nice idea. May be a array based(allows printing the trees) or pointer(better space utilisation) or both would be better.

Example of the problem

References/Other comments

.. [1] https://en.wikipedia.org/wiki/M-ary_tree

Customised deepcopy for pydatastructs

Description of the problem

Currently copy.deepcopy doesn't work for classes which inherit other classes. So, to fix this we will be needing our own deepcopy function. It should be added in utils.misc_util.

Example of the problem

References/Other comments

#22

Add binary heap

Description of the problem

A new file is to be created under the trees directory named, heaps.py and in that implementation of binary heap is to be added.
Note that before making a PR first propose an API and class structure below. Once finalized, then move on to make a PR.

Example of the problem

References/Other comments

https://en.m.wikipedia.org/wiki/Binary_heap

Add Cartesian tree

Description of the problem

Example of the problem

References/Other comments

Add Linked Lists

Description of the problem

Currently the package does not have Queues or Linked Lists.
While working with issue #22-Add Queue
It was discovered that to add a Queue which is efficient enough to perform popleft() action in O(1) time we need linked lists to implement Queues just like the collections.deque implementaion of a Queue.

Since the original goal of issue #22 was to reduce dependencies on other libraries, LInked List must be added to pydatastructs to ensure an efficient implementation of Queues.

Example of the problem

sample_queue = Queue()
for i in range(N) # N be a large number
... sample_queue.append('#')
...
sample_queue.popleft() # Takes O(N) time

References/Other comments

See the original issue -
#22

Read about the efficiency issue here -
https://www.geeksforgeeks.org/deque-in-python/
https://stackoverflow.com/questions/55578874/dequeue-complexity-on-using-python-list-as-a-queue

Make BinaryHeap._heapify use iterative logic

Description of the problem

The recently merged #46 uses recursive logic for heapifying the elements. The maximum recursion depth can cause this logic to fail. Convert that to logic an iterative one.

Example of the problem

References/Other comments

Tests using `raises` are not being asserted

Description of the problem

I was observing the test files and found that tests which use raises are not being asserted which was due to absence of return statement in,

def raises(exception, code):
"""
Utility for testing exceptions.
Parameters
==========
exception
A valid python exception
code: lambda
Code that causes exception
"""
with pytest.raises(exception):
code()

However with merging of #51 raises will return True on successful execution, hence all the tests of the following format,

raises(...)

are to be replace with the format below,

assert raises(...)

Example of the problem

References/Other comments

A typo in the README

Description of the problem

Try to ensure that the coverage is not decreasing by less than 1% for your patch.

Shouldn't it be by more than 1%

Example of the problem

References/Other comments

Allow using array for constructing binary trees

Description of the problem

Idea of using arrays for constructing binary trees originated from #38 (comment) which uses arrays for constructing binary heaps.
The challenge is to make the change without breaking existing API. One approach can be using keyword arguments like, elements and the transformed API would look like,

def __new__(cls, key=None, root_data=None, comp=None,
                is_order_statistic=False, *args, **kwargs):

If the above is done then the existing tests for binary_trees.py can be shortened.

Example of the problem

References/Other comments

Add Queue

Description of the problem

Currently we use Queue from collections module. We want that this should be implemented in pydatastructs itself, however, without breaking the current code, i.e., the API should be similar to the one of collections module.

API:

  1. An empty queue should be constructed by using Queue().

  2. New elements should be added to the end by using .append(elem).

  3. The elements should be removed from the front using .popleft().

  4. The number of elements in the queue should be returned by len i.e, __len__ method should be over-rided in the Queue class.

Example of the problem

References/Other comments

See the code of OneDimensionalSegmentTree in pydatastructs.tree.space_partitioning.trees and BinaryTreeTraversal.breadth_first_search to know more about the API requirements.

Replace `Node` by `TreeNode` and `GraphNode`

Description of the problem

Currently we have a generic Node class which basically has methods and data members for trees. However, when we will add graphs, the above Node class will not be compatible.
It would be better to use specific classes for, trees i.e., TreeNode(rename Node -> TreeNode) and a new GraphNode.
This issue aims at discussing the API for the latter.

Example of the problem

References/Other comments

Array implementation of doubly ended queue (deque)

Description of the problem

I propose the implementation of a doubly ended queue (deque) using arrays. A deque allows insertions and deletions from both the front and the rear end. The various methods available will be insert front, insert rear, delete front , delete rear in addition to returning the element at front or rear.

Hope this task will be assigned to me

Incorrectness in Documentation

Description of the problem

There are some places in the documentation where it is said that Stack to implement iterative version of recursive algorithms but in the code Queue is used. A fix is needed in the documentation and it should be mentioned that Queue is used instead of Stack.

The task is to search the entire documentation(which is not too large at the time of the creation of this issue) and correct it. No change in code is required.

Example of the problem

References/Other comments

Track coverage

Description of the problem

The coverage should be near to 100%. However, currently, it stands at 97%. Write tests for the uncovered lines, to increase the coverage.

Example of the problem

The uncovered lines of the code are shown in red as can seen in this report. Similarly, look at other files, and write tests so that the red lines are covered.

References/Other comments

This issue will never be closed and will be pinned.

Add code coverage bot

Description of the problem

Currently we don't have any code coverage mechanism. Please suggest your ideas to automatically compute the code coverage of every PR.

Example of the problem

References/Other comments

Improve README.md

Description of the problem

The README of the project should be updated. We need to replace, Who are we? and How are we different? with an About section which should contain the following words,

Currently, the aim of the project is to be a Python package for various data structures in computer science. In addition, we are also working on including parallel algorithms. To the best of our knowledge there doesn't exist any well designed library/package which has covered most of the data structures and algorithms including their parallel implementation. 

In future, i.e, after a few releases of the package when the software design will become stable, we also aim to provide APIs for the code in C++ and Java as well. For clarity, the interested students aren't expected to know C++ and Java. Knowledge of Python is enough for this project.

Example of the problem

References/Other comments

Use python's builtin abstract base classes

Description of the problem

The current code creates a new abstract class, as in the example below. Python has dedicated abc.ABCs for abstract base classes. It would provide us more flexibility and conform better to python's standards.

Example of the problem

https://github.com/codezonediitj/pydatastructs/blob/master/pydatastructs/linear_data_structures/arrays.py

References/Other comments

https://docs.python.org/3/library/abc.html
https://www.python.org/dev/peps/pep-3119/

Armstrong no

Hi! I want to add code for armstrong number in this project. Plz assign this task to me as a gssoc'20 participant.

Add KD Trees

Description of the problem

K dimensional trees is a quite important data structure for storing high dimensional data. This issue aims at adding the same.
The task is planned to be completed in two phases, first we will add static k-d trees and in the second phase we will add, dynamic k-d trees.

API Design and Class Hierarchy Overview

Static k-d trees - These trees will be built on a given data and once built, it will not be possible to modify it.
The class name for such trees is, StaticKDTree. It will support the following operations:

  1. __new__ - Meant for building the tree.
  2. nearest_neighbour
  3. find_min
  4. find_max
  5. search

Dynamic k-d trees - These trees will be built on a given data and it will be possible to modify the tree.
The class name for such trees is, DynamicKDTree. In addition to the operations supported by static k-d trees, the following operations will be supported,

  1. insert
  2. delete
  3. balance

Example of the problem

References/Other comments

[1] https://en.wikipedia.org/wiki/K-d_tree
[2] https://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/kdtrees.pdf

Add information about Code quality tests in guidelines

Description of the problem

After merging #72 the code quality is checked via CI, so information is required to be added in REDAME specifically under https://github.com/codezonediitj/pydatastructs#guidelines

The following points are to be added,

  1. There should be no trailing white spaces in any line.
  2. There should only one new line at the end of the each .py file.
  3. Any comparison involving True, False or None should be done by reference(is, is not) and not by value(==, !=).

Example of the problem

References/Other comments

Add `.vs` files to `.gitignore`

Description of the problem

.vs files are not ignore as can be seen in #66
They should be added to .vs files as they are automatically generated by system, probably only on Windows.

Example of the problem

References/Other comments

Remove imports from `__future__`

Description of the problem

As we are not concerned with Python 2, removing the following lines,

from __future__ import print_function, division

will be less confusing for the developers.

Example of the problem

References/Other comments

Add information for Devs in README.md

Description of the problem

Currently, the README.md has very little information for developers/contributors.
IMO, the following should be added after https://github.com/codezonediitj/pydatastructs#installation

Testing
----------
For testing your patch locally follow the steps given below,

1. Install [pytest-cov](https://pypi.org/project/pytest-cov/). Skip this step if you are already having the package.
2. Run, `python3 -m pytest --doctest-modules --cov=./ --cov-report=html`. Look for, `htmlcov/index.html` and open it in your browser, which will show the coverage report. Try to ensure that the coverage is not decreasing by less than 1% for your patch. 

Example of the problem

References/Other comments

Use comparison by reference for bool type objects

Description of the problem

Currently there are many places in the code base which use == and != while comparing bool type objects.
Replace all such occurrences of == <bool-type-object> with is <bool-type-object> and != <bool-type-object> with is not <bool-type-object>. For example, == None with is None and != None with is not None.

Example of the problem

References/Other comments

[1] https://dbader.org/blog/difference-between-is-and-equals-in-python

Add breadth first search for graphs

Description of the problem

As graphs are now added to the master branch we can start adding graph algorithms. For a start we will move ahead with breadth first search(BFS). We look forward to add two versions of the algorithm,

  1. Serial BFS: Standard BFS using FIFO queue. Open for anyone as it is pretty easy.
  2. Parallel BFS: Much harder as there are various papers presenting different versions of implementation. We plan to use the following papers,
    i. http://supertech.csail.mit.edu/papers/pbfs.pdf (not used in the implementation)
    ii. https://people.eecs.berkeley.edu/~aydin/sc11_bfs.pdf (not used in the implementation)

Example of the problem

References/Other comments

Improve examples

Description of the problem

In the following snippet,

>>> from pydatastructs import DoublyLinkedList
>>> dll = DoublyLinkedList()
>>> dll.append(6)
>>> dll[0].data
6
>>> dll.head.data
6
>>> dll.append(5)
>>> dll.append_left(2)
>>> print(dll)
[2, 6, 5]
>>> dll[0].data = 7.2
>>> dll.extract(1).data
6
>>> print(dll)

print has been used for showing string output, instead, str should have been used. Changes should be made in the above part of the code as well as such things should be checked in other parts of the codebase as well and a single PR should be made for such improvements.

Example of the problem

References/Other comments

Add quality tests for indendation size

Description of the problem

We have no tests for indendation size. PEP 8 suggests to keep it to 4 spaces. Tests should be added to utils/test_code_quality.py for checking this.

Example of the problem

References/Other comments

`obj = object.__new__(cls)`

Description of the problem

Currently any class object is created by,

obj = object.__new__(cls)

For better readability, each such line should be replaced by the abstract superclass of that class.
Something like,

obj = <name_of_abstract_super_class>.__new__(cls)

For example, in arrays.py, OneDimensionalArray is created by,


However, the above should be replaced by,

obj = Array.__new__(cls)

Example of the problem

References/Other comments

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.