
ikegami-yukino / madoka-python


Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library)

License: BSD 3-Clause "New" or "Revised" License

C++ 95.78% Python 3.54% Jupyter Notebook 0.67%
data-sketches counter python-wrapper memory-efficient probabilistic-data-structures


madoka-python's People

Contributors

ikegami-yukino


madoka-python's Issues

Is it possible to filter out low counts?

Hi, thanks for the package.

I'm running word counts on large corpora (e.g. Wikipedia text) and found this package after Python dictionaries started giving me memory errors.

I'm new to sketching algorithms, so forgive my ignorance, but is it possible to filter out low counts after building the sketch object? For example, I'd like to keep only the top 1 million words, sorted by count in descending order, to save some memory.

Tangential question: once the counts are in a sketch object, is it true that they can't be converted to a dictionary because the sketch holds no key information?

Cheers
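
For context on both questions: a Count-Min Sketch in general keeps only a grid of counters, not the keys themselves, so the keys cannot be listed back out of it. A common workaround is to keep a small side table of candidate keys while counting. The following is a minimal pure-Python sketch of that idea, not madoka's API; the names CountMinSketch and top_k_words and the width and depth defaults are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Tiny pure-Python Count-Min Sketch (illustrative, not madoka's API)."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One hash per row, obtained by salting BLAKE2b with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, amount=1):
        for row, col in self._cells(key):
            self.rows[row][col] += amount

    def estimate(self, key):
        # Collisions can only inflate a cell, so the minimum over rows
        # is the tightest estimate; it never undercounts.
        return min(self.rows[row][col] for row, col in self._cells(key))


def top_k_words(words, k, sketch=None):
    """Return up to k (word, estimated_count) pairs, highest count first."""
    if sketch is None:
        sketch = CountMinSketch()
    candidates = {}  # word -> latest estimate; holds at most k entries
    for word in words:
        sketch.add(word)
        est = sketch.estimate(word)
        if word in candidates or len(candidates) < k:
            candidates[word] = est
        else:
            # Linear scan for the weakest candidate; a heap would scale
            # better for large k, but this keeps the example short.
            weakest = min(candidates, key=candidates.get)
            if est > candidates[weakest]:
                del candidates[weakest]
                candidates[word] = est
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
```

Calling top_k_words(tokens, 1_000_000) on a token stream would keep at most one million keys in memory next to the fixed-size counter grid. The counts it returns are estimates and may be higher than the true counts, and a word that becomes frequent only late in the stream can be missed by this simple eviction rule.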

Installation issue

I'm having trouble installing this package with pip. I'm using Python 3.7.1 on macOS High Sierra.

pip install madoka

This results in the following error:

Collecting madoka
  Using cached https://files.pythonhosted.org/packages/da/eb/95288b1c4aa541eb296a6271e3f8c7ece03b78923ac47dbe95d2287d9f5e/madoka-0.7.1.tar.gz
Building wheels for collected packages: madoka
  Running setup.py bdist_wheel for madoka ... error
  Complete output from command /anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/tf/8lz5km5n0n509s9y_f1wpj3h0000gn/T/pip-install-dg494dpd/madoka/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/tf/8lz5km5n0n509s9y_f1wpj3h0000gn/T/pip-wheel-xxs7ruaf --python-tag cp37:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.7-x86_64-3.7
  creating build/lib.macosx-10.7-x86_64-3.7/madoka
  copying madoka/madoka.py -> build/lib.macosx-10.7-x86_64-3.7/madoka
  copying madoka/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/madoka
  running build_ext
  building '_madoka' extension
  creating build/temp.macosx-10.7-x86_64-3.7
  creating build/temp.macosx-10.7-x86_64-3.7/src
  gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c madoka_wrap.cxx -o build/temp.macosx-10.7-x86_64-3.7/madoka_wrap.o -std=c++11
  warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
  madoka_wrap.cxx:3059:10: fatal error: 'stdexcept' file not found
  #include <stdexcept>
           ^~~~~~~~~~~
  1 warning and 1 error generated.
  error: command 'gcc' failed with exit status 1

  ----------------------------------------
  Failed building wheel for madoka
  Running setup.py clean for madoka
Failed to build madoka
Installing collected packages: madoka
  Running setup.py install for madoka ... error
    Complete output from command /anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/tf/8lz5km5n0n509s9y_f1wpj3h0000gn/T/pip-install-dg494dpd/madoka/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/tf/8lz5km5n0n509s9y_f1wpj3h0000gn/T/pip-record-eg369wuf/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.macosx-10.7-x86_64-3.7
    creating build/lib.macosx-10.7-x86_64-3.7/madoka
    copying madoka/madoka.py -> build/lib.macosx-10.7-x86_64-3.7/madoka
    copying madoka/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/madoka
    running build_ext
    building '_madoka' extension
    creating build/temp.macosx-10.7-x86_64-3.7
    creating build/temp.macosx-10.7-x86_64-3.7/src
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c madoka_wrap.cxx -o build/temp.macosx-10.7-x86_64-3.7/madoka_wrap.o -std=c++11
    warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
    madoka_wrap.cxx:3059:10: fatal error: 'stdexcept' file not found
    #include <stdexcept>
             ^~~~~~~~~~~
    1 warning and 1 error generated.
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/tf/8lz5km5n0n509s9y_f1wpj3h0000gn/T/pip-install-dg494dpd/madoka/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/tf/8lz5km5n0n509s9y_f1wpj3h0000gn/T/pip-record-eg369wuf/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/tf/8lz5km5n0n509s9y_f1wpj3h0000gn/T/pip-install-dg494dpd/madoka/
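
The compiler warning in the log suggests switching to the libc++ standard library; with clang the flag is spelled -stdlib=libc++. One thing that might be worth trying is passing that flag through the CFLAGS and CXXFLAGS environment variables when running pip. The snippet below is a hedged sketch of that idea, untested against this exact setup; the helper names libcxx_build_env and pip_install are hypothetical.

```python
import os
import subprocess
import sys


def libcxx_build_env(base_env=None):
    """Copy an environment and append -stdlib=libc++ to the compiler flags."""
    env = dict(os.environ if base_env is None else base_env)
    for var in ("CFLAGS", "CXXFLAGS"):
        flags = env.get(var, "").split()
        if "-stdlib=libc++" not in flags:
            flags.append("-stdlib=libc++")
        env[var] = " ".join(flags)
    return env


def pip_install(package, env):
    """Run `python -m pip install <package>` with the given environment."""
    completed = subprocess.run(
        [sys.executable, "-m", "pip", "install", package],
        env=env,
        check=False,
    )
    return completed.returncode


# Example call (not executed here):
# pip_install("madoka", libcxx_build_env())
```

Nothing is installed until pip_install is actually called. Whether the extra flag is enough depends on the local toolchain, so treat this as a starting point rather than a confirmed fix.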
