vyacheslav-smirnov / sdc

This project forked from intelpython/sdc

A compiler-based big data framework in Python

Home Page: https://intellabs.github.io/hpat-doc/

License: BSD 2-Clause "Simplified" License

Intel® Scalable Dataframe Compiler

Numba* Extension For Pandas* Operations Compilation

Intel® Scalable Dataframe Compiler (Intel® SDC) is an extension of Numba* that enables compilation of Pandas* operations. It automatically vectorizes and parallelizes the code by leveraging modern hardware instructions and by utilizing all available cores.

Intel® SDC documentation can be found here.

Note

For maximum performance and stability, please use numba from intel/label/beta channel.

Installing Binary Packages (conda and wheel)

Intel® SDC is available on the Anaconda Cloud intel/label/beta channel. The distribution includes Intel® SDC for Python 3.6 and Python 3.7 on Windows and Linux.

The Intel® SDC conda package can be installed using the steps below:

> conda create -n sdc-env python=<3.7 or 3.6>
> conda activate sdc-env
> conda install sdc -c intel/label/beta -c intel -c defaults -c conda-forge --override-channels

The Intel® SDC wheel package can be installed using the steps below:

> conda create -n sdc-env python=<3.7 or 3.6> pip
> conda activate sdc-env
> pip install --index-url https://pypi.anaconda.org/intel/label/beta/simple --extra-index-url https://pypi.anaconda.org/intel/simple --extra-index-url https://pypi.org/simple sdc
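
After either installation path, a quick check can confirm that the active environment matches the supported Python versions and that the package can be found. This is a minimal sketch using only the standard library; `check_environment` and `SUPPORTED_PYTHONS` are hypothetical names introduced for illustration and are not part of Intel® SDC:

```python
import importlib.util
import sys

# Python versions named in the installation instructions above.
SUPPORTED_PYTHONS = {(3, 6), (3, 7)}


def check_environment(package="sdc", version=None):
    """Return (python_ok, package_found) for the active interpreter.

    `version` defaults to the running interpreter's version and can be
    overridden with a tuple such as (3, 7, 4).
    """
    major_minor = tuple((version or sys.version_info)[:2])
    python_ok = major_minor in SUPPORTED_PYTHONS
    # find_spec locates a top-level package without importing it.
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


if __name__ == "__main__":
    python_ok, package_found = check_environment()
    print("Python version supported:", python_ok)
    print("sdc importable:", package_found)
```

Run it inside the activated sdc-env environment; if either value is False, recreating the environment with the commands above is the simplest fix.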

Building Intel® SDC from Source on Linux

We use the Anaconda distribution of Python to set up the Intel® SDC build environment.

If you do not have conda, we recommend using Miniconda3:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
export PATH=$HOME/miniconda3/bin:$PATH

Intel® SDC can be built with either conda-build or setuptools. Follow one of the procedures below to install Intel® SDC and its dependencies on Linux.

Building on Linux with conda-build

PYVER=<3.6 or 3.7>
NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env python=$PYVER conda-build
source activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python $PYVER --numpy $NUMPYVER --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels buildscripts/sdc-conda-recipe

Building on Linux with setuptools

PYVER=<3.6 or 3.7>
NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -q -y -c intel/label/beta -c defaults -c intel -c conda-forge python=$PYVER numpy=$NUMPYVER numba=0.49 pandas=0.25.3 pyarrow=0.17.0 gcc_linux-64 gxx_linux-64
source activate sdc-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install

In case of issues, reinstalling in a new conda environment is recommended.

Building Intel® SDC from Source on Windows

Building Intel® SDC on Windows requires Build Tools for Visual Studio 2019 with the component "MSVC v140 - VS 2015 C++ build tools (v14.00)".

Intel® SDC can be built with either conda-build or setuptools. Follow one of the procedures below to install Intel® SDC and its dependencies on Windows.

Building on Windows with conda-build

set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env -q -y python=%PYVER% conda-build conda-verify vc vs2015_runtime vs2015_win-64
conda activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python %PYVER% --numpy %NUMPYVER% --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels buildscripts\sdc-conda-recipe

Building on Windows with setuptools

set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -c intel/label/beta -c defaults -c intel -c conda-forge python=%PYVER% numpy=%NUMPYVER% numba=0.49 pandas=0.25.3 pyarrow=0.17.0
conda activate sdc-env
set INCLUDE=%INCLUDE%;%CONDA_PREFIX%\Library\include
set LIB=%LIB%;%CONDA_PREFIX%\Library\lib
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install

Troubleshooting Windows Build

  • If the cl compiler fails with fatal error LNK1158: cannot run 'rc.exe', add Windows Kits to your PATH (e.g. C:\Program Files (x86)\Windows Kits\8.0\bin\x86).
  • Some errors can be mitigated by running set DISTUTILS_USE_SDK=1 before the build.
  • To set up Visual Studio, you might need to open the registry at HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\VisualStudio\SxS\VS7 and add a string value named 14.0 whose data is C:\Program Files (x86)\Microsoft Visual Studio 14.0\.
  • If the conda or Visual Studio version in use is not the latest, building Intel® SDC can fail with a vague error about a keyword used in a file. Make sure you are using the latest versions.
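
The PATH problem behind the rc.exe error can be checked before starting a build. The following is a minimal sketch that assumes Python is available in the active environment; `find_tool` and its `extra_dirs` parameter are hypothetical names introduced here and are not part of Intel® SDC:

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Return the full path of an executable, or None if it is not found.

    Searches the current PATH first, then any extra directories
    (for example, a Windows Kits bin folder).
    """
    search_path = os.pathsep.join([os.environ.get("PATH", ""), *extra_dirs])
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # On Windows, shutil.which also tries the PATHEXT extensions,
    # so "rc" matches rc.exe.
    location = find_tool("rc")
    print("rc found at:", location if location else "not on PATH")
```

If the tool is reported as missing, passing the Windows Kits directory from the first bullet as `extra_dirs` shows whether adding that directory to PATH would resolve the error.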

Building documentation

Building the Intel® SDC User's Guide requires a pre-installed Intel® SDC package along with a compatible Pandas* version, as well as Sphinx* 2.2.1 or later.

The Intel® SDC documentation includes the output of Intel® SDC examples, which is inserted into the function descriptions in the API Reference.

Use pip to install Sphinx* and extensions:

pip install sphinx sphinxcontrib-programoutput

Currently, the build procedure is based on make, run from the ./sdc/docs/ folder. Although it is not generally required, we recommend cleaning up any previous documentation build by running:

make clean

To build the HTML documentation, run:

make html

The built documentation will be located in the ./sdc/docs/build/html directory. To preview the documentation, open the index.html file.

More information about building and adding documentation can be found here.

Running unit tests

python sdc/tests/gen_test_data.py
python -m unittest

References

Intel® SDC follows the ideas and initial code base of the High-Performance Analytics Toolkit (HPAT). The ideas and methods behind HPAT are described in academic papers.
