
HeavyDB (formerly OmniSciDB)

Home Page: https://heavy.ai

License: Apache License 2.0

gpu database olap visualization sql machine-learning interactive real-time mapd omnisci

heavydb's Introduction

HeavyDB (formerly OmniSciDB)

HeavyDB is an open source SQL-based, relational, columnar database engine that leverages the full performance and parallelism of modern hardware (both CPUs and GPUs) to enable querying of multi-billion row datasets in milliseconds, without the need for indexing, pre-aggregation, or downsampling. HeavyDB can be run on hybrid CPU/GPU systems (Nvidia GPUs are currently supported), as well as on CPU-only systems featuring X86, Power, and ARM (experimental support) architectures. To achieve maximum performance, HeavyDB features multi-tiered caching of data between storage, CPU memory, and GPU memory, and an innovative Just-In-Time (JIT) query compilation framework.

For usage info, see the product documentation, and for more details about the system's internal architecture, check out the developer documentation. Further technical discussion can be found on the HEAVY.AI Community Forum.

The repository includes a number of third party packages provided under separate licenses. Details about these packages and their respective licenses are available in ThirdParty/licenses/index.md.

Downloads and Installation Instructions

HEAVY.AI provides pre-built binaries for Linux for stable releases of the project:

Distro Package type CPU/GPU Repository Docs
CentOS RPM CPU https://releases.heavy.ai/os/yum/stable/cpu https://docs.heavy.ai/installation-and-configuration/installation/installing-on-centos/centos-yum-gpu-ee
CentOS RPM GPU https://releases.heavy.ai/os/yum/stable/cuda https://docs.heavy.ai/installation-and-configuration/installation/installing-on-centos/centos-yum-gpu-ee
Ubuntu DEB CPU https://releases.heavy.ai/os/apt/dists/stable/cpu https://docs.heavy.ai/installation-and-configuration/installation/installing-on-ubuntu/centos-yum-gpu-ee
Ubuntu DEB GPU https://releases.heavy.ai/os/apt/dists/stable/cuda https://docs.heavy.ai/installation-and-configuration/installation/installing-on-ubuntu/centos-yum-gpu-ee
* tarball CPU https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64-cpu.tar.gz
* tarball GPU https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64.tar.gz
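
As a quick example, the CPU tarball listed above can be fetched and unpacked with standard tools (wget and tar are assumed to be available; the extraction directory is up to you):

wget https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64-cpu.tar.gz
tar xzf heavyai-os-latest-Linux-x86_64-cpu.tar.gz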

Developing HeavyDB: Table of Contents

Links

License

This project is licensed under the Apache License, Version 2.0.

The repository includes a number of third party packages provided under separate licenses. Details about these packages and their respective licenses are available in ThirdParty/licenses/index.md.

Contributing

In order to clarify the intellectual property license granted with Contributions from any person or entity, HEAVY.AI must have a Contributor License Agreement ("CLA") on file that has been signed by each Contributor, indicating agreement to the Contributor License Agreement. After making a pull request, a bot will notify you if a signed CLA is required and provide instructions for how to sign it. Please read the agreement carefully before signing and keep a copy for your records.

Building

If this is your first time building HeavyDB, install the dependencies mentioned in the Dependencies section below.

HeavyDB uses CMake for its build system.

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=debug ..
make -j 4

The following cmake/ccmake options can enable/disable different features:

  • -DCMAKE_BUILD_TYPE=release - Build type and compiler options to use. Options are Debug, Release, RelWithDebInfo, MinSizeRel, and unset.
  • -DENABLE_ASAN=off - Enable address sanitizer. Default is off.
  • -DENABLE_AWS_S3=on - Enable AWS S3 support, if available. Default is on.
  • -DENABLE_CUDA=off - Disable CUDA. Default is on.
  • -DENABLE_CUDA_KERNEL_DEBUG=off - Enable debugging symbols for CUDA kernels. Will dramatically reduce kernel performance. Default is off.
  • -DENABLE_DECODERS_BOUNDS_CHECKING=off - Enable bounds checking for column decoding. Default is off.
  • -DENABLE_FOLLY=on - Use Folly. Default is on.
  • -DENABLE_IWYU=off - Enable include-what-you-use. Default is off.
  • -DENABLE_JIT_DEBUG=off - Enable debugging symbols for the JIT. Default is off.
  • -DENABLE_ONLY_ONE_ARCH=off - Compile GPU code only for the host machine's architecture, speeding up compilation. Default is off.
  • -DENABLE_PROFILER=off - Enable google perftools. Default is off.
  • -DENABLE_STANDALONE_CALCITE=off - Require standalone Calcite server. Default is off.
  • -DENABLE_TESTS=on - Build unit tests. Default is on.
  • -DENABLE_TSAN=off - Enable thread sanitizer. Default is off.
  • -DENABLE_CODE_COVERAGE=off - Enable code coverage symbols (clang only). Default is off.
  • -DPREFER_STATIC_LIBS=off - Statically link dependencies, if available. Default is off. Only works on CentOS.
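
For example, a CPU-only Release build with tests enabled could be configured by combining the flags above (this particular combination is only an illustration):

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=off -DENABLE_TESTS=on ..
make -j 4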

Testing

HeavyDB uses Google Test as its main testing framework. Tests reside under the Tests directory.

The sanity_tests target runs the most common tests. If using Makefiles to build, the tests may be run using:

make sanity_tests
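
Individual GoogleTest binaries built under Tests can also be run directly; for example, a --gtest_filter can restrict ExecuteTest to a subset of cases (the binary path and filter pattern here are assumptions for illustration):

./Tests/ExecuteTest --gtest_filter='Select*'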

AddressSanitizer

AddressSanitizer can be activated by setting the ENABLE_ASAN CMake flag in a fresh build directory. At this time CUDA must also be disabled. In an empty build directory run CMake and compile:

mkdir build && cd build
cmake -DENABLE_ASAN=on -DENABLE_CUDA=off ..
make -j 4

Finally run the tests:

export ASAN_OPTIONS=alloc_dealloc_mismatch=0:handle_segv=0
make sanity_tests

ThreadSanitizer

ThreadSanitizer can be activated by setting the ENABLE_TSAN CMake flag in a fresh build directory. At this time CUDA must also be disabled. In an empty build directory run CMake and compile:

mkdir build && cd build
cmake -DENABLE_TSAN=on -DENABLE_CUDA=off ..
make -j 4

We use a TSAN suppressions file to ignore warnings in third party libraries. Source the suppressions file by adding it to your TSAN_OPTIONS env:

export TSAN_OPTIONS="suppressions=/path/to/heavydb/config/tsan.suppressions"

Finally run the tests:

make sanity_tests

Generating Packages

HeavyDB uses CPack to generate packages for distribution. Packages generated on CentOS with static linking enabled can be used on most other recent Linux distributions.

To generate packages on CentOS (assuming starting from top level of the heavydb repository):

mkdir build-package && cd build-package
cmake -DPREFER_STATIC_LIBS=on -DCMAKE_BUILD_TYPE=release ..
make -j 4
cpack -G TGZ

The first command creates a fresh build directory, to ensure there is nothing left over from a previous build.

The second command configures the build to prefer linking to the dependencies' static libraries instead of the (default) shared libraries, and to build using CMake's release configuration (which enables compiler optimizations). Linking to the static versions of the libraries reduces the number of dependencies that must be installed on target systems.

The last command generates a .tar.gz package. The TGZ can be replaced with, for example, RPM or DEB to generate a .rpm or .deb, respectively.
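
For instance, to produce an RPM rather than a tarball, only the generator name passed to cpack changes:

cpack -G RPM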

Using

The startheavy wrapper script may be used to start HeavyDB in a testing environment. This script performs the following tasks:

  • initializes the data storage directory via initdb, if required
  • starts the main HeavyDB server, heavydb
  • offers to download and import a sample dataset, using the insert_sample_data script

Assuming you are in the build directory, and it is a subdirectory of the heavydb repository, startheavy may be run by:

../startheavy

Starting Manually

It is assumed that the following commands are run from inside the build directory.

Initialize the data storage directory. This command only needs to be run once.

mkdir data && ./bin/initdb data

Start the HeavyDB server:

./bin/heavydb

If desired, insert a sample dataset by running the insert_sample_data script in a new terminal:

../insert_sample_data

You can now start using the database. The heavysql utility may be used to interact with the database from the command line:

./bin/heavysql -p HyperInteractive

where HyperInteractive is the default password. The default user admin is assumed if not provided.
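
heavysql also reads SQL from standard input, so a quick non-interactive sanity check might look like the following (assuming the sample flights_2008_10k dataset offered by insert_sample_data has been imported):

echo "SELECT COUNT(*) FROM flights_2008_10k;" | ./bin/heavysql -p HyperInteractive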

Code Style

Contributed code should compile without generating warnings by recent compilers on most Linux distributions. Changes to the code should follow the C++ Core Guidelines.

clang-format

A .clang-format style configuration, based on the Chromium style guide, is provided at the top level of the repository. Please format your code using a recent version (8.0+ preferred) of ClangFormat before submitting.

To use:

clang-format -i File.cpp
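
To format only the files touched by your current change, one possible approach (a sketch, assuming GNU xargs and that your edits are visible to git diff) is:

git diff --name-only HEAD -- '*.cpp' '*.h' | xargs -r clang-format -i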

clang-tidy

A .clang-tidy configuration is provided at the top level of the repository. Please lint your code using a recent version (6.0+ preferred) of clang-tidy before submitting.

clang-tidy requires all generated files to exist before running. The easiest way to accomplish this is to simply run a full build before running clang-tidy. A build target which runs clang-tidy is provided. To use:

make run-clang-tidy

Note: clang-tidy may make invalid or overly verbose changes to the source code. It is recommended to first commit your changes, then run clang-tidy and review its recommended changes before amending them to your commit.

Note: the clang-tidy target uses the run-clang-tidy.py script provided with LLVM, which may depend on PyYAML. The target also depends on jq, which is used to filter portions of the compile_commands.json file.
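
If those helpers are missing, one way to obtain them on a Debian/Ubuntu system might be (package names assumed):

sudo apt install -y jq python3-yaml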

Dependencies

HeavyDB has the following dependencies:

Package Min Version Required
CMake 3.16 yes
LLVM 9.0 yes
GCC 8.4.0 no, if building with clang
Go 1.12 yes
Boost 1.72.0 yes
OpenJDK 1.7 yes
CUDA 11.0 yes, if compiling with GPU support
gperftools yes
gdal 2.4.2 yes
Arrow 3.0.0 yes

CentOS 7

HeavyDB requires a number of dependencies which are not provided in the common CentOS/RHEL package repositories. A prebuilt package containing all these dependencies is provided for CentOS 7 (x86_64).

Use the scripts/mapd-deps-prebuilt.sh build script to install prebuilt dependencies.

These dependencies will be installed to a directory under /usr/local/mapd-deps. The mapd-deps-prebuilt.sh script also installs Environment Modules in order to simplify managing the required environment variables. Log out and log back in after running the mapd-deps-prebuilt.sh script in order to activate the Environment Modules command, module.

The mapd-deps environment module is disabled by default. To activate for your current session, run:

module load mapd-deps

To disable the mapd-deps module:

module unload mapd-deps

WARNING: The mapd-deps package contains newer versions of packages such as GCC and ncurses which might not be compatible with the rest of your environment. Make sure to disable the mapd-deps module before compiling other packages.

Instructions for installing CUDA are below.

CUDA

It is preferred, but not necessary, to install CUDA and the NVIDIA drivers using the .rpm packages, following the instructions provided by NVIDIA. The rpm (network) method (preferred) will ensure you always have the latest stable drivers, while the rpm (local) method does not require Internet access.

The .rpm method requires DKMS to be installed, which is available from the Extra Packages for Enterprise Linux repository:

sudo yum install epel-release

Be sure to reboot after installing in order to activate the NVIDIA drivers.

Environment Variables

The mapd-deps-prebuilt.sh script includes two files with the appropriate environment variables: mapd-deps-<date>.sh (for sourcing from your shell config) and mapd-deps-<date>.modulefile (for use with Environment Modules, yum package environment-modules). These files are placed in the mapd-deps install directory, usually /usr/local/mapd-deps/<date>. Either of these may be used to configure your environment: the .sh may be sourced in your shell config; the .modulefile needs to be moved to the modulespath.
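
For example, to configure the current shell session, source the generated file, substituting the actual date directory created on your system:

source /usr/local/mapd-deps/mapd-deps-<date>.sh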

Building Dependencies

The scripts/mapd-deps-centos.sh script is used to build the dependencies. Modify and run this script if you would like to change dependency versions or to build on alternative CPU architectures.

cd scripts
module unload mapd-deps
./mapd-deps-centos.sh --compress

macOS

The scripts/mapd-deps-osx.sh script will automatically install and/or update Homebrew and then use it to install all dependencies. Please make sure macOS is completely up to date and Xcode is installed before running. Xcode can be installed from the App Store.

CUDA

mapd-deps-osx.sh will automatically install CUDA via Homebrew and add the correct environment variables to ~/.bash_profile.

Java

mapd-deps-osx.sh will automatically install Java and Maven via Homebrew and add the correct environment variables to ~/.bash_profile.

Ubuntu

Most build dependencies required by HeavyDB are available via APT. Certain dependencies such as Thrift, Blosc, and Folly must be built as they either do not exist in the default repositories or have outdated versions. A prebuilt package containing all these dependencies is provided for Ubuntu 18.04 (x86_64). The dependencies will be installed to /usr/local/mapd-deps/ by default; see the Environment Variables section below for how to add these dependencies to your environment.

Ubuntu 16.04

HeavyDB requires a newer version of Boost than the version which is provided by Ubuntu 16.04. The scripts/mapd-deps-ubuntu1604.sh build script will compile and install a newer version of Boost into the /usr/local/mapd-deps/ directory.

Ubuntu 18.04

Use the scripts/mapd-deps-prebuilt.sh build script to install prebuilt dependencies.

These dependencies will be installed to a directory under /usr/local/mapd-deps. The mapd-deps-prebuilt.sh script above will generate a script named mapd-deps.sh containing the environment variables which need to be set. Simply source this file in your current session (or symlink it to /etc/profile.d/mapd-deps.sh) in order to activate it:

source /usr/local/mapd-deps/mapd-deps.sh

Environment Variables

The CUDA and mapd-deps lib directories need to be added to LD_LIBRARY_PATH; the CUDA and mapd-deps bin directories need to be added to PATH. The mapd-deps-ubuntu.sh and mapd-deps-prebuilt.sh scripts will generate a script named mapd-deps.sh containing the environment variables which need to be set. Simply source this file in your current session (or symlink it to /etc/profile.d/mapd-deps.sh) in order to activate it:

source /usr/local/mapd-deps/mapd-deps.sh

CUDA

Recent versions of Ubuntu provide the NVIDIA CUDA Toolkit and drivers in the standard repositories. To install:

sudo apt install -y \
    nvidia-cuda-toolkit

Be sure to reboot after installing in order to activate the NVIDIA drivers.

Building Dependencies

The scripts/mapd-deps-ubuntu.sh and scripts/mapd-deps-ubuntu1604.sh scripts are used to build the dependencies for Ubuntu 18.04 and 16.04, respectively. The scripts will install all required dependencies (except CUDA) and build those which must be built from source. Modify and run these scripts if you would like to change dependency versions or to build on alternative CPU architectures.

cd scripts
./mapd-deps-ubuntu.sh --compress

Arch

The scripts/mapd-deps-arch.sh script uses yay to install packages from the Arch User Repository, along with custom PKGBUILD scripts for a few packages listed below. If you don't have yay yet, install it first: https://github.com/Jguer/yay#installation

Package Version Requirements:

CUDA

CUDA and the NVIDIA drivers may be installed using the following.

yay -S \
    linux-headers \
    cuda \
    nvidia

Be sure to reboot after installing in order to activate the NVIDIA drivers.

Environment Variables

The cuda package should set up the environment variables required to use CUDA. If you receive errors saying nvcc is not found, the CUDA bin directory needs to be added to PATH: the easiest way to do so is by creating a new file named /etc/profile.d/mapd-deps.sh containing the following:

PATH=/opt/cuda/bin:$PATH
export PATH

heavydb's People

Contributors

alex-brant, alexbaden, andrewseidl, asuhan, corecursion, danielligman, dwayneberry, fleapapa, guilhermeleobas, ienkovich, jack-mapd, m1mc, mapdwei, mattgara, mattpulver, misiugodfrey, norairk, paul-aiyedun, pearu, pressenna, ptumati, sashkiani, shtilman, simoneves, smyatkin-maxim, steveblackmon-mapd, tmostak, vrajpandya, wamsiv, yoonminnam


heavydb's Issues

Compile failed on Ubuntu 16.04: FLEXPP_EXECUTABLE-NOTFOUND: command not found

$ make -j 4

which produces the following errors:

[  0%] Generating gen-cpp/MapD.cpp, gen-cpp/MapD.h, gen-cpp/mapd_constants.cpp, gen-cpp/mapd_types.cpp
Scanning dependencies of target poly2tri
-- Looking for flex++
Scanning dependencies of target calciteserver_thrift
Scanning dependencies of target mapd_thrift
[  0%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/common/shapes.cc.o
[  2%] Building CXX object CMakeFiles/mapd_thrift.dir/gen-cpp/MapD.cpp.o
[  2%] Building CXX object Calcite/CMakeFiles/calciteserver_thrift.dir/__/gen-cpp/CalciteServer.cpp.o
-- Configuring done
-- Generating done
-- Build files have been written to: /home/flow/workspace/git/mapd-core/build
[  2%] Built target rerun_cmake
[  2%] Building CXX object CMakeFiles/mapd_thrift.dir/gen-cpp/mapd_constants.cpp.o
[  3%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/sweep/advancing_front.cc.o
[  3%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/sweep/cdt.cc.o
[  4%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/sweep/sweep.cc.o
[  4%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/sweep/sweep_context.cc.o
Scanning dependencies of target sqlite3
[  4%] Building C object ThirdParty/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o
[  5%] Linking CXX static library libpoly2tri.a
[  5%] Built target poly2tri
[  6%] Building CXX object CMakeFiles/mapd_thrift.dir/gen-cpp/mapd_types.cpp.o
[  6%] Building CXX object Calcite/CMakeFiles/calciteserver_thrift.dir/__/gen-cpp/calciteserver_constants.cpp.o
[  7%] Building CXX object Calcite/CMakeFiles/calciteserver_thrift.dir/__/gen-cpp/calciteserver_types.cpp.o
[  8%] Linking C static library libsqlite3.a
[  8%] Built target sqlite3
[  8%] Linking CXX static library libcalciteserver_thrift.a
[  8%] Built target calciteserver_thrift
Scanning dependencies of target Utils
[  9%] Building CXX object Utils/CMakeFiles/Utils.dir/StringLike.cpp.o
Scanning dependencies of target Fragmenter
[ 10%] Building CXX object Fragmenter/CMakeFiles/Fragmenter.dir/InsertOrderFragmenter.cpp.o
[ 10%] Building CXX object Utils/CMakeFiles/Utils.dir/Regexp.cpp.o
[ 11%] Building CXX object Utils/CMakeFiles/Utils.dir/ChunkIter.cpp.o
[ 11%] Linking CXX static library libUtils.a
[ 11%] Built target Utils
[ 12%] Generating Scanner.cpp
/bin/sh: FLEXPP_EXECUTABLE-NOTFOUND: command not found
Parser/CMakeFiles/ScannerFiles.dir/build.make:60: recipe for target 'Parser/Scanner.cpp' failed
make[2]: *** [Parser/Scanner.cpp] Error 127
CMakeFiles/Makefile2:822: recipe for target 'Parser/CMakeFiles/ScannerFiles.dir/all' failed
make[1]: *** [Parser/CMakeFiles/ScannerFiles.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 12%] Linking CXX static library libFragmenter.a
[ 12%] Built target Fragmenter
[ 12%] Linking CXX static library libmapd_thrift.a
[ 12%] Built target mapd_thrift
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

Improve hoisted literals code generation

Currently, we generate memory loads (from the literal buffer) when the literal node is visited and rely on loop invariant code motion pass to hoist these loads outside of the query loop. This is a heavy-handed use of LICM and might sometimes not be optimized. We should collect the constant expression with a visitor and generate the loads in the entry block.

DECIMAL datatype with a min precision doesn't have outside MIN boundaries

ISSUE

MapD's DECIMAL datatype does not enforce its outside boundaries when defined with minimum precision. To reproduce the issue, run the following SQL statements at the MapDQL cursor:

CREATE TABLE t_decimal(id SMALLINT, val DECIMAL(2,1));
INSERT INTO t_decimal VALUES(1, NULL);
INSERT INTO t_decimal VALUES(2, 1);
INSERT INTO t_decimal VALUES(3, 1.1);
INSERT INTO t_decimal VALUES(4, 1.15);
INSERT INTO t_decimal VALUES(5, 15.1);
INSERT INTO t_decimal VALUES(6, 15.15);
INSERT INTO t_decimal VALUES(7, 123456789012345678);
INSERT INTO t_decimal VALUES(8, 1234567890123456789);
INSERT INTO t_decimal VALUES(9, 12345678901234567890);
INSERT INTO t_decimal VALUES(10, 123456789012345678.1);
INSERT INTO t_decimal VALUES(11, 123456789012345678.12);
SELECT * FROM t_decimal ORDER BY id;

Results from the SELECT statement

mapdql> SELECT * FROM t_decimal ORDER BY id;
id|val
1|NULL
2|1.000000
3|1.100000
4|1.100000
5|15.100000
6|15.100000
7|123456789012345680.000000
8|-610106517247498368.000000
9|-1.000000
10|123456789012345680.000000
11|123456789012345680.000000

Comparisons of Expected and Actual Results

Row 1 (id: 1 | val: null)
- Expected Results: null  = Actual Results: null? -> PASS

Row 2 (id: 2 | val: 1)
- Expected Results: 1.0  == Actual Results: 1.000000? -> FAIL

Row 3 (id: 3 | val: 1.1)
- Expected Results: 1.1  == Actual Results: 1.100000? -> FAIL

Row 4 (id: 4 | val: 1.15)
- Expected Results: 1.15  == Actual Results: 1.000000?-> FAIL

Row 5 (id: 5 | val: 15.1)
- Expected Results: 15.1  == Actual Results: 15.100000? -> FAIL

Row 6 (id: 6 | val: 15.15)
- Expected Results: 15.1  == Actual Results: 15.100000? -> FAIL

Row 7 (id: 7 | val: 123456789012345678)
- Expected Results: 123456789012345678.0  == Actual Results: 123456789012345680.000000 -> FAIL

Row 8 (id: 8 | val: 1234567890123456789)
- Expected Results: null  == Actual Results: -610106517247498368.000000 -> FAIL

Row 9 (id: 9 | val: 12345678901234567890)
- Expected Results: null  == Actual Results: -1.000000 -> FAIL

Row 10 (id: 10 | val: 123456789012345678.1)
- Expected Results: 123456789012345678.0  == Actual Results: 123456789012345680.000000 -> FAIL

Row 11 (id: 11 | val: 123456789012345678.12)
- Expected Results: 123456789012345678.0  == Actual Results: 123456789012345680.000000 -> FAIL

Result Explanation

The results show 1 pass out of 11 test cases. By definition, precision is the number of digits in a number, and scale is the number of digits to the right of the decimal point. For example, the number 123.45 has a precision of 5 and a scale of 2. The NUMERIC datatype has 8 bytes, allowing up to 19 digits of precision. MapD's handling of values outside these boundaries is ineffective.

Suggestions

  1. How many scale digits should a DECIMAL(2,1) value show? MapD always gives 6 scale digits, including trailing zeros. Should MapD enforce the value output based on scale?

  2. What values should MapD give for outside boundaries?

  • For example, what should the value of 15.2 in the DECIMAL(2,1) column be? 15.2 or 15?
  • What should the value containing over 19 precision be?

Let me know your thoughts.

centos 7 Missing Python Libs?

I followed the steps to build for CentOS. Looks like I got all the way down to the end and got this:
-- Performing Test COMPILER_RT_HAS_WD4800_FLAG
-- Performing Test COMPILER_RT_HAS_WD4800_FLAG - Failed
-- Looking for func
-- Looking for func - found
-- Looking for fopen in c
-- Looking for fopen in c - found
-- Looking for dlopen in dl
-- Looking for dlopen in dl - found
-- Looking for shm_open in rt
-- Looking for shm_open in rt - found
-- Looking for pow in m
-- Looking for pow in m - found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Looking for __cxa_throw in stdc++
-- Looking for __cxa_throw in stdc++ - found
-- Looking for i686
-- Looking for i686 - not found
-- Looking for i386
-- Looking for i386 - not found
-- Compiler-RT supported architectures: x86_64
-- Looking for rpc/xdr.h
-- Looking for rpc/xdr.h - found
-- Looking for tirpc/rpc/xdr.h
-- Looking for tirpc/rpc/xdr.h - not found
-- Performing Test COMPILER_RT_HAS_STD_C11_FLAG
-- Performing Test COMPILER_RT_HAS_STD_C11_FLAG - Success
-- Performing Test COMPILER_RT_HAS_VISIBILITY_HIDDEN_FLAG
-- Performing Test COMPILER_RT_HAS_VISIBILITY_HIDDEN_FLAG - Success
-- Performing Test COMPILER_RT_HAS_OMIT_FRAME_POINTER_FLAG
-- Performing Test COMPILER_RT_HAS_OMIT_FRAME_POINTER_FLAG - Success
-- Performing Test COMPILER_RT_HAS_FREESTANDING_FLAG
-- Performing Test COMPILER_RT_HAS_FREESTANDING_FLAG - Success
-- Performing Test COMPILER_RT_HAS_XRAY_COMPILER_FLAG
-- Performing Test COMPILER_RT_HAS_XRAY_COMPILER_FLAG - Failed
-- Performing Test COMPILER_RT_HAS_ATOMIC_KEYWORD
-- Performing Test COMPILER_RT_HAS_ATOMIC_KEYWORD - Success
-- Builtin supported architectures: x86_64
-- Performing Test COMPILER_RT_TARGET_HAS_ATOMICS
-- Performing Test COMPILER_RT_TARGET_HAS_ATOMICS - Success
-- Performing Test COMPILER_RT_TARGET_HAS_FCNTL_LCK
-- Performing Test COMPILER_RT_TARGET_HAS_FCNTL_LCK - Success
-- check-xray-fdr does nothing.
-- Looking for sys/resource.h
-- Looking for sys/resource.h - found
-- Clang version: 4.0.0
-- Performing Test CXX_SUPPORTS_NO_NESTED_ANON_TYPES_FLAG
-- Performing Test CXX_SUPPORTS_NO_NESTED_ANON_TYPES_FLAG - Failed
-- LLD version: 4.0.0
CMake Error at /usr/local/mapd-deps/20170608/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
Could NOT find PythonLibs (missing: PYTHON_LIBRARIES PYTHON_INCLUDE_DIRS)
Call Stack (most recent call first):
/usr/local/mapd-deps/20170608/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
/usr/local/mapd-deps/20170608/share/cmake-3.7/Modules/FindPythonLibs.cmake:255 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
tools/lldb/cmake/modules/LLDBConfig.cmake:179 (find_package)
tools/lldb/CMakeLists.txt:4 (include)

-- Configuring incomplete, errors occurred!
See also "/home/vagrant/github/mapd-core/scripts/build.llvm-4.0.0/CMakeFiles/CMakeOutput.log".
See also "/home/vagrant/github/mapd-core/scripts/build.llvm-4.0.0/CMakeFiles/CMakeError.log".

compiler mapd-core error

[root@localhost ~]# cd mapd-core-master/

[root@localhost mapd-core-master]# cd build/

[root@localhost build]# cmake -DCMAKE_BUILD_TYPE=debug -DMAPD_IMMERSE_DOWNLOAD=off ..

CMake Error at /usr/local/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:138 (message):

Could NOT find Thrift (missing: Thrift_LIBRARY Thrift_VERSION)

Call Stack (most recent call first):

/usr/local/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)

cmake/Modules/FindThrift.cmake:92 (find_package_handle_standard_args)

CMakeLists.txt:84 (find_package)

-- Configuring incomplete, errors occurred!

See also "/root/mapd-core-master/build/CMakeFiles/CMakeOutput.log".

See also "/root/mapd-core-master/build/CMakeFiles/CMakeError.log".

[root@localhost build]#

No OpenCL Support

Why are you guys all using Nvidia's proprietary API to do all this cool stuff?!

cmake CUDA issue

I think everything is installed correctly, but I cannot get past this error:
CMake Error at /usr/local/mapd-deps/20170505/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
Could NOT find CUDA (missing: CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (found
version "8.0")
Call Stack (most recent call first):
/usr/local/mapd-deps/20170505/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
/usr/local/mapd-deps/20170505/share/cmake-3.7/Modules/FindCUDA.cmake:1013 (find_package_handle_standard_args)
CMakeLists.txt:62 (find_package)

Failing with Arrow build

Hi! I'm trying out the latest Arrow (CUDA disabled, debug build) and got this error:

/home/ubuntu/mapd-core/QueryEngine/ResultSetConversion.cpp:351:3: error: ‘Open’ is not a member of ‘arrow::ipc::StreamReader {aka arrow::ipc::RecordBatchReader}’
   ipc::StreamReader::Open(buf_reader, &reader);
   ^
QueryEngine/CMakeFiles/QueryEngine.dir/build.make:1345: recipe for target 'QueryEngine/CMakeFiles/QueryEngine.dir/ResultSetConversion.cpp.o' failed
make[2]: *** [QueryEngine/CMakeFiles/QueryEngine.dir/ResultSetConversion.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:1255: recipe for target 'QueryEngine/CMakeFiles/QueryEngine.dir/all' failed
make[1]: *** [QueryEngine/CMakeFiles/QueryEngine.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

I tried switching to the previous arrow release (0.4.0) unsuccessfully.

Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

How to enable backend rendering?

I am sorry for disturbing you guys here, since I didn't find an appropriate way to contact you about the usage and valuable features of mapd's community version, and I may also be raising the issue in an inappropriate sub-repo of mapd...

What I want to know is how to enable backend rendering in mapd. I don't know whether I should turn it on with a specific setting, or whether it is only available as a feature of the enterprise version.

Thanks for your hard work for contributing to this excellent product. :-)

Implementing a distributed cluster

I've set up a few instances of MapD across a cluster and made sure they can see each other via the hostnames in the cluster.conf. However, when loading to the node configured as the aggregator, I don't see the string-server or any leaves getting data.

Other than setting the cluster.conf, string-server= or cluster= keys what other steps are needed to implement a MapD cluster? I'm using CPU only.

Thanks!

Is Folly needed by mapd?

I followed the installation guide for mapd, and I found an error like this:
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:108 (message): Could NOT find Folly (missing: Folly_LIBRARY Folly_DC_LIBRARY) Call Stack (most recent call first): /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE) cmake/Modules/FindFolly.cmake:76 (find_package_handle_standard_args) CMakeLists.txt:95 (find_package)
Is Folly needed by mapd? And what version of Folly should be installed?

Compile failed in Qt Creator: cmake: No such file or directory

I tried to import mapd-core into Qt Creator 4.3.1 and compiled, came out with the following error.
It is probably caused by misconfiguration in the IDE, but it would be greatly helpful if you could give me a hint.

00:16:58: Running steps for project mapd...
00:16:58: Starting: "/usr/local/bin/cmake" --build . --target all
[  2%] Built target calciteserver_thrift
[  3%] Built target sqlite3
[  4%] Built target SqliteConnector
[  6%] Built target Utils
[  8%] Built target StringDictionary
[  9%] Built target Fragmenter
[ 10%] Built target Calcite
[ 11%] Built target Catalog
[ 12%] Built target CudaMgr
[ 19%] Built target DataMgr
[ 20%] Built target Chunk
[ 23%] Built target Shared
[ 24%] Built target initdb
make[2]: cmake: No such file or directory
make[2]: *** [CMakeFiles/rerun_cmake] Error 1
make[1]: *** [CMakeFiles/rerun_cmake.dir/all] Error 2
make: *** [all] Error 2
00:17:00: The process "/usr/local/bin/cmake" exited with code 2.
Error while building/deploying project mapd (kit: mapd-kit)
When executing step "CMake Build"
00:17:00: Elapsed time: 00:01.

Note that, per the log, Qt Creator uses cmake's server mode to generate targets, e.g.:

Running "/usr/local/bin/cmake -E server --pipe=/tmp/cmake-1SZfel/socket --experimental" in /Users/flow/workspace/github/mapd-core/build.qt.
The C compiler identification is AppleClang 8.0.0.8000042
The CXX compiler identification is AppleClang 8.0.0.8000042
Check for working C compiler: /usr/bin/gcc
Check for working C compiler: /usr/bin/gcc -- works
Detecting C compiler ABI info
Detecting C compiler ABI info - done
Detecting C compile features
Detecting C compile features - done
Check for working CXX compiler: /usr/bin/clang++
Check for working CXX compiler: /usr/bin/clang++ -- works
Detecting CXX compiler ABI info
Detecting CXX compiler ABI info - done
Detecting CXX compile features
Detecting CXX compile features - done
Looking for pthread.h
Looking for pthread.h - found
Looking for pthread_create
Looking for pthread_create - found
Found Threads: TRUE  
Found CUDA: /usr/local/cuda (found version "8.0") 
Found Git: /usr/bin/git (found version "2.10.1 (Apple Git-78)") 
Found Gflags: /usr/local/lib/libgflags.dylib  
Found Glog: /usr/local/lib/libglog.dylib  
Found Thrift: /usr/local/lib/libthrift.dylib  
Found ZLIB: /usr/lib/libz.dylib (found version "1.2.5") 
Found PNG: /usr/local/lib/libpng.dylib (found version "1.6.29") 
Found GDAL: /usr/local/Cellar/gdal/1.11.5_2/lib/libgdal.dylib  
Found Folly: /usr/local/lib/libfolly.dylib  
Found Curses: /usr/lib/libcurses.dylib  
Found JNI: /System/Library/Frameworks/JavaVM.framework  
Looking for bison++
Looking for bison++ -- /usr/local/bin/bison++
Looking for flex++
Looking for flex++ -- /usr/bin/flex++
Configuring done
Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    QT_QMAKE_EXECUTABLE

Joins across multiple GPUs using P2P

I am interested in the use case where some GPUs are connected by P2P and we perform a join involving data only on these GPUs. Does the current implementation use loads/stores or cudaMemcpy (p2p) or cudaMemcpy (via host)?

WITH not behaving as expected

I'm trying to use table expressions, eventually using them in joins but seem to be hitting limits. The following code snippets show the issue:

CREATE TABLE test (ts TIMESTAMP NOT NULL)

This is fine:

SELECT (EXTRACT(EPOCH FROM ts) / 60) * 60 start_time FROM test GROUP BY start_time;

As is:

SELECT * FROM (SELECT (EXTRACT(EPOCH FROM ts) / 60) * 60 start_time FROM test GROUP BY start_time);

However, the following fails:

WITH s AS (SELECT (EXTRACT(EPOCH FROM ts) / 60) * 60 start_time FROM test GROUP BY start_time) SELECT * FROM s
Exception: Validate failed: From line 3, column 10 to line 3, column 21: Column 'start_time' not found in any table

Similarly, I get the same error if I try to use the select expression in a join.

I can start to work round it by using:

WITH s AS (SELECT (EXTRACT(EPOCH FROM ts) / 60) * 60 FROM test GROUP BY ((EXTRACT(EPOCH FROM ts) / 60) * 60)) SELECT EXPR$0 FROM s;

but is there a simpler/better way?

Thanks,

Julian

NUMERIC/DECIMAL datatype has flaw.

ISSUE

MapD's NUMERIC/DECIMAL datatype has flaws. To reproduce the issues, run the following SQL statements at the MapDQL cursor:

CREATE TABLE t_d(id SMALLINT, val DECIMAL(2,1));
INSERT INTO t_d VALUES(1, NULL);
INSERT INTO t_d VALUES(2, 1);
INSERT INTO t_d VALUES(3, 0.1);
INSERT INTO t_d VALUES(4, 1.1);
INSERT INTO t_d VALUES(5, 15.1);
INSERT INTO t_d VALUES(6, 1.15);
INSERT INTO t_d VALUES(7, 15.15);

Results from the SELECT statement

# id|val
# 1|NULL
# 2|1.000000
# 3|0.100000
# 4|1.100000
# 5|15.100000
# 6|1.100000
# 7|15.100000

Explanation

The NUMERIC/DECIMAL datatype uses 8 bytes for a maximum of 19 digits. The datatype has a precision and a scale. Precision is defined as the total number of digits in a numeric or decimal type, and scale as the number of digits to the right of the decimal point. In this case, the val column is defined as DECIMAL(2,1) -> 2 digits of precision and 1 digit of scale. But result rows (#2 - 7) have 6 scale digits (i.e. the expected result for row #4 should be 1.1, not 1.100000). In addition, no rounding to the nearest decimal is performed. For example, the expected result for row #6 should be 1.2 (1.15 -> 1.2), but we get 1.1.

Segfault with sql_execute_df

I'm working on getting the python client to return a pyarrow.Table for cpu shared memory results, but hitting a segfault. Using the branch at https://github.com/TomAugspurger/pymapd/tree/mapd-update, running the following causes a segfault on the call to client.sql_execute_df

# file: segfault.py
import pymapd

query = "select depdelay from flights_2008_10k"

con = pymapd.connect(user="mapd", password="HyperInteractive", dbname="mapd",
                     host="localhost")
client = con._client  # an instance of `MapD.Client`
session = con._session

r = client.sql_execute_df(session, query, 0, 0, -1)
print(r)

(pymapd isn't really required for this example; it's just a convenient way to generate the mapd bindings and create the connection)

Here's the output from the server, started under gdb

(gdb) file bin/mapd_server
Reading symbols from bin/mapd_server...done.
(gdb) run data --port 9091
Starting program: /Users/taugspurger/sandbox/mapd-core/build/bin/mapd_server data --port 9091
[New Thread 0x1403 of process 75950]
warning: unhandled dyld version (15)
E0629 16:04:50.441025 2832335808 MapDHandler.cpp:155] This build isn't CUDA enabled, will run on CPU
E0629 16:04:52.463640 2832335808 MapDHandler.cpp:184] No GPUs detected, falling back to CPU mode

# (this is where I start the script)

[New Thread 0x1207 of process 75950]
[New Thread 0x1503 of process 75950]
[New Thread 0x1603 of process 75950]
[New Thread 0x1703 of process 75950]
[New Thread 0x1803 of process 75950]
[New Thread 0x1903 of process 75950]
[New Thread 0x1a03 of process 75950]
[New Thread 0x1b03 of process 75950]
[New Thread 0x1c03 of process 75950]
[New Thread 0x1d03 of process 75950]
[New Thread 0x1e03 of process 75950]

Thread 3 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x1207 of process 75950]
0x000000010019b71c in MapDHandler::validate_rel_alg (this=0x7000069f9930, _return=..., query_str=..., session_info=...) at /Users/taugspurger/sandbox/mapd-core/ThriftHandler/MapDHandler.cpp:652
652     }

Anything else I can provide to aid in debugging? I'm running mapd-core master (0235498) with arrow enabled, and arrow / pyarrow 0.4.1

Query causes server to core dump

I have the following two tables (lots of other columns removed that are not relevant to this):

mapdql> \d profiles
CREATE TABLE profiles (
id TEXT NOT NULL ENCODING DICT(32),
audience_ids TEXT[] NOT NULL ENCODING DICT(32))

mapdql> \d stories
CREATE TABLE stories (
id TEXT NOT NULL ENCODING DICT(32),
profile_id TEXT NOT NULL ENCODING DICT(32),
dma TEXT ENCODING DICT(32))

The following query works fine:

mapdql> select count(p.id) from stories s, profiles p
..> where s.dma = '501'
..> and s.profile_id = p.id;
EXPR$0
149729
1 rows returned.
Execution time: 26 ms, Total time: 27 ms

However, when I add in an ANY clause to the query to look for a specific audience_id value in the profiles.audience_ids column, the server crashes and core dumps.

mapdql> select count(p.id) from stories s, profiles p
..> where s.dma = '501'
..> and s.profile_id = p.id
..> and 'interest:hispanic' = ANY p.audience_ids;
Thrift: Fri Jun 30 21:07:01 2017 TSocket::write_partial() send() <Host: localhost Port: 9091>Broken pipe
......tons of these broken pipe messages........
Thrift: Fri Jun 30 21:07:02 2017 TSocket::open() connect() <Host: localhost Port: 9091>Connection refused
Thrift error: connect() failed: Connection refused
mapdql> 

And from the server logs:

docker logs c706b41c7300
Backend TCP:  localhost:9091
Backend HTTP: localhost:9090
Frontend Web: localhost:9092
Calcite TCP: localhost:9093
- sleeping for 5s while server starts
Navigate to: http://localhost:9092
terminate called without an active exception
/mapd/startmapd: line 102:     8 Aborted                 (core dumped) ./bin/mapd_server $MAPD_DATA $RO --port $MAPD_TCP_PORT --http-port $MAPD_HTTP_PORT --calcite-port $MAPD_CALCITE_PORT $*
/mapd/startmapd: line 1: kill: (-7) - No such process

Compiler error with folly : gflags invalid

my OS is CENTOS 7
I download folly-2017.04.10.00.zip , but some wrong with folly
[root@localhost folly]# ./configure --prefix=/usr/local/mapd-deps
then "checking for main in -lgflags... yes

checking for gflags viability... no

configure: error: "gflags invalid, see config.log for details"

I built gflags myself as /usr/local/lib/libgflags.a, which is a static lib.
config.log says:

/usr/local/lib/libgflags.a(gflags.cc.o): In function `google::(anonymous namespace)::FlagRegistry::GlobalRegistry()':

gflags.cc:(.text+0xe96): undefined reference to `pthread_rwlock_wrlock'

gflags.cc:(.text+0xeae): undefined reference to `pthread_rwlock_unlock'

gflags.cc:(.text+0xf47): undefined reference to `pthread_rwlock_init'

Us Debian/Ubuntu users could use our own installation script too...

The repository's README.md has a long and non-trivial sequence of installation instructions, some of which may be a bit brittle (e.g. what if you're missing deb-src lines in your apt sources.list?) - I'd say it's enough to merit a dependencies installation script for debian-based distributions under scripts/ - wouldn't you? ... it makes us feel left out relative to, what is it, CentOS and MacOS people.

Build instructions for Ubuntu 16.04

The included commands to install/build the dependencies do not install bison++, which causes make to fail. Manually running apt-get install bison++ fixes the issue.

Build with ARROW_NO_DEPRECATED_API

Since the Arrow C++ public API is still evolving, some APIs may become deprecated and eventually change. If you add this compile definition, it will disable any deprecated APIs so that you have an opportunity to fix them as early as possible, and this will help avoid disruptions when upgrading to new versions of Arrow. We are using this define to assist with graceful deprecations.
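
As a sketch of how such a definition might be passed when configuring a build (a generic CMake invocation, not a documented HeavyDB flag):

cmake -DCMAKE_CXX_FLAGS="-DARROW_NO_DEPRECATED_API" ..
make -j 4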

Build Error on Ubuntu 16.04.2 LTS

awk: symbol lookup error: /usr/local/mapd-deps/20170531/lib/libreadline.so.6: undefined symbol: UP
awk: symbol lookup error: /usr/local/mapd-deps/20170531/lib/libreadline.so.6: undefined symbol: UP
awk: symbol lookup error: /usr/local/mapd-deps/20170531/lib/libreadline.so.6: undefined symbol: UP
gawk: symbol lookup error: /usr/local/mapd-deps/20170531/lib/libreadline.so.6: undefined symbol: UP
configure: WARNING: oops, unrecognised float format:
gawk: symbol lookup error: /usr/local/mapd-deps/20170531/lib/libreadline.so.6: undefined symbol: UP
config.status: error: could not create demos/pexpr-config.h

BIGINT datatype doesn't have outside boundaries

ISSUE

MapD's BIGINT datatype doesn't enforce its outside boundaries. To reproduce the issues, run the following SQL statements at the MapDQL cursor:

CREATE TABLE t_bigint(id SMALLINT, val BIGINT);
INSERT INTO t_bigint VALUES(1, NULL);
INSERT INTO t_bigint VALUES(2, 9223372036854775806);
INSERT INTO t_bigint VALUES(3, 9223372036854775807);
INSERT INTO t_bigint VALUES(4, 9223372036854775808);
INSERT INTO t_bigint VALUES(5, -9223372036854775807);
INSERT INTO t_bigint VALUES(6, -9223372036854775808);
INSERT INTO t_bigint VALUES(7, -9223372036854775809);
SELECT * FROM t_bigint ORDER BY id;

Results from the SELECT statement

mapdql> SELECT * FROM t_bigint ORDER BY id;
id|val
1|NULL
2|9223372036854775806
3|9223372036854775807
4|9223372036854775807
5|-9223372036854775807
6|NULL
7|NULL

Comparisons of Expected and Actual Results

Row 1 (id: 1 | val: null)
- Expected Results: null  = Actual Results: null? -> PASS

Row 2 (id: 2 | val: 9223372036854775806)
- Expected Results: 9223372036854775806  == Actual Results: 9223372036854775806? -> PASS

Row 3 (id: 3 | val: 9223372036854775807)
- Expected Results: 9223372036854775807  == Actual Results: 9223372036854775807? -> PASS

Row 4 (id: 4 | val: 9223372036854775808)
- Expected Results: <Error Msg>  == Actual Results: 9223372036854775807?-> FAIL

Row 5 (id: 5 | val: -9223372036854775807)
- Expected Results: -9223372036854775807  == Actual Results: -9223372036854775807? -> PASS

Row 6 (id: 6 | val: -9223372036854775808)
- Expected Results: -9223372036854775808  == Actual Results: null? -> FAIL

Row 7 (id: 7 | val: -9223372036854775809)
- Expected Results: <Error Msg>  == Actual Results: null -> FAIL

Result Explanation

The results show 3 passes out of 7 test cases. Like the corresponding C++ datatype, the BIGINT datatype has 8 bytes, covering the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. MapD's handling of values outside these boundaries is incorrect and doesn't throw error exceptions.

Suggestions

What values should MapD give for outside boundaries? Here are two suggestions:

  1. Throw an error exception
  2. 0 or null

Let me know your thoughts.

JDBC PreparedStatement won't work with parameter as last element

Using release 3.1.1 and the associated jdbc driver lib.

CREATE TABLE t (x INT NOT NULL);

In Java code, assuming properly set up connection:

PreparedStatement ps = conn.prepareStatement("SELECT x FROM t WHERE x = ?");
ps.setInt(1, 1)

fails with:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
	at com.mapd.jdbc.MapDPreparedStatement.setInt(MapDPreparedStatement.java:175)

This is due to the MapDPreparedStatement at https://github.com/mapd/mapd-core/blob/master/java/mapdjdbc/src/main/java/com/mapd/jdbc/MapDPreparedStatement.java#L84 initially splitting the SQL String using .split. The javadoc for that method states:

Trailing empty strings are therefore not included in the resulting array.

Therefore the subsequent line of code calculates the number of expected parameters as zero.

A simple workaround is to include an additional space at the end of the statement i.e. "SELECT x FROM t WHERE x = ? " but that shouldn't really be necessary.

Multiple table join gives incorrect result when joining to two dimension tables

Setup - Watchdog is not enabled --enable-watchdog=false

drop table flights;
drop table airports_origin;
drop table airports_dest;

create table flights (flight text, dest text, origin text);
insert into flights values ('f1', 'nyc','la');

create table airports_origin (iata text, name text);
create table airports_dest (iata text, name text);

insert into airports_origin values ('nyc', 'New York');
insert into airports_origin values ('la', 'Los Angeles');

insert into airports_dest values ('nyc', 'New York');
insert into airports_dest values ('la', 'Los Angeles');

select count(*) from flights f join airports_dest d on f.dest=d.iata join airports_origin o on f.origin=o.iata;

result as expected

mapdql> select count(*) from flights f join airports_dest d on f.dest=d.iata join airports_origin o on f.origin=o.iata;
EXPR$0
1

add additional return flight

insert into flights values ('f2', 'la','nyc');

incorrect result

mapdql> select count(*) from flights f join airports_dest d on f.dest=d.iata join airports_origin o on f.origin=o.iata;
EXPR$0
1

Cuda driver moved in homebrew

Looks like the Cuda driver moved for OSX.
Error: No available formula with the name "cuda"
It was migrated from caskroom/cask to caskroom/drivers.
You can access it again by running:
brew tap caskroom/drivers

brew cask install cuda works after the above statement was run.

Incompatible clang causes nvcc and thread-local storage issues

I started getting nvcc fatal : The version ('80100') of the host compiler ('Apple clang') is not supported errors on build, even though this install guide on NVidia said that clang 8.0 would work.

So I downgraded via instructions on this thread and was able to get almost everything to build, but when I started getting thread-local storage not supported errors, I had to switch back. I didn't know how to switch back so I just downloaded the newer command line tools package from the Apple developer downloads site.

Just wanted to share how I got past some issues that had me stuck for a while. Maybe there's a way to fix these down the line?

Supporting Geospatial Operations

Hello,

Currently, MapD does not support a variety of geospatial operations such as spatial kNN queries, spatial join queries, etc. Since these queries are compute intensive, GPUs will be tremendously beneficial for boosting their performance. As part of my PhD work, I am working towards adding this functionality to MapD-core and planning to release it to the community. I am currently studying the code.

Here's a post from the mapd-core forum where I have posted my queries related to code docs. I am creating an issue here (as suggested by @aaron_mapd) to get more help. I am willing to work with the dev team to better understand the existing code of MapD-core, design my solution, and release it. Any help in this regard will be greatly appreciated!

Thanks,
Harshada

frontend compilation error?

I successfully installed all of the dependencies but when I go to make -j 4 it looks to crash at a part that is called "frontend". Is there an explicit cause for this? I am using the =debug configuration

root:/home/ubuntu/mapd-core/build# make -j 4
[  0%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/common/shapes.cc.o
[  0%] Generating gen-cpp/MapD.cpp, gen-cpp/MapD.h, gen-cpp/mapd_constants.cpp, gen-cpp/mapd_types.cpp
[  0%] Generating ../gen-cpp/CalciteServer.cpp, ../gen-cpp/calciteserver_constants.cpp, ../gen-cpp/calciteserver_types.cpp
[  1%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/sweep/advancing_front.cc.o
Scanning dependencies of target calciteserver_thrift
[  1%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/sweep/cdt.cc.o
[  2%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/sweep/sweep.cc.o
Scanning dependencies of target mapd_thrift
[  2%] Building CXX object ThirdParty/poly2tri/CMakeFiles/poly2tri.dir/poly2tri/sweep/sweep_context.cc.o
[  3%] Linking CXX static library libpoly2tri.a
[  3%] Built target poly2tri
[  3%] Building C object ThirdParty/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o
[  4%] Linking C static library libsqlite3.a
[  5%] Building CXX object Calcite/CMakeFiles/calciteserver_thrift.dir/__/gen-cpp/CalciteServer.cpp.o
[  6%] Building CXX object CMakeFiles/mapd_thrift.dir/gen-cpp/MapD.cpp.o
[  6%] Built target sqlite3
[  7%] Building CXX object Utils/CMakeFiles/Utils.dir/StringLike.cpp.o
[  7%] Building CXX object Utils/CMakeFiles/Utils.dir/Regexp.cpp.o
[  8%] Building CXX object Utils/CMakeFiles/Utils.dir/ChunkIter.cpp.o
[  8%] Linking CXX static library libUtils.a
[  8%] Built target Utils
[  9%] Building CXX object Fragmenter/CMakeFiles/Fragmenter.dir/InsertOrderFragmenter.cpp.o
[  9%] Building CXX object Calcite/CMakeFiles/calciteserver_thrift.dir/__/gen-cpp/calciteserver_constants.cpp.o
[  9%] Linking CXX static library libFragmenter.a
[ 10%] Building CXX object Calcite/CMakeFiles/calciteserver_thrift.dir/__/gen-cpp/calciteserver_types.cpp.o
[ 10%] Built target Fragmenter
[ 10%] Linking CXX static library libcalciteserver_thrift.a
[ 11%] Generating Parser.cpp
/home/ubuntu/mapd-core/Parser/parser.y contains 5 shift/reduce conflicts and 2 reduce/reduce conflicts.
[ 11%] Built target ParserFiles
[ 12%] Generating Scanner.cpp
[ 12%] Building CXX object CMakeFiles/mapd_thrift.dir/gen-cpp/mapd_constants.cpp.o
[ 12%] Built target calciteserver_thrift
[ 12%] Built target ScannerFiles
[ 13%] Building CXX object CMakeFiles/mapd_thrift.dir/gen-cpp/mapd_types.cpp.o
[ 13%] Building CXX object CudaMgr/CMakeFiles/CudaMgr.dir/CudaMgr.cpp.o
[ 13%] Building CXX object Shared/CMakeFiles/Shared.dir/Datum.cpp.o
[ 15%] Building CXX object Shared/CMakeFiles/Shared.dir/timegm.cpp.o
[ 15%] Linking CXX static library libCudaMgr.a
[ 15%] Building CXX object Shared/CMakeFiles/Shared.dir/mapd_glob.cpp.o
[ 15%] Built target CudaMgr
[ 16%] Creating directories for 'frontend'
[ 17%] Building CXX object Shared/CMakeFiles/Shared.dir/StringTransform.cpp.o
[ 17%] Linking CXX static library libmapd_thrift.a
[ 17%] Performing download step (download, verify and extract) for 'frontend'
-- File already exists but no hash specified (use URL_HASH):
  file='/home/ubuntu/mapd-core/build/external/src/mapd2-dashboard-v2-f23da32-unofficial-prod.zip'
Old file will be removed and new file downloaded from URL.
-- Downloading...
   dst='/home/ubuntu/mapd-core/build/external/src/mapd2-dashboard-v2-f23da32-unofficial-prod.zip'
   timeout='none'
-- Using src='https://builds.mapd.com/frontend/mapd2-dashboard-v2-f23da32-unofficial-prod.zip'
[ 17%] Linking CXX static library libShared.a
-- Configuring done
[ 17%] Built target mapd_thrift
[ 18%] Generating bin/mapd_web_server
/bin/sh: 1: /usr/bin/go: not found
make[2]: *** [bin/mapd_web_server] Error 127
make[1]: *** [CMakeFiles/mapd_web_server.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 18%] Built target Shared
-- [download 1% complete]
-- [download 3% complete]
-- Generating done
-- [download 4% complete]
-- [download 5% complete]
-- [download 7% complete]
-- Build files have been written to: /home/ubuntu/mapd-core/build
[ 18%] Built target rerun_cmake
-- [download 8% complete]
-- [download 9% complete]
-- [download 11% complete]
-- [download 12% complete]
-- [download 13% complete]
-- [download 14% complete]
-- [download 16% complete]
-- [download 17% complete]
-- [download 18% complete]
-- [download 20% complete]
-- [download 21% complete]
-- [download 22% complete]
-- [download 24% complete]
-- [download 25% complete]
-- [download 26% complete]
-- [download 28% complete]
-- [download 29% complete]
-- [download 30% complete]
-- [download 32% complete]
-- [download 33% complete]
-- [download 34% complete]
-- [download 35% complete]
-- [download 37% complete]
-- [download 38% complete]
-- [download 39% complete]
-- [download 41% complete]
-- [download 42% complete]
-- [download 43% complete]
-- [download 45% complete]
-- [download 46% complete]
-- [download 47% complete]
-- [download 49% complete]
-- [download 50% complete]
-- [download 51% complete]
-- [download 53% complete]
-- [download 54% complete]
-- [download 55% complete]
-- [download 57% complete]
-- [download 58% complete]
-- [download 59% complete]
-- [download 60% complete]
-- [download 62% complete]
-- [download 63% complete]
-- [download 64% complete]
-- [download 66% complete]
-- [download 67% complete]
-- [download 68% complete]
-- [download 70% complete]
-- [download 71% complete]
-- [download 72% complete]
-- [download 74% complete]
-- [download 75% complete]
-- [download 76% complete]
-- [download 78% complete]
-- [download 79% complete]
-- [download 80% complete]
-- [download 82% complete]
-- [download 83% complete]
-- [download 84% complete]
-- [download 85% complete]
-- [download 87% complete]
-- [download 88% complete]
-- [download 89% complete]
-- [download 91% complete]
-- [download 92% complete]
-- [download 93% complete]
-- [download 95% complete]
-- [download 96% complete]
-- [download 97% complete]
-- [download 99% complete]
-- [download 100% complete]
-- Downloading... done
-- extracting...
     src='/home/ubuntu/mapd-core/build/external/src/mapd2-dashboard-v2-f23da32-unofficial-prod.zip'
     dst='/home/ubuntu/mapd-core/build/external/src/frontend'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 18%] No patch step for 'frontend'
[ 19%] No update step for 'frontend'
[ 20%] No configure step for 'frontend'
[ 20%] No build step for 'frontend'
[ 20%] No install step for 'frontend'
[ 21%] Completed 'frontend'
[ 21%] Built target frontend
make: *** [all] Error 2

sql_execute_gpudf returns 'Exception: CHAR is not supported in temporary table.' for TEXT columns

If a column is of type TEXT or [VAR]CHAR and is queried via sql_execute_gpudf, it returns an error like TMapDException(error_msg='Exception: CHAR is not supported in temporary table.'),
despite the fact that the compression type is ENCODING DICT.

However, based on the following (mapd-core/QueryEngine/InputMetadata.cpp, lines ~76-79):

bool uses_int_meta(const SQLTypeInfo& col_ti) {
   return col_ti.is_integer() || col_ti.is_decimal() || col_ti.is_time() || col_ti.is_boolean() ||
              (col_ti.is_string() && col_ti.get_compression() == kENCODING_DICT);
}

I would expect that if the type is a string (any of CHAR, TEXT, or VARCHAR) and the encoding is DICT, the column should be evaluated with INT metadata.

Add support for Truncate

Add TRUNCATE TABLE <TABLENAME> to product.

result of truncate:

The table definition remains, but all data on disk and in memory is removed.
Truncate will currently remove data from the dictionary as well, though there is an open question of whether we may want an option to keep the dictionary.

Build on macOS Sierra failed: Importer.cpp calls deprecated gdal class OGRSFDriverRegistrar

Failed to build mapd-core on macOS Sierra (v10.12.6) with the following error messages during the build stage (i.e. the 'make -j 4' invocation).
Note: I installed all dependencies captured in "mapd-deps-osx.sh" and also set all ENV variables defined there.
Note: I installed gdal version 1.11.5 via Homebrew (see below):

$ brew info gdal
gdal: stable 1.11.5 (bottled), HEAD
Geospatial Data Abstraction Library
http://www.gdal.org/
/usr/local/Cellar/gdal/1.11.5_3 (421 files, 36.4MB) *
  Built from source on 2017-08-12 at 21:50:07 with: --with-libkml
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/gdal.rb
==> Dependencies
Required: libpng ✔, jpeg ✔, giflib ✔, libtiff ✔, libgeotiff ✔, proj ✔, geos ✔, json-c ✔, libxml2 ✔, pcre ✔, sqlite ✔, freexl ✔, libspatialite ✔
Optional: postgresql ✘, mysql ✘, armadillo ✘
==> Requirements
Build: java >= 1.7 ✔, fortran ✔, git ✔
Optional: java >= 1.7 ✔, python3 ✔

---- Excerpt of build error messages

[ 89%] Linking CXX executable ImportTest
[ 89%] Linking CXX executable bin/mapd_server
[ 89%] Linking CXX executable StorageTest
[ 89%] Linking CXX executable ExecuteTest
Undefined symbols for architecture x86_64:
  "OGRSFDriverRegistrar::Open(char const*, int, OGRSFDriver**)", referenced from:
      Importer_NS::openGDALDataset(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in libCsvImport.a(Importer.cpp.o)
  "OGRLineString::getPoint(int, OGRPoint*) const", referenced from:
      Importer_NS::Importer::readVerticesFromGDALGeometryZ(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, OGRPolygon*, Importer_NS::PolyData2d&, bool) in libCsvImport.a(Importer.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [Tests/ImportTest] Error 1
make[1]: *** [Tests/CMakeFiles/ImportTest.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Undefined symbols for architecture x86_64:
  "OGRSFDriverRegistrar::Open(char const*, int, OGRSFDriver**)", referenced from:
      Importer_NS::openGDALDataset(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in libCsvImport.a(Importer.cpp.o)
Undefined symbols for architecture x86_64:
  "OGRSFDriverRegistrar::Open(char const*, int, OGRSFDriver**)", referenced from:
      Importer_NS::openGDALDataset(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in libCsvImport.a(Importer.cpp.o)
Undefined symbols for architecture x86_64:
  "OGRSFDriverRegistrar::Open(char const*, int, OGRSFDriver**)", referenced from:
      Importer_NS::openGDALDataset(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in libCsvImport.a(Importer.cpp.o)
  "OGRLineString::getPoint(int, OGRPoint*) const", referenced from:
      Importer_NS::Importer::readVerticesFromGDALGeometryZ(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, OGRPolygon*, Importer_NS::PolyData2d&, bool) in libCsvImport.a(Importer.cpp.o)
  "OGRLineString::getPoint(int, OGRPoint*) const", referenced from:
      Importer_NS::Importer::readVerticesFromGDALGeometryZ(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, OGRPolygon*, Importer_NS::PolyData2d&, bool) in libCsvImport.a(Importer.cpp.o)
  "OGRLineString::getPoint(int, OGRPoint*) const", referenced from:
      Importer_NS::Importer::readVerticesFromGDALGeometryZ(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, OGRPolygon*, Importer_NS::PolyData2d&, bool) in libCsvImport.a(Importer.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [bin/mapd_server] Error 1
make[1]: *** [CMakeFiles/mapd_server.dir/all] Error 2
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [Tests/StorageTest] Error 1
make[1]: *** [Tests/CMakeFiles/StorageTest.dir/all] Error 2
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [Tests/ExecuteTest] Error 1
make[1]: *** [Tests/CMakeFiles/ExecuteTest.dir/all] Error 2
make: *** [all] Error 2
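
For reference, the undefined OGRSFDriverRegistrar::Open symbol belongs to the old GDAL 1.x OGR API. Below is a minimal sketch (illustrative only, not the actual Importer_NS::openGDALDataset code) of opening a vector datasource through the GDAL 2.x entry point GDALOpenEx instead:

#include <gdal.h>
#include <gdal_priv.h>
#include <string>

// Minimal sketch: open a vector datasource through the GDAL 2.x API rather
// than the deprecated OGRSFDriverRegistrar::Open(const char*, int, OGRSFDriver**).
GDALDataset* openGDALDatasetSketch(const std::string& file_name) {
  GDALAllRegister();  // register all raster and vector drivers
  // GDAL_OF_VECTOR | GDAL_OF_READONLY restricts the open to read-only vector drivers;
  // the call returns nullptr on failure, and the caller releases the handle with GDALClose().
  return static_cast<GDALDataset*>(
      GDALOpenEx(file_name.c_str(), GDAL_OF_VECTOR | GDAL_OF_READONLY,
                 nullptr, nullptr, nullptr));
}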

Failing to load Sharded table with TEXT shard key

Create statements:

CREATE TABLE stories (
  id TEXT NOT NULL ENCODING DICT(32), 
  profile_id TEXT NOT NULL ENCODING DICT(32),
  dma TEXT ENCODING DICT(32),
  SHARD KEY (profile_id)
)  
WITH (shard_count = 10);

CREATE TABLE profiles (
  id TEXT NOT NULL ENCODING DICT(32),
  audience_ids TEXT[] NOT NULL ENCODING DICT(32),
  SHARD KEY (id), SHARED DICTIONARY (id) REFERENCES stories(profile_id)
)
WITH (shard_count = 10);

Copy statements:

COPY profiles from '/data/profiles.csv' with (header='false');
COPY stories from '/data/stories.csv' with (header='false');

The data is:
profiles.csv

<REDACTED>,{interest:foodie,interest:luxury-fashion,interest:music,interest:afam}

stories.csv

2017-06-i-REDACTED,<REDACTED>,501

Currently failing with

F0721 11:46:21.144613 16246 Importer.cpp:859] Check failed: false

It appears to be rejecting all shard key types except integer.
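
Since a dictionary-encoded TEXT column is physically stored as a 32-bit dictionary id, one possible direction (a minimal sketch with a hypothetical helper name, not the actual Importer.cpp logic) would be to route such shard keys by that id the same way integer keys are routed:

#include <cstddef>
#include <cstdint>

// Hypothetical helper: pick a shard for a dictionary-encoded TEXT key by
// hashing its 32-bit dictionary id, mirroring the integer shard-key path.
size_t shardForDictId(int32_t dict_id, size_t shard_count) {
  return static_cast<uint32_t>(dict_id) % shard_count;
}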

In CentOS 7, bash deploy.sh is broken

It fails as shown below:

[root@localhost ~]# curl -OJ https://internal-dependencies.mapd.com/mapd-deps/deploy.sh

% Total % Received % Xferd Average Speed Time Time Time Current

                             Dload  Upload   Total   Spent    Left  Speed

100 1768 100 1768 0 0 1050 0 0:00:01 0:00:01 --:--:-- 1049

[root@localhost ~]# sudo bash deploy.sh

........................................

--2017-05-22 21:47:56-- https://internal-dependencies.mapd.com/mapd-deps/mapd-deps-latest.tar.xz

Resolving internal-dependencies.mapd.com (internal-dependencies.mapd.com)... failed: Name or service not known.

wget: unable to resolve host address 'internal-dependencies.mapd.com'

Hash Join exception and multi-table JOIN??

I ran the following query:
SELECT count(a.imsi),sum(a.download) from test_a a JOIN test_zc1 b ON a.imsi =b.imsi;
and it returns an exception:
Exception: Hash join failed, reason: Could not build a 1-to-1 correspondence for columns involved in equijoin
The test_a table contains 200 million records, and test_zc1 contains 800 million. What does this exception mean?
Additionally, I have some questions about the JOIN operation in MapD:

  1. Does MapD support multi-table JOINs?
  2. Can I use a subquery's result as the right-hand table of a JOIN? For example:
select MSISDN,20140926,AREA_ID,USER_ID from DATUM.MD_PER_INF_ATTRIBUTE_DAY_03 z
 join (select S00000000023 from EDC_SUB_TEMP_10000049978_ZZ5 a 
 left join DATUM.MD_VW_PER_USER02_DAY_03 b on b.USER_ID=a.S00000000023 and b.DAY_NUMBER = 25 and b.IS_HMD_MDR_MG=1 where b.USER_ID is null ) y  
on z.USER_ID=y.S00000000023 where z.DAY_NUMBER = 26 and z.AREA_ID in (select t.area_id from edc_area t where t.parent_area_id in (1000515));

GPU is not used on Jetson TX2

My machine has 8GB of RAM and CUDA 8 on Ubuntu 16.04 and I've set the following env vars:

LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/lib/jvm/java-8-openjdk-arm64/jre/lib/aarch64/server:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/local/mapd-deps/lib:$LD_LIBRARY_PATH

When running mapd the following error appears in the console:

E0511 11:10:42.262192 16533 MapDHandler.cpp:282] No GPUs detected, falling back to CPU mode

Note: The web-based query editor / chart generator is amazing. Here is a screenshot with uname -a and the warning I see:
(screenshot: mapd-on-tegra)

SMALLINT Datatype has flaws

ISSUE

MapD's SMALLINT datatype has several flaws when boundary conditions are applied. To reproduce the issues, run the following SQL statements at the MapDQL prompt:

CREATE TABLE t2(id SMALLINT, val SMALLINT);
INSERT INTO T2 VALUES(1, 0);
INSERT INTO t2 VALUES(2, 32766);
INSERT INTO t2 VALUES(3, 32767);
INSERT INTO t2 VALUES(4, 32768);
INSERT INTO t2 VALUES(5, -32767);
INSERT INTO t2 VALUES(6, -32768);
INSERT INTO t2 VALUES(7, -32769);
INSERT INTO t2 VALUES(8, 9992222222222222222222);
SELECT * FROM t2;

Results from the SELECT statement

mapdql> SELECT * FROM t2;
id|val
1|0
2|32766
3|32767
4|NULL
5|-32767
6|NULL
7|32767
8|-1

Comparisons of Expected and Actual Results

id | val (inserted)         | Expected  | Actual  | Result
 1 | 0                      | 0         | 0       | PASS
 2 | 32766                  | 32766     | 32766   | PASS
 3 | 32767                  | 32767     | 32767   | PASS
 4 | 32768                  | Error Msg | NULL    | FAIL
 5 | -32767                 | -32767    | -32767  | PASS
 6 | -32768                 | -32768    | NULL    | FAIL
 7 | -32769                 | Error Msg | 32767   | FAIL
 8 | 9992222222222222222222 | Error Msg | -1      | FAIL

Explanation

The SMALLINT datatype is 2 bytes wide, covering the range -32768 to 32767. The way MapD handles values outside these boundaries is incorrect: it does not throw an error. Also, is the -1 result for the very large input in row 8 itself an error?
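
For illustration, here is a minimal sketch (hypothetical names, not MapD's actual parsing code) of the kind of range check one would expect when a SMALLINT literal is inserted; it assumes the literal is first parsed into a wide integer, and notes that -32768 is likely reserved as the inline NULL sentinel for SMALLINT, which would explain row 6:

#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical range check for a SMALLINT literal: parse into a wide type
// first, then reject anything outside the 2-byte range instead of silently
// wrapping it or mapping it to NULL.
int16_t parseSmallIntLiteral(const std::string& literal) {
  const long long value = std::stoll(literal);  // throws std::out_of_range if it overflows long long
  if (value < std::numeric_limits<int16_t>::min() ||
      value > std::numeric_limits<int16_t>::max()) {
    throw std::out_of_range("SMALLINT literal out of range: " + literal);
  }
  return static_cast<int16_t>(value);
}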

Use Arrow vector appends vs. scalar appends

The Arrow builders support batch appends, which perform better than a loop of scalar appends; see:

https://github.com/mapd/mapd-core/blob/master/QueryEngine/ResultSetConversion.cpp#L223

One issue is that these APIs (e.g. see http://arrow.apache.org/docs/cpp/classarrow_1_1_primitive_builder.html#af56d2faa32f2008bf4fe8ceb4742b007) expect an array of bytes for the is_valid indicator rather than std::vector<bool>. I opened
https://issues.apache.org/jira/browse/ARROW-1383 to make this a bit easier.
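
For illustration, a minimal sketch (assuming a recent Arrow C++ API, not the actual ResultSetConversion.cpp code) of a batch append that passes a byte-per-value validity array instead of calling Append()/AppendNull() once per row:

#include <arrow/array.h>
#include <arrow/builder.h>
#include <arrow/status.h>
#include <cstdint>
#include <memory>
#include <vector>

// Batch-append int64 values with a uint8_t validity array (1 = valid, 0 = null)
// in a single AppendValues() call instead of a per-value loop.
arrow::Status buildInt64Column(const std::vector<int64_t>& values,
                               const std::vector<uint8_t>& valid_bytes,
                               std::shared_ptr<arrow::Array>* out) {
  arrow::Int64Builder builder;
  ARROW_RETURN_NOT_OK(
      builder.AppendValues(values.data(), static_cast<int64_t>(values.size()),
                           valid_bytes.data()));
  return builder.Finish(out);
}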

INT datatype doesn't have outside boundaries

ISSUE

MapD's INT datatype does not enforce its boundaries. To reproduce the issues, run the following SQL statements at the MapDQL prompt:

CREATE TABLE t_int(id SMALLINT, val INT);
INSERT INTO t_int VALUES(1, NULL);
INSERT INTO t_int VALUES(2, 2147483646);
INSERT INTO t_int VALUES(3, 2147483647);
INSERT INTO t_int VALUES(4, 2147483648);
INSERT INTO t_int VALUES(5, -2147483647);
INSERT INTO t_int VALUES(6, -2147483648);
INSERT INTO t_int VALUES(7, -2147483649);

SELECT * FROM t_int ORDER BY id;

Results from the SELECT statement

mapdql> SELECT * FROM t_int ORDER BY id;
# id|val
# 1|NULL
# 2|2147483646
# 3|2147483647
# 4|NULL
# 5|-2147483647
# 6|NULL
# 7|2147483647

Comparisons of Expected and Actual Results

id | val (inserted) | Expected    | Actual      | Result
 1 | NULL           | NULL        | NULL        | PASS
 2 | 2147483646     | 2147483646  | 2147483646  | PASS
 3 | 2147483647     | 2147483647  | 2147483647  | PASS
 4 | 2147483648     | Error Msg   | NULL        | FAIL
 5 | -2147483647    | -2147483647 | -2147483647 | PASS
 6 | -2147483648    | -2147483648 | NULL        | FAIL
 7 | -2147483649    | Error Msg   | 2147483647  | FAIL

Result Explanation

The results show 4 of the 7 test cases passing. The INT datatype is 4 bytes wide, covering the range -2147483648 to 2147483647. The way MapD handles values outside these boundaries is incorrect: it does not throw an error.

Suggestions

What values should MapD give for out-of-range inputs? Here are two suggestions:

  1. Throw an error exception
  2. 0 or null

Let me know your thoughts.

aarch64 - StorageTest fails

On Nvidia's Tegra TX2 I was able to build mapd by doing:

  1. Removed a compiler flag from folly and it compiled successfully.
  2. Added the right java path to LD_LIBRARY_PATH: LD_LIBRARY_PATH=/usr/lib/jvm/java-8-openjdk-arm64/jre/lib/aarch64/server:$LD_LIBRARY_PATH

After compiling mapd, the sanity_tests report 83% success (1 of 6 tests failed), and StorageTest fails without giving a specific error.

nvidia@tegra-ubuntu:~/code/mapd-core/build$ make sanity_tests
[  2%] Built target calciteserver_thrift
[  5%] Built target mapd_thrift
[  6%] Built target gtest
[ 10%] Built target poly2tri
[ 11%] Built target SqliteConnector
[ 14%] Built target Utils
[ 15%] Built target StringDictionary
[ 16%] Built target Fragmenter
[ 18%] Built target Calcite
[ 19%] Built target Catalog
[ 20%] Built target ScannerFiles
[ 20%] Built target ParserFiles
[ 22%] Built target Analyzer
[ 22%] Built target CudaMgr
[ 31%] Built target DataMgr
[ 32%] Built target Chunk
[ 35%] Built target Shared
[ 36%] Built target CsvImport
[ 80%] Built target QueryEngine
[ 81%] Built target Planner
[ 87%] Built target Parser
[ 89%] Built target StorageTest
[ 90%] Built target initdb
[ 93%] Built target ResultSetTest
[ 96%] Built target ExecuteTest
[ 97%] Built target PlanTest
[ 98%] Built target ImportTest
[100%] Built target ResultSetBaselineRadixSortTest
UpdateCTestConfiguration  from :/home/nvidia/code/mapd-core/build/Tests/DartConfiguration.tcl
UpdateCTestConfiguration  from :/home/nvidia/code/mapd-core/build/Tests/DartConfiguration.tcl
Test project /home/nvidia/code/mapd-core/build/Tests
Constructing a list of tests
Done constructing a list of tests
Checking test dependency graph...
Checking test dependency graph end

(... snip... )

test 8
    Start  8: StorageTest

8: Test command: /home/nvidia/code/mapd-core/build/Tests/StorageTest "--gtest_output=xml:../"
8: Test timeout computed to be: 9.99988e+06
8: [==========] Running 5 tests from 4 test cases.
8: [----------] Global test environment set-up.
8: [----------] 1 test from StorageLarge
8: [ RUN      ] StorageLarge.Numbers
8: Loaded 100000000 rows 3400000000 bytes in 88825 ms. at 38.2775 MB/sec.
8: Scanned 100000000 rows 3400000000 bytes in 48140 ms. at 70.6273 MB/sec.
8: Scanned 100000000 rows 3400000000 bytes in 36572 ms. at 92.9673 MB/sec.
8: [       OK ] StorageLarge.Numbers (944275 ms)
8: [----------] 1 test from StorageLarge (944275 ms total)
8: 
8: [----------] 2 tests from StorageSmall
8: [ RUN      ] StorageSmall.Strings
5/6 Test  #8: StorageTest ......................***Exception: Other986.20 sec
test 10
    Start 10: ImportTest

10: Test command: /home/nvidia/code/mapd-core/build/Tests/ImportTest "--gtest_output=xml:../"
10: Test timeout computed to be: 9.99988e+06
10: [==========] Running 2 tests from 1 test case.
10: [----------] Global test environment set-up.
10: [----------] 2 tests from Detect
10: [ RUN      ] Detect.DateTime
10: [       OK ] Detect.DateTime (0 ms)
10: [ RUN      ] Detect.Numeric
10: [       OK ] Detect.Numeric (1 ms)
10: [----------] 2 tests from Detect (1 ms total)
10: 
10: [----------] Global test environment tear-down
10: [==========] 2 tests from 1 test case ran. (2 ms total)
10: [  PASSED  ] 2 tests.
6/6 Test #10: ImportTest .......................   Passed    0.69 sec

The following tests passed:
	PlanTest
	ExecuteTest
	ResultSetTest
	ResultSetBaselineRadixSortTest
	ImportTest

83% tests passed, 1 tests failed out of 6

Total Test time (real) = 1238.78 sec

The following tests FAILED:
	  8 - StorageTest (OTHER_FAULT)
Errors while running CTest
Tests/CMakeFiles/sanity_tests.dir/build.make:62: recipe for target 'Tests/CMakeFiles/sanity_tests' failed
make[3]: *** [Tests/CMakeFiles/sanity_tests] Error 8
CMakeFiles/Makefile2:2288: recipe for target 'Tests/CMakeFiles/sanity_tests.dir/all' failed
make[2]: *** [Tests/CMakeFiles/sanity_tests.dir/all] Error 2
CMakeFiles/Makefile2:2295: recipe for target 'Tests/CMakeFiles/sanity_tests.dir/rule' failed
make[1]: *** [Tests/CMakeFiles/sanity_tests.dir/rule] Error 2
Makefile:780: recipe for target 'sanity_tests' failed
make: *** [sanity_tests] Error 2
