intel-hls / genomicsdb Goto Github PK

View Code? Open in Web Editor NEW

111.0 111.0 28.0 76.94 MB

GenomicsDB

License: Other

C++ 74.30% Python 4.89% Java 16.79% C 0.05% Scala 0.87% Shell 0.71% CMake 2.29% Dockerfile 0.11%

genomicsdb's People

Contributors

Stargazers

Watchers

genomicsdb's Issues

Segfault when importing VCF

Hi. I'm working on the Hail Team at the Broad Institute, and I was trying to import a VCF into GenomicsDB, but it caused a segfault. Here is the VCF file that I tried to import. Attached are the three JSON files for this VCF. The error message that I got is below:

ubuntu@ip-172-31-23-20:~/build_dir/tools$ vcf2tiledb /home/ubuntu/build_dir/jsonFiles/loader_config_file.json

[[19760,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: ip-172-31-23-20

Another transport will be used instead, although this may result in
lower performance.

[ip-172-31-23-20:04729] *** Process received signal ***
[ip-172-31-23-20:04729] Signal: Segmentation fault (11)
[ip-172-31-23-20:04729] Signal code: Address not mapped (1)
[ip-172-31-23-20:04729] Failing at address: 0x1488780
[ip-172-31-23-20:04729] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f4582f36390]
[ip-172-31-23-20:04729] [ 1] vcf2tiledb[0x507b29]
[ip-172-31-23-20:04729] [ 2] vcf2tiledb[0x4d9fd2]
[ip-172-31-23-20:04729] [ 3] vcf2tiledb[0x4eca80]
[ip-172-31-23-20:04729] [ 4] vcf2tiledb[0x4eddd8]
[ip-172-31-23-20:04729] [ 5] vcf2tiledb[0x48d704]
[ip-172-31-23-20:04729] [ 6] vcf2tiledb[0x490c7f]
[ip-172-31-23-20:04729] [ 7] /usr/lib/x86_64-linux-gnu/libgomp.so.1(+0x16dfe)[0x7f458336fdfe]
[ip-172-31-23-20:04729] [ 8] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f4582f2c6ba]
[ip-172-31-23-20:04729] [ 9] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f4582c623dd]
[ip-172-31-23-20:04729] *** End of error message ***
Segmentation fault (core dumped)

JSON:
callsets.txt
vid_mapping_file.txt
loader_config_file.txt

Secondary indexes

Hi guys,

The last time I checked (6 months ago), GenomicsDB/TileDB didn't have a concept of secondary indexes, which was a showstopper factor for us to use it as a variant store for a large number of samples (tens of thousands WGS VCFs).

I wonder whether something has changed wrt secondary indexes on cell's attributes ?

thanks,
dmitry

BookKeeping error while using GATK4 GenomicsDBImport

Hello,

While using the GATK4 GenomicsDBImport (version >= 4.0.6.0) I'm keep getting the following error:

12:27:40.054 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
12:27:40.225 INFO  GenomicsDBImport - Importing batch 1 with 1 samples
terminate called after throwing an instance of 'VariantStorageManagerException'
  what():  VariantStorageManagerException exception : Error while finalizing TileDB array chr22$4514591$4617700
TileDB error message : [TileDB::BookKeeping] Error: Cannot finalize book-keeping; Writing domain size failed

This error happens on any type input.

I suspect that the dollar sign in the name of the directories can cause this on my system (although, I'm not 100% sure). I've posted similar question on the GATK forum and got a suggestion to try to communicate with the DB developers.

I'm ready to provide more information if necessary,

Thank you,
Timur

Called GTs are poorly formatted and sometimes inaccurate

Importing sample1:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1
chr20   1274366 .       A       G,<NON_REF>     562.77  .       DP=26;MQRankSum=-1.011;MQ_DP=26;QUALapprox=591;RAW_MQ=93600.00;ReadPosRankSum=0.578;VarDP=24    GT:AD:DP:GQ:PL:SB       1/1:1,16,0:16:99:368,166,0,379,171,547:1,0,12,11

and sample2:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample2
chr20   1274365 .       TAA     T,TA,<NON_REF>  562.77  .       DP=26;MQRankSum=-1.011;MQ_DP=26;QUALapprox=591;RAW_MQ=93600.00;ReadPosRankSum=0.578;VarDP=24    GT:AD:DP:GQ:PL:SB       1/2:1,7,16,0:24:99:591,353,368,166,0,123,542,379,171,547:1,0,12,11

into a GDB and querying it with PRODUCE_GT_FIELD set to true produces:

chr20   1274365 .       TAA     T,TA,<NON_REF>  .       .       DP=26;MQRankSum=-1.011e+00;MQ_DP=26;QUALapprox=591;RAW_MQ=93600.00;ReadPosRankSum=0.578;VarDP=24        GT:AD:DP:GQ:PL:SB       0       1/2:1,7,16,0:24:99:591,353,368,166,0,123,542,379,171,547:1,0,12,11
chr20   1274366 .       A       G,*,<NON_REF>   .       .       DP=52;MQRankSum=-1.011e+00;MQ_DP=26;QUALapprox=591;RAW_MQ=93600.00;ReadPosRankSum=0.578;VarDP=24        GT:AD:DP:GQ:PL:SB       1/1:1,16,0,0:16:99:368,166,0,379,171,547,379,171,547,547:1,0,12,11      3/2:1,0,16,0:24:99:591,542,547,166,171,123,542,547,171,547:1,0,12,11
chr20   1274367 .       A       *,<NON_REF>     .       .       DP=26   GT:AD:DP:GQ:PL:SB       0       2/1:1,16,0:24:99:591,166,123,542,171,547:1,0,12,11

Note that the GT for the first sample at the last position is output as 2/1 (I'm not sure if that's against the spec, but it's certainly against convention) and after examining the PLs, the best likelihood is actually on the 2/2 genotype.

Can not recognize the vcf/gvcf

Hi,

I am trying to import two samples to the database, both of them are WES data.
I did both the block compression and indexing.
when I run the vcf2tiledb it give me the following error.
I tried on both vcf and gvcf and it give me the same error.

~/etc$ ./bin/vcf2tiledb /home/.../data/loader_config_file.json
[E::bcf_hdr_read_required_sample_line] Input is not detected as bcf or vcf format
Segmentation fault (core dumped)

Any suggestion?

Kind regards
Amin

Trouble Installing

We are having trouble installing on Redhat 7 with gcc 4.9.4 installed.

I pulled the latest version with:
git clone --recursive https://github.com/Intel-HLS/GenomicsDB.git
I have installed protobuf/3.6.1 safestringlib/1.0.0 and googletest and included the paths in cmake:

cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/path/GenomicsDB -DPROTOBUF_LIBRARY=/path/protobuf/lib -DPROTOBUF_INCLUDE_DIR=/path/protobuf/include -DSAFESTRINGLIB_DIR=/path/safestringlib/1.0.0 -DSAFESTRINGLIB_INCLUDE_DIR=/path/safestringlib/1.0.0/include -DSAFESTRINGLIB_LIBRARY=/path/safestringlib/1.0.0/lib/libsafestring.a -DGTEST_LIBRARY=/path/googletest/lib/libgtest.a -DGTEST_INCLUDE_DIR=/path/googletest/include -DGTEST_MAIN_LIBRARY=/path/googletest/lib/libgtest_main.a

then
make

The error I get is:
[ 84%] Linking CXX executable runAllGTests
../../../main/libgenomicsdb.a(variant_operations.cc.o): In function remap_allele_specific_annotations(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned long, MergedAllelesIdxLUT<true, true> const&, unsigned int, bool, unsigned int, FieldInfo const&)': variant_operations.cc:(.text+0x1ab5): undefined reference to memcpy_s(void*, unsigned long, void const*, unsigned long)'
variant_operations.cc:(.text+0x1b23): undefined reference to memcpy_s(void*, unsigned long, void const*, unsigned long)' ../../../main/libgenomicsdb.a(variant_operations.cc.o): In function GA4GHOperator::operate(Variant&, VariantQueryConfig const&)':
variant_operations.cc:(.text+0x3a6a): undefined reference to memcpy_s(void*, unsigned long, void const*, unsigned long)' ../../../main/libgenomicsdb.a(variant_operations.cc.o): In function VariantFieldDatastd::string::copy_from(VariantFieldBase const*)':
variant_operations.cc:(.text._ZN16VariantFieldDataISsE9copy_fromEPK16VariantFieldBase[_ZN16VariantFieldDataISsE9copy_fromEPK16VariantFieldBase]+0x93): undefined reference to memcpy_s(void*, unsigned long, void const*, unsigned long)' ../../../main/libgenomicsdb.a(variant_operations.cc.o): In function VariantFieldDatastd::string::binary_serialize(std::vector<unsigned char, std::allocator >&, unsigned long&) const':
variant_operations.cc:(.text._ZNK16VariantFieldDataISsE16binary_serializeERSt6vectorIhSaIhEERm[_ZNK16VariantFieldDataISsE16binary_serializeERSt6vectorIhSaIhEERm]+0x57): undefined reference to memcpy_s(void*, unsigned long, void const*, unsigned long)' ../../../main/libgenomicsdb.a(variant_operations.cc.o):variant_operations.cc:(.text._ZN16VariantFieldDataISsE18binary_deserializeEPKcRmbj[_ZN16VariantFieldDataISsE18binary_deserializeEPKcRmbj]+0x5f): more undefined references to memcpy_s(void*, unsigned long, void const*, unsigned long)' follow
collect2: error: ld returned 1 exit status
make[2]: *** [src/test/cpp/src/runAllGTests] Error 1
make[1]: *** [src/test/cpp/src/CMakeFiles/runAllGTests.dir/all] Error 2
make: *** [all] Error 2

Any suggestions on what step I am missing?

Thank you

Fields information field_info type of "flag"

Hi folks,

As my first test case, I am looking to run vcf2tiledb against the example vcf presented in the VCF Specification. While crafting the Fields information section of the vid_mapping_file, I'm unsure how to best represent the following fields, as there is no "type":"flag" documented in the GenomicsDB docs

From the example VCF:
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">

And I'd hoped to present the fields to the import program as follows:
"fields" : {
...
"DB": { "vcf_field_class":["INFO"], "type":"flag" },
"H2": { "vcf_field_class":["INFO"], "type":"flag" },
...
}

Can you provide any guidance on the best approach or point to the relevant section of the documentation, in case I have missed it. Thanks in advance

An installation problem with libraries in /usr/lib64?

Hi, there.

My Linux machine is:
3.10.0-229.el7.x86_64

After executing make, it failed with the following error message:
Creating dynamic library bin/libtiledbgenomicsdb.so
/usr/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make: *** [bin/libtiledbgenomicsdb.so] Error 1

But there is still such a library there:
lrwxrwxrwx 1 root root 19 Jun 24 12:22 libstdc++.so -> libstdc++.so.6.0.19
lrwxrwxrwx. 1 root root 19 Oct 16 2015 libstdc++.so.6 -> libstdc++.so.6.0.19
-rwxr-xr-x. 1 root root 991744 Nov 14 2014 libstdc++.so.6.0.19

The apparent problem is about their place in /usr/lib64, not /usr/lib which the make file is currently oriented...

Any advice?

Thanks,

Angel Villahoz-Baleta.

Trouble installing in Ubuntu

having trouble installing in Ubuntu 16.04

issue similar to #226

attached are the log files
genomedb_make.txt
genomej8.txt

build difficulties, docker error

Hi,

We are excited about the potential to use GenomicsDB for some large cohort projects at NIH, both with GATK and for other purposes, but I am having trouble getting the build of GenomicsDB to succeed on any of my Mac, Ubuntu, or Centos7 - various dependencies and env paths not found on each despite try to add them.

Could it be the case that the documentation here:
https://github.com/Intel-HLS/GenomicsDB/wiki/Building-GenomicsDB-Version-0.4.0
is out of date? With first issue being need to use cmake ?

I was hoping to be able to use Docker, at least to see a working example by studying the docker file, but I find that:

docker run -it -v myoutput:/output/ intelhlsgenomicsdb/genomicsdb_builder build_genomicsdb

fails with

docker run -it -v myoutput:/output/ intelhlsgenomicsdb/genomicsdb_builder build_genomicsdb
....
   Called from: [3]	/home/default/build_src/GenomicsDB/cmake/Modules/Findlibuuid.cmake
                [2]	/home/default/build_src/GenomicsDB/cmake/Modules/FindTileDB.cmake
                [1]	/home/default/build_src/GenomicsDB/CMakeLists.txt
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
  Could not find libuuid headers and/or libraries (missing:
  LIBUUID_INCLUDE_DIR LIBUUID_LIBRARY)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE)
  cmake/Modules/Findlibuuid.cmake:14 (find_package_handle_standard_args)
  cmake/Modules/FindTileDB.cmake:37 (find_package)
  CMakeLists.txt:110 (find_package)


   Called from: [3]	/home/default/build_src/GenomicsDB/cmake/Modules/Findlibuuid.cmake
                [2]	/home/default/build_src/GenomicsDB/cmake/Modules/FindTileDB.cmake
                [1]	/home/default/build_src/GenomicsDB/CMakeLists.txt
-- Configuring incomplete, errors occurred!
See also "/home/default/build_src/GenomicsDB/build/CMakeFiles/CMakeOutput.log".
ERROR: build_gdb not find Makefile
/usr/bin/build_genomicsdb: line 78: return: -1: invalid option
return: usage: return [n]
FAIL: cannot build GenomicsDB
/usr/bin/build_genomicsdb: line 95: return: can only `return' from a function or sourced script

I'm happy to use any flavor of Linux you recommend.
Any suggestions, or especially recipe to follow on a fresh install of whatever OS you suggest, would be much appreciated.

For our projects we are likely to use the Java wrappers and Spark, so possibly we could use the maven artifact you publish at:
https://mvnrepository.com/artifact/com.intel/genomicsdb
and not have to worry about doing the build, does this seem viable?

However - I would very much like to get the build to succeed locally in any case.

Thanks.
Justin Paschall
Bioinformatics Scientist
National Human Genome Research Institute (NHGRI)
National Institutes of Health

No Sample Line

Hi,

When I try to import a vcf file into the database by using the vcf2tiledb command I got the following error:
[E::vcf_hdr_read_requierd_sample_line] No sample line

Do you have any recommendation?

thank you very much

Error: Cannot load array schema when creating a new array

This error message is emitted when creating a new array. It probably shouldn't be since it's not actually indicating an error in that case.

[TileDB::StorageManager] Error: Cannot load array schema; Array '/Users/louisb/Workspace/gatk/src/test/resources/org/broadinstitute/hellbender/engine/GenomicsDBIntegration/workspace/tinyArray' does not exist.

Test

CSV Data Import in GenomicsDB

I have a CSV file that I have formulated myself as a CSV file containing the following columns per line

1,10583,10583,0|0,0|0,,
2,10611,10611,0|0,0|0,,
3,13302,13302,0|0,0|0,,
4,13327,13327,0|0,0|0,,
5,13957,13957,0|0,0|0,,
6,13980,13980,0|0,0|0,,
7,30923,30923,0|0,0|0,,
8,46402,46402,0|0,0|0,,
9,47190,47190,0|0,0|0,,
10,51476,51476,0|1,0|1,,

What could be the possible callsets and vid_mapping file to successfully run vcf2tiledb to successfully import it into genomicsdb

Another segfault when importing VCF

I was trying to import a VCF into GenomicsDB, but it caused a segfault. The chromosome length is 249250621 and the program is interrupted at 207235099 position three times. Attached are the three JSON files for this VCF. The error message that I got is below:

vcf2tiledb[51877]: segfault at 1b7023b0 ip 00000000004f34a1 sp 00007fff71167ce0 error 4 in vcf2tiledb[400000+31f000]

Jsons:
callset.json.split2.docx
loader_config_file.json.split2.docx
vid_mapping_file.json.split2.docx

No valid combination operation found for INFO field QS

Hi,

I'm testing GenomicsDB with vcf2tiledb, but some fields in INFO can not be handled correctly.

From the header of VCF:
##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">

From the INFO of VCF:
I16=1,0,0,0,35,1225,0,0,60,3600,0,0,0,0,0,0;MQ0F=0;DP=1;QS=1,0;DPR=1,0

And I want to import the fields as below:
"fields" : {
"PL": { "vcf_field_class":["FORMAT"], "type":"int", "length":"G" },
"DP": { "vcf_field_class":[ "INFO", "FORMAT"], "type":"int", "length":1 },
"QS": { "vcf_field_class":["INFO"], "type":"float", "length":"R" }
}

But when I execute the vcf2tiledb, I get this warning:
WARNING: No valid combination operation found for INFO field QS - the field will NOT be part of INFO fields in the generated VCF records

Can you provide any guidance on the best way to resolve it.

Thanks in advance.

[src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/vid_mapper_sql.cc.o] Error 1

Hi Developer!
I meet some Error when I install GenomicsDB according to the https://github.com/Intel-HLS/GenomicsDB/wiki/Compiling-GenomicsDB.

Server system is centos 7.2, and gcc version is 7.2.1.

Install history and log as follow:

$ git clone --recursive https://github.com/Intel-HLS/GenomicsDB.git
$ mkdir -p build
$ cd build
$ cmake /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/work/Software/GenomicsDB-0.9.2
-- The C compiler identification is GNU 7.2.1
-- The CXX compiler identification is GNU 7.2.1
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Performing Test test_cpp_2011
-- Performing Test test_cpp_2011 - Success
-- Found MPI_C: /usr/lib64/mpich-3.2/lib/libmpi.so
-- Found MPI_CXX: /usr/lib64/mpich-3.2/lib/libmpicxx.so;/usr/lib64/mpich-3.2/lib/libmpi.so
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Performing Test OPENMPV4_FOUND
-- Performing Test OPENMPV4_FOUND - Success
-- Found libuuid: /usr/include
-- Found RapidJSON: /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/dependencies/RapidJSON/include
-- Found htslib: /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/dependencies/htslib
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7")
-- Found OpenSSL: /usr/lib64/libssl.so;/usr/lib64/libcrypto.so (found version "1.0.2k")
-- Found libdbi: /usr/local/include
-- Performing Test LIBDBI_TEST_PROGRAM_COMPILES
-- Performing Test LIBDBI_TEST_PROGRAM_COMPILES - Success
-- Found Protobuf: /usr/include
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE
-- Found PROTOBUF: /usr/lib64/libprotobuf.so
-- Found libcsv: /usr/include
-- Found GTest: /usr/lib64/libgtest.so
-- Configuring done
-- Generating done
-- Build files have been written to: /work/Software/Download/Variant_Package/GenomicsDB/build

$ make -j8

Scanning dependencies of target TileDB
Scanning dependencies of target htslib
Scanning dependencies of target PROTOBUF_GENERATED_CXX_TARGET
[ 1%] [ 2%] [ 4%] Creating directories for 'htslib'
Creating directories for 'TileDB'
Running C++ protocol buffer compiler on genomicsdb_coordinates.proto
[ 5%] [ 7%] Running C++ protocol buffer compiler on genomicsdb_callsets_mapping.proto
[ 8%] [ 10%] No download step for 'htslib'
Running C++ protocol buffer compiler on genomicsdb_export_config.proto
Running C++ protocol buffer compiler on genomicsdb_import_config.proto
[ 11%] [ 13%] [ 14%] [ 15%] Running C++ protocol buffer compiler on genomicsdb_vid_mapping.proto
No patch step for 'htslib'
No download step for 'TileDB'
Performing update step for 'htslib'
[ 15%] Built target PROTOBUF_GENERATED_CXX_TARGET
[ 17%] [ 18%] No patch step for 'TileDB'
No update step for 'TileDB'
[ 20%] Performing configure step for 'TileDB'
-- The C compiler identification is GNU 7.2.1
-- The CXX compiler identification is GNU 7.2.1
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7")
-- Found libuuid: /usr/include
-- Found OpenSSL: /usr/lib64/libssl.so;/usr/lib64/libcrypto.so (found version "1.0.2k")
-- Found GTest: /usr/lib64/libgtest.so
-- Found Doxygen: /usr/bin/doxygen (found version "1.8.5")
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
make[3]: warning: jobserver unavailable: using -j1. Add `+' to parent make rule.
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Performing Test CXX_2011_FOUND
-- Performing Test CXX_2011_FOUND - Success
-- Compiler supports C++ 2011.
-- The TileDB library is compiled with verbosity.
-- Configuring done
-- Generating done
-- Build files have been written to: /work/Software/Download/Variant_Package/GenomicsDB/build/TileDB-prefix/src/TileDB-build
[ 21%] Performing build step for 'TileDB'
Scanning dependencies of target TILEDB_CORE_OBJECTS
[ 5%] [ 10%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/array/array_iterator.cc.o
Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/array/array.cc.o
[ 15%] [ 21%] [ 26%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/array/array_sorted_write_state.cc.o
Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/array/array_schema.cc.o
Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/array/array_sorted_read_state.cc.o
[ 31%] [ 36%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/array/array_read_state.cc.o
Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/storage_manager/storage_manager.cc.o
[ 23%] Performing configure step for 'htslib'
[ 42%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/storage_manager/storage_manager_config.cc.o
[ 47%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/fragment/book_keeping.cc.o
checking for gcc... /opt/rh/devtoolset-7/root/usr/bin/cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... [ 52%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/fragment/write_state.cc.o
no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /opt/rh/devtoolset-7/root/usr/bin/cc accepts -g... yes
checking for /opt/rh/devtoolset-7/root/usr/bin/cc option to accept ISO C89... none needed
checking for ranlib... /opt/rh/devtoolset-7/root/usr/bin/ranlib
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... [ 57%] no
checking for _LARGEFILE_SOURCE value needed for large files... Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/fragment/read_state.cc.o
no
[ 63%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/fragment/fragment.cc.o
checking shared library type for unknown-Linux... plain .so
checking how to run the C preprocessor... /opt/rh/devtoolset-7/root/usr/bin/cc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... [ 68%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/misc/hilbert_curve.cc.o
[ 73%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/misc/progress_bar.cc.o
[ 78%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/misc/utils.cc.o
[ 84%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/metadata/metadata.cc.o
yes
[ 89%] checking for sys/types.h... Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/metadata/metadata_iterator.cc.o
[ 94%] Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/expressions/expression.cc.o
[100%] yes
Building CXX object core/CMakeFiles/TILEDB_CORE_OBJECTS.dir/src/c_api/tiledb.cc.o
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdlib.h... (cached) yes
[100%] Built target TILEDB_CORE_OBJECTS
checking for unistd.h... (cached) yes
Scanning dependencies of target tiledb_shared
Scanning dependencies of target tiledb_static
Linking CXX shared library libtiledb.so
Linking CXX static library libtiledb.a
checking for sys/param.h... [100%] Built target tiledb_static
[100%] yes
Built target tiledb_shared
checking for getpagesize... [ 24%] Performing install step for 'TileDB'
yes
checking for working mmap... [100%] Built target TILEDB_CORE_OBJECTS
[100%] yes
[100%] Built target tiledb_shared
Built target tiledb_static
checking for gmtime_r... Install the project...
-- Install configuration: "Release"
-- Installing: /work/Software/GenomicsDB-0.9.2/lib/libtiledb.a
-- Installing: /work/Software/GenomicsDB-0.9.2/lib/libtiledb.so
-- Up-to-date: /work/Software/GenomicsDB-0.9.2/include/tiledb_constants.h
-- Up-to-date: /work/Software/GenomicsDB-0.9.2/include/tiledb.h
[ 26%] Completed 'TileDB'
yes
[ 26%] checking for fsync... Built target TileDB
Scanning dependencies of target GenomicsDB_library_object_files
yes
checking for drand48... yes
checking whether fdatasync is declared... yes
checking for fdatasync... yes
checking for library containing log... -lm
checking for zlib.h... yes
checking for inflate in -lz... [ 27%] [ 28%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/query_operations/variant_operations.cc.o
Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/query_operations/broad_combined_gvcf.cc.o
yes
checking for library containing recv... [ 30%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/variant_cell.cc.o
[ 31%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/variant_storage_manager.cc.o
[ 33%] [ 34%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/variant_field_data.cc.o
Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/variant_array_schema.cc.o
[ 36%] none required
configure: WARNING: GCS support not enabled: requires libcurl support
configure: WARNING: S3 support not enabled: requires libcurl support
Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/variant_field_handler.cc.o
checking whether PTHREAD_MUTEX_RECURSIVE is declared... yes
configure: creating ./config.status
config.status: creating config.mk
[ 37%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/variant.cc.o
[ 39%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/variant_query_config.cc.o
config.status: creating htslib.pc.tmp
config.status: creating config.h
[ 40%] Performing build step for 'htslib'
[ 42%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/query_variants.cc.o
[ 43%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/genomicsdb_columnar_field.cc.o
[ 44%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/genomicsdb_iterators.cc.o
[ 46%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/genomicsdb/genomicsdb_multid_vector_field.cc.o
[ 47%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/loader/tiledb_loader_text_file.cc.o
[ 49%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/loader/load_operators.cc.o
[ 50%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/loader/genomicsdb_importer.cc.o
[ 52%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/loader/tiledb_loader_file_base.cc.o
[ 53%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/loader/tiledb_loader.cc.o
[ 55%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/command_line.cc.o
[ 56%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/memory_measure.cc.o
[ 57%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/histogram.cc.o
[ 59%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/json_config.cc.o
[ 60%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/vid_mapper_pb.cc.o
[ 62%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/lut.cc.o
[ 63%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/known_field_info.cc.o
[ 65%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/vid_mapper.cc.o
[ 66%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/vid_mapper_sql.cc.o
[ 68%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/timer.cc.o
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc: In member function ‘int SQLBasedVidMapper::load_field_info()’:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:180:11: error: ‘class FieldInfo’ has no member named ‘m_type_index’
ref.m_type_index = (*iter).second;
^~~~~~~~~~~~
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:186:11: error: ‘class FieldInfo’ has no member named ‘m_bcf_ht_type’; did you mean ‘m_vcf_type’?
ref.m_bcf_ht_type = (*iter).second;
^~~~~~~~~~~~~
m_vcf_type
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:206:35: error: no match for ‘operator=’ (operand types are ‘FieldLengthDescriptor’ and ‘unsigned int’)
ref.m_length_descriptor = length_descriptor;
^~~~~~~~~~~~~~~~~
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(const FieldLengthDescriptor&)
class FieldLengthDescriptor
^~~~~~~~~~~~~~~~~~~~~
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘unsigned int’ to ‘const FieldLengthDescriptor&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(FieldLengthDescriptor&&)
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘unsigned int’ to ‘FieldLengthDescriptor&&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:208:15: error: ‘class FieldInfo’ has no member named ‘m_num_elements’
ref.m_num_elements = KnownFieldInfo::get_num_elements_for_known_field_enum(known_field_enum, 0u, 0u);
^~~~~~~~~~~~~~
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:211:13: error: ‘class FieldInfo’ has no member named ‘m_num_elements’
ref.m_num_elements = 1;
^~~~~~~~~~~~~~
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/vcf/vcf.h:35:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:27,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/dependencies/htslib/htslib/vcf.h:75:22: error: no match for ‘operator=’ (operand types are ‘FieldLengthDescriptor’ and ‘int’)
#define BCF_VL_FIXED 0 // variable length
^
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:212:35: note: in expansion of macro ‘BCF_VL_FIXED’
ref.m_length_descriptor = BCF_VL_FIXED;
^~~~~~~~~~~~
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(const FieldLengthDescriptor&)
class FieldLengthDescriptor
^~~~~~~~~~~~~~~~~~~~~
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘const FieldLengthDescriptor&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(FieldLengthDescriptor&&)
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘FieldLengthDescriptor&&’
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/vcf/vcf.h:35:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:27,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/dependencies/htslib/htslib/vcf.h:75:22: error: no match for ‘operator=’ (operand types are ‘FieldLengthDescriptor’ and ‘int’)
#define BCF_VL_FIXED 0 // variable length
^
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:216:35: note: in expansion of macro ‘BCF_VL_FIXED’
ref.m_length_descriptor = BCF_VL_FIXED;
^~~~~~~~~~~~
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(const FieldLengthDescriptor&)
class FieldLengthDescriptor
^~~~~~~~~~~~~~~~~~~~~
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘const FieldLengthDescriptor&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(FieldLengthDescriptor&&)
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘FieldLengthDescriptor&&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:217:13: error: ‘class FieldInfo’ has no member named ‘m_num_elements’
ref.m_num_elements = dbi_result_get_int(result, DBTABLE_FIELD_COLUMN_LENVAL.c_str());
^~~~~~~~~~~~~~
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/vcf/vcf.h:35:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:27,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/dependencies/htslib/htslib/vcf.h:77:22: error: no match for ‘operator=’ (operand types are ‘FieldLengthDescriptor’ and ‘int’)
#define BCF_VL_A 2
^
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:220:39: note: in expansion of macro ‘BCF_VL_A’
ref.m_length_descriptor = BCF_VL_A;
^~~~~~~~
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(const FieldLengthDescriptor&)
class FieldLengthDescriptor
^~~~~~~~~~~~~~~~~~~~~
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘const FieldLengthDescriptor&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(FieldLengthDescriptor&&)
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘FieldLengthDescriptor&&’
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/vcf/vcf.h:35:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:27,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/dependencies/htslib/htslib/vcf.h:78:22: error: no match for ‘operator=’ (operand types are ‘FieldLengthDescriptor’ and ‘int’)
#define BCF_VL_G 3
^
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:222:39: note: in expansion of macro ‘BCF_VL_G’
ref.m_length_descriptor = BCF_VL_G;
^~~~~~~~
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(const FieldLengthDescriptor&)
class FieldLengthDescriptor
^~~~~~~~~~~~~~~~~~~~~
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘const FieldLengthDescriptor&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(FieldLengthDescriptor&&)
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘FieldLengthDescriptor&&’
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/vcf/vcf.h:35:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:27,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/dependencies/htslib/htslib/vcf.h:79:22: error: no match for ‘operator=’ (operand types are ‘FieldLengthDescriptor’ and ‘int’)
#define BCF_VL_R 4
^
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:224:39: note: in expansion of macro ‘BCF_VL_R’
ref.m_length_descriptor = BCF_VL_R;
^~~~~~~~
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(const FieldLengthDescriptor&)
class FieldLengthDescriptor
^~~~~~~~~~~~~~~~~~~~~
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘const FieldLengthDescriptor&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(FieldLengthDescriptor&&)
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘FieldLengthDescriptor&&’
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/vcf/vcf.h:35:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:27,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/dependencies/htslib/htslib/vcf.h:76:22: error: no match for ‘operator=’ (operand types are ‘FieldLengthDescriptor’ and ‘int’)
#define BCF_VL_VAR 1
^
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:227:41: note: in expansion of macro ‘BCF_VL_VAR’
ref.m_length_descriptor = BCF_VL_VAR;
^~~~~~~~~~
In file included from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper_sql.h:26:0,
from /work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:24:
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(const FieldLengthDescriptor&)
class FieldLengthDescriptor
^~~~~~~~~~~~~~~~~~~~~
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘const FieldLengthDescriptor&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: candidate: FieldLengthDescriptor& FieldLengthDescriptor::operator=(FieldLengthDescriptor&&)
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/include/utils/vid_mapper.h:175:7: note: no known conversion for argument 1 from ‘int’ to ‘FieldLengthDescriptor&&’
/work/Software/Download/Variant_Package/GenomicsDB/GenomicsDB/src/main/cpp/src/utils/vid_mapper_sql.cc:249:148: error: no match for ‘operator!=’ (operand types are ‘FieldLengthDescriptor’ and ‘int’)
if ((ref.m_VCF_field_combine_operation == VCFFieldCombineOperationEnum::VCF_FIELD_COMBINE_OPERATION_CONCATENATE) && (ref.m_length_descriptor != BCF_VL_VAR)) {
[ 69%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/vcf/vcf_adapter.cc.o
make[2]: *** [src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/vid_mapper_sql.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [src/main/CMakeFiles/GenomicsDB_library_object_files.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 71%] No install step for 'htslib'
[ 72%] Completed 'htslib'
[ 72%] Built target htslib
make: *** [all] Error 2

How could I resolve this error?

vcf2tiledb: src/loader/load_operators.cc:303: virtual void LoaderCombinedGVCFOperator::operate(const void*): Assertion `end_value >= m_partition.first && column_value <= m_partition.second' failed.

Hi,
I'm trying to run vcf2tiledb on 2900 samples with around 703637439 variants per sample.
I'm executing the job using openmpi 1.8.4 through LSF scheduler using 1000 cores. the data is stored in GPFS so nothing happening at the node level.
I created a loader config file with 1000 partitions (cf json file attached) and I set "num_parallel_vcf_files": 1000
I get the following error after 90 minutes of execution : vcf2tiledb: src/loader/load_operators.cc:303: virtual void LoaderCombinedGVCFOperator::operate(const void_): Assertion `end_value >= m_partition.first && column_value <= m_partition.second' failed.
There's data in the vcf.gz file (420M)
I'm did a test run with 22 vcf files,199 cores , 199 partitions and "num_parallel_vcf_files": 22 and that is running fie for 2 days now and file size is 37G
Could you please help troubleshouting this error and advise what would be the best setting for the _number of cores* to be used, num_parallel_vcf_files, number of partitions based on the number of vcf files to combine and number of variants ?
Thanks in advance for your help
Best,
Ramzi

output.778999.genomeDB.txt
errors.778999.genomeDB.txt
loader_config_file_20160606.json.txt

alignment store in GenomicDB

Is there any thought on storing the alignment data into tile DB similar to the Pileup format:
http://samtools.sourceforge.net/pileup.shtml

The alignment information will be much more dense than variant calls.

Cannot find GenomicsDB maven package that supports spark-core_2.11

I am looking here for the genomicsDB maven packages, and the only one I can find depends on spark-core_2.10. Is there another build somewhere that depends on spark-core_2.11 instead?

vcf2tiledb mpirun fails

I have tried running vcf2tiledb mpirun with command shown below on numerous occasions. I have tried using entire comet node, which has memeory of 128 GB of RAM, however, the run fails after an hour or so. I do have the partial results. Please advice what to do. Even if mpirun were to complete, its performance does not scale at all.
I have been able to run vcf2tiledb without mpirun on a single thread and a single partition. It completes most of the time, but I should add that the successful completion depends on the right choice of "size_per_column_partition".
Running with the rank (partition index) on 4-partitions is not a complete success either because 1 or so partition fails while others complete. I should add that running with rank is extremely slow. Way slower than single thread run on a single partition.

mpirun -n 4 /share/apps/compute/genomicsdb/bin/vcf2tiledb /oasis/scratch/comet/mgujral/temp_project/tmpDellProject/loaderNew.json

Here is the error message

intel/2015.2.164(64):ERROR:105: Unable to locate a modulefile for 'openmpi_ib'
[TileDB::StorageManager] Error: Cannot load array schema; Array '/oasis/scratch/comet/mgujral/temp_project/GenomicsDB_WorkspaceTest/test0' does not exist.
[TileDB::StorageManager] Error: Cannot load array schema; Array '/oasis/scratch/comet/mgujral/temp_project/GenomicsDB_WorkspaceTest/test3' does not exist.
[TileDB::StorageManager] Error: Cannot load array schema; Array '/oasis/scratch/comet/mgujral/temp_project/GenomicsDB_WorkspaceTest/test1' does not exist.
[TileDB::StorageManager] Error: Cannot load array schema; Array '/oasis/scratch/comet/mgujral/temp_project/GenomicsDB_WorkspaceTest/test2' does not exist.
[comet-10-49.sdsc.edu:mpi_rank_3][error_sighandler] Caught error: Segmentation fault (signal 11)

Query based on other column

I am wondering would it possible to query the data based on the other columns rather than chromosomal coordinates? For example based on the GT or BaseQRankSum.

Segfault when working with VCF with multiple contigs

Hi, I turned this VCF into a TileDB array and then tried to query that array, but it resulted in a segfault. This VCF is different from previous VCFs I have used in that it has multiple contigs. The JSON files that go along with it are in the same directory as the VCF in my forked hail repo. Is there anything that should be done differently when working with VCFs with multiple contigs? Thanks

Multiple random genotypes generated if GT is last field in vid_mapping.json fields section for GenomicsDB 0.9.0 vcf2tiledb

When loading a VCF file into a new GenomicsDB instance using vcf2tiledb, if the fields portion of the vid_mapping.json contains the GT definition as the last item a large number of apparently random genotypes are assigned instead of the correct genotypes, e.g.: if the fields block of the JSON ends:

...
"GQ": {
"type": "int",
"vcf_field_class": [
"FORMAT"
]
},
"GT": {
"length": "PP",
"type": "int",
"vcf_field_class": [
"FORMAT"
]
}
}

then a query with gt_mpi_gather returns results similar to:

"GT": [ 0, 0, 256, 6619136, 0, 327680, 0, 960233472, 0, 5701632, 0, 960233472, 0, 65536, 21168128, 1409286144, 1114636288, 1, -1, 39, 8, 47, 1, 814, 0, 0, 256, 88145920, 2, 0, 0, 0, 0 (... and on for several hundred random values)

for a single genotype entry. However, as long as the GT field isn't last in the fields block:

...
"GT": {
"length": "PP",
"type": "int",
"vcf_field_class": [
"FORMAT"
]
},
"GQ": {
"type": "int",
"vcf_field_class": [
"FORMAT"
]
}
}
the load proceeds normally and returns correct genotypes. The wiki documentation seems to imply the ordering of fields within the block should not matter, so this seems as if this is unintended behavior.

Similarity search

Hi,

I was wondering if it's possible to perform a similarity search using TileDB/GenomicsDB. For instance, consider aa, ab, and bb as the keys stored in TileDB; if I query ba, which is not a stored key in the database, I would expect TileDB to return ba (with higher similarity rank) and aa/bb (with lower rank).

Does TileDB/GenomicsDB come with such a feature?

Help with a Simple Query

Hi there,

I seem to be doing something wrong. I have a genomicsdb workspace created by @jackgoldsmith4 with the 0.6.4 release of Genomics DB. The source of this genomics db array is a small, multi-sample VCF file.

I believe I've created a minimal example repository that I expect to print the lines of the original VCF, but instead I see no output. To reproduce:

git clone https://github.com/danking/minimal-genomicsdb-example.git
download https://storage.googleapis.com/gnomad-resources/Homo_sapiens_assembly19.fasta into src/test/resources
from the git repository root, execute sbt and then execute run from the sbt prompt.

From debugging the program, it appears that the header lines are read (but perhaps this is because I include the header lines as sample2header.vcf?). I don't see any variant lines in the iterator. Is there a simpler query I can try? I'm not sure how to gain more understanding of what might be going wrong.

Remove or supress "Buffer resized" message

In GenomicsDB importer we're seeing constant spam of:

Buffer resized from 327680bytes to 327681
Buffer resized from 327679bytes to 327681
Buffer resized from 327678bytes to 327681
...

Could this message be removed? Alternatively, a better long term solution would be to provide a JNI hook to enable/disable log messages or set a verbosity level.

error: ‘const class google::protobuf::RepeatedPtrField<SampleIDToTileDBIDMap>’ has no member named ‘cbegin’

Hi,

I am trying to install on CentOS 6.9 w/GCC 4.9.2 (devtoolset-3) following instructions on GenomicsDB page. I have installed protobuf 3.0.x manually as documented and have included path to this installation in my cmake command. I have also commented out 6 lines in CMakeLists.txt file referring to libdbi tests (issue #196). I am now receiving the following fatal error from make command:

$ make -j4
Scanning dependencies of target htslib
Scanning dependencies of target TileDB
Scanning dependencies of target PROTOBUF_GENERATED_CXX_TARGET
[ 1%] [ 3%] [ 6%] [ 6%] Creating directories for 'htslib'
<...>
[ 63%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/memory_measure.cc.o
[ 64%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/histogram.cc.o
[ 66%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/json_config.cc.o
[ 67%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/vid_mapper_pb.cc.o
[ 69%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/lut.cc.o
[ 70%] Building CXX object src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/known_field_info.cc.o
/net/home/ivan/GenomicsDB/src/main/cpp/src/utils/vid_mapper_pb.cc: In member function ‘int ProtoBufBasedVidMapper::parse_callset_protobuf(const CallsetMappingPB*, const std::vector&)’:
/net/home/ivan/GenomicsDB/src/main/cpp/src/utils/vid_mapper_pb.cc:78:40: error: ‘const class google::protobuf::RepeatedPtrField’ has no member named ‘cbegin’
callset_map_protobuf->callsets().cbegin();
^
/net/home/ivan/GenomicsDB/src/main/cpp/src/utils/vid_mapper_pb.cc:80:49: error: ‘const class google::protobuf::RepeatedPtrField’ has no member named ‘cend’
for (; it != callset_map_protobuf->callsets().cend(); ++it) {
^
make[2]: *** [src/main/CMakeFiles/GenomicsDB_library_object_files.dir/cpp/src/utils/vid_mapper_pb.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [src/main/CMakeFiles/GenomicsDB_library_object_files.dir/all] Error 2
make: *** [all] Error 2

Bad floating point parsing in AF field

I created a genomicsDB array using the VCF importer and this VCF. I then used a GenomicsDBFeatureReader to iterate over the VariantContext objects generated from that array, but the numbers I got for the AF INFO field (type = "float") were sometimes slightly different than the original values from the VCF. For example, in the second VariantContext, the original value for the AF field is 0.007252, but the number in the VariantContext object is 0.00725199.

jar from maven fails to load library on Travis-CI

We're unable to load the native library from the genomicsDB jar when running gatk on travis-ci. It's using ubuntu 14.04. We're able to run the same tests on our local macs and redhat machines, so I suspect it's some library incompatibility.

Error logs here and the relevant pull request enabling genomics db support in gatk is broadinstitute/gatk#1975.

Is it possible to provide a jar that will run on ubuntu 14.04? I'm not sure what library is incompatible because the error messages are not very descriptive.

Correct combination operation for length "A" fields?

For fields of length A (AC, for example) what combination operation should be specified to get each element in the array? With the mean combiner, I’m getting back only the first element in the array. For example, at position 16051453 in sample2.vcf, the AC field has values of (540, 26), but I am currently only getting back 540 in the VariantContext.

Two extra zeroes added onto DP INFO field

I created a genomicsDB array using the VCF importer and this VCF. I then used a GenomicsDBFeatureReader to iterate over the VariantContext objects generated from that array, but the DP (INFO) field values in the VariantContext objects each had two extra zeroes on the end compared to the original VCF values. For example, in the first VariantContext, the DP (INFO) field has a value of 148500, but the original VCF value is 1485.

Querying across multiple arrays

Hi,
I am trying to use GenomicsDB to import gVCFs in batches with each batch running on separate compute nodes.
Importing the data into arrays seems to work ok but I am unsure as to how to query across multiple arrays. Do row indexes in the callset files need to be unique for every sample across all arrays or can each array have it's own callset.json?
I have tried using a single callset file that is common to all my arrays with each array containing a subset of the samples 1-10, 11-20 etc. What I find is that when I query an array that has samples starting at row index 0 (something like gt_mpi_gather -j multiquery.json -s 10000 --print-AC) the results come back very quickly. However, if the array has a 1st sample starting > row index 0 there is a long delay before any data is returned, is this expected?

Core Dumped

Hi,
I'm running GenomicsDB 0.9.2 using 9720 files and 5000 partitions and I'm getting the error below in 1000+ jobs where other complete successfully:
terminate called after throwing an instance of 'VCF2BinaryException'
what(): VCF2BinaryException : Could not open file 0200162381_S4_L004.hc.snps.indels.g.vcf.gz : could not load index (VCF/BCF files must be block compressed and indexed)
[hpcgenomicn10:18116] *** Process received signal ***
[hpcgenomicn10:18116] Signal: Aborted (6)
[hpcgenomicn10:18116] Signal code: (-6)
[hpcgenomicn10:18116] [ 0] /lib64/libpthread.so.0(+0xf130)[0x2b1768526130]
[hpcgenomicn10:18116] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b176ab355d7]
[hpcgenomicn10:18116] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b176ab36cc8]
[hpcgenomicn10:18116] [ 3] /gpfs/software/tools/gcc-4.9.1/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x175)[0x2b176a336115]
[hpcgenomicn10:18116] [ 4] /gpfs/software/tools/gcc-4.9.1/lib64/libstdc++.so.6(+0x5e176)[0x2b176a334176]
[hpcgenomicn10:18116] [ 5] /gpfs/software/tools/gcc-4.9.1/lib64/libstdc++.so.6(+0x5e1c1)[0x2b176a3341c1]
[hpcgenomicn10:18116] [ 6] /gpfs/software/tools/gcc-4.9.1/lib64/libstdc++.so.6(+0x5e3d8)[0x2b176a3343d8]
[hpcgenomicn10:18116] [ 7] vcf2tiledb[0x4611d5]
[hpcgenomicn10:18116] [ 8] vcf2tiledb[0x42fbb3]
[hpcgenomicn10:18116] [ 9] vcf2tiledb[0x435b23]
[hpcgenomicn10:18116] [10] vcf2tiledb[0x4398e7]
[hpcgenomicn10:18116] [11] vcf2tiledb[0x43a14b]
[hpcgenomicn10:18116] [12] vcf2tiledb[0x4211e9]
[hpcgenomicn10:18116] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b176ab21af5]
[hpcgenomicn10:18116] [14] vcf2tiledb[0x424189]
[hpcgenomicn10:18116] *** End of error message ***

I've tested the file (zcat file and running bcftools index --stats) and all seems ok
I removed That file from the list, then the error is coming back with another vcf file.
This is happening only on nodes of a certain queue that have identical settings as the other that are not failing.
Any Idea what might be causing the problem?
Thank you in advance for your help.
Best,
Ramzi

Error while writing to TileDB array

Hello,

I am trying to run GenomicsDBImport from GATK 4.0.11.0 on 850 gvcfs and encountered this error:

15:52:38.351 INFO ProgressMeter - 1:2115904 80.6 1 0.0
15:52:38.351 INFO GenomicsDBImport - Done importing batch 1/17
15:52:41.256 INFO GenomicsDBImport - Importing batch 2 with 50 samples

terminate called after throwing an instance of 'VariantStorageManagerException'

what(): VariantStorageManagerException exception : Error while writing to TileDB array
TileDB error message :

The command I used was:
/apps/gatk/4.0.11.0/gatk GenomicsDBImport --batch-size 50 -L intervals.bed --genomicsdb-workspace-path workspace -V [850 gvcf files]

Do you happen to know what the error was about and how to fix it? Thank you in advance for your help.

Error reporting - report "holes" in row_idx sequence in callsets JSON

Add reference to Dockerfile to wiki?

I made a Dockerfile to help people quickly test this project without having to read all the compilation steps in the wiki. Could be useful to have in the wiki or in the main repo itself?

https://github.com/pearofducks/genomicsDB-docker

[Question] MPI query

Hello,

I'm trying to split up a query between 4 MPI processes:

{
    "workspace" : "/home/centos/data/workspaces/ws1",
    "array" : "test0",
    "query_column_ranges" : [ [ [0, 50000 ] ] ],
    "query_row_ranges" : [ [ [0, 100] ], [ [100, 200] ], [ [200, 300] ], [ [300, 400] ] ],
    "query_attributes" : [ "REF", "ALT" ]
}

This is the loader.config:

{
    "row_based_partitioning" : false,
    "produce_combined_vcf": false,
    "produce_tiledb_array" : true,
    "column_partitions" : [
        {"begin": 0, "workspace":"/home/centos/data/workspaces/ws1", "array": "test0"}
    ],
    "vid_mapping_file" : "/home/centos/data/vcfs/lcsb/vid_mapping.json",
    "callset_mapping_file" : "/home/centos/data/vcfs/lcsb/callset_mapping.json",
    "size_per_column_partition": 1000000,
    "treat_deletions_as_intervals" : true,
    "num_parallel_vcf_files" : 1,
    "delete_and_create_tiledb_array" : true,
    "fail_if_updating": true,
    "vcf_output_format": "z",
    "do_ping_pong_buffering" : true,
    "offload_vcf_output_processing" : true,
    "discard_vcf_index": true,
    "compress_tiledb_array" : true,
    "disable_synced_writes" : true,
    "segment_size" : 1048576,
    "num_cells_per_tile" : 1000,
    "ignore_cells_not_in_partition": false
}

But when I run

mpirun -n 4 ./build/bin/gt_mpi_gather -j query.json -l data/vcfs/lcsb/loader.conf.json

I get these errors:

terminate called after throwing an instance of 'RunConfigException'
  what():  RunConfigException : No column partition/query interval available for process with rank 1
terminate called after throwing an instance of 'RunConfigException'
  what():  RunConfigException : No column partition/query interval available for process with rank 2
terminate called after throwing an instance of 'RunConfigException'
  what():  RunConfigException : No column partition/query interval available for process with rank 3

Am I doing something wrong?

tmp vcf header getting unlinked very early for Cloud URLs

The src/main/cpp/include/config/genomicsdb_config_base.h destructor is causing the tmp file created for Cloud paths to be unlinked very early causing this gatk fork to fail for
SelectVariants with Cloud URLs. e.g.

./gatk SelectVariants -V gendb://hdfs://oda-master:9000/gdb_ws -R ~/vcfs/reference/hs37d5.fa -O o.vcf
...
[E::hts_open_format] Failed to open file /tmp/TileDBuU2lrW
terminate called after throwing an instance of 'VCFAdapterException'
  what():  VCFAdapterException : Could not open template VCF header file /tmp/TileDBuU2lrW

Destructor in genomicsdb_config_base.h

~GenomicsDBConfigBase()
    {
      if(m_is_tmp_vcf_header_filename)
        unlink(m_vcf_header_filename.c_str());
      m_is_tmp_vcf_header_filename = false;
    }

Error reporting

Raise exception if the loader file has both row and column partitions

Problematic frame tiledb_array_iterator_reset_subarray from JNI

There appears to be an issue querying from JNI from most recent version of master:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f87275f8224, pid=3381457, tid=0x00007f87735e8700
#
# JRE version: OpenJDK Runtime Environment (8.0_171-b10) (build 1.8.0_171-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.171-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libtiledbgenomicsdb1254139083221711517.so+0x252224]  tiledb_array_iterator_reset_subarray+0x4

This is the bit of code that triggers this error:

    @transient val hadoopConf = sc.hadoopConfiguration
    hadoopConf.set(GenomicsDBConfiguration.LOADERJSON, loader_json)
    hadoopConf.set(GenomicsDBConfiguration.MPIHOSTFILE, hostfile)
    hadoopConf.set(GenomicsDBConfiguration.QUERYJSON, queryfile)
    val myrdd: RDD[(String, VariantContext)] =
      sc.newAPIHadoopRDD[String, VariantContext,
      GenomicsDBInputFormat[VariantContext, PositionalBufferedStream]](sc.hadoopConfiguration,
      classOf[GenomicsDBInputFormat[VariantContext, PositionalBufferedStream]],
      classOf[String], classOf[VariantContext])
    myrdd

Confirmed that I do not experience this issue with commit: 6d8f87d

Floating point exception (core dumped)

Hi all,

I'm trying to run GenomicsDB for a test on Ubuntu 14.04.3 LTS. What I'm trying to do is to run chr1 chunk on 4 g.vcf files with 4 threads.
The error I got is

ubuntu@sli-es1:/mnt/SCRATCH/GenomicsDB$ ./bin/vcf2tiledb /mnt/SCRATCH/benchmark/test/load_config.json 
[sli-es1:28444] *** Process received signal ***
[sli-es1:28444] Signal: Floating point exception (8)
[sli-es1:28444] Signal code: Integer divide-by-zero (1)
[sli-es1:28444] Failing at address: 0x417a00
[sli-es1:28444] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7f0f21df8330]
[sli-es1:28444] [ 1] ./bin/vcf2tiledb() [0x417a00]
[sli-es1:28444] [ 2] ./bin/vcf2tiledb() [0x40af73]
[sli-es1:28444] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f0f21a44f45]
[sli-es1:28444] [ 4] ./bin/vcf2tiledb() [0x40ca07]
[sli-es1:28444] *** End of error message ***
Floating point exception (core dumped)

I'm not sure where goes wrong. Sorry that I'm not familiar with C++, I tried to google it but it implied sth wrong with dividing by zero or sth else, which confused me even more. Please find the attached files which are the input json and advise me about these.
Thank you so much!

load_config.json.txt
callsets.json.txt
vid.json.txt

test failed: GenomicsDB/tests/run.py not working for test examples

Dear Team,
The test examples are failed during execution of run.py.
Also the wiki page https://github.com/Intel-HLS/GenomicsDB/wiki/Importing-VCF-data-into-GenomicsDB for importing VCF is not good for examples purpose as the required files and folders are not available to found to recreate test.

The output of example run.py is given below:-
[rakesh@openlap tests]$ pwd
/home/rakesh/Documents/Git_Repositories/GenomicsDB_Test/GenomicsDB/tests
[rakesh@openlap tests]$ ./run.py
Deleting all .da files in ../ and subdirectories
Done.
GENOMICSDB_TIMER,Fetch from VCF,Wall-clock time(s),0.007005,Cpu time(s),0.00700724,Critical path wall-clock time(s),0.006993,Cpu time(s),0.00699289,#critical path,1
GENOMICSDB_TIMER,Combining cells,Wall-clock time(s),0.000939,Cpu time(s),0.000942653,Critical path wall-clock time(s),1.9e-05,Cpu time(s),2.2444e-05,#critical path,3
GENOMICSDB_TIMER,Flush output,Wall-clock time(s),0,Cpu time(s),0,Critical path wall-clock time(s),0,Cpu time(s),0,#critical path,0
GENOMICSDB_TIMER,Sections time,Wall-clock time(s),0.008002,Cpu time(s),0.00800081,Critical path wall-clock time(s),0,Cpu time(s),0,#critical path,0
GENOMICSDB_TIMER,Time in single thread phase(),Wall-clock time(s),4.1e-05,Cpu time(s),3.8718e-05,Critical path wall-clock time(s),0,Cpu time(s),0,#critical path,0
GENOMICSDB_TIMER,Time in read_all(),Wall-clock time(s),0.008097,Cpu time(s),0.00809743,Critical path wall-clock time(s),0,Cpu time(s),0,#critical path,0
#processes 1
Host : openlap rank 0 LD_LIBRARY_PATH= :/usr/lib64/mpich/lib:/usr/local/lib/:/usr/lib64/mpich/lib
#processes 1
Host : openlap rank 0 LD_LIBRARY_PATH= :/usr/lib64/mpich/lib:/usr/local/lib/:/usr/lib64/mpich/lib
#processes 1
Host : openlap rank 0 LD_LIBRARY_PATH= :/usr/lib64/mpich/lib:/usr/local/lib/:/usr/lib64/mpich/lib
GENOMICSDB_TIMER,Total scan_and_produce_Broad_GVCF time for rank 0,Wall-clock time(s),0.002883,Cpu time(s),0.00287919
#processes 1
Host : openlap rank 0 LD_LIBRARY_PATH= :/usr/lib64/mpich/lib:/usr/local/lib/:/usr/lib64/mpich/lib
WARNING: page size is ignored except for scan now
GENOMICSDB_TIMER,Total scan_and_produce_Broad_GVCF time for rank 0,Wall-clock time(s),0.002703,Cpu time(s),0.00270366
Error: Could not find or load main class TestGenomicsDB
Query test: t0_1_2-java_vcf failed

Please help.

Option to skip 0/0 GT

I'd expect most variants for a sample in a large joint VCF file (100+ samples) to have 0/0 genotype. For thousands of samples in a database loaded from joint VCFs, most of cells will have 0/0 GT. It would be nice to have an option:

- to skip 0/0 GT variants during import
- to omit 0/0 GT variants in query results

This would enhance performance and usability greatly.

thanks,
dmitry

question on updates regarding spanning deletions

Hi, I'm reading https://github.com/Intel-HLS/GenomicsDB/wiki/Overlapping-variant-calls-in-a-sample from November 2017 and see that the datastore requires truncating deletions.

Is this still the case? There is a variant representation convention called spanning deletions and I am wondering if the datastore uses this at all.

Also, is there a hands-on tutorial for using GenomicsDB?

Thanks.

Default combination operation for DP INFO field produces different value

On an earlier issue, you said that the default combination operation for the DP INFO field is "sum". When I specify the combination operation as "sum" in the vid JSON file, I get the correct DP value. However, when I don't specify any combination operation for the DP field, I get a value that is 100x the corresponding correct value in the VCF. This suggests that something could be wrong with the default combiner for DP.

reference genome file in query.json

Hi, I am wondering how the .fasta reference genome file is used in genoimcsDB. It is a large file, and it is set as a mandatory parameter in the docs. However, my tests pass without this file, and all I get is a warning that it could not be opened. What is the purpose for this file in genomicsDB?

Cmake error: cannot find MPI on macOS

Installed all the requirements to the letter and running on problems building the application on macOS 10.13.2

cmake ../ -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local/bin \
    -DMPI_CC_COMPILER=/usr/local/opt/mpich/bin/mpicc -DMPI_CXX_COMPILER=/usr/local/opt/mpich/bin/mpicxx \
    -DOPENSSL_PREFIX_DIR=/usr/local/opt/openssl
-- The C compiler identification is AppleClang 9.1.0.9020039
-- The CXX compiler identification is AppleClang 9.1.0.9020039
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test test_cpp_2011
-- Performing Test test_cpp_2011 - Success
-- Could NOT find MPI_C (missing: MPI_C_WORKS)
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS)
CMake Error at /usr/local/Cellar/cmake/3.11.1/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND)
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.11.1/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/Cellar/cmake/3.11.1/share/cmake/Modules/FindMPI.cmake:1663 (find_package_handle_standard_args)
  CMakeLists.txt:78 (find_package)

mpicc and mpicxx seem fine

 master ⮀ mpicc -v
mpicc for MPICH version 3.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
clang: warning: argument unused during compilation: '-I /usr/local/Cellar/mpich/3.2.1_1/include' [-Wunused-command-line-argument]

 master ⮀ mpicxx -v
mpicxx for MPICH version 3.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
clang: warning: argument unused during compilation: '-I /usr/local/Cellar/mpich/3.2.1_1/include' [-Wunused-command-line-argument]

Any help appreciated.

m

unable to build new cpp gtests

We get a long list of syntax issues as shown below. The rest of the sources build fine. For now, we have reverted to to build 8a0353e, before the tests were brought to master.

Building CXX object src/test/cpp/src/CMakeFiles/runAllGTests.dir/test_mapping_data_loader.cc.o
In file included from /nfs/home/mcwrinn/GenomicsDB/src/test/cpp/src/test_mapping_data_loader.cc:23:0:
/nfs/home/mcwrinn/GenomicsDB/src/test/cpp/include/test_mapping_data_loader.h:39:17: error: ‘ContigInfo’ was not declared in this scope
std::vector m_contig_idx_to_info;

intel-hls / genomicsdb Goto Github PK

genomicsdb's People

Contributors

Stargazers

Watchers

Forkers

genomicsdb's Issues

Recommend Projects

Recommend Topics

Recommend Org