I wanted to try oxbow and tested it on the following VCF file, but reading it fails with a panic. I'm not sure whether this file should already be supported or whether I am missing something.
$ git rev-parse HEAD
e3d2a1751901430a16438134b87bc16f21d90269
$ wget https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz
$ wget https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz.tbi
$ ls -l GCF_000001405.40.*
-rw-r--r-- 1 andreaspoehlmann staff 26611209012 Oct 16 10:51 GCF_000001405.40.gz
-rw-r--r-- 1 andreaspoehlmann staff 3118040 Oct 16 10:58 GCF_000001405.40.gz.tbi
$ md5sum GCF_000001405.40.*
a1082ca70e15eb63301dfc33b19d0ae7 GCF_000001405.40.gz
76959b1691e8e62cd650664b00b7ea02 GCF_000001405.40.gz.tbi
# read_vcf.py
import importlib.metadata
import oxbow as ox
import polars as pl
print("oxbow.__version__", importlib.metadata.version("oxbow"))
ipc = ox.read_vcf("GCF_000001405.40.gz", index="GCF_000001405.40.gz.tbi")
df = pl.read_ipc(ipc)
print(df)
$ python read_vcf.py
oxbow.__version__ 0.2.0
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ExternalError(Custom { kind: InvalidData, error: InvalidInfo(InvalidField(InvalidValue(Other(Other("RS")), InvalidInteger(ParseIntError { kind: PosOverflow })))) })', src/lib.rs:117:49
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/Users/andreaspoehlmann/development/oxbow-test/read_vcf.py", line 8, in <module>
    ipc = ox.read_vcf("GCF_000001405.40.gz", index="GCF_000001405.40.gz.tbi")
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: ExternalError(Custom { kind: InvalidData, error: InvalidInfo(InvalidField(InvalidValue(Other(Other("RS")), InvalidInteger(ParseIntError { kind: PosOverflow })))) })
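The panic reports that an `RS` INFO value could not be parsed as an integer because of a positive overflow (`ParseIntError { kind: PosOverflow }`). The VCF specification defines `Integer` as a 32-bit signed type, so one plausible explanation is that some `RS` values in this file are larger than 2,147,483,647. Below is a minimal, hypothetical diagnostic sketch (not part of oxbow) to check that hypothesis. It assumes the file is bgzip-compressed plain-text VCF, which Python's `gzip` module can read as multi-member gzip. The helper names `declared_info_type`, `find_int32_overflows` and `report` are made up for illustration. The script prints the `Type=` declared for the field in the header and lists the first records whose value exceeds the 32-bit limit.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator, Optional, Tuple

# VCF "Integer" is a 32-bit signed type, so this is the largest valid value.
INT32_MAX = 2**31 - 1


def declared_info_type(lines: Iterable[str], key: str) -> Optional[str]:
    """Return the Type= declared for INFO field `key` in the header, if any."""
    prefix = f"##INFO=<ID={key},"
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        if line.startswith(prefix):
            for part in line[len(prefix):].rstrip(">\n").split(","):
                if part.startswith("Type="):
                    return part[len("Type="):]
    return None


def find_int32_overflows(
    lines: Iterable[str], key: str = "RS"
) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, pos, value) for records whose INFO `key` exceeds INT32_MAX."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        for entry in fields[7].split(";"):  # column 8 is INFO
            name, _, value = entry.partition("=")
            # Only non-negative digit strings can trigger a positive overflow.
            if name == key and value.isdigit() and int(value) > INT32_MAX:
                yield fields[0], int(fields[1]), int(value)


def report(path: str, key: str = "RS", limit: int = 5) -> None:
    """Print the declared type of `key` and up to `limit` overflowing records."""
    with gzip.open(path, "rt") as fh:
        print(f"{key} declared as Type={declared_info_type(fh, key)}")
    with gzip.open(path, "rt") as fh:
        for chrom, pos, value in islice(find_int32_overflows(fh, key), limit):
            print(f"{chrom}:{pos} {key}={value} exceeds {INT32_MAX}")
```

Calling `report("GCF_000001405.40.gz")` scans the file sequentially, so on a file of this size it may take a while before the first offending record (if there is one) is printed.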
$ python -VV
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]
$ uname -a
Darwin F2WR4P9QNH 23.0.0 Darwin Kernel Version 23.0.0: Fri Sep 15 14:43:05 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T6020 arm64
$ system_profiler SPHardwareDataType | grep -e "Model\|Memory\|Cores"
Model Name: MacBook Pro
Model Identifier: Mac14,5
Total Number of Cores: 12 (8 performance and 4 efficiency)
Memory: 64 GB