Comments (2)
Thanks for the detailed report. I see the issue here.
It was an unfortunate decision to entangle the format/spec with an implementation detail. A few changes will have to be made to provide compatibility with htslib.
- `bcf::header::StringMap` needs to be reworked to allow nonsequential entries. This means it can no longer be backed by an `IndexSet`. (4291d88 and 4e39a77)
- A `StringMap` will need to be created for `contig` header records. Given that absolute positions can override relative positions, the order of `vcf::header::Contigs` cannot be used. (17397d7)
- Document that a record chromosome ID (`chrom`) is not an index into `vcf::header::Contigs` but an index into the contig `StringMap` (6b73f93).
- Writing a VCF record needs to use the contig `StringMap` instead of the VCF header contigs (9da12e7).
- Converting a BCF record to a VCF record needs to use the contig `StringMap` instead of the VCF header contigs (9c97ba8).
- Parse the `IDX` field from the raw record in `vcf::header::Contig` (22046f0).
I will work on these.
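To illustrate why an insertion-ordered set is no longer enough, here is a minimal, self-contained sketch of a string map that tolerates gaps. It is not the noodles implementation: `SparseStringMap`, `insert_at`, `push`, and `build_contig_map` are hypothetical names, and the policy of placing an entry without `IDX` after the current end of the table is an assumption made for this example.

```rust
use std::collections::HashMap;

/// A hypothetical string map that allows gaps between indices.
#[derive(Debug, Default)]
struct SparseStringMap {
    // Position -> name; `None` marks an index that no header record claimed.
    entries: Vec<Option<String>>,
    // Name -> position, for reverse lookups.
    indices: HashMap<String, usize>,
}

impl SparseStringMap {
    /// Inserts `name` at an absolute position (e.g., taken from an `IDX`
    /// field), growing the table with empty slots as needed.
    fn insert_at(&mut self, i: usize, name: &str) {
        if i >= self.entries.len() {
            self.entries.resize(i + 1, None);
        }

        // If the slot already held a name, drop its stale reverse mapping.
        if let Some(prev) = self.entries[i].replace(name.to_string()) {
            self.indices.remove(&prev);
        }

        self.indices.insert(name.to_string(), i);
    }

    /// Inserts `name` at a relative position: one past the current end.
    /// (Assumed policy for this sketch.)
    fn push(&mut self, name: &str) {
        let i = self.entries.len();
        self.insert_at(i, name);
    }

    /// Returns the name stored at position `i`, if any.
    fn get_index(&self, i: usize) -> Option<&str> {
        self.entries.get(i).and_then(|entry| entry.as_deref())
    }

    /// Returns the position of `name`, if it is present.
    fn get_index_of(&self, name: &str) -> Option<usize> {
        self.indices.get(name).copied()
    }
}

/// Builds a contig map from `(ID, IDX)` pairs given in header order.
/// An explicit `IDX` overrides the position implied by declaration order.
fn build_contig_map(contigs: &[(&str, Option<usize>)]) -> SparseStringMap {
    let mut map = SparseStringMap::default();

    for &(name, idx) in contigs {
        match idx {
            Some(i) => map.insert_at(i, name),
            None => map.push(name),
        }
    }

    map
}
```

With this sketch, a header declaring `sq0`, then `sq1` with `IDX=3`, then `sq2` leaves positions 1 and 2 empty, so the second declared contig is not found at index 1. That is the reason a record's chromosome ID has to be resolved through the contig string map rather than the declaration order of the header contigs.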
from noodles.
noodles 0.17.0 includes the changes that resolve this issue, mainly parsing the `IDX` field on VCF header contig records and building a string map for those records.
The implementation is a breaking change. The user now has to parse a `bcf::header::StringMaps`, which includes both the dictionary of strings and the dictionary of contigs. Typically, if a method needs both maps (e.g., `Record::try_into_vcf_record` and `Writer::write_vcf_record`), pass the collection; otherwise, use the respective string map (`StringMaps::strings` or `StringMaps::contigs`). For example,
let raw_header = reader.read_header()?;
let header = raw_header.parse()?;
let string_maps: StringMaps = raw_header.parse()?;
let record = bcf::Record::default();
let dp = record.info().get(&header, string_maps.strings(), Key::TotalDepth);
let chromosome_id = usize::try_from(record.chromosome_id()).expect("invalid chrom");
let chromosome_name = string_maps.contigs().get_index(chromosome_id);
writer.write_vcf_record(&header, &string_maps, &record)?;
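The example above uses `expect` on the chromosome ID conversion, which panics on a negative ID. A non-panicking lookup can be sketched as follows. This is a standalone illustration, not noodles API: `contig_name` is a hypothetical helper, the `&[Option<String>]` slice stands in for the contig string map, and the `i32` type for the raw chromosome ID is an assumption.

```rust
use std::convert::TryFrom;

/// Resolves a raw chromosome ID against a contig table (hypothetical helper).
///
/// Returns `None` when the ID is negative, past the end of the table, or
/// points at a slot that no contig header record claimed.
fn contig_name(contigs: &[Option<String>], chromosome_id: i32) -> Option<&str> {
    // A negative ID cannot be an index; map the conversion error to `None`.
    let i = usize::try_from(chromosome_id).ok()?;
    // `get` guards against out-of-range IDs; the inner `Option` covers gaps.
    contigs.get(i)?.as_deref()
}
```

Returning `Option` lets the caller decide whether an unknown contig is an error or something to skip.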