zaeleus / noodles
Bioinformatics I/O libraries in Rust
License: MIT License
Hi, I think the record size for BCF is off. With this code:

use noodles_bcf as bcf;
use std::fs::File;
use std::io::Write;

fn main() {
    let path = "bug.bcf";
    let mut bcf = File::open(path).map(bcf::Reader::new).expect("couldn't open bcf");
    let mut stderr = std::io::stderr();

    for r in bcf.records() {
        let _rec = r.expect("error getting record");
        writeln!(stderr, "made it").expect("error writing to stderr");
    }
}
and the attached file (after gzipping), I get an error. The error is different (invalid record length) if the file is bigger.
bug.bcf.gz
Here is the error with that file:
thread 'main' panicked at 'error getting record: Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }', src/main.rs:13:26
with a larger file, it reads 2 records and then shows:
thread 'main' panicked at 'error getting record: Custom { kind: InvalidData, error: "invalid record length: l_shared = 2147484031, l_indiv = 2147484031" }', src/main.rs:13:26
Hello Michael,
While writing some small tasks for GSoC students:
https://github.com/chris-zen/htsget-mvp/labels/gsoc
I've noticed that BCF is not present in the noodles workspace. It shouldn't be a big deal, since for variants we'll be focusing on VCF files in the backend, but I just wanted to ask whether you have plans to implement a BCF reader in noodles.
This is not high priority at all, I was just curious while revising Google's htsget impl.
/cc @chris-zen @victorskl
It might be a good idea to mention in the docs that the position parameter supplied to the SAM record builder must be 1-based. It may also be worthwhile to mention somewhere that the SAM position is automatically converted to 0-based when written out as a BAM record.
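For the docs, the conversion itself is just an off-by-one. A minimal std-only sketch (the function name is hypothetical, not the noodles API):

```rust
// SAM positions are 1-based; BAM stores them 0-based.
// A SAM position of 0 means "unmapped", which BAM encodes as -1;
// that case is modeled as None here.
fn sam_to_bam_pos(sam_pos: u32) -> Option<u32> {
    sam_pos.checked_sub(1)
}

fn main() {
    assert_eq!(sam_to_bam_pos(1), Some(0)); // first base of the reference
    assert_eq!(sam_to_bam_pos(8), Some(7));
    assert_eq!(sam_to_bam_pos(0), None); // unmapped
}
```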
Hi,
I am trying to use this library as a replacement for rust-htslib. I am getting a compilation error while trying to run the examples. This can be reproduced on both macOS and CentOS.
> git clone https://github.com/zaeleus/noodles
> cd noodles/
> cargo run --release --example bam_write > sample.bam
Updating crates.io index
Downloaded byteorder v1.3.4
Downloaded bit-vec v0.6.2
Downloaded indexmap v1.5.0
Downloaded flate2 v1.0.16
Downloaded hashbrown v0.8.1
Downloaded miniz_oxide v0.4.0
Downloaded cfg-if v0.1.10
Downloaded adler v0.2.3
Downloaded libc v0.2.73
Downloaded crc32fast v1.2.0
Compiling autocfg v1.0.0
Compiling crc32fast v1.2.0
Compiling libc v0.2.73
Compiling cfg-if v0.1.10
Compiling byteorder v1.3.4
Compiling adler v0.2.3
Compiling bitflags v1.2.1
Compiling bit-vec v0.6.2
Compiling miniz_oxide v0.4.0
Compiling hashbrown v0.8.1
Compiling indexmap v1.5.0
Compiling flate2 v1.0.16
Compiling noodles-bgzf v0.1.0 (/home/mayakond/noodles/noodles-bgzf)
error[E0599]: no associated item named `MAX` found for type `u16` in the current scope
--> noodles-bgzf/src/virtual_position.rs:6:56
|
6 | pub(crate) const MAX_UNCOMPRESSED_POSITION: u16 = u16::MAX; // 2^16 - 1;
| ^^^ associated item not found in `u16`
|
help: you are looking for the module in `std`, not the primitive type
|
6 | pub(crate) const MAX_UNCOMPRESSED_POSITION: u16 = std::u16::MAX; // 2^16 - 1;
| ^^^^^^^^^^^^^
error: aborting due to previous error
For more information about this error, try `rustc --explain E0599`.
error: could not compile `noodles-bgzf`.
warning: build failed, waiting for other jobs to finish...
error: build failed
While experimenting, I ran into a quirk in the bgzf module: when working with a reader io::BufReader<bgzf::Reader<R>>, calls to reader.get_ref().virtual_position() break. A little MWE to illustrate:
use std::io::{self, BufRead, Read, Write};

use noodles_bgzf as bgzf;

fn log(line: &str, virtual_position: &bgzf::VirtualPosition) {
    println!(
        "Line '{}', compressed offset {}; uncompressed offset {}.",
        line,
        virtual_position.compressed(),
        virtual_position.uncompressed()
    );
}

fn main() -> io::Result<()> {
    // Setup
    let input = b"line1\nline2\nline3\nline4\nline5\n";
    let mut writer = bgzf::Writer::new(Vec::new());
    writer.write_all(input)?;
    let data = writer.finish()?;

    // Plain reader (this works as expected)
    println!("Plain reader:");
    let mut reader = bgzf::Reader::new(data.as_slice());
    let mut virtual_position = reader.virtual_position();
    let mut buf = [0u8; 6];

    while reader.read_exact(&mut buf).is_ok() {
        let line = std::str::from_utf8(&buf[0..5]).expect("cannot parse UTF-8");
        log(line, &virtual_position);
        virtual_position = reader.virtual_position();
    }

    println!();

    // Buffered reader (this does not work as expected)
    println!("Buffered reader:");
    let mut reader = io::BufReader::new(bgzf::Reader::new(data.as_slice()));

    loop {
        let virtual_position = reader.get_ref().virtual_position();
        let mut line = String::new();

        match reader.read_line(&mut line) {
            Ok(0) => break,
            Err(e) => return Err(e),
            _ => {
                line.pop();
            }
        }

        log(&line, &virtual_position);
    }

    Ok(())
}
Running this prints:
Plain reader:
Line 'line1', compressed offset 0; uncompressed offset 0.
Line 'line2', compressed offset 0; uncompressed offset 6.
Line 'line3', compressed offset 0; uncompressed offset 12.
Line 'line4', compressed offset 0; uncompressed offset 18.
Line 'line5', compressed offset 0; uncompressed offset 24.
Buffered reader:
Line 'line1', compressed offset 0; uncompressed offset 0.
Line 'line2', compressed offset 46; uncompressed offset 0.
Line 'line3', compressed offset 46; uncompressed offset 0.
Line 'line4', compressed offset 46; uncompressed offset 0.
Line 'line5', compressed offset 46; uncompressed offset 0.
I guess this is simply a consequence of the fact that the virtual position will be moved on BufReader::fill_buf, since this is where the call to the inner bgzf::Reader::read occurs, rather than on BufReader::consume. Therefore, I thought at first that I was perhaps in unintended territory by going through BufReader::get_ref, but I notice the same thing occurs in vcf::Reader::virtual_position, so I thought it might be helpful to mention it here.
The only solution I could think of would be to implement io::BufRead on the bgzf::Reader directly, to ensure that the position gets updated properly in consume. I think that could be made to work, and in some sense it seems kind of natural, since the bgzf::Reader is already kind of buffered via the calls to read_block. I'll be happy to try and help draft a PR if you think this is the way forward, but I thought maybe you might have different ideas for how and whether this should be supported.
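The fill_buf effect can be shown with std alone: in the sketch below a hypothetical counting reader stands in for bgzf::Reader, and its byte count (the analogue of the virtual position) races ahead of what the caller has consumed.

```rust
use std::io::{BufRead, BufReader, Read};

// Counts how many bytes the *inner* source has yielded, standing in for
// bgzf::Reader's virtual position.
struct Counting<R> {
    inner: R,
    read: usize,
}

impl<R: Read> Read for Counting<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.read += n;
        Ok(n)
    }
}

fn main() {
    let data = b"line1\nline2\n";
    let mut reader = BufReader::new(Counting { inner: &data[..], read: 0 });

    let mut line = String::new();
    reader.read_line(&mut line).unwrap();

    // The caller consumed only 6 bytes, but fill_buf pulled the whole
    // source into the buffer, so the inner position is already 12.
    assert_eq!(line.len(), 6);
    assert_eq!(reader.get_ref().read, data.len());
}
```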
While trying noodles squab, I noticed that GTF files couldn't be read (only GFF files). Are there plans to support it? From a parsing perspective, only column 9 needs different parsing.
Hi, thanks for the library. It would be useful to have an example of setting and getting a value from the INFO field. I see the tests in the modules, but for a Rust beginner like me, those are hard to translate to real-world use.
Also it would be useful to have a way to make the genotype parsing optional for BCFs with many samples when we only need to access the INFO fields.
While trying to read the header from this file I got an error. (The uncompressed version can be found in the same directory)
Code:
let mut reader = File::open(path.as_ref()).map(bcf::Reader::new)?;
let _ = reader.read_header().unwrap();
Output:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }'
I'm not sure if this is a noodles bug, but I think so, because I created the BCF file converting this VCF file using bcftools, so it should be okay.
This is an excellent project which shows off the capabilities of Rust in a bioinformatics environment.
I was wondering if you plan async or memory mapped file support?
The two most common use cases are:
Data on local SSD -> Memory mapped files (you can get away with Cursor if performance is not an issue).
Data on object store -> async file i/o.
Async requires use of AsyncRead/AsyncWrite and async functions.
With BGZF-encoded data, you can read ahead with multiple async requests (S3 recommends around 16 MB chunks) and decode on multiple threads to lower latency and memory usage.
On writes, you can use multipart upload on S3.
I'd be happy to help if you have not already done some of this.
Recent versions of the async-enabled crates fail to build standalone on docs.rs. The following can be found in the logs:
error[E0432]: unresolved import `tokio::fs`
bam started failing after 0.7.
When built through the wrapper noodles package, the docs seem to build without issues, but they are built without the async feature.
noodles_sam::record::data::Data uses the default IndexMap to store the Tag/Value pairs. The IndexMap docs state:

IndexMap and IndexSet have a default hasher type S = RandomState, just like the standard HashMap and HashSet, which is resistant to HashDoS attacks but not the most performant.

I think there is no plausible way to exploit the DoS attack when we are only hashing two bytes, and alternative hash functions can easily be used. According to this page, the FNV hasher performs best when hashing small values. FnvHasher is available as an external dependency.
When I change the IndexMap in noodles_sam::record::data::Data to IndexMap<K, V, FnvBuildHasher>, I see a measurable increase in performance depending on the number of data fields (5% at 5 fields, 10% at 16).
This is a two-line change, but IndexMap is used in many more locations (mostly header types, but also the VCF record). When hashing small values, I guess these can all be switched? Would you like a PR?
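For reference, here is a std-only FNV-1a sketch of the idea; the actual change would plug the fnv crate's FnvBuildHasher into IndexMap, and the constants below are the standard 64-bit FNV parameters.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Minimal FNV-1a 64-bit hasher: a tight loop per byte, which is why it
// does well on tiny keys like two-byte SAM tags.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

fn main() {
    // Two-byte SAM tags as keys, as in noodles_sam::record::data::Data.
    let mut data: HashMap<[u8; 2], i32, BuildHasherDefault<Fnv1a>> =
        HashMap::default();

    data.insert(*b"NM", 0);
    data.insert(*b"RX", 1);

    assert_eq!(data.get(b"RX"), Some(&1));
}
```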
I was having problems using a BCF file produced by noodles in bcftools. After a bit of digging, I believe my issue has to do with the handling of the IDX field in the header contig field. To illustrate, running
echo -e '##fileformat=VCFv4.3\n##contig=<ID=1>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample' | bcftools view -O b > test.bcf.gz
and then
use std::{error, fs, io};

use noodles_bcf as bcf;
use noodles_vcf as vcf;

fn main() -> Result<(), Box<dyn error::Error>> {
    let mut reader = fs::File::open("test.bcf.gz").map(bcf::Reader::new)?;
    reader.read_file_format()?;
    let raw_header = reader.read_header()?;
    let header: vcf::Header = raw_header.parse()?;

    let mut writer = vcf::Writer::new(io::stdout());
    writer.write_header(&header)?;

    Ok(())
}
prints something like
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed",IDX=0>
##contig=<ID=1,IDX="0">
##bcftools_viewVersion=1.14-25-gc304cb1+htslib-1.14-9-ge769401
##bcftools_viewCommand=view -O b; Date=Tue Jan 4 13:22:23 2022
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
Note in particular that the value "0" of the IDX field on the contig record is quoted (##contig=<ID=1,IDX="0">), since it is encoded as a regular entry of an IndexMap<String, String> rather than as a dedicated Option<usize> field as used e.g. for filters.
When writing BCFs with noodles, this causes issues when interfacing e.g. with bcftools. Modifying the above slightly to write a BCF,
use std::{error, fs, io};

use noodles_bcf as bcf;
use noodles_vcf as vcf;

fn main() -> Result<(), Box<dyn error::Error>> {
    let mut reader = fs::File::open("test.bcf.gz").map(bcf::Reader::new)?;
    reader.read_file_format()?;
    let raw_header = reader.read_header()?;
    let header: vcf::Header = raw_header.parse()?;

    let mut writer = bcf::Writer::new(io::stdout());
    writer.write_file_format()?;
    writer.write_header(&header)?;

    Ok(())
}
and running for instance
cargo run | bcftools view
produces
[W::bcf_hdr_register_hrec] Error parsing the IDX tag, skipping
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##bcftools_viewVersion=1.14-25-gc304cb1+htslib-1.14-9-ge769401
##bcftools_viewCommand=view -O b; Date=Tue Jan 4 13:22:23 2022
##bcftools_viewCommand=view; Date=Tue Jan 4 13:38:29 2022
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
where the contig entry has been skipped due to an error parsing the string IDX tag.
Digging around in the BCF specs sections 6.2.1-6.2.2, it's not clear to me how, if at all, the IDX field should work for contigs. My impression from reading samtools/hts-specs#591, however, is that contig IDXs should somehow be handled, albeit separately from other header records? I don't understand the details of the BCF spec well enough to be sure that I completely follow that thread, however.
Feel free to close if you think this is a problem in bcftools or with the spec, rather than with noodles. I figured either way it might save someone else a bit of time repeating the above investigation if there's a searchable issue describing the problem.
When encoding the number of CIGAR operations, it's not correct to divide by the size of a u32. The incorrectly encoded CIGAR length results in invalid BAMs.
This occurs in both Writer and AsyncWriter.
noodles/noodles-bam/src/writer/record.rs
Line 30 in 1f0f8ec
I found it by adding the following test to the writer:
#[test]
fn test_write_record_with_cigar() -> Result<(), Box<dyn std::error::Error>> {
    use noodles_sam::record::cigar::op::Kind;
    use noodles_sam::record::quality_scores::Score;

    use crate::record::cigar::Op;
    use crate::record::sequence::Base;

    let mut buf = Vec::new();

    let mut record = Record::default();

    let cigar = record.cigar_mut();
    cigar.push(Op::new(Kind::Match, 3)?);

    let seq = record.sequence_mut();
    seq.push(Base::A);
    seq.push(Base::A);
    seq.push(Base::A);

    let qual = record.quality_scores_mut();
    qual.push(Score::try_from(40)?);
    qual.push(Score::try_from(40)?);
    qual.push(Score::try_from(40)?);

    write_record(&mut buf, &record)?;

    let expected = [
        0x2b, 0x00, 0x00, 0x00, // block_size = 43
        0xff, 0xff, 0xff, 0xff, // ref_id = -1
        0xff, 0xff, 0xff, 0xff, // pos = -1
        0x02, // l_read_name = 2
        0xff, // mapq = 255
        0x48, 0x12, // bin = 4680
        0x01, 0x00, // n_cigar_op = 1
        0x04, 0x00, // flag = 4
        0x03, 0x00, 0x00, 0x00, // l_seq = 3
        0xff, 0xff, 0xff, 0xff, // next_ref_id = -1
        0xff, 0xff, 0xff, 0xff, // next_pos = -1
        0x00, 0x00, 0x00, 0x00, // tlen = 0
        0x2a, 0x00, // read_name = "*\x00"
        0x30, 0x00, 0x00, 0x00, // cigar = "3M"
        0x11, 0x10, // seq = "AAA"
        0x28, 0x28, 0x28, // qual = [40, 40, 40]
    ];

    assert_eq!(buf, expected);

    Ok(())
}
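For context, a std-only sketch of the correct count (helper names are hypothetical, not the noodles internals): each CIGAR op packs into one u32 as (len << 4) | op_code, and n_cigar_op is simply the number of ops, not a byte length divided by size_of::<u32>().

```rust
// Pack one BAM CIGAR operation into a u32: length in the high 28 bits,
// op code in the low 4 bits.
fn encode_op(len: u32, op_code: u32) -> u32 {
    (len << 4) | op_code
}

fn main() {
    // "3M": the op code for a match (M) is 0.
    let ops = vec![encode_op(3, 0)];

    // n_cigar_op is the count of operations.
    let n_cigar_op = ops.len() as u16;
    assert_eq!(n_cigar_op, 1);

    // The op serializes to the bytes 0x30 0x00 0x00 0x00.
    assert_eq!(ops[0].to_le_bytes(), [0x30, 0x00, 0x00, 0x00]);
}
```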
Hi!
I am currently migrating some code from rust-htslib to noodles.
While doing so, it turned out that several BAM and SAM record attributes, e.g. a record's position, are not primitive types and therefore do not support hashing.
As I need to add several attributes to a HashSet, it would be helpful if hashing were generally supported by deriving Hash.
In my case this concerns ReferenceSequenceId and sam::record::Position.
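A minimal sketch of what deriving Hash would enable, using a hypothetical Position newtype rather than the actual noodles types:

```rust
use std::collections::HashSet;

// Stand-in for sam::record::Position: once it derives Hash (and Eq),
// it can go straight into a HashSet.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Position(i32);

fn main() {
    let mut seen = HashSet::new();

    seen.insert(Position(100));
    seen.insert(Position(100)); // duplicate, ignored
    seen.insert(Position(200));

    assert_eq!(seen.len(), 2);
}
```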
I was getting an error parsing a VCF header. The code may be reduced to the following:
fn main() {
    use noodles_vcf::Header;

    let raw_header = r#"##fileformat=VCFv4.2
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
"#;

    let header = raw_header.parse::<Header>();
    println!("{:?}", header); // Error
}
which produces Err(InvalidInfo(TypeMismatch(Integer, Float))). As the error indicates, the problem seems to arise from the fact that MQ here is Integer, whereas noodles seems to expect Float. As far as I can see, the specs aren't entirely clear about the types of the reserved INFO fields. However,

- the specs do define MQ fields for genotypes, which say "MQ : RMS mapping quality, similar to the version in the INFO field. (Integer)"
- bcftools treats MQ as an integer field (e.g. here)

which makes me think that MQ should perhaps be Type::Integer in noodles also.
On a side note, I wonder if something similar may also be true for BQ, though I can find even less info in the specs and bcftools examples in this case?
Happy to submit a PR if you'd like.
There's an interesting barrier regarding Send with AsyncReader for the bgzf formats, and I'm not sure if it's by design. For example, the bam AsyncReader is not Send, making code like in this example fail to compile.
The error complains that the trait std::marker::Send is not implemented for (dyn std::future::Future<Output = std::result::Result<noodles::noodles_bgzf::block::Block, std::io::Error>> + 'static).
Do you think that this is unavoidable?
Although rare, FASTA files do pass through Windows machines, and the current reader assumes \n-only line endings.
If this analysis is correct, I am willing to work on it.
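A std-only sketch of one possible fix, assuming the reader goes line by line: strip an optional trailing \r after the \n so CRLF and LF inputs yield the same sequence lines.

```rust
// Trim a trailing "\n" and, if present, the "\r" before it, so that
// CRLF (Windows) and LF (Unix) files normalize to the same content.
fn trim_newline(line: &mut String) {
    if line.ends_with('\n') {
        line.pop();
        if line.ends_with('\r') {
            line.pop();
        }
    }
}

fn main() {
    let mut unix = String::from("ACGT\n");
    let mut windows = String::from("ACGT\r\n");

    trim_newline(&mut unix);
    trim_newline(&mut windows);

    assert_eq!(unix, "ACGT");
    assert_eq!(unix, windows);
}
```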
As far as I can tell, it's currently impossible to modify an existing vcf::Header, apart from adding unstructured records with the insert method. This is inconvenient e.g. for the case of modifying an existing VCF by adding a few INFO or FORMAT fields to the header and records, rather than building a header from scratch.
I see that sam::Header already has _mut methods for cases like this. Would it make sense to add something similar for the vcf::Header methods that return IndexMaps? For the non-IndexMap fields (assembly, file_format, pedigree_db), there could be similar _mut methods returning mutable references, or perhaps simply set_ setters? Finally, it might sometimes be ergonomic to create a vcf::header::Builder from an existing header; perhaps impl From<Header> for Builder could be nice?
Happy to make the PR if you'd like.
The Record::flags function returns Flags, which is Copy, so there is, as far as I can see, no way to mutate the flags.

pub fn flags_mut(&mut self) -> &mut Flags {
    &mut self.flags
}

is probably the way to go, considering similar functions in the API. I can send a PR if you like.
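A self-contained sketch of how such a _mut accessor reads at the call site, with hypothetical stand-ins for the noodles types:

```rust
// Stand-ins for the noodles types: a Copy flags wrapper and a record
// that exposes a mutable accessor to it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Flags(u16);

struct Record {
    flags: Flags,
}

impl Record {
    // The proposed accessor: returns a mutable reference, so callers can
    // update the flags in place despite Flags being Copy.
    fn flags_mut(&mut self) -> &mut Flags {
        &mut self.flags
    }
}

fn main() {
    let mut record = Record { flags: Flags(0) };

    // e.g. set the 0x400 (duplicate) bit in place.
    record.flags_mut().0 |= 0x400;

    assert_eq!(record.flags, Flags(0x400));
}
```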
Is there a timeline or line in the sand for when you might be interested in pushing a v0.1.0 of all of these to crates.io? It's impossible to publish a crate that depends on any of the noodles family until there is a version of them on crates.io.
If there's a checklist of things to add or do for an initial release, I'm happy to help push it over the finish line. Specifically, I want to release a new version of perbase that uses noodles-bgzf, and I can't publish it to crates.io until noodles is there first.
In the encoded BAM there is one field that encodes the length of both the sequence and the base qualities. When writing a bam::Record this is not validated, possibly resulting in corrupt BAM files.
According to the BAM spec, when base qualities are omitted, 0xFF should be written instead.
This could be done in the writer code, but it's probably cleaner to add a validate function on the record? Should I do a PR?
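A hedged sketch of what such a validate check might look like, operating on lengths only (the function name and error type are illustrative, not the proposed noodles API):

```rust
// The single l_seq field covers both the sequence and the quality scores,
// so their lengths must agree, or the qualities must be absent entirely
// (in which case the writer emits 0xFF fill per the BAM spec).
fn validate(seq_len: usize, qual_len: usize) -> Result<(), String> {
    if qual_len == 0 || qual_len == seq_len {
        Ok(())
    } else {
        Err(format!(
            "sequence length ({}) != quality scores length ({})",
            seq_len, qual_len
        ))
    }
}

fn main() {
    assert!(validate(3, 3).is_ok());
    assert!(validate(3, 0).is_ok()); // qualities omitted
    assert!(validate(3, 2).is_err()); // would produce a corrupt record
}
```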
When calling query on a ReferenceSequence from a Tabix file (this file specifically) I got a panic.
Code:
let path = std::env::current_dir()
    .unwrap()
    .join("data")
    .join("vcf")
    .join("sample1-bcbio-cancer.vcf.gz.tbi");

let index = tabix::read(path).unwrap();

for ref_seq in index.reference_sequences() {
    ref_seq.query(150, i32::MAX);
}
Output:
thread 'main' panicked at 'index out of bounds: 37450 >= 37450', /home/daniel/.cargo/registry/src/github.com-1ecc6299db9ec823/bit-vec-0.6.3/src/lib.rs:539:9
It seems to have something to do with the i32::MAX value, but it doesn't happen if the start value is 0.
I also wanted to ask if there is an easy way of using the query functionality with an open end or start, so it covers until the end of the sequence or the start.
Hi,
I used noodles to modify and filter BAM files, with the primary focus of learning Rust. Is it possible to parallelize the compression part when writing to a BAM file?
Great package :)
Kristian
If I try to parse the VCF example header from the VCF specification I get the following error.
Code:
use noodles_vcf::Header;
r#"##fileformat=VCFv4.3
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003"#
.parse::<Header>().unwrap();
Output:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidContig(InvalidRecord)'
The error disappears when I remove the md5=f126cdf8a6e0c7f379d618ff66beb2da, part of the contig field.
I'm trying noodles-(bam/sam) for a real project now, and so far the experience is more than pleasant. I've come across a few performance issues I'd like to discuss, the Tag::Other(String) representation being a relatively simple change.
Currently, Tag::Other(String) requires the end user to create a String when querying a tag. This has a few drawbacks, e.g. a String cannot be const.
Since the tag is required by the specification to be 2 bytes, change the String to [u8; 2] (this is a LENGTH constant in the code). This can be done with very few API changes and is implemented in a PR.
I ran a benchmark searching for the UMI tag in a real data line:
AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:75 YS:i:0 YT:Z:CP RG:Z:000000000-JM6RV.1 XS:A:+ NH:i:4 RX:Z:CGAATAAA MQ:i:1 MC:Z:75M ms:i:2646
let v = data
    .fields()
    .find_map(|f| {
        let f = f.ok()?;

        if f.tag() == &Tag::UmiSequence {
            Some(f.value().to_owned())
        } else {
            None
        }
    })
    .unwrap();

assert_eq!(v, Value::String("CGAATAAA".to_string()));
The change to [u8; 2] results in a decent performance gain:

DataValues/classic time: [569.66 ns 570.12 ns 570.64 ns]
change: [-28.071% -27.942% -27.830%] (p = 0.00 < 0.05)

The issue with this array implementation is the same as with the String version: directly creating the enum variant is allowed, making it possible to generate invalid tags. let t = Tag::Other("_a".to_string()); is currently accepted, as is its equivalent in the changed version. This can probably be avoided by wrapping the array (or the String in the current version) in a newtype, but I'm not sure how this would affect the ability to make const tags.
Let me know if you want to explore this.
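A std-only sketch of the newtype idea, assuming validation in a const fn keeps const tags possible (names are illustrative, not the noodles API):

```rust
// Wrap the two bytes and validate them at construction time, so invalid
// tags like "_a" are rejected while `const` tags remain possible.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Tag([u8; 2]);

impl Tag {
    const fn new(b: [u8; 2]) -> Option<Self> {
        // The SAM spec constrains tags to [A-Za-z][A-Za-z0-9].
        if b[0].is_ascii_alphabetic() && b[1].is_ascii_alphanumeric() {
            Some(Tag(b))
        } else {
            None
        }
    }
}

// Usable in a const context, unlike a String-backed variant.
const UMI: Option<Tag> = Tag::new(*b"RX");

fn main() {
    assert!(UMI.is_some());
    assert!(Tag::new(*b"_a").is_none()); // rejected instead of silently accepted
}
```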
Hi,
yes, this VCF (converted to BCF): https://github.com/brentp/vcf-bench#testing-data, with this code: https://github.com/brentp/vcf-bench/blob/625482287b6238a5d1e018ac94b3d89558593986/rust-noodles/src/main.rs
Also, in that code, I couldn't get the key on line 27 to work; I had to use the predefined Key::TotalAlleleCount. I'm curious what I could change to use it as written on line 27.
Originally posted by @brentp in #41 (comment)
If I try to parse the VCF header from this file, which I think has a valid VCF header, I get an error.
Code:
use noodles_vcf::Header;
r#"##fileformat=VCFv4.1
##fileDate=20100501
##reference=1000GenomesPilot-NCBI36
##assembly=ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/sv/breakpoint_assemblies.fasta
##INFO=<ID=BKPTID,Number=.,Type=String,Description="ID of the assembled alternate allele in the assembly file">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DEL:ME:ALU,Description="Deletion of ALU element">
##ALT=<ID=DEL:ME:L1,Description="Deletion of L1 element">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=DUP:TANDEM,Description="Tandem Duplication">
##ALT=<ID=INS,Description="Insertion of novel sequence">
##ALT=<ID=INS:ME:ALU,Description="Insertion of ALU element">
##ALT=<ID=INS:ME:L1,Description="Insertion of L1 element">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=CNV,Description="Copy number variable region">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype quality">
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
##FORMAT=<ID=CNQ,Number=1,Type=Float,Description="Copy number genotype quality for imprecise events">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
"#
.parse::<Header>().unwrap();
Output:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidFormat(TypeMismatch(Float, Integer))'
It seems to have something to do with the FORMAT lines, but I couldn't isolate the error.
Something similar also happens with the header from this other file.
Code:
use noodles_vcf::Header;
r#"##fileformat=VCFv4.1
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##GATKCommandLine=<ID=UnifiedGenotyper,Version=2.6-4-g3e5ff60,Date="Sun Jul 21 00:41:26 EDT 2013",Epoch=1374381686726,CommandLineOptions="analysis_type=UnifiedGenotyper input_file=[/home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/bamprep/Test1/chrM/Test1-chrM_100_1000-prep.bam] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[BadCigar, NotPrimaryAlignment] intervals=[/home/chapmanb/bio/bcbio-nextgen/tests/test_automated_output/gatk/chrM/Test1-chrM_100_1000-prep-variants-TestBatch1-raw-regions.bed] excludeIntervals=null interval_set_rule=INTERSECTION interval_merging=ALL interval_padding=0 reference_sequence=/home/chapmanb/bio/bcbio-nextgen/tests/data/automated/tool-data/../../genomes/hg19/seq/hg19.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 allow_bqsr_on_reduced_bams_despite_repeated_warnings=false validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=LENIENT_VCF_PROCESSING disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false version=false genotype_likelihoods_model=BOTH pcr_error_rate=1.0E-4 computeSLOD=false annotateNDA=false pair_hmm_implementation=LOGLESS_CACHING min_base_quality_score=17 max_deletion_fraction=0.05 
allSitePLs=false min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 indelDebug=false ignoreSNPAlleles=false allReadsSP=false ignoreLaneInfo=false reference_sample_calls=(RodBinding name= source=UNBOUND) reference_sample_name=null sample_ploidy=2 min_quality_score=1 max_quality_score=40 site_quality_prior=20 min_power_threshold_for_calling=0.95 min_reference_depth=100 exclude_filtered_reference_sites=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 genotyping_mode=DISCOVERY output_mode=EMIT_VARIANTS_ONLY standard_min_confidence_threshold_for_calling=4.0 standard_min_confidence_threshold_for_emitting=4.0 alleles=(RodBinding name= source=UNBOUND) max_alternate_alleles=6 input_prior=[] contamination_fraction_to_filter=0.05 contamination_fraction_per_sample_file=null p_nonref_model=EXACT_INDEPENDENT exactcallslog=null dbsnp=(RodBinding name=dbsnp source=/home/chapmanb/bio/bcbio-nextgen/tests/data/automated/tool-data/../../genomes/hg19/variation/dbsnp_132.vcf) comp=[] out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub debug_file=null metrics_file=null annotation=[BaseQualityRankSumTest, FisherStrand, GCContent, HaplotypeScore, HomopolymerRun, MappingQualityRankSumTest, MappingQualityZero, QualByDepth, ReadPosRankSumTest, RMSMappingQuality, DepthPerAlleleBySample, Coverage] excludeAnnotation=[] filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=GC,Number=1,Type=Integer,Description="GC content around the variant (see docs for window size details)">
##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##contig=<ID=chrM,length=16571,assembly=hg19>
##contig=<ID=chr22,length=20001,assembly=hg19>
##reference=file:///home/chapmanb/bio/bcbio-nextgen/tests/data/automated/tool-data/../../genomes/hg19/seq/hg19.fa
"#
.parse::<Header>().unwrap();
Output:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidFormat(NumberMismatch(Unknown, R))
I was trying to use noodles (thanks!) to write a VCF file with bgzip compression, which I thought I'd be able to do with a vcf::Writer<bgzf::Writer<W>>, along the lines of what's happening in the BAM module.
To illustrate, I tried modifying noodles-vcf/examples/vcf_write.rs (changed lines are emphasised):
use std::io;

use noodles_bgzf as bgzf;
use noodles_vcf::{self as vcf, header::Contig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let stdout = io::stdout();
    let handle = stdout.lock();
    let bgzf_handle = bgzf::Writer::new(handle);
    let mut writer = vcf::Writer::new(bgzf_handle);

    let header = vcf::Header::builder()
        .add_contig(Contig::new(String::from("sq0")))
        .build();

    writer.write_header(&header)?;

    let record = vcf::Record::builder()
        .set_chromosome("sq0".parse()?)
        .set_position(1)
        .set_reference_bases("A".parse()?)
        .build()?;

    writer.write_record(&record)?;

    Ok(())
}
Now, running cargo run --release --example vcf_write compiles and runs, but nothing is written to stdout. I tried the same thing with a std::fs::File handle instead of a locked stdout, and that likewise compiles and runs, only to produce an empty file. If I explicitly flush the writes, I get a bgzip header and trailer around an uncompressed VCF, and then I wasn't sure how to proceed. Perhaps I'm missing something simpler?
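For what it's worth, the symptom matches a buffered writer being dropped before it is finalized: bgzf::Writer accumulates a whole BGZF block internally and needs an explicit finalization step (noodles exposes finish/try_finish for this; the exact method to call here is an assumption on my part) to emit the final block and the EOF marker. The general pattern, sketched with std only, where BufWriter stands in for the bgzf writer:

```rust
use std::io::Write;

fn main() {
    let mut buf: Vec<u8> = Vec::new();
    {
        // BufWriter plays the role of bgzf::Writer here: bytes sit in its
        // internal buffer until it is flushed or dropped.
        let mut w = std::io::BufWriter::new(&mut buf);
        w.write_all(b"##fileformat=VCFv4.3\n").unwrap();
        // Without this explicit flush, the data only reaches `buf` when `w`
        // is dropped, and any flush error would be silently swallowed.
        w.flush().unwrap();
    }
    assert_eq!(buf, b"##fileformat=VCFv4.3\n");
}
```

The same reasoning suggests finalizing the bgzf writer before it goes out of scope rather than relying on Drop.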
While compiling an updated version (c235259) I got the following error
error[E0277]: the type `[u8]` cannot be indexed by `(Bound<usize>, Bound<usize>)`
--> /home/daniel/.cargo/git/checkouts/noodles-f00ebc122d065317/c235259/noodles-fasta/src/reader.rs:251:36
|
251 | Ok(Record::new(definition, sequence[range].to_vec()))
| ^^^^^^^^^^^^^^^ slice indices are of type `usize` or ranges of `usize`
|
= help: the trait `SliceIndex<[u8]>` is not implemented for `(Bound<usize>, Bound<usize>)`
= note: required because of the requirements on the impl of `std::ops::Index<(Bound<usize>, Bound<usize>)>` for `Vec<u8>`
The HtsGet-rs tests fail after some of the last changes. I have tracked the new errors down to the commits mentioned in umccr/htsget-rs#51. It seems the changes regarding "directly reading the metadata pseudo-bin" are responsible for this. The errors range from not being able to read index files at all to getting back some clearly wrong values. It would be of great help if you could look into it.
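For context on the pseudo-bin being read: in a BAI index, bin 37450 is the optional metadata pseudo-bin from the SAM spec. It carries two pseudo-chunks, the start/end virtual offsets of the reference's records and the mapped/unmapped record counts, all little-endian u64. A std-only sketch of decoding that 32-byte payload (this is my reading of the spec, not the noodles code):

```rust
use std::convert::TryInto;

// Decode the payload of a BAI metadata pseudo-bin (bin 37450). Per the SAM
// spec it holds two pseudo-chunks: (ref_beg, ref_end) virtual offsets,
// then (n_mapped, n_unmapped) counts, all as little-endian u64.
fn decode_metadata(buf: &[u8; 32]) -> (u64, u64, u64, u64) {
    let u = |i: usize| u64::from_le_bytes(buf[i..i + 8].try_into().unwrap());
    (u(0), u(8), u(16), u(24))
}

fn main() {
    let mut buf = [0u8; 32];
    buf[16..24].copy_from_slice(&42u64.to_le_bytes()); // n_mapped
    buf[24..32].copy_from_slice(&7u64.to_le_bytes()); // n_unmapped
    let (ref_beg, ref_end, n_mapped, n_unmapped) = decode_metadata(&buf);
    assert_eq!((ref_beg, ref_end, n_mapped, n_unmapped), (0, 0, 42, 7));
}
```

Mixing these counts up with real chunk offsets would explain "really weird values" of the kind described.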
I can parse a VCF record to get the FORMAT/GT fields as a noodles_vcf::record::genotype::field::value::genotype::Genotype, which I see is a newtype around Vec<noodles_vcf::record::genotype::field::value::genotype::allele::Allele>. However, I cannot find a way to work with these alleles in any way. From other parts of noodles I might have expected that Genotype implemented e.g. Deref<Target = Vec<Allele>> or provided getters/setters/iterators, but I cannot find any way to access the inner information. Am I missing something, or is such access lacking?
A MWE gets a little lengthy, but I'll include it for completeness:
use std::io;
use noodles_vcf as vcf;
fn main() -> io::Result<()> {
let data = br#"##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
1 1 . A . 999 PASS . GT 0/0 ./0 1/1"#;
let mut reader = vcf::Reader::new(&data[..]);
let _header = reader.read_header()?;
let mut raw_record = String::new();
reader.read_record(&mut raw_record)?;
let record: vcf::Record = raw_record.parse().expect("cannot parse record");
let genotypes: Vec<vcf::record::genotype::field::value::genotype::Genotype> = record
.genotypes()
.iter()
.map(|gt| {
gt.get(&vcf::record::genotype::field::key::Key::Genotype)
.and_then(|field| field.value())
.map(|value| match value {
vcf::record::genotype::field::Value::String(s) => s
.parse::<vcf::record::genotype::field::value::genotype::Genotype>()
.expect("cannot parse genotype"),
_ => panic!("no genotype for sample"),
})
.expect("cannot access genotype")
})
.collect();
for genotype in genotypes {
println!("{:?}", genotype);
// Do something with alleles in genotype?
}
Ok(())
}
On a somewhat related note, would it be in scope for noodles to provide an iterator over noodles_vcf::record::genotype::field::value::genotype::Genotype from a noodles_vcf::record::Record, in addition to the iterator over noodles_vcf::record::genotype::Genotype that is returned by Record::genotypes? For my own part at least, the GT field itself is by far the most commonly accessed FORMAT field, so this might be a nice ergonomic improvement as compared to the above.
Again, I'm happy to contribute PRs for either of those suggestions if you'd like.
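In the meantime, for anyone hitting the same wall: the raw GT string itself is simple enough to take apart by hand. A minimal std-only sketch (this is not the noodles API; per the VCF spec each allele is a 0-based index into REF+ALT, "." is missing, and "/" vs "|" separate unphased vs phased alleles):

```rust
// Parse a GT value like "0/1" into allele indices; None encodes ".".
// Phasing information ('/' vs '|') is discarded in this sketch.
fn parse_gt(s: &str) -> Vec<Option<usize>> {
    s.split(|c| c == '/' || c == '|')
        .map(|a| if a == "." { None } else { a.parse().ok() })
        .collect()
}

fn main() {
    assert_eq!(parse_gt("0/0"), vec![Some(0), Some(0)]);
    assert_eq!(parse_gt("./0"), vec![None, Some(0)]);
    assert_eq!(parse_gt("1|1"), vec![Some(1), Some(1)]);
}
```

Deref or iterator support on Genotype would of course still be the nicer fix.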
Hello,
I am currently trying to get the target and mate-target IDs, which I find a bit confusing.
I got the header-like ReferenceSequence map via:
let sequences : IndexMap<String, sam::header::ReferenceSequence> = match reader.read_reference_sequences() {
Ok(n) => n,
Err(_) => panic!("ERROR: could not get ref sequence!"),
};
Now I thought I could easily just look up the index and get the entry, but reference_sequence_id returns something I am not familiar with:
let chr1_index = read.reference_sequence_id().expect("ERROR: could not obtain read ref ID");
leads to
[src/lib/noodles_based_lib.rs:77] chr1_index = ReferenceSequenceId(
1434,
)
[src/lib/noodles_based_lib.rs:77] chr1_index = ReferenceSequenceId(
1434,
)
Stupid question: what is that structure, and how do I get the index number to use as a lookup into the IndexMap? :)
Update:
Okay stupid me figured it out, just hope that this is the good way to do it:
let chr1_index = match read.reference_sequence_id().map(i32::from){
Some(n) => n,
_ => panic!("ERROR: could not retrieve the id of the reference!"),
};
let chr1 : &str = match seq_local.get_index(chr1_index as usize){
Some(n) => n.0 ,
None => panic!("ERROR: could not assign a reference ID to a reference name!"),
};
If this seems fine, please feel free to close. Just want to avoid that I close it and you recommend against this way ;)
Looking at the tests, this is expected and tested for:
noodles/noodles-bam/src/record.rs
Line 562 in 0e5af40
Is it really intentional? That makes the .read_name() value really hard to work with.
I noticed this because I am converting from bam to sam via a sam record builder to modify tag values.
Are there plans to optionally allow libdeflater instead of flate2 in noodles-bgzf?
HTSlib will use libdeflate if found at compile time as it is quite a bit faster than zlib (and alternative zlib implementations).
http://www.htslib.org/benchmarks/zlib.html
https://github.com/sstadick/gzp might be interesting too (has its own bgzf implementation).
I could be missing something, but the BinningIndex query function requires a RangeBounds<i32> + Copy type; however, range types such as RangeInclusive do not implement Copy.
Specifically:
let chunks = index.query(ref_seq_id, seq_start..=seq_end);
Outputs error E0277:
the trait bound RangeInclusive<i32>: std::marker::Copy is not satisfied
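For reference, the Copy bound can usually be dropped because the bounds can be read by reference via start_bound()/end_bound(). A std-only sketch of a signature that accepts seq_start..=seq_end and any other range form (the 1-based, fully-closed interval convention here is an assumption, not necessarily what query uses):

```rust
use std::ops::{Bound, RangeBounds};

// Convert any range over i32 into a closed [start, end] interval without
// requiring Copy: the bounds are only inspected, never moved or copied.
fn to_closed_interval<B: RangeBounds<i32>>(bounds: B) -> (i32, i32) {
    let start = match bounds.start_bound() {
        Bound::Included(&n) => n,
        Bound::Excluded(&n) => n + 1,
        Bound::Unbounded => 1,
    };
    let end = match bounds.end_bound() {
        Bound::Included(&n) => n,
        Bound::Excluded(&n) => n - 1,
        Bound::Unbounded => i32::MAX,
    };
    (start, end)
}

fn main() {
    assert_eq!(to_closed_interval(8..=13), (8, 13)); // RangeInclusive works
    assert_eq!(to_closed_interval(8..13), (8, 12));
    assert_eq!(to_closed_interval(..), (1, i32::MAX));
}
```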
In many of the Cargo.toml files, the docs url is doc.rs. It should be docs.rs.
Hi... very excited to see this library! I've developed a couple of tangentially related packages, though noodles is much better developed.
In particular there's one that does mgf and mzml. These are used in metabolomics/proteomics for storing mass spec data. Are you accepting format contributions? Would mgf and mzml be a good fit (i.e. a noodles-mgf)?
The goal of this package is to have functions for reserializing to a different format (e.g. mgf to jsonl), so there's a CLI component and a lib component. It's annoying to commingle them, so seeing noodles I thought I'd see what you thought (or at least we can commiserate about bioinformatics formats).
I found a case in which calling seek changes the output. The file I'm using to test it can be found here
Code:
let path = std::env::current_dir()
.unwrap()
.join("data")
.join("vcf")
.join("sample1-bcbio-cancer.vcf.gz");
let mut reader = File::open(path)
.map(bgzf::Reader::new)
.map(vcf::Reader::new)
.unwrap();
println!("First round");
reader.read_header().unwrap();
let mut s = String::new();
loop {
reader.read_record(&mut s).unwrap();
if s.is_empty() {
break;
}
let vpos = reader.virtual_position();
println!("{}/{}", vpos.compressed(), vpos.uncompressed());
s.clear();
}
println!("Repeat after performing a seek");
reader
.seek(VirtualPosition::try_from((0, 7288)).unwrap())
.unwrap();
loop {
reader.read_record(&mut s).unwrap();
if s.is_empty() {
break;
}
let vpos = reader.virtual_position();
println!("{}/{}", vpos.compressed(), vpos.uncompressed());
s.clear();
}
Output:
First round
0/7481
0/7687
0/7882
0/8075
0/8269
0/8463
0/8658
0/8852
0/9108
0/9302
0/9496
3395/0
Repeat after performing a seek
0/7481
0/7687
0/7882
0/8075
0/8269
0/8463
0/8658
0/8852
0/9108
0/9302
0/9496
28/0
The results are exactly the same until the end of the block, when the seek version reports an incorrect compressed part. The same results appear when running each version with different readers.
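To make the compressed/uncompressed pairs in the output above concrete: a BGZF virtual position is a single u64 packing the compressed file offset into the upper 48 bits and the offset within the uncompressed block into the lower 16 bits. A std-only sketch of the encoding from the BGZF spec:

```rust
// Pack a (compressed offset, uncompressed offset) pair into a BGZF
// virtual position, and unpack it again.
fn pack(compressed: u64, uncompressed: u16) -> u64 {
    (compressed << 16) | u64::from(uncompressed)
}

fn unpack(vpos: u64) -> (u64, u16) {
    (vpos >> 16, (vpos & 0xffff) as u16)
}

fn main() {
    assert_eq!(pack(0, 7481), 7481);
    assert_eq!(unpack(pack(3395, 0)), (3395, 0));
    // 3395/0 and 28/0 both point at a block start, yet they are different
    // positions in the file, which is why the discrepancy above matters.
    assert_ne!(pack(3395, 0), pack(28, 0));
}
```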
2. The header is still lacking mutable access to the unstructured fields. Perhaps it would make sense to add either get_mut/delete to go with get/insert, and/or simply add records/records_mut for full access?
Add vcf::Header::records -> &IndexMap<String, Vec<Record>> and vcf::Header::records_mut -> &mut IndexMap<String, Vec<Record>> to access the header map field. This is a map of generic records.
Not sure if this is really wrong, but I thought it could use a revision.
The index (made with tabix) reports a different block offset than noodles. The files can be found here
Code:
let path = std::env::current_dir()
.unwrap()
.join("data")
.join("vcf")
.join("sample1-bcbio-cancer.vcf.gz");
let mut reader = File::open(&path)
.map(bgzf::Reader::new)
.map(vcf::Reader::new)
.unwrap();
println!("From reading the records");
reader.read_header().unwrap();
let mut s = String::new();
loop {
reader.read_record(&mut s).unwrap();
if s.is_empty() {
break;
}
let vpos = reader.virtual_position();
println!("{}/{}", vpos.compressed(), vpos.uncompressed());
s.clear();
}
println!("From the index");
let index = tabix::read(format!("{}{}", path.to_str().unwrap(), ".tbi")).unwrap();
index
.reference_sequences()
.into_iter()
.flat_map(|seq| seq.bins())
.flat_map(|bin| bin.chunks())
.map(|chunk| chunk.end())
.for_each(|chunk| println!("{}/{}", chunk.compressed(), chunk.uncompressed()));
Output:
From reading the records
0/7481
0/7687
0/7882
0/8075
0/8269
0/8463
0/8658
0/8852
0/9108
0/9302
0/9496
3395/0
From the index
0/9302
3367/0
The index reports 3367, but noodles reading the records reports 3395.
Hi, here is an example record:
chr1 27378 . T C 31.28 VQSRTrancheSNP99.80to100.00 AF_EAS_unrel=.
with:
##INFO=<ID=AF_EAS_unrel,Number=A,Type=Float,Description="Allele frequency in EAS (2504 unrelated samples)">
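Presumably the parser trips on AF_EAS_unrel=. because "." is the VCF missing value and needs to be handled before the numeric parse. A std-only sketch of the distinction (not the noodles implementation):

```rust
// Parse a single VCF Float value, where "." denotes a missing value
// rather than a parse error.
fn parse_float_value(raw: &str) -> Option<f32> {
    match raw {
        "." => None,
        _ => raw.parse().ok(),
    }
}

fn main() {
    assert_eq!(parse_float_value("."), None); // the AF_EAS_unrel=. case
    assert_eq!(parse_float_value("0.25"), Some(0.25));
}
```

For Number=A fields the same rule applies per ALT allele after splitting on commas.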
Hi -- for my line of work I often have to (de-)serialize bioinformatic file formats into more common formats, and I'm curious if there are any recommendations for how to do that with noodles or if someone else has done it... like in an ideal world I could:
use noodles::fasta;

fn main() {
    let d = fasta::record::Definition::new(String::from("seq1"), None);
    let seq = "ATCG".as_bytes().to_vec();
    let s = fasta::record::Record::new(d, seq);
    // hypothetical: Record does not currently implement serde::Serialize
    let s_json = serde_json::to_string(&s).unwrap();
    println!("{}", s_json);
}
I know it's possible to add serialization to structs in external packages, but it's a non-trivial amount of work, so I thought I'd ask a) if there is a good path to take, and b) whether there are any thoughts/plans on supporting serialization à la rust-bio.
Thanks!
When writing ultralong nanopore reads from circular contigs (in this specific case chrM), noodles_cram::Writer::write_record fails with
thread 'main' panicked at 'index out of bounds: the len is 16569 but the index is 16569', /home/till/.cargo/registry/src/github.com-1ecc6299db9ec823/noodles-cram-0.11.0/src/data_container/compression_header/preservation_map/substitution_matrix/builder.rs:17:28
I assume this is because the read wraps around the contig. I've attached a cram file (failing_record.zip) which contains such a record. The reference fasta is available at https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chm13.draft_v1.1.fasta.gz (from the telomere-to-telomere project)
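If the wraparound guess is right, the out-of-bounds index is a reference position one past the end of chrM (len 16569, index 16569), and positions on a circular contig would need to wrap modulo the contig length. A std-only sketch of that idea (an assumption about the cause, not the actual noodles fix):

```rust
// Look up a reference base, wrapping past the end when the contig is
// circular (e.g. chrM) instead of indexing out of bounds.
fn ref_base(reference: &[u8], pos: usize, circular: bool) -> Option<u8> {
    if circular {
        Some(reference[pos % reference.len()])
    } else {
        reference.get(pos).copied()
    }
}

fn main() {
    let chr = b"ACGT";
    assert_eq!(ref_base(chr, 3, false), Some(b'T'));
    assert_eq!(ref_base(chr, 4, false), None); // the panicking case
    assert_eq!(ref_base(chr, 4, true), Some(b'A')); // wraps to the start
}
```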
The crates currently offer read and write functionality for various file formats and their respective index formats. Is it within the scope of this repo to include an Indexer struct to calculate the index Record from the respective file format?
Hello everyone, thanks for your very nice work.
I didn't see any information about the Minimum Supported Rust Version (MSRV). This information is important for people who use noodles, and for new contributors to know which Rust features they can use.
A nice choice, from my point of view, would be Rust 1.57, since it matches edition 2021, but I don't have many strong arguments for that particular version.
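Whatever version is chosen, Cargo (1.56+) can enforce it directly via the rust-version manifest field; a sketch of what that could look like in one of the workspace crates (crate name and version here are placeholders):

```toml
[package]
name = "noodles-bgzf"  # each workspace crate would carry its own entry
version = "0.0.0"      # placeholder
edition = "2021"
rust-version = "1.57"  # MSRV; cargo refuses to build on older toolchains
```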
The following code fails:
let path = std::env::current_dir()
.unwrap()
.join("data")
.join("vcf")
.join("sample1-bcbio-cancer.vcf.gz");
let mut reader = File::open(path)
.map(bgzf::Reader::new)
.map(vcf::Reader::new)
.unwrap();
reader.read_header().unwrap();
loop {
if reader.read_record(&mut String::new()).unwrap() == 0 {
break;
}
}
reader
.seek(VirtualPosition::try_from((0, 9302)).unwrap())
.unwrap();
assert_eq!(
reader.virtual_position(),
VirtualPosition::try_from((0, 9302)).unwrap()
);
The file I'm reading is this one (The compressed version can be found in the same folder).
I think it may have something to do with the EOF.
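Whether the EOF block is actually involved here is a guess, but for reference: a well-formed BGZF file ends with a fixed 28-byte empty block defined in the SAM spec, and a seek that lands at or past it will only ever yield empty reads. A std-only sketch of recognizing it:

```rust
// The standard 28-byte BGZF EOF marker (an empty deflate block) as given
// in the SAM/BGZF specification.
const BGZF_EOF: [u8; 28] = [
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00,
    0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
];

fn ends_with_eof_block(data: &[u8]) -> bool {
    data.len() >= BGZF_EOF.len() && data[data.len() - BGZF_EOF.len()..] == BGZF_EOF
}

fn main() {
    assert!(ends_with_eof_block(&BGZF_EOF));
    assert_eq!(&BGZF_EOF[..2], &[0x1f, 0x8b]); // gzip magic number
}
```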
Hey there! I've got a BAM-parsing pipeline that I'd love to re-implement in Rust using noodles to speed everything up and consolidate many steps.
I've been reading through the code/documentation/examples, which are all great! But I either haven't found or haven't grokked how to a) check for the presence of a tag and b) get the value in an efficient way.
Using pysam the workflow would be something like (for example):
for a in aligned_bam:
if a.is_secondary or a.is_supplementary or a.mapping_quality < mq:
continue
for tag in tags: # I have a list of tags I'm looking for
if a.has_tag(tag):
tag_histograms[tag][a.get_tag(tag)] += 1
I'm not sure how I would do this with noodles. My current guess is
for result in reader.records() {
let record = result?;
let flags = record.flags();
let data = record.data();
if !(flags.is_supplementary() || flags.is_duplicate()) {
let mapq = u8::from(record.mapping_quality());
if mapq != 255 && mapq >= 10 {
// iterate over data and add values to counters?
}
}
}
In part I'm questioning whether I need to iterate over the tags or if there's a faster lookup method. That's about where I decided to open an issue.
Happy to contribute examples for this workflow, if you can point me in the right direction!
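Until someone who knows the data API better weighs in: whether there is a keyed lookup (something like record.data().get(tag); the name is an assumption) or only iteration, the counting side of the workflow is straightforward. A std-only sketch with (tag, value) pairs standing in for the real field type:

```rust
use std::collections::HashMap;

// Accumulate per-tag value histograms, mirroring the pysam loop above.
// `fields` stands in for iterating a record's data fields.
fn count_tags<'a>(
    fields: impl Iterator<Item = (&'a str, i64)>,
    wanted: &[&str],
    histograms: &mut HashMap<String, HashMap<i64, u64>>,
) {
    for (tag, value) in fields {
        if wanted.contains(&tag) {
            *histograms
                .entry(tag.to_string())
                .or_default()
                .entry(value)
                .or_default() += 1;
        }
    }
}

fn main() {
    let mut histograms = HashMap::new();
    let record_fields = vec![("NM", 2), ("AS", 50)];
    count_tags(record_fields.into_iter(), &["NM"], &mut histograms);
    assert_eq!(histograms["NM"][&2], 1);
}
```

With only a handful of tags of interest, a linear scan over each record's few fields is unlikely to be the bottleneck compared to decompression.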
It seems like the VirtualPosition returned by the async and blocking bam Readers are not the same. Have a look at the example here. Could be missing something, but let me know your thoughts.
As of noodles version 0.17.0 / noodles-fastq version 0.3.0, the fastq module does not support comments at all. I know that these are rarely used; they are, however, part of the file format. Would it be possible to include functionality for reading/getting and writing/setting comments?
The vcf::Header parser should require the header line (#CHROM ...) to be the last line of the raw input. The smallest valid header then becomes
##fileformat=VCFv4.3
#CHROM POS ID REF ALT QUAL FILTER INFO
See #19 (comment).