
IUPAC codes in VCF REF field (noodles issue, closed)

ACEnglish commented on August 22, 2024
IUPAC codes in VCF REF field


Comments (1)

zaeleus commented on August 22, 2024

As you noted in ACEnglish/kanpig#1, the error is triggered because the input is invalid: reference bases must be one of {A, C, G, T, N}. Previous versions of noodles validated this when parsing, but validation was moved to serialization so that such records can at least be read.
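As a rough illustration of the rule (not noodles' actual implementation), the check amounts to scanning REF for anything outside {A, C, G, T, N}. `find_invalid_reference_base` is a hypothetical helper; it compares case-insensitively, as the VCF specification allows, and reports the first offending base so an error message can point at it.

```rust
/// Returns the byte offset and value of the first base that is not one of
/// A, C, G, T, or N (case-insensitive), or `None` if every base is allowed.
/// Hypothetical helper for illustration; not part of noodles.
fn find_invalid_reference_base(bases: &str) -> Option<(usize, char)> {
    bases
        .char_indices()
        .find(|&(_, c)| !matches!(c.to_ascii_uppercase(), 'A' | 'C' | 'G' | 'T' | 'N'))
}

fn main() {
    for bases in ["ACGT", "acgtn", "R", "AYG"] {
        match find_invalid_reference_base(bases) {
            Some((i, c)) => println!("{bases}: invalid base {c:?} at offset {i}"),
            None => println!("{bases}: ok"),
        }
    }
}
```

Note that this sketch does not reject an empty string; a real writer would also need to handle a missing REF.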

I think the solution is to integrate an ambiguity base resolver into noodles-vcf's record reference bases writer. This likely covers the vast majority of "invalid" reference bases. Although this behavior is only defined as of VCF 4.3, it is still valid to apply it to earlier versions.
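A minimal sketch of such a resolver, assuming the convention the VCF 4.3 specification describes for ambiguity codes in REF: reduce the code to the base that comes first alphabetically among those it represents (so R = A/G becomes A). `resolve_ambiguity_base` and `resolve_reference_bases` are hypothetical names, not part of noodles-vcf's API.

```rust
/// Resolves an IUPAC nucleotide code to a concrete base by picking the
/// alphabetically first base it represents (e.g., R = A/G -> A).
/// A, C, G, T, and N pass through; any other byte yields `None`.
/// Lowercase input produces lowercase output.
fn resolve_ambiguity_base(b: u8) -> Option<u8> {
    let resolved = match b.to_ascii_uppercase() {
        b'A' | b'C' | b'G' | b'T' | b'N' => b.to_ascii_uppercase(),
        // M=A/C, R=A/G, W=A/T, D=A/G/T, H=A/C/T, V=A/C/G
        b'M' | b'R' | b'W' | b'D' | b'H' | b'V' => b'A',
        // S=C/G, Y=C/T, B=C/G/T
        b'S' | b'Y' | b'B' => b'C',
        // K=G/T
        b'K' => b'G',
        _ => return None,
    };

    Some(if b.is_ascii_lowercase() {
        resolved.to_ascii_lowercase()
    } else {
        resolved
    })
}

/// Resolves every base, failing on the first byte that is not a
/// recognized IUPAC nucleotide code.
fn resolve_reference_bases(bases: &str) -> Option<String> {
    bases
        .bytes()
        .map(resolve_ambiguity_base)
        .collect::<Option<Vec<u8>>>()
        .map(|v| String::from_utf8(v).expect("resolved bases are ASCII"))
}

fn main() {
    for bases in ["ARYN", "akgt", "AXG"] {
        println!("{bases} -> {:?}", resolve_reference_bases(bases));
    }
}
```

Bytes that are not IUPAC nucleotide codes (e.g., `X`) still produce `None`, so a writer built on this would continue to reject them rather than silently rewriting them.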

> Is it possible for noodles-vcf to raise the error before writing in order to prevent the corrupted lines?

This isn't really ideal because a given record's in-memory format is opaque: records are implementations of vcf::variant::Record. That is why validation is done inline, during serialization; the record is not guaranteed to have been parsed beforehand (e.g., vcf::Record and bcf::Record).
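To make the opaque-record point concrete, here is a std-only sketch. `Record`, `ParsedRecord`, `LazyRecord`, and `write_record` are hypothetical stand-ins, not noodles types: the writer only sees a trait, so it must validate REF while serializing. Because fields are written straight to the output, a failure also leaves a partial line behind, which is the kind of corruption asked about.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for `vcf::variant::Record`: the writer sees only
// this trait, never how a given record stores its fields.
trait Record {
    fn reference_sequence_name(&self) -> &str;
    fn reference_bases(&self) -> &str;
}

// A record whose fields were parsed up front.
struct ParsedRecord {
    reference_sequence_name: String,
    reference_bases: String,
}

impl Record for ParsedRecord {
    fn reference_sequence_name(&self) -> &str {
        &self.reference_sequence_name
    }

    fn reference_bases(&self) -> &str {
        &self.reference_bases
    }
}

// A lazy record that keeps the raw line and slices fields on demand, so
// nothing has been validated by the time it reaches the writer.
struct LazyRecord {
    line: String,
}

impl LazyRecord {
    fn field(&self, i: usize) -> &str {
        self.line.split('\t').nth(i).unwrap_or("")
    }
}

impl Record for LazyRecord {
    fn reference_sequence_name(&self) -> &str {
        self.field(0) // CHROM
    }

    fn reference_bases(&self) -> &str {
        self.field(3) // REF (CHROM, POS, ID, REF, ...)
    }
}

// Validates REF inline while serializing. Fields go straight to `writer`,
// so a failure leaves the already-written CHROM column behind.
fn write_record<W: Write>(writer: &mut W, record: &dyn Record) -> io::Result<()> {
    writer.write_all(record.reference_sequence_name().as_bytes())?;
    writer.write_all(b"\t")?;

    let bases = record.reference_bases();
    let is_valid = !bases.is_empty()
        && bases
            .bytes()
            .all(|b| matches!(b.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T' | b'N'));

    if !is_valid {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid reference bases"));
    }

    writer.write_all(bases.as_bytes())?;
    writer.write_all(b"\n")
}

fn main() {
    let records: Vec<Box<dyn Record>> = vec![
        Box::new(ParsedRecord {
            reference_sequence_name: String::from("sq0"),
            reference_bases: String::from("ACGT"),
        }),
        Box::new(LazyRecord { line: String::from("sq0\t1\t.\tR\tA") }),
    ];

    let mut out: Vec<u8> = Vec::new();

    for record in &records {
        if let Err(e) = write_record(&mut out, record.as_ref()) {
            eprintln!("error: {e}");
        }
    }

    // Prints "sq0\tACGT\nsq0\t": the failed record left a partial line.
    print!("{}", String::from_utf8_lossy(&out));
}
```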

> My idea would be to edit io/writer/record so that the write_* calls are sent a temporary buffer instead of the final writer.

For this approach, I'd recommend creating a wrapper around vcf::io::Writer, e.g.:

main.rs
```rust
// cargo add noodles@<version> --features core,vcf

use std::io::{self, Write};

use noodles::{
    core::Position,
    vcf::{
        self as vcf,
        header::record::value::{map::Contig, Map},
        variant::io::Write as _,
    },
};

fn main() -> io::Result<()> {
    let stdout = io::stdout().lock();
    let mut writer = VcfLineWriter::new(vcf::io::Writer::new(stdout));

    let header = vcf::Header::builder()
        .add_contig("sq0", Map::<Contig>::new())
        .build();

    writer.write_header(&header)?;

    // "R" is an IUPAC ambiguity code, not a valid REF base, so serializing
    // this record fails and none of its line reaches stdout.
    let record = vcf::variant::RecordBuf::builder()
        .set_reference_sequence_name("sq0")
        .set_variant_start(Position::MIN)
        .set_reference_bases("R")
        .build();

    writer.write_variant_record(&header, &record)?;

    Ok(())
}

/// Wraps a VCF writer so that each header or record is first serialized into
/// a scratch buffer and copied to the underlying output only on success.
struct VcfLineWriter<W> {
    inner: vcf::io::Writer<W>,
    buf: Vec<u8>,
}

impl<W> VcfLineWriter<W>
where
    W: Write,
{
    fn new(inner: vcf::io::Writer<W>) -> Self {
        Self {
            inner,
            buf: Vec::new(),
        }
    }

    fn write_header(&mut self, header: &vcf::Header) -> io::Result<()> {
        self.buf.clear();

        // Serialize into the scratch buffer rather than the final output.
        let mut writer = vcf::io::Writer::new(&mut self.buf);
        writer.write_header(header)?;

        // Only reached if serialization succeeded.
        self.inner.get_mut().write_all(&self.buf)
    }

    fn write_variant_record<R>(&mut self, header: &vcf::Header, record: &R) -> io::Result<()>
    where
        R: vcf::variant::Record,
    {
        self.buf.clear();

        // Validation errors (e.g., invalid reference bases) surface here,
        // before anything is written to the final output.
        let mut writer = vcf::io::Writer::new(&mut self.buf);
        writer.write_variant_record(header, record)?;

        self.inner.get_mut().write_all(&self.buf)
    }
}
```
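The wrapper's technique is not specific to noodles. Below is a std-only sketch of the same buffer-then-commit idea; `BufferedLineWriter` and `write_line` are hypothetical names, and the closure stands in for whatever serializer (such as a `vcf::io::Writer` over the buffer) produces one line.

```rust
use std::io::{self, Write};

/// Serializes each line into a reusable scratch buffer and copies it to
/// `inner` only if serialization succeeded.
struct BufferedLineWriter<W> {
    inner: W,
    buf: Vec<u8>,
}

impl<W: Write> BufferedLineWriter<W> {
    fn new(inner: W) -> Self {
        Self {
            inner,
            buf: Vec::new(),
        }
    }

    /// Runs `serialize` against the scratch buffer. If it returns an error,
    /// nothing from this call reaches `inner`.
    fn write_line<F>(&mut self, serialize: F) -> io::Result<()>
    where
        F: FnOnce(&mut Vec<u8>) -> io::Result<()>,
    {
        // `clear` keeps the allocation, so the buffer is reused across lines.
        self.buf.clear();
        serialize(&mut self.buf)?;
        self.inner.write_all(&self.buf)
    }

    fn into_inner(self) -> W {
        self.inner
    }
}

fn main() -> io::Result<()> {
    let mut writer = BufferedLineWriter::new(Vec::<u8>::new());

    writer.write_line(|buf| buf.write_all(b"sq0\t1\t.\tA\tG\n"))?;

    // This line fails partway through; its partial output is discarded.
    let result = writer.write_line(|buf| {
        buf.write_all(b"sq0\t2\t.\t")?;
        Err(io::Error::new(io::ErrorKind::InvalidData, "invalid reference bases"))
    });
    eprintln!("second line: {result:?}");

    let output = writer.into_inner();
    print!("{}", String::from_utf8_lossy(&output));

    Ok(())
}
```

The cost is one extra copy per line. One caveat applies to both this sketch and the wrapper above: a failed serialization never reaches the output, but an I/O error from the underlying writer itself can still leave a partial line.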

Thanks for the report, and let me know if you have further questions or other issues with migrating to noodles-vcf >= 0.52.0.
