
dna-fountain's Introduction

Encoding example

Create a compressed tar archive:

tar -b1 -czvf info_to_code.tar.gz ./info_to_code/

Zero-pad the archive so that its size is a multiple of 512 bytes:

truncate -s2116608 ./info_to_code.tar.gz
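The padding step can also be sketched in Python. This is a minimal illustration, not part of the repository: the helper names padded_size and zero_pad_file are hypothetical. Unlike the truncate command above, which sets a fixed size, this sketch computes the next multiple of 512 from the current file size.

```python
import os


def padded_size(n_bytes: int, block: int = 512) -> int:
    """Return the smallest multiple of `block` that is >= n_bytes."""
    return -(-n_bytes // block) * block  # ceiling division, then scale back up


def zero_pad_file(path: str, block: int = 512) -> int:
    """Append zero bytes until the file size is a multiple of `block`.

    Hypothetical helper for illustration; returns the final size in bytes.
    """
    size = os.path.getsize(path)
    target = padded_size(size, block)
    if target > size:
        with open(path, "ab") as fh:
            fh.write(b"\0" * (target - size))
    return target
```

The size used in the truncate command is already block-aligned: padded_size(2116608) returns 2116608, since 2116608 = 4134 x 512.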

Or download the original archive:

wget http://files.teamerlich.org/dna_fountain/dna-fountain-input-files.tar.gz

Encode the data as DNA (the output is a FASTA file):

python encode.py \
--file_in info_to_code.tar.gz \
--size 32 \
-m 3 \
--gc 0.05 \
--rs 2 \
--delta 0.001 \
--c_dist 0.025 \
--out info_to_code.tar.gz.dna \
--stop 72000

Add annealing sites:

grep -v '>' info_to_code.tar.gz.dna |\
awk '{print "GTTCAGAGTTCTACAGTCCGACGATC"$0"TGGAATTCTCGGGTGCCAAGG"}' \
> info_to_code.tar.gz.dna_order
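For readers who prefer Python to grep and awk, the same transformation can be sketched as below. The two site sequences are copied from the command above. The names LEFT_SITE, RIGHT_SITE and add_annealing_sites are hypothetical and chosen only for this illustration.

```python
LEFT_SITE = "GTTCAGAGTTCTACAGTCCGACGATC"   # prepended to every oligo
RIGHT_SITE = "TGGAATTCTCGGGTGCCAAGG"       # appended to every oligo


def add_annealing_sites(fasta_lines, left=LEFT_SITE, right=RIGHT_SITE):
    """Yield left + sequence + right for each sequence line of a FASTA file.

    Lines containing '>' are dropped, as with grep -v '>'.
    Blank lines are also skipped, which differs slightly from the awk
    command (awk would emit the two sites with nothing in between).
    """
    for line in fasta_lines:
        seq = line.strip()
        if not seq or ">" in seq:
            continue
        yield left + seq + right
```

A possible use, assuming the FASTA file from the encoding step:

```python
with open("info_to_code.tar.gz.dna") as src, \
        open("info_to_code.tar.gz.dna_order", "w") as dst:
    for oligo in add_annealing_sites(src):
        dst.write(oligo + "\n")
```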

The output file is ready for ordering synthetic DNA.

Decoding example

Convert BCL to FASTQ using Picard (https://github.com/broadinstitute/picard):

for i in {1101..1119} {2101..2119}; do
mkdir ~/Downloads/fountaincode/seq_data3/$i/;
done

for i in {1101..1119} {2101..2119}; do
java -jar ~/Downloads/picard-tools-2.5.0/picard.jar \
IlluminaBasecallsToFastq \
BASECALLS_DIR=./raw/19854859/Data/Intensities/BaseCalls/ \
LANE=1 \
OUTPUT_PREFIX=./seq_data3/$i/ \
RUN_BARCODE=19854859 \
MACHINE_NAME=M00911 \
READ_STRUCTURE=151T6M151T \
FIRST_TILE=$i \
TILE_LIMIT=1 \
FLOWCELL_BARCODE=AR4JF;
done

Stitch the reads using PEAR (Zhang J et al., Bioinformatics, 2014). This step merges each pair of 150 nt reads to recover the full oligo.

for i in {1101..1119} {2101..2119}; do
pear -f ./$i.1.fastq -r ./$i.2.fastq -o $i.all.fastq;
done
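To give a feel for what read stitching does, here is a deliberately naive Python sketch. It is not PEAR's algorithm, which scores overlaps statistically and uses base qualities. This toy version only looks for the longest exact overlap between the forward read and the reverse complement of the reverse read. The function names and the min_overlap parameter are assumptions made for this illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def naive_merge(read1: str, read2: str, min_overlap: int = 4):
    """Merge a read pair on the longest exact suffix/prefix overlap.

    read2 is reverse-complemented first so that both reads are on the
    same strand. Returns the merged fragment, or None when no overlap of
    at least `min_overlap` bases (expected to be >= 1) is found.
    """
    r2 = reverse_complement(read2)
    for k in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        if read1[-k:] == r2[:k]:
            return read1 + r2[k:]
    return None
```

For example, naive_merge("ACGTTGCA", "GCCTTGCA") reconstructs the 12 nt fragment "ACGTTGCAAGGC" from two 8 nt reads that overlap by 4 bases.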

Retain only fragments of exactly 152 nt (the original oligo size):

awk '(NR%4==2 && length($0)==152){print $0}' *.all.fastq.assembled.fastq > all.fastq.good
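The 152 nt figure is consistent with the parameters used elsewhere in this README, assuming two bits per nucleotide (four nucleotides per byte): a 32-byte payload (--size 32), a 4-byte header (--header_size 4) and 2 bytes of Reed-Solomon code (--rs 2) give 38 bytes, or 152 nt. The sketch below checks that arithmetic and mirrors the awk filter. Both function names are hypothetical.

```python
def oligo_length_nt(payload_bytes=32, header_bytes=4, rs_bytes=2):
    """Oligo length in nucleotides, assuming 2 bits (4 nt per byte) per base."""
    return (payload_bytes + header_bytes + rs_bytes) * 4


def sequences_of_length(fastq_lines, length=152):
    """Yield the sequence line of each 4-line FASTQ record if it has `length` bases.

    Equivalent in spirit to: awk '(NR%4==2 && length($0)==152){print $0}'
    """
    for line_no, line in enumerate(fastq_lines, start=1):
        seq = line.rstrip("\n")
        if line_no % 4 == 2 and len(seq) == length:
            yield seq
```

With the default arguments, oligo_length_nt() evaluates to 152.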

Sort to prioritize highly abundant reads:

sort -S4G all.fastq.good | uniq -c > all.fastq.good.sorted
gsed -r 's/^\s+//' all.fastq.good.sorted |\
sort -r -n -k1 -S4G > all.fastq.good.sorted.quantity

Drop column 1 (the number of times each read was seen) and exclude reads containing N:

cut -f2 -d' ' all.fastq.good.sorted.quantity |\
grep -v 'N' > all.fastq.good.sorted.seq
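The counting, sorting and filtering steps above can be condensed into a few lines of Python. This is a sketch for small inputs that fit in memory, and rank_reads is a hypothetical name. Reads with equal counts may come out in a different order than with the shell sort.

```python
from collections import Counter


def rank_reads(reads):
    """Return unique reads ordered from most to least abundant, without N.

    Mirrors the pipeline: sort | uniq -c, sort by count (descending),
    drop the count column, then drop reads containing 'N'.
    """
    counts = Counter(reads)
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [seq for seq, _ in ranked if "N" not in seq]
```

A possible use, assuming the filtered file from the previous step:

```python
with open("all.fastq.good") as fh:
    ranked = rank_reads(line.strip() for line in fh if line.strip())
```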

Decoding:

python ~/Downloads/fountaincode/receiver.py \
-f ./seq_data3/all.fastq.good.sorted.seq \
--header_size 4 \
--rs 2 \
--delta 0.001 \
--c_dist 0.025 \
-n 67088 \
-m 3 \
--gc 0.05 \
--max_hamming 0 \
--out decoder.out.bin
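A quick sanity check on the numbers in these commands, under two assumptions that this README does not state explicitly: that -n is the number of 32-byte input segments, and that --stop in the encoding example is the number of oligos generated. The function names are hypothetical.

```python
def input_bytes(n_segments: int, segment_bytes: int = 32) -> int:
    """Total input size implied by a segment count (assumed meaning of -n)."""
    return n_segments * segment_bytes


def oligos_per_segment(n_oligos: int, n_segments: int) -> float:
    """Ratio of generated oligos to input segments (assumed meaning of --stop)."""
    return n_oligos / n_segments
```

Under those assumptions, input_bytes(67088) gives 2146816 bytes, which is a multiple of 512 (4193 x 512), and oligos_per_segment(72000, 67088) is roughly 1.07, that is, about 7% more oligos than segments.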

Checksum verification (md5 is the macOS command; on Linux, use md5sum):

md5 decoder.out.bin

The expected output is 8651e90d3a013178b816b63fdbb94b9b

md5 info_to_code.tar.gz

The expected output is 8651e90d3a013178b816b63fdbb94b9b
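Where neither md5 nor md5sum is available, the same comparison can be done with Python's standard hashlib module. This is a small sketch; md5_hex and files_match are hypothetical helper names.

```python
import hashlib


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 digest of a file as a hex string, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a: str, path_b: str) -> bool:
    """True when both files have the same MD5 digest."""
    return md5_hex(path_a) == md5_hex(path_b)
```

Decoding succeeded if files_match("decoder.out.bin", "info_to_code.tar.gz") returns True.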
