This particular code challenge was split into three parts:
- Recursively find all FASTQ files in a directory and report each file name and the percent of sequences in that file that are greater than 30 nucleotides long.
- Given a FASTA file with DNA sequences, find the 10 most frequent sequences and return each sequence with its count in the file.
- Given a chromosome and coordinates, write a program that looks up the annotation. Keep in mind that this lookup will run millions of times. Input: a tab-delimited file of chromosome and position, plus a GTF-formatted file with genome annotations. Output: an annotated file giving the gene name that each input position overlaps. Hint: most of the sequence reads come from a small portion of the genome. Try to use this information to improve performance, if possible.
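For orientation, here is a minimal Ruby sketch of one way the three parts could be approached. It is not the code in this repo: the method and class names (`percent_longer_than`, `fastq_report`, `top_sequences`, `GeneAnnotator`) are hypothetical, and it assumes 4-line FASTQ records, one sequence per line in the FASTA file, files ending in `.fastq`, and a `gene_name "..."` attribute in the GTF file. The cache in part 3 is one possible reading of the hint that most reads come from a small portion of the genome.

```ruby
# Part 1 (sketch): percent of reads longer than min_length in one FASTQ file.
# Assumes the common 4-line record layout: header, sequence, '+', quality.
def percent_longer_than(path, min_length = 30)
  total = 0
  long = 0
  File.foreach(path).each_slice(4) do |_header, sequence, _plus, _quality|
    next if sequence.nil?
    total += 1
    long += 1 if sequence.strip.length > min_length
  end
  total.zero? ? 0.0 : (100.0 * long / total)
end

# Recursively collect [file name, percent] pairs for every *.fastq under dir.
def fastq_report(dir, min_length = 30)
  Dir.glob(File.join(dir, "**", "*.fastq")).sort.map do |path|
    [path, percent_longer_than(path, min_length)]
  end
end

# Part 2 (sketch): the most frequent sequences in a FASTA file.
# Assumes each sequence sits on a single line below its '>' header.
def top_sequences(path, limit = 10)
  counts = Hash.new(0)
  File.foreach(path) do |line|
    line = line.strip
    next if line.empty? || line.start_with?(">")
    counts[line] += 1
  end
  # Highest count first; ties are broken alphabetically for stable output.
  counts.sort_by { |sequence, count| [-count, sequence] }.first(limit)
end

# Part 3 (sketch): gene lookup with a cache for repeated positions.
class GeneAnnotator
  def initialize(gtf_path)
    @genes = Hash.new { |hash, chrom| hash[chrom] = [] }
    @cache = {}
    File.foreach(gtf_path) do |line|
      next if line.start_with?("#")
      fields = line.chomp.split("\t")
      next if fields.length < 9
      # Assumption: the attribute column carries gene_name "XYZ".
      name = fields[8][/gene_name "([^"]+)"/, 1]
      next if name.nil?
      # GTF columns 4 and 5 are the 1-based, inclusive start and end.
      @genes[fields[0]] << [fields[3].to_i, fields[4].to_i, name]
    end
  end

  # Returns the first overlapping gene name, or nil if nothing overlaps.
  def lookup(chrom, position)
    key = [chrom, position]
    return @cache[key] if @cache.key?(key)
    hit = @genes.fetch(chrom, []).find do |start, stop, _name|
      position.between?(start, stop)
    end
    @cache[key] = hit && hit[2]
  end
end
```

The first lookup of a position scans that chromosome's intervals linearly; repeated lookups are answered from the hash. For large annotation files, sorting the intervals and using a binary search or an interval tree would likely be a better fit for uncached positions.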
This repo contains a Ruby directory and a Python directory (currently under construction), each holding the challenges in the corresponding language.
This project is using the following:
- Ruby version 2.5.3
- clone repo here
- cd into the desired code directory (Ruby or Python)
- cd into the desired script directory, numbered as in the list above
- run the lib/runner.rb file and follow the instructions on placing the analysis file in the correct location.
To run the tests, enter the desired script directory as instructed above and run the test/script_name_test.rb file.
- LinkedIn click here
- Personal Website click here (under construction)
- GitHub click here
- Twitter click here
- Turing Portfolio click here