mdshw5 / pyfaidx Goto Github PK
View Code? Open in Web Editor NEWEfficient pythonic random access to fasta subsequences
Home Page: https://pypi.python.org/pypi/pyfaidx
License: Other
Efficient pythonic random access to fasta subsequences
Home Page: https://pypi.python.org/pypi/pyfaidx
License: Other
"stats" right now just consists of the sequence names and lengths. It might be more useful to allow simple bed files, UCSC-style "chrom.sizes" files, and more advanced nucleotide (gc%, a/c/g/t, top kmer, etc...) formats to be specified like: faidx --stats bed
and so on.
I installed pyfaidx as indicated.
Typing
faidx
usage: faidx [-h] [-b BED] [--stats] [-c] [-r] [-n] [--split-files] [--lazy]
[--default-seq DEFAULT_SEQ] [-d DELIMITER]
[--mask-with-default-seq | --mask-by-case] [--version]
fasta [regions [regions ...]]
.....
hower, when I try to run something it always tells me that there is an unrecognized arguments in my comand. The same is true if I use the example files.
faidx --stats tests/data/genes.fasta
usage: faidx [-h] [-b BED] [--stats] [-c] [-r] [-n] [--split-files] [--lazy]
[--default-seq DEFAULT_SEQ] [-d DELIMITER]
[--mask-with-default-seq | --mask-by-case] [--version]
fasta [regions [regions ...]]
faidx: error: unrecognized arguments: tests/data/genes.fasta
I do not understand what I am missing here.
Specifically need to worry any time I do string replacement.
I should just print the invalid sequence and line number to assist users in correcting invalid input sequence raised in #47.
It would be great if the index were automatically recreated if the fasta file has a new time stamp than the .fai file. I just did this test as we had to rebuild a fasta we are using and reindexing did not occur.
thanks for considering!
One current design limitation of pyfaidx is that it mirrors the samtools indexing behavior of truncating headers after whitespace. There is a good reason for this - any whitespace in the identifier would break the index file. A side effect of this is that frequently the "description" in a header will be lost when reading into the file using the index. It seems like an option to recover the full header line would be useful, and pretty cheap to implement.
To determine the byte offset and length of the header from the index file, we can determine the byte end
of the preceding sequence by adding unprintable characters, and this should be the byte start
of the real header line. We can then read from header byte start
to sequence offset
and save this as something like Sequence.long_name
.
I am using fasta files representing the whole genome, with the only header lines being >chr1, >chr2 etc. and my indexes are huge, i.e. 37093012.
Will pyfaidx still perform well? If yes, this info should probably be added to the README.md.
Faidx.build_index gives different results than samtools
I'm sure it is me mucking up somehow, but when I use Pyfaidx and Bio.SeqIO I get different results...
reference_genome_fasta = "/home/endrebak/genomes/hg38/hg38.fa"
position = slice(68344571,68348572)
hg38 = Fasta(reference_genome_fasta)
gene_pyfaidx = hg38["chr15"][position].seq
gene_pyfaidx[:20] #"aaaaaaaaaaaattccctgt"
#but with Bio.SeqIO...
with open(reference_genome_fasta, "rU") as genome_file_handle:
chromosome_dictionary = SeqIO.to_dict(SeqIO.parse(genome_file_handle, "fasta"))
gene_bio_seqio = chromosome_dictionary["chr15"][position]
str(gene_bio_seqio[:20].seq) #'AGGGGGACTAGACTCGAAGA'
And when I try to look up the same sequence in UCSC, I get yet another sequence: https://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr15:68346496-68346499&hgsid=198858039_mEdzdwARqPcSygOiGTzyEzcF87Ed
(only four first nucleotides shown ("ATGA") to get nuclotide resolution and added one to index since UCSC one-indexed).
This error is coming up in 0.3.4 and does not come up in 0.3.1
Traceback (most recent call last):
File "Step1.py", line 151, in
main()
File "Step1.py", line 123, in main
desired_sequence = -chromosome_infile[chromosome][start:stop] #if negative strand, give reverse complement in fasta file
File "/data/zappadata_p2/pschaugh/RepTags/venv/lib/python2.7/site-packages/pyfaidx/init.py", line 105, in neg
return self[::-1].complement
File "/data/zappadata_p2/pschaugh/RepTags/venv/lib/python2.7/site-packages/pyfaidx/init.py", line 140, in complement
comp = self.class(self.name, complement(self.seq),
File "/data/zappadata_p2/pschaugh/RepTags/venv/lib/python2.7/site-packages/pyfaidx/init.py", line 591, in complement
raise ValueError("Sequence contains non-DNA characters: \n {0}".format(''.join(wrap_sequence(70, seq))))
ValueError: Sequence contains non-DNA characters:
ttgtatggttttgaatggtctacgtagttttagacacgatcctcccatcaaatatcacatttacgaatac
tatttgttcttctttctagaatttaattgttagattgtaatatggaatttcttcatctttttccctcttg
tgtgatttgggtttaggtcatcttcctttctttattatttctgatctcatgtctgttgactttatttctt
atccttgtaccatctcattcaatcgttttggttcaagtcaagaaacttttttagtggttttaattacttg
ttaatcaagctgactcattctattttctcccttcagagttattgattttagtctttacattcacgtctgt
aatggtgactaaattgtctttattttactaatataattattctgttatcatacttataaacatacggttt
tttaacctgttggatctgttttacctatttaaggatctgtatatggtggatggttctgattnggtgtttc
tttatcttttatgcttgtctgtatattgatcgtttctctaacttagttactaggttttagacggttgttt
cttttagggtcctcgactaccgaagtaagcacttaagatggtttgtaaatttcttctcaattgtgattag
gaagagttttcaaaggttttttcaaattccattcccctgtgaaagtttgaataagaagttccggttctaa
tgagaaagtggtttcgttctgtttccatgatgttattttaattcgatgtctagttatagagactatttat
gactacatttttaggagttggtttataatcacttgtcttaattcactgtgtagtttttttaatatgtgta
ctggagcaccctaaataaagactttaagttcctatcaagttgtatacttttacttaattacattattgat
gtaattgtcttacttctttgtcaagggtacactagtagagttaactacctctttttataaactgttttag
gttgtggaaaggtactattttttgtangtggtttgatccttgtcttcctttgatggagttgtattaattc
cggtatatacctttcgggtaacgattgtagtatgaattactactttctaacttttgaaaagaagattcga
ttctgtgttcccttcttacggatgaaaatttaagtcgtatcattacctttagaactggtctatttaatcc
tttatttttcgtaattttaaccctttatttttcattttgctagggaaagtatctactacattaaagtata
catcttttggggtttctaaggtattctttgtactatcttgattattgttttaatcgtttgaatgtcttat
gagtcgtatgtctttaatcgacacaaagatatgttattgttacttgttggactttttctttaattctttc
gttaaagtaaatttcaccctggtttgttttattttttatctttgtttgacttggttcctccactttctac
acatgtgacttttgatactttataacgagtttctttaatttctgtgatgtttttatctgtaggaaacaag
tatctactcttctggattttaactttctacaccagagatctaatttacgttaaatagaattttaaggttg
tagttaaaaaaaaatgttttgatctttttagataggattgtaagtatatcttagagttccctgggattta
tcgtttttgttagaactttttcttgtgtcaacctccagagtgtgaagaatcaaagttttaaagattttta
gatgtcataatttttgtctcaccataactgcatatctgtctgtgtatctagtcactctatcttatctctc
gggtctttatttgagaacatatataccggaacactaaaaaatttttcaacgatcctgataggttatccct
tttcctgtcagaaaagttgtttacaatgacccttttgacctatcggtgtacgttttcttacttcaacatg
ggaatggaatgtggtatacgtctttaatttttatcagtttctgaatctatactctcaatcttatgagtgt
tcttttacatccccgtgtcgaagtaccgtaacgtaaaccgttcctaaagaaccgatgctgtggctttcgt
atccgttgtttttttagtctattaaccttgtacatttataatttttaaaaacacttagtttactgtgaca
gttgtctcattttccgttgtgtgacttaccgtcctttataaacatttaatatatagactattgcctaatg
ataggtatatatttcttgagaacgttaagttgttggttttttgttgtgttaattttttatctgtttcctg
aacttatatataaagaggttactatatgtttactggttattcatgtacttttctacgagttgttgtactt
attaatccttttacgtttagttttggtgtcactctatgatgtgttgatatctttgttataccaccaaaga
gtctaaagtgtactaggtcgttaatgtggagacccatatatggagtttcttaattttcgtctcggagttt
ctctacaaagatgtgggtacaggtgttgtcgtaataagtgttatcagttttctaccttcgttgggttcac
agacaactgtctacttacctatttgttttacaccgtataggtgttttaccttataataaattggaatttt
tcctttttatta
>AB821309.1
ATGGTCAGCTGGGGTCGTTTCATCTGCCTGGTCGTGGTCACCATGGCAACCTTGTCCCTGGCCCGGCCCT
CCTTCAGTTTAGTTGAGGATACCACATTAGAGCCAGAAGATGCCATCTCATCCGGAGATGATGAGA
CACCGATGGTGCGGAAGATTTTGTCAGTGAGAACAGTAACAACAAGAGAGCACCATACTGGACCAACACA
does not raise an error
I am finding that to get as_raw=True
to work correctly, I have to call it twice:
ipdb> fasta = Fasta(fasta_filename, as_raw = True)
ipdb> chrom = fasta.keys()[0]
ipdb> fasta[chrom][1:2]
>chrII:236683-237683(+):2-2
T
ipdb> fasta = Fasta(fasta_filename, as_raw = True)
ipdb> fasta[chrom][1:2]
u'T'
Note that the index for fasta_filename doesn't exist before the first call.
Is this expected behaviour? Does as_raw
only work on an existing index? i.e. the index is created at the first call, and then as_raw
is used?
It might be too harsh to exit faidx
when sequence names don't exist. Instead, warning the user that the sequence name doesn't exist and then moving on would be more useful. See: https://www.biostars.org/p/141241/#141306
The original module was developed for the use case of accessing DNA strings from a reference genome file. It should be easy to generalize this to protein and RNA strings as well.
.complement()
methodThis should be straightforward using PyVCF and pysam. The subclass could be called FastaVariant and return a consensus sequence with either homozygous variants or heterozygous variants or both included as substitutions. Skip over indels and complex variants. MNPs can be handled but the length of the return sequence should be checked.
I would like to replace pygr with pyfaidx.
Previously I have been using the human_g1k_v37_decoy.fasta
file, which is part of the GATK bundle (available through FTP) for getting sequences. I use the same file for alignments.
>>> from pyfaidx import Fasta
>>> genome = Fasta('human_g1k_v37_decoy.fasta')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pyfaidx/__init__.py", line 552, in __init__
filt_function=filt_function)
File "/usr/local/lib/python2.7/site-packages/pyfaidx/__init__.py", line 209, in __init__
self.read_fai(split_char)
File "/usr/local/lib/python2.7/site-packages/pyfaidx/__init__.py", line 244, in read_fai
raise ValueError('Duplicate key "%s"' % rname)
ValueError: Duplicate key "['2', 'dna:chromosome', 'chromosome:GRCh37:2:1:243199373:1']"
The human_g1k_v37_decoy.fasta.fai
file does not seem to contain that key twice.
I wrote the Fasta class to provide some pygr interface elements, and then mention that pyfaidx has a pygr-compatible interface. I'm not sure this is true, and should evaluate whether implementing more of the pygr interface would be worthwhile.
Using python 3.4.0 and whatever version of pyfaidx is currently grabbed by pip (you should add __version__
to the module)...
Simple access (and slicing) both end up trying to use floats where ints are required.
Traceback (most recent call last):
File "./Ngaps.py", line 207, in <module>
sys.exit(Main(argv=None))
File "./Ngaps.py", line 100, in Main
x = ref[0][100]
File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 460, in __getitem__
return self._fa.get_seq(self.name, n + 1, n + 1)
File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 561, in get_seq
return self.faidx.fetch(name, start, end)
File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 336, in fetch
self.fill_buffer(name, start, end + self.read_ahead)
File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 326, in fill_buffer
seq = self.from_file(name, start, end)
File "/usr/local/lib/python3.4/dist-packages/pyfaidx/__init__.py", line 380, in from_file
seq = self.file.read(seq_blen).decode()
TypeError: 'float' object cannot be interpreted as an integer
A quick fix (perhaps a hack) is to cast start
and end
to ints in from_file
:
start0 = int(start) - 1 # make coordinates [0,1)
and
seq_len = int(end) - start0
Sorry, no time at the moment to thoroughly test, much less fork your most current code and proved you with a proper diff.
PS: The title of this issue isn't quite correct... It is a call to read
where the error gets tripped, not an index.
PPS: Useful module! Thanks. Not super complicated, but thanks for doing it right and saving the rest of us from having to reinvent the wheel (or use SeqIO where it really isn't appropriate IMO).
When we can't install BioPython we should just skip those tests.
It looks like the read_ahead property is implemented with a bug that prevents proper caching.
I think I may have found a bug. I can work around it, but thought you would want to fix it. It is really weird.
fasta file: mouse NCBI 38: GRCm38_68.fa
Symptom: on one record (chr 4), the long_name is missing the leading chr number. All the others are OK.
Hope the listings below help. (and thanks VERY much for pyfaidx…)
-Al
Confirm that the record is correct in the file:
$ awk '/^>4/{print $0; exit;}' GRCm38_68.fa
4 dna:chromosome chromosome:GRCm38:4:1:156508116:1 REF
Open the file and see the top level:
from pyfaidx import Fasta
fa = Fasta('GRCm38_68.fa')
fa.keys()
['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '3', '4', '5', '6', '7', '8', '9', 'MT', 'X', 'Y', 'JH584295.1', 'JH584292.1', 'GL456368.1', 'GL456396.1', 'GL456359.1', 'GL456382.1', 'GL456392.1', 'GL456394.1', 'GL456390.1', 'GL456387.1', 'GL456381.1', 'GL456370.1', 'GL456372.1', 'GL456389.1', 'GL456378.1', 'GL456360.1', 'GL456385.1', 'GL456383.1', 'GL456213.1', 'GL456239.1', 'GL456367.1', 'GL456366.1', 'GL456393.1', 'GL456216.1', 'GL456379.1', 'JH584304.1', 'GL456212.1', 'JH584302.1', 'JH584303.1', 'GL456210.1', 'GL456219.1', 'JH584300.1', 'JH584298.1', 'JH584294.1', 'GL456354.1', 'JH584296.1', 'JH584297.1', 'GL456221.1', 'JH584293.1', 'GL456350.1', 'GL456211.1', 'JH584301.1', 'GL456233.1', 'JH584299.1']
See that the long name is OK for chr 3:
print fa['3'].long_name
3 dna:chromosome chromosome:GRCm38:3:1:160039680:1 REF
See that chr 4 is not OK:
print fa['4'].long_name
dna:chromosome chromosome:GRCm38:4:1:156508116:1 REF
The next two listings are longish, so I describe them here: first a listing of all the names, then all the long names. The names are OK for all chrs. The long names are all OK except for 4.
for record in fa:
... print record.name
...
1
10
11
12
13
14
15
16
17
18
19
2
3
4
5
6
7
8
9
MT
X
Y
JH584295.1
JH584292.1
GL456368.1
GL456396.1
GL456359.1
GL456382.1
GL456392.1
GL456394.1
GL456390.1
GL456387.1
GL456381.1
GL456370.1
GL456372.1
GL456389.1
GL456378.1
GL456360.1
GL456385.1
GL456383.1
GL456213.1
GL456239.1
GL456367.1
GL456366.1
GL456393.1
GL456216.1
GL456379.1
JH584304.1
GL456212.1
JH584302.1
JH584303.1
GL456210.1
GL456219.1
JH584300.1
JH584298.1
JH584294.1
GL456354.1
JH584296.1
JH584297.1
GL456221.1
JH584293.1
GL456350.1
GL456211.1
JH584301.1
GL456233.1
JH584299.1
Now the long names
for record in fa:
... print record.long_name
...
1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF
10 dna:chromosome chromosome:GRCm38:10:1:130694993:1 REF
11 dna:chromosome chromosome:GRCm38:11:1:122082543:1 REF
12 dna:chromosome chromosome:GRCm38:12:1:120129022:1 REF
13 dna:chromosome chromosome:GRCm38:13:1:120421639:1 REF
14 dna:chromosome chromosome:GRCm38:14:1:124902244:1 REF
15 dna:chromosome chromosome:GRCm38:15:1:104043685:1 REF
16 dna:chromosome chromosome:GRCm38:16:1:98207768:1 REF
17 dna:chromosome chromosome:GRCm38:17:1:94987271:1 REF
18 dna:chromosome chromosome:GRCm38:18:1:90702639:1 REF
19 dna:chromosome chromosome:GRCm38:19:1:61431566:1 REF
2 dna:chromosome chromosome:GRCm38:2:1:182113224:1 REF
3 dna:chromosome chromosome:GRCm38:3:1:160039680:1 REF
dna:chromosome chromosome:GRCm38:4:1:156508116:1 REF
5 dna:chromosome chromosome:GRCm38:5:1:151834684:1 REF
6 dna:chromosome chromosome:GRCm38:6:1:149736546:1 REF
7 dna:chromosome chromosome:GRCm38:7:1:145441459:1 REF
8 dna:chromosome chromosome:GRCm38:8:1:129401213:1 REF
9 dna:chromosome chromosome:GRCm38:9:1:124595110:1 REF
MT dna:chromosome chromosome:GRCm38:MT:1:16299:1 REF
X dna:chromosome chromosome:GRCm38:X:1:171031299:1 REF
Y dna:chromosome chromosome:GRCm38:Y:1:91744698:1 REF
JH584295.1 dna:scaffold scaffold:GRCm38:JH584295.1:1:1976:1 REF
JH584292.1 dna:scaffold scaffold:GRCm38:JH584292.1:1:14945:1 REF
GL456368.1 dna:scaffold scaffold:GRCm38:GL456368.1:1:20208:1 REF
GL456396.1 dna:scaffold scaffold:GRCm38:GL456396.1:1:21240:1 REF
L456359.1 dna:scaffold scaffold:GRCm38:GL456359.1:1:22974:1 REF
GL456382.1 dna:scaffold scaffold:GRCm38:GL456382.1:1:23158:1 REF
GL456392.1 dna:scaffold scaffold:GRCm38:GL456392.1:1:23629:1 REF
GL456394.1 dna:scaffold scaffold:GRCm38:GL456394.1:1:24323:1 REF
GL456390.1 dna:scaffold scaffold:GRCm38:GL456390.1:1:24668:1 REF
GL456387.1 dna:scaffold scaffold:GRCm38:GL456387.1:1:24685:1 REF
GL456381.1 dna:scaffold scaffold:GRCm38:GL456381.1:1:25871:1 REF
GL456370.1 dna:scaffold scaffold:GRCm38:GL456370.1:1:26764:1 REF
GL456372.1 dna:scaffold scaffold:GRCm38:GL456372.1:1:28664:1 REF
GL456389.1 dna:scaffold scaffold:GRCm38:GL456389.1:1:28772:1 REF
GL456378.1 dna:scaffold scaffold:GRCm38:GL456378.1:1:31602:1 REF
GL456360.1 dna:scaffold scaffold:GRCm38:GL456360.1:1:31704:1 REF
GL456385.1 dna:scaffold scaffold:GRCm38:GL456385.1:1:35240:1 REF
GL456383.1 dna:scaffold scaffold:GRCm38:GL456383.1:1:38659:1 REF
GL456213.1 dna:scaffold scaffold:GRCm38:GL456213.1:1:39340:1 REF
GL456239.1 dna:scaffold scaffold:GRCm38:GL456239.1:1:40056:1 REF
GL456367.1 dna:scaffold scaffold:GRCm38:GL456367.1:1:42057:1 REF
GL456366.1 dna:scaffold scaffold:GRCm38:GL456366.1:1:47073:1 REF
GL456393.1 dna:scaffold scaffold:GRCm38:GL456393.1:1:55711:1 REF
GL456216.1 dna:scaffold scaffold:GRCm38:GL456216.1:1:66673:1 REF
GL456379.1 dna:scaffold scaffold:GRCm38:GL456379.1:1:72385:1 REF
JH584304.1 dna:scaffold scaffold:GRCm38:JH584304.1:1:114452:1 REF
GL456212.1 dna:scaffold scaffold:GRCm38:GL456212.1:1:153618:1 REF
JH584302.1 dna:scaffold scaffold:GRCm38:JH584302.1:1:155838:1 REF
JH584303.1 dna:scaffold scaffold:GRCm38:JH584303.1:1:158099:1 REF
GL456210.1 dna:scaffold scaffold:GRCm38:GL456210.1:1:169725:1 REF
GL456219.1 dna:scaffold scaffold:GRCm38:GL456219.1:1:175968:1 REF
JH584300.1 dna:scaffold scaffold:GRCm38:JH584300.1:1:182347:1 REF
JH584298.1 dna:scaffold scaffold:GRCm38:JH584298.1:1:184189:1 REF
JH584294.1 dna:scaffold scaffold:GRCm38:JH584294.1:1:191905:1 REF
GL456354.1 dna:scaffold scaffold:GRCm38:GL456354.1:1:195993:1 REF
JH584296.1 dna:scaffold scaffold:GRCm38:JH584296.1:1:199368:1 REF
JH584297.1 dna:scaffold scaffold:GRCm38:JH584297.1:1:205776:1 REF
GL456221.1 dna:scaffold scaffold:GRCm38:GL456221.1:1:206961:1 REF
JH584293.1 dna:scaffold scaffold:GRCm38:JH584293.1:1:207968:1 REF
GL456350.1 dna:scaffold scaffold:GRCm38:GL456350.1:1:227966:1 REF
GL456211.1 dna:scaffold scaffold:GRCm38:GL456211.1:1:241735:1 REF
JH584301.1 dna:scaffold scaffold:GRCm38:JH584301.1:1:259875:1 REF
GL456233.1 dna:scaffold scaffold:GRCm38:GL456233.1:1:336933:1 REF
JH584299.1 dna:scaffold scaffold:GRCm38:JH584299.1:1:953012:1 REF
>>> gfasta = pyfaidx.Fasta('....fa', as_raw=True)
>>> gfasta['chr1'][0:0]
Returns the whole chromosome
>>> s='ABCD'
>>> s[0:0]
Returns an empty string
IMHO, pyfaidx should return an empty string.
The manuscript is something... maybe @brentp, @azalea, and @swheelan can give me some comments? Feel free to edit as you see fit - the document is version controlled.
@jsilter Has a use case for opening the entire NCBI non redundant nucleotide database, which is currently problematic. The index for this database is 24,312,914 entries and 1.2G on disk. The in-memory index is far larger than this, and so it's possible that we might not be able to store an index this large in memory with the current data structures. However, there are some simple ways to reduce memory footprint already:
raw_index
attribute. I'm not sure anyone was using this, so it should not be an issue.raw_index
list)Sometimes it would be useful to do something like:
for line in FastaRecord:
do something with the line
Currently FastaRecord's __getitem__
returns the entire contents of a FASTA entry as a string. Maybe a __iter__
method could return smaller chunks to limit memory consumption. Might be able to use the read-ahead buffer from #26.
Is this possible? Seems like it isn't based on the cl args shown in the README.md
Sometimes (masking regions) it is convenient to modify a fasta file in place if possible. We can read the number of line breaks to be inserted, and for entries of the same size as the original fetched region we can modify the fasta file in place.
The spec for htslib faidx mentions optional comments (;
) being handled properly, so even though this is an uncommon case we should handle it as well.
In certain cases (http://www.biostars.org/p/133245/) matching headers by regular expressions would be very helpful. This could be added as an argument to the Faidx class, or as a feature only in the faidx script.
See example below:
>>> from pyfaidx import Fasta >>> ref = Fasta('my.fasta') >>> ref.keys() ['myseq'] >>> ref['myseq'][:5].upper() Traceback (most recent call last): File "", line 1, in AttributeError: 'Sequence' object has no attribute 'upper' >>> ref['myseq'][:5].split('T') Traceback (most recent call last): File "", line 1, in AttributeError: 'Sequence' object has no attribute 'split'
To fix this:
The easy way is to change this line in Faidx.fetch():
return Sequence(name=rname, start=int(start + 1),
end=int(end), seq=seq.replace('\n', ''))
to
return seq.replace('\n', '')
i.e. return a pure string, rather than a Sequence object.
The hard way is to reimplement string methods in Sequence object, like Bio.Seq.Seq.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.