capsylar / luke-rnawalker
RNA diff and patching tool
The patching code takes previous deletes and inserts into account when resolving indices, which it should not.
The patch script describes every operation as if the previous inserts, deletes and updates had not been applied, i.e. each index refers to the original source sequence. For example, the sequence ACUGA with the edit script
<delete,1>
<delete,2>
should result in AGA (the C at index 1 and the U at index 2 are removed).
If the previous operations are taken into account instead, ACUGA becomes AUGA after <delete,1> and then AUA after <delete,2>, which is wrong.
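The two readings of the script can be contrasted in a few lines of Java. This is only an illustration with hypothetical names (`NaiveDeleteDemo`, `applyNaive`, `applyAgainstSource`); it handles deletes only, and it assumes that a naive patcher deletes with `String.substring` while re-reading each index against the already modified string.

```java
import java.util.List;

// Illustration only: deletes-only patching, with hypothetical names.
final class NaiveDeleteDemo {

    // Naive reading (assumed): each index refers to the string as modified
    // by the previous deletes, and deletion is done with substring.
    static String applyNaive(String source, List<Integer> deletes) {
        String seq = source;
        for (int k : deletes) {
            seq = seq.substring(0, k) + seq.substring(k + 1);
        }
        return seq;
    }

    // Intended reading: each index refers to the ORIGINAL source, so mark
    // the deleted positions first and then copy the surviving bases.
    static String applyAgainstSource(String source, List<Integer> deletes) {
        boolean[] deleted = new boolean[source.length()];
        for (int k : deletes) {
            deleted[k] = true;
        }
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < source.length(); k++) {
            if (!deleted[k]) {
                out.append(source.charAt(k));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Integer> script = List.of(1, 2); // <delete,1> <delete,2>
        System.out.println(applyNaive("ACUGA", script));         // AUA (wrong)
        System.out.println(applyAgainstSource("ACUGA", script)); // AGA (intended)
    }
}
```

Marking positions works well for deletes alone; once inserts and updates are mixed in, a running offset over the sorted script is a simpler way to keep every index relative to the source.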
An easy solution follows from the observation that the patch script is sorted in increasing order: operations on a character that comes earlier in the sequence are listed before operations on characters that come later.
Use a global offset i, set to 0 at the beginning, that is incremented by every insert and decremented by every delete. Every update, delete and insert is then performed at position (source index + i).
(I think.)
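The offset idea can be sketched in Java as follows. This is a minimal sketch, not the project's code: the `OffsetPatcher` class, the `Op` type and its `delete`/`insert`/`update` factories are hypothetical, the operations are assumed to be sorted by source index as stated above, and an insert at source index k is assumed to place its base in front of source character k.

```java
import java.util.List;

final class OffsetPatcher {

    enum Kind { DELETE, INSERT, UPDATE }

    // Hypothetical edit-script operation. sourceIndex always refers to the
    // ORIGINAL sequence; base is the inserted or replacement nucleotide.
    static final class Op {
        final Kind kind;
        final int sourceIndex;
        final char base;

        private Op(Kind kind, int sourceIndex, char base) {
            this.kind = kind;
            this.sourceIndex = sourceIndex;
            this.base = base;
        }

        static Op delete(int sourceIndex) { return new Op(Kind.DELETE, sourceIndex, '\0'); }
        static Op insert(int sourceIndex, char base) { return new Op(Kind.INSERT, sourceIndex, base); }
        static Op update(int sourceIndex, char base) { return new Op(Kind.UPDATE, sourceIndex, base); }
    }

    // Applies ops sorted by sourceIndex. `offset` plays the role of the
    // global index i: +1 after each insert, -1 after each delete.
    static String apply(String source, List<Op> ops) {
        StringBuilder seq = new StringBuilder(source);
        int offset = 0;
        for (Op op : ops) {
            int pos = op.sourceIndex + offset; // source index -> current position
            switch (op.kind) {
                case DELETE:
                    seq.deleteCharAt(pos);
                    offset--;
                    break;
                case INSERT:
                    seq.insert(pos, op.base); // lands in front of source character sourceIndex
                    offset++;
                    break;
                case UPDATE:
                    seq.setCharAt(pos, op.base);
                    break;
            }
        }
        return seq.toString();
    }

    public static void main(String[] args) {
        // <delete,1> <delete,2> on ACUGA, indices relative to the source
        System.out.println(apply("ACUGA", List.of(Op.delete(1), Op.delete(2)))); // AGA
    }
}
```

Because the offset grows after every insert, several inserts that share the same source index end up in script order instead of being stacked in reverse.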
The input files below result in the following error:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 6, end 5, length 5
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3734)
at java.base/java.lang.String.substring(String.java:1903)
at java.base/java.lang.String.substring(String.java:1876)
at patching.main(patching.java:80)
Process finished with exit code 1
Source Sequence
<?xml version= "1.0"?>
<RNADataBank>
<RNA>
<accession>AB000263</accession>
<description>Homo sapiens mRNA for prepro cortistatin like peptide, complete cds.</description>
<length>207</length>
<sequence>
ACGCCGUUU
</sequence>
</RNA>
<DNA></DNA>
</RNADataBank>
Destination Sequence
<?xml version= "1.0"?>
<RNADataBank>
<RNA>
<accession>AB000263</accession>
<description>Homo sapiens mRNA for prepro cortistatin like peptide, complete cds.</description>
<length>207</length>
<sequence>
AUUU
</sequence>
</RNA>
<DNA></DNA>
</RNADataBank>
Sequence difference script
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Diff SYSTEM "DiffScriptDefinition.dtd">
<Diff>
<meta>
<SourceString>c68e2e283162b8b8379a77740aaa8e20</SourceString>
<DestinationString>882711e4374e36ab999a950ac5dae669</DestinationString>
</meta>
<EditScript>
<Delete>
<index>1</index>
</Delete>
<Delete>
<index>2</index>
</Delete>
<Delete>
<index>3</index>
</Delete>
<Delete>
<index>4</index>
</Delete>
<Delete>
<index>5</index>
</Delete>
</EditScript>
</Diff>
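The delete script above can be replayed on ACGCCGUUU to see where the exception comes from. The naive variant is only a guess at what the `substring` call at patching.java:80 does (the actual source is not shown in this issue); the class and method names are hypothetical.

```java
import java.util.List;

// Replays the <Delete> indices 1..5 from the script above on ACGCCGUUU.
final class DeleteScriptReplay {

    // A guess at the failing behaviour: indices are read against the current,
    // already shortened string and deletion is done with String.substring.
    static String applyNaive(String source, List<Integer> deletes) {
        String seq = source;
        for (int k : deletes) {
            seq = seq.substring(0, k) + seq.substring(k + 1);
        }
        return seq;
    }

    // The proposed fix: shift each source index by a running offset that is
    // decremented after every delete.
    static String applyWithOffset(String source, List<Integer> deletes) {
        StringBuilder seq = new StringBuilder(source);
        int offset = 0;
        for (int k : deletes) {
            seq.deleteCharAt(k + offset);
            offset--;
        }
        return seq.toString();
    }

    public static void main(String[] args) {
        List<Integer> script = List.of(1, 2, 3, 4, 5);
        System.out.println(applyWithOffset("ACGCCGUUU", script)); // AUUU
        try {
            applyNaive("ACGCCGUUU", script);
        } catch (StringIndexOutOfBoundsException e) {
            // By the fifth delete the string has shrunk to 5 characters,
            // so substring(6) is out of range (the issue reports
            // "begin 6, end 5, length 5").
            System.out.println("naive patching failed: " + e.getMessage());
        }
    }
}
```

With the offset, every delete in this script resolves to position 1 of the current string, which removes C, G, C, C, G in turn and leaves the destination sequence AUUU.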
Patching produces the wrong sequence
Expected sequence: GAGUGGGUGGUGGUGGU
Produced sequence: GUGGUGGUGGUGGGUGA
Source Sequence
<?xml version= "1.0"?>
<RNADataBank>
<RNA>
<accession>AB000263</accession>
<description>Homo sapiens mRNA for prepro cortistatin like peptide, complete cds.</description>
<length>207</length>
<sequence>
A
</sequence>
</RNA>
<DNA></DNA>
</RNADataBank>
Destination Sequence
<?xml version= "1.0"?>
<RNADataBank>
<RNA>
<accession>AB000263</accession>
<description>Homo sapiens mRNA for prepro cortistatin like peptide, complete cds.</description>
<length>207</length>
<sequence>
GAGUGGGUGGUGGUGGU
</sequence>
</RNA>
<DNA></DNA>
</RNADataBank>
Sequence difference script
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Diff>
<meta>
<SourceString>7fc56270e7a70fa81a5935b72eacbe29</SourceString>
<DestinationString>4cdcd78f0cde50ac95bdf0a4c2b3aea7</DestinationString>
</meta>
<EditScript>
<Insert>
<getIndex>0</getIndex>
<dropIndex>0</dropIndex>
</Insert>
<Insert>
<getIndex>2</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>3</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>4</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>5</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>6</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>7</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>8</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>9</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>10</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>11</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>12</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>13</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>14</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>15</getIndex>
<dropIndex>1</dropIndex>
</Insert>
<Insert>
<getIndex>16</getIndex>
<dropIndex>1</dropIndex>
</Insert>
</EditScript>
</Diff>
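The insert script can be replayed the same way. The sketch below assumes, from the example data alone, that `getIndex` is the position of the base in the destination sequence and `dropIndex` the source position it is inserted in front of; `InsertScriptReplay`, `Insert`, `apply` and `script` are hypothetical names. Under that assumption, inserting every base at its raw `dropIndex` reproduces the produced sequence, while adding a running offset yields the expected one, which suggests this bug has the same cause as the index problem described at the top.

```java
import java.util.ArrayList;
import java.util.List;

final class InsertScriptReplay {

    // One <Insert> element. Assumed meaning: getIndex is the position of the
    // base in the destination, dropIndex the source position it goes in front of.
    static final class Insert {
        final int getIndex;
        final int dropIndex;

        Insert(int getIndex, int dropIndex) {
            this.getIndex = getIndex;
            this.dropIndex = dropIndex;
        }
    }

    // useOffset == false: insert at the raw dropIndex of the current string.
    // useOffset == true: shift dropIndex by the number of bases already inserted.
    static String apply(String source, String destination, List<Insert> script, boolean useOffset) {
        StringBuilder seq = new StringBuilder(source);
        int offset = 0;
        for (Insert ins : script) {
            int pos = useOffset ? ins.dropIndex + offset : ins.dropIndex;
            seq.insert(pos, destination.charAt(ins.getIndex));
            offset++;
        }
        return seq.toString();
    }

    // Rebuilds the <EditScript> above: (getIndex 0, dropIndex 0), then
    // (getIndex g, dropIndex 1) for g = 2..16.
    static List<Insert> script() {
        List<Insert> ops = new ArrayList<>();
        ops.add(new Insert(0, 0));
        for (int g = 2; g <= 16; g++) {
            ops.add(new Insert(g, 1));
        }
        return ops;
    }

    public static void main(String[] args) {
        String source = "A";
        String destination = "GAGUGGGUGGUGGUGGU";
        System.out.println(apply(source, destination, script(), false)); // GUGGUGGUGGUGGGUGA (produced)
        System.out.println(apply(source, destination, script(), true));  // GAGUGGGUGGUGGUGGU (expected)
    }
}
```

Without the offset, every base with dropIndex 1 is pushed in directly after the first G, so the fifteen bases come out in reverse order; with it, each one lands after the previous insert.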