Bug, feature request, or proposal:
Potential Bug
Expected behavior:
The order of the StructUnits in building_blocks is the same as the order in which they were passed to the Polymer.
Current behavior:
For some polymer sequences, StructUnits are reordered, leading to inconsistencies in polymer structures.
To reproduce:
The following function reproduces the issue:
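# Imports are assumed from the calls below; the original snippet omitted them.
import stk
from rdkit import Chem as rdkit
from rdkit.Chem import AllChem  # makes rdkit.AllChem available

def build_polymer(smiles1, smiles2, sequence):
    # Build the first monomer, add hydrogens and embed it in 3D.
    a = rdkit.MolFromSmiles(smiles1)
    a = rdkit.AddHs(a)  # AddHs returns a new molecule, so keep the result
    rdkit.AllChem.EmbedMolecule(a, rdkit.AllChem.ETKDG())
    A = stk.StructUnit2.rdkit_init(a, 'bromine')
    # Same steps for the second monomer.
    b = rdkit.MolFromSmiles(smiles2)
    b = rdkit.AddHs(b)
    rdkit.AllChem.EmbedMolecule(b, rdkit.AllChem.ETKDG())
    B = stk.StructUnit2.rdkit_init(b, 'bromine')
    # One orientation entry per monomer in the repeating unit.
    polymer = stk.Polymer([A, B], stk.Linear(sequence, [0] * len(sequence), n=1))
    stk.rdkit_ETKDG(polymer)
    polymer.write('test.mol')
    return polymer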
Calling the function like so:
build_polymer('Brc1ccc(Br)s1', 'Brc1ccc(Br)cc1', "ABABBBBBABBB")
returns:
Polymer(building_blocks=["StructUnit2 ['bromine', 'InChI=1S/C4H2Br2S/c5-3-1-2-4(6)7-3/h1-2H']", "StructUnit2 ['bromine', 'InChI=1S/C6H4Br2/c7-5-1-2-6(8)4-3-5/h1-4H']"], topology=Linear(ends='h', n=1, orientation=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], repeating_unit='ABABBBBBABBB'))
while calling the function with a different sequence like so:
build_polymer('Brc1ccc(Br)s1', 'Brc1ccc(Br)cc1', "AAABBBBBBBBB")
swaps the order of the two StructUnits in building_blocks, returning:
Polymer(building_blocks=["StructUnit2 ['bromine', 'InChI=1S/C6H4Br2/c7-5-1-2-6(8)4-3-5/h1-4H']", "StructUnit2 ['bromine', 'InChI=1S/C4H2Br2S/c5-3-1-2-4(6)7-3/h1-2H']"], topology=Linear(ends='h', n=1, orientation=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], repeating_unit='AAABBBBBBBBB'))
As a result, monomers 'A' and 'B' are swapped when the polymer structure is written. I have not been able to determine what causes the reordering, but these other sequences also trigger it:
AABBBBABBBBB
AABBABBBBBBB
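A small check along these lines might make the reordering easy to detect in a test. It is only a sketch: it assumes the polymer exposes a building_blocks list (as the repr output above suggests) holding the same objects that were passed in, and order_preserved is a hypothetical helper, not part of stk. The stand-in objects are there only so the snippet runs without stk installed.

```python
from types import SimpleNamespace


def order_preserved(expected, macromol):
    """Return True if macromol.building_blocks holds the same objects
    as expected, in the same order (compared by identity)."""
    actual = list(macromol.building_blocks)
    return len(actual) == len(expected) and all(
        a is e for a, e in zip(actual, expected)
    )


# Stand-ins for illustration only; real code would pass the StructUnit2
# objects and the stk.Polymer built from them.
A, B = object(), object()
kept = SimpleNamespace(building_blocks=[A, B])
swapped = SimpleNamespace(building_blocks=[B, A])

print(order_preserved([A, B], kept))     # True
print(order_preserved([A, B], swapped))  # False
```

With the real objects, the call would look like order_preserved([A, B], polymer), run once for each sequence listed above.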