Comments (2)
Hey, I've had a go at this and wrote this function, let me know if it looks good and if it would need any additional functionality, I'm not sure of all the use cases for the radius of gyration (I just have amateur knowledge of bioinformatics). For one, the function only works properly for proteins, as it doesn't have masses for other atoms, and it adds a default mass of 0 if the atom is not in atomic_masses
. More masses could be added of course, but I'm not sure if anyone would need the rgyr for smaller molecules. I would also suggest that we call the function gyradius
as it is a synonym for radius of gyration and would be more descriptive.
@staticmethod
def rgyr(df):
"""Compute the Radius of Gyration of a molecule
Parameters
----------
df : pandas.DataFrame
DataFrame with HETATM, ATOM, and/or ANISOU entries.
Returns
---------
rg : float
Radius of Gyration of df in Angstrom
"""
# could be made as a class variable if it will be needed elsewhere
atomic_masses = {"C": 12.0107, "O": 15.9994, "N": 14.0067, "S": 32.065}
coords = df[["x_coord", "y_coord", "z_coord"]].to_numpy()
masses = np.array([atomic_masses.get(atom, 0) for atom in df["element_symbol"]])
total_mass = masses.sum()
center_of_mass = (masses[:, None] * coords).sum(axis=0) / total_mass
distances = np.linalg.norm(coords - center_of_mass, axis=1)
rg = np.sqrt((distances**2 * masses).sum() / total_mass)
return round(rg, 4)
I also wrote some tests for it, I don't know if they're appropriate (bear in mind that I'm a beginner in general), so please suggest more or different tests if needed. In the rgyr
function I used numpy to improve performance with vectorized calculations, but in test_accuracy
I used a more basic and straightforward implementation, in case any logic is changed in rgyr
, but I'm not sure if the test is even necessary, maybe test_pdb_df
is enough. I should also note that I looked at pymol
's rgyrate
function to see their implementation, so it is similar but with clearer variable names
# BioPandas
# License: BSD 3 clause
# Project Website: http://rasbt.github.io/biopandas/
# Code Repository: https://github.com/rasbt/biopandas
from pandas_pdb import PandasPdb
import os
import pandas as pd
TESTDATA_1t48 = os.path.join(os.path.dirname(__file__), "data", "1t48_995.pdb")
p1t48 = PandasPdb()
p1t48.read_pdb(TESTDATA_1t48)
def test_accuracy():
# Create test DataFrame with 3 atoms
test_df = pd.DataFrame({"element_symbol": ["C", "O", "N", "S"],
"x_coord": [1.0, 2.0, 3.0, 4.0],
"y_coord": [5.0, 6.0, 7.0, 8.0],
"z_coord": [9.0, 10.0, 11.0, 12.0]})
coords = test_df[["x_coord", "y_coord", "z_coord"]].to_numpy()
masses = [12.0107, 15.9994, 14.0067, 32.065]
total_mass = sum(masses)
weighted_coords = [(m*x, m*y, m*z) for (x, y, z), m in zip(coords, masses)]
weighted_deviation = sum(m * (x**2 + y**2 + z**2) for (x, y, z), m in zip(coords, masses))
mean_weighted_coords = [sum(coords) / total_mass for coords in zip(*weighted_coords)]
mean_weighted_deviation = sum(coord**2 for coord in mean_weighted_coords)
# rounding needs to be changed to pass test if final rounding in rgyr will be changed
expected_rg = round((weighted_deviation / total_mass - mean_weighted_deviation)**0.5, 4)
rg = PandasPdb.rgyr(test_df)
assert rg == expected_rg, f"Expected {expected_rg}, got {rg} instead"
def test_pdb_df():
rg = PandasPdb.rgyr(p1t48.df['ATOM'])
expected_rg = 18.1508
assert rg == expected_rg, f"Expected {expected_rg}, got {rg} instead"
And I could also write a small tutorial for the usage
from biopandas.
Awesome! Happy to review if you make a PR.
Small nit: you can replace masses = np.array([atomic_masses.get(atom, 0) for atom in df["element_symbol"]])
with pandas .map()
.
from biopandas.
Related Issues (20)
- Remove walrus operator for py3.7 support
- Docs tutorials & API reference out of date HOT 1
- Stream support for exporting pdbs HOT 6
- Biopandas PDB output formatting leads to a ton of segments when reading with MDAnalysis: reason and my quick fix HOT 1
- Using SIFTS data for renumbering residues to match the Uniprot sequence resids HOT 1
- [Refactor] Reduce code duplication by using abstract base classes
- 0.5.0dev Release HOT 1
- Unable to export PDB DFs with model_id columns
- SettingWithCopyWarning
- read_pdb_from_list() not containing newlines
- Saving new pdb from dataframe HOT 2
- Fetching PDB and saving it to a PDB file shifts one space element type due to b-factor 3 digits
- read_mol2 fails on normal-looking file HOT 2
- [RFC] Enabling Cython-based PDB parser backend for speed improvements HOT 4
- Stream support for exporting pdbs not working with OTHERS record HOT 1
- Adding possibility for a collection of structures HOT 1
- Test Scripts and Test Data Packaged with Main Library in PyPI Release
- Implement CIF and mmCIF format support for large biological assemblies HOT 2
- Get Carbon Method in PandasPDB HOT 1
- mmCIF -> PDB conversion method HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from biopandas.