Coder Social home page Coder Social logo

Comments (2)

goniochromatic avatar goniochromatic commented on August 19, 2024

Hey, I've had a go at this and wrote this function, let me know if it looks good and if it would need any additional functionality, I'm not sure of all the use cases for the radius of gyration (I just have amateur knowledge of bioinformatics). For one, the function only works properly for proteins, as it doesn't have masses for other atoms, and it adds a default mass of 0 if the atom is not in atomic_masses. More masses could be added of course, but I'm not sure if anyone would need the rgyr for smaller molecules. I would also suggest that we call the function gyradius as it is a synonym for radius of gyration and would be more descriptive.

@staticmethod
def rgyr(df):
    """Compute the Radius of Gyration of a molecule

    Parameters
        ----------
        df : pandas.DataFrame
            DataFrame with HETATM, ATOM, and/or ANISOU entries.

        Returns
        ---------
        rg : float
            Radius of Gyration of df in Angstrom

        """
    # could be made as a class variable if it will be needed elsewhere
    atomic_masses = {"C": 12.0107, "O": 15.9994, "N": 14.0067, "S": 32.065}

    coords = df[["x_coord", "y_coord", "z_coord"]].to_numpy()
    masses = np.array([atomic_masses.get(atom, 0) for atom in df["element_symbol"]])
    total_mass = masses.sum()
    center_of_mass = (masses[:, None] * coords).sum(axis=0) / total_mass
    distances = np.linalg.norm(coords - center_of_mass, axis=1)
    rg = np.sqrt((distances**2 * masses).sum() / total_mass)
    return round(rg, 4)

I also wrote some tests for it, I don't know if they're appropriate (bear in mind that I'm a beginner in general), so please suggest more or different tests if needed. In the rgyr function I used numpy to improve performance with vectorized calculations, but in test_accuracy I used a more basic and straightforward implementation, in case any logic is changed in rgyr, but I'm not sure if the test is even necessary, maybe test_pdb_df is enough. I should also note that I looked at pymol's rgyrate function to see their implementation, so it is similar but with clearer variable names

# BioPandas
# License: BSD 3 clause
# Project Website: http://rasbt.github.io/biopandas/
# Code Repository: https://github.com/rasbt/biopandas

from pandas_pdb import PandasPdb
import os
import pandas as pd


TESTDATA_1t48 = os.path.join(os.path.dirname(__file__), "data", "1t48_995.pdb")

p1t48 = PandasPdb()
p1t48.read_pdb(TESTDATA_1t48)


def test_accuracy():
    # Create test DataFrame with 3 atoms
    test_df = pd.DataFrame({"element_symbol": ["C", "O", "N", "S"],
                            "x_coord": [1.0, 2.0, 3.0, 4.0],
                            "y_coord": [5.0, 6.0, 7.0, 8.0],
                            "z_coord": [9.0, 10.0, 11.0, 12.0]})

    coords = test_df[["x_coord", "y_coord", "z_coord"]].to_numpy()
    masses = [12.0107, 15.9994, 14.0067, 32.065]
    total_mass = sum(masses)

    weighted_coords = [(m*x, m*y, m*z) for (x, y, z), m in zip(coords, masses)]
    weighted_deviation = sum(m * (x**2 + y**2 + z**2) for (x, y, z), m in zip(coords, masses))
    mean_weighted_coords = [sum(coords) / total_mass for coords in zip(*weighted_coords)]
    mean_weighted_deviation = sum(coord**2 for coord in mean_weighted_coords)

    # rounding needs to be changed to pass test if final rounding in rgyr will be changed
    expected_rg = round((weighted_deviation / total_mass - mean_weighted_deviation)**0.5, 4)
    rg = PandasPdb.rgyr(test_df)
    assert rg == expected_rg, f"Expected {expected_rg}, got {rg} instead"


def test_pdb_df():
    rg = PandasPdb.rgyr(p1t48.df['ATOM'])
    expected_rg = 18.1508
    assert rg == expected_rg, f"Expected {expected_rg}, got {rg} instead"

And I could also write a small tutorial for the usage

from biopandas.

a-r-j avatar a-r-j commented on August 19, 2024

Awesome! Happy to review if you make a PR.

Small nit: you can replace masses = np.array([atomic_masses.get(atom, 0) for atom in df["element_symbol"]]) with pandas .map().

from biopandas.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.