tjlane / thor Goto Github PK

View Code? Open in Web Editor NEW

3.0 3.0 3.0 8.3 MB

support code for x-ray scattering experiments

License: GNU General Public License v2.0

Python 66.88% C++ 3.69% Shell 0.02% Cuda 1.70% Jupyter Notebook 27.71%

python simulation xray-scattering-experiments

thor's People

Contributors

Stargazers

Watchers

Forkers

shenglan0407 apeck12 canbuoy

thor's Issues

optimize CPUScatter/GPUScatter

Many things to do:

make these share more code
optionally compute amplitudes
interface with replicate in structure.py to perform concentrated simulations
parallelize cpuscatter w/OMP
possibly switch to matrix based rotation (see #6)
if we go to a parallel rotation method, we might be able to do single molecules on the GPU
optimize code generally

individual masks for different shots

@dermen says this helps!

need to think about it but may be related to #4

lookup table is a general approach that includes one mask for each shot, different masks for all shots, and everything in between

misc extension fails to build with newer v of cython

not sure why, this only affects the misc extension

src/misc/misc_wrap.cpp: In function ‘int __Pyx_GetException(PyObject**, PyObject**, PyObject**)’:
src/misc/misc_wrap.cpp:18132:24: error: ‘PyThreadState’ has no member named ‘exc_type’
     tmp_type = tstate->exc_type;
                        ^
src/misc/misc_wrap.cpp:18133:25: error: ‘PyThreadState’ has no member named ‘exc_value’
     tmp_value = tstate->exc_value;
                         ^
src/misc/misc_wrap.cpp:18134:22: error: ‘PyThreadState’ has no member named ‘exc_traceback’
     tmp_tb = tstate->exc_traceback;
                      ^
src/misc/misc_wrap.cpp:18135:13: error: ‘PyThreadState’ has no member named ‘exc_type’
     tstate->exc_type = local_type;
             ^
src/misc/misc_wrap.cpp:18136:13: error: ‘PyThreadState’ has no member named ‘exc_value’
     tstate->exc_value = local_value;
             ^
src/misc/misc_wrap.cpp:18137:13: error: ‘PyThreadState’ has no member named ‘exc_traceback’
     tstate->exc_traceback = local_tb;
             ^
src/misc/misc_wrap.cpp: In function ‘void __Pyx_ExceptionSave(PyObject**, PyObject**, PyObject**)’:
src/misc/misc_wrap.cpp:19326:21: error: ‘PyThreadState’ has no member named ‘exc_type’
     *type = tstate->exc_type;
                     ^
src/misc/misc_wrap.cpp:19327:22: error: ‘PyThreadState’ has no member named ‘exc_value’
     *value = tstate->exc_value;
                      ^
src/misc/misc_wrap.cpp:19328:19: error: ‘PyThreadState’ has no member named ‘exc_traceback’
     *tb = tstate->exc_traceback;
                   ^
src/misc/misc_wrap.cpp: In function ‘void __Pyx_ExceptionReset(PyObject*, PyObject*, PyObject*)’:
src/misc/misc_wrap.cpp:19340:24: error: ‘PyThreadState’ has no member named ‘exc_type’
     tmp_type = tstate->exc_type;
                        ^
src/misc/misc_wrap.cpp:19341:25: error: ‘PyThreadState’ has no member named ‘exc_value’
     tmp_value = tstate->exc_value;
                         ^
src/misc/misc_wrap.cpp:19342:22: error: ‘PyThreadState’ has no member named ‘exc_traceback’
     tmp_tb = tstate->exc_traceback;
                      ^
src/misc/misc_wrap.cpp:19343:13: error: ‘PyThreadState’ has no member named ‘exc_type’
     tstate->exc_type = type;
             ^
src/misc/misc_wrap.cpp:19344:13: error: ‘PyThreadState’ has no member named ‘exc_value’
     tstate->exc_value = value;
             ^
src/misc/misc_wrap.cpp:19345:13: error: ‘PyThreadState’ has no member named ‘exc_traceback’
     tstate->exc_traceback = tb;

add a reduce_rings method

@dermen I agree this function is a good idea. I got rid of it in the memory branch temporarily, but I think we should add it back in with a few tweaks. Curious what you think about the following:

Have it automatically deal with the mask (should be possible)
Have an option to discard shots (by index) as well as phi values
Adding a unit test

Original Code:

    def reduce_rings ( self, factor ):
        """ reduce self.num_phi by factor; WARNING: removes the mask; must make a new one """
        if self.num_phi % factor != 0:
            raise NotImplementedError( 'As of now, *factor* must be a multiple of *self.num_phi*' )
        rp = np.copy( self.polar_intensities)
        new_rp = np.zeros(( rp.shape[0], rp.shape[1],rp.shape[2] / factor ) )
        # convolve algor. taken from stack overflow
        for q in self.q_values:
            for i in xrange( rp.shape[0] ) :
                d = rp[ i,self.q_index(q),:]
                d = np.convolve( d + [d[-1]], [1./factor]*factor,mode='valid' )[::factor]
                new_rp[i,self.q_index(q),:] = d

        return Rings( self.q_values, new_rp, self.k )

Citing from the api docs:
"The key power/functionality of Shotset is that it provides a layer of
abstraction over the intensity data on disk. Specifically, it provides many
powerful capabilities for analyzing data sets that are larger than can
fit in memory."
However, when I want to create 20000 shots via simulate() then I get:

[...]
File "/usr/local/lib/python2.7/dist-packages/thor/xray.py", line 1728, in simulate
ValueError: array is too big.

With 5000 shots it works without complains...

So the promise does not stand?

setup.py nvcc locate fails

A lot of cuda setup guides suggest the following for cuda env vars:

CUDA_ROOT="/usr/local/cuda-5.0:/usr/local/cuda-5.0/lib64"

so we should also check for this var in addition to CUDA_HOME..

Further, the code in setup.py

if 'CUDA_HOME' in os.environ:
        home = os.environ['CUDA_HOME']
        nvcc = pjoin(home, 'bin', 'nvcc')

will fail in the case where (e.g.) CUDA_HOME="/usr/local/cuda-5.0:/usr/local/cuda-5.0/lib64" , for instance on my comp:

In [5]: from os.path import join as pjoin

In [6]: home = os.environ['CUDA_HOME']

In [7]: nvcc = pjoin(home,'bin','nvcc')

In [8]: nvcc
Out[8]: '/usr/local/cuda-5.0:/usr/local/cuda-5.0/lib64/bin/nvcc'

Hence, nvcc is nonsensical. We should split home by ":" and then append pjoin to each split item.. Something like

if 'CUDA_HOME' in os.environ:
        home = os.environ['CUDA_HOME'].split(':')
        nvcc = map( lambda x: pjoin(x,'bin','nvcc'), home )

and then treat nvcc like a list thereafter, testing os.path.exists on each item...

scatter should be able to ignore H's

hydrogens aren't worth out time :)

Quaternion code needs clarification

The C++ (and also the CUDA) code that generates quaternions from three floats needs comments. Could anybody comment on the role of the parameters r1, r2, r3 in
https://github.com/tjlane/thor/blob/master/src/scatter/_cpuscatter.cpp#L22
please?

What is the meaning of these three numbers in the context of rotations? Are these Euler angles? Or how can I control the rotation that the resulting quaternion represents?

Thanks a lot!

setup.py not working on Ubuntu 14.10

Here are the changes I had to do in order to make it work:

139a140,144
>             cudaconfig_list=[{'home'   : home[x], 
>                               'nvcc'   : nvcc[x],
>                               'include': pjoin(home[x], 'include'),
>                               'lib64'  : pjoin(home[x], 'lib64')} \
>                              for x in xrange(len(home)) ]
148a154,158
>             cudaconfig_list=[{'home'   : home, 
>                               'nvcc'   : nvcc,
>                               'include': pjoin(home, 'include'),
>                               'lib64'  : pjoin(home, 'lib')} ]
>                               # 'lib64'  : pjoin(home, 'lib64')} ]
151,155d160
<         cudaconfig_list=[{'home'   : home[x], 
<                           'nvcc'   : nvcc[x],
<                           'include': pjoin(home[x], 'include'),
<                           'lib64'  : pjoin(home[x], 'lib64')} \
<                           for x in xrange(len(home)) ]
323c328,329
<         import pyfftw

---
>         # import pyfftw
>         import fftw3

Shotset add method not implemented

https://github.com/tjlane/thor/blob/master/src/python/xray.py#L3528

This should also probably be consistent with Rings, which doesn't have an add method, but does have an append method (!). I think having __add__ simply be an interface to append, and keeping both, would probably work. If people have comments let me know.

5-parameter gaussian formfactor

Good idea to extend the possibilities for form factor models. E.g. those from

Maslen et al (1992) Intl tables for crystallography

new pytables issues deprecation warning

/reg/neh/home2/tjlane/pytj/lib/python2.7/site-packages/thor/xray.py:1794: DeprecationWarning: createEArray() is pending deprecation, use create_earray() instead. You may use the pt2to3 tool to update your source code.

shot-by-shot energy and/or masks

Here's what needs to happen. There are two options.

OPTION 1: Python parallel
(1) Remove xray.Shotset._implicit_interpolation inner loop over shots -- just does one shot at a time
(2) Parallelize xray.Shotset._implicit_interpolation at that level

Benefits: easy to code
Downsides: will use a lot of memory
Challenges: need to parallelize a class method robustly

OPTION 2: OMP parallel
(1) If many energies are passed, compute detector intersections inside loop over shot intensities (https://github.com/tjlane/odin/blob/master/src/python/xray/xray.py#L1342)
(2) Parallize scipy.ndimage.map_coordiantes w/OMP and build it in odin
(3) Benchmark the detector intersection computation, and if rate-limiting, parallelize

Benefits: higher-performance final product
Downsides: more coding
Challenges: getting scipy code we call parallel/wrapped

solvent inclusion in scattering simulation

After attending this IUCr conference, receiving feedback from the community, and also just thinking about the future and moving towards biomolecules, I think it will quickly become critical to include solvent in the scattering simulations. How close are we to being able to do this ?

I am sure there are several techniques for doing this, and how to be most efficient about it is still unknown to me. We could start by creating a somewhat small layer of water (or whatever solvent) surrounding the particle and equilibrating, and then using that in the simulation. I say small shell because long range correlations between the particle and the solvent are less likely to exist and corrupt the signal.

Anyway, I guess this could be done separately from thor

xray.Rings.correlate_intra "num_shots" arg broken

Mine keeps failing, have tried to debug and its VERY strange

zero-intensity mask in correlations

@dermen @CoChrists

Gundolf pointed out that in the old c implementation of the correlation function, zeros were used as a mask proxy. Is this OK? Should we change it?

https://github.com/tjlane/thor/blob/master/src/python/xray.py#L3395
https://github.com/tjlane/thor/blob/master/src/corr/corr.cpp#L49

We should at least document this.

@CoChrists are there other places in the code where this happens that I missed?

MPI for scattering calculations

Previous we've used ghetto scripts, but it would be nice to have an MPI framework for running large scattering simulation calculations.

incompatible with latest mdtraj

In [1]: import thor
WARNING:xsdimage:lxml library is probably not part of your python installation: disabling xsdimage format
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-0103ce0c7928> in <module>()
----> 1 import thor

/Users/mender/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/thor/__init__.py in <module>()
      7 
      8 # dump a few things into the THOR namespace
----> 9 from xray import *
     10 
     11 # list all the files included in THOR

/Users/mender/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/thor/xray.py in <module>()
     31 
     32 from mdtraj import io
---> 33 from mdtraj.utils.arrays import ensure_type
     34 
     35 # ------------------------------------------------------------------------------

ImportError: No module named arrays

Why do we have two random rotation methods?

@dermen is one better? We should choose the best and keep that one. I guess even if best is just fastest. Having two is confusing!

https://github.com/tjlane/thor/blob/master/src/python/structure.py#L190
https://github.com/tjlane/thor/blob/master/src/python/structure.py#L227

rings append behavior confusing

@dermen has reported that the automatic writing to disk this method can be confusing.

At the very least there should be a warning and a kwarg option to avoid modifying your data on disk. Maybe there should be a prompt to save the user.

correctness of function: sph_hrm_coefficients()

I am wondering if Thor's scatter.py should actually read

306:                    Plm = special.lpmv(l, m, sph_quad_900[:,2])

The documentation for scipy.special.lpmv() says:
1st parameter is Order
2nd parameter is Degree
3rd parameter is Argument
see: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.special.lpmv.html#scipy.special.lpmv

However, if that's true I get this warning:
C_l coefficient has non-zero imaginary component (15314813.269857) -- this is theoretically forbidden and usually a sign something went wrong numerically

Comments?

comer mann electron dens

It seems that the scatter.atomic_electrondens had a small typo:

phi[inds] = cromermann[i] * \

should be

phi[inds] += cromermann[i] * \

Other than that there was a discrepancy from the formula I found via quick wolfram, and the one used.. how does this look:

click here for wolfram alpha computation

I tried loading the referenced paper in the function documentation, but could not find it (minimal effort used)

I made a little test script :

import numpy as np
import pylab as plt
from thor import scatter
def single_atom_dens(atomic_num, grid_size=5., grid_spacing=0.05):
    """
    atomic_num, atomic number of element 
    grid_size, size of cubic density grid in Angstrom
    grid_spacing, spacing of cubic grid in Angstrom
    """
    Ngrid_pt = int( float(grid_size)/grid_spacing)

    vals = np.arange( -grid_size/2, grid_size/2, grid_spacing)
    
    x,y,z = np.meshgrid( vals,vals,vals, sparse=True)
    R = np.sqrt( x**2 + y**2 + z**2 )
    form_fact = scatter.atomic_electrondens( atomic_num, R.ravel() )

    return form_fact.reshape( (Ngrid_pt, Ngrid_pt, Ngrid_pt))


# get the density
some_dens = {n:single_atom_dens(n) for n in range( 1,26)}


# plot
fig,axs = plt.subplots( 5,5, figsize=(10,10))
atom = 1
for i in range (5):
    for j in range(5):
        axs[i][j].clear()
        axs[i][j].imshow( sum(some_dens[atom],0), vmax=50)
        axs[i][j].set_title( "Z=%d"%atom, fontsize=15)
        axs[i][j].axis('off')
        atom += 1

correlate_intra test not working well

needs investigation, skipping for now

compare to bornagain and see what code organization makes sense

@dermen
@rkirian

When I have some free mental cycles, I will document what I find here regarding the code that is in Thor and in bornagain, and suggest a few plans we can consider. Feel free to jump in and comment, though no obligation to do so!

From Derek:
http://prickly-pythons.github.io/python_code/pricklies_speedops.html

add difference based inter correlation

@dermen