
emcee and MPI (about emcee, 12 comments, closed)

dfm commented:

emcee and MPI

Comments (12)

dstndstn commented:

Possible, yes, but at present I don't think anyone is planning to do it. (But if you're interested in trying we'd be happy to help you get started!)
--dstn


dfm commented:

It's definitely an interesting prospect and, as Dustin says, it would be possible. That being said, it's not obvious that it would be a good idea with the current algorithm. The way it works is that each step is made in parallel (i.e. ~Nwalkers/2 likelihood evaluations are made in parallel), but after each step the code has to synchronize; this is a limitation of the algorithm. If the likelihood calls were extremely expensive then running on multiple nodes might help, but it's not obvious when this gain would outweigh the extra cost of network communication, etc.

That being said, if there is a problem where this would be useful, it is already possible for a user of emcee to implement it themselves. You can provide your own pool to the sampler, which just needs to be an object with a .map() method with a calling sequence like the built-in map or multiprocessing.Pool.map (a minimal sketch of the interface is below). For someone familiar with using MPI and Python, it shouldn't be too hard to homebrew your own MPI mapping capability. Please keep us posted if you want to give this a try, because I'd be pretty stoked to merge it!
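To make that interface concrete, here is a minimal sketch of a valid pool (SerialPool is a hypothetical name, not part of emcee; an MPI version only needs a different body for map):

class SerialPool(object):
    """Any object with a .map method behaving like the built-in map
    satisfies the pool interface; this one just runs serially."""
    def map(self, function, sequence):
        return list(map(function, sequence))

# sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, pool=SerialPool())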


marius311 commented:

Thanks for the quick responses. I have a little bit of experience with mpi4py, and I think you're right that it should be fairly easy to parallelize. I'm going to give it a try, and keep you updated.


marius311 commented:

Wow, that really was simple. Here's the example from the main page done with MPI. This assumes the script is launched with, for example, mpiexec -n 10 python test.py, where the contents of test.py are:

import emcee
import numpy as np
from collections import namedtuple
from mpi4py import MPI
import itertools

#==============
#Stuff for MPI
#==============
def mpi_map(function, sequence):
    """
    A map function parallelized with MPI.
    Assumes this program was called with mpiexec -n $NUM.
    Partitions the sequence into $NUM blocks and each MPI process does the rank-th one.
    """
    comm = MPI.COMM_WORLD
    (rank, size) = (comm.Get_rank(), comm.Get_size())
    sequence = mpi_consistent(sequence)
    my_results = list(map(function, partition(sequence, size)[rank]))
    # Every rank ends up with the full, ordered result list.
    return flatten(comm.allgather(my_results))

def mpi_consistent(value):
    """
    Returns the value that the root process provided.
    """
    return MPI.COMM_WORLD.bcast(value)

def flatten(l):
    """Returns a list of lists joined into one"""
    return list(itertools.chain(*l))

def partition(seq, n):
    """Partition seq into n nearly equal-length sublists"""
    division = len(seq) / n
    return [seq[int(round(division * i)):int(round(division * (i + 1)))]
            for i in range(n)]
#==========



def lnprob(x, ivar):
    return -0.5*np.sum(ivar * x**2)


ndim, nwalkers = 10, 100

#Note the mpi_consistent; otherwise each process would start at a different point
ivar = mpi_consistent(1./np.random.rand(ndim))
p0 = mpi_consistent([np.random.rand(ndim) for i in range(nwalkers)])

#Our 'pool' is just an object with a 'map' method which points to mpi_map
pool = namedtuple('pool',['map'])(mpi_map)

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=[ivar], pool=pool)
sampler.run_mcmc(p0, 1000)

if MPI.COMM_WORLD.Get_rank()==0:
    try:
        import matplotlib.pyplot as pl
    except ImportError:
        print "Try installing matplotlib to generate some sweet plots..."
    else:
        pl.plot(sampler.flatchain[:,0])
        pl.show()

A slightly more useful version would use dynamic process management, although since functions can't be pickled there's the problem of communicating the likelihood function between processes. Here we get away with never pickling it; a rough sketch of that dynamic variant follows.
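For reference, a hedged sketch of the master/worker pattern, assuming the likelihood function is defined at module level on every rank so it never needs to be sent anywhere (master_map, worker_wait, and stop_workers are hypothetical names, not emcee API):

from mpi4py import MPI

comm = MPI.COMM_WORLD
TAG_TASK, TAG_STOP = 1, 2

def worker_wait(function):
    """Non-root ranks sit here for the whole run: receive (index, item)
    pairs from rank 0, apply the locally defined function, send back."""
    status = MPI.Status()
    while True:
        task = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        i, x = task
        comm.send((i, function(x)), dest=0)

def master_map(function, sequence):
    """Rank 0: hand out items one at a time, so fast workers pick up more
    work. Only items and results are pickled, never the function
    (the function argument is unused here; workers apply their local copy)."""
    results = [None] * len(sequence)
    queue = list(enumerate(sequence))
    status = MPI.Status()
    outstanding = 0
    for w in range(1, comm.Get_size()):        # prime each worker
        if queue:
            comm.send(queue.pop(0), dest=w, tag=TAG_TASK)
            outstanding += 1
    while outstanding > 0:                     # collect results, refill workers
        i, res = comm.recv(source=MPI.ANY_SOURCE, status=status)
        results[i] = res
        if queue:
            comm.send(queue.pop(0), dest=status.Get_source(), tag=TAG_TASK)
        else:
            outstanding -= 1
    return results

def stop_workers():
    """Called once by rank 0 after all sampling is finished."""
    for w in range(1, comm.Get_size()):
        comm.send(None, dest=w, tag=TAG_STOP)

Rank 0 would wrap master_map in a pool object and drive the sampler, while every other rank calls worker_wait(lnprob) and exits afterwards, with rank 0 calling stop_workers() at the end. This is the same division of labour a dedicated MPI pool would use.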

I don't have the time to incorporate this into the project, but feel free to steal the code/idea from above!


mfouesneau commented:

Hey, could that be a way to avoid the pickling issues?


dstndstn commented:

Hi Morgan,

I don't really understand what your pickling issue is. What exactly is the problem?

multiprocessing needs the function and its arguments, and the results, to be picklable. Sometimes this requires writing custom __getstate__ and __setstate__ methods.
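For example, a minimal sketch of that pattern (LnProb, the filename argument, and the loadtxt stand-in are hypothetical; the point is only the __getstate__/__setstate__ mechanics):

import numpy as np

class LnProb(object):
    """Hypothetical wrapper whose heavy member can be rebuilt from a
    filename, so the pickle never has to carry the data itself."""
    def __init__(self, filename):
        self.filename = filename
        self.data = np.loadtxt(filename)   # stand-in for a catalog loader
    def __call__(self, params):
        return -0.5 * np.sum((self.data - params)**2)
    def __getstate__(self):
        # Drop the unpicklable/heavy member from the pickled state.
        state = self.__dict__.copy()
        del state['data']
        return state
    def __setstate__(self, state):
        # Rebuild the member on the receiving side.
        self.__dict__.update(state)
        self.data = np.loadtxt(self.filename)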

cheers,
dstn



mfouesneau commented:

Most of the time when I define my lnp function it is not picklable. Why? Usually because the function depends on data that are not fixed and are declared later.

I thought defining a callable object would be a good workaround, but it doesn't always work for me. Writing custom __getstate__ and __setstate__ methods is the only thing I haven't tried yet.

The nice thing with MPI is that you no longer need to send the objects, so you avoid the pickling issues. I'm currently not sure this works correctly, though.


dstndstn commented:

Hi Morgan,

If your situation is:

def lnprob(data, params):
    return ...

def main():
    data = load_data_from_wherever()
    callable = ...
    sampler = emcee.EnsembleSampler(nw, ndim, callable, pool=...)

What about doing:

class Callable(object):
    def __init__(self, data):
        self.data = data
    def __call__(self, params):
        return lnprob(self.data, params)

and in main:

def main():
    data = load_data_from_wherever()
    callable = Callable(data)
    sampler = emcee.EnsembleSampler(nw, ndim, callable, pool=...)

It sounds like you are saying that you tried this and it still doesn't work -- so I must be misunderstanding something about your situation.

cheers,
dustin



mfouesneau commented:

Hi Dustin,

Yes, this does not work unless you create callable outside main (at level 0, i.e. module level). This is one of the limitations of pickling, as explained in the official documentation.

If you want it to work, you want to do:

callable = Callable(data)

def main():
    ...
    sampler = emcee.EnsembleSampler(nw, ndim, callable, pool=...)


dstndstn commented:

Hi Morgan,

I use that approach all the time and it works fine -- see attached example.

What documentation makes you think it won't work?

And could you please send a stripped-down test case that fails.

Note that THIS does not work:

def main():
    data = load_data()
    def local_func(params):
        return lnprob(data, params)

    sampler = emcee.EnsembleSampler(nw, ndim, local_func, pool=...)
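A minimal demonstration of why (top_level and nested are hypothetical names; only module-level functions are pickled by reference):

import pickle

def top_level(x):
    # Defined at module level: pickled by reference, so this works.
    return x

def main():
    def nested(x):
        # Defined inside a function: cannot be pickled.
        return x
    pickle.dumps(top_level)   # fine
    try:
        pickle.dumps(nested)
    except (pickle.PicklingError, AttributeError) as err:
        print('nested function failed to pickle:', err)

if __name__ == '__main__':
    main()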

cheers,
dustin


import numpy as np
import emcee
import multiprocessing

class Callable(object):
    def __init__(self, data):
        self.data = data
    def __call__(self, params):
        return lnprob(self.data, params)

def lnprob(data, params):
    return -np.sum((data - params)**2)

def main():
    data = np.random.normal(size=100)
    print('data', data.shape)
    cable = Callable(data)
    p0 = np.array([1.])
    stdev = 0.001
    nw = 100
    pool = multiprocessing.Pool(8)
    sampler = emcee.EnsembleSampler(nw, len(p0), cable, pool=pool)
    params = np.vstack([p0 + stdev * np.random.normal(size=len(p0))
                        for i in range(nw)])
    allp = []
    lnp, state = None, None
    for step in range(101):
        params, lnp, state = sampler.run_mcmc(params, 1, rstate0=state, lnprob0=lnp)
        allp.append(params.copy().T)
    allp = np.vstack(allp)
    print('allp', allp.shape)

    import matplotlib
    matplotlib.use('Agg')
    import pylab as plt

    plt.clf()
    plt.plot(allp, 'r-', alpha=0.5)
    plt.savefig('allp.png')

if __name__ == '__main__':
    main()


mfouesneau commented:

Right, sorry: I discovered my data object (a catalog table) is not always picklable.

I see that you initialize all the walkers at the same position (plus a small random offset). When I worked with Hogg we did the same, but I don't remember why it is better than initializing uniformly over the parameter space.


dstndstn commented:

Unless your parameter space has many modes and you accidentally start in the wrong one, I think it is much better to initialize at a reasonable place than randomly in the parameter space! If your parameter space is very simple and smooth then it shouldn't matter, but in strongly structured parameter spaces the sampler may never find the high-probability region if you scatter the walkers randomly.
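For concreteness, a minimal sketch of that "tight ball around a good guess" initialization, as in the attached example (ndim, nwalkers, and p_best are placeholders):

import numpy as np

ndim, nwalkers = 5, 100            # placeholders
p_best = np.zeros(ndim)            # some reasonable starting guess
# One entry per walker: a tight Gaussian ball around the guess.
p0 = [p_best + 1e-3 * np.random.randn(ndim) for _ in range(nwalkers)]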

Good to hear you're making progress on your pickling issue.

--dstn


