Comments (12)
Possible, yes, but at present I don't think anyone is planning to do it. (But if you're interested in trying we'd be happy to help you get started!)
--dstn
from emcee.
It's definitely an interesting prospect and as Dustin says, it would be possible. That being said, it's not obvious that it would be a good idea with the current algorithm. The way that it works is that each step is made in parallel (i.e. ~Nwalkers/2 likelihood evaluations are made in parallel) but after each step, the code has to synchronize and this is a limitation of the algorithm. If the likelihood calls were extremely expensive then it might help to run on multiple nodes but it's not obvious when this gain would outweigh the extra cost of network speed, etc. That being said, if there is a problem where this would be useful, it is already possible for a use of emcee to implement this themselves. You can provide you own pool
to the sampler which should just be an object with a .map()
method with a calling sequence like the built in map
or multiprocessing.Pool.map
. For someone familiar with using MPI and Python, it shouldn't be too hard to homebrew your own MPI mapping capability. Please keep us posted if you wanted to give this a try because I'd be pretty stoked to merge it!
from emcee.
Thanks for the quick responses. I have a little bit of experience with mpi4py, and I think you're right that it should be fairly easy to parallelize. I'm going to give it a try, and keep you updated.
from emcee.
Wow that really was simple. Here's the example from the main page done with MPI. This assumes that this script gets called with, for example, mpiexec -n 10 test.py
where the contents of test.py are:
import emcee as emcee
import numpy as np
from collections import namedtuple
from mpi4py import MPI
import itertools
#==============
#Stuff for MPI
#==============
def mpi_map(function,sequence):
"""
A map function parallelized with MPI.
Assumes this program was called with mpiexec -n $NUM.
Partitions the sequence into $NUM blocks and each MPI process does the rank-th one.
"""
comm = MPI.COMM_WORLD
(rank,size) = (comm.Get_rank(),comm.Get_size())
sequence = mpi_consistent(sequence)
return flatten(comm.allgather(map(function, partition(sequence,size)[rank])))
def mpi_consistent(value):
"""
Returns the value that the root process provided.
"""
return MPI.COMM_WORLD.bcast(value)
def flatten(l):
"""Returns a list of lists joined into one"""
return list(itertools.chain(*l))
def partition(list, n):
"""Partition list into n nearly equal length sublists"""
division = len(list) / float(n)
return [list[int(round(division * i)): int(round(division * (i + 1)))] for i in range(n)]
#==========
def lnprob(x, ivar):
return -0.5*np.sum(ivar * x**2)
ndim, nwalkers = 10, 100
#Note the mpi_consistent, otherwise each process would start in a different point
ivar = mpi_consistent(1./np.random.rand(ndim))
p0 = mpi_consistent([np.random.rand(ndim) for i in xrange(nwalkers)])
#Our 'pool' is just an object with a 'map' method which points to mpi_map
pool = namedtuple('pool',['map'])(mpi_map)
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=[ivar],pool=pool)
sampler.run_mcmc(p0, 1000)
if MPI.COMM_WORLD.Get_rank()==0:
try:
import matplotlib.pyplot as pl
except ImportError:
print "Try installing matplotlib to generate some sweet plots..."
else:
pl.plot(sampler.flatchain[:,0])
pl.show()
A slightly more useful version would use dynamic process management, although since functions can't be picked there's the problem of communicating the likelihood function between processes. Here we get away with never pickling it.
I don't have the time to incorporate this into the project, but feel free to steal the code/idea from above!
from emcee.
Hey could that be a way to avoid the pickling issues?
from emcee.
Hi Morgan,
I don't really understand what your pickling issue is. What exactly is
the problem?
multiprocessing needs the function and its arguments, and the results, to
be picklable. Sometimes this requires writing custom getstate and
setstate functions.
cheers,
dstn
On Thu, 3 May 2012, mfouesneau wrote:
Hey could that be a way to avoid the pickling issues?
Reply to this email directly or view it on GitHub:
#14 (comment)
from emcee.
Most of the time when I define my lnp function it is not picklable. Why? Most of the time it is because my function depends on data that are not fixed and are declared later.
I thought defining a callable object would be a good bypass, but this does not work for me every time.
writing custom getstate and setstate functions is the only thing I did not tried yet.
The good thing with MPI is that you do not need to send the objects anymore so you avoid picklable issues. I am currently not sure this works correctly though.
from emcee.
Hi Morgin,
If your situation is:
def lnprob(data, params):
return ...
def main():
data = load_data_from_wherever()
callable = ...
sampler = emcee.EnsembleSampler(nw, ndim, callable, pool=...)
What about doing:
class Callable(object):
def init(self, data):
self.data = data
def call(self, params):
return lnprob(self.data, params)
and in main:
def main():
data = load_data_from_wherever()
callable = Callable(data)
sampler = emcee.EnsembleSampler(nw, ndim, callable, pool=...)
It sounds like you are saying that you tried this and it still doesn't
work -- so I must be misunderstanding something about your situation.
cheers,
dustin
On Fri, 11 May 2012, mfouesneau wrote:
Most of the time when I define my lnp function it is not picklable. Why? Most of the time it is because my function depends on data that are not fixed and are declared later.
I thought defining a callable object would be a good bypass, but this does not work for me every time.
writing custom getstate and setstate functions is the only thing I did not tried yet.The good thing with MPI is that you do not need to send the objects anymore so you avoid picklable issues. I am currently not sure this works correctly though.
Reply to this email directly or view it on GitHub:
#14 (comment)
from emcee.
Hi Dustin,
Yes this does not work is you create callable outside main (level 0). This is one of the limitations of picklable stuffs as explained on the official website.
If you want it to work, you want to do
callable(data)
def main():
... sampler = emcee.EnsembleSampler(nw, ndim, callable, pool=...)
from emcee.
Hi Morgan,
I use that approach all the time and it works fine -- see attached
example.
What documentation makes you think it won't work?
And could you please send a stripped-down test case that fails.
Note that THIS does not work:
def main():
data = load_data()
def local_func(params):
return lnprob(data, params)
sampler = emcee.EnsembleSampler(nw, ndim, local_func, pool=...)
cheers,
dustin
On Fri, 11 May 2012, mfouesneau wrote:
Hi Dustin,
Yes this does not work is you create callable outside main (level 0). This is one of the limitations of picklable stuffs as explained on the official website.
If you want it to work, you want to do
callable(data)
def main():
... sampler = emcee.EnsembleSampler(nw, ndim, callable, pool=...)
Reply to this email directly or view it on GitHub:
#14 (comment)import numpy as np
import emcee
import multiprocessing
class Callable(object):
def init(self, data):
self.data = data
def call(self, params):
return lnprob(self.data, params)
def lnprob(data, params):
return -np.sum((data - params)**2)
def main():
data = np.random.normal(size=100)
print 'data', data.shape
cable = Callable(data)
p0 = np.array([1.])
stdev = 0.001
nw = 100
pool = multiprocessing.Pool(8)
sampler = emcee.EnsembleSampler(nw, len(p0), cable, pool=pool)
params = np.vstack([p0 + stdev * np.random.normal(size=len(p0))
for i in range(nw)])
allp = []
lnp,state = None,None
for step in xrange(101):
params,lnp,state = sampler.run_mcmc(params, 1, rstate0=state, lnprob0=lnp)
allp.append(params.copy().T)
allp = np.vstack(allp)
print 'allp', allp.shape
import matplotlib
matplotlib.use('Agg')
import pylab as plt
plt.clf()
plt.plot(allp, 'r-', alpha=0.5)
plt.savefig('allp.png')
if name == 'main':
main()
from emcee.
Right, sorry I discovered my data object (catalog table) is not always picklable.
I see that you initiate all the walkers at the same position (+random).
I remember working with Hogg, we also did so, but I do not remember why it is better than uniform in the parameter space.
from emcee.
Unless your parameter space has many modes and you accidentally start in
the wrong one, I think it will be much better to initialize at a
reasonable place, rather than randomly in the parameter space! If your
parameter space is very simple and smooth then it shouldn't matter, but in
strongly structured parameter spaces the sampler may never find the
high-probability area if you scatter the walkers randomly.
Good to hear you're making progress on your pickling issue.
--dstn
On Wed, 16 May 2012, mfouesneau wrote:
Right, sorry I discovered my data object (catalog table) is not always picklable.
I see that you initiate all the walkers at the same position (+random).
I remember working with Hogg, we also did so, but I do not remember why it is better than uniform in the parameter space.
Reply to this email directly or view it on GitHub:
#14 (comment)
from emcee.
Related Issues (20)
- Tests for longdouble behavior have started failing
- 3.1.3: pytest warnings HOT 1
- 3.1.3: sphinx warnings `reference target not found` HOT 12
- How to import log uniform prior in emcee?
- Issue on class emcee.moves.GaussianMove(cov, mode='vector', factor=None)
- Add v3 JOSS citation info to README and docs
- Parallelization of emcee is working not as fast as expected HOT 4
- My samplers do not converge to maximum likelihood
- HDF backend file size
- Multiprocessing Pool scaling not as expected HOT 4
- [3.1.3] test failure with scipy 1.10 HOT 3
- Progress bar HOT 1
- Investigate the use of optree for structured parameter sets
- Issue on page /index.html HOT 1
- Add set_description to `pbar._NoOpPBar` HOT 1
- In emcee.EnsembleSampler threads command is not working
- Problem using 4-byte floats with EnsembleSampler HOT 4
- JAX implementation of emcee HOT 3
- issue with emcee installation HOT 3
- (solved) issue with sampler stalling with multiprocessing
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from emcee.