kyleaoman / martini Goto Github PK
View Code? Open in Web Editor NEWMock spatially resolved spectral line observations of simulated galaxies.
Home Page: https://martini.readthedocs.io/
License: GNU General Public License v3.0
Mock spatially resolved spectral line observations of simulated galaxies.
Home Page: https://martini.readthedocs.io/
License: GNU General Public License v3.0
it seems your ipynb notebook was accidentally stored in html format.
One pyOpenSci review point is:
Python 3.7 is no longer supported.
Additional context
pyOpenSci/software-submission#164
A requirement for the PyOpenSci review is a link to the main package documentation in the README.
Additional context
pyOpenSci/software-submission#164
Currently the total mass in the source is recorded in the log, and then the fraction of that mass that is inserted in the cube. This often leads to low fractions simply because much of the mass is outside of the datacube aperture. Should additionally record the mass in the source after it is pruned to the desired aperture, and also log fractions of this mass. It's useful to flag for the user both (i) when much of their source is outside of the aperture (which they might not have expected), and (ii) when significant amounts of mass that are in aperture don't end up in the cube (probably indicating that something has gone wrong in the approximations used in the SPH kernels or similar).
From a simulation point of view, velocity is the fundamental quantity that we convert to frequency. Realistic radio detectors, however, have frequency as the fundamental quantity, e.g. channels are evenly spaced in frequency. Martini supports switching back and forth between frequency and velocity channels for a DataCube object, but current implementation is that channels are initialised with an even velocity spacing, so that frequency spacing is irregular. Instead, should initialise with a constant spacing in frequency and convert to velocity. Should accept either a velocity width or a frequency width. If a velocity width is provided, it can be converted to a frequency spacing at the velocity of the central channel.
The pruning mechanism of source modules and the padding mechanism of datacubes mean that they can be put into a state where they should not be passed to martini. For example, once a mock is made with a source (it has been pruned in the process), applying a rotation and trying to produce a second mock of the same source should be invalid because it has lost its particles. Either the object should be flagged as no longer valid and martini modified to reject the source, or pruned particles should be kept aside (or masked) within the object and restored after use. Some treatment of padding of datacubes along the same lines would also be a good idea.
Implement unit tests? Hosting sample input data is the main hurdle. Consider using the jupyter notebooks to fill this role, I think they can be used in automated testing, but the sample data remains the hurdle.
I seem to have forgotten to document the optional spec_dtype
parameter when I implemented it, that should be added in.
If a source undergoes a mix of rotations and translations saving the "current rotation" with SPHSource.save_current_rotation("cache.txt")
and then applying that result to the source before transformations with `SPHSource.rotate(rotmat=np.loadtxt("cache.txt")) will (perhaps unintuitively) not necessarily give back the transformed source.
Options to remedy this include:
save_current_rotation
if a translation has been applied.A helper function that makes histograms of particle smoothing lengths in pixel units and is annotated with kernel validity intervals would be useful. This should be hinted in error messages for kernel validation failure.
Make sure they all work, are consistent, use current best practice.
...and make sure tests are passing.
The evaluation of the particle spectra is the second most expensive calculation in martini and could potentially be done in parallel. The parallelization of the core source insertion loop could serve as a rough template. Currently everything happens in this one line:
raw_spectra = self.spectral_function(
(
np.tile(
channel_edges.to_value(channel_edges_unit)[:-1], vmids.shape + (1,)
)
* channel_edges.unit
).astype(self.spec_dtype),
(
np.tile(
channel_edges.to_value(channel_edges_unit)[1:], vmids.shape + (1,)
)
* channel_edges.unit
).astype(self.spec_dtype),
(
np.tile(
vmids.to_value(vmids_unit),
np.shape(channel_edges[:-1]) + (1,) * vmids.ndim,
).T
* vmids.unit
).astype(self.spec_dtype),
).to_value(U.dimensionless_unscaled)
Would just need to split up vmids into chunks, evaluate the same thing in parallel, and assemble the result. The overhead could plausibly makes the code slower for small to moderate numbers of particles, but could speed up particle-heavy use cases.
Currently as soon as the source is initialised the particles are rotated and translated according to any provided rotation and according to the (RA, Dec, D). Allowing a source to be manipulated after it is initialised (e.g. when consecutively making multiple mocks of one source from different viewing angles) could make some workflows more efficient. This feature is also important for simulation-specific source modules where the user probably wants to inspect the obtained particles and orient them before making a mock, without having to separately extract the corresponding particles from the simulation and derive a rotation. The (RA, Dec, D) rotations and translations could be applied at the last minute by the Martini module when it obtains the particles from the source.
Add log messages with input HI mass and some diagnostics along the way: HI mass that survives pruning, that is inserted in cube, that is in final cube, flux of final cube.
Narrative docs pages still missing for beam, datacube, noise.
Describe the bug
h5py
.To Reproduce
$ conda create -n martini python=3.7
$ conda activate martini
$ pip install pytest
$ pip install -e .
$ pytest
...
============================================== short test summary info ==============================================
ERROR tests/test_martini.py
ERROR tests/test_write.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================= 2 errors in 0.11s =================================================
$ pip install h5py
$ pytest
...
============================================== short test summary info ==============================================
FAILED tests/test_martini.py::TestMartini::test_preview - ModuleNotFoundError: No module named 'matplotlib'
FAILED tests/test_martini.py::TestParallel::test_parallel_consistent_with_serial[True] - ModuleNotFoundError: No module named 'multiprocess'
FAILED tests/test_martini.py::TestParallel::test_parallel_consistent_with_serial[False] - ModuleNotFoundError: No module named 'multiprocess'
FAILED tests/test_martini.py::TestGlobalProfile::test_preview - ModuleNotFoundError: No module named 'matplotlib'
FAILED tests/test_martini.py::TestGlobalProfile::test_view_spectrum - ModuleNotFoundError: No module named 'matplotlib'
FAILED tests/test_sources.py::TestSPHSource::test_preview - ModuleNotFoundError: No module named 'matplotlib'
FAILED tests/test_sources.py::TestSPHSource::test_preview_save[pdf] - ModuleNotFoundError: No module named 'matplotlib'
FAILED tests/test_sources.py::TestSPHSource::test_preview_save[png] - ModuleNotFoundError: No module named 'matplotlib'
FAILED tests/test_spectral_models.py::TestParallelSpectra::test_parallel_spectra[GaussianSpectrum] - ModuleNotFoundError: No module named 'multiprocess'
FAILED tests/test_spectral_models.py::TestParallelSpectra::test_parallel_spectra[DiracDeltaSpectrum] - ModuleNotFoundError: No module named 'multiprocess'
==================================== 10 failed, 887 passed, 7 xfailed in 57.08s =====================================
Expected behaviour
Tests should pass or be skipped if optional dependencies are not installed.
Additional context
Review for pyOpenSci/software-submission#164.
The README file provides a link https://github.com/kyleaoman/martini/tree/2.0.X which does not exist.
Additional context
pyOpenSci/software-submission#164
It would be good to extend the preview
functionality to Martini, too. This can simply use the existing preview function from SPHSource
but draw a box or truncate the axes according to the datacube aperture and bandwidth. This could be done in the "observed" frame (RA, Dec, ...) but then a separate plotter is needed (or some extensive modification of the existing one). A simpler option is to let Martini
call SPHSource.preview
, grab the figure instance (maybe want to return the axes, too), query the datacube and source to work out the conversions (arcsec->kpc and so on), then draw the approximate cube target region in "physical" coordinates.
Describe the bug
The link to get an eagle account http://icc.dur.ac.uk/Eagle/database.php link (in examples/martini_eagle.ipynb
) fails to load and just hangs.
Expected behaviour
Getting an account should be possible via the link.
Additional context
pyOpenSci/software-submission#164
Last thing to do before v2.0 release!
Type hints for everything!
It's probably reasonable to expect a user to be able to hold (1) the particle data of the source and (2) the datacube in memory simultaneously as a minimum memory requirement. As currently implemented, however, martini is more memory-hungry than this because it pre-computes and stores the spectra of all particles simultaneously, requiring nchannels * nparticles floating point values, which for some setups could be a very large number. Pre-computing the spectra is done for speed, but it would be possible to cut down on memory by segmenting the source into a series of chunks of ~equal numbers of particles. The simplest implementation would be to run the source insertion loop once for each such chunk, which is a significant hit in terms of cpu time cost. Giving some flexibility to keep memory requirements under control would be good though. Could explore other options like optionally using a memmap to store the spectra on disk rather than in memory. Costs a bit of access speed but maybe better than running the main loop more than once.
Astropy provides a beam_angular_area equivalency. Use this instead of our own implementation (currently in martini.beams.GaussianBeam.arcsec_to_beam
). Instead each beam class can define its angular area in steradians for use with the astropy helper.
Some example code showing how to inspect and manipulate source particles before making the data cube would be a good addition to the docs. This could be a good starting point (the plot_yz
function could be a good addition to the source module code as a "preview" function):
import numpy as np
import unyt as u
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from astropy import units as U
from swiftgalaxy import SWIFTGalaxy, Velociraptor
from velociraptor import load as load_catalogue
from os import path
from martini.sources import SWIFTGalaxySource
from martini.sources._L_align import L_align
np.seterr(divide="ignore", invalid="ignore")
snapnum = 23
base_dir = "106e3_104b2_norm_0p3_new_cooling_L006N188"
velociraptor_filebase = path.join(base_dir, f"halo_{snapnum:04d}")
snapshot_filename = path.join(base_dir, f"colibre_{snapnum:04d}.hdf5")
catalogue = load_catalogue(f"{velociraptor_filebase}.properties")
# select centrals with vmax > 80km/s
target_mask = np.logical_and.reduce(
(
catalogue.ids.hosthaloid == -1,
catalogue.velocities.vmax > 80 * u.km / u.s,
)
)
target_halo_indices = np.argwhere(target_mask).flatten()
# pick the first one from the list, why not...
target_halo_index = target_halo_indices[0]
sg = SWIFTGalaxy(
snapshot_filename=snapshot_filename,
halo_finder=Velociraptor(
velociraptor_filebase=velociraptor_filebase,
halo_index=target_halo_index,
),
)
source = SWIFTGalaxySource(
sg,
distance=3 * U.Mpc,
vpeculiar=0 * U.km / U.s,
rotation=dict(rotmat=np.eye(3)), # identity matrix, we'll rotate "by hand"
ra=0 * U.deg,
dec=0 * U.deg,
)
# source.coordinates_g is an astropy CartesianRepresentation object.
# It has an attached CartesianDifferential object for the velocities.
# See https://docs.astropy.org/en/stable/coordinates/representations.html
def plot_yz(source, pdffile, prune=np.s_[::25], title=""):
# prune: by default plot every 25th particle.
fig = plt.figure(1)
plt.clf()
sp = fig.add_subplot(1, 1, 1, aspect="equal")
scatter = sp.scatter(
source.coordinates_g.y[prune].to_value(U.kpc),
source.coordinates_g.z[prune].to_value(U.kpc),
c=source.coordinates_g.differentials["s"].d_x[prune].to_value(U.km / U.s),
marker="o",
s=1,
vmin=-150,
vmax=150,
cmap="RdBu",
)
sp.set_xlabel(r"$y\,[\mathrm{kpc}]$")
sp.set_ylabel(r"$z\,[\mathrm{kpc}]$")
sp.set_xlim((-30, 30))
sp.set_ylim((-30, 30))
sp.set_title(title)
cb = fig.colorbar(mappable=scatter)
cb.set_label(r"$v_x\,[\mathrm{km}\,\mathrm{s}^{-1}]$")
plt.savefig(pdffile, format="pdf")
with PdfPages("forKatharina.pdf") as pdffile:
plot_yz(source, pdffile, title="unrotated")
# This is the same as you would get using
# source.rotate(L_align=(0 * U.deg, 0 * U.deg, 270 * U.deg))
# but you can adapt as needed:
rotmat = L_align(
source.coordinates_g.xyz,
source.coordinates_g.differentials["s"].d_xyz,
source.mHI_g,
frac=0.3,
Laxis="x",
)
source.rotate(rotmat=rotmat)
plot_yz(source, pdffile, title="face-on")
# rotation about the galactic pole:
source.rotate(axis_angle=("x", 45 * U.deg))
plot_yz(source, pdffile, title="face-on, polar rotation 45deg")
# inclination:
source.rotate(axis_angle=("y", 60 * U.deg))
plot_yz(source, pdffile, title="inclined 60deg")
# SWIFTGalaxy has already re-centred (x,y,z)=(0,0,0) on the velociraptor
# centre of potential, and the (vx,vy,vz)=(0,0,0) on the velociraptor
# subhalo bulk velocity. The centre of potential is already what we want,
# but the bulk velocity probably does not result in the central gas
# particles having 0 net velocity, so we can adjust like this:
r = np.sqrt(np.sum(np.power(source.coordinates_g.xyz.to_value(U.kpc), 2), axis=0))
rsort = np.argsort(r)
# we'll reject particles with small HI fractions with this:
HI_mask = (source.mHI_g / source.mHI_g.max()).to_value(U.one) > 0.3
# LoS velocities:
vxyz = source.coordinates_g.differentials["s"].d_xyz
# mean velocity of 100 innermost HI-rich gas particles:
vxyz0 = np.mean(vxyz[:, rsort][:, HI_mask[rsort]][:, :100], axis=1)
# In principle we only need to adjust the LoS velocity, but if we adjust
# the 3D velocity the correction will be ok for any orientation.
source.boost(-vxyz0)
plot_yz(source, pdffile, title="inclined 60deg, with velocity recentred")
Astropy has a logging framework. Instead of ad-hoc print statement, should adopt already available tools.
The recommended best practice is to use something like AdaptiveKernel((CubicSplineKernel, DiracDeltaKernel, GaussianKernel))
, so why not just implement directly? Conceptually: CubicSplineKernel = _AdaptiveKernel((_CubicSplineKernel, _DiracDeltaKernel, _GaussianKernel))
. This would remove some technical burden from users.
Sometimes you just want a spectrum. Simplified output formats for this case, and simplified/optimised source insertion. Need just a single "pixel" and to calculate the convolution of the beam and sph kernel for each particle. This could be via a new "dataspectrum" mirroring datacube module and new "insert_source_in_spectrum" function mirroring insert_source_in_cube.
Inserting more than one source into a DataCube seems like a reasonable workflow to support. Need to consider best implementation. Some options:
Might be good to support inserting a source into a cube that is read in from file. This implies initialising a datacube from file, which is easy enough. However if the cube is already beam-convolved (which seems the most likely case) then we need a separate cube internally until convolution is done.
Given the above maybe it makes sense to not load the file into a DataCube but use it to guide its creation (match channel and pixel size, could take a subregion as an argument to specify a target insertion location?). Some functions to mostly automate the process plus some cookbook examples should about do it.
Current the RMS of noise to be generated is the raw RMS. Once this is smoothed by the beam, the RMS is lower. However, usually the user knows the RMS desired in the final mock observation (i.e. after convolution with the beam). It's straightforward to calculate the raw RMS required to produce a desired RMS after convolution with a Gaussian smoothing kernel, should therefore accept the desired RMS as the parameter instead of the raw RMS.
At ADASS recently the CARTA visualization tool is using a new IDIA standard (?) scheme to efficiently store large cubes, but serves just as well for normal cubes. It looks of course much like your HDF5 files, but if we're going to write cubes in hdf5 instead of fits, might as well use some kind of standard in astronomy. I just don't know if there is one.
See https://github.com/idia-astro/cpp_hdf5converter for the code they use to create these HDF5 files.
Nadine reports that on Ilifu the TypeDict class (in martini demo) is not found. Probably need to specify a more strict version dependency for numpy (?).
I could not reproduce the martini_eagle notebook, but then continued with the python converted one. The first error is:
TypeError: init() got an unexpected keyword argument 'db_user'
so I comment this out, and then get another error from which it appears as if there is a 2nd EAGLESource class, now appearing in the installed sources.py which seems to use local files.The parent sources.py in the github repo is very different. I guess this is some python trick that newbies like me are not used to, but this is then very confusing. Is that related to the optional pip installs mentioned inthe installation notes?
I am trying to run the Illustris TNG example in the Illustris server, but I found the following error:
from martini import tngdemo; tngdemo()
AttributeError Traceback (most recent call last)
in
----> 1 from martini import tngdemo; tngdemo()
/opt/conda/lib/python3.6/site-packages/martini/_tngdemo.py in tngdemo(cubefile, beamfile, hdf5file)
123 # pixel scale) and consulting the documentation of the sph_kernels sub-
124 # module to select an appropriate kernel.
--> 125 sph_kernel = GaussianKernel.mimic(
126 CubicSplineKernel(rescale_sph_h=.5),
127 truncate=3
AttributeError: type object 'GaussianKernel' has no attribute 'mimic'****
Thanks for your help!
Add a keyword argument in DataCube initialisation with stokes=True/False
, Stokes axis can often be omitted for convenience.
Describe the bug
The installation instructions include directives like python3 -m pip install astromartini[hdf5_output]
. On zsh
this fails:
zsh: no matches found: astromartini[hdf5_output]
Expected behaviour
Installation instructions should work with copy/paste for common shells.
Additional context
pyOpenSci/software-submission#164
I think that as currently implemented, a datacube could be loaded with DataCube.load_state
with a padding region that is too small for the beam. Roughly:
m = Martini(beam=small_beam, ...)
m.insert_source_in_cube()
m.datacube.save_state(cachefile)
del m
m = Martini(beam=big_beam, ...)
m.load_state(cachefile) # I'll just save some compute time to make a lower angular resolution mock...
m.convolve_beam() # pad is too small!
Filter out emission on a range of scales according to a set of interferometer baselines.
It would be nice to have an idea of the processing history (e.g. original filenames) etc. such that in the HISTORY or COMMENT section of the fits file, one can read how the cube was made.
I should set up the github-Zenodo integration hook as suggested by hamogu.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.