
lsd's Introduction

lsd -- Large Survey Database

Building

  • clone 'master' to a directory (I usually have it in ~/project)
  • run:
python ./setup.py build_ext --inplace

to build the required modules.

  • run:
export PYTHONPATH="$PYTHONPATH:$PWD/src"

to set up the environment.

  • after that, you should be able to run all lsd-* stuff directly from src/
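  • to verify the setup, run:
python -c "import lsd; print(lsd.__file__)"

to check that the package imports and that the printed path points into src/ (this assumes the package is importable as lsd once src/ is on PYTHONPATH).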

lsd's People

Contributors

mjuric


lsd's Issues

Switch to file IO in pool2 instead of mmap, to support OSX

OS X's HFS+ filesystem does not support sparse files, so backing to disk is disabled on OS X in the pool2 mapreduce engine. Since there appears to be no need to use mmap instead of normal file I/O, we should switch to the latter in pool2 and enable backing to disk on OS X.
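For illustration only, a minimal sketch of what plain file I/O backing could look like (hypothetical helper names; the actual pool2 buffer management is organized differently):

import pickle
import tempfile

def spill_to_disk(results):
    # Write intermediate results to an ordinary temp file; unlike an mmap-backed
    # sparse file, this behaves the same on HFS+ as on other filesystems.
    f = tempfile.TemporaryFile()
    for r in results:
        pickle.dump(r, f)
    f.seek(0)
    return f

def read_back(f):
    # Stream the spilled results back in order.
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return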

python 3 incompatible

I cannot run the setup using Python 3. Even after cleaning up the tabs mixed with spaces in setup.py, some libraries seem to be different.

NWORKERS keeps CPU down but not memory

Hi Mario,

I've found another case in which the NWORKERS environmental variable doesn't seem to work as I would expect. I'm importing data from 2MASS into an existing database - I previously imported half the data, and now I'm importing the second half.

I've set NWORKERS to 8 by invoking export NWORKERS=8. Then, after entering lsd-import text tmass 2MASS_data/psc_b*, it looks in htop like LSD has spawned ~125 processes.

I tried using NWORKERS=24 before, and a much larger number of processes were spawned, which filled the RAM and swap, and almost brought pan down before I managed to terminate the processes. It looks like NWORKERS has some effect on the number of spawned processes, but it doesn't set the number exactly.

The CPU usage is, however, constrained by NWORKERS. Although a very large number of processes are spawned, the number actually doing work at any given time seems to be NWORKERS. The problem is that all of the inactive processes take up memory, which can easily crash the machine.

Thanks,
-Greg
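For reference, the behavior one would expect from NWORKERS is a hard cap on the number of spawned processes, along the lines of this sketch (hypothetical code, not LSD's actual pool2 engine):

import os
import multiprocessing as mp

def work(item):
    return item * item

if __name__ == "__main__":
    # Cap the pool at NWORKERS processes; the pool never spawns more than this.
    nworkers = int(os.environ.get("NWORKERS", mp.cpu_count()))
    pool = mp.Pool(processes=nworkers)
    try:
        print(pool.map(work, range(16)))
    finally:
        pool.close()
        pool.join()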

lsd-admin: error message if .__dblock.lock already present in $LSD_DB

Would it be possible for lsd-admin to warn the user and exit gracefully if the .__dblock.lock file is present in $LSD_DB? Currently the software just seems to hang on encountering this file.

lsd-admin already gracefully exits if the .__transaction file is present, alerting the user; the same behavior would be useful on encountering the .__dblock.lock file.
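A minimal sketch of the requested check (hypothetical; lsd-admin's actual startup code is structured differently):

import os
import sys

def check_no_dblock(db_dir):
    # Warn and exit instead of hanging if a database lock file is present.
    lock = os.path.join(db_dir, '.__dblock.lock')
    if os.path.exists(lock):
        sys.exit("Error: %s exists; another process may hold the database lock, "
                 "or a previous run crashed. Remove the file if you are sure it is stale." % lock)

check_no_dblock(os.environ.get('LSD_DB', '.'))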

Check for reserved words in column names

Reported by Bertrand Goldman:

I tried to import CMC14 into LSD, using keys including "nt", "na" and "np" (see attachment). That failed with the error below. Using ntot, nastr and nphot instead worked fine. Are those reserved keywords?
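One way such a check could look; the blacklist below is purely hypothetical and would need to be filled in from the names LSD actually injects into its query/import namespace:

import keyword

# Hypothetical set of names assumed to collide with LSD's internal namespace.
RESERVED = {'np', 'na', 'nt'}

def validate_column_names(names):
    bad = [n for n in names if keyword.iskeyword(n) or n in RESERVED]
    if bad:
        raise ValueError("column name(s) collide with reserved names: %s" % ", ".join(bad))

validate_column_names(['ntot', 'nastr', 'nphot'])   # passes
# validate_column_names(['nt', 'na', 'np'])         # would raise ValueError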

Float-Array mismatch error on lsd-import fits?

Summary:
lsd-import of a fits table produces float/array mismatch error

lsd-import fits sdss_ps1_match $INFILE
Importing from 1 pieces:
------ rolling back 20110821140158.500967 ---------

Traceback (most recent call last):
File "/a41217d5/LSD/stable/bin/lsd-import", line 88, in
args.func(args)
File "/a41217d5/LSD/stable/bin/lsd-import", line 53, in chunk_importer
import_from_chunks(db, importer, chunks)
File "/a41217d5/LSD/stable/bin/lsd-import", line 30, in import_from_chunks
for (chunk, nloaded, nin) in pool.imap_unordered(chunks, import_from_chunks_aux, (db, importer,), progress_callback=pool2.progress_pass):
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/pool2.py", line 459, in imap_unordered
for result in mapper(item, *mapper_args):
File "/a41217d5/LSD/stable/bin/lsd-import", line 20, in import_from_chunks_aux
ret = importer(db, chunk, *importer_args)
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/importers/fits.py", line 43, in call
ids = db.table(self.tabname).append(rows, _update=self.import_primary_key)
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/table.py", line 1167, in append
cols[key][need_key] = self.pix.obj_id_from_pos(lon, lat, t)
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/pixelization.py", line 222, in obj_id_from_pos
(x, y) = bhpix.proj_bhealpix(ra, dec)
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/bhpix.py", line 53, in proj_bhealpix
(hxx, hyy) = proj_healpix(l, b)
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/bhpix.py", line 23, in proj_healpix
z = cos(radians(90.-b))
TypeError: unsupported operand type(s) for -: 'float' and 'numpy.ndarray'

Background:
$INFILE is a fits file output produced by LSD. It initially had two columns called "sdss.ra" and "sdss.dec" which were not the spatial columns. I thought the '.' might cause problems and changed them to sdss_ra and sdss_dec. This did not affect anything. Looking at the ra and dec used for spatial sorting:

blah[1].data['ra']
array([ 350.00996409, 350.04883905, 350.03341515, ..., 355.77349801,
355.71030779, 355.77579255])
blah[1].data['dec']
array([ 70.01368437, 70.02155921, 70.01437617, ..., 71.53092708,
71.53523562, 71.53410173])

They are not obviously invalid.
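Since subtracting a plain numeric ndarray from a float normally works, one thing worth checking (just a guess) is whether the spatial columns come out of the FITS table as ordinary numeric arrays rather than, say, object or string columns:

import pyfits   # or: import astropy.io.fits as pyfits
import numpy as np

# Diagnostic sketch: inspect the dtype of the spatial columns in $INFILE
# (the filename here is a placeholder).
hdus = pyfits.open('infile.fits')
ra = hdus[1].data['ra']
dec = hdus[1].data['dec']
print(type(dec), dec.dtype)
print(90. - np.asarray(dec, dtype='f8')[:3])   # works on a clean float column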

LSD @ Harvard directory not found

I am trying to install LSD on Odyssey, but the directory to be sourced is not available. That is:
source /n/panlfs/mjuric/lsd/lsd-setup-odyssey.sh

Web Installer Broken

I get a 404 error when trying to access the go.sh script. The link given in the documentation appears to be broken.

"Exception: Another transaction is already ongoing" (even though there isn't)

I am getting an error any time I try to create a table or run any query that ends in "into database" in our default LSD directory. The error is

"Exception: Another transaction is already ongoing" (full text below).

But nothing else is going on in LSD world. There are no queries running, although we have had some jobs crash (particularly a query that had 'into database'). I cannot make a new database in that directory. I can, however, do anything I want if I use --db=BLAH to set a new directory. We have restarted the computer since this problem started, so I think there must be some half-finished file somewhere. But I have cleared my temp and cache directories.

lsd-admin create table --comp blosc --comp-level 5 --primary-key obj_id --spatial-keys ra,dec zdrops_small obj_id:u8 ra:f8 dec:f8 y:f4 erry:f4 stdevy:f4 nok:i2 mjdy:5f4
Traceback (most recent call last):
File "/a41217d5/LSD/stable/bin/lsd-admin", line 580, in
args.func(args)
File "/a41217d5/LSD/stable/bin/lsd-admin", line 121, in do_create_table
with db.transaction():
File "/usr/local/misc/python/Python-2.7.2/lib/python2.7/contextlib.py", line 17, in enter
return self.gen.next()
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/join_ops.py", line 1834, in transaction
self.begin_transaction()
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/join_ops.py", line 1760, in begin_transaction
raise Exception('Another transaction is already ongoing')
Exception: Another transaction is already ongoing
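If the cause is a stale transaction marker left behind by the crashed jobs, a quick way to look for it (file names taken from the other issues in this list; confirm against the LSD source before deleting anything):

import os

db_dir = os.environ.get('LSD_DB', '.')
for name in ('.__transaction', '.__dblock.lock'):
    path = os.path.join(db_dir, name)
    if os.path.exists(path):
        # A leftover file here would explain "Another transaction is already ongoing".
        print("found possible stale state file:", path)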

Replace pyfits dependency with astropy.io.fits

The module pyfits has been merged into astropy, and is no longer under active development outside of astropy. For all intents and purposes, pyfits now lives at astropy.io.fits, and many users will no longer have pyfits independently installed. One solution to this problem, which I use in my own scripts, is to replace all import pyfits statements with the following:

try:
    import astropy.io.fits as pyfits
except ImportError:
    import pyfits

Could you make this change?

bounds fails at specific galactic location

I am finding that the bounds box rectangle(302.4,-34,304.8,-32,coordsys=galaxy) fails to bound queries. The relevant code:

self.bounds=[l1,b1,l2,b2]
bounds_xy = bounds.rectangle(*(self.bounds),coordsys='gal')
bounds_t = bounds.intervalset((-np.inf, np.inf))
rows=db.query(fields+' FROM '+dbname+' WHERE '+where).fetch(bounds=[(bounds_xy, bounds_t)])

By "fail" I mean thequery is over 2392 elements instead of 2-4 that I would expect. It takes forever and makes a huge fits file. So I think it really is querying the whole sky.

The funny thing is, this code is iterating chunk by chunk. It works perfectly in the b > 10 region and the -90 < b < -34 region (and probably everywhere else). So I think this must be a problem in the way LSD converts galactic coordinates at this one location. When I print out the xy bounds, I get:

Polygon:
<0:Hole : [0:0.99, 0.96] [1:0.99, 0.96] [2:0.99, 0.96] [3:0.99, 0.96] [4:0.99, 0.95] [5:0.99, 0.95] [6:0.99, 0.95] [7:0.99, 0.95] [8:0.99, 0.95] [9:0.99, 0.95] [10:0.99, 0.95] [11:0.99, 0.95] [12:0.99, 0.95] [13:0.99, 0.95] [14:0.99, 0.95] [15:0.99, 0.94] [16:0.99, 0.94] [17:0.99, 0.94] [18:0.99, 0.94] [19:0.99, 0.94] [20:0.99, 0.94] [21:0.99, 0.94] [22:0.99, 0.94] [23:0.99, 0.94] [24:0.99, 0.94] [25:0.99, 0.94] [26:0.99, 0.94] [27:0.99, 0.94] [28:0.99, 0.94] [29:0.99, 0.93] [30:0.99, 0.93] [31:0.99, 0.93] [32:0.99, 0.93] [33:0.99, 0.93] [34:0.99, 0.93] [35:0.99, 0.93] [36:1.0, 0.93] [37:1.0, 0.93] [38:1.0, 0.93] [39:1.0, 0.93] [40:1.0, 0.93] [41:1.0, 0.93] [42:1.0, 0.93] [43:1.0, 0.93] [44:1.0, 0.92] [45:1.0, 0.95] [46:1.0, 0.95] [47:1.0, 0.95] [48:1.0, 0.95] [49:1.0, 0.95] [50:1.0, 0.95] [51:1.0, 0.95] [52:1.0, 0.95] [53:1.0, 0.95] [54:0.99, 0.95] [55:0.99, 0.95] [56:0.99, 0.95] [57:0.99, 0.95] [58:0.99, 0.96] [59:0.99, 0.96] [60:0.99, 0.96] [61:0.99, 0.96] [62:0.99, 0.96]>
<1:Hole : [0:1.0, 0.92] [1:1.0, -0.93] [2:1.0, -0.93] [3:1.0, -0.93] [4:1.0, 0.92]>
<2:Hole : [0:1.0, -0.93] [1:1.0, -0.93] [2:1.0, -0.93] [3:1.0, -0.93] [4:1.0, -0.93] [5:1.0, -0.93] [6:1.0, -0.93] [7:1.0, -0.94] [8:1.0, -0.94] [9:1.0, -0.94] [10:1.0, -0.94] [11:1.0, -0.94] [12:1.0, -0.94] [13:1.0, -0.94] [14:1.0, -0.94] [15:1.0, -0.94] [16:1.0, -0.95] [17:1.0, -0.95] [18:1.0, -0.95] [19:1.0, -0.95] [20:1.0, -0.95] [21:1.0, -0.95] [22:1.0, -0.95] [23:1.0, -0.95]>
<3:Contour: [0:1.0, -1.0] [1:-1.0, -1.0] [2:-1.0, 1.0] [3:1.0, 1.0]>

But I don't know how to interpret that.

Impossible to match a single detection to an object

I was trying to match the nearest detection to an object table as a hacky way to determine whether that object is a star. This seems to uncover a bug (or maybe I just don't understand matchedto).

hw_objects is a subset of averages. When I query a small area:

lsd-query --format=text --output=hw_stars_2.text --bounds='rectangle(0, 0, 1, 1, coordsys="gal")' 'select hw_objects.ra as ra, hw_objects.dec as dec, g_ps1, g_ps1_err, g_nmag_ok, r_ps1, r_ps1_err, r_nmag_ok, i_ps1, i_ps1_err, i_nmag_ok FROM hw_objects' [2 el.]::::::::::::::::::::> 0.88 sec
8571 rows selected.

I get 8571 objects

When I try to match it to ps1_det, setting nmax=1:
lsd-query --format=text --output=hw_stars_1.text --bounds='rectangle(0, 0, 1, 1, coordsys="gal")' 'select hw_objects.ra as ra, hw_objects.dec as dec, ps1_det.l as l, ps1_det.b as b, g_ps1, g_ps1_err, g_nmag_ok, r_ps1, r_ps1_err, r_nmag_ok, i_ps1, i_ps1_err, i_nmag_ok, ps1_det.flags as flags, ps1_det.flags2 as flags2 FROM hw_objects, ps1_det(matchedto=hw_objects, nmax=1, dmax=.3) WHERE (flags & 1)'
[76 el.]::::::::::::::::::::> 70.15 sec
72639 rows selected.

I get 9x as many objects.

Thinking that maybe matchedto works the opposite of how I thought it did, I take ps1_det as the first catalog and match hw_objects to it.

Again setting nmax=1, but this time matching hw_objects to ps1_det:
lsd-query --format=text --output=hw_stars_0.text --bounds='rectangle(0, 0, 1, 1, coordsys="gal")' 'select hw_objects.ra as ra, hw_objects.dec as dec, ps1_det.l as l, ps1_det.b as b, g_ps1, g_ps1_err, g_nmag_ok, r_ps1, r_ps1_err, r_nmag_ok, i_ps1, i_ps1_err, i_nmag_ok, ps1_det.flags as flags, ps1_det.flags2 as flags2 FROM ps1_det, hw_objects(matchedto=ps1_det, nmax=1, dmax=.3) WHERE (flags & 1)'
[37 el.]::::::::::::::::::::> 72.50 sec
154311 rows selected.
I get 20x as many objects.

Shouldn't one of the queries give me one ps1_det row per hw_object?

Suppress stdout and stderr from workers

How would one modify LSD to make it possible to suppress stdout and stderr from workers? Tracking query progress is difficult when there are a thousand messages like the following:

udfs:1: RuntimeWarning: invalid value encountered in greater
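One possible stopgap on the UDF side (not a change to LSD itself) is to silence the numpy warnings that generate most of the noise:

import numpy as np
import warnings

# Silence numpy's invalid-value RuntimeWarnings inside the worker / UDF.
# This only hides the symptom; a real fix would route worker output through LSD's logging.
np.seterr(invalid='ignore')
warnings.filterwarnings('ignore', message='invalid value encountered.*')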

"AttributeError: Table instance has no attribute 'remote'"

So the end result is that lsd is looking for the attribute "remote", finding it undefined, and crashing. I am wondering if the databases I download from the internet need to be identified as tables rather than random piles of files.

Pertinent information:

  • The line of code is
    for all_rows in colgroup.partitioned_fromiter(qresult, "_ID", 5*1000*1000, blocks=True):
  • This is in a modified version of PS1 make_average_magnitudes.py
  • We downloaded ps1_obj, ps1_exp and ps1_det from CFA not using the inter LSD program
  • This happens for some areas and not others (I have not determined if this is because of different areas of the sky or different job sizes)

My best guess is that since I downloaded these tables, they do not have proper headers or something.

It just occurred to me that I should really recheck my ps1_exp, ps1_det and ps1_obj databases to make sure they are complete. I did this once, but it is good to confirm.

Error below:

[944 el.]:: Remote Traceback (most recent call last):
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/pool2.py", line 81, in _worker
for result in mapper(item, *mapper_args):
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/join_ops.py", line 2348, in mapper
for result in mapper(qresult, *mapper_args):
File "make_average_magnitudes.py", line 373, in calc_objmag
for all_rows in colgroup.partitioned_fromiter(qresult, "_ID", 5*1000*1000, blocks=True):
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/colgroup.py", line 651, in partitioned_fromiter
for rows in it:
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/join_ops.py", line 1289, in __iter__
for rows in QueryInstance(self, cell_id, bounds, include_cached):
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/join_ops.py", line 711, in __iter__
rows = self.eval_select(globals_)
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/join_ops.py", line 763, in eval_select
cols = eval(name, globals_, self)
File "<string>", line 1, in <module>
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/join_ops.py", line 944, in __getitem__
self[name] = self.load_column(colname, tabname)
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/join_ops.py", line 853, in load_column
col = self.tcache.load_column(self.cell_id, name, table)
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/join_ops.py", line 86, in load_column
rows = table.fetch_tablet(cell_id, cgroup, include_cached=include_cached)
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/table.py", line 1597, in fetch_tablet
if self.tablet_exists(cell_id, cgroup): # Note: this will download the tablet from remote, if needed
File "/a41217d5/LSD/stable/lib/python2.7/site-packages/lsd/table.py", line 430, in tablet_exists
if self.remote is None:
AttributeError: Table instance has no attribute 'remote'
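A possible workaround, under the assumption that the downloaded tables were written by an older LSD and simply lack the attribute (verify against lsd/table.py before relying on this):

import os
import lsd

# Hypothetical sketch: backfill a missing 'remote' attribute on the downloaded tables.
db = lsd.DB(os.environ.get('LSD_DB', '.'))      # assumes the usual way of opening a database
for tabname in ('ps1_obj', 'ps1_det', 'ps1_exp'):
    tab = db.table(tabname)
    if not hasattr(tab, 'remote'):
        tab.remote = None    # mimic what a freshly created Table presumably carries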


Joins on PS1 MDF databases exceedingly slow

This profiles and reproduces the problem:

NWORKERS=1 /n/sw/python-2.7/lib/python2.7/cProfile.py -s time lsd-query --format=fits --bounds='beam(333.3978, 0.4723, 0.1)' 'SELECT obj_id, cal_psf_mag, cal_psf_mag_sig FROM md_obj, md_det'

The runtime on a random Odyssey node is ~1100 sec, while the runtime for a query with no join is ~15 sec.

(originally reported by Dae-Won Kim)
