
Comments (6)

bsipocz commented on July 23, 2024

I would suggest splitting this into two separate issues, one for simbad and one for irsa. If possible, please include code examples too: they would help with debugging and benchmarking, and they let us spot whether something is being used in an unintended way (so we can improve the docs to point out what not to do).

I can say for irsa that we completely switched out the backend. Not much has changed in the method's code, but a lot could have happened on the server side in the past 3 years. Example code would also help us narrow the problem down to a useful suggestion (e.g. new methods have been added since then).

from astroquery.

ManonMarchand commented on July 23, 2024

On the SIMBAD part

If I assume that you want the list of identifiers, the main identifier, and the positions for your 2MASS objects, then the proper way to do your query for now is with a TAP query (in the next astroquery version, this will be used behind the scenes by query_objects).

Let's first generate a sample of 10k 2MASS identifiers:

# let's get 10000 random 2MASS objects
from astroquery.simbad import Simbad
query = """SELECT TOP 10000 id from ident
WHERE id like '2MASS%'
"""
random_2MASS = Simbad.query_tap(query)
print(random_2MASS)
           id          
-----------------------
2MASS J00000002+7417074
2MASS J00000007-0529397
2MASS J00000007-3044366
2MASS J00000009-5455467
2MASS J00000011+0522500
2MASS J00000014+6055141
2MASS J00000015-2913020
2MASS J00000016+3208474
2MASS J00000019-1924498
2MASS J00000021+0105203
2MASS J00000022-3008557
2MASS J00000023-5709445
2MASS J00000024-5742487
2MASS J00000025+5210402
2MASS J00000025-7541166
2MASS J00000026-3441523
.
.
.

You can skip this part, as you already have your own list. But you should have an astropy table with a single column containing your own sample (extra columns only add upload time when the table is sent to SIMBAD).

We will now write the TAP query:

query = """SELECT main_id, ra, dec, ids 
FROM random_2MASS 
JOIN ident ON ident.id = random_2MASS.id
JOIN basic ON basic.oid = ident.oidref 
JOIN ids ON basic.oid = ids.oidref 
"""

result = Simbad.query_tap(query, random_2MASS=random_2MASS)
<Table length=10000>
        main_id          ...
                         ...
         object          ...
------------------------ ...
        UCAC4 822-000001 ...
               HD 224701 ...
              CTLGD 2509 ...
   GES J00000009-5455467 ...
        UCAC4 477-000001 ...
        UCAC4 755-000001 ...
              CTLGD 9869 ...
   ATO J000.0007+32.1464 ...
        UCAC4 353-000001 ...
               HD 224700 ...
              CTLGD 5514 ...
        UCAC4 165-000001 ...
        UCAC4 162-000001 ...
         TYC 3258-1994-1 ...
        UCAC4 072-000001 ...
          TYC 6992-893-1 ...
   ATO J000.0011+31.2017 ...
.
.
.

It took 5.2 seconds on my machine.

Query explanation

We select

  • main_id : the one that appears at the top of SIMBAD's pages
  • ra, dec : the position in ICRS
  • ids : a string with all the identifiers known to SIMBAD for this object

You could choose more columns from Simbad.list_columns().

random_2MASS is our astropy table that we sent to SIMBAD's servers. It has to be joined to the tables containing the columns we want:

  • ident contains all the ids, so it's the only one that can be joined to our random_2MASS
  • basic has main_id, ra, and dec
  • ids has the string with all the identifiers

See this help page for more explanation.
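The join pattern explained above can be wrapped in a small helper so that it works for any uploaded table. This is only a minimal sketch: build_upload_join_query and its parameters are hypothetical names (not part of astroquery), and it assumes the uploaded table keeps its identifiers in a column called id, as in the example.

```python
def build_upload_join_query(upload_name, columns=("main_id", "ra", "dec", "ids")):
    """Build the ADQL query joining an uploaded table of identifiers to SIMBAD.

    upload_name is the name under which the table is uploaded; the table is
    assumed to hold the identifiers in a column called "id".
    """
    selected = ", ".join(columns)
    return (
        f"SELECT {selected}\n"
        f"FROM {upload_name}\n"
        # ident is the only table that can be matched on the identifier string
        f"JOIN ident ON ident.id = {upload_name}.id\n"
        # basic holds main_id, ra and dec
        "JOIN basic ON basic.oid = ident.oidref\n"
        # ids holds the string with all known identifiers
        "JOIN ids ON basic.oid = ids.oidref"
    )


query = build_upload_join_query("my_sample")
print(query)
```

The resulting string would then be sent together with the table, in the same way as above: Simbad.query_tap(query, my_sample=my_sample).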

Another possible speed-up is to make sure that you use the SIMBAD mirror closest to you (there is one in Europe and one in the USA).

On Xmatch

@fxpineau : you have a happy user 🙂


ericasaw commented on July 23, 2024

@ManonMarchand Thank you for the SIMBAD example! I've never used the TAP search function before, since query_objects has always worked for me up until now, so this is super helpful :-)

@bsipocz Here is an example of the IRSA behavior I'm noticing (particularly for name matching with Irsa.query_region, which still seems to use a coordinate match rather than searching by the 2MASS identifier).

These are a few example 2MASS identifiers I have noticed the behavior for: 2MASS J21065473+3844265, 2MASS J21065341+3844529, 2MASS J11052903+4331357, 2MASS J05420897+1229252, 2MASS J23055131-3551130

If you run the following code:

from astroquery.ipac.irsa import Irsa
import astropy.units as u

#this is just one of the example names
result = Irsa.query_region('2MASS J21065473+3844265', catalog="fp_psc", radius=5 * u.arcsec)

result turns up as an astropy table with no entries.

If instead you expand the radius to 10 arcseconds using the same code above, the appropriate object is found. Perhaps I am making the same mistake here as I was with SIMBAD as @ManonMarchand pointed out and instead I should be using a TAP query?
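For reference, a cone search is usually written in ADQL with CONTAINS, POINT and CIRCLE. The sketch below only builds the query string; cone_search_adql is a hypothetical helper, the position columns are assumed to be named ra and dec, and the coordinates are approximate values read off the designation for illustration. The query that astroquery actually builds may differ.

```python
def cone_search_adql(catalog, ra_deg, dec_deg, radius_arcsec):
    """Return an ADQL cone search around (ra_deg, dec_deg) in ICRS."""
    radius_deg = radius_arcsec / 3600.0  # ADQL geometry is expressed in degrees
    return (
        f"SELECT * FROM {catalog} "
        f"WHERE CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg:.6f}, {dec_deg:.6f}, {radius_deg:.6f})) = 1"
    )


# Approximate position for 2MASS J21065473+3844265 (illustrative values only)
adql = cone_search_adql("fp_psc", 316.728042, 38.740694, 5)
print(adql)
```

A string like this could be passed to Irsa.query_tap to compare its result with what Irsa.query_region returns for the same radius.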

As for the time, I used Irsa.query_region to look up 16,055 objects one by one in a loop (the 16,055 names are not unique, some objects are repeated multiple times), which took 13 hours to run. Granted, a few other things happen in the loop (saving the results table to a dictionary and printing a progress report), so that is likely an exaggerated run time, but the querying still takes much longer than in astroquery 0.4.3.

The loop looks like this:

from astroquery.ipac.irsa import Irsa
Irsa.TIMEOUT = 3600
from termcolor import colored
import astropy.units as u

#for the objects with found 2MASS names search for them in the IRSA catalog
results = {}
i = 0
for name in names_2mass:
    #5 arcsec is the size of the IGRINS slit, 10 arcsec is required to search the names well
    result = Irsa.query_region(name, catalog="fp_psc", radius=10 * u.arcsec)
    #if there is a result returned
    if len(result) > 0:
        #if the result is multiple objects, keep the one closest in distance
        if len(result) > 1:
            results[has_2mass[i]] = result.to_pandas().head(1)
        #save the results df to a dictionary for later
        else:
            results[has_2mass[i]] = result.to_pandas()
    #if the name search doesnt return an object, print the object name
    else:
        print(colored(f"FAILED {name}", 'light_red'))
    #update the terminal with loop progress
    print(colored(f"{i+1}", 'magenta'), colored(f"/ {len(names_2mass)}", 'light_blue')) 
    i += 1
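Since the list of names contains repeats, one cheap saving is to query each distinct name only once. The sketch below shows the idea with a stand-in for the remote call; query_unique and fake_query are hypothetical names, and in the loop above query_one would wrap the Irsa.query_region call.

```python
def query_unique(names, query_one):
    """Call query_one once per distinct name and return {name: result}."""
    results = {}
    for name in dict.fromkeys(names):  # drops repeats, keeps first-seen order
        results[name] = query_one(name)
    return results


calls = []


def fake_query(name):
    """Stand-in for a remote query; records each call it receives."""
    calls.append(name)
    return f"result for {name}"


names = [
    "2MASS J21065473+3844265",
    "2MASS J21065341+3844529",
    "2MASS J21065473+3844265",  # repeated on purpose
]
results = query_unique(names, fake_query)
print(len(calls), "queries for", len(names), "names")
```

This does not make an individual query faster; it only avoids paying for the same query twice.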


ericasaw commented on July 23, 2024

I spent some time this afternoon looking into this, and it seems like the Irsa.query_region function in 0.4.7 builds a TAP query based on input coordinates (which I guess come from the 2MASS identifier name) and then uses the Irsa.query_tap function to look for the object within a specified radius. It's still unclear to me why the TAP query doesn't return the object as expected. Maybe it is the type of shape I chose to query with (cone)?
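One way to check which coordinates a name ought to resolve to is to decode the designation itself: 2MASS names follow the pattern Jhhmmssss+ddmmsss, i.e. the J2000 right ascension truncated to 0.01 s and the declination truncated to 0.1 arcsec. The helper below is a hypothetical sketch (decode_2mass_designation is not an astroquery function); because of the truncation, the decoded position is only good to a fraction of an arcsecond.

```python
import re


def decode_2mass_designation(name):
    """Return (ra_deg, dec_deg) encoded in a 2MASS Jhhmmssss+ddmmsss name."""
    match = re.search(r"J(\d{2})(\d{2})(\d{4})([+-])(\d{2})(\d{2})(\d{3})", name)
    if match is None:
        raise ValueError(f"not a 2MASS designation: {name!r}")
    hh, mm, ssss, sign, dd, dm, dss = match.groups()
    # ssss is seconds of time in units of 0.01 s, dss is arcseconds in units of 0.1"
    ra_hours = int(hh) + int(mm) / 60 + int(ssss) / 100 / 3600
    dec_abs = int(dd) + int(dm) / 60 + int(dss) / 10 / 3600
    ra_deg = ra_hours * 15.0  # 1 hour of right ascension is 15 degrees
    dec_deg = -dec_abs if sign == "-" else dec_abs
    return ra_deg, dec_deg


ra, dec = decode_2mass_designation("2MASS J21065473+3844265")
print(f"{ra:.5f} {dec:.5f}")
```

Comparing these values with the coordinates the name resolver returns would show whether the cone is centred where expected.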

Looking through the IRSA VO Table Access Protocol (TAP) Instructions, there is no way to TAP query by name as there is for SIMBAD, which is kind of frustrating. I think that the old Irsa.query_region function in 0.4.3 worked via requests, but it also seems to have used coordinates instead of names. Looking at the IRSA Catalog Search Service Application Program Interface, it looks like you can feed in names, but the search still seems to use coordinates even when a name is given.

My guess is that the search is now slower than in astroquery 0.4.3 because of IRSA's response time. Given how fast Simbad.query_tap was this afternoon (very fast), it is interesting to me how slow Irsa.query_tap seems to be (behind the scenes of Irsa.query_region). I'm not sure it is worth my time to build an ADQL query for all of the objects, since that is basically what Irsa.query_region does anyway.
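If the catalog exposes the source name as a column, a batched ADQL query on that column is one possible alternative to one cone search per object. The sketch below rests on assumptions that should be verified against the catalog's column list: that fp_psc has a designation column, and that it stores names without the "2MASS J" prefix. designation_queries and batch_size are hypothetical names, and the batch size is arbitrary.

```python
def designation_queries(names, catalog="fp_psc", batch_size=500):
    """Yield ADQL queries selecting rows by designation, batch_size names at a time."""
    # Assumption: the column stores e.g. 21065473+3844265, so drop "2MASS J";
    # dict.fromkeys removes repeated names while keeping their order
    designations = [name.split("J", 1)[1] for name in dict.fromkeys(names)]
    for start in range(0, len(designations), batch_size):
        batch = designations[start:start + batch_size]
        quoted = ", ".join(f"'{d}'" for d in batch)
        yield f"SELECT * FROM {catalog} WHERE designation IN ({quoted})"


names = [
    "2MASS J21065473+3844265",
    "2MASS J21065341+3844529",
    "2MASS J11052903+4331357",
]
queries = list(designation_queries(names, batch_size=2))
for adql in queries:
    print(adql)
```

Each string could then be passed to Irsa.query_tap. Whether this is faster than the cone searches depends on the server, so it would need to be timed.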


ManonMarchand commented on July 23, 2024

Perhaps I am making the same mistake here as I was with SIMBAD as @ManonMarchand pointed out

Sorry that I made it sound like a mistake: query_tap has only been available for Simbad since astroquery 0.4.7.


aoberto commented on July 23, 2024

If we want to dig further into the SIMBAD timing issue with query_objects, it would help to have more details on the columns selected for the output and a list of example names. I just tried 5000 object names, 2MASS or not, in SIMBAD or not, and it took about 30 s.
But as the new version of astroquery.simbad is about to be released, maybe it is not really necessary to dig into this.
