
ropensci / fishbaseapi


Fishbase API

Home Page: https://fishbaseapi.readme.io/

License: MIT License

Ruby 54.11% Shell 32.69% R 1.75% HTML 11.08% Dockerfile 0.37%
rest-api sinatra database unicorn caddy-server

fishbaseapi's Introduction

FishBase API

Update: The Ruby-based FishBase API with custom endpoints has been deprecated.

FishBase and SeaLifeBase data can now be accessed programmatically using a standard S3 API at the following endpoints:

https://fishbase.ropensci.org/fishbase
https://fishbase.ropensci.org/sealifebase

These endpoints are provided by the open source MinIO server, which conforms to the current (v4) AWS S3 REST API. This supports direct REST queries as well as any of the many well-maintained client packages and tools, including the MinIO client, Python boto, Apache Arrow, etc.
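As a rough sketch of what a direct REST query could look like, the Ruby below builds an S3 ListObjectsV2 request URL for one of the endpoints above. It assumes path-style addressing (the last path segment is the bucket name) and that the bucket allows anonymous listing; neither is confirmed here. The helper names `list_objects_uri` and `list_objects` are made up for illustration.

```ruby
require 'uri'
require 'net/http'

# Base URL of the S3-compatible service (taken from the endpoints above).
S3_HOST = 'https://fishbase.ropensci.org'

# Build a ListObjectsV2 URL (list-type=2) for a bucket, optionally
# restricted to keys that begin with `prefix`.
def list_objects_uri(bucket, prefix: nil)
  query = { 'list-type' => '2' }
  query['prefix'] = prefix if prefix
  uri = URI.join(S3_HOST + '/', bucket)
  uri.query = URI.encode_www_form(query)
  uri
end

# Fetch the XML listing. Assumption: anonymous reads are permitted;
# otherwise the server answers with an S3 error document instead.
def list_objects(bucket, prefix: nil)
  Net::HTTP.get_response(list_objects_uri(bucket, prefix: prefix))
end
```

Calling `list_objects('fishbase').body` would return the raw XML listing if anonymous access is enabled. For signed requests, a maintained S3 client is the safer route.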

fishbaseapi's People

Contributors

cboettig, dependabot[bot], evilscott, sckott


fishbaseapi's Issues

DO box

@cboettig

So we currently have discuss.ropensci.org on a 1 GB droplet. A few thoughts:

Will we run into any trouble serving the API from the same server as the discussion forum? discuss.ropensci.org is mapped to that server's IP address, on port 80 I think. Can we map <server.ip.address>:80 to a domain name, e.g., fishbaseapi.org, that exposes only the FishBase API and not the forum?

Box size: currently on a 1 GB droplet at $10/mo. If we decide we need more room, the next one up is 2 GB at $20/mo. I guess we should decide what resources we'll need first, then adjust the box size accordingly. The discussion forum, I think, takes up very few resources right now since there is little activity.

nginx / login-only settings for elasticsearch and kibana?

Do we want secure credentials to be required for login access to the elasticsearch api?

I think the best way to do this would be an nginx layer with a CA certificate, since that should allow the API to work under programmatic calls without the user having to click / authenticate anything, as long as they have the CA certificate installed on their machine.

Add a function to ping / reconnect the mysql server?

Hey @sckott ,

I'm having some stability problems with the mysql server still. It seems that often it is sufficient to ping the server to re-establish the connection. Could we add an endpoint that would just ping the server? (Would at least be handy for debugging). Might need to explore some more tricks to babysit the mysql server too.
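A minimal sketch of the body such a ping endpoint might return. It assumes the database client responds to `ping` (as `Mysql2::Client` does); the helper name `mysqlping_body` and the JSON key are hypothetical.

```ruby
require 'json'

# Build a JSON body reporting whether the database answers a ping.
# `client` is assumed to respond to #ping, returning a truthy value
# when the connection is alive.
def mysqlping_body(client)
  alive = begin
    client.ping ? true : false
  rescue StandardError
    false # a raised error is treated the same as a failed ping
  end
  JSON.generate({ 'mysql_server_up' => alive })
end
```

In the Sinatra app this could be the return value of a small route, for example one mounted at /mysqlping.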

Redis thoughts

  • We may at various points need to flush the redis cache, etc. so need to look into how to make this easy since it's running in a docker container
  • We may want to set an expiration time for caching in redis, e.g., items are only cached for 24 hrs, or 1 week? or no expiration?
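Both points could be covered with two small helpers, sketched here under the assumption that the cache client is a redis-rb style object offering `setex`, `set`, and `flushdb`. The names `cache_store`, `cache_flush`, and `CACHE_TTL` are illustrative only.

```ruby
# Cache lifetime in seconds; 24 hours is one of the options floated above.
CACHE_TTL = 24 * 60 * 60

# Store `value` under `key`, expiring after `ttl` seconds.
# Passing ttl: nil stores the value with no expiration.
def cache_store(redis, key, value, ttl: CACHE_TTL)
  if ttl
    redis.setex(key, ttl, value)
  else
    redis.set(key, value)
  end
end

# Drop every key in the currently selected Redis database.
def cache_flush(redis)
  redis.flushdb
end
```

From the host, something like `docker exec fbredis redis-cli flushall` should have the same effect as `cache_flush`, assuming the container is named fbredis and its image ships redis-cli.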

Search for term anywhere in database

@sckott I'm wondering if we can construct an endpoint to search for a term anywhere in the database.

Given how disorganized the FishBase SQL is, it can be pretty hard to know which table to find something in (e.g. min/max temp, ropensci/rfishbase#47, that several people have requested recently -- it must be there somewhere since it's on the species summary pages).

Haven't found a great solution for doing this in SQL, but there's a few ideas:

The information_schema answer in the first link looks promising. Let me know if you get a chance to take a whack at this?
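One way the information_schema idea might be sketched: first list the text-like columns of the schema, then generate one LIKE query per column. The helper names are hypothetical and the quote-doubling is only a placeholder; a real endpoint should use the database client's own escaping or prepared statements.

```ruby
# SQL that lists every text-like column in a schema via information_schema.
def text_columns_sql(schema)
  safe = schema.gsub("'", "''")
  "SELECT table_name, column_name FROM information_schema.columns " \
  "WHERE table_schema = '#{safe}' " \
  "AND data_type IN ('char', 'varchar', 'text')"
end

# One SELECT per [table, column] pair, counting the rows that contain `term`.
# `columns` would be the rows returned by text_columns_sql.
def search_sql(columns, term)
  safe = term.gsub("'", "''")
  columns.map do |table, column|
    "SELECT '#{table}' AS tbl, '#{column}' AS col, COUNT(*) AS hits " \
    "FROM `#{table}` WHERE `#{column}` LIKE '%#{safe}%'"
  end
end
```

Running every generated query is a full scan of each text column, so an endpoint built on this would probably want caching and a cap on how many tables it touches.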

DRY out the API script a bit?

@sckott Now that it seems like we've somewhat pinned down a standard pattern for defining endpoints, would it be worth functionalizing the API a bit more? (I think you suggested this a bit earlier.) I'm getting to the point where it would be convenient to add a bunch more endpoints, but I don't want to do a bunch of copy-paste that will make more work for you later, so I thought I'd touch base on this first. I'm happy either way, just want to follow your lead here.
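One possible shape for this, sketched under several assumptions: routes are generated from a table that maps each endpoint to its id column, `app` is anything with a Sinatra-style `get(path, &block)` such as `Sinatra::Application`, and the API's existing helpers (`rediskey`, `redis_exists`, `get_cached`, `get_new_ids`, `give_data`, `client`) are in scope inside the route block. `ENDPOINTS` and `register_endpoints` are invented names, and the 'genera' entry is an assumed example.

```ruby
# Endpoint/table name => id column used for lookups.
# 'species' => 'SpecCode' mirrors the existing species route;
# the 'genera' entry is an assumption for illustration.
ENDPOINTS = {
  'species' => 'SpecCode',
  'genera'  => 'GenCode'
}.freeze

# Register one route per entry instead of copy-pasting route bodies.
def register_endpoints(app, endpoints = ENDPOINTS)
  endpoints.each do |table, id_column|
    app.get("/#{table}/?:id?/?") do
      # Runs in the request context; relies on the app's own helpers.
      key = rediskey(table, params)
      obj = if redis_exists(key)
              get_cached(key)
            else
              get_new_ids(client, key, table, id_column, params)
            end
      give_data(obj)
    end
  end
end
```

Adding an endpoint then means adding one line to the table, and any change to caching or error handling is made in a single place.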

error handling / recovery wrt redis

Should we handle failures of get_cached with a rescue that just repeats the call against the database? E.g., I think something like this would fall back successfully if either redis call failed.

get '/species/?:id?/?' do
  key = rediskey('species', params)
  begin
    # Serve from the cache when the key exists; otherwise query MySQL.
    if redis_exists(key)
      obj = get_cached(key)
    else
      obj = get_new_ids(client, key, 'species', 'SpecCode', params)
    end
  rescue
    # Something in the block above failed: query the database directly.
    obj = get_new_ids(client, key, 'species', 'SpecCode', params)
  end
  return give_data(obj)
end

(related to #23)

moving log file location

Hey @sckott ,

Trying to get the logstash to work with kibana URL being configured at runtime instead of the crazy solution in #25. I should just be able to pass in the name of the server as the env var ES_HOST when running the logstash container, but for some reason doing so is causing the logstash container to crash.

Apparently the problem may be due to writing logs to /root, and thus having to link that directory as a volume (/root is ~ since the API runs as root in the docker container). See: pblittle/docker-logstash#56

I tried to switch the location of the logfile to /var/log/fishbase/api.log but it does not seem to be writing logs now, even though it is creating the file: (see https://github.com/ropensci/fishbaseapi/blob/master/api.rb#L13 and https://github.com/ropensci/fishbaseapi/blob/master/api.rb#L100). And for some reason, linking this volume instead still causes a crash.

Stability / Error handling

Might be good for the app to be robust to any of the non-essential components going down or not responding? This would also allow a user to deploy without things like the redis and/or logstash containers if they'd prefer.

  • a recovery option to handle a failed attempt to connect to redis, which would also need to trip some kind of flag to avoid any later calls to cache or read from cache, I think.
  • similarly for logging, though I suppose logs could be written locally
  • might be good to be able to turn off logging completely?
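The "trip a flag" idea for redis could be sketched as a small wrapper that disables itself after the first failure. `OptionalCache` is a hypothetical class, and the wrapped client is assumed to offer redis-rb style `get` and `set`.

```ruby
# Wraps a cache client and switches itself off after the first failure,
# so later requests skip the cache and go straight to the database.
class OptionalCache
  # Pass nil to run with no cache at all.
  def initialize(redis)
    @redis = redis
    @enabled = !redis.nil?
  end

  def enabled?
    @enabled
  end

  # Return the cached value for `key`; on a miss, or when the cache is
  # unavailable, return the block's result (and cache it if possible).
  def fetch(key)
    if @enabled
      begin
        cached = @redis.get(key)
        return cached unless cached.nil?
      rescue StandardError
        @enabled = false # trip the flag; no more cache calls
      end
    end
    value = yield
    store(key, value)
    value
  end

  private

  def store(key, value)
    @redis.set(key, value) if @enabled
  rescue StandardError
    @enabled = false
  end
end
```

A route would then call something like `cache.fetch(key) { query_database }`. The flag here is permanent for the life of the process; re-enabling after a cool-down period is left out of the sketch.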

Kibana needs manual configuration

docker exec -ti fblogstash bash
apt-get update && apt-get install -y vim-tiny
cd /opt/logstash/vendor/kibana
vi config.js

Find the line that has the server address as http://127.0.0.1 and change to match your external server address.

unit testing

Might be worth thinking if there's any kind of testing we can run here?

Might look more like queries against the testing API instance than actually deploying the API here (though I suppose the tests on rfishbase2.0 rather fulfill that role already). Or perhaps we could have a dummy SQL database with no real data, for testing purposes only.

The docker calls can all be run on CircleCI.

Inline/API-based documentation of endpoints?

Hey @sckott,

Keep feeling it would be nice to have some inline documentation of the endpoints, and I'm wondering if just editing the heartbeat list directly is the best way to do this. It's nice to get a table of endpoints and all, but at minimum it would be useful to have a description of what each endpoint does (particularly since using the FishBase SQL table names helps in programming, but makes it all the harder to know what an endpoint with a name like intrcase actually returns). Thoughts?

Which tables need api routes?

@cboettig curious what tables you think would be good to have routes on? or are you mostly interested in replicating what the package already does, and the routes are secondary?

handle reconnecting to mysql if container goes down

Currently if the mysql container exits, I need to bring everything down and restart. We should just be able to (auto) restart the mysql container.

e.g.

  • Deploy on server: ./docker.sh. Then docker ps shows all 5 containers running, and we can hit /mysqlping successfully.
  • Do: docker stop fbmysql, which simulates the sql container going down. /mysqlping is now false.
  • Restart it: docker start fbmysql. Container comes back up, but /mysqlping is still false.

Currently I need to take everything down and restart to recover:

docker rm -f fbapi fbmysql fbredis fblogstash fbnginx
./docker.sh

If we resolve this, then we should be able to also add auto-restarting for the mysql docker container and get a more stable system. Again not sure why it goes down as frequently as it does, though could be the limited resources on my test server.
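On the API side, one option is to drop the stale client and build a fresh one when a call fails. A sketch, assuming the connection is created by a block (for example one wrapping `Mysql2::Client.new`); `ReconnectingClient` is a made-up name.

```ruby
# Holds a database client and rebuilds it when a call fails, so the API
# can recover after the database container is restarted.
class ReconnectingClient
  # The block must return a fresh client each time it is called.
  def initialize(retries: 2, &connect)
    @connect = connect
    @retries = retries
    @client = nil
  end

  # Yield the current client; on any error, discard it, reconnect,
  # and try again up to `retries` more times before re-raising.
  def with_client
    attempts = 0
    begin
      @client ||= @connect.call
      yield @client
    rescue StandardError
      @client = nil # drop the stale connection
      attempts += 1
      retry if attempts <= @retries
      raise
    end
  end
end
```

Usage would look like `db.with_client { |c| c.query(sql) }`. The mysql2 gem also documents a `reconnect` connection option, which may be a simpler thing to try first.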

mangled json responses

Hey @sckott ,

Occasionally I will get the following error message (e.g. on a large call such as: "http://fishbaseapi.info/taxa?family=&limit=40000", though it happens occasionally on smaller calls as well)

Error in parseJSON(txt) : parse error: trailing garbage
           "Actinopterygii"     }   ] }HTTP/1.1 500 Internal Server Er
                     (right here) ------^

If you put that link into a browser you will probably get the same thing -- a long field of valid JSON that suddenly terminates with the HTTP error message. Because the header is intact, httr etc see the return as response code 200 and continue, and the error doesn't occur until the JSON parser attempts to parse the content and freaks out.

Any idea why this error is ending up in the JSON output like this? Or ideas on how to avoid it?

Use stats/logging

Just thinking it might be valuable to be able to compute some basic metrics of use for the FishBase team to monitor traffic; e.g. which endpoints are getting hit, how often, perhaps from what geographic areas. Have you looked into this?

Query-able fields?

@cboettig what fields should users be allowed to query? since you're more familiar with the data...

Right now, I've set it up so that users can query on any field. Do you think this is best? Or do you think only certain fields should be exposed to query on? If we do limit queries to only some fields, that makes it a little easier to solve #7 because we don't have to account for every field being queried.

Note that this is different from what gets returned (all fields, unless there's some reason not to).

Elasticsearch fail behavior

Wonder if there's a way to not lose old indices (each day's collection of logs) when the container crashes. E.g., today @cboettig you put it back up, but we only have today's index. However, it's weird, cause it had data only from back in April, and we know there's been requests since then

Check parameters

Some routes will fail when a parameter is passed that is not a field in the table being queried. Right now we use check_fields() to make sure the user isn't requesting a field that doesn't exist, but we should do similarly for fields queried on.
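A sketch of what the equivalent check for queried fields might look like. It is not the existing check_fields(); `unknown_query_fields` and `RESERVED_PARAMS` are hypothetical, and of the reserved names only `limit` appears in the requests quoted in these issues.

```ruby
# Parameters that control the query rather than filter on a column.
# Assumption: only 'limit' is confirmed; the others are placeholders.
RESERVED_PARAMS = %w[limit offset fields].freeze

# Return the parameter names that are neither reserved nor columns of
# the table being queried. An empty array means every filter is valid.
def unknown_query_fields(params, columns)
  params.keys.map(&:to_s) - RESERVED_PARAMS - columns.map(&:to_s)
end
```

A route could respond with a 400 and list the offending names when the result is non-empty, or silently drop them before building the SQL.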

fix geolocation of IP

API does not seem to be correctly identifying the ip address in the first place, all ips in the log come back as the docker ip.

Once this is fixed, logs need to switch over to actually doing the geolocation and reporting the anonymized location instead of the ip address.
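If the docker IP shows up because requests arrive through the nginx container, the original address may be available in a forwarding header. A sketch, assuming the proxy is configured to set X-Forwarded-For (not confirmed for this deployment); `client_ip` is a hypothetical helper that takes the Rack environment hash.

```ruby
# Pick the client address from a Rack env hash.
# Prefers the first entry of X-Forwarded-For (the original client when
# the proxy appends to the header) and falls back to the socket peer.
def client_ip(env)
  forwarded = env['HTTP_X_FORWARDED_FOR'].to_s.split(',').first
  ip = forwarded ? forwarded.strip : ''
  ip.empty? ? env['REMOTE_ADDR'] : ip
end
```

The header is supplied by the client unless the proxy overwrites it, so it is suitable for usage statistics but not for access control.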

license

this project still needs a license. MIT?

Error catching

  • Fields that do not exist/spelled wrong - seems like most APIs I've played with silently ignore fields that don't exist - if we do this, we have to compare to fields that exist in a table, and drop those that don't match...
  • path doesn't exist (e.g., user calls /swisscheese) -> throw 404

Synonyms

We now have the /taxa endpoint, but one thing that's left out is the synonyms table and its data. Perhaps that should be a separate endpoint? @cboettig

Fix bad routes

For some reason this works

curl -v http://fishbaseapi.info/genera56

and returns genus code 56.


{
  "count": 1,
  "returned": 1,
  "error": null,
  "data": [
    {
      "GenCode": 56,
      "GenName": "Euprotomicrus",
      "GenAuthorYear": "Gill, 1865",
      "GenAuthor": "Gill",
      "GenYear": 1865,
...cutoff

I imagine this is b/c at https://github.com/ropensci/fishbaseapi/blob/master/api.rb#L203 the trailing slash is allowed to be absent, but really this should throw a 404.
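The effect can be reproduced with plain regular expressions. This sketch assumes the route is written like the species one, i.e. '/genera/?:id?/?', and the two patterns below only approximate how such a route is matched; `LOOSE`, `STRICT`, and `route_id` are illustrative names.

```ruby
# Approximation of '/genera/?:id?/?': the slash before the id is optional
# on its own, so '/genera56' is accepted with id "56".
LOOSE = %r{\A/genera/?([^/?#]+)?/?\z}

# Stricter variant: an id is only recognised when a slash precedes it.
STRICT = %r{\A/genera(?:/([^/?#]+))?/?\z}

# Return the captured id, :index when the path has no id,
# or nil when the path does not match at all (a 404).
def route_id(pattern, path)
  m = pattern.match(path)
  return nil unless m
  m[1] || :index
end
```

In Sinatra the stricter behaviour could probably be had by splitting the route in two, one for '/genera/?' and one for '/genera/:id/?', so the slash and the id are no longer independently optional.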
