afsc-gap-products / gap_public_data

5 stars · 2 watchers · 3 forks · 34.79 MB

Public facing data for the Groundfish and Shellfish Assessment Program. https://afsc-gap-products.github.io/gap_products/content/foss-intro.html

Home Page: https://www.fisheries.noaa.gov/foss/

Languages: R 28.91%, HTML 26.71%, CSS 44.25%, JavaScript 0.12%
Topics: data, alaska, groundfish, survey, crab, data-public, noaa-fisheries, open-data

gap_public_data's People

Contributors: emilymarkowitz-noaa, lewis-barnett-noaa, margaretsiple-noaa, sampottinger


gap_public_data's Issues

Check new species data integration in public data

I took the latest WoRMS and ITIS files created by @SarahFriedman-NOAA and integrated them into the newest version of the FOSS data. They are currently uploading to Oracle, which usually takes a day or two.

  • @SarahFriedman-NOAA, can you review how I combined the WoRMS and ITIS files and applied them to the public data, check the output file in Oracle (RACEBASE_FOSS.FOSS_CPUE_ZEROFILLED and/or RACEBASE_FOSS.JOIN_*), and let me know whether I used the files correctly and whether any changes need to be made?
  • @Lewis-Barnett-NOAA, do you want to take a look and make sure this meets the needs of the west coast data join package?
  • @SarahFriedman-NOAA, if you like, I can post (or give you the code to post) your latest, admittedly beta, versions of these ITIS and WoRMS tables to the GAP_PRODUCTS schema along with the rest of the OLD_* tables. The code for uploading to Oracle from R would look something like this:
# hopefully a temporary location for these functions
source("https://raw.githubusercontent.com/afsc-gap-products/metadata/main/code/functions_oracle.R") 

# establish oracle connection 
# you will need the GAP_PRODUCTS user/pass, which I can send you separately
channel <- oracle_connect()

# establish which files you want to upload to oracle
# the file names will be used as the table names on oracle
# in this example, tables are saved in a parent directory as taxon_itis and
# taxon_worms, so switch those out with whatever names you like
file_paths <- data.frame(file_path = c("./taxon_itis.csv", "./taxon_worms.csv"), 
                         table_metadata = Sys.Date()) # develop more sophisticated table metadata as you see fit; GAP_PRODUCTS.METADATA_TABLE may be helpful

# upload tables to oracle
oracle_upload(
    file_paths = file_paths, 
    # metadata_column = ..., # possibly borrowed from GAP_PRODUCTS.METADATA_TABLE; a dummy table is the default
    channel = channel,
    schema = "GAP_PRODUCTS")

catch_haul_cruises not found

I was running the run.R script and could not get further than line 36. I got this error:
[screenshot of error message]

I ran the data.dl script with my Oracle credentials and then started run.R.

I would troubleshoot more intensively, but I don't have much time right now and this seems like something you'll figure out faster than I would! I want to run your code to get CPUE tables for some of our overlapping species and compare them to the outputs from my code and the tables in RACEBASE/GOA.

Thanks!

API/query code in readme does not return all data

Related to Megsie's earlier comment, the code in the README for accessing data via the API is either incorrect, or there is a problem with the API itself: the example code meant to pull all the data returns only 25 records, from two hauls in the 2002 AI survey.
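A hedged aside on a likely cause (an assumption, not confirmed in this thread): 25 looks like the endpoint's default page size, so a single GET returns only the first page. If the endpoint honors an ORDS-style limit parameter, a larger pull might look like:

# assumes `api_link` from the README and that the endpoint accepts a
# `limit` query parameter (both are assumptions here)
res <- httr::GET(api_link, query = list(limit = 10000))
data <- jsonlite::fromJSON(base::rawToChar(res$content))
nrow(data$items)  # should now exceed the default 25-row page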

Make a better API pull process

Currently, we can use a variation on the following code to pull data, but it could be easier to use, and each call could return more data.

Approach 1: work on the partially built {fossAPI} R package:

It would be great to develop a clean, easy way to pull data from the FOSS data platform, like Alex Richardson's partially built FOSS API pull package.

API R packages already in use for other data streams could serve as examples for our needs.

Approach 2: loop through data content:
To loop through data from an API, set up a loop that exits when the API call returns hasMore = false. Within the loop, adjust your offset (e.g., offset = offset + limit), make another API call, and append the returned items to an items variable; a sketch in R follows the links example below.

If you look at the output from https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/ you'll see the following. hasMore tells you there is more data; offset tells the API where to start retrieving the next page of data.

"hasMore": true,
"limit": 25,
"offset": 0,
"count": 25,

So to get the next page of data, set offset = 25 (e.g., https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/?offset=25). Also notice the links section: its "next" entry tells you that the next page of data starts at offset = 50. In your code, instead of keeping track of which record you're on, you can simply use the "next" element of the links array (the fifth element here) to request the next chunk of data.

"links": [
{
"rel": "self",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
},
{
"rel": "edit",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
},
{
"rel": "describedby",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/metadata-catalog/afsc_groundfish_survey/"
},
{
"rel": "first",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
},
{
"rel": "next",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/?offset=50"
},
{
"rel": "prev",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
}
]
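Here is a minimal sketch of Approach 2 in R, assuming the endpoint above and that it honors ORDS-style limit/offset query parameters (the page size and column contents are whatever the API actually returns):

library(httr)
library(jsonlite)

api_link <- "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
offset <- 0
limit  <- 25     # page size; raise this if the endpoint allows larger pages
items  <- list()

repeat {
  res  <- httr::GET(api_link, query = list(offset = offset, limit = limit))
  page <- jsonlite::fromJSON(base::rawToChar(res$content))
  items <- c(items, list(page$items))  # append this page's records
  if (!isTRUE(page$hasMore)) break     # no more pages; exit the loop
  offset <- offset + limit             # advance to the next page
}

dat <- do.call(rbind, items)  # stack all pages into one data frame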

cc'ing @Lewis-Barnett-NOAA and @MargaretSiple-NOAA, who would also be interested in this development.

survey years included could be revised

@EmilyMarkowitz-NOAA The GOA data in the FOSS tables in Oracle only include data from 1993 onward. I think we should include data from 1990 onward, unless @Ned-Laman-NOAA thinks there is good reason to stick with this cutoff. We know there were some ID issues in those years, but I think it makes sense to exclude only the 1980s data, due to the lack of standardization.

I checked the years included for the other surveys and they look good. I noticed that the reduced 2018 NBS survey is included; if we leave that in, we should make sure the uniqueness of that year is emphasized.

Repo website needs work

Some links go to just a map of survey areas, with no links to move forward. The main page doesn't offer much that specifically helps with getting or understanding our data.

Overall, I don't see the website being useful until we do more work on it. Maybe we should consider taking it down until it has more utility?

query = list(srvy = "EBS") still returns AI table

I would like to narrow my queries to specific regions and am trying to get GOA tables only for a project. I tested the region specificity by running the examples in the README, and httr::GET() seems to return the same table regardless of what I put in the query list. Following the example in the README, this:

res <- httr::GET(api_link)
data <- jsonlite::fromJSON(base::rawToChar(res$content))
head(data)

returns this:
[screenshot of returned table]

and this:

res <- httr::GET(api_link, query = list(srvy = "EBS", year = 2018))
data <- jsonlite::fromJSON(base::rawToChar(res$content))
x <- data$items
head(x)

also returns the same thing:
[screenshot of the same returned table]

Shouldn't that query argument make httr::GET() return a table specific to my query? This may need to be fixed in the README.

Thank you for making these tables accessible! This is very helpful.
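One possible explanation, offered as an assumption rather than anything confirmed in this thread: the /ods/ path suggests an Oracle REST Data Services endpoint, which silently ignores unrecognized query parameters and instead takes row filters as a single JSON q parameter. If that's the case, a region-specific query might look like:

# assumption: ORDS-style filtering via a JSON `q` parameter;
# `api_link` is the endpoint from the README
res <- httr::GET(api_link, query = list(q = '{"srvy":"EBS","year":2018}'))
data <- jsonlite::fromJSON(base::rawToChar(res$content))
head(data$items)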

Quick way to 0-fill CPUE tables (or option, when downloading FOSS data)

Data product requested (stratum CPUE, etc.):
CPUE by tow
Species:
All
Region (GOA, AI, Bering Sea):
All
Research team making the request (PI /requester name, and tag team members with GitHub accounts, or email address if requestor does not have a GitHub account. Division is nice too.):
@jimianelli loves FOSS...but for tow-by-tow CPUE data it would be good to include the zeros too, so that when I select a species and year, I get the null tows (for that species) along with the positive tows.
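In the meantime, here is a minimal sketch of zero-filling a downloaded catch table in R, assuming a data frame dat with hauljoin, species_code, and cpue_kgkm2 columns (those column names are illustrative, not confirmed field names):

library(dplyr)
library(tidyr)

dat_zerofilled <- dat %>%
  # expand to every haul x species combination, filling absent catches with 0
  tidyr::complete(hauljoin, species_code, fill = list(cpue_kgkm2 = 0)) %>%
  dplyr::arrange(hauljoin, species_code)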

some columns not returned via API

Emily, this is so amazing, thanks a ton. I finally started working with it and love it, but ran into a few issues for my use case that are easy fixes:

  • hauljoin needs to be passed, but it is not returned by the API
  • The ITIS code is not returned by the API
  • Only the start lat/lon is available via the API; we should also return the end lat/lon
  • Haul performance should perhaps be included as well (even though we already remove "bad" tows)
