afsc-gap-products / gap_public_data

5 stars · 2 watchers · 3 forks · 34.79 MB

Public facing data for the Groundfish and Shellfish Assessment Program. https://afsc-gap-products.github.io/gap_products/content/foss-intro.html

Home Page: https://www.fisheries.noaa.gov/foss/

Languages: R 28.91%, HTML 26.71%, CSS 44.25%, JavaScript 0.12%
Topics: data, alaska, groundfish, survey, crab, data-public, noaa-fisheries, open-data

gap_public_data's People

Contributors: emilymarkowitz-noaa, lewis-barnett-noaa, margaretsiple-noaa, sampottinger


gap_public_data's Issues

Check new species data integration in public data

I took the latest WoRMS and ITIS files created by @SarahFriedman-NOAA and integrated them into the newest version of the FOSS data. They are currently uploading to Oracle, which usually takes a day or two.

  • @SarahFriedman-NOAA, can you review how I combined the WoRMS and ITIS files and applied them to the public data, check the output file in Oracle (RACEBASE_FOSS.FOSS_CPUE_ZEROFILLED and/or RACEBASE_FOSS.JOIN_*), and let me know whether I used the files correctly and whether any changes need to be made?
  • @Lewis-Barnett-NOAA, do you want to take a look and make sure this meets the needs of the west coast data join package?
  • @SarahFriedman-NOAA, if you like, I can post (or give you the code to post) your latest, admittedly beta, versions of these ITIS and WoRMS tables to the GAP_PRODUCTS schema along with the rest of the OLD_* tables. The code for uploading to Oracle from R would look something like this:
# hopefully a temporary location for these functions
source("https://raw.githubusercontent.com/afsc-gap-products/metadata/main/code/functions_oracle.R") 

# establish oracle connection 
# you will need the GAP_PRODUCTS user/pass, which I can send you separately
channel <- oracle_connect()

# establish which files you want to upload to oracle
# the file names will be used as the table names on oracle
# in this example, tables are saved in a parent directory as taxon_itis and
# taxon_worms, so switch those out with whatever names you like
file_paths <- data.frame(file_path = c("./taxon_itis.csv", "./taxon_worms.csv"), 
                         table_metadata = Sys.Date()) # develop more sophisticated table metadata as you see fit; GAP_PRODUCTS.METADATA_TABLE may be helpful

# upload tables to oracle
oracle_upload(
    file_paths = file_paths, 
    # metadata_column = ..., # possibly borrowed from GAP_PRODUCTS.METADATA_TABLE; a dummy table is the default
    channel = channel,
    schema = "GAP_PRODUCTS")

catch_haul_cruises not found

I was running the run.R script and could not get further than line 36. I got this error:
[screenshot of error message]

I ran the data.dl script with my Oracle credentials and then started run.R.

I would troubleshoot more intensively, but I don't have much time right now and this seems like something you'll figure out faster than I would! I want to run your code to get CPUE tables for some of our overlapping species and compare them to the outputs from my code and the tables in RACEBASE/GOA.

Thanks!

API/query code in readme does not return all data

Related to Megsie's earlier comment, the code in the README for accessing data via the API is either incorrect, or there is a problem with the API itself: the example code meant to pull all the data returns only 25 records, from two hauls in the 2002 AI survey.
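A hedged aside on a likely cause (an assumption, not confirmed in this thread): 25 looks like the endpoint's default page size, so a single GET returns only the first page. If the endpoint honors an ORDS-style limit parameter, a larger pull might look like:

# assumes `api_link` from the README and that the endpoint accepts a
# `limit` query parameter (both are assumptions here)
res <- httr::GET(api_link, query = list(limit = 10000))
data <- jsonlite::fromJSON(base::rawToChar(res$content))
nrow(data$items)  # should now exceed the default 25-row page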

Make a better API pull process

Currently, we can use a variation on the following code to pull data, but it could be easier to use, and each call could return more data.

Approach 1: work on the partially built {fossAPI} R package:

It would be great to develop a clean, easy way to pull data from the FOSS data platform, like Alex Richardson's partially built FOSS API pull package.

API R packages already in use for other data streams could serve as examples for our needs.

Approach 2: loop through data content:
To loop through data from an API, set up a loop that exits when the API call returns hasMore = false. Within the loop, adjust your offset (e.g., offset = offset + limit), make another API call, and append the returned items to an items variable; a sketch in R follows the links example below.

If you look at the output from https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/ you'll see the following. hasMore tells you there is more data; offset tells the API where to start retrieving the next page of data.

"hasMore": true,
"limit": 25,
"offset": 0,
"count": 25,

So to get the next page of data, set offset = 25 (e.g., https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/?offset=25). Also notice the links section: its "next" entry tells you that the next page of data starts at offset = 50. In your code, instead of keeping track of which record you're on, you can simply use the "next" element of the links array (the fifth element here) to request the next chunk of data.

"links": [
{
"rel": "self",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
},
{
"rel": "edit",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
},
{
"rel": "describedby",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/metadata-catalog/afsc_groundfish_survey/"
},
{
"rel": "first",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
},
{
"rel": "next",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/?offset=50"
},
{
"rel": "prev",
"href": "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
}
]
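Here is a minimal sketch of Approach 2 in R, assuming the endpoint above and that it honors ORDS-style limit/offset query parameters (the page size and column contents are whatever the API actually returns):

library(httr)
library(jsonlite)

api_link <- "https://apps-st.fisheries.noaa.gov/ods/foss/afsc_groundfish_survey/"
offset <- 0
limit  <- 25     # page size; raise this if the endpoint allows larger pages
items  <- list()

repeat {
  res  <- httr::GET(api_link, query = list(offset = offset, limit = limit))
  page <- jsonlite::fromJSON(base::rawToChar(res$content))
  items <- c(items, list(page$items))  # append this page's records
  if (!isTRUE(page$hasMore)) break     # no more pages; exit the loop
  offset <- offset + limit             # advance to the next page
}

dat <- do.call(rbind, items)  # stack all pages into one data frame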

cc'ing @Lewis-Barnett-NOAA and @MargaretSiple-NOAA, who would also be interested in this development.

survey years included could be revised

@EmilyMarkowitz-NOAA The GOA data in the FOSS tables in Oracle only include data from 1993 onward. I think we should include data from 1990 onward, unless @Ned-Laman-NOAA thinks there is good reason to stick with this cutoff. We know there were some ID issues in those years, but I think it makes sense to exclude only the 1980s data, due to the lack of standardization.

I checked the years included for the other surveys and they look good. I noticed that the reduced 2018 NBS survey is included; if we leave that in, we should make sure the uniqueness of that year is emphasized.

Repo website needs work

Some links go to just a map of survey areas, with no links to move forward. The main page doesn't offer much that specifically helps with getting or understanding our data.

Overall, I don't see the website being useful until we do more work on it. Maybe we should consider taking it down until it has more utility?

query = list(srvy = "EBS") still returns AI table

I would like to narrow my queries to specific regions and am trying to get GOA tables only for a project. I tested the region specificity by running the examples in the README, and httr::GET() seems to return the same table regardless of what I put in the query list. Following the example in the README, this:

res <- httr::GET(api_link)
data <- jsonlite::fromJSON(base::rawToChar(res$content))
head(data)

returns this:
[screenshot of returned table]

and this:

res <- httr::GET(api_link, query = list(srvy = "EBS", year = 2018))
data <- jsonlite::fromJSON(base::rawToChar(res$content))
x <- data$items
head(x)

also returns the same thing:
[screenshot of the same returned table]

Shouldn't that query argument make httr::GET() return a table specific to my query? This may need to be fixed in the README.

Thank you for making these tables accessible! This is very helpful.
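One possible explanation, offered as an assumption rather than anything confirmed in this thread: the /ods/ path suggests an Oracle REST Data Services endpoint, which silently ignores unrecognized query parameters and instead takes row filters as a single JSON q parameter. If that's the case, a region-specific query might look like:

# assumption: ORDS-style filtering via a JSON `q` parameter;
# `api_link` is the endpoint from the README
res <- httr::GET(api_link, query = list(q = '{"srvy":"EBS","year":2018}'))
data <- jsonlite::fromJSON(base::rawToChar(res$content))
head(data$items)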

Quick way to 0-fill CPUE tables (or option, when downloading FOSS data)

Data product requested (stratum CPUE, etc.):
CPUE by tow
Species:
All
Region (GOA, AI, Bering Sea):
All
Research team making the request (PI /requester name, and tag team members with GitHub accounts, or email address if requestor does not have a GitHub account. Division is nice too.):
@jimianelli loves FOSS...but for tow-by-tow CPUE data it would be good to include the zeros too, so that when I select a species and year, I get the null tows (for that species) along with the positive tows.
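In the meantime, here is a minimal sketch of zero-filling a downloaded catch table in R, assuming a data frame dat with hauljoin, species_code, and cpue_kgkm2 columns (those column names are illustrative, not confirmed field names):

library(dplyr)
library(tidyr)

dat_zerofilled <- dat %>%
  # expand to every haul x species combination, filling absent catches with 0
  tidyr::complete(hauljoin, species_code, fill = list(cpue_kgkm2 = 0)) %>%
  dplyr::arrange(hauljoin, species_code)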

some columns not returned via API

Emily, this is so amazing, thanks a ton. I finally started working with it and love it, but ran into a few issues for my use case that are easy fixes:

  • hauljoin needs to be passed, but it is not returned by the API
  • The ITIS code is not returned by the API
  • Only the start lat/lon is available via the API; we should also return the end lat/lon
  • Haul performance should perhaps be included as well (even though we already remove "bad" tows)
