Coder Social home page Coder Social logo

audubon-cbc's Introduction

audubon-cbc

"For the bird counters"

The general questions we are trying to answer are, "what are the geographic, socioeconomic, or climatic correlates of different types of CBC participation and effort? Is it possible to model and predict participation and effort?". Keep in mind that some years, counts could be cancelled due to nasty weather. There will not be a record for that count during that year. Field counters, parties, hours, and distances make up the largest chunks of effort and are the most interesting to us.

Research Questions:

1. What relationship, if any, exsists between geographic, socioeconomic, or climatic correlates features and CBC participation and effort for the chirstmas birdcouter?

Volenteers For this Research Question will produce descriptive anaylsis and predictive modeling to examine if a relationship can be determined.

2. Should the Audubon society continue to collect weather data during the Christmas Bird Count?

Volenteers will 1) Examine the weather data avilible in the cbc data for completness and trends, 2) connected with a source of weather data (NOAA has been suggested) to determine the past accuraccy of weather data recorded by particpants, 3) make a determination on if weather data provided by cbc participants is valuble to collect.

Volenteer Quick Start for June 13th

  1. Access the Most Current Data Here: bird_count_cleaned_may_29_2019.csv Data Dictonary: http://www.audubon.org/sites/default/files/documents/cbc_report_field_definitions_2013.pdf

  2. If a python user, Clone and branch your own copy of the ipython notebook used to clean the data: https://github.com/DataKind-DC/audubon-cbc/blob/master/audubon_eda.ipynb (audubon_eda.ipynb)

Progress So Far

DataKind has collected data from the Audobon Socity and did a pass at cleaning the data. The cleaned Data can be found here:

bird_count_cleaned_may_29_2019.csv Notes from Data Cleaning:

  • All the numeric values appear to possible (perhaps with exception of the maximum number of hours for some tasks.
  • Note that impossible values for derived columns with imperial and metric suffixes were replaced with NaN. The original column values were not replaced (e.g. max_wind wasn't replaced).
  • Outliers remain.
  • Also note that missing values were not imputed. Depending upon the variable of interest and the analysis, missing values might want to be treated various ways.

Known Next Steps

- Data Cleaning

The previous Team of Volenteers has left a todo list to finish cleaning the cbc data TODOS:

  • Numeric data (temperature, snow, wind, distance):

    • Convert to same unit (Done!)
    • Impute missing values and Invesitgate Outliers
      • TASK: Need to invesigate the Outliers and determine if they are most likly typos or data
    • Standardize/normalize
  • For categorical data:

    • TASK: Encode appropriately
      • Composites -> if no 4 then take average and treat as new category, 4 trumps all

Logitutial Check

  • Age of circles - older circles are more numerous

    • TASK: Check for Outlieres Any outliers for these
    • TASK: Question: Should All circles be used or should we limit the circles in the final data set
  • Open Questions:

    • what does snow_unit stand for?
    • What should we do with outliers for snow
    • what is wind unit? direction?
    • Suggestion: remove large outliers?
    • What are temperature units

- Weather Data API

For the weather data, DataKind needs to understand what weather data is avilible for the past based on lat long data (that we have from the cbc data file)

The NOAA api should be good for this.

The smallest and most precise unit in the NOAA data is for a Station. Via the NOAA api we can list of all the stations and their lat long data. I think the next task will be to write a script that finds the closest NOAA station to each of the circles in the data set.

To get started with the NOAA API, this site is pretty good. https://grantwinney.com/day-7-noaa-api/.

- Decriptive and Model Design

Given the data types and research questions for the Particpation Research Question, it would be benifical to have volenteers design table shells for metrics that would help answer the research question and model designs they would like to see implemented. These shells and designs should include notes on any data aggrgations or transformations that would be required by a tech side.

Most Current Data

Data with columns in imperial and metric units avaiable here:

bird_count_cleaned_may_29_2019.csv

Data Dictonary: http://www.audubon.org/sites/default/files/documents/cbc_report_field_definitions_2013.pdf

  • All the numeric values appear to possible (perhaps with exception of the maximum number of hours for some tasks.
  • Note that impossible values for derived columns with imperial and metric suffixes were replaced with NaN. The original column values were not replaced (e.g. max_wind wasn't replaced).
  • Outliers remain.
  • Also note that missing values were not imputed. Depending upon the variable of interest and the analysis, missing values might want to be treated various ways.

Resources

Read these brief pages for background:

https://www.audubon.org/conservation/join-christmas-bird-count

https://www.audubon.org/christmas-bird-count-compiler-resources

http://www.audubon.org/sites/default/files/documents/updated_compilers_manual_jan_2013.pdf

Some column descriptions here (not comprehensive): http://www.audubon.org/sites/default/files/documents/cbc_report_field_definitions_2013.pdf

Note the weather columns are often not absolutes. Excerpt:

  • MIN_TEMP - Low temp of the count day to the nearest degree, in degrees F
  • MAX_TEMP - High temp of the count day to nearest degree in F
  • MIN_WIND - Representative lower range of wind speed on count day; this is not the minimum wind speed observed on the count day; stored in MPH
  • MAX_WIND - Representative upper range of wind speed on count day; this is not the max wind speed observed on the count day; stored in MPH
  • MIN_SNOW - Representative lower range for snow depth, reported to the nearest 0.25 inches; reported as 0 if there was no snow; stored in inches -MAX_SNOW - Representative upper range for snow depth, reported to the nearest 0.25 inches; reported as 0 if there was no snow; stored in inches

audubon-cbc's People

Contributors

discdiver avatar minh5 avatar rectheworld avatar williamratcliff avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.