bids-collaborative / edam

Ecostations Data Access Monitor

Home Page: http://bids-collaborative.github.io/EDAM/

License: BSD 2-Clause "Simplified" License

Python 75.70% R 0.63% HTML 12.42% CSS 2.52% JavaScript 8.12% Makefile 0.61%

edam's Issues

suggest to add IUCN International Union for Conservation of Nature

From a discussion with Robert Guralnick (co-author of the Meyer et al. 2015 paper about completeness), it appears that the source of expert knowledge was the International Union for Conservation of Nature (IUCN, http://iucn.org). Suggest adding it to the existing comparison as a separate row with the different island metrics, including (if possible) links to the data source / API call URLs.

Add TravisCI Build System

(all) Add nano-bios to wiki

Please add your nano-bio to our wiki

Feedback from CEGA team

Real-world challenge

What new understanding will be obtained? What public good would be served? What business opportunities are available? Who would benefit, who are consumers/users of your data or technology? Why are you motivated to work on this project?

Ecostations Data Access Monitor (EDAM) identifies current gaps in several existing databases of species biodiversity information. Digital accessible information (DAI) is disjoint, inconsistent, and biased. The team identifies the major problems as the distance between researchers, locally available research funding and the ability/willingness to participate in data sharing networks.

The team hopes that comparing available ecological data across ecostations will help stimulate collaboration between scientists, technologists, educators, local governments and research foundations to help better understand and sustain ecosystems around them. As a result of the project, EDAM aims to help make ecological data available for specific spatio-taxonomic spaces.

It would be great if the team could offer some evidence to support its diagnosis of the main problem.
Are there any data or literature that indicate the magnitude of the problem, associated factors, and possible alternatives? Could the team come up with a logical framework underlying its approach to the real-world challenge? We think it might be a strong assumption to say that “Once we provide methods to combine and process biodiversity data at global scales, institutions can start to re-examine existing data to coordinate data collection efforts, evolve data sharing strategies and discover methods to efficiently sustain ecosystems”. Specifying a results chain with input-activity-output-outcome links would help the team set its goals more systematically.

Data / Materials

What data sources or data collection methods have you used? Are they automatic/numerically collected, or manually collected (as with a survey)? Where / how is data stored? What will you do / have you done to get data? Is there important metadata (e.g., time of collection, source, etc.)? Why is the data you have or expect to have sufficient or insufficient to meet your challenge?

EDAM’s data are derived from openly available biodiversity data repositories (e.g. GBIF, iDigBio, GloBI). The team does not store the data but queries the repositories automatically each time the data are needed. Initially, only species lists and associated food webs are compiled for participating ecostations using automated data processing algorithms. For each ecostation, the completeness of the lists and webs is estimated. The similarity of the lists and webs is also calculated across the spatially separated island ecosystems to highlight ecological likeness.
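The similarity calculation mentioned above could take many forms, and the feedback does not say which measure EDAM uses. As a minimal sketch, assuming a plain Jaccard index over two species lists (the function name and the example lists are illustrative, not taken from the project):

```python
def jaccard_similarity(species_a, species_b):
    """Return |A ∩ B| / |A ∪ B| for two species lists (0.0 if both are empty)."""
    a, b = set(species_a), set(species_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical species lists for two island ecostations (illustrative only).
island_one = ["Birgus latro", "Cocos nucifera", "Chelonia mydas"]
island_two = ["Cocos nucifera", "Chelonia mydas", "Rattus exulans", "Pisonia grandis"]

similarity = jaccard_similarity(island_one, island_two)
print(f"Jaccard similarity: {similarity:.2f}")
```

The two example lists share 2 of 5 distinct species, so the index is 0.40. A set-based measure like this ignores abundance and sampling effort, which is one reason a completeness estimate is needed alongside it.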

We suggest starting with databases that provide the same kind of data (file format, common categories), and digging deeper once a systematic procedure is in place. EDAM might need more scientists to contribute their own data to the project to share with the world.

Approach

What are the risks? What technical approaches have been explored, and which remain to be explored? Where does the prototype "run"? What methods (e.g., statistics, signal processing, transformation) and tooling (e.g., Python libraries, hardware platforms) are being used/evaluated/considered? When does analysis come in? What skills are needed, and what skills remain to be learned by your team? How would someone learn these skills?

The EDAM project aims to build a platform that summarizes data and enables data sharing, allowing a holistic, side-by-side comparison of ecosystems. These aims would be achieved by automating data processing algorithms to compile species lists and associated food webs for participating ecostations, estimating the completeness of the lists and webs, calculating the similarity of the lists and webs, and creating a web-accessible visualization tool that allows comparison. EDAM now has a webpage running on Jekyll on GitHub at http://bids-collaborative.github.io/EDAM/ . On GitHub, there are 21 closed issues and 4 open issues.

So far, EDAM’s analysis has been done manually as a prototype. Automating that manual work, fully or partially, would expedite the process. Given that the final purpose is building a large database that can support visualization selection, we’d also recommend Oracle, in which data can be stored categorically and systematically.

Project Management

What is your first milestone? What is your current milestone? What is your "final" milestone for completing this project (even if that milestone is currently unreasonable ;). How do you evaluate progress and work remaining? How are next tasks determined and divided between team members? Who did what / what role did people play, and how could you find out (e.g., from the issue tracker, meeting notes, git commits, etc.)?

Eventually, EDAM plans to model links between climate, biodiversity, human activities and other changes over time. It chooses islands as its test models because they are tractable and contained. The first milestone toward integrating the data would be to provide a side-by-side comparison of existing data associated with active island ecostation communities to stimulate knowledge sharing and collaboration. For the current milestone, EDAM needs to gather more specific requirements from its target users, the scientists who will use and share data on the website. By the end of this initial stage, the team should achieve its goal of building a tool that enables scientists to compare ecological data.

It seems that mentor and advisor Jorrit Poelen is actively engaged with the project, and there are 3 undergraduates and 3 graduate students on the team. Their skill sets include computer science, statistics, comparative biochemistry, information and visualization, data visualization, biosensors and software engineering. We suggest that the team put their heads together and come up with a plan for how to carry the project forward after this class.

Others

Please provide any feedback you find useful! In particular, please help your partner team to identify potential problems that might cause their project to "fail." Also recommend any resources that you think may be useful. Ideally, also "execute" or "test" the current prototype. This could range from assembling the bill of materials (i.e., shopping cart) to build the hardware to actually running some code on your own computer. If this is not possible, please describe what would be necessary to do so, and why you are unable to.

For the next stage, EDAM can talk to research funders and host hackathons (or other forms of gathering) among scientists to promote data sharing on, and usage of, the website. Hackathons tend to surface bright ideas for better ways to solve a problem. We’d also suggest that EDAM set up a Kaggle competition on predictive model building; there are always talented data scientists ready for fun and challenging tasks.

score invasiveness

determine score for each species
export invasive species lists
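
The issue does not describe how the score is computed. As a rough sketch of the two steps, assuming a weighted sum of per-species features and a fixed threshold (the weights, feature names, threshold and placeholder species are all hypothetical):

```python
import csv
import io

# Hypothetical feature weights; the real scoring model is not described in the issue.
WEIGHTS = {"non_native": 0.5, "spread_rate": 0.3, "impact": 0.2}


def invasiveness_score(features):
    """Weighted sum of feature values in [0, 1]; missing features count as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())


def export_invasive(species_features, threshold=0.5):
    """Return CSV text listing species whose score meets the threshold, highest first."""
    scored = {name: invasiveness_score(f) for name, f in species_features.items()}
    rows = sorted(
        ((name, score) for name, score in scored.items() if score >= threshold),
        key=lambda row: -row[1],
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["species", "score"])
    for name, score in rows:
        writer.writerow([name, f"{score:.2f}"])
    return buffer.getvalue()


# Placeholder species with made-up feature values.
example = {
    "species_a": {"non_native": 1.0, "spread_rate": 0.8, "impact": 0.5},
    "species_b": {"non_native": 0.0, "spread_rate": 0.2, "impact": 0.1},
}
csv_text = export_invasive(example)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of file paths; the same writer works on a file opened with `newline=""`.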

Draft First Proposal About the Biodiversity Data Sharing Tool

Throw in ideas that our fellow teammates could think about and collect information before our next meeting to pitch the project.

Put up meeting 4 notes

Also, add your bio to the table on wiki home page

Add axis labels to plots

@Calebs97

parse input data into database

accumulate all input data into database
allow for filtering
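
One way to approach both tasks, sketched with Python's built-in `sqlite3` module. The table layout, column names and sample records are assumptions for illustration; the issue does not specify a schema or a database engine.

```python
import sqlite3


def build_database(records):
    """Load (species, island, source) tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE occurrences ("
        "species TEXT NOT NULL, island TEXT NOT NULL, source TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO occurrences (species, island, source) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()
    return conn


def filter_occurrences(conn, island=None, source=None):
    """Return rows matching the given filters; a filter left as None is ignored."""
    query = "SELECT species, island, source FROM occurrences"
    clauses, params = [], []
    if island is not None:
        clauses.append("island = ?")
        params.append(island)
    if source is not None:
        clauses.append("source = ?")
        params.append(source)
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    query += " ORDER BY species"
    return conn.execute(query, params).fetchall()


# Illustrative records only; "island_b" is a placeholder name.
records = [
    ("Birgus latro", "Tetiaroa", "GBIF"),
    ("Cocos nucifera", "Tetiaroa", "iDigBio"),
    ("Cocos nucifera", "island_b", "GBIF"),
]
conn = build_database(records)
print(filter_occurrences(conn, island="Tetiaroa"))
```

Filter values are passed as bound parameters rather than formatted into the SQL string, which keeps user input from the web tool out of the query text.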

Add collaborators to the EDAM repo

I'm Joseph Fang. github: @sherlockjjj.

Integrate maps, completeness, etc

Grab Bootstrap and D3 from Twitter and Cloudflare or someone else

it'll save us and our users a little bandwidth

cluster similar species by feature

collapse input in webtool when results show

Set up Doodle poll for meeting

@sherlockjjj @Minsu-Daniel-Kim @jhpoelen Here is the link to the Doodle poll: http://doodle.com/poll/wm9arg685tmaxkr4

add plots from web api to webtool page

@lavanyaharinarayan

Define 100% completeness

Fix Style to Comply with Pep8

Files that still need to be fixed + added to the Makefile:

analysis/views.py
webtool/views.py

Restrict Merges with Master

toggle features

select which features to use in model and PCA

subset features
PCA loadings
feature correlations
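
A small sketch of the subsetting and correlation steps, using only the standard library. It does not cover PCA loadings, which would normally come from a linear algebra or machine learning library. The feature names and values are hypothetical.

```python
import math


def select_features(rows, enabled):
    """Keep only the enabled feature columns from a list of feature dicts."""
    return [{name: row[name] for name in enabled} for row in rows]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows, enabled):
    """Map each (feature_a, feature_b) pair of enabled features to its correlation."""
    subset = select_features(rows, enabled)
    columns = {name: [row[name] for row in subset] for name in enabled}
    return {(a, b): pearson(columns[a], columns[b]) for a in enabled for b in enabled}


# Hypothetical per-species features; "clutch_size" is toggled off below.
rows = [
    {"body_mass": 1.0, "range_size": 2.0, "clutch_size": 4.0},
    {"body_mass": 2.0, "range_size": 4.0, "clutch_size": 3.0},
    {"body_mass": 3.0, "range_size": 6.0, "clutch_size": 2.0},
    {"body_mass": 4.0, "range_size": 8.0, "clutch_size": 1.0},
]
matrix = correlation_matrix(rows, ["body_mass", "range_size"])
print(matrix[("body_mass", "range_size")])
```

Passing the list of enabled features explicitly means the same toggle state can drive both the correlation view and whatever model consumes the subset.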

Prototype web tool

@JinZhaoHong @vedants

set up page skeleton
determine inputs/outputs

further classification distinctions

native
endemic
invasive
non-native

review side-by-side comparison
discuss outcome/feedback of Oct 15 presentation (see #13)
share what worked well in the last weeks, and what we can improve on (communication/distributing work)

feature names

create classes for each API
remove unnecessary fields
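
A possible shape for these classes: one small class per API that renames the fields of interest to shared feature names and drops everything else. The class names, common names and raw field names below are assumptions for illustration and should be checked against each API's documentation.

```python
class ApiRecordCleaner:
    """Base class: rename the fields listed in FIELD_MAP and drop everything else."""

    # Maps a raw field name from the API to the shared feature name.
    FIELD_MAP = {}

    def clean(self, raw):
        """Return a new dict containing only the mapped fields present in raw."""
        return {
            common: raw[field]
            for field, common in self.FIELD_MAP.items()
            if field in raw
        }


class GbifCleaner(ApiRecordCleaner):
    # Assumed GBIF-style field names.
    FIELD_MAP = {
        "scientificName": "species",
        "decimalLatitude": "lat",
        "decimalLongitude": "lon",
    }


class IdigbioCleaner(ApiRecordCleaner):
    # Assumed iDigBio-style field names.
    FIELD_MAP = {
        "dwc:scientificName": "species",
        "dwc:decimalLatitude": "lat",
        "dwc:decimalLongitude": "lon",
    }


# Made-up raw record with an extra field that should be dropped.
raw_gbif = {
    "scientificName": "Cocos nucifera",
    "decimalLatitude": -17.0,
    "decimalLongitude": -149.5,
    "unneededField": "ignored",
}
cleaned = GbifCleaner().clean(raw_gbif)
print(cleaned)
```

Keeping the mapping as class-level data means adding another API is a matter of declaring one more `FIELD_MAP`, and downstream code only ever sees the shared feature names.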

Directions

From conversation with Charlotte Cabasse, recalling conversation with Mattias:

Provide a tool for, e.g., CRIOBE that increases the value of their data, and reduces the burden of access, even if it's not open. A kind of index that supports a variety of access restrictions.
It would be interesting to integrate with open archaeology, folks like Eric Kansa and John Deck.
It would be productive, perhaps, to start with an island where we have "total access" like with Tetiaroa.

What is the narrative that excites, e.g., the IDEA leads about such a tool?

Read Meyer Paper

Goals for the Week: 9/15 - 9/21

Identify open data sources (GBIF)
Identify similar initiatives around the world (Map of Life (MOL))
Write reflections on the issues being talked about in the paper available at http://dx.doi.org/10.1038/ncomms9221
