datacarpentry / openrefine-ecology-lesson
Data Cleaning with OpenRefine for Ecologists
Home Page: https://datacarpentry.org/OpenRefine-ecology-lesson/
License: Other
Add a section 'Additional resources' or 'References'
Eg. ODI material:
http://training.theodi.org/InPractice/Day_1/
http://training.theodi.org/InPractice/Day_3/
As described in datacarpentry/python-ecology-lesson#45.
Episodes seem to use "OpenRefine" and "Refine" interchangeably. This could be confusing for learners and should be standardized.
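One way to find the inconsistent names is a quick scan of the episode text. The following is a minimal sketch, not part of the lesson: the function name `find_name_variants` and the sample text are hypothetical, and the pattern will also flag legitimate historical names such as "Google Refine", so hits still need a manual check.

```python
import re

# Matches the two-word "Open Refine" or a standalone "Refine".
# The one-word "OpenRefine" is not matched, because \b requires a word
# boundary and there is none between "Open" and "Refine" in that spelling.
VARIANT = re.compile(r"\bOpen\s+Refine\b|\bRefine\b")


def find_name_variants(text):
    """Return (line_number, variant) pairs for each non-standard spelling."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in VARIANT.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Hypothetical sample text, loosely modelled on sentences quoted in these issues.
sample = (
    "Launch OpenRefine.\n"
    "Note the file types Open Refine handles.\n"
    "The Refine feature trims whitespace."
)
print(find_name_variants(sample))
```

Running the same function over each episode file would give a list of lines to standardize on "OpenRefine".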
Same issue as datacarpentry/spreadsheet-ecology-lesson#74
Same issue as datacarpentry/spreadsheet-ecology-lesson#71
Windows
The first two paragraphs in the 'Cluster' section of 01-working-with have some redundant language.
The current lesson on services wasn't set up to work with a service, so it needed more information and a platform to use. I took that lesson out for now, but it would be good to add a lesson on services with OpenRefine back in.
"In this first step, we’ll browse our computer " could be rephrased. I think the intent here is to look at the file and what it looks like, before opening it in OpenRefine.
Not sure if this is necessary, but it is good to describe the additional columns that were added
The section on splitting columns and removing leading/trailing whitespaces says:
This will reveal an error in a few names that have spaces at the beginning (so-called leading white space).
Using this dataset, I saw no errors when I split the scientificName column. Following the instructions for removing the whitespace led to "text transform on 0 cells" for both new columns. If we want learners to actually see this error, we need to add whitespace to the original file.
Coincidentally, the lesson notes say the Refine feature is "Remove leading and trailing whitespace", but as of version 2.6.beta .1, it's actually called "Trim leading and trailing whitespace".
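To confirm whether a data file contains any values the trim step would change, the column can be checked outside OpenRefine first. This is a minimal sketch under stated assumptions: the helper `count_untrimmed` and the sample rows are hypothetical, and only the column name `scientificName` comes from the issue above.

```python
import csv
import io


def count_untrimmed(rows, column):
    """Count rows whose value in `column` has leading or trailing whitespace."""
    return sum(1 for row in rows if row[column] != row[column].strip())


# Hypothetical sample data: one clean name, one with a leading space,
# one with a trailing space.
sample_csv = (
    "scientificName\n"
    "Peromyscus maniculatus\n"
    " Dipodomys merriami\n"
    "Ammospermophilus harrisi \n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(count_untrimmed(rows, "scientificName"))
```

A count of zero on the real file would match the "text transform on 0 cells" result reported here.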
I'm interested in extending this lesson to add two more objectives: validating data (such as species name) against a third-party authority (e.g., GBIF), and then supplementing the data with additional parameters associated with that genus and species from the third party.
GBIF actually offers some Refine lessons on this skill set - see, for example, https://docs.google.com/document/d/1tkDRXlYhmassYAk5T4v5oac5prF0jAiSMr_JEGTvhRo/edit
My plan is to try to integrate these GBIF tutorials into the Data Carpentry lesson.
In the comment history of https://github.com/dlstrong/OpenRefine-ecology-lesson/commits/gh-pages/README.md there's a discussion about whether to convert .md endings to .html endings.
At some point, though, someone did a revert that took a page with about 111 lines of content and replaced it with one that's the blank starter template. (I don't yet know enough Git to figure out exactly what happened, but the history has got a lot more stuff in it than what's showing in the readme right now.)
Here's what the readme shows right now:
It looks like that needs a different type of reverting to get the content back?
The services lesson 04-services.md needs content. @nickynicolson put in a nice PR #9 but noted that with our current data, it requires us to use a Google Maps API, which requires registration. GBIF would be an option that doesn't require a registration, but we would need to update the data.
This issue is for after the current planned lesson release, and we would need to update the data as well as the services lesson.
There is currently not enough content in this lesson for the time allocated. Addressing #4 might help.
This is something important to consider. We either need to add new content or adjust the time allocated.
In 01-working-with it might be good to have the step-by-step sections formatted as code blocks. It would make them more obvious to see and also help indicate that learners are supposed to do those things along with the instructor.
Under, "Excluding Entries"
"This will explicitly include this specie." Should be species
Also, the "Excluding Entries" header font is smaller than Filtering or Sort, is this intended? Is it a subtopic of "filtering"? I can't tell.
The rodent data comes from Dropbox. Dropbox links are changing https://www.dropbox.com/help/16
Check that this link will work for the foreseeable future OR consider moving data into the repo. The file is 5MB.
This question is worth asking of any external links.
The second task in the first exercise
Is the column formatted as Number, Date, or Text? How does chaning the format change the faceting display?
This task introduces the concept of the format of a column without explaining it. Also, the user has not previously been shown how to change the format.
Two fixes:
A learner was using Windows 10 and Internet Explorer. However, OpenRefine has some issues with Internet Explorer, so we should add a note in the installation instructions that learners may need to run OpenRefine from Firefox (either by changing their default browser, or by opening Firefox and going to http://127.0.0.1:3333).
The front page states that "working through this lesson requires working copies of the software described below", but there is nothing listed. Install instructions for OpenRefine should go here.
01-working-with-openrefine says:
Note the file types Open Refine handles: TSV, CSF, *SV, Excel (.xls .xlsx), JSON, XML, RDF as XML, Google Data documents. Support for other formats can be added with Google Refine extensions.
I'm pretty sure "CSF" is meant to be "CSV". CSF is apparently used in Adobe for setting color settings. http://fileinfo.com/extension/csf
When I try to use the Previous and Next links at the bottom of lesson pages like https://github.com/datacarpentry/OpenRefine-ecology-lesson/blob/gh-pages/01-working-with-openrefine.md , it 404s on me because the Previous and Next links are expecting .html extensions rather than .md extensions.
This seems to be distributed across the whole lesson.
I'm not sure if it's deliberate (if .md files get automatically renamed .html when they're displayed via another system that calls them in, for example) or if it was an accidental oversight when a file type conversion took place?
Please let me know whether this is a bug or a feature, so to speak. If it's a bug I can go make some commits and pull requests but I want to be sure I'm doing the right thing! (Brand new instructor applicant here, and not familiar with anything other than Git's web interface.)
'chaning' should be 'changing'
There could potentially be an exercise for the Undo/Redo section in 01-working-with
An OpenRefine tutorial using the Grateful Dead is really nice.
https://github.com/scottythered/gratefuldata/wiki
Would be good to include in our list of other resources
In OpenRefine, sorting does not change the data order itself, which is different than sorting in most spreadsheet programs. Under the Sort > menu there is an option to "Reorder rows permanently" if you wish to change the order. I think this is worth explaining in the lesson.
In a recent workshop some time was spent struggling to get OpenRefine started on participants' laptops, with one unique error coming up that relates to 00-getting-started.html, specifically:
If you are using a different browser, or OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ to launch the program.
Possible issues with the hosts file on Windows meant that OpenRefine could only be opened using localhost instead of 127.0.0.1. The instructor came across the solution, but only after some time. It may be useful to add the following text, "at http://127.0.0.1:3333/ or http://localhost:3333", to the instructor notes and the lesson itself; it may save some time for the instructor and participants.
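For instructors who want a quick diagnostic, the two addresses can be probed from a script. This is a minimal sketch, assuming OpenRefine is listening on its default port 3333 as quoted above; the function name `first_reachable` is hypothetical and not part of OpenRefine.

```python
import socket


def first_reachable(hosts, port, timeout=1.0):
    """Return a URL for the first host that accepts a TCP connection, else None."""
    for host in hosts:
        try:
            # create_connection raises OSError if the host cannot be
            # resolved, refuses the connection, or times out.
            with socket.create_connection((host, port), timeout=timeout):
                return f"http://{host}:{port}/"
        except OSError:
            continue
    return None


# Try the address from the lesson first, then the alternative from this issue.
print(first_reachable(["127.0.0.1", "localhost"], 3333))
```

If this prints `None`, nothing is accepting connections on either address, which suggests OpenRefine itself has not started.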
There could be some more information about what data cleaning is and why it's important and how OpenRefine can help in the 00-getting-started lesson.
Some good ideas could come from
http://programminghistorian.org/lessons/cleaning-data-with-openrefine
http://thomaspadilla.org/dataprep/
If we move the data off of Dropbox, or maybe even if we don't, we can update the sentence
"However, this will not work on a dropbox address as above." maybe to, "however this won't work for all URLs"
The exercises do take some time :)
Ideally the data for this workshop would be included in the Portal teaching database or somewhere other than Dropbox.
Update CITATION file with how to cite
So, some notes I gathered when using this lesson to prepare a class.
The transform [cells["yr"].value, cells["mo"].value, cells["dy"].value].join("-")
will create the date as a yyyy-mm-dd value, which can be transformed to a date in order to facet by time.
Is it worth mentioning?
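The idea behind the transform can be sketched outside OpenRefine as well. The following is an illustration in Python, not GREL; the function name `build_iso_date` and the sample values are hypothetical, and only the column names yr, mo, and dy come from the note above. It also shows that a plain join does not zero-pad single-digit months or days.

```python
from datetime import date


def build_iso_date(yr, mo, dy):
    """Combine year, month, and day values into a zero-padded ISO date string."""
    # date() validates the combination and raises ValueError for
    # impossible dates such as month 13.
    return date(int(yr), int(mo), int(dy)).isoformat()


# A plain join keeps the values as they are, without zero-padding.
print("-".join(["1977", "7", "16"]))     # 1977-7-16
# Going through a real date type gives the padded yyyy-mm-dd form.
print(build_iso_date("1977", "7", "16"))  # 1977-07-16
```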
Update contact information in _config.yml to go to [email protected]
If the loopback adapter isn't installed in the Windows instance - for example, on a new laptop - then 127.0.0.1 isn't routeable by the web browser and OpenRefine will look like it's starting up, but won't connect as 127.0.0.1 is hard-coded into all the connectivity bits.
It's fixable by installing the loopback adapter (https://technet.microsoft.com/en-gb/library/cc708322(v=ws.10).aspx) or by changing 127.0.0.1 to "localhost" in the address bar, but this can cause problems when uploading data from the client to the OpenRefine server.
This caused quite a bit of confusion in a workshop where a participant had a Win10 laptop and I spent a little while trying to work out why I couldn't even ping 127.0.0.1 from the command line.
An introduction to the OpenRefine interface could be added to 00-getting-started, like we have for RStudio in the R lesson
If we're keeping data where it is, it would be good to have further instructions on download, because you need to Ctrl or right click to download rather than open the data.
Typo in "Launch OpenRefine (see Getting Started with OpenRefine."
@naupaka created modified versions of surveys.csv, plots.csv, and species.csv that can be cleaned up with OpenRefine. I think it would make sense to use these files instead of the file the lesson currently uses, as they are simple modifications of what the other lessons use.
The files are here: https://github.com/datacarpentry/2015-05-14-wsu/tree/gh-pages/Workshop_Files/data/raw/tofix
Greetings everyone,
I'd appreciate your feedback on my current project. We've successfully revised the learning objectives for all of the Ecology lessons to reflect what we are teaching, and now we are in the process of developing surveys to assess our learners (before and after the workshop) on their skills and self-efficacy for the tools they were taught.
Your feedback is extremely valuable. In this document I've added the learning objectives for all of the Ecology lessons. I would appreciate if you'd open this document and add a +1 to the learning objectives you think are most important to assess our learners.
Additionally, I'm scheduling a virtual meeting to review the objectives, discuss your +1's, and come to a consensus. Please provide your availability here.
Our goal isn't to change what we're teaching, but to better understand what we're teaching, and ensure our learning objectives reflect that, so that we can assess our learners.
Thank you for your feedback and your time.
Kari
P.S. Thank you so much to all the maintainers who helped revise the learning objectives!
Any thoughts on adding screenshots to this lesson, as the subject of the lesson is a GUI tool? There is a slight precedent for screenshots (SQLite) in the SQL lesson, but this needs to be balanced against the emphasis on "live coding". Screenshots are probably more helpful for people following the material outside of a live workshop.
@debpaul first created this lesson for Data Carpentry, so I've asked if she can add that lesson so we can start the attributions appropriately.
Links appear at the bottom of some lesson pages. These are no longer needed now that the template includes navigation arrows.
Can remove:
"
[This part should probably be in 03-scripts.md file with follow-up: When doing this demo, the columns: JSON, decimalLatitude, decimalLongitude can be deleted, and then recreated if time, with a call to a locality service, and subsequent parsing of the JSON data returned by the service.]
"
There's a good description of how to do faceting in 01-working-with, but there could be more of a description of what faceting is conceptually as an introduction.
At the bottom of the "Working with OpenRefine" page, the next-page link for "Scripts for OpenRefine" returns a "404 Page not found". The link is:
http://www.datacarpentry.org/OpenRefine-ecology-lesson/02-scripts.html
The template format for instructor notes (guide.md) should be followed in this lesson
This is the template
https://github.com/swcarpentry/lesson-example/blob/gh-pages/_extras/guide.md
In different places in the lesson, OpenRefine is sometimes 'Refine' or 'Open Refine'. Check for consistency and use 'OpenRefine' as in http://openrefine.org
Inconsistency pointed out by @nickynicolson
#9