datacarpentry / openrefine-ecology-lesson

Data Cleaning with OpenRefine for Ecologists

Home Page: https://datacarpentry.org/OpenRefine-ecology-lesson/

License: Other

carpentries data-carpentry lesson openrefine data-management data-cleaning english ecology stable

openrefine-ecology-lesson's Issues

Install check for OpenRefine on Windows

Windows

  • [OK] Check that you have Firefox or Chrome browsers installed and set as your default browser. OpenRefine runs in your default browser. It will not run correctly in Internet Explorer.
  • [More explanation needed] Download software from http://openrefine.org
    -- It is not clear which release of OpenRefine you want participants to download. There are now 5 releases of OpenRefine available for download on the "Official Distribution" page!
  • [OK] Unzip the downloaded file into a directory by right-clicking and selecting “Extract…”. Name that directory something like OpenRefine.
  • [OK] Go to your newly created OpenRefine directory.
  • [Need to concatenate this bullet with next one] Launch OpenRefine; ......
  • [Error in file name and incomplete instruction: Suggested rewrite follows] Click the openrefine.exe (this will launch a command prompt window first, but you can ignore that and wait for OpenRefine to launch in the web browser, which is where you will interact with the program)
  • [-OK-] If you are using a different browser, or OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to launch the program.

Add lesson on services

The current lesson on services wasn't set up to work with a service, so it needed more information and a platform to use. I took that lesson out for now, but it would be good to add a lesson on services with OpenRefine back in.

Some rephrasing in 01-working-with

"In this first step, we’ll browse our computer" could be rephrased. I think the intent here is to look at the file and what it looks like, before opening it in OpenRefine.

Not sure if this is necessary, but it would be good to describe the additional columns that were added.

Leading-trailing whitespaces - no errors

The section on splitting columns and removing leading/trailing whitespaces says:

This will reveal an error in a few names that have spaces at the beginning (so-called leading white space).

Using this dataset, I saw no errors when I split the scientificName column. Following the instructions for removing the whitespace led to "text transform on 0 cells" for both new columns. If we want learners to actually see this error, we need to add whitespace to the original file.

Incidentally, the lesson notes say the Refine feature is "Remove leading and trailing whitespace", but as of version 2.6-beta.1, it's actually called "Trim leading and trailing whitespace".
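
One way to confirm whether a dataset contains the whitespace problem before teaching with it is to count the affected values outside OpenRefine. The following is only a minimal sketch: the sample names are hypothetical, and a real check would read the scientificName column from the lesson's data file instead.

```python
def count_untrimmed(values):
    """Count strings that have leading or trailing whitespace."""
    return sum(1 for v in values if isinstance(v, str) and v != v.strip())


# Hypothetical sample values, not taken from the lesson data.
sample = ["Dipodomys merriami", " Dipodomys ordii", "Onychomys sp. "]

if __name__ == "__main__":
    # A count of 0 would match the "text transform on 0 cells" message.
    print(count_untrimmed(sample))
```

If the count is 0 for the real column, the trim step has nothing to do, which is consistent with what was observed above.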

Additional component for validating and enhancing data using GBIF (or another 3rd party authority)

I'm interested in extending this lesson to add two more objectives: validating data (such as species names) against a 3rd party authority (e.g., GBIF) and then supplementing the data with additional parameters associated with that genus and species from the third party.

GBIF actually offers some Refine lessons on this skill set - see, for example, https://docs.google.com/document/d/1tkDRXlYhmassYAk5T4v5oac5prF0jAiSMr_JEGTvhRo/edit

My plan is to try to integrate these GBIF tutorials into the Data Carpentry lesson.
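
As a rough illustration of what validating a name against GBIF could involve, the sketch below builds a request URL for GBIF's public species-match endpoint. The endpoint and the `name` parameter are my assumption about the GBIF API rather than anything taken from the lesson or the linked tutorial, and the helper names are hypothetical. Inside OpenRefine, a URL like this would typically be used with the "Add column by fetching URLs" feature.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the GBIF species-match service (not from the lesson).
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def build_match_url(scientific_name):
    """Return the lookup URL for one scientific name."""
    return GBIF_MATCH_URL + "?" + urlencode({"name": scientific_name})


def fetch_match(scientific_name, timeout=10):
    """Fetch and decode the JSON reply for one name (needs network access)."""
    with urlopen(build_match_url(scientific_name), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_match_url("Dipodomys merriami"))
```

Only the URL construction is exercised here; the shape of the reply should be checked against the GBIF documentation before relying on any particular field.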

Readme.md got its content overwritten by a blank version of the readme template

In https://github.com/dlstrong/OpenRefine-ecology-lesson/commits/gh-pages/README.md 's comment history, there's a discussion about whether to convert .md endings to .html endings.

At some point, though, someone did a revert that took a page with about 111 lines of content and replaced it with one that's the blank starter template. (I don't yet know enough Git to figure out exactly what happened, but the history has got a lot more stuff in it than what's showing in the readme right now.)

Here's what the readme shows right now:

[screenshot of the current README]

It looks like that needs a different type of reverting to get the content back?

add content to services lesson

The services lesson 04-services.md needs content. @nickynicolson put in a nice PR #9 but noted that with our current data, it requires us to use a Google Maps API, which requires registration. GBIF would be an option that doesn't require a registration, but we would need to update the data.

This issue is for after the current planned lesson release, and we would need to update the data as well as the services lesson.

Not currently enough content for time allocated

There is currently not enough content in this lesson for the time allocated. Addressing #4 might help.

This is something important to consider. We either need to add new content or adjust the time allocated.

step-by-step sections as code blocks?

In 01-working-with it might be good to have the step-by-step sections formatted as code blocks. It would make them more obvious to see and also help indicate that learners are supposed to do those things along with the instructor.

Typo and formatting in 02-filter-exclude-sort

Under "Excluding Entries":

"This will explicitly include this specie." should read "species".

Also, the "Excluding Entries" header font is smaller than the Filtering or Sort headers. Is this intended? Is it a subtopic of "Filtering"? I can't tell.

First exercise of 01-working-with-openrefine has an unexplained concept

The second task in the first exercise reads:

Is the column formatted as Number, Date, or Text? How does changing the format change the faceting display?

This task introduces the concept of the format of a column without explaining it. Also, the user has not previously been shown how to change the format.

Fix "Sorting by multiple columns." 02-filter-exclude-sort

Two fixes:

  1. Remove the period at the end of the header
  2. To me, the first challenge, "Try sorting by a year after you have sorted by month. What happens to ordering?", is not clear. Also, the second sentence should read "What happens to the ordering?"

use firefox as default browser for Windows users

A learner was using Windows 10 and Internet Explorer. However, OpenRefine has some issues with Internet Explorer, so we should add a note in the installation instructions that learners may need to run OpenRefine from Firefox (either by changing their default browser, or by opening Firefox and going to http://127.0.0.1:3333).

index.md references requirements but none listed

The front page states that "working through this lesson requires working copies of the software described below", but there is nothing listed. Install instructions for OpenRefine should go here.

openrefine file types

01-working-with-openrefine says:

Note the file types Open Refine handles: TSV, CSF, *SV, Excel (.xls .xlsx), JSON, XML, RDF as XML, Google Data documents. Support for other formats can be added with Google Refine extensions.

I'm pretty sure "CSF" is meant to be "CSV". CSF is apparently used in Adobe for setting color settings. http://fileinfo.com/extension/csf

Previous and Next links aren't working

When I try to use the Previous and Next links at the bottom of lesson pages like https://github.com/datacarpentry/OpenRefine-ecology-lesson/blob/gh-pages/01-working-with-openrefine.md , it 404s on me because the Previous and Next links are expecting .html extensions rather than .md extensions.

This seems to be distributed across the whole lesson.

I'm not sure if it's deliberate (for example, if .md files get automatically renamed to .html when they're displayed via another system that calls them in) or if it was an accidental oversight when a file type conversion took place.

Please let me know whether this is a bug or a feature, so to speak. If it's a bug I can go make some commits and pull requests but I want to be sure I'm doing the right thing! (Brand new instructor applicant here, and not familiar with anything other than Git's web interface.)

Browsers used by participants - Localhost vs 127.0.0.1

In a recent workshop, some time was spent struggling to get OpenRefine started on participants' laptops, with one unique error coming up that relates to 00-getting-started.html, specifically:

If you are using a different browser, or OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ to launch the program.

Possible issues with the hosts file on Windows meant that OpenRefine could only be opened using localhost instead of 127.0.0.1. The instructor came across the solution, but only after some time. It may be useful to add the following text, "at http://127.0.0.1:3333/ or http://localhost:3333", to the instructor notes and the lesson itself, as it may save some time for the instructor and participants.

Update mention of dropbox in 01-working-with

If we move the data off of Dropbox, or maybe even if we don't, we can update the sentence

"However, this will not work on a dropbox address as above." maybe to, "however this won't work for all URLs"

Data location

Ideally the data for this workshop would be included in the Portal teaching database or somewhere other than Dropbox.

Create a "Date" column

So, some notes I gathered when using this lesson to prepare a class.

The transform [cells["yr"].value, cells["mo"].value, cells["dy"].value].join("-") will create the date as a yyyy-mm-dd value, which can then be transformed to a date in order to facet by time.

Is it worth mentioning?
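
For anyone checking the idea outside OpenRefine, here is a small Python sketch of the same join. It assumes the yr, mo and dy columns hold plain, unpadded numbers (an assumption about the data, not something stated above), in which case a bare join would give values like 1977-7-16 rather than a strictly padded yyyy-mm-dd string. Going through a real date type pads the parts and rejects impossible dates.

```python
from datetime import date


def build_date(yr, mo, dy):
    """Combine year, month and day into a zero-padded yyyy-mm-dd string.

    Raises ValueError if the parts do not form a valid calendar date.
    """
    return date(int(yr), int(mo), int(dy)).isoformat()


if __name__ == "__main__":
    print(build_date("1977", "7", "16"))  # 1977-07-16
```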

OpenRefine & loopback adapter in Win10

If the loopback adapter isn't installed in the Windows instance - for example, on a new laptop - then 127.0.0.1 isn't routeable by the web browser and OpenRefine will look like it's starting up, but won't connect as 127.0.0.1 is hard-coded into all the connectivity bits.

It's fixable by installing the loopback adapter (https://technet.microsoft.com/en-gb/library/cc708322(v=ws.10).aspx) or by changing 127.0.0.1 to "localhost" in the address bar, but this can cause problems when uploading data from the client to the OpenRefine server.

This caused quite a bit of confusion in a workshop where a participant had a Win10 laptop and I spent a little while trying to work out why I couldn't even ping 127.0.0.1 from the command line.
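
A quick way to see which of the two addresses works on a given laptop is to try opening a TCP connection to each. This is only a diagnostic sketch: it assumes OpenRefine is already running on its usual port 3333, and the function name is made up for illustration.

```python
import socket


def can_connect(host, port=3333, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    for host in ("127.0.0.1", "localhost"):
        status = "reachable" if can_connect(host) else "not reachable"
        print(f"http://{host}:3333/ is {status}")
```

If only one of the two is reported as reachable, that is the address to type into the browser.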

Downloading data instructions

If we're keeping the data where it is, it would be good to have further instructions on downloading, because you need to Ctrl-click or right-click to download the data rather than open it.

Which learning objectives should we assess?

Greetings everyone,

I'd appreciate your feedback on my current project. We've successfully revised the learning objectives for all of the Ecology lessons to reflect what we are teaching, and now we are in the process of developing surveys to assess our learners (before and after the workshop) on their skills and self-efficacy for the tools they were taught.

Your feedback is extremely valuable. In this document I've added the learning objectives for all of the Ecology lessons. I would appreciate it if you'd open this document and add a +1 to the learning objectives you think are most important to assess our learners on.

Additionally, I'm scheduling a virtual meeting to review the objectives, discuss your +1's, and come to a consensus. Please provide your availability here.

Our goal isn't to change what we're teaching, but to better understand what we're teaching, and ensure our learning objectives reflect that, so that we can assess our learners.

Thank you for your feedback and your time.
Kari

P.S. Thank you so much to all the maintainers who helped revise the learning objectives!

@debpaul @tracykteal @QEDan @ctb @ErinBecker

Addition of screen-shots

Any thoughts on adding screenshots to this lesson, as the subject of the lesson is a GUI tool? There is a slight precedent for the use of screenshots (SQLite) in the SQL lesson, but this needs to be balanced against the emphasis on "live coding". Screenshots are probably more helpful for people following the material outside of a live workshop.

Take out section of 01-working-with

Can remove:

"
[This part should probably be in 03-scripts.md file with follow-up: When doing this demo, the columns: JSON, decimalLatitude, decimalLongitude can be deleted, and then recreated if time, with a call to a locality service, and subsequent parsing of the JSON data returned by the service.]
"

More description of what faceting is

There's a good description of how to do faceting in 01-working-with, but there could be more of a description of what faceting is conceptually as an introduction.
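
As one possible way to frame the concept: a text facet is roughly a count of how many rows share each distinct value in a column, and selecting a facet value filters the rows down to that group. The Python sketch below illustrates that idea only. The rows are hypothetical, and this is not how OpenRefine implements facets.

```python
from collections import Counter


def text_facet(rows, column):
    """Count how many rows share each distinct value in a column."""
    return Counter(row[column] for row in rows)


def select_facet_value(rows, column, value):
    """Keep only the rows whose column equals the chosen facet value."""
    return [row for row in rows if row[column] == value]


# Hypothetical rows, not taken from the lesson data.
rows = [
    {"genus": "Dipodomys", "species": "merriami"},
    {"genus": "Dipodomys", "species": "ordii"},
    {"genus": "Dipodomys", "species": "merriami"},
]

if __name__ == "__main__":
    print(text_facet(rows, "species"))
    print(select_facet_value(rows, "species", "ordii"))
```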
