Comments (9)

naupaka avatar naupaka commented on June 10, 2024

I modified the main dataset (the separate, un-joined portal csv files) to have errors that facilitate using the various features of OpenRefine. I have a list of the changes I made and how to fix them, and could add this to the instructor materials once I have a minute...

from openrefine-ecology-lesson.

debpaul avatar debpaul commented on June 10, 2024

Hi @naupaka, will you have a chance to add the list of changes and how to fix them to the instructor materials anytime soon? I just had a quick look and didn't see leading white spaces or words whose first letter is incorrect (for showing the power of clustering).

debpaul avatar debpaul commented on June 10, 2024

@naupaka @fmichonneau another challenge, then, is to figure out how to incorporate other folks' contributions that improve the use of the current dataset (for example, changes to the clustering section). If I merge their changes in, they won't apply to this new dataset.

tracykteal avatar tracykteal commented on June 10, 2024

Thanks all. I agree it would be great to use that dataset here.

naupaka avatar naupaka commented on June 10, 2024

Sorry for the delay. Just got back from conference travel. Will open a PR with the changes this week.

naupaka avatar naupaka commented on June 10, 2024

@debpaul also should note that I did hit on all the major features of OpenRefine in the modified datasets, including clustering, but not in all three csv files. So you might have looked at one that wasn't modified for that specific purpose.

naupaka avatar naupaka commented on June 10, 2024

So I have added the things that have been modified in a list below. Additionally, all of the specific changes needed to get the 'broken' files above to be correct are available here in OpenRefine JSON format.

My approach when teaching this material is to go through each of the columns in each of the three files, looking for outliers or inconsistencies with the tools appropriate to that type (e.g. outlier plots vs character-based clustering).

I am not sure where to put all of this information at the moment. It seems to me that building the lessons around fixing these files, instead of just generically demoing the features of OpenRefine, would be ideal, but it would also require a pretty substantial rewrite of the OpenRefine materials. In my mind, this fits nicely into a sequence from Excel -> OpenRefine -> SQL, but could also be omitted if needed without disrupting the continuity of the workshop narrative.

Alternatively, a link to the 'broken' data and a list of things to fix could go in the instructor notes and it could be left at that.

Things to fix in example portal data csv files:

In surveys.csv:

  • FF instead of F
  • MM instead of M
  • m instead of M
  • f instead of F
  • Errors in year column (extra 1s, extra 0s - I tried to make it unambiguous as to how they should be fixed)
  • 9999 in weight column (this is not in the JSON since the value was just deleted manually - i.e. should be NA for R analyses, but manual one-off changes are not logged to JSON)
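For anyone checking the surveys.csv fixes outside OpenRefine, the items above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the "four digits" test for a plausible year is an assumption rather than a rule from the dataset, and suspect years are only flagged, not corrected.

```python
MISSING = "NA"  # marker used for missing values in the R analyses


def clean_sex(value):
    """Collapse doubled or lower-case codes (FF, MM, f, m) to F or M."""
    code = value.strip().upper()
    if code in ("F", "FF"):
        return "F"
    if code in ("M", "MM"):
        return "M"
    return value  # leave anything unexpected for manual review


def clean_weight(value, sentinel="9999"):
    """Replace the 9999 placeholder with NA; keep every other weight as-is."""
    return MISSING if value.strip() == sentinel else value


def is_suspect_year(value):
    """Flag years that are not exactly four digits (assumed rule of thumb)."""
    year = value.strip()
    return not (year.isdigit() and len(year) == 4)


if __name__ == "__main__":
    # Hypothetical sample values, not rows copied from the real file.
    print([clean_sex(s) for s in ["FF", "m", "M", "f", "MM"]])
    print([clean_weight(w) for w in ["40", "9999"]])
    print([y for y in ["1997", "19977", "20002"] if is_suspect_year(y)])
```

In OpenRefine itself the same checks would normally be done with a text facet on the sex column and a numeric facet on year and weight.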

In species.csv:

  • 3 genus groups are off/incorrect
    • Select nearest-neighbor for clustering, and increase radius and lower number of block characters to find all problems
  • "sp" vs "sp." vs "sp. " vs "sp.  " (different numbers of trailing white spaces are much more tricky to find than leading white spaces)
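To show why a larger radius and fewer block characters surface more candidates, here is a simplified sketch of nearest-neighbor matching with Levenshtein distance. It is not OpenRefine's implementation: the blocking step is approximated as "the two strings share a substring of `block_chars` characters", the function names are hypothetical, and the misspelled genus names are invented for illustration.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def shares_block(a, b, block_chars):
    """True if a and b have a common substring of length block_chars."""
    blocks = {a[i:i + block_chars] for i in range(len(a) - block_chars + 1)}
    return any(b[i:i + block_chars] in blocks
               for i in range(len(b) - block_chars + 1))


def near_pairs(values, radius=1, block_chars=6):
    """Pairs of distinct values within `radius` edits that share a block."""
    distinct = sorted(set(values))
    return [(a, b) for a, b in combinations(distinct, 2)
            if shares_block(a, b, block_chars) and levenshtein(a, b) <= radius]


def has_trailing_space(value):
    """Trailing whitespace is invisible in a grid view but easy to test for."""
    return value != value.rstrip()


if __name__ == "__main__":
    # Invented misspellings, used only to exercise the functions.
    genera = ["Dipodomys", "Dipodomis", "Onychomys", "Onycomys"]
    print(near_pairs(genera, radius=1, block_chars=6))
    print(near_pairs(genera, radius=1, block_chars=3))
    print([repr(v) for v in ["sp.", "sp. "] if has_trailing_space(v)])
```

With six block characters, only the first invented pair is compared at all; dropping to three block characters lets the second pair through as well, which mirrors the advice in the bullet above.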

In plots.csv:

  • plot type naming consistency issues
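Naming inconsistencies of this kind are the usual target of key-collision clustering. The sketch below is a simplified version of the fingerprint idea (trim, lower-case, drop punctuation, sort and de-duplicate tokens); it is not OpenRefine's exact keying function, the function names are hypothetical, and the plot type strings are invented rather than taken from plots.csv.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key: case, punctuation and word order are ignored."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values by fingerprint; keep only keys with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    # Invented variants, used only to exercise the functions.
    plot_types = ["Control", "control ", "Rodent Exclosure",
                  "exclosure rodent", "Spectab exclosure"]
    for key, variants in cluster(plot_types).items():
        print(key, "->", variants)
```

Each resulting group is a set of spellings that a reviewer would then merge into one agreed name.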

tracykteal avatar tracykteal commented on June 10, 2024

For this release, we'll use the data that we have. But it would be good to update this lesson to use the same data as in the other lessons in the way suggested here.

villanueval avatar villanueval commented on June 10, 2024

Closing old discussion.
