I modified the main dataset (the separate, un-joined portal csv files) to have errors that facilitate using the various features of OpenRefine. I have a list of the changes I made and how to fix them, and could add this to the instructor materials once I have a minute...
from openrefine-ecology-lesson.
Hi @naupaka, will you have a chance to add the list of changes and how to fix them to the instructor materials anytime soon? I just had a quick look, and I didn't see leading white spaces or words where the first letter is incorrect (for showing the power of clustering)?
@naupaka @fmichonneau Another challenge, then, is to figure out how to incorporate other folks' contributions that improve the lesson's use of the current dataset (for example, to the clustering section). If I merge their changes in, they won't apply to this new dataset.
Thanks all. I agree it would be great to use that dataset here.
Sorry for the delay. Just got back from conference travel. Will open a PR with the changes this week.
@debpaul, I should also note that I did hit on all the major features of OpenRefine in the modified datasets, including clustering, but not in all three CSV files. So you might have looked at one that wasn't modified for that specific purpose.
So I have listed the things I modified below. Additionally, all of the specific changes needed to correct the 'broken' files above are available here in OpenRefine JSON format.
My approach when teaching this material is to go through each of the columns in each of the three files, looking for outliers or inconsistencies with the tools appropriate to that type (e.g. outlier plots vs character-based clustering).
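As a rough illustration of that column-by-column pass outside OpenRefine, the Python sketch below profiles one column at a time: numeric columns get Tukey-fence outlier flags (a stand-in for eyeballing a numeric facet), and text columns get value counts (like a text facet). The function name `profile_column` and the 1.5 × IQR rule are assumptions chosen for illustration; they are not part of OpenRefine.

```python
import statistics


def profile_column(values):
    """Summarize one column (a list of CSV cell strings).

    Hypothetical helper: numeric columns get IQR-based outlier flags,
    text columns get counts of each distinct value.
    """
    # Treat the column as numeric only if every non-empty cell parses as a float.
    non_empty = [v for v in values if v.strip() != ""]
    try:
        numbers = [float(v) for v in non_empty]
    except ValueError:
        numbers = None

    # statistics.quantiles needs at least two data points.
    if numbers is not None and len(numbers) >= 2:
        # Tukey fences (an assumed rule): flag anything more than
        # 1.5 * IQR below the first quartile or above the third.
        q1, _, q3 = statistics.quantiles(numbers, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers = sorted({x for x in numbers if x < low or x > high})
        return {"kind": "numeric", "outliers": outliers}

    # Text column: count each distinct value, similar to a text facet.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return {"kind": "text", "counts": counts}
```

Calling `profile_column` on a weight column containing a `9999` sentinel should list it as an outlier, while a sex column shows stray codes such as `FF` or `m` in its counts.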
I am not sure where to put all of this information at the moment. It seems to me that building the lessons around fixing these files, instead of just generically demoing the features of OpenRefine, would be ideal, but it would also require a pretty substantial rewrite of the OpenRefine materials. In my mind, this fits nicely into a sequence from Excel -> OpenRefine -> SQL, but it could also be omitted if needed without disrupting the continuity of the workshop narrative.
Alternatively, a link to the 'broken' data and a list of things to fix could go in the instructor notes and it could be left at that.
Things to fix in example portal data csv files:
In surveys.csv:
- FF instead of F
- MM instead of M
- m instead of M
- f instead of F
- Errors in the year column (extra 1s, extra 0s; I tried to make it unambiguous how they should be fixed)
- 9999 in the weight column (this fix is not in the JSON because the value was deleted manually, i.e. it should be NA for R analyses, but manual one-off edits are not logged to the JSON)
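A minimal Python sketch of the surveys.csv fixes listed above, assuming each row is a dict of strings with hypothetical column names `sex`, `weight`, and `year`, and an assumed valid year range of 1977 to 2002. The year errors are only flagged, not rewritten, since the right fix depends on which digit was added; the 9999 weight is blanked so that R's `read.csv` reads it as NA in a numeric column.

```python
# Mapping from the miscoded sex values listed above to the intended codes.
# Values not in the table are left unchanged.
SEX_FIXES = {"FF": "F", "MM": "M", "m": "M", "f": "F"}

# Sentinel used for bad weights in the modified file.
MISSING_WEIGHT = "9999"


def clean_survey_row(row, min_year=1977, max_year=2002):
    """Return (cleaned_row, needs_year_review) for one surveys.csv row.

    Assumptions: column names 'sex', 'weight', 'year' and the default
    year range are hypothetical and may differ from the real file.
    """
    cleaned = dict(row)
    cleaned["sex"] = SEX_FIXES.get(row["sex"], row["sex"])

    # Blank out the sentinel; an empty numeric cell becomes NA in R.
    if row["weight"] == MISSING_WEIGHT:
        cleaned["weight"] = ""

    # Years with extra digits are flagged for a human rather than guessed at.
    year = row["year"]
    year_ok = (
        year.isdigit() and len(year) == 4 and min_year <= int(year) <= max_year
    )
    return cleaned, not year_ok
```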
In species.csv:
- 3 genus groups are off/incorrect
  - Select nearest-neighbor for clustering, then increase the radius and lower the number of block characters to find all of the problems
- "sp" vs "sp." vs "sp." followed by varying numbers of trailing spaces (different amounts of trailing whitespace are much trickier to find than leading whitespace)
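To see why lowering the block characters matters, here is a simplified Python sketch of nearest-neighbor clustering: Levenshtein distance, with comparisons limited to values that share a substring of length `block_chars`. The parameter names echo the radius and block-chars settings in OpenRefine's clustering dialog, but the blocking scheme here is an assumption, not OpenRefine's exact implementation, and "Dipodomis" in the test is a hypothetical misspelling.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance where insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def nearest_neighbor_pairs(values, radius=1, block_chars=6):
    """Pairs of distinct values within `radius` edits of each other.

    Only values sharing a substring of length `block_chars` (or the whole
    string, if shorter) are compared; this blocking is an assumption.
    """
    blocks = {}
    for v in sorted(set(values)):
        grams = {v[i:i + block_chars]
                 for i in range(max(1, len(v) - block_chars + 1))}
        for g in grams:
            blocks.setdefault(g, set()).add(v)

    pairs = set()
    for members in blocks.values():
        for a, b in combinations(sorted(members), 2):
            if levenshtein(a, b) <= radius:
                pairs.add((a, b))
    return sorted(pairs)
```

With `block_chars=6`, the short "sp" variants never share a block, so they are never compared; dropping to 2 brings them together. Printing values with `repr()` is a quick way to make the trailing spaces visible.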
In plots.csv:
- plot type naming consistency issues
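Naming-consistency problems like these are what key-collision clustering targets. Below is a Python sketch of a fingerprint-style keyer following the commonly described steps of OpenRefine's default fingerprint method (trim, lowercase, drop punctuation, fold accents, sort unique tokens); it is an approximation, not OpenRefine's code, and the plot-type variants in the test are hypothetical.

```python
import string
import unicodedata


def fingerprint(value):
    """Approximate fingerprint key: trim, lowercase, strip punctuation,
    fold accents to ASCII, then join the sorted unique tokens."""
    s = value.strip().lower()
    s = s.translate(str.maketrans("", "", string.punctuation))
    # Fold accented characters to their ASCII base (e.g. "é" -> "e").
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    return " ".join(sorted(set(s.split())))


def key_collision_clusters(values):
    """Group distinct values sharing a fingerprint; keep groups with
    more than one spelling, sorted for stable output."""
    groups = {}
    for v in set(values):
        groups.setdefault(fingerprint(v), set()).add(v)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```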
For this release, we'll use the data that we have. But it would be good to update this lesson to use the same data as in the other lessons in the way suggested here.
Closing old discussion.