datacarpentry / deprecated-cloud-genomics-orig
Deprecated - this repository is an outdated fork from datacarpentry/cloud-genomics
License: Other
The image links in the 0.cloud-introduction module appear to be broken.
Is there a reason to teach tmux over screen? I think screen is on most of the systems by default, tmux not necessarily.
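One way to settle which tool is present on a given instance is to look for both on the PATH. The following is a minimal sketch, not part of the lesson; the helper name `find_multiplexers` is made up for illustration, and it only reports what is installed on the machine where it runs.

```python
import shutil


def find_multiplexers(candidates=("screen", "tmux")):
    """Map each program name to its full path, or None if it is not on PATH.

    `find_multiplexers` is a hypothetical helper name, not part of the lesson.
    """
    return {name: shutil.which(name) for name in candidates}


# Report what is available on the current machine.
for name, path in find_multiplexers().items():
    print(f"{name}: {path or 'not installed'}")
```

Running this on the workshop image before teaching would show whether either tool needs to be installed first.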
Lesson 1 uses ami-6516b30e. Lesson 3 uses ami-3c1c3454 which has a different organization and contents (and also appears not to be available in the AWS Community Marketplace). I think the solution is for Lesson 3 to be updated according to the organization and contents of ami-6516b30e. Are there plans to do this? (This may be the same issue that's described in #5.)
@JasonJWilliamsNY created a new genomics image on CyVerse in a way that should be producible on any cloud: https://atmo.cyverse.org/application/images/1476
Descriptions and details are on that link, which has links to GitHub. Launching an image on CyVerse is super easy and free, so it's a great place for a workshop attendee or just a person doing lessons on their own to try things ou
The EC2 image used in the Logging onto the Cloud instructions has a default password. It's really not good practice to leave EC2 instances running with a default password on them. Without prompting, many people will never change the password, and their instances might get taken over by SSH brute force bots or (more likely) malicious co-workers.
Proposed fix https://github.com/colinsauze/cloud-genomics/commit/9f727e02335757e6556b0a9f3b210ba18b33d18a
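Separately from the proposed fix linked above (whose contents are not reproduced here), an instructor could check whether an image accepts password logins over SSH at all. This is a rough sketch under stated assumptions: it reads the text of an `sshd_config` file, relies on the OpenSSH rule that the first value found for a keyword wins, treats an unset `PasswordAuthentication` as enabled (the OpenSSH default), and stops at the first `Match` block instead of evaluating conditional sections. The function name is hypothetical.

```python
def password_auth_enabled(config_text):
    """Return True if the sshd_config text leaves password logins enabled.

    Assumptions: first occurrence of a keyword wins, keywords are
    case-insensitive, and `Include` files and `Match` blocks are not evaluated.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # sshd_config allows "Keyword value" or "Keyword=value".
        parts = line.replace("=", " ", 1).split()
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional sections are out of scope for this sketch
        if keyword == "passwordauthentication" and len(parts) > 1:
            return parts[1] != "no"
    return True  # OpenSSH default when the keyword is unset


example = "PermitRootLogin no\nPasswordAuthentication no\n"
print(password_auth_enabled(example))
```

On an instance, the text would typically come from `/etc/ssh/sshd_config`; changing the default password itself is still done with `passwd`.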
If it's Deprecated, why not archive it?
The TSW Workshop Williams 1.2 image in Cyverse results in a Deploy_error. Unfortunately the Atmosphere interface doesn't provide more details. You can log into the instance, but itools doesn't seem to be installed.
Needs to be consistent w/ organization modules
Please, consider adding a lesson topic to the repository. To do so you can follow the help about how to add topics to the repository. Check out the topics that the Genomics R intro lesson has gotten to add others that may be relevant to this lesson.
This will help people to know which repositories are lessons and also could be used to automate analysis of the repositories.
G'day, I am planning to teach the DataCarpentry Genomics Workshop to a group of research students and staff in a few weeks time. In the course of preparing for the workshop, I noted a few inconsistencies w/in the material that I would like to point out / receive clarification about.
First, there are two AMIs referenced in the material. One of them, ami-3c1c3454, doesn't seem to exist w/in the Community AMIs. I tried following Jason Williams's updated instructions (found here https://jasonjwilliamsny.github.io/cloud-genomics/logging-onto-cloud.html), and encountered a similar problem locating ami-07b4456a.
The image I could locate and run as an instance (ami-6516b30e) doesn't seem to contain all of the .fastq data described in the rest of the lesson.
So, I guess my questions are two:
Is there a single AMI for these lessons, if so, how can I access it?
Is the difference between the AMIs referred to just the data present? If so, could I use the SRA Toolkit prefetch command to add the necessary data to ami-6516b30e and run the workshop with that?
Attached are some screenshots of the portions of the lessons which reference the AMIs I couldn't find. Thanks in advance for your time and help with my enquiry.
Kind Regards,
Collin
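For the SRA Toolkit route raised in the second question, the commands involved might look like the sketch below. It only builds the command lines; it assumes `prefetch` and `fasterq-dump` from the SRA Toolkit are installed on the instance. The accession `SRR0000000` and the output directory are placeholders, not the lesson's actual data, and the function name is hypothetical.

```python
import shlex


def sra_fetch_commands(accessions, outdir="."):
    """Build prefetch + fasterq-dump command lines for each SRA accession.

    Hypothetical helper: it returns argument lists and does not run anything.
    """
    commands = []
    for accession in accessions:
        commands.append(["prefetch", accession])  # download the .sra archive
        commands.append(["fasterq-dump", accession, "--outdir", outdir])  # convert to FASTQ
    return commands


# "SRR0000000" and the directory name are placeholders for illustration.
for command in sra_fetch_commands(["SRR0000000"], outdir="untrimmed_fastq"):
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the real accessions for the workshop data are known.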
AMI ami-6516b30e doesn't seem to be available as a community image in AWS. Searching by Carpentry also doesn't return any results.
We often get a question of how to get data onto the instances. I don't think we have a consistent place where we cover this, and we might want to put it sooner in the lesson.
@tkteal will do this
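As a starting point for that material, a minimal sketch of copying files to and from an instance with `scp` is given here. The user name, host name, and paths are placeholders chosen for illustration, and the helper names are hypothetical; the functions only build the command lines.

```python
import shlex


def scp_upload_command(local_path, user, host, remote_dir):
    """Command line that copies a local file onto the instance."""
    return ["scp", local_path, f"{user}@{host}:{remote_dir}"]


def scp_download_command(remote_path, user, host, local_dir="."):
    """Command line that copies a file from the instance to this machine."""
    return ["scp", f"{user}@{host}:{remote_path}", local_dir]


# All names below are placeholders, not values from the lesson.
print(shlex.join(scp_upload_command("reads.fastq", "example-user", "ec2-host.example.com", "data/")))
print(shlex.join(scp_download_command("results/summary.txt", "example-user", "ec2-host.example.com")))
```

The same pattern applies to a graphical client; the lesson outline elsewhere mentions FileZilla as an alternative.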
Create an Amazon instance with the environment set up for a Genomics Data Carpentry workshop
At DIBSI SC Instructor Training, we discussed the need to re-organize the files and order of the lesson to target a novice audience.
https://github.com/datacarpentry/cloud-genomics
@taylorreiter @krmaas @blasseigne @carynJohansen
Why Cloud Computing? to establish motivation: 01-Introduction2.md
+ intro to different types of platforms from 02-logging-onto-cloud.md
Introduction to commandline basics:
a) Finder/file system hierarchy (GUI) vs. commandline navigation with cd, ls, pwd
b) New lesson needed for Windows users to learn how to force Windows OS to show file hierarchy in explorer
c) Use existing shell lesson on files hierarchy
Connecting to remote instance: 02-why-cloud-computing.md
Data to and from instance: 03-moving-data.md and 06-data-roundtripping.md, + filezilla info from hbc lesson: https://github.com/hbc/dc_2016_04/blob/master/lessons/07_read_qc.md
Keep background processes running with screen (need explanation) and tmux, with info from this lesson: http://www.datacarpentry.org/cloud-genomics/
Single analysis: 04-single-analysis.md
Parallel analysis: 05-parallel-analysis.md
There are multiple possibilities for cloud resources for these workshops. Places hosting multiple workshops that have local resources or organizations that have resources might want to use their own. We therefore likely need parallel documentation on those resources for components of these lessons.
Likely we should keep them all in this repo and name them 01-aws-logging-in-to-cloud.md and 01-iplant-logging-in-to-cloud.md but maybe we need to think about how to manage this and what cloud resources we want to develop lessons for.
This is just an issue to collect information on what people are saying they learned and liked about this lesson.
Students learned that:
"Amazon has cloud computing resources!"
The current AMI has some truncated files which lead to issues. We're updating to have a new AMI, but for now we should add documentation on how to fix this during a workshop.
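Until that documentation exists, a quick structural check can flag truncated files during a workshop. The sketch below rests on assumptions the issue does not state: that the affected files are uncompressed FASTQ with the common four-line record layout (multi-line records are not handled). The function name is hypothetical.

```python
from itertools import islice


def fastq_looks_complete(lines):
    """Return True if every record has 4 lines and matching sequence/quality lengths.

    `lines` is any iterable of text lines, for example an open file handle.
    Assumes four-line FASTQ records; an empty input counts as incomplete.
    """
    lines = iter(lines)
    records = 0
    while True:
        record = list(islice(lines, 4))
        if not record:
            return records > 0  # clean end of file on a record boundary
        if len(record) < 4:
            return False  # file ends in the middle of a record
        header, seq, plus, qual = (field.rstrip("\r\n") for field in record)
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        if len(seq) != len(qual):
            return False  # quality string cut short (or otherwise malformed)
        records += 1


sample = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
print(fastq_looks_complete(sample))
```

For a file on disk, pass the open handle: `with open(path) as handle: fastq_looks_complete(handle)`. A file that fails the check would need to be copied or downloaded again.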
This material is generating quite a lot of interest. Would be good to have explicitly stated license.
I'm working on helping direct instructor attention towards fixing up/contributing to instructor notes. Currently don't have a link to provide for instructor notes for this lesson. Please add - even a blank document would be somewhere to point towards.
Accessing the rendered version of this lesson, both through the URL given by Github and through the datacarpentry.org custom domain, yields 404s.
It's not clear to me what the reason is. I do notice that apparently this is using redcarpet for Markdown rendering, which is being phased out by Github.
Is there any particular reason to do the introduction to Cloud computing for genomics (module 3) before the introduction to command line (module 4)? Personally, and after discussing this with others, I think the command line lesson should come first.
Reading through the contents of the Cloud computing (CC) module 3, I found that cloud-genomics/lessons/4.parallel-analysis.md says: "As we learned in the Unix shell lesson,"
However, that lesson is not until the next module.
Should this order be re-scheduled?
Put together a tutorial on how to access the Genomics Data Carpentry AMI that we can point people to if they want to do things after the workshop.
Is there documentation anywhere on how the virtual machine was prepared?