TEAM2018

This is the repo for the 2018 TEAM sprint

This year's topic will be:

our projects

... for the broadest definition of 'projects'. This includes our regular projects, eStep, Flagship, RSD, EU projects, etc.

Due to the fast growth of the eScience Center over the last year, it is no longer practical to jointly work on a single project in the team sprint. That's why we have decided to broaden the scope and go for several projects instead. We'll still schedule everything together in a sprint week (with standups, presentations, etc.) to keep the team building aspect. It's important to get to know your colleagues, especially since we are growing so fast.

Do you have an idea for a sprint?

Do you need the expertise of others for a week? Is your project not moving fast enough? Do you need to adapt an existing tool to a new project? This is your opportunity to submit an idea for a sprint!

If you have an idea, please add it to the ideas folder here:

https://github.com/NLeSC/TEAM2018/tree/master/ideas

There is a template.md you can start from.

Sprint ideas can be about any of the projects we do, provided they:

  • have a well-defined goal.
  • clearly state what expertise is needed.
  • contain enough work for 3-5 people for 4 days.
  • specify which project they relate to (this can also be eStep, Flagship, EU, etc.).

Dates

The sprint dates for this year are:

  • 25-28 June
  • 24-27 September
  • 26-29 November

Which project are we writing these hours on?

The most frequently asked question during team sprints is: on which project do I write the hours? For this year the rules are pretty clear: the hours go to the project that is the topic of your sprint (of course there will always be exceptions, such as multi-project sprints, eStep sprints, etc.).

This does mean that it is good to involve your coordinator (and maybe the PI) when you write a sprint proposal.

Since we expect the sprints to have well-defined goals, you are basically doing a month of work in four days' time, with added expertise you may not have yourself. So it should be pretty easy to convince everyone of the benefits ;-)

The first sprint

The topics for the first sprint are selected and can be found here:

https://github.com/NLeSC/TEAM2018/blob/master/june/overview.md

The second sprint

The schedule for the second sprint can be found here:

https://github.com/NLeSC/TEAM2018/blob/master/september/schedule.md

The third sprint

The schedule for the third sprint can be found here:

https://github.com/NLeSC/TEAM2018/blob/master/november/schedule.md

team2018's People

Contributors

jmaassen, romulogoncalves, a3nne, c-martinez, jspaaks, ipelupessy, arnikz, bpmweel, ridderl, lourensveen, maartenvm, nielsdrost, sverhoeven


team2018's Issues

NLeSC social media impact analysis

Analyze the impact of the Netherlands eScience Center on Twitter. Create data visualizations showing tweets, retweets, mentions and hashtags over time. Link the data to relevant events.
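As a starting point, here is a minimal sketch of such a visualization with pandas and matplotlib; the CSV file name and its column names are hypothetical placeholders for whatever the Twitter export actually contains:

```python
# Minimal sketch: plot weekly counts of tweets, retweets and mentions over time.
# The file "nlesc_tweets.csv" and its columns "created_at", "is_retweet" and
# "mentions_nlesc" are hypothetical placeholders for the real Twitter export.
import pandas as pd
import matplotlib.pyplot as plt

tweets = pd.read_csv("nlesc_tweets.csv", parse_dates=["created_at"])
tweets = tweets.set_index("created_at").sort_index()

weekly = pd.DataFrame({
    "tweets": tweets.resample("W").size(),
    "retweets": tweets["is_retweet"].resample("W").sum(),
    "mentions": tweets["mentions_nlesc"].resample("W").sum(),
})

weekly.plot(title="NLeSC Twitter activity per week")
plt.ylabel("count")
plt.savefig("twitter_activity.png")
```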

Case Law

We need a demo of the CaseLaw project, together with a short description of what the demo will show.

@dafnevk

OMUSE wrapper for DALES

In the Cloud-resolving modeling project we are planning to write a software paper on the OMUSE interface for the Dutch Atmospheric Large Eddy Simulation (DALES). This MPI-parallel Fortran code is exposed as a Python object through this interface and can be manipulated programmatically and dynamically, e.g. from within a Jupyter Notebook. This should make setting up test cases for the program much easier and enable the application of external forcings on the system. It would be nice to demonstrate this value by re-creating dynamically forced test cases, such as the cold air outbreak, from within a Python script.
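Purely as an illustration of what "manipulating the model from a Python script" could look like; the import path and the parameter/forcing method names below are assumptions, not the actual OMUSE/DALES API:

```python
# Illustrative sketch only: drive a DALES run through an OMUSE-style interface
# and apply an external forcing while the model runs. The module path and the
# set_tendency call are assumed for illustration; evolve_model and the `|`
# units notation follow the general AMUSE/OMUSE conventions.
from omuse.community.dales.interface import Dales  # assumed module path
from omuse.units import units                      # assumed units module

dales = Dales(number_of_workers=4)       # start the MPI-parallel Fortran code

time = 0.0 | units.s
end_time = 6.0 | units.hour
while time < end_time:
    # hypothetical call applying a large-scale temperature tendency (forcing)
    dales.set_tendency("temperature", -1.5e-5 | units.K / units.s)
    time += 10.0 | units.minute
    dales.evolve_model(time)             # advance the model to the new time

dales.stop()
```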

EYRA benchmark platform MVP

The Nijmegen Diagnostic Image Analysis Group has created the grand-challenge.org website for hosting benchmark challenges in the medical imaging domain. We want to extend this platform to become a multi/cross-domain benchmark challenge platform. In a previous sprint, a demo version of the grand-challenge.org site was deployed on an HPC cloud server. During this sprint, we will update this instance to the latest version and start extending it to suit our needs. We will implement a full REST API, add community-related functionality, and start working on a new React-based user interface. If time permits, we will implement a demo challenge as well.
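grand-challenge.org is a Django application, so the REST API would plausibly be built with Django REST Framework. A hedged sketch of one possible endpoint follows; the `Challenge` model, its import path and its fields are placeholders, not the real grand-challenge code:

```python
# Hypothetical sketch of a read-only REST endpoint with Django REST Framework.
# The Challenge model, its import path and its field names are placeholders.
from rest_framework import routers, serializers, viewsets

from grandchallenge.challenges.models import Challenge  # assumed import path


class ChallengeSerializer(serializers.ModelSerializer):
    class Meta:
        model = Challenge
        fields = ["id", "title", "description", "creator"]  # assumed fields


class ChallengeViewSet(viewsets.ReadOnlyModelViewSet):
    queryset = Challenge.objects.all()
    serializer_class = ChallengeSerializer


router = routers.DefaultRouter()
router.register("challenges", ChallengeViewSet)
# the project's urls.py would then include router.urls under e.g. /api/
```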

3De-e-Chem

3De-e-Chem demo

For the project http://3d-e-chem.github.io/ multiple KNIME workflows and nodes were written. They are described in two papers: http://dx.doi.org/10.1021/acs.jcim.6b00686 and http://dx.doi.org/10.1002/cmdc.201700754.

A Vagrant virtual machine (https://3d-e-chem.github.io/3D-e-Chem-VM/) has been made with KNIME, the workflows and nodes installed in it.

For the eScience symposium 2017 we made a couple of screencasts.
For a demo I would like to use the screencasts and write a storyboard that will lead the presenter through the virtual machine, opening and running several workflows.

AMUSE

@ipelupessy it seems there was a suggestion to work on AMUSE during the November sprint; could you provide us with some more information?

UncertaintyViz

Can someone give details about this demo? To whom should we assign it?

TICCLAT

The TICCLAT project is about extending TICCL, software that does OCR post-correction and/or spelling correction and/or word normalization based on the word forms it sees in the corpus. In this project we want to run a number of experiments to evaluate the performance of different configurations of TICCL. The sprint will be about setting up the pipeline/infrastructure to run these experiments. We will focus on the task of OCR post-correction and have a dataset available. Hopefully, we'll be able to run the baseline experiment by the end of the sprint.

Together with @egpbos
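A rough sketch of how such a configuration sweep could be set up; the `ticcl-pipeline` command, its flags, the file names and the scoring function are placeholders, since the real invocation depends on how TICCL is installed:

```python
# Sketch of a parameter sweep over TICCL configurations for OCR post-correction.
# The command name, its flags, the input/gold file names and the scoring
# function are placeholders for illustration only.
import csv
import itertools
import subprocess


def word_accuracy(candidate_file, gold_file):
    """Fraction of whitespace-separated tokens that match the gold standard."""
    with open(candidate_file) as c, open(gold_file) as g:
        cand, gold = c.read().split(), g.read().split()
    matches = sum(a == b for a, b in zip(cand, gold))
    return matches / max(len(gold), 1)


ld_values = [1, 2, 3]       # hypothetical Levenshtein-distance settings
freq_cutoffs = [5, 10]      # hypothetical corpus-frequency cutoffs

results = []
for ld, cutoff in itertools.product(ld_values, freq_cutoffs):
    out_file = f"corrected_ld{ld}_f{cutoff}.txt"
    subprocess.run(
        ["ticcl-pipeline", "--ld", str(ld), "--freq-cutoff", str(cutoff),
         "--input", "ocr_corpus.txt", "--output", out_file],
        check=True,
    )  # placeholder command; substitute the actual TICCL invocation here
    results.append({"ld": ld, "cutoff": cutoff,
                    "accuracy": word_accuracy(out_file, "gold_standard.txt")})

with open("experiment_results.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["ld", "cutoff", "accuracy"])
    writer.writeheader()
    writer.writerows(results)
```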

Paper: Exascale literature study

@stijnh Jason has suggested that you will be working on a paper during the sprint. Could you give us a bit more information about it: a short title (for the issue subject) and what it will be about?

GGIR: An R package for multi-day high resolution accelerometer data analysis

Title:
GGIR: An R package for multi-day high resolution accelerometer data analysis

Abstract:
The R package GGIR converts multi-day high-resolution raw data from wearable movement sensors into insightful reports for researchers investigating human daily physical activity and sleep. The package includes a range of literature-supported methods to process, clean and analyse the data, and provides day-by-day as well as weekly estimates of physical activity and sleep parameters. In addition to the separate functions for the different steps, the package also comes with a shell function that enables the user to process a set of input files and produce CSV summary reports with a single function call, which is ideal for users less proficient in R.

Editor:
Me (Vincent)

Relation with NLeSC:
Substantial parts of the code were developed as part of projects we did with the University of Exeter and University College London.

Sprint objective:
I recently drafted this paper together with three domain scientists. The text itself is already fairly mature, but I could use some help with:

  1. Create a (short) demonstration video of how the software works; for example, an intro with a demo of how data is collected, followed by a screen-capture summary of how to work with the software. Such a video could be a nice special feature of the paper and an extension to the existing documentation materials.
  2. Brainstorm about how to best profile the software and present the profiling results.
  3. I will bring an example movement sensor, so that one or two team members can record and analyse their own movement and sleep during the sprint week.
  4. General improvements to the text, but this is not a major concern.
  5. Get the paper ready for submission to SoftwareX, and circulate it for a final round of feedback.

Number of engineers needed:
My estimate is that this work can be done with a fairly small team of engineers (1 or 2 in addition to myself)

Crowd simulation + Monte Carlo methods

Goal: In order to complete an existing manuscript, we need to validate the method by simulation. Sonja + 1 or 2 people with a background in analytics/statistics would be enough.

Background: We have data with estimates of concert visitors' positions, obtained using Wi-Fi technology and their smartphones in the (then) Amsterdam Arena. The data come with a lot of errors and uncertainties. We have proposed a method for estimating the crowd density from these data. The advantage of this method over related work is that the estimates become more accurate as the crowd size increases. This has been shown theoretically as a proof of concept, but we do not have enough video data to confirm it experimentally. Therefore, the idea is to validate the method with simulations.

What needs to be done (a toy simulation sketch follows the list):

  1. Use a state-of-the-art crowd simulator to simulate a concert crowd at various increasing crowd densities.
  2. Use the actual Wi-Fi location-estimation data to model the probability distributions of the errors in the estimates.
  3. Draw samples from those distributions to introduce errors/uncertainties into the simulated data over time.
  4. Apply the proposed method to estimate the crowd density.
  5. Compare the crowd density obtained with the method to the density given by step 1.
  6. Check whether the relative difference in step 5 becomes smaller as the crowd size increases. If it does (it should!), we have completed the paper :).
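To make step 6 concrete, here is a toy Monte Carlo version of this loop, with a uniform synthetic crowd standing in for the simulator output, Gaussian noise standing in for the modelled Wi-Fi errors, and a simple counting estimator standing in for the proposed method (all of these are simplifications of the real setup):

```python
# Toy validation loop: simulate a crowd, perturb positions with Wi-Fi-like
# errors, estimate the density in a central cell from the noisy positions,
# and check whether the relative error shrinks as the crowd grows.
# All numbers and the estimator itself are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(42)
error_sigma = 5.0        # std. dev. of the Wi-Fi position error in m (assumed)
cell_area = 20.0 * 20.0  # central 20 m x 20 m evaluation cell

for n_people in [1_000, 10_000, 100_000]:
    # Step 1: "simulate" a crowd uniformly on a 100 m x 100 m floor.
    positions = rng.uniform(0.0, 100.0, size=(n_people, 2))

    # Steps 2-3: perturb the positions with errors drawn from a noise model.
    noisy = positions + rng.normal(0.0, error_sigma, size=positions.shape)

    # Step 4: estimate the density in the central cell from the noisy data
    # (a stand-in for the proposed density-estimation method).
    in_cell = np.all((noisy >= 40.0) & (noisy <= 60.0), axis=1)
    estimated_density = in_cell.sum() / cell_area

    # Steps 5-6: compare with the true density in that cell.
    truly_in_cell = np.all((positions >= 40.0) & (positions <= 60.0), axis=1)
    true_density = truly_in_cell.sum() / cell_area
    rel_error = abs(estimated_density - true_density) / true_density
    print(f"n = {n_people:>7d}   relative error = {rel_error:.3f}")
```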

Allelic Variant Explorer

Allelic Variant Explorer demo

The Allelic Variation Explorer (AVE) is a web application to visualize (clustered) single-nucleotide variants across genomes.

There is a Docker image with the application and a sample dataset at https://github.com/nlesc-ave/ave-demo

This Docker image shows that the application works; to create a proper demo, a scientific storyboard has to be written.

The story I have in mind is to take a gene which encodes some visual characteristic (color, shape, stem size, etc.) of a tomato and show that different tomato strains look different. Show the visual differences as pictures, combined with the clustering of the genomes in the explorer.

This would require some literature searching, and if the gene is not included in the sample dataset, a new dataset must be constructed.
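To give a feel for the clustering part, here is a toy example that clusters a handful of genomes by their variant profiles with SciPy; the genome names and the variant matrix below are invented:

```python
# Toy illustration: hierarchically cluster genomes by their single-nucleotide
# variant profiles. Rows are genomes, columns are variant positions
# (1 = alternate allele present). The data are invented for illustration.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

genomes = ["strain-A", "strain-B", "strain-C", "strain-D"]
variants = np.array([
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 1],
])

distances = pdist(variants, metric="hamming")   # pairwise variant distances
tree = linkage(distances, method="average")     # hierarchical clustering
clusters = fcluster(tree, t=2, criterion="maxclust")

for genome, cluster in zip(genomes, clusters):
    print(f"{genome}: cluster {cluster}")
```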

ReciPy

@jvdzwaan could you elaborate a bit on what ReciPy can do and what we could do during the Sprint?
Who else has worked with ReciPy?
What is the best label for it: Software dev or Soft/Meth Paper?

Paper 3

@ridderl just for the record, could you give a title and a two-line summary of the paper?

SPOT demo

Sprint Name

SPOT-If-I...

Team leader

Faruk

Target project

IDARK / SPOT

Expertise required

  • Ability to use a web browser, keyboard and mouse

Size of team

3 - 5

Description

SPOT is a generic visual data analytics tool for multi-dimensional datasets. Although it was primarily developed for the High Energy Physics project IDARK, it is generic software that can be used in other domains where the dataset is complex. Currently, we have a demo using the famous Titanic dataset (see http://spot.esciencecenter.nl).

Goals

  • The main goal will be to focus on finding a nice (scientific) use case and an interesting dataset for a demo
  • The demo will be used in external presentations and in the SPOT workshops we are planning to organize once the materials are ready
  • Update the demo web site
  • Dockerize the demo
  • Identify missing features to be used in certain domains

Fair eWaterCycle I

The outcome of the EOSCPilot Hydro project could also lead to a paper (and a demo)

The main point is how to make an entire scientific software pipeline FAIR.

Includes Cylc, CWL, Docker, Singularity, OneData, and Notebooks.

Main author tbd.

EOSCPfL

The outcome of the EOSC Pilot for LOFAR project could also lead to a paper (and a demo).

The goal of this project is to unlock the LOFAR Long Term Archive (LTA). It contains more than 28 PB of LOFAR observations stored as visibility datasets, with almost zero scientific output so far. Almost all astronomical science starts with sky images, so these datasets have to be calibrated and imaged. This is very labour intensive: there are a lot of steps in processing uncalibrated visibility datasets into a sky image that can be used for publication. That is a main reason why the LTA is hardly used, and that is a waste of taxpayers' money. We want to bridge this gap by automating the processing and taking care of 70% of the work of the astronomer. By selecting an observation from a web portal and starting the processing in just a few mouse clicks, a reasonably good sky image will be produced. For an astronomer this should be enough to decide whether it contains interesting science. In that case he/she can fine-tune the processing to make a close-to-perfect sky image.

In the last sprint, it was shown that we could select observations directly from the archive and start processing them to coarsely calibrated compressed datasets with just a few mouse clicks.

However, it turned out that this did not include "staging" i.e. copying the observation from the LTA tapes to disk. Also, it did not include imaging the coarsely calibrated compressed datasets. These steps have to be added.

We want to add these steps, show that we can bridge the gap and unlock the LTA. This would make the LTA a much more attractive astronomical resource.

grpc-bmi-containers

A paper on how the combination of gRPC, BMI, and Docker makes for a really nice way to share geo models with others (and makes them reproducible, etc.).

Main author tbd.
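For context, the BMI part of this combination is a small set of control and data functions that every model wrapper exposes; below is a stripped-down Python rendering of that interface (the full BMI specification has more functions, e.g. for grids and variable metadata, and its get_value takes a destination array):

```python
# Stripped-down sketch of the Basic Model Interface (BMI) a geo model exposes.
# A gRPC server running inside a Docker container can forward calls like these
# to the model, so a client can drive it remotely. Simplified with respect to
# the full BMI specification.
from abc import ABC, abstractmethod

import numpy as np


class Bmi(ABC):
    @abstractmethod
    def initialize(self, config_file: str) -> None:
        """Read the model configuration and set up the initial state."""

    @abstractmethod
    def update(self) -> None:
        """Advance the model state by one time step."""

    @abstractmethod
    def finalize(self) -> None:
        """Shut the model down and release resources."""

    @abstractmethod
    def get_current_time(self) -> float:
        """Return the current model time."""

    @abstractmethod
    def get_value(self, name: str) -> np.ndarray:
        """Return the current values of a named model variable."""
```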

Publications from January 2014 until June 2018

  • We need a list of all publications which are listed in the project reports from January 2014 until June 2018.
  • Check if all of them are in Zotero (a sketch of a scripted cross-check follows this list).
  • Check what is in Zotero but not in this list, and whether it should be there.
  • Go after publications which are not listed in the final reports.
  • Collect publications of projects which do not have a final report, i.e., either projects for which a report was not created or projects that are still in execution.
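The Zotero cross-check mentioned above could be scripted with the pyzotero client; in the sketch below, the library ID, the API key and the `report_publications.csv` file (with a `doi` column) are placeholders:

```python
# Sketch: compare publication DOIs collected from the project reports against
# what is already in the Zotero library. The library ID, API key and the CSV
# of report publications are placeholders.
import csv

from pyzotero import zotero

zot = zotero.Zotero("GROUP_LIBRARY_ID", "group", "API_KEY")
zotero_dois = {
    item["data"]["DOI"].lower()
    for item in zot.everything(zot.top())
    if item["data"].get("DOI")
}

with open("report_publications.csv") as fh:
    report_dois = {row["doi"].lower() for row in csv.DictReader(fh) if row["doi"]}

print("In the reports but missing from Zotero:", sorted(report_dois - zotero_dois))
print("In Zotero but not in the reports:", sorted(zotero_dois - report_dois))
```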

Ecosystems network

Please update the title and give a short description of the paper. I need to have an issue open to assign people to, but I do not know the details of the work.
