
sorse-data-filter's Introduction

sorse-data-filter

This project manages the visibility of data and automatises several workflows centred around the abstract submission process of SORSE – A Series of Online Research Software Events. Our aim is to ensure that no connection between personal data and diversity information can be derived from data exported from the Indico event management platform.

The input of the tool is the data exported from the Indico system. Via a command line interface, a user can pick one of the available workflows. For each workflow, the allowed fields from the imported data are specified along with the required output format.

The tool currently supports the following automatised workflows:

  • website: export of Markdown-formatted data after acceptance to the website
  • scheduling: export of data to a Google Drive to support the scheduling process

Further workflows are planned including:

  • rejected: export of information on rejected abstracts
  • mentoring: export of data to a Google Drive to support the mentoring process
  • statistics: export of statistical data to a Google Drive

Configuring workflows

The export of data is centred around the concepts of allowing specific data fields and providing formatting with the help of the templating engine Jinja2.

Configuring allowed fields

Allowed fields within a workflow are specified in the workflows.yaml file. For each workflow an entry is created that holds a section for allow_lists. The entries in this field follow the model specified in models and include the contribution itself, persons and questionnaires holding specific information for each contribution type as well as diversity information.
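
As a rough illustration of the allow-list idea (not the project's actual implementation), the sketch below reduces a nested record to the fields an allow list names. The function name filter_allowed, the record layout and all field names are assumptions made for this example only.

```python
def filter_allowed(record, allow_list):
    """Return a copy of `record` reduced to the fields named in `allow_list`.

    `allow_list` maps a field name either to True (keep the value as-is)
    or to a nested allow list (descend into a dict or a list of dicts).
    """
    result = {}
    for field, rule in allow_list.items():
        if field not in record:
            continue  # nothing to export for this field
        value = record[field]
        if rule is True:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [filter_allowed(item, rule) for item in value]
        else:
            result[field] = filter_allowed(value, rule)
    return result


# Hypothetical contribution as it might look after import
contribution = {
    "title": "Testing research software",
    "persons": [{"name": "A. Author", "email": "a@example.org"}],
    "diversity": {"gender": "not exported"},
}
# Hypothetical allow list for a website-like workflow
website_allow_list = {"title": True, "persons": {"name": True}}

filtered = filter_allowed(contribution, website_allow_list)
print(filtered)
```

Because only fields that are explicitly listed are copied, anything missing from the allow list (here the e-mail address and the diversity block) never reaches the output.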

Configuring templates

Templates that can be used are contained in the templates folder. For example, the template for the website workflow is templates/website.md. Available templating constructs can be found in the documentation of Jinja2. In principle, any text-based file format can be produced with this templating engine.

The name of a template for a given workflow is given via the field output_template in the configuration file.

sorse-data-filter's People

Contributors

eileen-kuehn, chilipp

sorse-data-filter's Issues

Where to write data for rejected contributions?

I am currently hesitant to export data of rejected contributions into the spreadsheet for scheduling.
We could either

  • create another spreadsheet or
  • create another sheet within the spreadsheet.

Export Scheduling Data

Proper scheduling requires proper information on accepted abstracts, so that we can design an inclusive and effective programme for SORSE.

  • Create allow_list for scheduling following defined data visibility
  • Export data to Google Drive

My current plan for scheduling is to create a document or spreadsheet in the Google Drive. We might consider having one document for all tracks and all submission rounds, appending new information to the end of the document. However, I am still unsure how this would work best.

Add event ID

As far as I remember, each submission gets a unique ID in Indico, right? Can we add this to the YAML front-matter and include it in the name of the generated markdown file? I.e. something like event-{{ eventID }}.md?

Updates to Indico form

  1. Remove the streaming question completely.

  2. Change the recording question to this -- 'I agree that my contribution may be recorded, the recording published and my talk streamed on youtube or similar'.

  3. Under that same recording question add this sentence 'We will stream the talk on Youtube or similar for diversity, inclusivity and accessibility reasons.' Please add after the first sentence and before the sentence starting with 'For each contribution'.

  4. Remove 'Track' if possible because we already have this question as 'Contribution type' further up the form

  5. Remove 'Latest Delivery date'

  6. Move all the talk questions to above the workshop questions, starting under 'Gender' so that the talk questions are first

  7. Under 'Posters Mentoring' the button needs to change from 'Montoring wanted' to 'Mentoring wanted' (spelling mistake)

  8. Under 'Panelists' - please change 'BY PROVIDING THESE NAMES YOU CONFIRM THAT THESE INDIVIDUAL HAVE AGREED TO PARTICIPATE AND HAVE GIVEN PERMISSION FOR YOU TO SHARE THEIR INFORMATION.' To 'BY PROVIDING THESE NAMES YOU CONFIRM THAT THESE INDIVIDUALS HAVE AGREED TO PARTICIPATE AND HAVE GIVEN PERMISSION FOR YOU TO SHARE THEIR INFORMATION.'

  9. Please change 'In case you seek for mentoring' to 'More details about mentoring needs'

  10. Under 'Posters only, mentoring' please change this text 'We aim to provide one-on-one mentoring to support inexperienced applicants who's poster is accepted. Please provide further information below.' to 'We aim to provide one-to-one mentoring to support inexperienced applicants whose poster is accepted. Please provide further information below.'

Add registration url

Each event has an individual URL to register as a participant. Can we add this URL here?

Export Mentoring Data

To organise mentoring, we need information on who actually needs mentoring.
My current plan is to export mentoring information to the Google Drive.

  • Create allow_list information for mentoring
  • Export to Google Drive

I currently still don't know if having a document or spreadsheet is the best option. I suppose a spreadsheet. Each entry then should contain a link to the website to allow checking for information such as the abstract.

Add possibility to filter abstracts

Currently, we cannot configure which abstract states are relevant for a given workflow. This should be added as a filter in the main configuration file.
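
One possible shape for such a filter, as a minimal sketch: each workflow lists the abstract states it cares about, and abstracts in other states are skipped. The state names, the workflow_states mapping and the select_by_state helper are assumptions for illustration and not part of the current code base.

```python
def select_by_state(abstracts, allowed_states):
    """Keep only the abstracts whose state is relevant for a workflow."""
    allowed = set(allowed_states)
    return [abstract for abstract in abstracts if abstract["state"] in allowed]


# Hypothetical per-workflow configuration, e.g. as read from the config file
workflow_states = {
    "website": ["accepted"],
    "rejected": ["rejected"],
}

# Hypothetical abstracts with made-up state names
abstracts = [
    {"id": 1, "state": "accepted"},
    {"id": 2, "state": "rejected"},
    {"id": 3, "state": "submitted"},
]

website_abstracts = select_by_state(abstracts, workflow_states["website"])
print([abstract["id"] for abstract in website_abstracts])
```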

fix YAML encoding of &speaker

I just realized,

authors:
    - &speaker name:  someone
author: *speaker

does not work because the anchor attaches to the key, so *speaker renders to name. It needs to be

authors:
    - &speaker 
      name:  someone
author: *speaker

Allow optional fields when traversing

The questions for the different contribution types may differ. However, the allow_list currently handles all types in the same way, although each contribution is directly tied to exactly one contribution type. Thus, an allowed field may be unavailable when a contribution of another type is processed. For these cases, it should become possible to mark fields as optional while traversing the namespace.
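
A minimal sketch of what optional traversal could look like, assuming fields are addressed by dotted paths into nested dictionaries. The traverse function, its optional flag and the field names are hypothetical and only meant to make the proposal concrete.

```python
def traverse(data, path, optional=False):
    """Follow a dotted path through nested dicts.

    A missing field raises KeyError unless `optional` is set,
    in which case None is returned instead.
    """
    current = data
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif optional:
            return None
        else:
            raise KeyError(f"required field {path!r} is missing at {key!r}")
    return current


# Hypothetical talk contribution: it has no poster-specific questions
talk = {"title": "Some talk", "questionnaire": {"talk_length": 20}}

length = traverse(talk, "questionnaire.talk_length")
mentoring = traverse(talk, "questionnaire.poster_mentoring", optional=True)
print(length, mentoring)
```

Required fields still fail loudly, so a typo in the allow_list is not silently ignored; only fields explicitly marked as optional are skipped.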

Group affiliations in website template

@eileen-kuehn: related to SORSE/sorse.github.io#248, do you have any problems if we group the affiliations in the YAML front-matter? This would make it easier in the PDF generation workflow and it's also easy to implement in the website. Instead of

authors:
    - name: Dr. Jack Brookes
      bio: University Health Network
    - name: Mr. Matthew Warburton
      bio: University Health Network
    - name: Prof. Mark Mon-Williams
      bio: University Health Network
    - name: Dr. Faisal Mushtaq
      bio: University Health Network

we would then have

authors:
    - name: Dr. Jack Brookes
      affiliation: 1
    - name: Mr. Matthew Warburton
      affiliation: 1
    - name: Prof. Mark Mon-Williams
      affiliation: 1
    - name: Dr. Faisal Mushtaq
      affiliation: 1
affiliations:
    - name: University Health Network
      index: 1

If you like, I can make a PR to do this. From an OOP perspective you likely don't want the affiliation number to be part of the Person class because the latter should work stand-alone. But we can avoid this and do everything with Jinja2 in the website.md template. I'd just need to add a property named affiliations to the Contribution class, something like

@property
def affiliations(self):
    """Unique affiliations of the contributors"""
    affiliations = []
    for person in self.persons:
        if person.affiliation and person.affiliation not in affiliations:
            affiliations.append(person.affiliation)
    return affiliations
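
To illustrate how the per-author numbers could be derived without touching the Person class, here is a stand-alone sketch using plain dictionaries instead of the real model classes. The function affiliation_indices and the sample data are made up for this example; in practice the same lookup could presumably be done inside the template.

```python
def affiliation_indices(persons):
    """Return (unique affiliations, 1-based affiliation index per person).

    Persons without an affiliation get None as their index.
    """
    affiliations = []
    indices = []
    for person in persons:
        affiliation = person.get("affiliation")
        if not affiliation:
            indices.append(None)
            continue
        if affiliation not in affiliations:
            affiliations.append(affiliation)
        # Position in the unique list, counted from 1 as in the front-matter
        indices.append(affiliations.index(affiliation) + 1)
    return affiliations, indices


# Made-up sample data
persons = [
    {"name": "Person A", "affiliation": "Institute X"},
    {"name": "Person B", "affiliation": "Institute Y"},
    {"name": "Person C", "affiliation": "Institute X"},
    {"name": "Person D"},
]

affiliations, indices = affiliation_indices(persons)
print(affiliations, indices)
```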
