
tess_scrapers's People

Contributors

03c, aapaolaza, anenadic, bebatut, dependabot[bot], fbacall, inkuzmin, knirirr, malinahlberg, njall


tess_scrapers's Issues

IFB Events parser

We already have a materials scraper for IFB, but they are now implementing schema.org markup for events.
Upcoming and previous events are listed on separate pages:

* http://www.france-bioinformatique.fr/en/evenements_upcoming for upcoming events
* http://www.france-bioinformatique.fr/en/evenements_previous for our previous events.

RDFa extractor is incorrectly extracting authors and target audiences

The extractor is not restricting its search to the relevant nodes.

This is probably due to overuse of the "optional" flag, for example in the following code:

pattern RDF::Query::Pattern.new(material_uri, RDF::Vocab::SIOC.has_creator, :author_obs, optional: true)
pattern RDF::Query::Pattern.new(:author_obs, RDF::Vocab::SCHEMA.name, :authors, optional: true)

If no author_obs is found, authors contains every RDF::Vocab::SCHEMA.name from the whole page.
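
The failure mode can be illustrated without RDF.rb. The following is a minimal plain-Ruby sketch, assuming triples are modelled as [subject, predicate, object] arrays. The helper names names_unscoped and names_scoped and the sample data are hypothetical; they only mimic the behaviour described above and are not the extractor's actual code.

```ruby
# Triples as [subject, predicate, object]; the data is made up for illustration.
TRIPLES = [
  ['material', 'schema:name', 'Intro to RNA-seq'],
  ['provider', 'schema:name', 'Some Institute'],
  ['material', 'schema:audience', 'audience1'],
  ['audience1', 'schema:name', 'Beginners']
].freeze

# Collect the creator nodes attached to a material (may be empty).
def creator_nodes(triples, material)
  triples.select { |s, p, _| s == material && p == 'sioc:has_creator' }.map(&:last)
end

# Mimics the reported bug: when no creator node is bound, the name pattern
# has a free subject and therefore matches every schema:name on the page.
def names_unscoped(triples, material)
  creators = creator_nodes(triples, material)
  triples.select { |s, p, _| p == 'schema:name' && (creators.empty? || creators.include?(s)) }
         .map(&:last)
end

# Scoped lookup: only follow schema:name from creator nodes actually found.
def names_scoped(triples, material)
  creators = creator_nodes(triples, material)
  triples.select { |s, p, _| p == 'schema:name' && creators.include?(s) }.map(&:last)
end
```

With no creator present, names_unscoped returns every name on the page, while names_scoped returns an empty list. One possible fix along these lines is to run the name lookup only when the creator lookup has bound a node.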

Allow updating of records

The scrapers need to be updated so that records can be updated. Ideally there will be some sort of versioning active on the main TeSS site so that the previous revision will be kept.

Add BioConductor training materials

The training materials are listed in TSV format here:
https://github.com/Bioconductor/bioconductor.org/blob/master/etc/course_descriptions.tsv

For each material, add the title column, keyword, instructor (as author), and URL. Use the first 'material' URL as the link and any subsequent URLs as associated resources.
All can be linked with the bio.tools tool Bioconductor:
https://bio.tools/bioconductor

Any upcoming events can be added as events and they should have a link to the added material too!
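
The mapping described above could be sketched roughly as follows. This assumes the TSV has a header row with columns named title, keyword, instructor and material, and that keywords are comma-separated; those column names, the output field names and both helper functions are assumptions for illustration and should be checked against the real file.

```ruby
# Matches http(s) URLs, stopping at whitespace or common separators.
URL_PATTERN = %r{https?://[^\s,;"]+}

# Turn one TSV row (a Hash of column name => value) into a material Hash.
# Column names ('title', 'keyword', 'instructor', 'material') are assumed.
def row_to_material(row)
  urls = row['material'].to_s.scan(URL_PATTERN)
  {
    title: row['title'],
    keywords: row['keyword'].to_s.split(',').map(&:strip).reject(&:empty?),
    authors: [row['instructor']].compact,
    url: urls.first,                      # first URL becomes the main link
    associated_resources: urls.drop(1),   # the rest become associated resources
    tool: 'https://bio.tools/bioconductor'
  }
end

# Parse TSV text whose first non-empty line is a header row.
def parse_course_tsv(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  return [] if lines.empty?

  header = lines.shift.split("\t")
  lines.map { |line| row_to_material(header.zip(line.split("\t", -1)).to_h) }
end
```

Splitting on tabs keeps the sketch dependency-free; a real scraper would probably want a proper TSV parser if any fields are quoted.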

Re-implement GOBLET API scraper

We need to bring back the old GOBLET API scraper, as the RDFa one is insufficient:
github.com/ElixirUK/TeSS_scrapers/blob/9f22a17065e0d2c97f9023568e69979f8ae68dd3/unrefactored_scrapers/goblet_api_scraper.rb
It needs porting to the new scraper framework format.

Reporting of scraper errors via email not working

Could not email: From: TeSS <[email protected]>
To: TeSS <[email protected]>
Subject: Scraper Failure

It would seem that the following scrapers have failed to run:

GalaxyScraper: undefined method `each' for nil:NilClass

GobletRdfaScraper: redirection forbidden: http://www.mygoblet.org/training-portal/materials-xml -> https://www.mygoblet.orgtraining-portal/materials-xml
 | 553 5.1.8 <[email protected]>... Domain of sender address [email protected] does not exist

Prevent needless updates

Some scrapers randomly change the order of certain array fields, causing an activity log to be generated even though nothing really changed.

See: https://tess.elixir-europe.org/materials/key-terms-a-learning-game-for-conceptual-consolidation#activity_log

 scraper updated "Key-terms", a learning game for conceptual consolidation at 2017-10-05 03:10:19 UTC.
changed Remote updated date to: 2017-10-05
scraper updated "Key-terms", a learning game for conceptual consolidation at 2017-10-04 03:10:04 UTC.
changed Target audience to: ["Trainers", "Educators", "Ontologists"]
changed Remote updated date to: 2017-10-04
scraper updated "Key-terms", a learning game for conceptual consolidation at 2017-10-03 09:15:05 UTC.
changed Target audience to: ["Educators", "Trainers", "Ontologists"]
changed Remote updated date to: 2017-10-03
scraper updated "Key-terms", a learning game for conceptual consolidation at 2017-09-28 03:21:47 UTC.
changed Target audience to: ["Ontologists", "Trainers", "Educators"]
changed Remote updated date to: 2017-09-28
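
One way to avoid logging such reorderings is to compare array fields without regard to order before deciding that a record changed. A minimal sketch, assuming records are plain hashes of field name to value; the function name meaningful_changes is hypothetical and not part of the scraper framework:

```ruby
# Return only the fields whose values really changed, treating arrays as
# unordered collections so a reshuffled list does not count as a change.
def meaningful_changes(old_record, new_record)
  new_record.reject do |field, new_value|
    old_value = old_record[field]
    if old_value.is_a?(Array) && new_value.is_a?(Array)
      old_value.sort_by(&:to_s) == new_value.sort_by(&:to_s)
    else
      old_value == new_value
    end
  end
end
```

An update would then be sent only when the returned hash is non-empty. Alternatively, scrapers could sort array fields before submitting them, so the stored order is always stable.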

Integration of workflows in Bio.tools

Integrating TeSS workflows in bio.tools. The bio.tools team wants a read-only workflow viewer in their registry. I said we would do it, as we know the workflow code well. They use git in a private repository, so I am waiting to get access. The development phases are:

  • phase 0: Add EDAM scientific topics to TeSS workflows. We'll have to extend this to include EDAM operations too at some point.

  • phase 1: They'll store some workflows in the original Cytoscape JSON format on their server. We'll add the Cytoscape/TeSS-workflow jQuery library to their codebase to render it.

  • phase 2: Have them read the workflow from our API so it renders the latest version

Fix up SIB scraper

From a user regarding the SIB scraper:

Now a "last" request... could Switzerland appear in the Country drop down menu on the Events left hand side? :)
(hummm, I think Wageningen is not a country...).

BiVi scraper

BiVi - Bioinformatics Visualization - have made some RSS feeds for us. The three feeds are listed below: one for events, one for materials, and one to be ignored for now as it seems to be more about tools than TeSS content. The <description> element encodes some attributes within its text in the format:

#field: <value>

These will need to be extracted with a regular expression; some are lists (e.g. keywords) and will need to be split on commas.
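
A rough sketch of that extraction, assuming each attribute sits on its own line as "#field: value" in already-decoded text, and that the angle brackets in the format above are placeholders rather than literal characters. The function name parse_description and the LIST_FIELDS constant are hypothetical; only keywords is named as a list field in this issue.

```ruby
# Fields whose values are comma-separated lists (assumed; extend as needed).
LIST_FIELDS = %w[keywords].freeze

# Extract "#field: value" lines from a feed item's description text and
# return them as a Hash, splitting list fields on commas.
def parse_description(description)
  description.scan(/^#(\w+):[ \t]*(.*)$/).each_with_object({}) do |(field, value), attrs|
    value = value.strip
    attrs[field] = LIST_FIELDS.include?(field) ? value.split(',').map(&:strip) : value
  end
end
```

Lines that do not start with "#field:" are ignored, so any free text in the description is left for the caller to handle separately.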

Ignore: http://bivi.co/visualisation-feed
Materials: http://bivi.co/presentation-feed
Events: http://bivi.co/event-feed

The content provider text should be:

About BiVi

The Biological Visualisation Network (BiVi) provides a forum for dissemination, training and discussion for life-scientists to discover and promote complex data visualisation ideas and solutions. BiVi, funded by the BBSRC, is a central resource for information on bio-visualisation and is supplemented with annual meetings for networking and educational purposes, focussed around emerging trends in visualisation and challenges facing biology.

DataCarpentry scraper failing

undefined method `include?' for nil:NilClass
/home/tess/TeSS_scrapers/app/scrapers/data_carpentry_scraper.rb:47:in `block in scrape'

Nominatim

TeSS itself is using Nominatim for Geocoder lookups:

ElixirTeSS/TeSS#478

The scrapers should perhaps be updated to operate similarly.

CSC website

The scraper hasn't run for ages.
It needs to filter by the tag 'tess'.

The website has an archive feature: clicking on a past event shows a 'course materials' field at the bottom, which links to the course material.

nil error on SoftwareCarpentryEventsScraper

undefined method `address_components' for nil:NilClass
/home/tess/TeSS_scrapers/app/scrapers/software_carpentry_events_scraper.rb:38:in `block (2 levels) in scrape'
/home/tess/TeSS_scrapers/app/scrapers/software_carpentry_events_scraper.rb:29:in `each'
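
The trace suggests a location lookup returned nil and address_components was then called on it. A defensive sketch, assuming the result exposes address_components as an array of hashes with 'types' and 'long_name' keys (the shape used by Google-style geocoding responses); GeoResult and country_from are hypothetical names, not the scraper's own code:

```ruby
# Hypothetical stand-in for a geocoding result; real results come from a
# geocoding library and may have a different shape.
GeoResult = Struct.new(:address_components)

# Return the country name from a lookup result, or nil when the lookup
# returned nothing, instead of raising NoMethodError on nil.
def country_from(result)
  components = result&.address_components || []
  country = components.find { |c| Array(c['types']).include?('country') }
  country && country['long_name']
end
```

With a guard like this, an event whose venue cannot be geocoded is simply left without a country rather than aborting the whole scraper run.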

Cambridge Events

Training events from Cambridge need to be added. There is no API and no prospect of better-structured data, so it will have to be HTML scraping :(

Create generic schema.org extractor

Separate out our schema.org extraction functionality into a separate repository in bioschemas. This will allow people to extend it for all sorts of schema types and use it in the creation of new tools.
