The tess_scrapers from elixirtess

BITS VIB events (again)

The BITS VIB people are publishing JSON-LD descriptions of all their content.
It would be worth replacing the existing BITS scraper with one for this feed: http://dev.bits.vib.be/eulife/all_events-vib_conferences.json
Keep the old data and associate any new records with the existing content provider.

Norway Training events

http://www.bioinfo.no/Training

Not well formatted, so don't attempt a scraper yet. Approaching them about schema.org first.

IFB Events parser

We've got a materials scraper, but they're now implementing schema.org for events.
They list upcoming and previous events separately:

* http://www.france-bioinformatique.fr/en/evenements_upcoming for upcoming events
* http://www.france-bioinformatique.fr/en/evenements_previous for our previous events.

RDFa extractor is incorrectly extracting authors and target audiences

Not searching within nodes properly.

Probably due to overuse of the "optional" flag, for example in the following code:

pattern RDF::Query::Pattern.new(material_uri, RDF::Vocab::SIOC.has_creator, :author_obs, optional: true)
pattern RDF::Query::Pattern.new(:author_obs, RDF::Vocab::SCHEMA.name, :authors, optional: true)

If no author_obs nodes are found, authors contains every RDF::Vocab::SCHEMA.name from the whole page.
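
The intended behaviour can be illustrated without the RDF library. This is only a plain-Ruby sketch of the scoping logic, not RDF.rb code: the Triple struct, the predicate strings and the sample data are all made up for illustration. Names are looked up only through author nodes that are actually attached to the material, so a material with no authors yields an empty list instead of every name on the page.

```ruby
# Plain-Ruby stand-in for an RDF graph: each triple is subject, predicate, object.
# The predicate strings are illustrative labels, not RDF.rb vocabulary objects.
Triple = Struct.new(:subject, :predicate, :object)

# Return every object for the given subject/predicate pair.
def objects_for(triples, subject, predicate)
  triples.select { |t| t.subject == subject && t.predicate == predicate }
         .map(&:object)
end

# Look up author names only through author nodes attached to the material.
# With no author node the result is empty, not every name on the page.
def author_names(triples, material_uri)
  author_nodes = objects_for(triples, material_uri, 'sioc:has_creator')
  author_nodes.flat_map { |node| objects_for(triples, node, 'schema:name') }
end

# Made-up sample page: one material with an author, one without.
PAGE = [
  Triple.new('material/1', 'sioc:has_creator', '_:author1'),
  Triple.new('_:author1', 'schema:name', 'Ada Example'),
  Triple.new('_:org', 'schema:name', 'Some Organisation'),
  Triple.new('material/2', 'schema:name', 'A material without authors')
].freeze

p author_names(PAGE, 'material/1')
p author_names(PAGE, 'material/2')
```

In the real extractor the equivalent would be to run the name lookup only when the author-node pattern has matched, rather than marking both patterns optional.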

Check out IFB scraper

a) scraper hasn't been run in a long while

b) Duplicate records for some materials https://tess.elixir-europe.org/materials?content_provider=IFB+French+Institute+of+Bioinformatics&page=3

c) New scraper for events - https://www.france-bioinformatique.fr/en/evenements_upcoming

Use new RDF extractor gem in scrapers

https://github.com/ElixirTeSS/TeSS_RDF_Extractors

Allow updating of records

The scrapers need to be updated so that records can be updated. Ideally there will be some sort of versioning active on the main TeSS site so that the previous revision will be kept.

Goblet RDFa scraper redirection is broken

GobletRdfaScraper: redirection forbidden: http://www.mygoblet.org/training-portal/materials-xml -> https://www.mygoblet.orgtraining-portal/materials-xml

(missing / between domain and path)

But the actual URL returns a 404: https://www.mygoblet.org/training-portal/materials-xml

Contact the maintainer and let them know their web server config is broken.

Futurelearn scraper is putting the venue name in the "end" date field

Add BioConductor training materials

The training materials are listed in TSV format here:
https://github.com/Bioconductor/bioconductor.org/blob/master/etc/course_descriptions.tsv

For each material, add the title, keyword, instructor (as author) and URL columns. Use the first 'material' URL as the link, and any subsequent URLs as associated resources.
All can be linked with the bio.tools tool entry for Bioconductor:
https://bio.tools/bioconductor

Any upcoming events can be added as events and they should have a link to the added material too!
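
A minimal sketch of the column mapping described above. The column names (title, keywords, instructor, material) and the way multiple URLs are separated within a cell are assumptions; check them against the real header row of course_descriptions.tsv before relying on this.

```ruby
# Hypothetical column names: 'title', 'keywords', 'instructor', 'material'.
# Returns one hash per TSV row, using the first URL as the main link.
def parse_course_tsv(tsv_text)
  header, *rows = tsv_text.lines.map(&:chomp).reject(&:empty?)
  return [] if header.nil?

  columns = header.split("\t")
  rows.map do |line|
    # -1 keeps trailing empty cells so columns stay aligned.
    record = columns.zip(line.split("\t", -1)).to_h
    # Pull every http(s) URL out of the cell, whatever separates them.
    urls = record['material'].to_s.scan(%r{https?://[^\s,;]+})
    {
      title: record['title'],
      keywords: record['keywords'].to_s.split(',').map(&:strip).reject(&:empty?),
      authors: [record['instructor']].compact,
      url: urls.first,
      associated_resources: urls.drop(1)
    }
  end
end

# Made-up sample row for illustration only.
sample = "title\tkeywords\tinstructor\tmaterial\n" \
         "Intro course\tR, sequencing\tA. Instructor\t" \
         "https://example.org/a.pdf, https://example.org/b.zip\n"
p parse_course_tsv(sample)
```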

Re-implement GOBLET API scraper

We need to bring back the old GOBLET API scraper, as the RDFa one is insufficient:
github.com/ElixirUK/TeSS_scrapers/blob/9f22a17065e0d2c97f9023568e69979f8ae68dd3/unrefactored_scrapers/goblet_api_scraper.rb
It needs porting to the new scraper framework format.

Reporting of scraper errors via email not working

Could not email: From: TeSS <[email protected]>
To: TeSS <[email protected]>
Subject: Scraper Failure

It would seem that the following scrapers have failed to run:

GalaxyScraper: undefined method `each' for nil:NilClass

GobletRdfaScraper: redirection forbidden: http://www.mygoblet.org/training-portal/materials-xml -> https://www.mygoblet.orgtraining-portal/materials-xml
 | 553 5.1.8 <[email protected]>... Domain of sender address [email protected] does not exist

Research IT Training

http://www.ucl.ac.uk/isd/services/research-it/training/#training
Contact them (or James Hetherington) about structured data first

Prevent needless updates

Some scrapers randomly change the order of certain array fields, causing an activity log to be generated even though nothing really changed.

See: https://tess.elixir-europe.org/materials/key-terms-a-learning-game-for-conceptual-consolidation#activity_log

 scraper updated "Key-terms", a learning game for conceptual consolidation at 2017-10-05 03:10:19 UTC.
changed Remote updated date to: 2017-10-05
scraper updated "Key-terms", a learning game for conceptual consolidation at 2017-10-04 03:10:04 UTC.
changed Target audience to: ["Trainers", "Educators", "Ontologists"]
changed Remote updated date to: 2017-10-04
scraper updated "Key-terms", a learning game for conceptual consolidation at 2017-10-03 09:15:05 UTC.
changed Target audience to: ["Educators", "Trainers", "Ontologists"]
changed Remote updated date to: 2017-10-03
scraper updated "Key-terms", a learning game for conceptual consolidation at 2017-09-28 03:21:47 UTC.
changed Target audience to: ["Ontologists", "Trainers", "Educators"]
changed Remote updated date to: 2017-09-28
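
One possible way to avoid these no-op updates is to compare array fields without regard to order before deciding that something changed. The sketch below assumes records are plain hashes; the helper names are hypothetical and not part of the existing scrapers.

```ruby
# Normalise a field value so that arrays compare equal regardless of order.
def normalise(value)
  value.is_a?(Array) ? value.map(&:to_s).sort : value
end

# Return only the scraped fields whose normalised value differs from the
# existing record; an empty hash means there is nothing worth updating.
def meaningful_changes(existing, scraped)
  scraped.reject { |field, value| normalise(existing[field]) == normalise(value) }
end

existing = { title: 'Key-terms', target_audience: %w[Trainers Educators Ontologists] }
scraped  = { title: 'Key-terms', target_audience: %w[Educators Trainers Ontologists] }
p meaningful_changes(existing, scraped)
```

The same effect could be had by sorting array fields in the scraper before submitting them, so the stored order is stable from run to run.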

Bioinformatics Canada workshops and lectures

https://bioinformatics.ca/

Elixir Czech site

http://www.elixir-czech.cz/events/workshops-and-courses

They have Google Calendar links.

Implement a class for extracting Material metadata from RDF

Needs to handle cases where metadata isn't available, or is in a different form to what was expected.
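
A rough sketch of what such a class might look like, assuming the RDF query results have already been collected into a hash. The class name, field names and input shape are all hypothetical. The point is only that missing values, single strings and arrays are coerced into one predictable form.

```ruby
# Hypothetical wrapper around the raw values extracted for one material.
class MaterialMetadata
  def initialize(raw)
    @raw = raw || {}
  end

  # Single-valued field: first usable value, or nil when nothing was found.
  def title
    values(:title).first
  end

  # Multi-valued fields: always an array, possibly empty.
  def authors
    values(:authors)
  end

  def target_audience
    values(:target_audience)
  end

  def to_h
    { title: title, authors: authors, target_audience: target_audience }
  end

  private

  # Accept nil, a single value or an array; drop blanks and duplicates.
  def values(key)
    Array(@raw[key]).map { |v| v.to_s.strip }.reject(&:empty?).uniq
  end
end

p MaterialMetadata.new({ title: ' A title ', authors: 'One Author' }).to_h
```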

Pass material/event ids/urls to TeSS to email when not scraped

Cross issue with: ElixirTeSS/TeSS#360

Integration of workflows in Bio.tools

Integrating TeSS workflows in bio.tools. The bio.tools team wants a read-only workflow viewer in their registry. I said we'd do it, as we know the code well. They're using git in a private repo, so I am waiting to get access. The development phases are:

phase 0: Add EDAM scientific topics to TeSS workflows. We'll have to extend this to include EDAM operations too at some point.
phase 1: They'll store some workflows in the original Cytoscape JSON format on their server. We'll add the Cytoscape/TeSS-workflow jQuery library to their codebase to render them.
phase 2: Have them read the workflow from our API so it renders the latest version.

Fix up SIB scraper

From a user regarding the SIB scraper:

Now a "last" request... could Switzerland appear in the Country drop down menu on the Events left hand side? :)
(hummm, I think Wageningen is not a country...).

BiVi scraper

BiVi (the Biological Visualisation Network) has made some RSS feeds for us. The three feeds are listed below: one for events, one for materials, and one to be ignored for now, as it seems to cover tools rather than TeSS content. The <description></description> element encodes some attributes within its text in the format

#field: <value>

So these will need to be extracted with a regex. Some are lists (e.g. keywords) and will need to be split on commas.
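
A minimal sketch of that extraction. It assumes each attribute sits on its own line of the already-decoded description text, and that the angle brackets in the format above are placeholders rather than literal characters. The helper name, the sample text and the set of list fields (only keywords is named here) are assumptions.

```ruby
# Fields whose values are comma-separated lists (assumed; extend as needed).
LIST_FIELDS = %w[keywords].freeze

# Pull "#field: value" lines out of a feed item's description text.
def parse_description_fields(description)
  description.to_s.scan(/^#(\w+):(.*)$/).each_with_object({}) do |(field, value), fields|
    key = field.downcase
    text = value.strip
    fields[key] = if LIST_FIELDS.include?(key)
                    text.split(',').map(&:strip).reject(&:empty?)
                  else
                    text
                  end
  end
end

# Made-up description for illustration only.
text = "A talk about networks.\n#keywords: graphs, networks , visualisation\n#venue: London\n"
p parse_description_fields(text)
```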

Ignore: http://bivi.co/visualisation-feed
Materials: http://bivi.co/presentation-feed
Events: http://bivi.co/event-feed

Text for content provider should be

About BiVi

The Biological Visualisation Network (BiVi) provides a forum for dissemination, training and discussion for life-scientists to discover and promote complex data visualisation ideas and solutions. BiVi, funded by the BBSRC, is a central resource for information on bio-visualisation and is supplemented with annual meetings for networking and educational purposes, focussed around emerging trends in visualisation and challenges facing biology.

Check out ELIXIR scraper

The Hub said our scraper has not been retrieving all of the events from here https://www.elixir-europe.org/events

Software Carpentry events

Software Carpentry has an ICS file of all training events. It should be easy enough to write a scraper for http://software-carpentry.org/workshops.ics
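
A bare-bones sketch of reading events from ICS text using only the standard library. It handles line folding and VEVENT blocks, keeps only the date part of DTSTART/DTEND, and ignores time zones and escaped characters. The output field names are assumptions, and a dedicated iCalendar library may be a better choice for the real scraper.

```ruby
require 'date'

# Undo iCalendar line folding: a line starting with a space or tab
# continues the previous line.
def unfold_ics(text)
  text.gsub(/\r?\n[ \t]/, '')
end

# Keep only the YYYYMMDD part; times and time zones are dropped.
def parse_ics_date(value)
  return nil if value.nil? || value.empty?

  Date.strptime(value[0, 8], '%Y%m%d')
end

# Parse the VEVENT blocks of an ICS document into simple hashes.
def parse_ics_events(text)
  events = []
  current = nil
  unfold_ics(text).each_line do |raw|
    line = raw.strip
    case line
    when 'BEGIN:VEVENT' then current = {}
    when 'END:VEVENT'
      events << current if current
      current = nil
    else
      next unless current

      name, value = line.split(':', 2)
      next if value.nil?

      # Drop property parameters such as ;VALUE=DATE
      key = name.split(';').first.to_s.upcase
      current[key] = value
    end
  end
  events.map do |e|
    { title: e['SUMMARY'], url: e['URL'],
      start: parse_ics_date(e['DTSTART']), end: parse_ics_date(e['DTEND']) }
  end
end
```

The input would be the body of the workshops.ics file, fetched however the scraper framework normally fetches pages.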

Investigate why search for 'Finland' doesn't bring up all CSC events

Metagenomics Training

http://metagenomics-training-material-roundup.readthedocs.io/en/latest/training/index.html

Australian Events Calendar

https://calendar.google.com/calendar/ical/mcgrath.annette%40gmail.com/private-630fc41b06ab65d8ed820d56932f130b/basic.ics

For http://www.abacbs.org/about/ website

Refactored BITSVIB RDFa scraper treats scientific topics/audience as keywords

Need to check the query being used is sensible

DataCarpentry scraper failing

undefined method `include?' for nil:NilClass
/home/tess/TeSS_scrapers/app/scrapers/data_carpentry_scraper.rb:47:in `block in scrape'
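
The code at line 47 is not shown here, so the following is only a guess at the kind of guard that would avoid this error: coerce a possibly missing value to a string before calling include? on it. The row structure and the 'tags' key are hypothetical.

```ruby
# Select rows whose (possibly missing) tags field mentions the given tag.
# nil.to_s is "", so rows without tags are skipped instead of raising.
def tagged_with(rows, tag)
  rows.select { |row| row['tags'].to_s.include?(tag) }
end

# Made-up rows for illustration only.
rows = [
  { 'title' => 'Workshop A', 'tags' => 'genomics, R' },
  { 'title' => 'Workshop B', 'tags' => nil }
]
p tagged_with(rows, 'genomics').map { |row| row['title'] }
```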

New scraper required for SIB content

https://www.sib.swiss/training/upcoming-training-events

Nominatim

TeSS itself is using Nominatim for Geocoder lookups:

ElixirTeSS/TeSS#478

The scrapers should perhaps be updated to operate similarly.

Goblet scraper keywords are URLs

Should just be the terms

Also scientific_topic_names are now missing

Refactor scrapers

Add a superclass that implements some common things that a scraper needs to do.
Rewrite existing scrapers to use this superclass

Relates to ElixirTeSS/TeSS#123
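
One possible shape for such a superclass, sketched under the assumption that each scraper only needs to implement a scrape method. The class and method names are illustrative, not the project's actual framework. The base class records failures in the same "ScraperName: message" form seen in the failure reports, so one broken scraper does not stop the others.

```ruby
# Hypothetical base class for scrapers.
class BaseScraper
  attr_reader :records, :errors

  def initialize
    @records = []
    @errors = []
  end

  # Template method: run the subclass's #scrape and capture runtime errors.
  # Returns true on success, false if the scraper raised a StandardError.
  def run
    scrape
    true
  rescue StandardError => e
    @errors << "#{self.class.name}: #{e.message}"
    false
  end

  # Subclasses must override this.
  def scrape
    raise NotImplementedError, "#{self.class.name} must implement #scrape"
  end

  protected

  # Shared helper for collecting scraped records.
  def add_record(attributes)
    @records << attributes
  end
end

# Illustrative subclasses with made-up data.
class ExampleScraper < BaseScraper
  def scrape
    add_record(title: 'Example event', url: 'https://example.org/event')
  end
end

class BrokenScraper < BaseScraper
  def scrape
    nil.each { |item| item }
  end
end

scrapers = [ExampleScraper.new, BrokenScraper.new]
scrapers.each(&:run)
puts scrapers.flat_map(&:errors)
```

The collected errors could then be handed to whatever mechanism sends the failure e-mail.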

NBIS.se

Parse NBIS events. NBIS is the Swedish node. https://www.googleapis.com/calendar/v3/calendars/bils.elixir%40gmail.com/events?key=AIzaSyA7tQAGCL4d8mNBSUZRBhedexrswhzgY6s&orderBy=startTime&singleEvents=true

Revisit Software Carpentry schema.org

Look back into making a pull request for adding schema.org to the Software Carpentry template. Contact jduckles.

Send a notification e-mail if scrapers do not run

If for some reason a scraper fails to run, send out an e-mail informing the TeSS info mailing list / Niall.

CSC website

Scraper hasn't run for ages.
Need to filter by tag that says 'tess'.

There is an archive feature on the website. Click on an event that has been and gone: a field at the bottom called 'course materials' gives you a link to the course material.

Portuguese node events

http://elixir-portugal.org/?q=event/ib16s

nil error on SoftwareCarpentryEventsScraper

undefined method `address_components' for nil:NilClass
/home/tess/TeSS_scrapers/app/scrapers/software_carpentry_events_scraper.rb:38:in `block (2 levels) in scrape'
/home/tess/TeSS_scrapers/app/scrapers/software_carpentry_events_scraper.rb:29:in `each'
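
This error suggests that a geocoding lookup returned no result for some venue and the code then called address_components on nil. A hedged sketch of a guard follows. The lookup callable stands in for the real geocoding call, which is not shown here, and the shape of address_components (a list of hashes with 'types' and 'long_name') is an assumption.

```ruby
# `lookup` is any callable returning an array of results (possibly empty).
# Returns the country name, or nil when the venue could not be geocoded.
def country_for(venue, lookup)
  result = lookup.call(venue).first
  return nil if result.nil?

  component = Array(result['address_components']).find do |c|
    Array(c['types']).include?('country')
  end
  component && component['long_name']
end

# Stub lookups for illustration only.
found = lambda do |_venue|
  [{ 'address_components' => [{ 'long_name' => 'Switzerland', 'types' => %w[country political] }] }]
end
missing = ->(_venue) { [] }

p country_for('Lausanne', found)
p country_for('Unknown venue', missing)
```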

IFB Scraper
Khan Academy
CSC Events
SIB Scraper
