kiliankoe / emeal-server

🌯 Scraping Dresden's canteens for juicy meal data

License: MIT License

Swift 99.78% Makefile 0.16% Shell 0.06%
dresden studentenwerk-dresden mensa canteen emeal tu-dresden htw-dresden

emeal-server's Introduction

🌯 emeal-server

This is a minimal webapp that acts as a proxy between anything requiring meal data (e.g. your app, chatbot, etc.) and the canteen menu of the Studentenwerk Dresden. It's powered by Vapor and runs on Swift.

🍲✌️

Usage

/canteens

List all known canteens. This information is bundled with the app itself, so don't consider it dynamic. The canteens' IDs are based on their order in the linked config file.

[
  {
    "name": "Mensa Reichenbachstraße",
    "city": "Dresden",
    "coordinates": {
      "latitude": 51.0342255,
      "longitude": 13.7323254
    },
    "id": 1,
    "address": "Reichenbachstr. 1, 01069 Dresden"
  },
  {
    "name": "Zeltschlösschen",
    "city": "Dresden",
    "coordinates": {
      "latitude": 51.031458,
      "longitude": 13.7264826
    },
    "id": 2,
    "address": "Nürnberger Straße 55, 01187 Dresden"
  },
  ...
]
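
As a rough client-side sketch, the response can be parsed and searched for a canteen's id. The sample payload is copied from the example above (shortened to the two entries shown); the helper name `canteen_id_by_name` is made up for illustration and is not part of the server.

```python
import json

# Sample payload taken from the /canteens example above (two entries only).
SAMPLE = """
[
  {"name": "Mensa Reichenbachstraße", "city": "Dresden",
   "coordinates": {"latitude": 51.0342255, "longitude": 13.7323254},
   "id": 1, "address": "Reichenbachstr. 1, 01069 Dresden"},
  {"name": "Zeltschlösschen", "city": "Dresden",
   "coordinates": {"latitude": 51.031458, "longitude": 13.7264826},
   "id": 2, "address": "Nürnberger Straße 55, 01187 Dresden"}
]
"""


def canteen_id_by_name(canteens, name):
    """Return the id of the first canteen with an exactly matching name, or None."""
    for canteen in canteens:
        if canteen["name"] == name:
            return canteen["id"]
    return None


canteens = json.loads(SAMPLE)
zelt_id = canteen_id_by_name(canteens, "Zeltschlösschen")
```

Since the ids depend on the order in the config file, a client may prefer to look them up by name like this instead of hardcoding them.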

/meals

List all meals for the current day. Use the query parameters date (e.g. 2018-01-08) and canteen (a canteen id, e.g. 4) to query for a specific date or canteen.

Use a canteen's id as a URL parameter, e.g. /meals/4, to list all known meals for that canteen. This can include at most the next three weeks' worth of data.

The corresponding values for the information can be found here. Info on listed allergens and additives is provided by the Studentenwerk.

[
  {
    "canteen": "Alte Mensa",
    "detailURL": "https://www.studentenwerk-dresden.de/mensen/speiseplan/details-198200.html?pni=20",
    "information": [
      "pork",
      "garlic"
    ],
    "image": null,
    "isSoldOut": false,
    "date": "2018-01-08",
    "title": "Hausgemachte frische Pasta, heute Amori in Pastasoße all'amatriciana mit Tomaten und Bauchspeck, dazu italienischer Hartkäse Grana Padano",
    "studentPrice": 2.3,
    "employeePrice": 4.05,
    "additives": [
      "2",
      "3",
      "8"
    ],
    "allergens": [
      "A",
      "A1",
      "C",
      "G"
    ]
  },
  {
    "canteen": "Alte Mensa",
    "detailURL": "https://www.studentenwerk-dresden.de/mensen/speiseplan/details-198216.html?pni=18",
    "information": [
      "vegetarian"
    ],
    "image": null,
    "isSoldOut": false,
    "date": "2018-01-08",
    "title": "Paprikaschote mit Soja-Gemüsefüllung mit Tomatensoße, dazu Bohnen- Maisgemüse und Reis",
    "studentPrice": 2.4,
    "employeePrice": 4.15,
    "additives": [],
    "allergens": [
      "A",
      "A1",
      "I"
    ]
  },
  ...
]
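
A small sketch of how a client might build the /meals URL with the optional date and canteen query parameters described above. The base URL `http://localhost:8080` and the function name `meals_url` are assumptions for illustration only.

```python
from urllib.parse import urlencode


def meals_url(base_url, date=None, canteen=None):
    """Build a /meals URL, adding only the query parameters that were given."""
    params = {}
    if date is not None:
        params["date"] = date        # e.g. "2018-01-08"
    if canteen is not None:
        params["canteen"] = canteen  # canteen id, e.g. 4
    url = base_url.rstrip("/") + "/meals"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Assumed base URL; adjust to wherever the server is running.
example = meals_url("http://localhost:8080", date="2018-01-08", canteen=4)
```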

/search

Search for a given keyword in all known meal titles. The keyword is supplied with the query parameter query, e.g. http://server_url/search?query=burrito. The response is a list of meals matching the query.

/update

Queue an update of the application's data for a given week and day. This shouldn't be necessary in most cases, since the application updates everything itself at regular intervals, but sometimes it might be. In that case send a POST request to /update with a week and day identifier as form URL-encoded body params.

To prevent external misuse of this endpoint, the server requires the request to come from 0.0.0.0, i.e. from the local machine. An update script is provided for convenience. It takes the week and day identifiers, e.g. ./update current monday, and basically just curls the running server with them.
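
For reference, a request like the one the update script sends could be constructed as follows. This is only a sketch: the body field names `week` and `day`, the base URL, and the function name are assumptions, since the README does not spell them out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(week, day, base_url="http://localhost:8080"):
    """Build (but do not send) a form URL-encoded POST request to /update."""
    # Field names "week" and "day" are assumed, not confirmed by the README.
    body = urlencode({"week": week, "day": day}).encode("ascii")
    return Request(
        base_url + "/update",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_update_request("current", "monday")
# Passing `request` to urllib.request.urlopen would send it to the server.
```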

Installation

The recommended way to install is via Docker: just run $ docker pull htwdd/emeal-server or build the image locally. Otherwise it can be built and deployed like any other Vapor application.

emeal-server's People

Contributors

benchr267, kiliankoe

emeal-server's Issues

limit meal output for single canteen

Currently all meals for a single canteen are being output on /meals/<canteen_id>. That should probably be limited to the current week only?

In turn, there should also be some way of accessing the data for the following two weeks. URL param maybe?

Persist meal array fields

Currently all array fields of Meal are not persisted due to the fact that Fluent (or SQLite down below) can't handle arrays directly. It would probably work to just encode those as semicolon separated strings on the fly in both directions.
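
The idea is language-independent, so here is a minimal Python sketch of the round trip (the app itself is written in Swift). The function names are hypothetical, and the sketch assumes no value ever contains a semicolon.

```python
SEPARATOR = ";"


def encode_list(values):
    """Join a list of strings into one semicolon-separated string for storage."""
    # Assumes no value contains the separator itself.
    return SEPARATOR.join(values)


def decode_list(stored):
    """Split a stored string back into a list; an empty string means no entries."""
    return stored.split(SEPARATOR) if stored else []


stored = encode_list(["A", "A1", "C", "G"])
restored = decode_list(stored)
```

The explicit empty-string check matters because splitting an empty string would otherwise yield a list containing one empty entry instead of an empty list.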

endpoint for all meals for a single day

Currently only possible via /meals for today's meals. Would be great to have this for any possible date or maybe using the week and day params same as the StuWe?

Update meal properties instead of overwrite

Currently all meal properties are overwritten on an update. This unfortunately removes images and maybe other metadata as well since the StuWe removes these for some reason 😕

It would definitely make sense to just update instead. I'll consider the current behavior a bug.
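
One possible shape for such a merge, sketched in Python with plain dictionaries standing in for the Meal model: values that come back as null from a fresh scrape do not overwrite what is already stored. The function name and the example image URL are made up.

```python
def merge_meal(existing, scraped):
    """Return a copy of `existing` updated with every non-null scraped value."""
    merged = dict(existing)
    for key, value in scraped.items():
        # Skip nulls so that e.g. a previously known image survives the update.
        if value is not None:
            merged[key] = value
    return merged


old = {"title": "Pasta", "image": "https://example.invalid/pasta.jpg", "isSoldOut": False}
new = {"title": "Pasta", "image": None, "isSoldOut": True}
merged = merge_meal(old, new)
```

Note that empty lists still overwrite stored values in this sketch; whether that is desirable for fields like additives is a separate decision.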

Parallelize scraping

Currently all scraping requests are completely synchronous, albeit being run in the background. It obviously takes a little while to get through them all, especially on the initial fetch all.
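
As an illustration of the general approach (in Python rather than the app's Swift), the per-day pages could be fetched concurrently with a thread pool. The w<week>-d<day>.html URL pattern is taken from the log output in the "No meals found" issue on this page and may be outdated; the `fetch` callable, the worker count, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

BASE = "https://www.studentenwerk-dresden.de/mensen/speiseplan"


def menu_urls(weeks=3, days=7):
    """List one menu URL per week and day, following the pattern seen in the logs."""
    return [f"{BASE}/w{week}-d{day}.html" for week in range(weeks) for day in range(days)]


def scrape_all(urls, fetch, max_workers=8):
    """Run `fetch` for every URL in parallel and map each URL to its result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `urls`.
        return dict(zip(urls, pool.map(fetch, urls)))


urls = menu_urls()
# A stand-in fetch function; a real one would download and parse the page.
results = scrape_all(urls, fetch=lambda url: len(url))
```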

Fix MealInformation log errors

Also think about not keeping an exhaustive list of all allergens and additives, but just stripping them to their identifiers. Meal.Information feels like a good thing to keep.

No meals found

Currently, no meals are found, e.g. when using the /meals endpoint. It seems that at some point the layout of the Studentenwerk website was updated and this change has not been reflected here since.
Is there anything planned for updating this repo, or is there another more recent project?

The logs of a freshly created container using docker-compose tell the following when GETting /meals:

emeal_1  | The current hash key "0000000000000000" is not secure.
emeal_1  | Update hash.key in Config/crypto.json before using in production.
emeal_1  | Use `openssl rand -base64 <length>` to generate a random string.
emeal_1  | The current cipher key "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" is not secure.
emeal_1  | Update cipher.key in Config/crypto.json before using in production.
emeal_1  | Use `openssl rand -base64 32` to generate a random string.
emeal_1  | Production mode enabled, disabling informational logs.
emeal_1  | Database prepared
emeal_1  | Starting server on 0.0.0.0:8080
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d1.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d2.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d3.html
emeal_1  | [Abort request error: Not Found] [Identifier: Vapor.Abort.notFound]
emeal_1  | [Abort request error: Not Found] [Identifier: Vapor.Abort.notFound]
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d4.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d5.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d6.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w0-d0.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d1.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d2.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d3.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d4.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d5.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d6.html
emeal_1  | Failed to read menu date at https://www.studentenwerk-dresden.de/mensen/speiseplan/w1-d0.html

Version API

As in add the path component v1 to the URL. Just in case incompatible changes happen in the future.

Extend tests

Currently only a few attributes of the scraping code are being tested. There are a lot of fragile untested corners left.

It might also make sense to periodically run tests against live data, for example via Travis' cron jobs, to ensure that failures are found quickly.

Mensa Mahlwerk

Another new canteen? Can't find any details though...

Update mechanism

  • update the current day at regular intervals (every 15 minutes?)
  • update the next day every few hours
  • update the current week once a day (just in case)
  • update the next two weeks only on app startup

does that make sense?

It would probably make sense to limit updating of the current day to daytime hours (not at night) and to update less often on weekends (and holidays)?
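
A sketch of how the current-day rule might look, in Python for brevity. The 15-minute interval comes from the list above; the 07:00 to 20:00 window and the hourly weekend interval are invented placeholders, and holidays are not handled.

```python
from datetime import datetime


def current_day_interval_minutes(now):
    """Return the minutes until the next refresh of today's menu, or None to skip."""
    if not 7 <= now.hour < 20:   # assumed daytime window, no updates at night
        return None
    if now.weekday() >= 5:       # Saturday or Sunday: update less often (assumed hourly)
        return 60
    return 15                    # weekday interval proposed in the list above


weekday_noon = current_day_interval_minutes(datetime(2018, 1, 8, 12, 0))   # a Monday
weekend_noon = current_day_interval_minutes(datetime(2018, 1, 13, 12, 0))  # a Saturday
```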

Recycled meal IDs

It turns out there are some meals that share a single ID across several days, as they're apparently declared for an entire date range ([...] Angebot vom Mo 8.1.18 - Fr 12.1.18) instead of a single day?

See here for an example.

This breaks the current handling of meals since it's interpreted as an update and leads to deletion of the original meal. It then appears as only occurring on the last day it was discovered on, meh.
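
One conceivable fix, sketched in Python with a dictionary as a stand-in for the database: identify a meal by its ID together with its date, so the same ID on another day creates a new entry instead of replacing the old one. The `id` field and the function name are hypothetical.

```python
def store_meal(store, meal):
    """Insert or update a meal, keyed by (id, date) instead of id alone."""
    key = (meal["id"], meal["date"])  # "id" is a hypothetical field name
    store[key] = meal
    return store


store = {}
store_meal(store, {"id": 198200, "date": "2018-01-08", "title": "Angebot"})
store_meal(store, {"id": 198200, "date": "2018-01-09", "title": "Angebot"})
# Re-scraping the same day still counts as an update of the existing entry.
store_meal(store, {"id": 198200, "date": "2018-01-09", "title": "Angebot (neu)"})
```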

Sold out meals

Apparently some canteens mark meals as being sold out, others just remove them. The current crawler updates new meals, but keeps the ones that have been removed intact. Ideally these should be marked as being sold out. Not quite sure where to model this though.

One way would be to make the meal models timestampable, which Vapor supports and take that route somehow to check which meals are stale and mark those as sold out? That seems rather fragile though.

Another option would be to check which meals are still present compared to all previously known meals on an update, filter those that are not in the new list and then mark these as sold out. Sounds just as fragile :/
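
The second option could look roughly like this, written in Python with dictionaries in place of the real models. The function name and the keying by a meal id are assumptions.

```python
def mark_sold_out(known, scraped_ids):
    """Flag every previously known meal that is missing from the latest scrape."""
    for meal_id, meal in known.items():
        if meal_id not in scraped_ids:
            meal["isSoldOut"] = True
    return known


known = {
    198200: {"title": "Pasta", "isSoldOut": False},
    198216: {"title": "Paprikaschote", "isSoldOut": False},
}
# Only meal 198216 is still listed after the update.
mark_sold_out(known, scraped_ids={198216})
```

The fragility mentioned above shows up here too: a scrape that fails and returns nothing would flag every known meal as sold out, so a guard against empty results would probably be needed.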

Meal duplicates

Apparently the query param at the end of the meal detail URL is not static between refreshes. It should probably be stripped.
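
Stripping the query could be done along these lines (a Python sketch; the function name is made up). The example URLs reuse the detailURL from the /meals sample above with two different pni values.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_detail_url(url):
    """Drop the query string and fragment so refreshes yield the same URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


first = canonical_detail_url(
    "https://www.studentenwerk-dresden.de/mensen/speiseplan/details-198200.html?pni=20"
)
second = canonical_detail_url(
    "https://www.studentenwerk-dresden.de/mensen/speiseplan/details-198200.html?pni=18"
)
```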

counter

try and parse the entire meal title maybe?
