broadway's People

Contributors

andyclee, ayushr2, bhuvy2, ezhang887, jhenhapl, kipyminyman, nd-0r, nmagerko, rod-lin, st-arry, xiangmingchen, zhengyao-lin

broadway's Issues

Grader dies while running job

When the grader dies while running a job, that job should be marked as failed, which does not seem to be the case as shown below:
[image]

Improve Linter Checks

Following up from illinois-cs241/broadway-api#58: some warnings, such as shadowed variable names and type mismatches, slip past flake8. We should be able to catch such issues and not let them slip into production code.

Maybe we should change the linter? So far, though, the search for a linter that can do this has been unsuccessful.

Eventual Jepsen Testing

Making assumptions about a distributed system and giving it a once-over is good; actually verifying it is a little better.

We don't need to do this immediately, but to be a robust piece of software, we should put in some Jepsen tests for fault tolerance. We only need to cover a few likely cases (we aren't trying to make this a truly fault-tolerant distributed system):

  • The API (this service) fails
  • Graders start failing
    • A few at a time
    • In waves
  • Various network delays cause graders to come back online (a little less likely).

Separate runner logic from grader

Right now there is essentially a single file that has all of the functions needed to start up the grader and run it.

I would suggest we make a Grader class that contains any grading logic and that we put it inside of the grader module. Then at the top level (outside of the module) we have a run.py that instantiates the grader and gets things running.

I think this would make things a bit easier to maintain.
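
A rough sketch of the split proposed above. Everything here is illustrative: the method names (`poll_job`, `run_job`, `run`), the constructor arguments, and the placeholder bodies are assumptions, not the existing grader code.

```python
import argparse


# grader/ module (hypothetical layout): all grading logic lives in one class.
class Grader:
    """Holds grader state and logic. Names are illustrative only."""

    def __init__(self, api_host, token):
        self.api_host = api_host
        self.token = token
        self.running = False

    def poll_job(self):
        """Placeholder: ask the API for the next job. None means no work."""
        return None

    def run_job(self, job):
        """Placeholder: execute one grading job and return its result."""
        return {"job": job, "success": True}

    def run(self, max_polls=1):
        """Poll a bounded number of times and collect job results."""
        self.running = True
        results = []
        for _ in range(max_polls):
            job = self.poll_job()
            if job is not None:
                results.append(self.run_job(job))
        self.running = False
        return results


# run.py (top level, outside the module): only wiring, no grading logic.
def main(argv=None):
    parser = argparse.ArgumentParser(description="Start a grader (sketch)")
    parser.add_argument("api_host")
    parser.add_argument("token")
    args = parser.parse_args(argv)
    return Grader(args.api_host, args.token).run()
```

With this shape, tests can construct a `Grader` directly and override `poll_job` or `run_job` without going through the command-line entry point.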

Moving to docker swarm

Since we have containerized the broadway API and grader, it would be a natural step to migrate the entire cluster to docker swarm (or another container orchestration system).

This is mostly an issue of configuration. There are a few things to solve, based on my limited experience with docker swarm:

  • The broadway grader needs to interact with a docker daemon. dind doesn't work in this case since there are some permission issues (docker swarm doesn't allow privileged containers).
  • The broadway API needs a mongodb instance, so we also need to figure out how to expose a mongodb service to the containers in docker swarm.

A few potential benefits in doing this:

  • auto-deployment
  • better fault tolerance
  • easier to monitor the status of the entire cluster (?)

Improve Testing

Currently, the tests do not test the following:

  • The DB contents after jobs have been scheduled/executed or after courses are uploaded.
  • Worker node failure. We will have to mock the variable that decides how often the heartbeat validator runs. The testing thread can then sleep (after polling a job), and afterwards we check the DB contents to assert that the job is marked as failed.
  • Worker node state (alive/dead)
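
The worker-failure test described above could look roughly like the sketch below. It is not based on the actual API code: `Config.HEARTBEAT_INTERVAL`, `validate_heartbeats`, the "two missed intervals" rule, and the plain dicts standing in for DB collections are all assumptions made for illustration.

```python
import time
from unittest import mock


class Config:
    # Hypothetical setting: seconds between heartbeat validator runs.
    HEARTBEAT_INTERVAL = 10.0


def validate_heartbeats(workers, jobs, now=None):
    """Mark workers with stale heartbeats as dead and fail their jobs."""
    now = time.time() if now is None else now
    for worker in workers.values():
        # Assumed rule: a worker is stale after missing two intervals.
        stale = now - worker["last_seen"] > 2 * Config.HEARTBEAT_INTERVAL
        if worker["alive"] and stale:
            worker["alive"] = False
            if worker["job"] is not None:
                jobs[worker["job"]]["state"] = "failed"


def test_dead_worker_fails_job():
    # Dicts stand in for the DB collections in this sketch.
    workers = {"w1": {"last_seen": time.time(), "alive": True, "job": "job-1"}}
    jobs = {"job-1": {"state": "running"}}

    # With the default interval the worker is still considered alive.
    validate_heartbeats(workers, jobs)
    assert workers["w1"]["alive"] is True

    # Shrink the interval so the test only has to sleep briefly.
    with mock.patch.object(Config, "HEARTBEAT_INTERVAL", 0.01):
        time.sleep(0.05)  # the "worker" sends no heartbeat in the meantime
        validate_heartbeats(workers, jobs)

    assert workers["w1"]["alive"] is False
    assert jobs["job-1"]["state"] == "failed"
```

Patching the interval rather than sleeping for the real duration keeps the test fast, and `mock.patch.object` restores the original value when the block exits.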

Nice to Have: Deploy script

Right now, deploying the API requires knowing a good amount about the internals, setting up mongo, etc.

We have codified setup in .travis.yml, but it'd be nice to refactor this to a deploy script for Ubuntu, so we can

  • Use this for travis testing
  • Provision machines with a one step install

I'd recommend fab, but as always there are other ways to skin a cat.

Test Reconnecting Nodes

We should test that when a node reconnects, its info is retained and it can continue from where it left off.
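
One possible shape for such a test, as a sketch only. `NodeRegistry` is a hypothetical in-memory stand-in for wherever node info is actually stored, and the tracked fields (`hostname`, `alive`, `jobs_processed`) are assumptions.

```python
class NodeRegistry:
    """Hypothetical in-memory stand-in for the node store."""

    def __init__(self):
        self._nodes = {}

    def connect(self, node_id, hostname):
        """Register a new node, or revive a known one without resetting it."""
        node = self._nodes.get(node_id)
        if node is None:
            node = {"hostname": hostname, "alive": True, "jobs_processed": 0}
            self._nodes[node_id] = node
        else:
            node["alive"] = True
        return node

    def disconnect(self, node_id):
        self._nodes[node_id]["alive"] = False

    def record_job(self, node_id):
        self._nodes[node_id]["jobs_processed"] += 1


def test_reconnect_keeps_node_info():
    registry = NodeRegistry()
    registry.connect("node-1", "grader-a")
    registry.record_job("node-1")
    registry.record_job("node-1")
    registry.disconnect("node-1")

    # Reconnecting with the same id should not wipe earlier state.
    node = registry.connect("node-1", "grader-a")
    assert node["alive"] is True
    assert node["hostname"] == "grader-a"
    assert node["jobs_processed"] == 2
```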

Can we get rid of the need to run as root?

As usual, running a piece of code as root unless absolutely necessary is just good security practice. Can we have a script that creates a new user and sets up that user to be a part of the docker group instead of root?
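
A minimal sketch of what such a setup script might do, assuming a Linux host with the standard `useradd` and `usermod` tools and an existing `docker` group. The username `broadway` and the function names are made up for illustration, and the script defaults to a dry run that only prints the commands.

```python
import subprocess


def build_setup_commands(username="broadway"):
    """Return the commands that create a user and add it to the docker group."""
    return [
        ["useradd", "--create-home", username],
        ["usermod", "--append", "--groups", "docker", username],
    ]


def setup_user(username="broadway", dry_run=True):
    """Print the commands (dry run) or execute them; returns the command list."""
    commands = build_setup_commands(username)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # Creating users requires root, so this one-time step still
            # needs elevated privileges; the grader itself would not.
            subprocess.run(argv, check=True)
    return commands
```

Note that membership in the docker group is itself highly privileged, so this narrows what runs as root rather than removing the trust requirement entirely.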

Adding Tests + CI

The CI can be mostly copied over from broadway-api, but it'd be nice to have some assurance checks on each commit.
