harrowio / harrow

The monorepo for the FOSS Harrow CI/CD collaboration project.

License: GNU Affero General Public License v3.0

Makefile 0.33% Go 52.79% Shell 0.86% Ruby 0.11% Awk 0.04% Python 0.58% HTML 9.01% CoffeeScript 9.33% JavaScript 18.23% CSS 6.21% PLpgSQL 2.50% Dockerfile 0.01%

continuous-integration continuous-deployment continuous-delivery ops devops devops-tools devops-teams operations collaboration collaboration-platform

harrow's Introduction

Harrow.io Open Source

PREVIEW

This is a brutally modified version of the upstream code that powers harrow.io, currently offered as a dev bundle for modifying Harrow. The intention is that within a few weeks this message will go away and harrow.io (hosted) will run the same codebase as is offered here, but there is a small amount of clean-up and canonicalization yet to do.

What Is This?

Harrow is a task-runner for people who build and manage software. It's designed to sit in the place of a traditional CI/CD build system whilst providing an element of accessibility and beauty for non-technical team members and stake-holders.

Harrow was born out of the popular Capistrano tool for Ruby (and Rails) deployments and was created by the same people.

Operating as a successful online SaaS since 2015, Harrow is now released in its entirety as a piece of (AGPL v3 licensed) free, open source software.

Why Does This Exist?

Harrow sits in a peculiar place in DevOps. DevOps as a movement has reached a plateau where "we have CI, and we do CD" has become the accepted state of the art.

Harrow's creators believe that DevOps can go further. Ideally we'd achieve the same enlightenment that the Agile/XP movement brought to collaboration when building software, and extend that to the whole life cycle of a piece of probably business-critical software.

Why Is The Code Open Source?

Harrow is/was VC funded, and having operated successfully so far, we want to use the opportunity afforded to us to give something back to the FOSS community to which we owe our existence.

What Is Included?

The entire software is included. Some parts of the repository are protected with GPG encryption, available only to those who are core maintainers of the commercial, hosted version of Harrow.

There are some private components held in separate repositories, namely the license key generation mechanisms for the enterprise version. The key verification systems are part of this open source repository.

For a quick summary please see the following entries:

frontend: The Angular (1.x) application which drives our whole official HTML5 client. This application has extensive integration tests.
style-guide: Imported by the front end as a bower module, contains a separate style-guide with all graphic and styling resources.
api: The Go packages that comprise the application, including the harrow fat-executable which contains all the micro-services which fulfil all the roles responsible for the backend.
notifiers: The notifiers used by the API.
knowledge-base: The Sphinx based knowledge-base and deployment recipes.
config-management: Ansible scripts for building and provisioning the development, staging and test environments using VirtualBox. The same scripts are applied to production.

See the README.md of each subdirectory for more explanation of their contents.

License

AGPL v3: https://www.gnu.org/licenses/agpl-3.0.html
See LICENSE.md

Harrow, a continuous integration and collaboration software.

Copyright (C) 2016 Harrow GmbH

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Truncated History

The Git history was truncated at the time of FOSS release to ensure that historical secrets and credentials could no longer be recovered from the history, not even from unreachable commits. The truncation is also partly a consequence of merging the individual projects into a monorepo, which was a relatively brutal merger.

harrow's People

Forkers

duizendnegen nelyj happy-ferret jbelke ddfznt

harrow's Issues

Implement FlowDock notifier

FlowDock notifications can already be sent using the WebHook system, but they're not the most flexible in the world and don't look great.

It'd be great to have a FlowDock notifier.

Send email to project members on operation timeout

Timeouts should be an exceptional condition, and following recent improvements they now should be.

Please +1 this comment if you're interested in this feature.

One-click option to invite a staffer to help

Currently if we want to help a user with their issues we need to ask them to invite us to their project, wait until we've accepted the invitation and then upgrade us to management. We could of course hack the DB but this fails to leave the trail of events (activities) which is important, it also violates some trust, and we prefer to be explicit.

Related are the issues #8 and #9 and #10.

Support BitBucket as a pull-request notifier

This would allow someone to connect a BitBucket notifier to a project, and have related repositories' pull requests be updated with the build statuses.

cloning repo fails 'sometimes'

Every once in a while, the cloning repo task fails with a FATAL error, probably a network glitch or something like that between the harrow infrastructure and the git repo.
We get this message :

Fatal: script=.bin/setup line=308 func="main" cmd="( clone_repository_my_repo_path )" status=76

and the task fails. Usually, we just have to relaunch the task manually, but when there's automation somewhere, it breaks the pipeline.

Maybe a solution would be to retry cloning the repo after a reasonable amount of time (a few seconds?) and a limited number of times (2-3 times max) before failing.
A more verbose error message could also help people who host their own repo to debug.

Thanx :)

Upgrade to support Go 1.5+

Possible 418 (Teapot) condition on invalid schedule

A user-reported issue causing a 418 HTTP response was found because it was possible to create a domain.Schedule with both a timespec and a cronspec field populated.

Both the cronspec and timespec were unparsable:

cronspec         | 20 20 20 20 20
timespec         | /20 * * * *

This cronspec was invalid because 20 in the 4th field is invalid (month):

https://crontab.guru/#20_20_20_20_20

The timespec is invalid because it doesn't accept cronspecs:

https://www.apt-browse.org/browse/ubuntu/precise/main/i386/at/3.1.13-1ubuntu1/file/usr/share/doc/at/timespec

Probably we should have more validation up-front to prevent this case, however it's only the 2nd case ever of a 418 resulting from a bad timespec in 2 years of running Harrow, so possibly we're already validating this, but we miss some edge cases?
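
The up-front validation suggested above could look roughly like the following sketch. It assumes domain.Schedule carries Cronspec and Timespec string fields (the field names here are hypothetical), rejects schedules with both or neither field set, and only range-checks plain numeric cron fields. Steps, ranges and lists are skipped, and timespec parsing is out of scope.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Schedule is a hypothetical stand-in for domain.Schedule; the field names
// are assumptions made for this sketch.
type Schedule struct {
	Cronspec string
	Timespec string
}

// cronRanges holds the inclusive bounds of the five classic cron fields:
// minute, hour, day of month, month, day of week (many crons accept 7 as Sunday).
var cronRanges = [5][2]int{{0, 59}, {0, 23}, {1, 31}, {1, 12}, {0, 7}}

// Validate rejects schedules with both or neither spec set, and cronspecs
// whose plain numeric fields fall outside the allowed range.
func (s Schedule) Validate() error {
	hasCron := strings.TrimSpace(s.Cronspec) != ""
	hasTime := strings.TrimSpace(s.Timespec) != ""
	if hasCron == hasTime {
		return errors.New("exactly one of cronspec and timespec must be set")
	}
	if !hasCron {
		return nil // timespec parsing is not covered by this sketch
	}
	fields := strings.Fields(s.Cronspec)
	if len(fields) != 5 {
		return fmt.Errorf("cronspec needs 5 fields, got %d", len(fields))
	}
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			continue // "*", steps, ranges and lists are not checked here
		}
		if n < cronRanges[i][0] || n > cronRanges[i][1] {
			return fmt.Errorf("cronspec field %d: %d out of range", i+1, n)
		}
	}
	return nil
}

func main() {
	// Both fields populated, as in the reported case.
	fmt.Println(Schedule{Cronspec: "20 20 20 20 20", Timespec: "/20 * * * *"}.Validate())
	// Only the cronspec: the month field (4th) is out of range.
	fmt.Println(Schedule{Cronspec: "20 20 20 20 20"}.Validate())
	// A valid cronspec.
	fmt.Println(Schedule{Cronspec: "0 20 * * *"}.Validate())
}
```

Running such a check when the schedule is created or edited would turn the 418 into an ordinary validation error for the user.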

User invitation to project

It would be great to be able to invite a user who already has a "harrow" account directly to a project and, if possible, to select the role given to that user when inviting.

Show a completion percentage for jobs

Of course they can never be 100% accurate, but something based on the mean time of the last few runs might help give people an idea of where their build is in relation to "normal"
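
As a rough illustration of the idea, the estimate could be computed from the mean duration of the last few runs. The function name and the cap at 99% are assumptions for this sketch, not part of Harrow.

```go
package main

import "fmt"

// estimateProgress returns a rough completion percentage for a running job.
// elapsed and recent are durations in seconds; recent holds the durations of
// the last few completed runs. The result is capped at 99 because the
// estimate can never prove that the job has actually finished.
func estimateProgress(elapsed float64, recent []float64) int {
	if len(recent) == 0 || elapsed <= 0 {
		return 0
	}
	var total float64
	for _, d := range recent {
		total += d
	}
	mean := total / float64(len(recent))
	if mean <= 0 {
		return 0
	}
	pct := int(elapsed / mean * 100)
	if pct > 99 {
		pct = 99
	}
	return pct
}

func main() {
	// Last three runs took 4, 5 and 6 minutes; the current run is at 2m30s.
	recent := []float64{240, 300, 360}
	fmt.Println(estimateProgress(150, recent)) // prints 50
}
```

A median or a trimmed mean might be more robust against one unusually slow run, at the cost of a little more code.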

Recurring tasks modification

I had for a little while a task run every 5 minutes : */5 * * * * .
I changed it to run every 2 minutes : */2 * * * * and was surprised that this "new" schedule was added to the previous one, so the task runs every 2 AND 5 minutes :)
I even deleted the "Schedule" Trigger, but it continues to run as previous.
Really no harm for this precise task, but might be problematic for other customers.
And just to be clear, all the modifications were made in the web UI, with "normal" access ;)

Environment variables must be sanitised as they may include HTML

A customer recently reported an issue where rich formatted text had been pasted into the environment editor controls and subsequently jobs failed with a syntax error in ./bin/setup (the user script entrypoint):

function export_secret_vars() {
  export DB_PASSWORD="<pre style="color: rgb(0, 0, 0); word-wrap: break-word; white-space: pre-wrap;">••••••••••</pre>"

}
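
One way to stop a value like this from breaking the generated script is to single-quote it when the export line is rendered. The sketch below assumes the setup script is generated from Go; shellQuote and exportLine are hypothetical helpers, not existing Harrow functions. Quoting only prevents the syntax error. It does not strip the pasted markup, which would still need sanitising in the editor.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps a value in single quotes so that Bash treats it as a
// literal string. An embedded single quote is handled by closing the quoted
// string, emitting an escaped quote, and reopening the quoted string.
func shellQuote(value string) string {
	return "'" + strings.ReplaceAll(value, "'", `'\''`) + "'"
}

// exportLine renders one export statement for a generated setup script.
func exportLine(name, value string) string {
	return fmt.Sprintf("export %s=%s", name, shellQuote(value))
}

func main() {
	fmt.Println(exportLine("DB_PASSWORD", `<pre style="color: red;">secret</pre>`))
}
```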

Clarify Installation & Shipping

How is the open source version of Harrow shipped to users? Is there an installation procedure to follow? Would it be possible to provide Docker images?

Implement HipChat notifier

HipChat notifications can already be sent using the webhook system, but they're not the most flexible in the world and don't look great.

It'd be great to have a HipChat notifier.

Loxer package enhancements

Create a public wrapper around lexer

It should implement io.Writer and provide utility methods for subscribing to events.

Handle arguments for all event types

Currently arguments are only interpreted for Display and Cursor events.

Fix that if the output mode never switches you never get any output

The lexer should track the "last" kind of emitted event (current active statefn), and emit when the tokbuf reaches a certain size. It might also flush this bus periodically.

Optimize for memory usage

Since there will be a lot of throughput (e.g. thousands of events for a typical run of npm install), the number of memory allocations for events needs to be kept low.

Add more tests

Test coverage is too low. Every implemented event type should be tested at least once with sensible parameters.
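
A minimal sketch of what such a public wrapper might look like follows. The Event type, the "text" kind and the size-based flush threshold are all assumptions for illustration. The real lexer's event types are not shown here, so this only demonstrates the io.Writer plus subscription shape and size-triggered flushing.

```go
package main

import (
	"fmt"
	"io"
)

// Event is a hypothetical event type standing in for the lexer's events.
type Event struct {
	Kind string
	Data []byte
}

// Writer buffers written bytes and emits an event to all subscribers once
// the buffer reaches flushSize, or when Flush is called explicitly.
type Writer struct {
	flushSize int
	buf       []byte
	subs      []func(Event)
}

// Compile-time check that Writer satisfies io.Writer.
var _ io.Writer = (*Writer)(nil)

func NewWriter(flushSize int) *Writer { return &Writer{flushSize: flushSize} }

// Subscribe registers a callback that receives every emitted event.
func (w *Writer) Subscribe(fn func(Event)) { w.subs = append(w.subs, fn) }

// Write appends p to the buffer and flushes once the threshold is reached.
func (w *Writer) Write(p []byte) (int, error) {
	w.buf = append(w.buf, p...)
	if len(w.buf) >= w.flushSize {
		w.Flush()
	}
	return len(p), nil
}

// Flush emits the buffered bytes as one event; the backing array is reused.
func (w *Writer) Flush() {
	if len(w.buf) == 0 {
		return
	}
	ev := Event{Kind: "text", Data: append([]byte(nil), w.buf...)}
	w.buf = w.buf[:0]
	for _, fn := range w.subs {
		fn(ev)
	}
}

func main() {
	w := NewWriter(8)
	w.Subscribe(func(e Event) { fmt.Printf("%s: %q\n", e.Kind, e.Data) })
	fmt.Fprint(w, "npm ")
	fmt.Fprint(w, "install") // buffer reaches 11 bytes, so an event is emitted
	w.Flush()                // nothing left to emit
}
```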

Allow SSH access to build machine

This should be regulated via additional TTYs and proxied through a Go service (for logging, etc.) rather than being a simple exposed sshd port on the build machine itself.

This ought to be quite simple in reality. It would also be nice to be able to click a single link in the browser and get access to the machine with <somekey>@<its ip address> without fighting keys, passwords, etc.

Change FE to support "public" projects without requiring a logged-in user.

Building out the implicit guest-account was mandatory for this as the backend always expects to find a user.

This is done, and now theoretically we just need to build a switch somewhere to toggle projects from public to private.

Public projects should not be counted towards the billed # of projects for an organization.

Pasting formatted HTML into the environment editor form breaks tasks

Because we don't capture the paste event when people paste into the HTML form of the environment editor page, people can sometimes break their tasks by inserting < and > etc. into environment variables, which breaks Bash's export function.

Often the formatting isn't as blatant as in the attached screenshot.

Implement "inheritance" between environments

We recently introduced the "Default" environment, however there is no inheritance.

Inheritance from "Default" down to individual environments isn't implemented, and I'm personally not sure that it should be. I'd be glad to have some community input. I think a little repetition is better than more indirection, but I'm not the only stakeholder.

Package the notifiers for release

The notifiers (email, slack, and job, e.g. triggering another job) need much work before they can be released publicly. They don't build at all without a lot of infrastructure that is internal to us, and they're mandatory for a fully working version of Harrow that doesn't break down.

Similarly the existing code in cmd/operation-runner/spawn.go refers to ...%s controller-shell -operation-uuid %s '/srv/harrow/bin/harrow-notify-${NOTIFIER_TYPE}'" ... which under the new arrangement of code is no longer valid and must rely on the $PATH.

Show calendar/day planner of forthcoming scheduled tasks on dashboard

To give people a clue what's scheduled on any given day.

The backend already supports this; check the sub-paths under /projects/:uuid

Have a glimpse on the main metrics of the whole system

On very rare occasions, a "bad neighbour" in the worker's queue makes the system less responsive.
It would be very helpful to have access to basic metrics to see if everything is normal or if the queue is behaving wrong.
I was thinking of a light version of this kind of dashboard : http://monitor.gitlab.net/dashboard/db/ci ?

Thanx

Occasional failure to connect to docker daemon early in script.

This used to be very, very infrequent, but some users report it happening in 5-10% of builds, which is unacceptable.

A workaround to wait for the Docker daemon to come up might be:

until docker ps
do
  echo "waiting for Docker to come-up"
  sleep 1
done

/cc @duizendnegen

Build UI for creating organization-wide members

We support this in the backend but there's currently no user-facing UI to do this.

Replace billing subsystem

The current billing subsystem is very closely bound to Braintree. For the open sourcing of Harrow it's important that we get away from this model, for two principal reasons:

Difference in billing stacks between Enterprise and Cloud
Requirement of running self-hosted without a billing backend.

The current system opts into a plan, and communicates frequently with Braintree, storing events in the billing_events table such as the following:

uuid              | 91c29e51-******
organization_uuid | 49e835a8-******
event_name        | plan-selected
occurred_on       | 2016-06-22 12:47:34.05605+00
data              | {"UserUuid":"3eed9271-******","PlanUuid":"b99a21cc-******","PlanName":"free","SubscriptionId":"free:49e835a8-******","PrivateCodeAvailable":true,"PricePerMonth":"0.00 USD","UsersIncluded":1,"ProjectsIncluded":1,"PricePerAdditionalUser":"0.00 USD","NumberOfConcurrentJobs":1}

This doesn't reflect the bookkeeping nature of billing, and it requires that Braintree be consulted frequently in order to have uninterrupted access to Harrow.

We want to replace it with a system which behaves essentially the same way, but also generates simple credit/debit notes when an additional daemon is run. This would allow us to see the costs over time and switch to a more granular billing system.

The new ideal system would simply run an additional daemon which generated invoice entries bound to accounts in the database, and optionally registered payments against those by using the immediate billing APIs of some 3rd party, rather than relying on the 3rd party for subscription management.

Bringing control of subscriptions in-house, and having complete control of the billing would enable the FOSS/self-hosting users to simply not run this daemon, whilst the enterprise (supported self-hosted) system can run a different daemon. The cloud version of Harrow can run any daemon it likes.

This would make trials, optional added extras and other complex topics comparably trivial to implement.
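
To make the credit/debit note idea a little more concrete, here is a minimal sketch of an in-memory ledger. The Note and Ledger types, the sign convention and the descriptions are all assumptions. The proposal above stores entries in the database and does not prescribe a schema.

```go
package main

import "fmt"

// Note is a hypothetical credit/debit note bound to an organization.
// Amounts are in cents; debits (charges) are positive, credits negative.
type Note struct {
	OrganizationUUID string
	Description      string
	AmountCents      int64
}

// Ledger is an append-only list of notes, as a billing daemon might write.
type Ledger struct{ notes []Note }

// Add appends a note; existing notes are never modified.
func (l *Ledger) Add(n Note) { l.notes = append(l.notes, n) }

// Balance returns what an organization currently owes, in cents.
func (l *Ledger) Balance(orgUUID string) int64 {
	var sum int64
	for _, n := range l.notes {
		if n.OrganizationUUID == orgUUID {
			sum += n.AmountCents
		}
	}
	return sum
}

func main() {
	var l Ledger
	org := "49e835a8" // shortened, illustrative identifier
	l.Add(Note{OrganizationUUID: org, Description: "monthly plan", AmountCents: 2900})
	l.Add(Note{OrganizationUUID: org, Description: "payment registered", AmountCents: -2900})
	l.Add(Note{OrganizationUUID: org, Description: "additional user", AmountCents: 500})
	fmt.Println(l.Balance(org)) // prints 500
}
```

With entries like these, a self-hosted installation that never runs the daemon simply has an empty ledger.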

Allow build-parameters to be chosen

We once developed a concept for this where build parameters for a build can be specified with [json-schema](http://json-schema.org/examples.html), allowing people to define how the build params form should look, with fall-backs and defaults.

The concept fell apart trying to design a UI for the JSON schema, and trying to decide how to export these into the build environment (env vars, likely). The UI part might be obsolete, since we could simply mark this as a power user feature and provide a text input for a JSON schema and provide a validator to ensure it's valid.

The other stumbling block was how to handle triggers such as webhooks in case they send build params, and we have to somehow map a webhook body to a set of build params, or fail to react to the webhook.

Surely all these are solvable and we could get the MVP shipped very easily if we had some help.

Allow customization of the Slack request notifier payload

Currently the payload is static and provided verbatim to the notifier container, we should allow it to be configured. The Slack notifier container can receive any payload, it's simply a question of allowing this in the UI, and/or referring people to the Slack UI tool for testing messages, whilst also documenting what variables we export to the environment.

Problem description and outline of work to fix the enqueuing reliability issues.

Closely related to #35, as this ticket will hopefully lay the groundwork for this.

How it works

Presently, clicking "Run Now" injects a "OneTime" schedule into the database with the timespec "now". The database publishes a message using LISTEN/NOTIFY which is picked up by the zob component and republished on the rabbitmq exchange concerned with the general creation of new messages.

Note: The schedule has a few default properties, including its status "pending". The schedule can remain in this state indefinitely in the absence of any of the other micro services performing work.

The presence of the message on the rabbitmq bus has a significance for the scheduler component which looks at the database for "late" schedules; this component also looks every second for cronspecs that should fire now.

The scheduler then creates the "operation". The operation is then enqueued, and waits for one of the operation-runner processes to pick it up.

The operation-runner processes greedily consume but don't ACK messages concerning operations that should be run. The ACK is sent once the job completes successfully. This is problematic since received, but unacked messages don't contribute to RabbitMQ queue stats as monitored by most tools.

Personal Note: I also suspect some foul play with the operation runners and the queue behaviour. We intended them to connect to RabbitMQ, take one message, and then disconnect. This should mean that from the rabbitmq queue stats. I believe the confusion arises because we need to stay connected (to hold the unacked message; disconnecting would send the message back to the queue) but not consuming, which doesn't seem to work as intended.

The operation-runner is a service which is started via systemd something like 20-40 times, depending on the backend load. These instances take the operations, compile the assets to run them, build the rootfs and finally run the operation, updating the original "operation" record in the DB as they go.

Problems with the above

Even if none of the backend microservices are running (or are crashed), the newly scheduled schedule has a status of "pending", it probably ought not to have.
Apart from high and low prio queues, we have no way to prioritise different kinds of work (e.g higher prio for script editor than for manually clicked runs, which are all higher than automated tasks)
No easy way to see how many operations are running at any given time, or how long they waited.
Whilst we can do some rudimentary "mean" or "95th percentile" graphing of global waiting times, we can't really do that on a per-project basis because of the database schema.

Potential "fixes"

TODO: Finish this

Potential work arounds/band-aids

As a band-aid more monitoring has been added. See comment below.

Transpile CoffeeScript to JavaScript

We're really not fond of CoffeeScript anymore. In 2015 when we started the Harrow project it still made sense, now - not so much.

We need to look seriously at webpack and the module system and start taking the FE performance in terms of size, build tooling, etc a bit more seriously.

Scheduler does not correctly "forget" archived schedules

Simple log of adding, and subsequently removing a schedule. I believe the error case is failing to find the schedule because of the archived_at IS NOT NULL clause which causes it to err and fail to remove the schedule from the active pool?

Dec 07 20:54:03 alcohol harrow[12666]: {"time":"2017-12-07T20:54:03Z","level":"debug","message":"schedulableFromUuid: loading schedule \"5dfbc93f-9eff-45a2-37de-b50e8bf1cf04\""}
Dec 07 20:54:03 alcohol harrow[12666]: {"time":"2017-12-07T20:54:03Z","level":"debug","message":"cronspec: \"0 20 * * *\""}
Dec 07 20:54:03 alcohol harrow[12666]: {"time":"2017-12-07T20:54:03Z","level":"debug","message":"pool.Add obtaining lock"}
Dec 07 20:54:03 alcohol harrow[12666]: {"time":"2017-12-07T20:54:03Z","level":"debug","message":"pool.Add[d] releasing lock"}
Dec 07 20:54:03 alcohol harrow[12666]: {"time":"2017-12-07T20:54:03Z","level":"info","message":"pool.Add \"5dfbc93f-9eff-45a2-37de-b50e8bf1cf04\""}

Dec 07 20:54:25 alcohol harrow[12666]: {"time":"2017-12-07T20:54:25Z","level":"debug","message":"pool.Remove obtaining lock"}
Dec 07 20:54:25 alcohol harrow[12666]: {"time":"2017-12-07T20:54:25Z","level":"info","message":"pool.Remove \"5dfbc93f-9eff-45a2-37de-b50e8bf1cf04\""}
Dec 07 20:54:25 alcohol harrow[12666]: {"time":"2017-12-07T20:54:25Z","level":"debug","message":"pool.Remove deleting internal struct member"}
Dec 07 20:54:25 alcohol harrow[12666]: {"time":"2017-12-07T20:54:25Z","level":"debug","message":"pool.Remove[d] releasing lock"}
Dec 07 20:54:25 alcohol harrow[12666]: {"time":"2017-12-07T20:54:25Z","level":"debug","message":"schedulableFromUuid: loading schedule \"5dfbc93f-9eff-45a2-37de-b50e8bf1cf04\""}
Dec 07 20:54:25 alcohol harrow[12666]: {"time":"2017-12-07T20:54:25Z","level":"error","message":"handleChanges: Error creating schedulable: 5dfbc93f-9eff-45a2-37de-b50e8bf1cf04 (harrow/domain: not found)\n"}
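
If the diagnosis above is right, one possible handling is to treat a "not found" result while reloading a changed schedule as a removal rather than an error. The sketch below is not the scheduler's actual code. Pool, handleChange and errNotFound are hypothetical stand-ins that only echo the names visible in the log.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNotFound stands in for the "harrow/domain: not found" error in the log.
var errNotFound = errors.New("not found")

// Pool is a hypothetical stand-in for the scheduler's pool of active schedules.
type Pool struct {
	mu     sync.Mutex
	active map[string]string // schedule UUID -> cronspec
}

func NewPool() *Pool { return &Pool{active: map[string]string{}} }

func (p *Pool) Add(uuid, cronspec string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.active[uuid] = cronspec
}

func (p *Pool) Remove(uuid string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.active, uuid)
}

func (p *Pool) Len() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.active)
}

// handleChange reloads a schedule after a change notification. A schedule
// that can no longer be loaded (for example because it was archived) is
// removed from the pool instead of being reported as an error.
func handleChange(p *Pool, uuid string, load func(string) (string, error)) error {
	cronspec, err := load(uuid)
	if errors.Is(err, errNotFound) {
		p.Remove(uuid)
		return nil
	}
	if err != nil {
		return err
	}
	p.Add(uuid, cronspec)
	return nil
}

func main() {
	p := NewPool()
	p.Add("5dfbc93f", "0 20 * * *")
	archived := func(string) (string, error) { return "", errNotFound }
	fmt.Println(handleChange(p, "5dfbc93f", archived), p.Len())
}
```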

Ensure reliable starting of pgbouncer and postgresql in Vagrant virtual machine

Occasionally the directories required for pgbouncer and postgresql fail to be created correctly. This is probably a silly mistake connected with a race condition. It should be investigated and the cause sought and fixed.

Implement ability to re-run task with old build params

For example re-triggering an incoming 3rd party webhook, re-simulating the webhook body to retry a branch/etc.

Remove the configuration files and modify systemd units in vagrant box for the services.

The services in the modified codebase no longer read from config files, rendering the following part of the Ansible run surplus to requirements.

TASK [harrow.backend : configuration files {{ item }}] *************************
ok: [development] => (item=aws)
ok: [development] => (item=braintree)
ok: [development] => (item=features)
ok: [development] => (item=filesystem)
ok: [development] => (item=http)
ok: [development] => (item=limits_store)
ok: [development] => (item=mail)
ok: [development] => (item=oauth)
ok: [development] => (item=pgsql)
ok: [development] => (item=rabbitmq)
ok: [development] => (item=redis)

Similarly the systemd services should be touched to set the configuration (where necessary) as part of the environment in keeping with modern practices.

In any case, the point is somewhat moot, and this ticket is a low priority because it's no longer mandatory to run these services in a specific directory with a config file in the relative correct position and have the systemd unit set up in the right way with the correct hard-coded path to the binary.

It's all relative to itself now.

Ensure binaries build with make under Linux and OSX with Go 1.8

Docs must be accurate too. The change to supporting OS X will be painful and mandate the guarding of some blocks with build tags which isn't necessarily a bad thing.

Rubies in the baseimage are becoming out of date

Reported by a customer 2017-05-26 that the latest version in the baseimage is 2.3.1 which was released on 2016-04-26. There are presently four newer versions that ought to be included in the baseimage:

2.3.3 (2016-11-21)
2.3.4 (2017-03-30)
2.4.0 (2016-12-25 🎅 )
2.4.1 (2017-03-22)

It may also be worth documenting how to upgrade a base-image given the upstream tarball of the LXC baseimage.

Wrap the LXC containers in some sane resource limits

Someone is helpfully trying to mine cryptocurrencies on the platform again.

Check the docs https://stgraber.org/2016/03/26/lxd-2-0-resource-control-412/
Integrate with ansible:
- disk https://github.com/harrowio/harrow/blob/master/config-management/roles/harrow.lxd-host/tasks/lxd.yml#L30
- cpu (??)
- iops

User rights management

It would be nice to have more control over the users in the projects :
Currently, a "normal" user can only see logs, and has to be promoted to "Manager" to be able to launch tasks.
But once promoted to manager, they have pretty much every right on the project.
Why not add more roles, to be able to control more precisely what users can / cannot do.
For example, it would be great to have user that can run tasks, but cannot modify them.

Cache Git repositories locally to speed up clones and operations

We pull (and throw-away) the Git repositories to which Harrow is connected on every single operation. This is wasteful and slow.

We should cache locally the Git repository and make an effort to clone locally, add the user's upstream as a remote, fetch, update ours (forcefully), and proceed with the build.

I suspect this would drop our inbound traffic 50%+ based on looking at the logs.
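
The sequence described above (clone locally, add the upstream as a remote, fetch, update forcefully) could be planned as a list of git invocations. This is a sketch only. The paths, the "upstream" remote name and the branch are assumptions, and refreshing the cache itself is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// cachedCloneCommands lists the git invocations for building a work tree
// from a local cache instead of cloning over the network on every operation.
// All arguments are illustrative; nothing is executed here.
func cachedCloneCommands(cacheDir, workDir, upstreamURL, branch string) [][]string {
	return [][]string{
		// Clone from the local cache, which avoids a full network transfer.
		{"git", "clone", "--local", cacheDir, workDir},
		// Add the user's repository as a remote of the fresh clone.
		{"git", "-C", workDir, "remote", "add", "upstream", upstreamURL},
		// Fetch only what the cache does not already contain.
		{"git", "-C", workDir, "fetch", "upstream"},
		// Forcefully move the work tree to the upstream state.
		{"git", "-C", workDir, "reset", "--hard", "upstream/" + branch},
	}
}

func main() {
	cmds := cachedCloneCommands("/var/cache/harrow/repo.git", "/tmp/work",
		"git@example.com:org/repo.git", "master")
	for _, c := range cmds {
		fmt.Println(strings.Join(c, " "))
	}
}
```

Each entry could be passed to os/exec, with the first element as the command name and the rest as arguments.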

Make the FOSS announcement!

This is the sentinel ticket which will be symbolically closed when Harrow's code is finally pushed upstream to https://github.com/harrowio/harrow

Here's the checklist:

Audit code to ensure there're no current secrets / tokens embedded
Squash history and merge into a mono repo (switch away from repo tool)
Ensure each component has an up-to-date README.md
Ensure each component has an up-to-date LICENSE.md
Write and publish a release announcement on the blog
Tweet about the announcement.

Build badges for projects

We wanted to implement this as a "notifier" so that someone can configure their preferred badge style and text options/etc. This would also allow us to update the badges asynchronously and have them pre-cached and pre-generated.

Problem with basename after renamed repository in setup script leading to junk output at end of log

The file fsbuilder/rootfs/templates/setup.sh should have some fixes to the basename usage, to ensure that basename responds without an error.

In a couple of cases of renamed repositories for a particular customer something appears to have gone wrong, with weird errors dumped at the end of the script run relating to basename problems.

Support creating memberships with a built-in expiry date

For freelancers, colleagues, consultants, etc.

Support specifying membership level when inviting someone

Should affect project, and organization memberships.

Ensure that failed repository clones are retried a sane number of times.

The file fsbuilder/rootfs/templates/setup.sh should have the clone_repositories (or the clone_repository_.... meta functions) improved to handle temporary failure cases.

This will make Harrow more resilient in the case of occasional temporary (sub-minute?) outages that seem to randomly break builds.
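
The retry logic itself is simple. The sketch below shows it in Go purely to illustrate the behaviour, since setup.sh is a shell template and the real change would live there. The attempt count and the delay are arbitrary assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary simulates a transient failure such as a network glitch.
var errTemporary = errors.New("temporary failure")

// retry runs fn up to attempts times, sleeping delay between failed tries,
// and returns the last error wrapped if every attempt fails.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// A simulated clone that fails twice before succeeding.
	clone := func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	}
	err := retry(3, 10*time.Millisecond, clone)
	fmt.Println(err, calls)
}
```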

Including a # at the beginning of a cronspec causes it to break.

A customer changed a cronspec to begin with a # (they wanted to comment it out) and the platform started raising 418 unhandled internal errors.

The issue is that the parser has validated the cronspec at the time of the change, but has not been able to re-read it at a later date.

The broken cronspec prevents the associated task being editable.

A log excerpt:

Jun 20 17:16:59 api.app.harrow.io api[3688]: Internal server error: Unknown error type encountered: syntax error in minute field: '#*/2': &errors.errorString{s:"syntax error in minute field: '#*/2'"}

Implement status badges for tasks

They should roughly look like this:

Just a first idea, these should be easy to proxy off to the shields.io server.
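
As a starting point, a badge URL for the shields.io static badge endpoint could be built as in the sketch below. The label, message and colour values are assumptions, and the escaping follows the static badge convention where a literal dash or underscore is doubled.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// badgeURL builds a static badge URL of the form
// https://img.shields.io/badge/<label>-<message>-<color>.
// Dashes and underscores inside label or message are doubled so that they
// are not read as separators or spaces; other characters are path-escaped.
func badgeURL(label, message, color string) string {
	esc := func(s string) string {
		s = strings.ReplaceAll(s, "-", "--")
		s = strings.ReplaceAll(s, "_", "__")
		return url.PathEscape(s)
	}
	return "https://img.shields.io/badge/" + esc(label) + "-" + esc(message) + "-" + color
}

func main() {
	fmt.Println(badgeURL("harrow", "passing", "brightgreen"))
	fmt.Println(badgeURL("deploy-prod", "timed out", "red"))
}
```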

Configuration management of worker base-image using Ansible (or similar)

Currently the worker base image is a 90% automated build using shell scripts with some by-hand modifications for the AMI or LXD variants.

There are a number of automated tests written using shunit which enforce the structure of the base-image, but they are incomplete.

As Ansible is already in use, it might be the ideal way to do a simple mechanical transformation of the existing shell scripts into Ansible, and start refactoring to use roles and playbooks as required to get this bent into a shape we can live with.

Refactor configuration package to read from the environment

This is in the spirit of https://12factor.net/ and should make it easier to run the binaries without moving into a specified directory relative to the configuration dir. It should also make it easier to run the binaries on any platform, or under some container engine.
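
A minimal sketch of what reading configuration from the environment might look like follows. The Config fields, the HARROW_* variable names and the defaults are hypothetical and not taken from the Harrow codebase.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds a few illustrative settings.
type Config struct {
	PostgresURL string
	RabbitMQURL string
	HTTPAddr    string
}

// fromEnv builds a Config using lookup (normally os.LookupEnv), falling back
// to a default whenever a variable is unset or empty. Taking the lookup
// function as a parameter keeps the code easy to test.
func fromEnv(lookup func(string) (string, bool)) Config {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	return Config{
		PostgresURL: get("HARROW_PGSQL_URL", "postgres://localhost:5432/harrow"),
		RabbitMQURL: get("HARROW_RABBITMQ_URL", "amqp://localhost:5672/"),
		HTTPAddr:    get("HARROW_HTTP_ADDR", ":8080"),
	}
}

func main() {
	cfg := fromEnv(os.LookupEnv)
	fmt.Printf("%+v\n", cfg)
}
```

With this shape, a systemd unit or a container engine only needs to set environment variables, and the binary no longer cares which directory it is started from.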