
pycicle's Introduction

pycicle

Python Continuous Integration Command Line Engine

A simple command line tool to poll github for pull requests on a project, for a single base branch (master by default), and trigger builds when they change, or when the base branch of a PR changes. Projects are assumed to be C++, use CMake for configuration and CTest for testing, with results submitted to a CDash dashboard.

The project was (and is) developed for use with HPX, which submits its results to a project CDash dashboard; other (non-HPX) projects are also supported.

What does it do and how does it work

When running, pycicle will poll github once every N seconds and look for open pull requests against your base branch using the pygithub API. A list of open PRs is generated and, for each one that is mergeable, pycicle looks at the latest SHA on the PR and the latest SHA on the base branch; if either has changed since the last poll, it marks that PR as needing an update (rebuild).
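A minimal sketch of that polling step, using the pygithub API, is shown below. The repository name, poll interval and bookkeeping dictionary are placeholders for illustration, not pycicle's actual internals.

import os
import time
from github import Github  # pygithub

gh = Github(os.environ["PYCICLE_GITHUB_TOKEN"])
repo = gh.get_repo("STEllAR-GROUP/hpx")
base_branch = "master"
last_seen = {}  # PR number -> (PR head SHA, base branch SHA) at the last poll

while True:
    base_sha = repo.get_branch(base_branch).commit.sha
    for pr in repo.get_pulls(state="open", base=base_branch):
        if not pr.mergeable:
            continue
        current = (pr.head.sha, base_sha)
        if last_seen.get(pr.number) != current:
            last_seen[pr.number] = current
            print(f"PR #{pr.number} needs an update (rebuild)")
    time.sleep(60)  # poll every N seconds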

A build is triggered either from the current shell or by ssh-ing into a remote machine, calling ctest -S dashboard-script.cmake <args> to spawn a build, or ctest -S dashboard-<scheduler>.cmake <args> if the machine uses a scheduler for job control. Currently slurm and pbs are supported.
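A hypothetical example of the remote trigger (the host name, script path and -D argument are placeholders, not pycicle's real command line):

import subprocess

# Run the ctest dashboard script on a remote machine over ssh.
remote_cmd = "ctest -S $PYCICLE_ROOT/pycicle/dashboard-script.cmake -DPYCICLE_PR=3042"
subprocess.run(["ssh", "user@daint", remote_cmd], check=True)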

The scheduler version of the dashboard script does nothing more than wrap the call to the base dashboard script inside a job dispatch wrapper, so that the build runs on a compute node rather than on the login node.
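The net effect of the slurm variant is roughly equivalent to wrapping the same ctest invocation in a batch submission; a sketch under that assumption (job name, time limit and script path are not pycicle's real values):

import subprocess

# Submit the same ctest call through slurm so the build runs on a compute node.
ctest_cmd = "ctest -S $PYCICLE_ROOT/pycicle/dashboard-script.cmake -DPYCICLE_PR=3042"
subprocess.run(
    ["sbatch", "--job-name=pycicle-pr-3042", "--time=02:00:00", f"--wrap={ctest_cmd}"],
    check=True,
)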

The build script will check out the latest base branch (as specified in command line arguments or the config file), merge the PR branch into it, then run the ctest configure/build/test steps, submitting after each step so that the dashboard entry is updated as the build progresses. Note that if a pull request is modified while a previous build is still running, the existing job is cancelled (scancel) before the new one is started.
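The cancel-and-restart behaviour amounts to something like the following; the job-id bookkeeping and submit callback are hypothetical:

import subprocess

# If a slurm job is already building this PR, scancel it before submitting a new one.
active_jobs = {}  # PR number -> slurm job id

def restart_build(pr_number, submit_build):
    job_id = active_jobs.get(pr_number)
    if job_id is not None:
        subprocess.run(["scancel", str(job_id)], check=False)
    active_jobs[pr_number] = submit_build(pr_number)  # returns the new job id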

Every M seconds, pycicle will find (scrape) a small log file generated in each build dir that contains a summary of config/build/test results and update the github PR status based on it so that failures flag the PR as not ready for merging.
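Setting the PR status can be done with pygithub's commit status API. A sketch, assuming the scraped summary reduces to a pass/fail flag; the SHA, dashboard URL and status context are placeholders:

import os
from github import Github

gh = Github(os.environ["PYCICLE_GITHUB_TOKEN"])
repo = gh.get_repo("STEllAR-GROUP/hpx")
commit = repo.get_commit(sha="0123456789abcdef0123456789abcdef01234567")
passed = True  # replace with the result scraped from the build's summary log
commit.create_status(
    state="success" if passed else "failure",
    target_url="https://cdash.example.org/",  # link to the CDash build page
    description="pycicle configure/build/test summary",
    context="pycicle daint",
)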

Why use this instead of Jenkins/other CI tool

Running pycicle is relatively simple and can be done by a user, manually or in a cron job. It does not require elevated system privileges and uses the same permissions as the user starting it. When a build fails, the user can ssh into the machine, cd into the build dir, manually (re)start the build to see the errors, cd into the source dir to inspect the repo/branch being tested, and tweak anything necessary to get it working - even fixing the build/test errors and updating the PR from the test build's copy of the repo. You can run it inside a screen session, at startup, or just leave a terminal open with it and start and stop it on demand.

CDash supports the display of build information from many sites, so pycicle can be run at several institutions, with results from machines at each location submitted to a single CDash dashboard. Machines at each location may be configured by different users and no central coordination is required - it is this aspect that makes it attractive to projects like HPX that have developers in several locations and compute resources distributed worldwide with different architectures/hardware.

Pycicle allows users to:

  1. contribute to more complete CI
  2. run CI on exactly the systems they care about
  3. easily run CI on forked repos

Running pycicle

To run locally and use machine daint for builds of the hpx project

python ./pycicle.py -m daint -P hpx

or for builds of the dca project

python ./pycicle.py -m daint -P dca

options

usage: pycicle.py [-h] [-s] [--no-slurm] [-d] [-r PYCICLE_ROOT] [-t USER_TOKEN]
                  [-m MACHINES [MACHINES ...]] [-p PULL_REQUEST] [-c]

-P PROJECT, --project : Project name (case sensitive). This is the name of the project to be tested; it should be the same as the name of the <name>.cmake file that holds the settings.

-s, --slurm : Use slurm for job launching (default). When slurm is enabled, builds are triggered by launching a slurm script that in turn launches the ctest build script

--no-slurm : Disable slurm job launching. When disabled, the script is executed directly; you might want to do this when setting up a build script and using a login node for test purposes.

-d, --debug : Enable debug mode. When using debug mode, remote commands are echoed to the screen instead of being executed. This is useful when setting up your first build and trying to get commands right, or when debugging pycicle itself.

-r PYCICLE_ROOT, --pycicle-root : pycicle root path/directory. The environment variable $PYCICLE_ROOT should be set on the machine you are running pycicle on, and also on the machine where builds are being triggered - but when supplied on the command line, it overrides the environment variable for the local machine (not the remote one, though that could be added). It is the root of the build/src tree where pycicle will write all its files.

-t PYCICLE_GITHUB_TOKEN, --github-token PYCICLE_GITHUB_TOKEN : github token used to authenticate access. To access github (and set the status of PRs) you need to generate a developer token on the github website and use it when initializing the pygithub object. Set the environment variable $PYCICLE_GITHUB_TOKEN or pass it on the command line. Make sure you give the token write permission if you want to set the status of PRs using pycicle.

-m MACHINES [MACHINES ...], --machines MACHINES [MACHINES ...] : list of machines to use for testing. Currently pycicle only supports a single machine at a time, but the plan is to allow spawning builds on several machines from a single pycicle instance. For now, we run one instance on a login node of each machine, or run one in a local terminal and use ssh to spawn builds on a single remote machine.

-p PULL_REQUEST, --pull-request PULL_REQUEST : A single PR number for limited testing. When debugging pycicle, or your build scripts, use a known PR number to tell pycicle to ignore all other PRs apart from that one and avoid spamming github.

-c, --scrape-only : Only scrape results and set github status (no building). When this is set, pycicle will not trigger any builds; it will only look for completed build logs on the remote machine and scrape them for the status it needs to set on github PRs.
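For illustration, these options map onto a fairly standard argparse setup; the sketch below is not a copy of pycicle's actual parser, and the defaults shown are assumptions.

import argparse

parser = argparse.ArgumentParser(prog="pycicle.py")
parser.add_argument("-P", "--project", help="project name (case sensitive)")
parser.add_argument("-s", "--slurm", dest="slurm", action="store_true", default=True,
                    help="use slurm for job launching (default)")
parser.add_argument("--no-slurm", dest="slurm", action="store_false",
                    help="disable slurm job launching")
parser.add_argument("-d", "--debug", action="store_true",
                    help="echo remote commands instead of executing them")
parser.add_argument("-r", "--pycicle-root", default=None,
                    help="override $PYCICLE_ROOT on the local machine")
parser.add_argument("-t", "--github-token", default=None,
                    help="override $PYCICLE_GITHUB_TOKEN")
parser.add_argument("-m", "--machines", nargs="+", default=None,
                    help="machine(s) to use for testing")
parser.add_argument("-p", "--pull-request", type=int, default=None,
                    help="restrict testing to a single PR number")
parser.add_argument("-c", "--scrape-only", action="store_true",
                    help="only scrape results and set github status")
args = parser.parse_args()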

Installing/setting up

Create a pycicle directory on a machine, set $PYCICLE_ROOT to its path, and add it to your bash startup so that it is also set for ssh logins to that machine.

Running pycicle and doing build/tests on the same machine

(Note that this mode of operation might not work as it hasn't been used for a while but the setup steps are still valid for both modes).

# setup for machine that will do builds and run pycicle script
PYCICLE_ROOT=/user/biddisco/pycicle
mkdir -p $PYCICLE_ROOT
cd $PYCICLE_ROOT
# clone pycicle into the root
git clone https://github.com/biddisco/pycicle.git pycicle
# create a directory called `repos` where projects to be tested will go
mkdir -p $PYCICLE_ROOT/repos
# make a copy of your project git repository in the `repos` folder
cp -r /path/to/your/project/hpx $PYCICLE_ROOT/repos/hpx
# alternatively, clone your project into the `repos` folder
# git clone git@github.com:STEllAR-GROUP/hpx.git $PYCICLE_ROOT/repos/hpx

Note that if you are testing more than one project using the same tree, you only need to clone/copy the second project into the repos folder.

Running pycicle on machine A, build/test on machine B

Follow the steps above on the machine that will do builds. Then, on the machine that will run the pycicle script and trigger builds on the remote machine:

# setup for machine that runs the pycicle script only
PYCICLE_ROOT=/user/biddisco/pycicle
mkdir -p $PYCICLE_ROOT
cd $PYCICLE_ROOT
# clone pycicle into the root
git clone https://github.com/biddisco/pycicle.git pycicle

Why do we keep a copy of the project repository in the pycicle root dir? When pycicle was initially developed on a laptop with wifi internet access, it turned out to be very painful to git clone the entire HPX project for each PR being tested, so pycicle copies the repo from its own private copy for each PR rather than cloning - this is much faster when the repo is many GBs. Note that each branch being tested is still pulled from the origin (github), but this is much faster than a full clone. (NB. a shallow clone isn't a great solution because you need to go back far enough to ensure the merge-base between the PR and the base branch is in the history.)
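The copy-then-fetch idea looks roughly like this; the paths, PR number and branch name are placeholders, and the real logic lives in pycicle's cmake/ctest scripts.

import subprocess

# Clone from the local mirror in $PYCICLE_ROOT/repos instead of github,
# then fetch only the PR branch from origin.
mirror = "/user/biddisco/pycicle/repos/hpx"
workdir = "/user/biddisco/pycicle/src/hpx-3042/repo"  # hypothetical per-PR checkout
subprocess.run(["git", "clone", mirror, workdir], check=True)
subprocess.run(["git", "-C", workdir, "remote", "set-url", "origin",
                "https://github.com/STEllAR-GROUP/hpx.git"], check=True)
subprocess.run(["git", "-C", workdir, "fetch", "origin", "pr-branch-name"], check=True)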

After using the above setup, pycicle can be started using a command like

python $PYCICLE_ROOT/pycicle/pycicle.py -m MACHINE -P project

When it runs, two directories will be created

$PYCICLE_ROOT/src
$PYCICLE_ROOT/build

and these will be populated with source trees and build trees for PRs and the base branch when they need to be built.

Running pycicle on a cluster A login node that submits builds to itself

see:

config/dca_local/condaGPUTrunk_local.cmake

Running on the forked repo of an individual (i.e. github.get_organization() returns None)

An example is in config/dca_local

It submits to a CDash server reverse tunneled to a localhost port on machine B (laptop)

  • This feature is still being debugged. Assuming CDash is set up on an httpd running on localhost:8080 of machine B, raise the reverse tunnel from machine B:

ssh -fN -R38080:localhost:8080 you@clusterA-login-node

If the login nodes are load balanced, make sure to explicitly raise the reverse tunnel on the login node that pycicle is running on.

Inspect

The HPX project runs a tool called inspect on the code (similar to clang-format/style checks) to ensure that #includes are set correctly and basic format checks pass. Currently this is hardcoded into the ctest script as a prebuild step that does an extra configure-and-submit to a different dashboard track. If you use pycicle to test non-HPX projects, the inspect step is skipped - at some point the scripts will be updated to allow a custom tool to be run per project.

Docs

Not yet implemented, but adding a doc build step to pycicle.py or the ctest scripts should be straightforward.

Config

The config directory contains examples for two slurm operated machines {greina/daint}; these can be copied/modified to create new configurations for other machines. The machine name passed on the command line, python pycicle.py -m daint (in this example daint), must correspond to the name of a cmake configuration file for that machine in the config directory.
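In other words, the -m value is just a key into the config tree. A sketch of the lookup; the exact layout (config/<machine>.cmake vs config/<project>/<machine>.cmake) may differ from what pycicle actually does:

import os

def machine_config(pycicle_root, project, machine):
    # Illustrative lookup only; the real layout may be config/<machine>.cmake
    # or config/<project>/<machine>.cmake (e.g. config/hpx/daint.cmake).
    path = os.path.join(pycicle_root, "pycicle", "config", project, machine + ".cmake")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no config found for machine '{machine}': {path}")
    return path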

Details of the CMake vars that need to be set will follow. Most are self-explanatory for developers familiar with CMake/CTest.

Force rebuilds

In the $PYCICLE_ROOT directory of the machine that runs the pycicle script you can delete the file that holds the last checked SHA from github. This will trigger a new build for all PRs.

cd $PYCICLE_ROOT
find src -maxdepth 2 -name last_pr_sha.txt -delete

If you only want to force a rebuild for PR 3042, then

cd $PYCICLE_ROOT
rm -f src/${project-name}-3042/last_pr_sha.txt

NB. A command line param should be added to allow this to be done without manual deletion.

ToDo

I don't really know much about python, so I have no real idea whether this works with both python2 and python3. I think it does, and I added a few imports to make it work, but it isn't tested.

pycicle's People

Contributors

biddisco, msimberg, pdoakornl


pycicle's Issues

Clean binary dir before builds

Currently pycicle does not wipe binary dirs between builds. This can lead to invalid CMakeCache states that invalidate build/test results.

Question about compatibility and scope of the project with our setup

Hello,

Your project was referred to me by Hannes Vogt, and I am considering using pycicle in our organization (Polish meteorological services, IMGW) for testing development of our new dynamic core for the weather forecast.

Our project is built and tested exclusively by CMake and CTest (with make and make test). Besides C++ and CUDA it does have Fortran code. The git repository containing the project itself contains git submodules. The project itself is hosted on our internal GitLab instance, and some submodules are on GitHub. We wish to build the project on our supercomputer managed by the Torque (not Slurm), where compute nodes do not have access to the internet. We don't use CDash (but may start using it, if it is recommended).

Do you think this kind of project is within reach of pycicle (or at least that pycicle could easily be extended to cover it)?

PRs from remote repositories are not handled correctly

When pycicle checks out a new branch to test a PR, it will not get the right SHA when the PR comes from a remote clone of the repo being tested.

The sequence used here assumes that the branch being tested comes from origin, and therefore the SHA will not be fetched for a PR that comes from a remote clone:

                       ${CTEST_GIT_COMMAND} checkout ${PYCICLE_MASTER};
                       ${CTEST_GIT_COMMAND} fetch origin;
                       ${CTEST_GIT_COMMAND} reset --hard origin/${PYCICLE_MASTER};
                       ${CTEST_GIT_COMMAND} branch -D ${GIT_BRANCH};
                       ${CTEST_GIT_COMMAND} checkout -b ${GIT_BRANCH};
                       ${CTEST_GIT_COMMAND} fetch origin ${PYCICLE_BRANCH};
                       ${CTEST_GIT_COMMAND} merge --no-edit FETCH_HEAD;
                       ${CTEST_GIT_COMMAND} checkout ${PYCICLE_MASTER};
                       ${CTEST_GIT_COMMAND} clean -fd;"

PYCICLE_GITHUB_WHATEVER vs PYCICLE_WHATEVER design

There is a bad smell around PYCICLE_GITHUB_BASE_BRANCH & PYCICLE_BASE.
It just seems like a potentially disastrous alias for PYCICLE_BASE, or vice versa;
as far as I can tell they are set equal here and that's that. The same could be true for the others as well. It seems the intention was to decouple the PYCICLE_GITHUB values from those passed to the CDash dashboard, but then GITHUB_ORG and GITHUB_PROJ are reused.

What about treating all variables/defines coming in from cmake configs as a dictionary whose keys are the cmake variable names? Most could just be carried straight across to the dashboard call.

Make Pycicle work for any Project

In order to use pycicle for projects other than HPX, there are some HPX specific variables/links that need to be made into options or configurable elements.

Status for build/test being set during config

When the test project is first configured, ctest submits results and pycicle is updating not just the config, but also build and test status with 'ok' before they have actually run. They are correctly set later to 'fail' if there are problems, but they should be marked as running rather than ok.

Token as a command line option doesn't work

Using python 2.7.x

I find that only the PYCICLE_GITHUB_TOKEN environment variable works. I suspect this might be a unicode issue; I did some fooling around with that, but it was inconclusive.
I'm going to try pycicle in python 3; if that just works, perhaps pycicle should go python 3 only.

Need a flag to disable github status setting

When adding new features to pycicle, build/test failures caused by incorrect setup can make a project look bad by setting failed status on github PRs. There should be a --no-status or -n flag to disable status setting during testing of pycicle itself.

Allow setting combinations of build options

It would be useful to be able to pass a set of build options to the configs, to more easily run builds with combinations of options. These would be, for example:

  • compiler
  • compiler version
  • dependency (e.g. boost) version
  • compiler/linker flags
  • sanitizer
  • and so on...

These can be generic though so that projects can define their own set of options. I'm not sure what the best syntax would be for this (could be command line options or a separate build options config file), but essentially one should be able to launch pycicle with:

  • project=hpx
  • machine=daint
  • build_options={compiler: gcc49, clang5; generic_option_1: value_1, value_2; generic_option_2: value_1, value_2}

pycicle would then generate the cartesian product of the build options and launch a build for each combination with ctest -S config/hpx/daint.cmake ... -Dcompiler=gcc49 -Dgeneric_option_1=value_1 -Dgeneric_option_2=value_1.
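Generating that cartesian product is straightforward; a sketch using the example option names and values above (the ctest invocation is the one quoted in this issue, not a confirmed pycicle feature):

import itertools

build_options = {
    "compiler": ["gcc49", "clang5"],
    "generic_option_1": ["value_1", "value_2"],
    "generic_option_2": ["value_1", "value_2"],
}

names = list(build_options)
for combo in itertools.product(*(build_options[n] for n in names)):
    defines = " ".join(f"-D{n}={v}" for n, v in zip(names, combo))
    print(f"ctest -S config/hpx/daint.cmake {defines}")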

The build name should be derived from the build options to make them unique.

Conflicts/dependencies between build options are project specific anyway, so this should be handled on a per project basis in the config files.

One may not want to run builds for all combinations of build options so one could additionally:

  1. launch multiple instances of pycicle to piece together the combinations one wants (will this work now?), or
  2. have a way to include or exclude specific combinations (probably easiest to put in a separate file which is passed to pycicle)

Ideally the build options and config (daint.cmake) would be taken from the source repository but this might require a bigger refactoring.

Old build/src dirs are not cleaned up

Recent builds on CSCS machine daint failed with an error

sbatch: error: You are unable to submit jobs because you have exceeded the quota of   
1 million off files/folders (inodes) on the scratch file system.  
You can find more details about the quotas on the CSCS User Portal ...

There should be a cleanup step added that will either

  • clean up any PRs that have been merged (requires iterating over all PRs since time began, as we do not currently track them)
  • or clean up any build/ or src/ dirs that are older than some time limit (a week or so); this just means that some PRs will be purged and rebuilt from a new checkout and new build dir when they are next triggered.

The second option is currently preferred for simplicity.
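A sketch of that age-based cleanup; the one-week limit is the example above, and the directory layout is assumed to be the $PYCICLE_ROOT/build and $PYCICLE_ROOT/src trees described earlier:

import os
import shutil
import time

# Remove build/ and src/ subdirectories untouched for more than a week.
MAX_AGE = 7 * 24 * 3600
root = os.environ["PYCICLE_ROOT"]
now = time.time()
for sub in ("build", "src"):
    base = os.path.join(root, sub)
    if not os.path.isdir(base):
        continue
    for entry in os.scandir(base):
        if entry.is_dir() and now - entry.stat().st_mtime > MAX_AGE:
            shutil.rmtree(entry.path)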

slurm job output goes into user's $HOME dir

Currently, the user's home directory gets all the slurm-123456.out and slurm-123456.err files. It ought to be possible to put them somewhere else. A better default would be $PYCICLE_ROOT/temp, with the option to put them elsewhere via a command-line or config setting.
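For reference, sbatch already supports redirecting this output; a hypothetical fix would pass something like the following when submitting (the job script name is a placeholder):

import os
import subprocess

# Send slurm stdout/stderr under $PYCICLE_ROOT/temp instead of $HOME.
temp = os.path.join(os.environ["PYCICLE_ROOT"], "temp")
os.makedirs(temp, exist_ok=True)
subprocess.run([
    "sbatch",
    f"--output={temp}/slurm-%j.out",
    f"--error={temp}/slurm-%j.err",
    "build-job.sh",
], check=True)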

Store per-project pycicle settings in the project being tested

Allow pycicle to take its per project settings from the source project itself.

Each project that uses pycicle for testing can have a pycicle subdir that stores the testing/config settings, the main advantages being:

  • The settings for the project can be maintained by the project owners without modifying pycicle
  • Different branches could (in principle) have different test settings
  • Changes to options can be smoothly updated by the project

and lots more reasons

Add build options as configuration choices

When builds are started we currently use a fixed set of options for each 'machine'. A new config file with a different machine name can be used to create a second set of options for the same machine, but it would be better to allow many options/choices to be set up and let pycicle pick some, either at random or according to some other rule.

Options would generally fall into categories

  • Compiler choice
  • Compiler flags (e.g. c++11/14 or sanitizer flags)
  • CMake options (WITH_XXX)
  • Library or dependency versions (boost/hwloc/other)
