dotnet / dotnet-ci

Repository containing scripting for the dotnet-ci Jenkins instance.

License: MIT License

Groovy 49.82% Python 0.53% Shell 2.61% Batchfile 0.19% PowerShell 46.84%

dotnet-ci's Introduction

.NET CI Repository

Contains documentation and some implementation functionality for the .NET CI system.

Getting started and need to onboard your project?

Start here.

Looking for the repo list to add another branch for your project?

Go here.

Looking for information on how to write your jobs?

Looking for a general system overview?

Look here.

dotnet-ci's People

mmitche makandok wutyi winsto dennievestin saper chcosta chrisboh codejuan modulexcite ellismg dilijev meteoritt chuck-mitchell crummel priya91 eerhardt leemgs kevinransom jashook bluemutedwisdom newlysoft brthor sebastiengllmt ericstj danmoseley bartonjs winstonhoseadriggins tmat iamjasonp jasonmalinowski sjsinju mellinoe michellemcdaniel pilchie tyoverby 333fred naamunds peoplearmy joperezr gkhanna79 livarcocc agocke roncain hongdai john-soklaski drewscoggins enricosada runt18 sbomer am11 cston drognanar codito fcastellacci jyoungyun therealpiotrp michaelsimons jonsequitur shawnro brettfo jayaranigarg cellule jmarolf swgillespie mlorbetske rchande maririos tmeschter davkean rainersigwald karajas jcouv mslaguana stephentoub wli3 woland2k nategraf bruceforstall ravimeda davfost dagood laoshan007 andygerlicher johnbeisner natidea vsadov alekseyts falkwinkler boingoing mattgal smadala cdmihai acidburn0zzz stephenbonikowsky sywhang pranavkm chborl sharwell shyam-gupta

dotnet-ci's Issues

Move all artifact archiving to azure blobstorage

Requires a couple of things. We need to be able to tar/zip the artifacts, and Azure's artifact plugin needs an "excludes" option.

This could alternatively be done in scripting.

Push/PR triggers that launch sub-jobs should pass checked out commit hash to sub-jobs

Right now it's possible that the checked out hash is different for different sub-jobs of a flow job. There are a few reasons:

GitHub push events contain no info on what hash was pushed, so Jenkins can combine builds
PR merger builds do a merge from the upstream branch into the PR branch and check out the latest hash there. That means an intervening push to upstream could cause sub-jobs to check out different hashes

The right way to do this is to pass the checked out hash (seen in the environment as GIT_COMMIT) from the root job to the sub jobs. In practice, I saw some issues with this where the checked out hash was incorrectly determined (GIT_COMMIT was actually the previous job's GIT_COMMIT). This could be a result of incorrect EnvInject caching or something else entirely.
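A minimal sketch of the idea, written in Python rather than the Job DSL: the root job resolves the hash once and hands the same value to every sub-job as an explicit parameter. Only `GIT_COMMIT` comes from the description above; the function name `build_subjob_params` and the parameter name `PinnedCommit` are hypothetical.

```python
def build_subjob_params(root_env, subjob_names):
    """Return one parameter dict per sub-job, all pinned to the root job's commit."""
    commit = root_env.get("GIT_COMMIT")
    if not commit:
        # Fail early rather than letting each sub-job resolve its own hash.
        raise ValueError("root job did not report GIT_COMMIT")
    # "PinnedCommit" is an assumed parameter name, not an existing one.
    return {name: {"PinnedCommit": commit} for name in subjob_names}
```

Each sub-job would then check out the value it was given instead of the branch head. This does not address the stale `GIT_COMMIT` problem mentioned above; it only shows the hand-off.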

Builds fail occasionally with "XamlTaskFactory" could not be loaded

I think this is only happening for CMake based builds:

http://dotnet-ci.cloudapp.net/job/dotnet_llilc_debug_win32_prtest/637/consoleFull

Update timeouts on the Build Failure Analyzer

Update the scan timeouts on the build failure analyzer. This may be possible through the JAVA_ARGS.

Failures in some dependent jobs cause the DSL script for corefx to fail

See: http://dotnet-ci.cloudapp.net/job/dotnet_corefx_prtest/1533/

Linux images should automatically make /mnt writeable

The issue with this right now is that the method of daemon startup cannot guarantee that /mnt is available prior to launching the Jenkins process, because the Azure daemon starts through a different mechanism.

Design and develop multi-machine cloud plugin for Jenkins

There are cases when a single physical machine can have multiple OS's attached to it. For instance, a machine that can boot to both Linux and Windows. This is useful for lots of things, like measuring performance cross OS on the same hardware.

We need a way to manage such machines in Jenkins. The suggestion is to create a cloud plugin that has inherent knowledge of these processes. Starting with one attached machine that has multiple personalities, if a job is queued that is to be run on another OS on the same machine, the cloud plugin would take the machine offline, reboot or do what it needs, remove the original machine from Jenkins, and add the new one.
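The flow suggested above could be modelled roughly like this. It is a sketch in Python, not Jenkins plugin code; `MultiBootMachine`, `provision`, and the node naming scheme are all hypothetical, and the "steps" are only recorded in a log rather than performed.

```python
class MultiBootMachine:
    """One physical machine that can boot into several OS 'personalities'."""

    def __init__(self, name, personalities, current):
        if current not in personalities:
            raise ValueError("unknown personality: " + current)
        self.name = name
        self.personalities = set(personalities)
        self.current = current
        self.busy = False

    def node_name(self):
        # Assumed convention: the Jenkins node name reflects the booted OS.
        return "{}-{}".format(self.name, self.current)


def provision(machine, wanted_os, log):
    """Switch the machine to wanted_os if needed, recording each step in log."""
    if wanted_os not in machine.personalities:
        raise ValueError("machine cannot boot " + wanted_os)
    if machine.busy:
        return None  # caller keeps the job queued until the machine is idle
    if machine.current != wanted_os:
        old_node = machine.node_name()
        log.append("offline " + old_node)
        log.append("reboot into " + wanted_os)
        log.append("remove " + old_node)
        machine.current = wanted_os
        log.append("add " + machine.node_name())
    return machine.node_name()
```

The key point is that only one personality is ever registered at a time, so a job for the other OS has to wait for the machine to go idle before the switch happens.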

The coreclr CI-build for FreeBSD should have PAL-tests enabled.

With test-failures now fixed in master, we should try to enable PAL-tests for the coreclr FreeBSD CI-build, to prevent future regressions.

@mmitche Is this something you can handle on your own, or will I need to issue a PR?

Move "Ubuntu" to "Ubuntu14.04"

In our groovy scripts and machine labels we have taken "Ubuntu" to mean "Ubuntu 14.04". As we bring more Ubuntu platforms online (15.10 and someday 16.04), "Ubuntu" becomes a bad short name. We should update it and deal with the fallout (machine labels and such).

Create automated setup process for new CI instances

We should create an automated process for new CI instances. Here are the general steps. Some of this is already available:

Create a new VM in Azure with correct cloud services, endpoints, etc. Endpoint scripting is already available in dotnet-ci-internal repo (management\master)
Add disks to the VM for Jenkins data.
Use Chef or some other automated way to install Jenkins upon the creation and boot up of the VM.
Automate modification of the Jenkins setup (location of data, startup options etc.)
Automate installation of the standard set of plugins (this could be restored from a backup, it is copyable)
Start Jenkins
Automate startup of basic settings, including security, setup of the Azure cloud, etc.

Alternatively, it may be that key config files and directories could simply be identified, tarred up from the main installation and applied. For instance, if I took everything from the Jenkins directory except for the jobs and fingerprints, I believe I would have a full installation. It would be slightly tied to the server name, but that could be customized by editing afterwards.

Anyways, the really critical stuff here is:

creds
plugins
main system config
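The tar-based alternative described above (everything in the Jenkins directory except jobs and fingerprints) might look like the following. This is a sketch under the assumption that `jobs` and `fingerprints` are top-level directories of the Jenkins home; `snapshot_jenkins_home` is a hypothetical helper, and nothing here deals with credentials handling or the server name.

```python
import os
import tarfile

# Top-level directories left out of the snapshot, per the description above.
EXCLUDED_TOP_LEVEL = {"jobs", "fingerprints"}


def snapshot_jenkins_home(jenkins_home, archive_path):
    """Archive jenkins_home into a gzip tarball, skipping excluded top-level dirs."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(os.listdir(jenkins_home)):
            if entry in EXCLUDED_TOP_LEVEL:
                continue
            # arcname keeps paths relative so the archive can be unpacked anywhere.
            tar.add(os.path.join(jenkins_home, entry), arcname=entry)
```

The archive path should live outside the Jenkins directory so the snapshot does not try to include itself.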

Investigate moving more complex workflows to the Pipeline plugin

Investigate moving more of the complicated multi-machine workflows to "pipeline".

There will be some challenges:

Still need to create the pipeline job, so job dsl will have to integrate.
Pipeline text could be checked in and read from git during job creation, but you can't execute workflow jobs directly for each check-in/PR from the enlistment, since it allows you to alter what machines things run on. We could potentially work around this some other way.
There are inter-project dependencies: CoreFx requires the CoreCLR build artifacts from the mid-stage of the CoreCLR test pipeline for non-Windows OS's. This probably isn't hard to deal with (add a separate pipeline for the CoreCLR build).
We use a lot of extra features on the job (timeouts, workspace wiping, etc.) that need to stick around.

/cc @sejongoh

Complete move to auto-imaged windows VMs

Complete the move to the auto-imaged Windows VMs, rather than using the static pool

PR/nonPR jobs should be generated at the top level into different folders

Today we usually have the following:

We generate a folder for a project and a subfolder for a branch, then generate jobs into that branch
In the netci groovy files we usually have the following pattern:

[true, false].each { isPR -> ...

This is a little inefficient. PR jobs pollute the rolling/main folder, causing additional data to be loaded all the time when most people only want to look at the rolling data.

Instead, the MetaGenerator could generate a PR/nonPR folder, then set up a generator job for both and call each with the appropriate PR/nonPR job parameter.

Configs wishing to do this would be rewritten, just removing the "[true, false].each { isPR ->". By default, we should not generate the folders for this, only opt-in for now using an additional option in the repo list.
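To make the opt-in behaviour concrete, here is a small Python sketch of what the MetaGenerator would plan per repo list entry. The folder names `PR` and `nonPR`, the option name `split_pr_folders`, and the function itself are assumptions for illustration only.

```python
def plan_generators(project, branch, split_pr_folders=False):
    """Return (folder, is_pr) pairs describing the generator jobs to create."""
    base = "{}/{}".format(project, branch)
    if not split_pr_folders:
        # Default: one generator; the netci script loops over isPR itself.
        return [(base, None)]
    # Opt-in: one generator per folder, each told which kind of jobs to emit.
    return [(base + "/PR", True), (base + "/nonPR", False)]
```

With the option on, the netci script would receive `is_pr` as a job parameter and drop its own `[true, false].each` loop.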

Add semantic diff capabilities to the local job generation script

Semantic diff capability should be added on top of Local-Job-Gen. I am thinking:

Script that calls Local-Job-Gen for the current checked out repo, providing all the input parameters (temp output dir, project name, etc.) and then also for the master in the repo.
Parses the results providing a list of added/removed jobs and a diff for changed jobs
Potentially parses the results to note certain things that are changed: triggers, steps, etc.
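A first cut of the comparison step could be as simple as the following Python sketch. It assumes each generation run leaves one `.xml` file per job in its output directory, and it produces a plain text diff for changed jobs; noticing changed triggers or steps specifically would need real XML parsing on top. `load_jobs` and `diff_jobs` are hypothetical names.

```python
import difflib
import os


def load_jobs(directory):
    """Map job name to XML text for every .xml file in directory."""
    jobs = {}
    for file_name in os.listdir(directory):
        if file_name.endswith(".xml"):
            with open(os.path.join(directory, file_name), encoding="utf-8") as handle:
                jobs[file_name[:-4]] = handle.read()
    return jobs


def diff_jobs(baseline, current):
    """Return (added, removed, changed); changed maps job name to a unified diff."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = {}
    for name in sorted(set(baseline) & set(current)):
        if baseline[name] != current[name]:
            changed[name] = "\n".join(difflib.unified_diff(
                baseline[name].splitlines(), current[name].splitlines(),
                fromfile="master/" + name, tofile="local/" + name, lineterm=""))
    return added, removed, changed
```

Usage would be `diff_jobs(load_jobs(master_out), load_jobs(local_out))`, where the two directories are the temp output dirs passed to the two Local-Job-Gen runs.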

dci-ub-fbld-5 may not have all the LLDB stuff installed?

Builds of coreclr on this machine are failing like this:

http://dotnet-ci.cloudapp.net/job/dotnet_coreclr_linux_release/729/console

Looks like it hasn't done a clean coreclr build in a while:

http://dotnet-ci.cloudapp.net/computer/dci-ub-fbld-5/builds

Scripting for unix vm creation

FreeBSD CI-builds are failing

As can be seen via Jenkins: http://dotnet-ci.cloudapp.net/job/dotnet_coreclr_freebsd_debug/

Looking into the first failing build, we have this log:

Cloning the remote Git repository
Cloning repository git://github.com/dotnet/coreclr.git
 > git init /mnt/tmpdrive/j/workspace/dotnet_coreclr_freebsd_debug # timeout=10
ERROR: Error cloning remote repo 'origin'
ERROR: Error cloning remote repo 'origin'

Failure to clone the repository could hint at the disks being full. But subsequent builds are failing for other reasons. The latest build gets to 16%, then Jenkins suddenly seems to disconnect and call it a day, so obviously something else is going on too.

@mmitche Looking at the build log, it seems you're already on this? :)

Put jenkins config under source control

There is a plugin for this kind of thing.

Need a way to prioritize builds

It looks like there are a few plug-ins for prioritizing specific jobs (PR job vs rolling job), but there are none for prioritizing individual builds (Bob's build before Sam's).

@Petermarcu states:

Do you know if there is a way to prioritize certain items in the queue? We had a change that had to get into the build tonight blocked by this today and im sure many of the things it was behind weren't business critical. We ended up just merging without CI testing because it had to get into tonight's build.

PR triggers should be more flexible about their target branches

PR triggers today can apply to a specific branch. The Utilities don't easily allow a trigger to apply to multiple branches. This should be fixed.

In addition, a change to the ghprb should be made that can allow for the trigger to NOT apply to specific branches.

The other tracked branches for a repo should also be passed into the generators, so that logic like the following can be expressed:

'Apply this PR trigger to master and to any other branch that isn't tracked'
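Once the tracked branch list is available to the generator, that rule becomes a short predicate. A Python sketch, with `trigger_applies` and its argument names being hypothetical:

```python
def trigger_applies(target_branch, owning_branch, tracked_branches):
    """Decide whether the PR trigger generated for owning_branch should fire.

    The trigger fires for PRs against its own branch and, when it belongs to
    master, for PRs against any branch that is not tracked.
    """
    if target_branch == owning_branch:
        return True
    return owning_branch == "master" and target_branch not in tracked_branches
```

Here a PR against an untracked feature branch falls through to master's trigger, while a PR against another tracked branch is left to that branch's own trigger.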

dotnet restore - failed

Installed on MAC OS.

Executed 'dotnet new'
then 'dotnet restore'

It throws the following error:

error: Unable to load the service index for source https://api.nuget.org/v3/index.json.
error:   The type initializer for 'Crypto' threw an exception.
error:   Unable to load DLL 'System.Security.Cryptography.Native': The specified module could not be found.
error:    (Exception from HRESULT: 0x8007007E)

Improve CI documentation for writing netci

Create job or script for automated rebooting of machines

We need an easy way to reboot machines if they get wedged or whatnot. This should be done so that even if the machine is not in the Azure pool (or if it's Linux, etc.) it can be rebooted. And you shouldn't have to go to the Azure portal.

Consider automating machine setup with Chef or something

It may make more sense to move to a model where we have scripts that take a clean machine and install all the required software on the CI machines instead of the model where we do brain surgery on a VM and then capture a new image.

This would make the stuff installed on a machine explicit and we could version it in this repo.

When a branch is removed or added to the branch list run/delete associated generators

When removing a branch or repo, the folder should be disabled. When adding one, the generator should be run.
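The bookkeeping for this is a set difference between the old and new repo list. A minimal Python sketch, assuming each entry is a `(repo, branch)` pair; the action names are made up for illustration:

```python
def reconcile(previous, current):
    """Compare two sets of (repo, branch) entries and return the actions to take."""
    # New entries need their generator run; dropped entries get their folder disabled.
    actions = [("run_generator", entry) for entry in sorted(current - previous)]
    actions += [("disable_folder", entry) for entry in sorted(previous - current)]
    return actions
```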

SSL support on the CI

It would be nice to have SSL support on the CI site (i.e. https://dotnet-ci.cloudapp.net). I'm working on a script that downloads the latest build and would like to know I'm getting the real build from the right place :).

Should we keep Jenkins results around longer?

Ported from dotnet/roslyn#6337

We have a few PRs on dotnet/roslyn that have been around long enough that the Jenkins results have been dropped. Is the dropping of build/test results intentional? Is it based on elapsed time or some fixed number of results per queue? Ideally we'd keep results around forever, right? Are we running into storage issues?

Update local job generation script to reference correct plugin version

CoreCLR builds for Windows on ARM

Windows on ARM builds should be included, as they work well when compiled on my machine. Why aren't they tested by the bot?

Modify system to be branch specific

Right now we depend on the master branch having all of the definitions for the jobs for all target branches. Ideally we want:

The jobs for a branch to be read from that branch.
Pull request jobs should be generated and run on a per-branch basis. The trigger isn't branch specific but it can be set up to only run for PRs to certain target branches.

I think the best way to implement this is to change the repo list to include the branch name. The subfolder structure would be modified to the following:

/dotnet_coreclr/

PR jobs for specific branches would go in their specific folders, whitelisted for that target branch

There are some challenges:

Workspace name length becomes an issue
Larger accumulation of PR jobs

Sync time across machines to time server

Currently, machines across the dotnet-ci environment can be a few minutes ahead or behind others. As we start implementing cross-machine tests with security (e.g., SSL / Kerberos), significant time skews may cause issues with these tests.

Consider running an NTP client to get time to be consistent across machines - i.e., no more than a few seconds of drift either way.
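Independently of how the clocks get synced, a periodic check could flag machines that have drifted too far. A Python sketch, assuming each machine's clock reading is collected as epoch seconds at roughly the same moment as the reference reading; `find_skewed` and the 5 second default are illustrative choices, not an agreed threshold.

```python
def find_skewed(reference_time, machine_times, tolerance_seconds=5.0):
    """Return {machine: offset} for clocks further than the tolerance from reference."""
    skewed = {}
    for machine, reported in machine_times.items():
        offset = reported - reference_time  # positive means the machine is ahead
        if abs(offset) > tolerance_seconds:
            skewed[machine] = offset
    return skewed
```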

Generators should only run when netci/dotnet-ci is changed

Today we rerun the generators whenever a repo or dotnet-ci changes. This should be altered so that it only reruns when netci/dotnet-ci changes. The ideal way to do this is actually to have the SCM poll, say every 15 minutes or so and set up the polling specifically to ignore certain paths for the generators. I think the way things are done now may contribute to memory leaks over time.

The challenge here is that there is a "polling ignores commits to X" option for git, but it is not directly configurable, so a configure block is needed. This is entirely doable though.
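The decision the polling filter has to make is simple to state. A Python sketch of that decision only (it is not the configure block itself), assuming `netci.groovy` sits at the root of each project repo and that any change in dotnet-ci should trigger a rerun; `should_regenerate` is a hypothetical name.

```python
def should_regenerate(repo, changed_paths):
    """True when a poll result should cause the generator to rerun."""
    if repo == "dotnet/dotnet-ci":
        # Any change to the shared CI repo can affect generated jobs.
        return bool(changed_paths)
    # For project repos, only the job definition file matters.
    return "netci.groovy" in changed_paths
```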

Would be nice: Combined prtest/private job/official job types

Right now we have to have individual jobs for prtest and official, to keep the badges (and general job views) clean. We would also have to do the same for private jobs (if available) since they couldn't be shared with PR tests. This is clunky and involves a lot of duplicated job logic. It would be awesome to have a Jenkins plugin that allows some kind of job type.

Move debian images to auto-image

Move the debian VMs to auto-image

Disable error reporting UI on Windows

The error reporting UI causes problems (as it blocks a process from ending). Disable on Windows.

PlatformException gets thrown when trying to run dotnet-cli

I just installed the latest beta (1.0.0-beta-001598) on my Mac running OS X 10.11.3.
When I try to run the dotnet utility using the following command:

dotnet --version

I run into the following nasty error.
It seems as if it cannot find libc somehow.

Unhandled Exception: System.TypeInitializationException: The type initializer for 'Microsoft.Extensions.PlatformAbstractions.PlatformServices' threw an exception. ---> System.PlatformNotSupportedException: Error reading Darwin Kernel Version ---> System.DllNotFoundException: Unable to load DLL 'libc': The specified module could not be found.
 (Exception from HRESULT: 0x8007007E)
   at Microsoft.Extensions.PlatformAbstractions.Native.NativeMethods.Darwin.sysctl(Int32* name, UInt32 namelen, Byte* oldp, UInt32* oldlenp, IntPtr newp, UInt32 newlen)
   at Microsoft.Extensions.PlatformAbstractions.Native.NativeMethods.Darwin.GetKernelRelease()
   --- End of inner exception stack trace ---
   at Microsoft.Extensions.PlatformAbstractions.Native.NativeMethods.Darwin.GetKernelRelease()
   at Microsoft.Extensions.PlatformAbstractions.Native.PlatformApis.GetDarwinVersion()
   at Microsoft.Extensions.PlatformAbstractions.Native.PlatformApis.GetOSVersion()
   at Microsoft.Extensions.PlatformAbstractions.DefaultRuntimeEnvironment..ctor()
   at Microsoft.Extensions.PlatformAbstractions.DefaultPlatformServices..ctor()
   at Microsoft.Extensions.PlatformAbstractions.PlatformServices..cctor()
   --- End of inner exception stack trace ---
   at Microsoft.DotNet.Cli.Program.PrintVersionInfo()
   at Microsoft.DotNet.Cli.Program.ProcessArgs(String[] args)
   at Microsoft.DotNet.Cli.Program.Main(String[] args)

Improve CI documentation for system troubleshooting

Exclude PRs containing only *.md changes from building

Currently, we prevent a build from happening when the only file changed is netci.groovy.

We should also leverage this functionality to prevent CI builds when the only files changed are *.md files.
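The check generalizes naturally to a list of skip patterns. A Python sketch, assuming the list of changed file paths for the PR is available; `should_build` and `SKIP_PATTERNS` are hypothetical names, and matching is done on the file name only.

```python
from fnmatch import fnmatchcase

# Files whose changes alone should not trigger a build.
SKIP_PATTERNS = ("netci.groovy", "*.md")


def should_build(changed_paths, skip_patterns=SKIP_PATTERNS):
    """Build unless every changed file matches one of the skip patterns."""
    if not changed_paths:
        return True  # nothing known about the change, so build to be safe

    def skippable(path):
        file_name = path.rsplit("/", 1)[-1]
        return any(fnmatchcase(file_name, pattern) for pattern in skip_patterns)

    return not all(skippable(path) for path in changed_paths)
```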

Enable developer mode for OSX machine

To make debuggertests run on OSX, we need developer mode enabled on the OSX machine. If it is doable, please turn it on :) @mmitche

Install "Simple Theme" and "Dashboard View" plugins

Simple Theme Plugin
Dashboard View

Machines should not wipe workspace after runs if they are taken offline during the run

To avoid disk space issues, we wipe the workspace before and after runs. This is fine, except when someone needs to investigate something. In that case, the workspace is gone before there is a chance to look at it.

Although I do not know whether it's easy to achieve, it would be nice to have the workspace not be wiped if the machine is offline at the end of the build. This would allow for easy investigation.

RDP and SSH ports should be predictable based on machine name

In order to facilitate machine management, the RDP and SSH ports for dotnet-ci nodes should be predictable.
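One way to get predictability is to derive the ports from the machine name itself. The issue does not specify a scheme, so the following Python sketch is purely an assumption: it hashes the name into a slot and gives each slot two adjacent ports. Collisions between names are possible and would need handling in practice.

```python
import zlib


def port_for(machine_name, service, base=20000, slots=10000):
    """Derive a stable port for a machine/service pair from the machine name."""
    offsets = {"rdp": 0, "ssh": 1}
    # crc32 is deterministic across runs and platforms, unlike the built-in hash().
    slot = zlib.crc32(machine_name.encode("utf-8")) % slots
    return base + 2 * slot + offsets[service]
```

With the defaults, every port lands in the range 20000 to 39999, and the SSH port is always the RDP port plus one.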

Nightly system cleaner should work with pipeline jobs

Currently, pipeline jobs aren't based on Job, and so don't have a disabled property.

CI should allow for private job submissions

The CI should allow someone to submit a private job for their repo/branch (not PR test)

Windows auto-image setup script should be reworked

Currently, there are some issues with the auto-image setup script on Windows.

We need to have desktop access and the VM needs to connect to Jenkins (this is vs. SSH on Linux, which is simpler). To do this, today the setup script installs a startup task (on log-in) as dotnet-bot, then sets up auto-login for dotnet-bot. It then restarts the machine. The issue here is that the script is running as the system user, and there seems to be some level of non-determinism in how long it takes for the full dotnet-bot user to appear. There was a wait installed of about 5 minutes which helps, but this is inefficient.

The suggested workflow is to add the auto-login first, then restart.

Download files as necessary (maybe put a loop here to avoid 404s)
Install user as auto-login
Put connection script into legacy startup folder??
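The download loop mentioned in the first step could be a plain bounded retry. A Python sketch of the pattern (the real setup script is not Python); `download_with_retry` is hypothetical, and `fetch` stands in for whatever actually downloads the file.

```python
import time


def download_with_retry(fetch, url, attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call fetch(url) until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            # urllib's HTTPError (e.g. a 404) is an OSError subclass.
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

Passing `sleep` in makes the loop easy to exercise without actually waiting.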

/cc @tannergooding

Set up a special job or website to get the login info of auto-imaged VMs

Auto imaged VMs do not have predictable ports. To access them it is necessary to find the endpoint in Azure. Lots of users do not have the Azure subscription set up on their machines. We should have a Jenkins job or web service that talks to a machine with the subscription to get the connection info.

CI boxes should not run elevated

Unable to complete builds due to inability to delete project workspace

In the buildtools repo, the following error message causes a failure to clean the project workspace leading to a failed build.

3:06:41 Using context: Innerloop Windows Debug
13:06:42 Building remotely on Azure0328100701 (auto-win2012-20160325) in workspace D:\j\workspace\innerloop_prtest4ca7949f
13:06:42 [WS-CLEANUP] Deleting project workspace...
13:06:48
ERROR: [WS-CLEANUP] Cannot delete workspace: remote file operation failed: D:\j\workspace\innerloop_prtest4ca7949f at hudson.remoting.Channel@2aaf2a22:Azure0328100701: hudson.remoting.ChannelClosedException: channel is already closed
13:06:48 ERROR: Cannot delete workspace: remote file operation failed: D:\j\workspace\innerloop_prtest4ca7949f at hudson.remoting.Channel@2aaf2a22:Azure0328100701: hudson.remoting.ChannelClosedException: channel is already closed
13:06:48 [BFA] Scanning build for known causes...
13:06:48 [BFA] Scanning build for known causes...
13:06:48 [BFA] Found failure cause(s):
13:06:48 [BFA] Hung processs on target machine from category Infrastructure
13:06:48 [BFA] Done. 0s
13:06:48 Setting status of 38525666293756dc6fca04171a8125707c1414bc to FAILURE with url http://dotnet-ci.cloudapp.net/job/dotnet_buildtools/job/master/job/innerloop_prtest/264/ and message: 'Build finished. No test results found.'

/cc: @mmitche

Enable ARM and AArch64 CI build for CoreCLR

With dotnet/coreclr#1292 and dotnet/coreclr#1210 merged, please enable CI build jobs for ARM and AArch64 architectures and add the respective badges in coreclr repo's README (under Build Status), so others can follow the support status.

@mmitche, @saper

Local-Job-Gen.ps1 cannot compile coreclr/netci.groovy

Local-Job-Gen.ps1 cannot compile coreclr/netci.groovy. It fails with the following error:

[JarClassLoader] INFO:  findResource(): unable to locate "jobs/generation/UtilitiesCustomizer.groovy"                                 
[JarClassLoader] INFO:  findResource(): unable to locate "jobs/generation/UtilitiesCustomizer.groovy"                                 
Exception in thread "main" java.lang.reflect.InvocationTargetException                                                                
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                                
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)                                                                
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)                                                            
        at java.lang.reflect.Method.invoke(Unknown Source)                                                                            
        at com.simontuffs.onejar.Boot.run(Boot.java:313)                                                                              
        at com.simontuffs.onejar.Boot.main(Boot.java:161)                                                                             
Caused by: javaposse.jobdsl.dsl.DslScriptException: (combinednetci.groovy, line 421) No signature of method: javaposse.jobdsl.dsl.helpers.triggers.TriggerContext.githubPullRequest() is applicable for argument types: (jobs.generation.Utilities$_addGithubPRTriggerImpl_closure8_closure29_closure30) values: [jobs.generation.Utilities$_addGithubPRTriggerImpl_closure8_closure29_closure30@402e37bc]
        at javaposse.jobdsl.dsl.DslScriptLoader.runDslEngineForParent(DslScriptLoader.java:79)                                        
        at javaposse.jobdsl.dsl.DslScriptLoader.runDslEngine(DslScriptLoader.java:135)                                                
        at javaposse.jobdsl.dsl.DslScriptLoader$runDslEngine.call(Unknown Source)                                                     
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)                                      
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)                                      
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)                                      
        at javaposse.jobdsl.Run$_main_closure2.doCall(Run.groovy:36)                                                                  
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

def private static addGithubPRTriggerImpl(def job, String branchName, String contextString, 
String triggerPhraseString, boolean triggerOnPhraseOnly, boolean permitAllSubmittters, Iterable<String> permittedOrgs, Iterable<String> permittedUsers) {
        job.with {
            triggers {
                githubPullRequest {  // <- line 421
                    useGitHubHooks()
                    if (permitAllSubmittters) {
                        admin('Microsoft')
                    }
                    admin('mmitche')
                    if (permitAllSubmittters) {