
dotnet-ci's Introduction

.NET CI Repository

Contains documentation and some implementation functionality for the .NET CI system.

Getting started and need to onboard your project?

Start here.

Looking for the repo list to add another branch for your project?

Go here.

Looking for information on how to write your jobs?

Looking for a general system overview?

Look here.

dotnet-ci's People

Contributors

333fred, agocke, brettfo, chcosta, crummel, dagood, davkean, dilijev, drewscoggins, dustincampbell, ellismg, jaredpar, jasonmalinowski, jcouv, jmarolf, karajas, maririos, mattgal, mellinoe, michaelsimons, michellemcdaniel, mmitche, naamunds, russkeldorph, shyam-gupta, shyamnamboodiripad, smile21prc, tannergooding, tmat, weshaggard


dotnet-ci's Issues

Push/PRs triggers that launch sub-jobs should pass checked out commit hash to sub-jobs

Right now it's possible that the checked out hash is different for different sub-jobs of a flow job. There are a few reasons:

  1. GitHub push events contain no info on what hash was pushed, so Jenkins can combine builds
  2. PR merger builds do a merge from the upstream branch into the PR branch and check out the latest hash there. That means an intervening push to upstream could cause sub-jobs to check out different hashes

The right way to do this is to pass the checked out hash (seen in the environment as GIT_COMMIT) from the root job to the sub jobs. In practice, I saw some issues with this where the checked out hash was incorrectly determined (GIT_COMMIT was actually the previous job's GIT_COMMIT). This could be a result of incorrect EnvInject caching or something else entirely.
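
The idea can be pictured with a small sketch: the root job resolves the hash once, and every sub-job receives it as an explicit parameter instead of looking up the branch head again. This is plain Python rather than Jenkins code, and `run_flow`, `resolve_head`, and the sub-job names are hypothetical.

```python
from typing import Callable, Dict, List


def run_flow(resolve_head: Callable[[], str], sub_jobs: List[str]) -> Dict[str, str]:
    """Resolve the commit once in the root job and hand it to every sub-job.

    resolve_head stands in for whatever the root job uses to determine the
    checked out hash (GIT_COMMIT in the issue); it is called exactly once.
    """
    pinned_commit = resolve_head()
    # Each sub-job gets the pinned hash as a parameter instead of
    # resolving the branch head on its own.
    return {job: pinned_commit for job in sub_jobs}
```

Even if the upstream branch moves between sub-job launches, every sub-job in the returned mapping sees the same hash.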

Linux images should automatically make /mnt writeable

The issue with this right now is that the method of daemon startup cannot guarantee that /mnt is available prior to launching the Jenkins process, because the Azure daemon starts through a different mechanism.

Design and develop multi-machine cloud plugin for Jenkins

There are cases when a single physical machine can have multiple OSes attached to it, for instance a machine that can boot to both Linux and Windows. This is useful for lots of things, like measuring performance across OSes on the same hardware.

We need a way to manage such machines in Jenkins. The suggestion is to create a cloud plugin that has inherent knowledge of these processes. Starting with one attached machine that has multiple personalities, if a job is queued that is to be run on another OS on the same machine, the cloud plugin would take the machine offline, reboot or do what it needs, remove the original machine from Jenkins, and add the new one.

Move "Ubuntu" to "Ubuntu14.04"

In our groovy scripts and machine labels we have taken "Ubuntu" to mean "Ubuntu 14.04". As we bring more Ubuntu platforms online (15.10 and someday 16.04) Ubuntu is a bad short name. We should update it and deal with the fallout (machine labels and such).

Create automated setup process for new CI instances

We should create an automated process for new CI instances. Here are the general steps. Some of this is already available:

  1. Create a new VM in Azure with correct cloud services, endpoints, etc. Endpoint scripting is already available in dotnet-ci-internal repo (management\master)
  2. Add disks to the VM for Jenkins data.
  3. Use Chef or some other automated way to install Jenkins upon the creation and boot up of the VM.
  4. Automate modification of the Jenkins setup (location of data, startup options etc.)
  5. Automate installation of the standard set of plugins (this could be restored from a backup, it is copyable)
  6. Start Jenkins
  7. Automate startup of basic settings, including security, setup of the Azure cloud, etc.

Alternatively, it may be that key config files and directories could simply be identified, tarred up from the main installation and applied. For instance, if I took everything from the Jenkins directory except for the jobs and fingerprints, I believe I would have a full installation. It would be slightly tied to the server name, but that could be customized by editing afterwards.

Anyways, the really critical stuff here is:

  • creds
  • plugins
  • main system config
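
The tar-based alternative described above could be sketched roughly as follows. This is only an illustration: the function name and the archive location are hypothetical, and the excluded directory names are taken from the "everything except the jobs and fingerprints" idea rather than from a verified Jenkins layout.

```python
import tarfile
from pathlib import Path

# Assumption: these top-level directory names are the ones to skip,
# per the "everything except jobs and fingerprints" idea in the issue.
EXCLUDED_TOP_LEVEL = {"jobs", "fingerprints"}


def snapshot_jenkins_home(jenkins_home: str, archive_path: str) -> list:
    """Write a .tar.gz of jenkins_home, skipping excluded top-level dirs.

    Returns the sorted list of top-level entries that were archived.
    The archive should be written outside jenkins_home.
    """
    home = Path(jenkins_home)
    kept = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(home.iterdir()):
            if entry.name in EXCLUDED_TOP_LEVEL:
                continue
            # arcname keeps paths relative to the Jenkins home directory.
            tar.add(str(entry), arcname=entry.name)
            kept.append(entry.name)
    return kept
```

The returned list makes it easy to check that creds, plugins, and the main system config actually ended up in the snapshot.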

Investigate moving more complex workflows to the Pipeline plugin

Investigate moving more of the complicated multi-machine workflows to "pipeline".

There will be some challenges:

  • Still need to create the pipeline job, so job dsl will have to integrate.
  • Pipeline text could be checked in and read from git during job creation, but you can't execute workflow jobs directly for each check-in/PR from the enlistment, since it allows you to alter what machines things run on. We could potentially work around this some other way.
  • There are inter-project dependencies: CoreFx requires the CoreCLR build artifacts from the mid-stage of the CoreCLR test pipeline for non-Windows OSes. This probably isn't hard to deal with (add a separate pipeline for the CoreCLR build).
  • We use a lot of extra features on the job (timeouts, workspace wiping, etc.) that need to stick around.

/cc @sejongoh

PR/nonPR jobs should be generated at the top level into different folders

Today we usually have the following:

  • We generate a folder for a project and a subfolder for a branch, then generate jobs into that branch
  • In the netci groovy files we usually have the following pattern:

[true, false].each { isPR -> ...

This is a little inefficient. PR jobs pollute the rolling/main folder, causing additional data to be loaded all the time when most people only want to look at the rolling data.

Instead, the MetaGenerator could generate a PR/nonPR folder, then set up a generator job for both and call each with the appropriate PR/nonPR job parameter.

Configs wishing to do this would be rewritten, just removing the "[true, false].each { isPR ->". By default, we should not generate the folders for this, only opt-in for now using an additional option in the repo list.
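
As a rough illustration of the opt-in behavior, the sketch below computes which generator folders would be set up for a repo list entry. It is plain Python, not MetaGenerator code; `generator_targets`, the `split_pr_folders` option, and the `pr`/`rolling` folder names are all made up for the example.

```python
def generator_targets(project: str, branch: str, split_pr_folders: bool):
    """Return (folder, is_pr) pairs that a meta-generator would set up.

    With split_pr_folders off (the default in the proposal), a single
    generator writes PR and non-PR jobs into the branch folder, so is_pr is
    None and the netci file still loops over [true, false] itself.
    """
    base = f"{project}/{branch}"
    if not split_pr_folders:
        return [(base, None)]
    # Folder names "pr" and "rolling" are made up for this example.
    return [(f"{base}/pr", True), (f"{base}/rolling", False)]
```

With the option on, each returned pair corresponds to one generator job invoked with the matching PR/nonPR parameter.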

Add semantic diff capabilities to the local job generation script

Semantic diff capability should be added on top of Local-Job-Gen. I am thinking:

  • Script that calls Local-Job-Gen for the current checked out repo, providing all the input parameters (temp output dir, project name, etc.) and then also for the master in the repo.
  • Parses the results providing a list of added/removed jobs and a diff for changed jobs
  • Potentially parses the results to note certain things that are changed: triggers, steps, etc.
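
The comparison step in the list above might look something like this sketch. It assumes the generated job definitions from each run have already been read into dictionaries keyed by job name, and it produces a plain textual diff; noting changed triggers or steps would need XML-aware parsing on top. The function name and the `master/`/`current/` labels are hypothetical.

```python
import difflib
from typing import Dict, List, Tuple


def diff_job_sets(
    base: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str], Dict[str, List[str]]]:
    """Compare two {job name: generated XML text} mappings.

    Returns (added, removed, changed), where changed maps each job name to
    the lines of a unified diff of its definition.
    """
    added = sorted(set(current) - set(base))
    removed = sorted(set(base) - set(current))
    changed = {}
    for name in sorted(set(base) & set(current)):
        if base[name] != current[name]:
            changed[name] = list(difflib.unified_diff(
                base[name].splitlines(),
                current[name].splitlines(),
                fromfile=f"master/{name}",
                tofile=f"current/{name}",
                lineterm="",
            ))
    return added, removed, changed
```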

FreeBSD CI-builds are failing

As can be seen via Jenkins: http://dotnet-ci.cloudapp.net/job/dotnet_coreclr_freebsd_debug/

Looking into the first failing build, we have this log:

Cloning the remote Git repository
Cloning repository git://github.com/dotnet/coreclr.git
 > git init /mnt/tmpdrive/j/workspace/dotnet_coreclr_freebsd_debug # timeout=10
ERROR: Error cloning remote repo 'origin'
ERROR: Error cloning remote repo 'origin'

Failure to clone the repository could hint at the disks being full. But subsequent builds are failing for other reasons. The latest build gets to 16%, then suddenly Jenkins seems to disconnect and call it a day, so obviously something else is going on too.

@mmitche Looking at the build log, it seems you're already on this? :)

Need a way to prioritize builds

It looks like there are a few plug-ins for prioritizing specific jobs (PR job vs rolling job), but there are none for prioritizing individual builds (Bob's build before Sam's).

@Petermarcu states:

Do you know if there is a way to prioritize certain items in the queue? We had a change that had to get into the build tonight blocked by this today and im sure many of the things it was behind weren't business critical. We ended up just merging without CI testing because it had to get into tonight's build.
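
To make the distinction concrete, per-build prioritization amounts to a queue where each queued build carries its own priority, independent of which job it belongs to. The sketch below is a generic priority queue in Python, not a Jenkins plug-in; the class, method names, and default priority are hypothetical.

```python
import heapq
import itertools


class BuildQueue:
    """Queue where each build carries its own priority (lower runs first)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that builds with the
        # same priority keep first-in, first-out order.
        self._order = itertools.count()

    def submit(self, build: str, priority: int = 100) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), build))

    def next_build(self) -> str:
        return heapq.heappop(self._heap)[2]
```

A business-critical change would be submitted with a lower number and would jump ahead of everything already waiting.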

PR triggers should be more flexible about their target branches

PR triggers today can apply to a specific branch. The Utilities don't make it easy to apply a trigger to multiple branches. This should be fixed.

In addition, a change to the ghprb should be made that can allow for the trigger to NOT apply to specific branches.

The other tracked branches for a repo should also be passed into the generators, so that logic like the following becomes possible:

'Apply this PR trigger to master and to any other branch that isn't tracked'
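
The rule quoted above can be written down as a small predicate. This is a sketch in plain Python, not part of the Utilities or the ghprb plugin; the function and parameter names are hypothetical.

```python
def pr_trigger_applies(target_branch: str, own_branch: str, tracked_branches: set) -> bool:
    """Decide whether a PR trigger defined for own_branch fires for a PR.

    Implements the example rule from the issue: apply to own_branch and to
    any target branch that has no tracked definition of its own.
    """
    if target_branch == own_branch:
        return True
    return target_branch not in tracked_branches
```

For a trigger defined on master with master and one release branch tracked, a PR against the release branch is left to that branch's own trigger, while a PR against an untracked feature branch still fires.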

dotnet restore - failed

Installed on Mac OS.

Executed 'dotnet new', then 'dotnet restore'.

It throws the following error:

error: Unable to load the service index for source https://api.nuget.org/v3/index.json.
error:   The type initializer for 'Crypto' threw an exception.
error:   Unable to load DLL 'System.Security.Cryptography.Native': The specified module could not be found.
error:    (Exception from HRESULT: 0x8007007E)

Create job or script for automated rebooting of machines

We need an easy way to reboot machines if they get wedged or whatnot. This should be done so that even if the machine is not in the Azure pool (or if it's Linux, etc.) it can be rebooted. And you shouldn't have to go to the Azure portal.

Consider automating machine setup with Chef or something

It may make more sense to move to a model where we have scripts that take a clean machine and install all the required software on the CI machines instead of the model where we do brain surgery on a VM and then capture a new image.

This would make the stuff installed on a machine explicit and we could version it in this repo.

SSL support on the CI

It would be nice to have SSL support on the CI site (i.e. https://dotnet-ci.cloudapp.net). I'm working on a script that downloads the latest build and would like to know I'm getting the real build from the right place :).

Should we keep Jenkins results around longer?

Ported from dotnet/roslyn#6337

We have a few PRs on dotnet/roslyn that have been around long enough that the Jenkins results have been dropped. Is the dropping of build/test results intentional? Is it based on elapsed time or some fixed number of results per queue? Ideally we'd keep results around forever, right? Are we running into storage issues?

Modify system to be branch specific

Right now we depend on the master branch having all of the definitions for the jobs for all target branches. Ideally we want:

  • The jobs for a branch to be read from that branch.
  • Pull request jobs should be generated and run on a per-branch basis. The trigger isn't branch specific but it can be set up to only run for PRs to certain target branches.

I think the best way to implement this is to change the repo list to include the branch name. The subfolder structure would be modified to the following:

/dotnet_coreclr/

PR jobs for specific branches would go in their specific folders, whitelisted for that target branch.

There are some challenges:

  • Workspace name length becomes an issue
  • Larger accumulation of PR jobs

Sync time across machines to time server

Currently, machines across the dotnet-ci environment can be a few minutes ahead or behind others. As we start implementing cross-machine tests with security (e.g., SSL / Kerberos), significant time skews may cause issues with these tests.

Consider running an NTP client to get time to be consistent across machines - i.e., no more than a few seconds of drift either way.
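
For reference, the skew an NTP client measures comes from four timestamps taken during one request/response exchange. The sketch below shows that standard offset arithmetic and a tolerance check; the 5-second default is only an assumed reading of "a few seconds", and the function names are hypothetical.

```python
def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Estimate how far the reference clock is ahead of the local clock.

    t0: local time when the request was sent
    t1: reference time when the request was received
    t2: reference time when the reply was sent
    t3: local time when the reply was received
    Network delay cancels out if it is the same in both directions.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def within_tolerance(offset_seconds: float, max_skew_seconds: float = 5.0) -> bool:
    """True if the measured offset is inside the allowed drift either way."""
    return abs(offset_seconds) <= max_skew_seconds
```

A machine whose clock is two minutes behind would produce an offset of about 120 seconds and fail the check.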

Generators should only run when netci/dotnet-ci is changed

Today we rerun the generators whenever a repo or dotnet-ci changes. This should be altered so that it only reruns when netci/dotnet-ci changes. The ideal way to do this is actually to have the SCM poll, say every 15 minutes or so, and set up the polling specifically to ignore certain paths for the generators. I think the way things are done now may contribute to memory leaks over time.

The challenge here is that there is a "polling ignores commits to X" option for git, but it is not directly configurable, so a configure block is needed. This is entirely doable though.
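
The decision the polling filter needs to make is simple to state in code. The sketch below is plain Python rather than the git plugin configuration, and it assumes the job definition lives in a file named netci.groovy at the repo root; the function name and the default path tuple are hypothetical.

```python
from typing import Iterable

# Assumption: the job definition lives in a file named netci.groovy at the
# repo root, as in the repos mentioned elsewhere on this page.
DEFAULT_TRIGGER_PATHS = ("netci.groovy",)


def should_regenerate(changed_paths: Iterable[str], trigger_paths=DEFAULT_TRIGGER_PATHS) -> bool:
    """Return True if any changed path is one that should rerun the generator."""
    return any(path in trigger_paths for path in changed_paths)
```

Commits that touch only product code return False, so the generator stays idle for the vast majority of check-ins.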

Would be nice: Combined prtest/private job/official job types

Right now we have to have individual jobs for prtest and official, to keep the badges (and general job views) clean. We would also have to do the same for private jobs (if available) since they couldn't be shared with PR tests. This is clunky and involves a lot of duplicated job logic. It would be awesome to have a Jenkins plugin that allows some kind of job type.

PlatformException gets thrown when trying to run dotnet-cli

I just installed the latest beta (1.0.0-beta-001598) on my Mac running OS X 10.11.3.
When I try to run the dotnet utility using the following command:

dotnet --version

I run into the following nasty error.
It seems as if it cannot find libc somehow.

Unhandled Exception: System.TypeInitializationException: The type initializer for 'Microsoft.Extensions.PlatformAbstractions.PlatformServices' threw an exception. ---> System.PlatformNotSupportedException: Error reading Darwin Kernel Version ---> System.DllNotFoundException: Unable to load DLL 'libc': The specified module could not be found.
 (Exception from HRESULT: 0x8007007E)
   at Microsoft.Extensions.PlatformAbstractions.Native.NativeMethods.Darwin.sysctl(Int32* name, UInt32 namelen, Byte* oldp, UInt32* oldlenp, IntPtr newp, UInt32 newlen)
   at Microsoft.Extensions.PlatformAbstractions.Native.NativeMethods.Darwin.GetKernelRelease()
   --- End of inner exception stack trace ---
   at Microsoft.Extensions.PlatformAbstractions.Native.NativeMethods.Darwin.GetKernelRelease()
   at Microsoft.Extensions.PlatformAbstractions.Native.PlatformApis.GetDarwinVersion()
   at Microsoft.Extensions.PlatformAbstractions.Native.PlatformApis.GetOSVersion()
   at Microsoft.Extensions.PlatformAbstractions.DefaultRuntimeEnvironment..ctor()
   at Microsoft.Extensions.PlatformAbstractions.DefaultPlatformServices..ctor()
   at Microsoft.Extensions.PlatformAbstractions.PlatformServices..cctor()
   --- End of inner exception stack trace ---
   at Microsoft.DotNet.Cli.Program.PrintVersionInfo()
   at Microsoft.DotNet.Cli.Program.ProcessArgs(String[] args)
   at Microsoft.DotNet.Cli.Program.Main(String[] args)

Machines should not wipe workspace after runs if they are taken offline during the run

To avoid disk space issues, we wipe the workspace before and after runs. This is fine, except when someone needs to investigate something. In that case, the workspace is gone before there is a chance to look at it.

Although I do not know whether it's easy to achieve, it would be nice to have the workspace not be wiped if the machine is offline at the end of the build. This would allow for easy investigation.

Windows auto-image setup script should be reworked

Currently, there are some issues with the auto-image setup script on Windows.

We need to have desktop access and the VM needs to connect to Jenkins (this is vs. SSH on Linux, which is simpler). To do this, today the setup script installs a startup task (on log-in) as dotnet-bot, then sets up auto-login for dotnet-bot. It then restarts the machine. The issue here is that the script is running as the system user, and there seems to be some level of non-determinism in how long it takes for the full dotnet-bot user to appear. There was a wait installed of about 5 minutes which helps, but this is inefficient.

The suggested workflow is to add the auto-login first, then restart.

  1. Download files as necessary (maybe put a loop here to avoid 404s)
  2. Install user as auto-login
  3. Put connection script into legacy startup folder??
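
The retry loop suggested in step 1 could be sketched as below. The real setup script is not Python, so this only illustrates the control flow; `download_with_retry` and its parameters are hypothetical, and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable


def download_with_retry(
    fetch: Callable[[], bytes],
    attempts: int = 5,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch until it succeeds or attempts run out.

    fetch is any zero-argument callable that returns the file contents or
    raises OSError (urllib.error.HTTPError, raised on a 404, is a subclass).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise last_error
```

Passing `sleep` as a parameter keeps the loop easy to exercise without actually waiting.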

/cc @tannergooding

Set up a special job or website to get the login info of auto-imaged VMs

Auto imaged VMs do not have predictable ports. To access them it is necessary to find the endpoint in Azure. Lots of users do not have the Azure subscription set up on their machines. We should have a Jenkins job or web service that talks to a machine with the subscription to get the connection info.

Unable to complete builds due to inability to delete project workspace

In the buildtools repo, the following error causes workspace cleanup to fail, leading to a failed build.

3:06:41 Using context: Innerloop Windows Debug
13:06:42 Building remotely on Azure0328100701 (auto-win2012-20160325) in workspace D:\j\workspace\innerloop_prtest4ca7949f
13:06:42 [WS-CLEANUP] Deleting project workspace...
13:06:48 ERROR: [WS-CLEANUP] Cannot delete workspace: remote file operation failed: D:\j\workspace\innerloop_prtest4ca7949f at hudson.remoting.Channel@2aaf2a22:Azure0328100701: hudson.remoting.ChannelClosedException: channel is already closed
13:06:48 ERROR: Cannot delete workspace: remote file operation failed: D:\j\workspace\innerloop_prtest4ca7949f at hudson.remoting.Channel@2aaf2a22:Azure0328100701: hudson.remoting.ChannelClosedException: channel is already closed
13:06:48 [BFA] Scanning build for known causes...
13:06:48 [BFA] Scanning build for known causes...
13:06:48 [BFA] Found failure cause(s):
13:06:48 [BFA] Hung processs on target machine from category Infrastructure
13:06:48 [BFA] Done. 0s
13:06:48 Setting status of 38525666293756dc6fca04171a8125707c1414bc to FAILURE with url http://dotnet-ci.cloudapp.net/job/dotnet_buildtools/job/master/job/innerloop_prtest/264/ and message: 'Build finished. No test results found.'

/cc: @mmitche

Local-Job-Gen.ps1 cannot compile coreclr/netci.groovy

Local-Job-Gen.ps1 cannot compile coreclr/netci.groovy; it fails with the following error.

[JarClassLoader] INFO:  findResource(): unable to locate "jobs/generation/UtilitiesCustomizer.groovy"                                 
[JarClassLoader] INFO:  findResource(): unable to locate "jobs/generation/UtilitiesCustomizer.groovy"                                 
Exception in thread "main" java.lang.reflect.InvocationTargetException                                                                
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                                
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)                                                                
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)                                                            
        at java.lang.reflect.Method.invoke(Unknown Source)                                                                            
        at com.simontuffs.onejar.Boot.run(Boot.java:313)                                                                              
        at com.simontuffs.onejar.Boot.main(Boot.java:161)                                                                             
Caused by: javaposse.jobdsl.dsl.DslScriptException: (combinednetci.groovy, line 421) No signature of method: javaposse.jobdsl.dsl.helpers.triggers.TriggerContext.githubPullRequest() is applicable for argument types: (jobs.generation.Utilities$_addGithubPRTriggerImpl_closure8_closure29_closure30) values: [jobs.generation.Utilities$_addGithubPRTriggerImpl_closure8_closure29_closure30@402e37bc]
        at javaposse.jobdsl.dsl.DslScriptLoader.runDslEngineForParent(DslScriptLoader.java:79)                                        
        at javaposse.jobdsl.dsl.DslScriptLoader.runDslEngine(DslScriptLoader.java:135)                                                
        at javaposse.jobdsl.dsl.DslScriptLoader$runDslEngine.call(Unknown Source)                                                     
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)                                      
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)                                      
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)                                      
        at javaposse.jobdsl.Run$_main_closure2.doCall(Run.groovy:36)                                                                  
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                                
def private static addGithubPRTriggerImpl(def job, String branchName, String contextString, 
String triggerPhraseString, boolean triggerOnPhraseOnly, boolean permitAllSubmittters, Iterable<String> permittedOrgs, Iterable<String> permittedUsers) {
        job.with {
            triggers {
                githubPullRequest {  // <- line 421
                    useGitHubHooks()
                    if (permitAllSubmittters) {
                        admin('Microsoft')
                    }
                    admin('mmitche')
                    if (permitAllSubmittters) {
