Contains documentation and some implementation functionality for the .NET CI system.
Repository containing scripting for the dotnet-ci Jenkins instance.
License: MIT License
This requires a couple of things: the ability to tar/zip the artifacts, and an "excludes" option in Azure's artifact plugin.
This could alternatively be done in scripting.
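If the plugin can't exclude files, the archiving step could be scripted. Below is a minimal sketch in Python, assuming glob-style exclude patterns matched against paths relative to the artifact root. The function name and pattern list are hypothetical and not part of any existing plugin.

```python
import fnmatch
import os
import zipfile

def zip_artifacts(root, dest_zip, excludes):
    """Zip every file under root, skipping paths that match any exclude glob.

    Patterns are matched against the path relative to root with '/' separators.
    dest_zip should live outside root so the archive never includes itself.
    Returns the sorted list of archived relative paths.
    """
    added = []
    with zipfile.ZipFile(dest_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                # fnmatch's '*' also matches '/', so "*.pdb" excludes PDBs at any depth.
                if any(fnmatch.fnmatch(rel, pat) for pat in excludes):
                    continue
                zf.write(full, arcname=rel)
                added.append(rel)
    return sorted(added)
```

For example, `zip_artifacts("bin", "/tmp/artifacts.zip", ["*.pdb"])` would archive everything under `bin` except symbol files.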
Right now it's possible that the checked-out hash is different for different sub-jobs of a flow job. There are a few reasons:
The right way to do this is to pass the checked-out hash (seen in the environment as GIT_COMMIT) from the root job to the sub-jobs. In practice, I saw issues where the checked-out hash was determined incorrectly (GIT_COMMIT was actually the previous job's GIT_COMMIT). This could be the result of incorrect EnvInject caching or something else entirely.
I think this is only happening for CMake based builds:
http://dotnet-ci.cloudapp.net/job/dotnet_llilc_debug_win32_prtest/637/consoleFull
Update the scan timeouts on the build failure analyzer. This may be possible through JAVA_ARGS.
The issue right now is that the daemon startup method cannot guarantee that /mnt is available before the Jenkins process launches, because the Azure daemon starts through a different mechanism.
In some cases a single physical machine has multiple OSes attached to it, for instance a machine that can boot into both Linux and Windows. This is useful for many things, such as measuring cross-OS performance on the same hardware.
We need a way to manage such machines in Jenkins. The suggestion is to create a cloud plugin that has inherent knowledge of these processes. Starting with one attached machine that has multiple personalities, if a job is queued that is to be run on another OS on the same machine, the cloud plugin would take the machine offline, reboot or do what it needs, remove the original machine from Jenkins, and add the new one.
With test-failures now fixed in master, we should try to enable PAL-tests for the coreclr FreeBSD CI-build, to prevent future regressions.
@mmitche Is this something you can handle on your own, or will I need to issue a PR?
In our groovy scripts and machine labels we have taken "Ubuntu" to mean "Ubuntu 14.04". As we bring more Ubuntu platforms online (15.10 and someday 16.04) Ubuntu is a bad short name. We should update it and deal with the fallout (machine labels and such).
We should create an automated process for new CI instances. Here are the general steps. Some of this is already available:
Alternatively, key config files and directories could simply be identified, tarred up from the main installation, and applied. For instance, if I took everything from the Jenkins directory except for the jobs and fingerprints, I believe I would have a full installation. It would be somewhat tied to the server name, but that could be customized by editing afterwards.
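The tar-up approach could look roughly like this. It is a minimal Python sketch that assumes `jobs` and `fingerprints` are the only top-level entries to leave out. The function and constant names are hypothetical.

```python
import os
import tarfile

# Assumed to hold per-instance build data rather than configuration.
DEFAULT_EXCLUDES = frozenset({"jobs", "fingerprints"})

def snapshot_jenkins_home(jenkins_home, dest_tar, excludes=DEFAULT_EXCLUDES):
    """Write a gzipped tarball of jenkins_home minus the excluded top-level entries.

    dest_tar should live outside jenkins_home. Returns the top-level entries archived.
    """
    included = []
    with tarfile.open(dest_tar, "w:gz") as tar:
        for entry in sorted(os.listdir(jenkins_home)):
            if entry in excludes:
                continue
            # tar.add recurses into directories by default.
            tar.add(os.path.join(jenkins_home, entry), arcname=entry)
            included.append(entry)
    return included
```

The archive would still contain the instance's secrets (for example the `secrets/` directory), so it needs to be stored and transferred accordingly. Server-name-specific settings would still need editing after extraction.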
Anyways, the really critical stuff here is:
Investigate moving more of the complicated multi-machine workflows to "pipeline".
There will be some challenges:
/cc @sejongoh
Complete the move to the auto-imaged Windows VMs, rather than using the static pool
Today we usually have the following:
[true, false].each { isPR -> ...
This is a little inefficient. PR jobs pollute the rolling/main folder, causing additional data to be loaded all the time when most people only want to look at the rolling data.
Instead, the MetaGenerator could generate a PR/nonPR folder, then set up a generator job for both and call each with the appropriate PR/nonPR job parameter.
Configs wishing to do this would be rewritten, just removing the "[true, false].each { isPR ->". By default, we should not generate the folders for this, only opt-in for now using an additional option in the repo list.
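The split could work roughly as follows: the MetaGenerator plans one generator run per folder, with a fixed isPR value. This is a Python sketch of that planning step only. The `PR`/`nonPR` subfolder names, the `split_pr_folders` opt-in flag, and the types are assumptions for illustration.

```python
from typing import List, NamedTuple, Optional

class GeneratorRun(NamedTuple):
    folder: str            # folder the generated jobs land in
    is_pr: Optional[bool]  # None: the script loops over [true, false] itself

def plan_generator_runs(repo_folder: str, split_pr_folders: bool) -> List[GeneratorRun]:
    """Return the generator invocations for one repo/branch folder.

    Repos that opt in get separate nonPR and PR subfolders, each generated once
    with a fixed isPR parameter. Everything else keeps today's single run.
    """
    if not split_pr_folders:
        return [GeneratorRun(repo_folder, None)]
    return [
        GeneratorRun(f"{repo_folder}/nonPR", False),
        GeneratorRun(f"{repo_folder}/PR", True),
    ]
```

Because the split is opt-in, existing configs keep working unchanged until their `[true, false].each` loop is removed.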
Semantic diff capability should be added on top of Local-Job-Gen. I am thinking:
Builds of coreclr on this machine are failing like this:
http://dotnet-ci.cloudapp.net/job/dotnet_coreclr_linux_release/729/console
Looks like it hasn't done a clean coreclr build in a while:
As can be seen via Jenkins: http://dotnet-ci.cloudapp.net/job/dotnet_coreclr_freebsd_debug/
Looking into the first failing build, we have this log:
Cloning the remote Git repository
Cloning repository git://github.com/dotnet/coreclr.git
> git init /mnt/tmpdrive/j/workspace/dotnet_coreclr_freebsd_debug # timeout=10
ERROR: Error cloning remote repo 'origin'
ERROR: Error cloning remote repo 'origin'
Failure to clone the repository could hint at the disks being full. But subsequent builds are failing for other reasons. That latest build gets to 16%, then suddenly Jenkins seems to disconnect and call it a day, so obviously something else is going on too.
@mmitche Looking at the build log, it seems you're already on this? :)
There is a plugin for this kind of thing.
It looks like there are a few plugins for prioritizing specific jobs (PR job vs. rolling job), but none for prioritizing individual builds (Bob's build before Sam's).
@Petermarcu states:
Do you know if there is a way to prioritize certain items in the queue? We had a change that had to get into tonight's build, and it was blocked by this today. I'm sure many of the things it was behind weren't business critical. We ended up just merging without CI testing because it had to get into tonight's build.
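Per-build prioritization amounts to a priority queue that stays first-come, first-served among equal priorities. The Python sketch below shows that ordering rule only. It is not a Jenkins or plugin API, and the default priority of 5 is an arbitrary assumption.

```python
import heapq
import itertools

class BuildQueue:
    """Priority queue of builds: lower number runs first, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves submission order

    def submit(self, build_id, priority=5):
        heapq.heappush(self._heap, (priority, next(self._seq), build_id))

    def next_build(self):
        """Pop the highest-priority build, or return None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Submitting an urgent build with `priority=1` lets it jump ahead of everything already queued at the default priority, without reordering those builds among themselves.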
PR triggers today can apply to a specific branch, but the Utilities don't make it easy to apply a trigger to multiple branches. This should be fixed.
In addition, ghprb should be changed to allow the trigger to NOT apply to specific branches.
The other tracked branches for a repo should also be passed into the generators, so that logic like the following can be expressed:
'Apply this PR trigger to master and to any other branch that isn't tracked'
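With the tracked branches available to the generator, the rule quoted above reduces to a small predicate. This Python sketch illustrates the decision only. The function name and parameters are hypothetical, and the real check would live in the Groovy Utilities and ghprb configuration.

```python
def pr_trigger_applies(target_branch, trigger_branches, tracked_branches):
    """Decide whether a PR trigger fires for a PR targeting target_branch.

    trigger_branches: branches the trigger explicitly covers (e.g. {"master"}).
    tracked_branches: every branch the repo list tracks. A PR against an
    untracked branch falls back to this trigger as well.
    """
    if target_branch in trigger_branches:
        return True
    return target_branch not in tracked_branches
```

A tracked branch other than master is excluded because its own folder is assumed to carry its own trigger.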
Installed on Mac OS.
Executed 'dotnet new',
then 'dotnet restore'.
It throws the following error:
error: Unable to load the service index for source https://api.nuget.org/v3/index.json.
error: The type initializer for 'Crypto' threw an exception.
error: Unable to load DLL 'System.Security.Cryptography.Native': The specified module could not be found.
error: (Exception from HRESULT: 0x8007007E)
We need an easy way to reboot machines if they get wedged or whatnot. This should be done so that even if the machine is not in the Azure pool (or if it's Linux, etc.) it can be rebooted. And you shouldn't have to go to the Azure portal.
It may make more sense to move to a model where we have scripts that take a clean machine and install all the required software on the CI machines instead of the model where we do brain surgery on a VM and then capture a new image.
This would make the stuff installed on a machine explicit and we could version it in this repo.
When removing a branch or repo, the folder should be disabled. When adding one, the generator should be run.
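Deciding what to do on a repo-list change is a set difference between the old and new lists. The Python sketch below assumes entries are (repo, branch) pairs. The function name and the shape of its result are hypothetical.

```python
def plan_repo_list_changes(old_entries, new_entries):
    """Compare two repo lists of (repo, branch) entries.

    Entries that disappeared have their folder disabled; entries that
    appeared need their generator run.
    """
    old, new = set(old_entries), set(new_entries)
    return {
        "disable": sorted(old - new),
        "generate": sorted(new - old),
    }
```

Entries present in both lists produce no action, so an unchanged repo list is a no-op.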
It would be nice to have SSL support on the CI site (i.e. https://dotnet-ci.cloudapp.net). I'm working on a script that downloads the latest build and would like to know I'm getting the real build from the right place :).
Ported from dotnet/roslyn#6337
We have a few PRs on dotnet/roslyn that have been around long enough that the Jenkins results have been dropped. Is the dropping of build/test results intentional? Is it based on elapsed time or some fixed number of results per queue? Ideally we'd keep results around forever, right? Are we running into storage issues?
Windows on ARM builds should be included, as they work well when compiled on my machine. Why aren't they tested by the bot?
Right now we depend on the master branch having all of the definitions for the jobs for all target branches. Ideally we want:
I think the best way to implement this is to change the repo list to include the branch name. The subfolder structure would be modified to the following:
/dotnet_coreclr/
PR jobs for specific branches would go in their specific folders, whitelisted for that target branch.
There are some challenges:
Currently, machines across the dotnet-ci environment can be a few minutes ahead or behind others. As we start implementing cross-machine tests with security (e.g., SSL / Kerberos), significant time skews may cause issues with these tests.
Consider running an NTP client to get time to be consistent across machines - i.e., no more than a few seconds of drift either way.
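An NTP client estimates drift from four timestamps per exchange. The Python sketch below shows only that standard on-wire calculation (RFC 5905), as a way to check a machine against the skew budget. In practice the OS time service would do the syncing. The 5-second tolerance is an assumption standing in for "a few seconds".

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP clock offset and round-trip delay, in seconds.

    t0: client transmit time, t1: server receive time,
    t2: server transmit time, t3: client receive time.
    A positive offset means the client clock is behind the server.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

def within_tolerance(offset, max_skew_seconds=5.0):
    """True if the measured offset fits the (assumed) allowed skew."""
    return abs(offset) <= max_skew_seconds
```

For example, timestamps `(100, 110, 111, 103)` give an offset of 9 seconds and a delay of 2 seconds, which would fail a 5-second budget.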
Today we rerun the generators whenever a repo or dotnet-ci changes. This should be altered so that it only reruns when netci/dotnet-ci changes. The ideal way to do this is actually to have the SCM poll, say every 15 minutes or so and set up the polling specifically to ignore certain paths for the generators. I think the way things are done now may contribute to memory leaks over time.
The challenge here is that there is a "polling ignores commits to X" option for git, but it is not directly configurable, so a configure block is needed. This is entirely doable though.
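The filtering that the configure block would set up works in the same spirit as the git plugin's included/excluded regions: a commit only counts if a changed path matches an included regular expression. This Python sketch mirrors that decision for a repo's own netci.groovy. The function name and the example regex are assumptions. Changes to dotnet-ci itself would be detected by polling that repo directly.

```python
import re

def should_regenerate(changed_paths, included_regions):
    """Return True if any changed path fully matches one of the included regexes."""
    patterns = [re.compile(r) for r in included_regions]
    return any(p.fullmatch(path) for path in changed_paths for p in patterns)
```

With `included_regions=[r"(.*/)?netci\.groovy"]`, a commit touching only source files would not trigger the generator, while one touching `netci.groovy` would.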
Right now we have to have individual jobs for prtest and official, to keep the badges (and general job views) clean. We would also have to do the same for private jobs (if available), since they couldn't be shared with PR tests. This is clunky and involves a lot of duplicated job logic. It would be awesome to have a Jenkins plugin that allows some kind of job type.
Move the debian VMs to auto-image
The error reporting UI causes problems (as it blocks a process from ending). Disable on Windows.
I just installed the latest beta (1.0.0-beta-001598) on my Mac running OS X 10.11.3.
When I try to run the dotnet utility using the following command:
dotnet --version
I run into the following nasty error.
It seems as if it cannot find libc somehow.
Unhandled Exception: System.TypeInitializationException: The type initializer for 'Microsoft.Extensions.PlatformAbstractions.PlatformServices' threw an exception. ---> System.PlatformNotSupportedException: Error reading Darwin Kernel Version ---> System.DllNotFoundException: Unable to load DLL 'libc': The specified module could not be found.
(Exception from HRESULT: 0x8007007E)
at Microsoft.Extensions.PlatformAbstractions.Native.NativeMethods.Darwin.sysctl(Int32* name, UInt32 namelen, Byte* oldp, UInt32* oldlenp, IntPtr newp, UInt32 newlen)
at Microsoft.Extensions.PlatformAbstractions.Native.NativeMethods.Darwin.GetKernelRelease()
--- End of inner exception stack trace ---
at Microsoft.Extensions.PlatformAbstractions.Native.NativeMethods.Darwin.GetKernelRelease()
at Microsoft.Extensions.PlatformAbstractions.Native.PlatformApis.GetDarwinVersion()
at Microsoft.Extensions.PlatformAbstractions.Native.PlatformApis.GetOSVersion()
at Microsoft.Extensions.PlatformAbstractions.DefaultRuntimeEnvironment..ctor()
at Microsoft.Extensions.PlatformAbstractions.DefaultPlatformServices..ctor()
at Microsoft.Extensions.PlatformAbstractions.PlatformServices..cctor()
--- End of inner exception stack trace ---
at Microsoft.DotNet.Cli.Program.PrintVersionInfo()
at Microsoft.DotNet.Cli.Program.ProcessArgs(String[] args)
at Microsoft.DotNet.Cli.Program.Main(String[] args)
Currently, we prevent a build from happening when the only file changed is netci.groovy.
We should also leverage this functionality to prevent CI builds when the only files changed are *.md files.
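The skip rule is "every changed file matches a skip pattern". This Python sketch shows that check. The pattern list and function name are assumptions, and an empty change list is treated conservatively as needing a build.

```python
import fnmatch

# Files that, on their own, should not trigger a CI build (assumed list).
SKIP_PATTERNS = ["netci.groovy", "*.md"]

def needs_ci_build(changed_paths, skip_patterns=SKIP_PATTERNS):
    """A build is needed unless every changed path matches a skip pattern.

    Paths are repo-relative with '/' separators. fnmatch's '*' also matches '/',
    so "*.md" covers Markdown files at any depth, while "netci.groovy" only
    matches the root-level file.
    """
    if not changed_paths:
        return True
    return not all(
        any(fnmatch.fnmatch(path, pat) for pat in skip_patterns)
        for path in changed_paths
    )
```

A PR touching only README.md and netci.groovy would skip CI, while adding any source file brings the build back.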
In order to make debuggertests run on OS X, we need developer mode enabled on OS X. If it is doable, please turn it on :) @mmitche
To avoid disk space issues, we wipe the workspace before and after runs. This is fine, except when someone needs to investigate something. In that case, the workspace is gone before there is a chance to look at it.
Although I do not know whether it's easy to achieve, it would be nice to have the workspace not be wiped if the machine is offline at the end of the build. This would allow for easy investigation.
In order to facilitate machine management, the RDP and SSH ports for dotnet-ci nodes should be predictable.
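One way to make ports predictable is to derive them from a node's index in its pool. The Python sketch below is entirely hypothetical: the base ports and the pool-size limit are placeholder values chosen so the RDP and SSH ranges cannot overlap.

```python
# Hypothetical base ports and pool size; real values would be picked by whoever manages the pool.
RDP_BASE = 33890
SSH_BASE = 22000
MAX_NODES = 1000

def node_ports(node_index):
    """Map a node's index within its pool to fixed public RDP and SSH ports."""
    if not 0 <= node_index < MAX_NODES:
        raise ValueError(f"node index {node_index} outside 0..{MAX_NODES - 1}")
    return {"rdp": RDP_BASE + node_index, "ssh": SSH_BASE + node_index}
```

Because the mapping is a pure function of the index, anyone who knows a node's position in the pool can compute its ports without looking them up in Azure.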
Currently, pipeline jobs aren't based on Job, so they don't have a disabled property.
The CI should allow someone to submit a private job for their repo/branch (not PR test)
Currently, there are some issues with the auto-image setup script on windows.
We need desktop access, and the VM needs to connect to Jenkins (as opposed to SSH on Linux, which is simpler). To do this, the setup script today installs a startup task (on log-in) as dotnet-bot, sets up auto-login for dotnet-bot, and then restarts the machine. The issue is that the script runs as the system user, and there seems to be some non-determinism in how long it takes for the full dotnet-bot user to appear. A wait of about 5 minutes was added, which helps, but this is inefficient.
The suggested workflow is to set up the auto-login first, then restart.
/cc @tannergooding
Auto imaged VMs do not have predictable ports. To access them it is necessary to find the endpoint in Azure. Lots of users do not have the Azure subscription set up on their machines. We should have a Jenkins job or web service that talks to a machine with the subscription to get the connection info.
In the buildtools repo, workspace cleanup fails with the following error, which leads to a failed build.
3:06:41 Using context: Innerloop Windows Debug
13:06:42 Building remotely on Azure0328100701 (auto-win2012-20160325) in workspace D:\j\workspace\innerloop_prtest4ca7949f
13:06:42 [WS-CLEANUP] Deleting project workspace...
13:06:48 ERROR: [WS-CLEANUP] Cannot delete workspace: remote file operation failed: D:\j\workspace\innerloop_prtest4ca7949f at hudson.remoting.Channel@2aaf2a22:Azure0328100701: hudson.remoting.ChannelClosedException: channel is already closed
13:06:48 ERROR: Cannot delete workspace: remote file operation failed: D:\j\workspace\innerloop_prtest4ca7949f at hudson.remoting.Channel@2aaf2a22:Azure0328100701: hudson.remoting.ChannelClosedException: channel is already closed
13:06:48 [BFA] Scanning build for known causes...
13:06:48 [BFA] Scanning build for known causes...
13:06:48 [BFA] Found failure cause(s):
13:06:48 [BFA] Hung processs on target machine from category Infrastructure
13:06:48 [BFA] Done. 0s
13:06:48 Setting status of 38525666293756dc6fca04171a8125707c1414bc to FAILURE with url http://dotnet-ci.cloudapp.net/job/dotnet_buildtools/job/master/job/innerloop_prtest/264/ and message: 'Build finished. No test results found.'
/cc: @mmitche
With dotnet/coreclr#1292 and dotnet/coreclr#1210 merged, please enable CI build jobs for ARM and AArch64 architectures and add the respective badges in coreclr repo's README (under Build Status), so others can follow the support status.
Local-Job-Gen.ps1 cannot compile coreclr/netcli.groovy with the following error.
[JarClassLoader] INFO: findResource(): unable to locate "jobs/generation/UtilitiesCustomizer.groovy"
[JarClassLoader] INFO: findResource(): unable to locate "jobs/generation/UtilitiesCustomizer.groovy"
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.simontuffs.onejar.Boot.run(Boot.java:313)
at com.simontuffs.onejar.Boot.main(Boot.java:161)
Caused by: javaposse.jobdsl.dsl.DslScriptException: (combinednetci.groovy, line 421) No signature of method: javaposse.jobdsl.dsl.helpers.triggers.TriggerContext.githubPullRequest() is applicable for argument types: (jobs.generation.Utilities$_addGithubPRTriggerImpl_closure8_closure29_closure30) values: [jobs.generation.Utilities$_addGithubPRTriggerImpl_closure8_closure29_closure30@402e37bc]
at javaposse.jobdsl.dsl.DslScriptLoader.runDslEngineForParent(DslScriptLoader.java:79)
at javaposse.jobdsl.dsl.DslScriptLoader.runDslEngine(DslScriptLoader.java:135)
at javaposse.jobdsl.dsl.DslScriptLoader$runDslEngine.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
at javaposse.jobdsl.Run$_main_closure2.doCall(Run.groovy:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
def private static addGithubPRTriggerImpl(def job, String branchName, String contextString,
        String triggerPhraseString, boolean triggerOnPhraseOnly, boolean permitAllSubmittters, Iterable<String> permittedOrgs, Iterable<String> permittedUsers) {
    job.with {
        triggers {
            githubPullRequest { // <- line 421
                useGitHubHooks()
                if (permitAllSubmittters) {
                    admin('Microsoft')
                }
                admin('mmitche')
                if (permitAllSubmittters) {
                    // ...