thii commented on June 30, 2024

@thanosexcite Oh right 👍
Would you mind submitting a PR to update it?

from aws-codebuild-extras.

rarylson commented on June 30, 2024

I don't think this is the right place to document this (maybe I should open a new "Issue"). I ran some tests with CodeBuild, and I also have some opinions about the way aws-codebuild-extras calculates the CODEBUILD_GIT_BRANCH var.

How CodeBuild works

How CodeBuild gets the source:

  • CodeBuild can pull code from several sources. Common ones are GitHub, CodeCommit and CodePipeline
  • When working with GitHub and CodeCommit, CodeBuild is in charge of doing the git clone
  • When working with CodePipeline, there are 2 options:
    • (1) CODE_ZIP: It gets the artifact generated by CodePipeline in a previous step from S3. This artifact is a ZIP file that doesn't have the .git folder, so CodeBuild will not be able to run git commands.
    • (2) CODEBUILD_CLONE_REF: The CodePipeline artifact does not contain the code, but a URL that CodeBuild uses for a git clone instead. There is still an artifact on S3; the difference is what it contains (the clone URL instead of the code). So CodeBuild gets the artifact, reads the URL, and uses it to clone the repo.
      • git-credential-helper is necessary for this mode.
  • When working with CodePipeline, CodeBuild can also receive information via env vars
  • When working with external repos, like GitHub, it's possible to use webhook integrations

How CodeBuild builds the source:

  • CodeBuild has 2 execution modes: single builds and batch builds
  • For single builds, it just gets the source (depending on the source type, it may be a git repo or only the code, as with CodePipeline CODE_ZIP)
  • For batch builds, there are 2 phases: (1) a batch job pulls the code, reads the batch definition, works out all of the builds that need to be done, resolves their configuration, and starts them; (2) the builds execute using the configurations discovered before.

Also, when starting a build in CodeBuild, there are 4 possibilities for choosing the reference (via build overrides):

  • A specific branch: refs/heads/mydevbranch
  • A specific tag: refs/tags/vx.x.x
  • A specific commit: {full-commit-SHA}
  • A reference and a commit ID: like refs/heads/mydevbranch^{full-commit-SHA} (the build is related to branch mydevbranch, but full-commit-SHA is not necessarily the HEAD of this branch).

They are valid for CodeCommit, but other sources like GitHub allow more:

  • A pull request

I only ran tests with CodeBuild directly pulling from CodeCommit as a source. I tested both single and batch builds. This is what I discovered...

Single build - clone depth of 1 - specifying a branch

I had the following env vars:

CODEBUILD_SOURCE_VERSION=refs/heads/develop
CODEBUILD_BUILD_ID=update-conf-py-DO-NOT-USE-tests:MY_BUILD_ID_HERE
CODEBUILD_RESOLVED_SOURCE_VERSION=30edcfc3f94d81416e12602ba6760df4a5098f24
CODEBUILD_BUILD_NUMBER=MY_BUILD_NUMBER_HERE
CODEBUILD_INITIATOR=MY_SESSION_ROLE_HERE

I also had the following output for some useful git commands:

[Container] [...] Running command git symbolic-ref HEAD --short || true
fatal: ref HEAD is not a symbolic ref

[Container] [...] Running command git branch -a --contains HEAD
* (no branch)
  develop
  remotes/origin/develop

For this case, I could use CODEBUILD_SOURCE_VERSION to get my branch name. For instance, my goal was to pass the branch name to coveralls:

  post_build:
    commands:
      - CI_NAME=codebuild CI_BRANCH=${CODEBUILD_SOURCE_VERSION##*/} coveralls

Also observe that git symbolic-ref HEAD doesn't help, as the cloned repo is in a detached HEAD state. I don't like this approach, but it's how CodeBuild works.

Although we could use git branch -a --contains HEAD, I don't think this is reliable, as the commit may belong to multiple branches, and may not be the current HEAD of any of them (it may be an older commit).
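One caveat with ${CODEBUILD_SOURCE_VERSION##*/}: it keeps only the text after the last slash, so a branch named refs/heads/feature/login would be reported as login. A minimal sketch of an alternative, assuming the value starts with refs/heads/ or refs/tags/ (the helper name ref_short_name and the feature/login branch are hypothetical, not part of CodeBuild or aws-codebuild-extras):

```shell
# Hypothetical helper: strip a known ref prefix instead of cutting at the
# last slash, so branch names that contain slashes survive intact.
ref_short_name() {
  ref=$1
  case "$ref" in
    refs/heads/*) printf '%s\n' "${ref#refs/heads/}" ;;
    refs/tags/*)  printf '%s\n' "${ref#refs/tags/}" ;;
    *)            printf '%s\n' "$ref" ;;   # not a ref: pass it through unchanged
  esac
}
```

In a buildspec this could be used as CI_BRANCH=$(ref_short_name "$CODEBUILD_SOURCE_VERSION").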

Batch build - clone depth of 5 - specifying a branch

CODEBUILD_SOURCE_VERSION=ee75ab30943acd1f4edaa3300b2b203b2a56f9e2
CODEBUILD_BUILD_ID=update-conf-py-DO-NOT-USE-tests:MY_BUILD_ID_HERE
CODEBUILD_BUILD_BATCH_TRIGGERED=true
CODEBUILD_RESOLVED_SOURCE_VERSION=ee75ab30943acd1f4edaa3300b2b203b2a56f9e2
CODEBUILD_BUILD_NUMBER=MY_BUILD_NUMBER_HERE
CODEBUILD_BUILD_BATCH_NUMBER=MY_BATCH_NUMBER_HERE
CODEBUILD_INITIATOR=BASICALLY_THE_BATCH_ID_THAT_INITIATED_THIS_BUILD
CODEBUILD_BATCH_BUILD_IDENTIFIER=MY_IDENTIFIER

And:

[Container] [...] Running command git symbolic-ref HEAD --short || true
fatal: ref HEAD is not a symbolic ref

[Container] [...] Running command git branch -a --contains HEAD
* (no branch)
  develop
  remotes/origin/develop

The results are similar, but unfortunately, I lost the source reference information. It seems that the first batch job (which pulls the code) passes its CODEBUILD_RESOLVED_SOURCE_VERSION as CODEBUILD_SOURCE_VERSION to all child builds.

We can confirm this here:

[Image: screenshot taken 2022-11-06 13:51:51]

We can see that the parent build knows the branch, but the children only know the commit ID.

So it's pretty hard to discover the branch reference in the child builds (I'll explain later why I disagree with using git branch -a --contains HEAD).

As a workaround, I tried to evaluate CODEBUILD_SOURCE_VERSION in the parent batch and pass it as a variable to the child builds, but I had no success. This doesn't work:

env:
  variables:
    BATCH_CODEBUILD_SOURCE_VERSION: ${CODEBUILD_SOURCE_VERSION}

And this doesn't work either:

batch:
  build-list:
    - identifier: python3_10
      env:
        variables:
          BATCH_CODEBUILD_SOURCE_VERSION: ${CODEBUILD_SOURCE_VERSION}

My suggestion would be to pick the CODEBUILD_BUILD_ID, and use aws-cli (IAM role needs to be updated) to get the source reference for this specific build.

Another suggestion is to look for the variable CODEBUILD_BUILD_BATCH_TRIGGERED or CODEBUILD_BUILD_BATCH_ID and implement some logic to recover CODEBUILD_SOURCE_VERSION in this case. The main issue is that these vars are not documented. CODEBUILD_BATCH_BUILD_IDENTIFIER is documented, but I don't know if it will work for build-matrix.

Single build - clone depth of 1 - specifying a commit that exists in 2 different branches:

Now I ran a single build, but specifying a commit instead...

CODEBUILD_SOURCE_VERSION=f3bdd9e5ea0396f5d72f48349b4a3312e427063e
CODEBUILD_BUILD_ID=update-conf-py-DO-NOT-USE-tests:MY_BUILD_ID_HERE
CODEBUILD_RESOLVED_SOURCE_VERSION=f3bdd9e5ea0396f5d72f48349b4a3312e427063e
CODEBUILD_BUILD_NUMBER=MY_BUILD_NUMBER_HERE
CODEBUILD_INITIATOR=MY_SESSION_ROLE_HERE

And:

[Container] [...] Running command git symbolic-ref HEAD --short || true
fatal: ref HEAD is not a symbolic ref

[Container] [...] Running command git branch -a --contains HEAD
* (no branch)
  develop
  remotes/origin/develop

Now we can see that we lost all reference information. CODEBUILD_SOURCE_VERSION doesn't help us. And git branch -a --contains HEAD is not reliable at all, because in my case this commit belongs to both the develop and the master branches, yet only the develop branch appeared. My guess is that this happened because develop is my default branch and this commit is the HEAD of develop (but not the HEAD of master, since master has one additional commit for the merge request). But anyway, the result wasn't reliable.

For this case, I think it doesn't make any sense to say that this build is related to a given branch, because it wasn't.

rarylson commented on June 30, 2024

Opinion about the way aws-codebuild-extras calculates CODEBUILD_GIT_BRANCH

Here it's all about what CODEBUILD_GIT_BRANCH really means. Imagine the cases:

(1) I asked to build a given branch (e.g. refs/heads/mytest)

I would expect CODEBUILD_GIT_BRANCH to return mytest. But the way the variable is currently calculated, if the HEAD of this branch also belongs to another branch (e.g. develop), and that other branch is the default branch, there is a risk that CODEBUILD_GIT_BRANCH returns develop instead.

(2) I asked to build the commit 12345678 without specifying a branch ref

I would expect CODEBUILD_GIT_BRANCH to return nothing (be empty), since the build was started without specifying a branch. But if this commit is not the HEAD of any branch and belongs to multiple branches, CODEBUILD_GIT_BRANCH will return one of the many branches the commit belongs to.

(3) I asked to build a tag (e.g. refs/tags/v1.0.0)

One may argue that CODEBUILD_GIT_BRANCH should return empty and CODEBUILD_GIT_TAG should return v1.0.0. However, in my case, my goal is to put the correct reference in coveralls, so I need some logic to set CI_BRANCH properly. To get the same behavior I have when using GitHub Actions, I would run something like:

# Quote the variables and test with -n so that empty values don't break `[`
if [ -n "$CODEBUILD_GIT_BRANCH" ]; then
    CI_BRANCH=$CODEBUILD_GIT_BRANCH
elif [ -n "$CODEBUILD_GIT_TAG" ]; then
    CI_BRANCH=$CODEBUILD_GIT_TAG
fi

I haven't thought about the expected results for many other test cases, especially when running CodeBuild from CodePipeline. I need to reflect more on all of them.

However, for now, I don't think the current behavior of CODEBUILD_GIT_BRANCH and CODEBUILD_GIT_TAG is good.

Of course, they work for the straightforward use case, which is when we're running CI/CD on branch master just after a merge/pull, and when master is the default branch.

For the other non-trivial cases, the current behavior seems strange to me.

rarylson commented on June 30, 2024

By the way, in case you are not familiar with it, I just discovered that CodeBuild allows the use of Session Manager to access the container environment: https://docs.aws.amazon.com/codebuild/latest/userguide/session-manager.html

After some tests, it seems that it doesn't support batch builds:

[Container] 2022/11/06 17:11:23 Running command codebuild-breakpoint
2022/11/06 17:11:23 Build is paused temporarily and you can use codebuild-resume command in the session to resume this build

[Container] 2022/11/06 17:11:23 Running command python -m pip install coveralls
Collecting coveralls
  Downloading coveralls-3.3.1-py2.py3-none-any.whl (14 kB)

The codebuild-breakpoint command doesn't pause the build. Note that the timestamp is the same for both commands: 2022/11/06 17:11:23.

This is not documented either. It seems to me that the "batch build" feature has some bugs or undocumented behavior.

rarylson commented on June 30, 2024

I did some reverse engineering using the Session Manager integration, this time with single builds (I still want to test batch builds w/ Session Manager again after fixing my IAM Role permissions, which may be the issue for batch builds)...

My case was: I used refs/heads/develop^{4588c877c22af8808969b1daa38e056467c3962a} as the source. This commit was NOT the HEAD of the develop branch, and it also belonged to other branches. My clone depth was equal to 1.

Single build - clone depth of 1 - specifying a non-HEAD commit that exists in 2 different branches:

$ git status
Not currently on any branch.
nothing to commit, working tree clean

$ ls
CHANGELOG.md  CONTRIBUTING.md  LICENSE  MANIFEST.in  Makefile  README.md  extras  requirements-dev.txt  requirements-test.txt  samples  setup.cfg  setup.py  tests  update_conf_py_do_not_use

# All my files are here :)

$ git log
commit 4588c877c22af8808969b1daa38e056467c3962a (HEAD)
[...]
commit 550d821ea5d234ff3a0a51fa04caebe9a5104b0a
[...]
commit 154d3f97ed850a73c2e5efa863f8f1bb0d3d65cf
[..]
Date:   Sat Jul 27 10:58:48 2013 -0300

    First commit. First version implemented.

# It was a full clone of branch develop but starting at commit `4588c877`. The commits made after this one were not pulled.

$ echo $CODEBUILD_SOURCE_VERSION
refs/heads/develop^{4588c877c22af8808969b1daa38e056467c3962a}

# CODEBUILD_SOURCE_VERSION contains the branch info, but it also shows that the build is not necessarily for the HEAD of the branch.

$ git branch
* (no branch)
  develop

$ git show-ref
ce9718cee5385cd74c52652bd10f7663047c0e74 refs/heads/develop
ce9718cee5385cd74c52652bd10f7663047c0e74 refs/remotes/origin/develop

$ git show-ref --head
4588c877c22af8808969b1daa38e056467c3962a HEAD
ce9718cee5385cd74c52652bd10f7663047c0e74 refs/heads/develop
ce9718cee5385cd74c52652bd10f7663047c0e74 refs/remotes/origin/develop

$ echo $CODEBUILD_RESOLVED_SOURCE_VERSION
4588c877c22af8808969b1daa38e056467c3962a

$ git show-ref --heads --tags
ce9718cee5385cd74c52652bd10f7663047c0e74 refs/heads/develop

$ git show-ref $CODEBUILD_RESOLVED_SOURCE_VERSION
# Empty. No results.

$ git symbolic-ref HEAD
fatal: ref HEAD is not a symbolic ref

$ git tag -l
# Empty. No results.

$ git tag -l --contains HEAD
# Empty. No results.

$ git show-ref --heads --tags $CODEBUILD_RESOLVED_SOURCE_VERSION
# Empty. No results.

Some notes:

  • We could parse CODEBUILD_SOURCE_VERSION to know the branch name. But I don't know whether CODEBUILD_GIT_BRANCH should then be set to develop or should still be empty, because the build didn't run against the HEAD of the branch, but against a specific commit instead.
  • Although I set the git clone depth to 1, it was a partial clone of the branch. I ran other tests (I'll share them later), and they show that CodeBuild only respects the git clone depth when I pass a branch, a single commit or a tag. When I use both the branch and the commit (like in refs/heads/develop^{4588c877c22af8808969b1daa38e056467c3962a}), for some reason it makes a partial clone (from the first commit until the specified commit).
  • We don't have tag information. I know that lots of commits in this branch are actually tagged. But CodeBuild doesn't get this information (I still need to test a build with a tag source refs/tags/TAG to compare, but when the source is not a tag, the tag information is not pulled).
  • Although the commit belongs to multiple branches, the partial clone was performed against a single branch (develop), so there is no risk of getting inconsistent results pointing to the other branches.
  • For the batch case, we know that CODEBUILD_SOURCE_VERSION doesn't keep the original value, and it's not possible to recover that information from it. However, a workaround may be to check whether it's a batch build and, if it is, get the references (git show-ref --heads --tags) and compare them against CODEBUILD_RESOLVED_SOURCE_VERSION.
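The comparison in the last note could be sketched roughly as follows. This is only an illustration under stated assumptions: unique_ref_for_commit is a hypothetical helper name, it reads git show-ref output from stdin, and it deliberately gives up when zero or several refs point at the commit, since in that situation the original reference cannot be told apart.

```shell
# Hypothetical helper: read `git show-ref --heads --tags` output on stdin and
# print the ref that points at the given commit, but only when exactly one does.
unique_ref_for_commit() {
  sha=$1
  # Keep the ref name (second column) of every line whose SHA matches
  matches=$(awk -v sha="$sha" '$1 == sha { print $2 }')
  # Count non-empty lines; `|| true` because grep exits non-zero on a count of 0
  count=$(printf '%s\n' "$matches" | grep -c . || true)
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$matches"
  else
    return 1   # zero or several candidates: the original ref is ambiguous
  fi
}
```

Usage would be git show-ref --heads --tags | unique_ref_for_commit "$CODEBUILD_RESOLVED_SOURCE_VERSION". Note that for annotated tags git show-ref prints the SHA of the tag object rather than the commit, so such tags would not match unless the output is dereferenced first (for example with git show-ref -d, whose extra lines end in ^{}).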

I also have a question:

  • In the GitHub Actions integration w/ Coveralls, what is the behavior when we start a workflow that points to a non-current commit of a branch (like this example here)? What is the value of GITHUB_REF and the BRANCH value used by Coveralls?
    • Update: https://github.com/coverallsapp/github-action/blob/master/src/run.ts#L28
    • Coveralls just uses GITHUB_REF as the branch. Now I need to understand if GITHUB_REF would be refs/heads/develop, develop, refs/heads/develop^{4588c877c22af8808969b1daa38e056467c3962a} or develop^{4588c877c22af8808969b1daa38e056467c3962a} for this case. My guess: refs/heads/develop^{4588c877c22af8808969b1daa38e056467c3962a}, and Coveralls knows how to interpret the value.

Single build - clone depth of 1 - specifying a HEAD commit for a branch:

$ git status
Not currently on any branch.
nothing to commit, working tree clean

$ git log
commit ce9718cee5385cd74c52652bd10f7663047c0e74 (grafted, HEAD, origin/develop, develop)
[...]
Date:   Sun Nov 6 14:42:45 2022 -0300

    Test 2

# Only a single commit -> Clone depth was actually 1 this time

$ echo $CODEBUILD_SOURCE_VERSION
refs/heads/develop

$ git branch
* (no branch)
  develop

$ echo $CODEBUILD_RESOLVED_SOURCE_VERSION
ce9718cee5385cd74c52652bd10f7663047c0e74

$ git tag -l
# Empty

$ git show-ref --heads --tags
ce9718cee5385cd74c52652bd10f7663047c0e74 refs/heads/develop

$ git show-ref --heads --tags --head
ce9718cee5385cd74c52652bd10f7663047c0e74 HEAD
ce9718cee5385cd74c52652bd10f7663047c0e74 refs/heads/develop

$ git show-ref $CODEBUILD_RESOLVED_SOURCE_VERSION
# Empty

$ git symbolic-ref HEAD
fatal: ref HEAD is not a symbolic ref

  • Now we see the clone depth of 1 being respected
  • We see that refs/heads/develop (from git show-ref --heads --tags) and HEAD (from git show-ref --head) point to the same commit, which is also the value of $CODEBUILD_RESOLVED_SOURCE_VERSION. This may help us get branch information when running inside a batch build.

Single build - clone depth of 1 - specifying a TAG:

$ git status
Not currently on any branch.
nothing to commit, working tree clean

$ git log
commit aa0fe5c067e8b0d29721c48551dd237146bd82df (grafted, HEAD, tag: v1.1.0.dev2, origin/master)
[...]
Date:   Sun Nov 6 16:07:40 2022 -0300

    Bump version

$ echo $CODEBUILD_SOURCE_VERSION
refs/tags/v1.1.0.dev2

$ git branch
* (no branch)
  develop

$ echo $CODEBUILD_RESOLVED_SOURCE_VERSION
aa0fe5c067e8b0d29721c48551dd237146bd82df

$ git tag -l
v1.1.0.dev2

$ git show-ref --heads --tags
ce9718cee5385cd74c52652bd10f7663047c0e74 refs/heads/develop
380ae5cf950d927ce5381cf3aa16d59829b39282 refs/tags/v1.1.0.dev2

$ git show-ref --heads --tags --head
aa0fe5c067e8b0d29721c48551dd237146bd82df HEAD
ce9718cee5385cd74c52652bd10f7663047c0e74 refs/heads/develop
380ae5cf950d927ce5381cf3aa16d59829b39282 refs/tags/v1.1.0.dev2

  • Here, we have info about both branch and tag. But this build wasn't against a branch.
  • This tag belongs to 2 branches. But only the default branch (develop in my case) appears in git branch and git show-ref. So if we rely on branch information, we could get the wrong branch (e.g., this tag is also in my master branch, and maybe I would prefer to have CODEBUILD_GIT_BRANCH=master). So we have inconsistent behavior when setting the branch name.
  • Suppose now we're inside a batch build (where CODEBUILD_SOURCE_VERSION doesn't have the source reference anymore). If we check that it's a batch build, that we have tag information, and that HEAD and the tag reference are the same commit, then we could safely recover the tag info and know that it was a build against a tag.
    • When GitHub Actions interacts with Coveralls (the issue that I'm currently trying to solve), Coveralls seems to receive refs/tags/v1.1.0.dev2 as the branch name, and that's okay for Coveralls.

Quick insight:

CODEBUILD_BUILD_ARN=arn:aws:codebuild:AWS_REGION:AWS_ACCOUNT:build/BUILD_PROJECT:BUILD_UUID
  • Instead of running aws sts get-caller-identity like the project does currently, there are lots of env vars already provided by CodeBuild that could be used instead to calculate the same stuff.
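As a rough sketch of that idea, the account ID and region can be cut out of the ARN layout shown above (arn:aws:codebuild:AWS_REGION:AWS_ACCOUNT:build/BUILD_PROJECT:BUILD_UUID). The helper name arn_field and the sample ARN values below are hypothetical; inside a real build the input would be $CODEBUILD_BUILD_ARN.

```shell
# Hypothetical helper: print the Nth colon-separated field of an ARN.
arn_field() {
  printf '%s\n' "$1" | cut -d: -f"$2"
}

# Made-up sample value, following the layout of CODEBUILD_BUILD_ARN
sample_arn='arn:aws:codebuild:us-east-1:123456789012:build/my-project:1234abcd'

aws_region=$(arn_field "$sample_arn" 4)       # fourth field: region
aws_account_id=$(arn_field "$sample_arn" 5)   # fifth field: account ID
```

This avoids both the extra API call and the extra IAM permission, at the cost of depending on the ARN layout staying stable.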

I'll try to create a gist that recalculates CODEBUILD_SOURCE_VERSION when it's inside a batch build. For single builds, using CODEBUILD_SOURCE_VERSION as is seems a good approach.

There is a good example here: https://community.opengroup.org/osdu/platform/system/notification/-/blob/fix-error-code/provider/notification-aws/build-aws/buildspec.yaml

For builds that come from CodePipeline, yeeeeh, it's complicated. I need to test a lot to understand. I need to check CODEBUILD_SOURCE_VERSION (according to the docs, it seems to be just the S3 version of the artifact, which doesn't help at all). But I guess it's possible to pass the source reference as an env var (like CODEPIPELINE_SOURCE_VERSION).

And I guess if the user uses CODE_ZIP (instead of CODEBUILD_CLONE_REF), it's impossible to get branch/tag info. Passing the source reference as an env var may be the best approach.

But even using CODEBUILD_CLONE_REF, there should be some logic to recover the initial source reference, and this is probably why so many people use this repo.

rarylson commented on June 30, 2024

Update - Batch build - clone depth of 1 - specifying a branch - Commit belongs to 2 branches, and has an associated tag

[Container] 2022/11/06 21:09:22 Running command if [ ! -z "$CODEBUILD_BATCH_BUILD_IDENTIFIER" ]; then echo yes; fi
yes

[Container] 2022/11/06 21:09:22 Running command if [ ! -z "$CODEBUILD_BUILD_BATCH_NUMBER" ]; then echo yes; fi
yes

[Container] 2022/11/06 21:09:22 Running command git show-ref --tags
d13e7fb86fe9d1cffe56784f6205c9e762ab3e92 refs/tags/testtag
380ae5cf950d927ce5381cf3aa16d59829b39282 refs/tags/v1.1.0.dev2

[Container] 2022/11/06 21:09:22 Running command git show-ref --heads
81d70eb13860c52bb38c0b542150b7bde54ce461 refs/heads/develop
81d70eb13860c52bb38c0b542150b7bde54ce461 refs/heads/master

[Container] 2022/11/06 21:09:22 Running command echo $CODEBUILD_SOURCE_VERSION
81d70eb13860c52bb38c0b542150b7bde54ce461

[Container] 2022/11/06 21:16:30 Running command git branch
* (no branch)
  develop
  master

This is bad news.

Although in previous tests, the reference to 2 branches didn't occur, now it did. So, it is not possible to discover the original reference (which branch is correct). The workaround I created minutes ago is not working because of this.

Maybe the best approach really is to use the AWS CLI and the batch job ID (an undocumented env var, but it exists) to get the source reference. Otherwise, it's not possible to safely and correctly match the build to the source reference used.

The same will occur for tags. If there are 2 tags pointing to the same commit, I use one of the tags (let's say, tag1) to start the build, and use a batch build, we'll have 2 tags for the same commit, and not be able to recover the reference.


Using AWS CLI to get the source reference seems to be the best idea. Something like:

$ aws codebuild batch-get-build-batches --ids BATCH_ID --query "buildBatches[0].sourceVersion" --output text

The issue is that no env var has the BATCH_ID (only the batch number).

So we can do:

BATCH_ID=$(aws codebuild batch-get-builds --ids $CODEBUILD_BUILD_ID --query "builds[0].buildBatchArn" --output text)
aws codebuild batch-get-build-batches --ids $BATCH_ID --query "buildBatches[0].sourceVersion" --output text

I'll test and share the results later.

--

It worked :)

Code follows:

  post_build:
    commands:
      # Workaround for `CODEBUILD_SOURCE_VERSION` in batch builds
      # See: https://github.com/thii/aws-codebuild-extras/issues/3
      - |
        if [ -n "$CODEBUILD_BATCH_BUILD_IDENTIFIER" ]; then
          # ARN of the parent batch that started this child build
          batch_arn=$(aws codebuild batch-get-builds --ids "$CODEBUILD_BUILD_ID" \
                  --query "builds[0].buildBatchArn" --output text)
          # Source version that was originally requested for the batch
          source_version=$(aws codebuild batch-get-build-batches \
                  --ids "$batch_arn" --query "buildBatches[0].sourceVersion" \
                  --output text)
          export CODEBUILD_SOURCE_VERSION=$source_version
        fi
      # Set correct env vars expected by coveralls
      # See: https://docs.coveralls.io/supported-ci-services
      - CI_NAME=codebuild CI_BRANCH=${CODEBUILD_SOURCE_VERSION##*/} coveralls

My personal opinion: the env variables GitHub Actions provides are much better: GITHUB_REF, GITHUB_REF_NAME and GITHUB_REF_TYPE.

We could evolve this project to calculate similar variables...
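A rough sketch of what such a calculation might look like, assuming the input is one of the source version formats seen in the tests above (a branch ref, a tag ref, a ref followed by ^{commit}, or a bare 40-character commit SHA). The function name classify_source_version and its "type name" output format are hypothetical. Whether a ref^{commit} build should really count as a branch build is a judgment call; this sketch simply reports the ref part.

```shell
# Hypothetical helper: classify a CodeBuild source version string.
# Prints "<type> <name>", where type is branch, tag, commit or unknown.
classify_source_version() {
  version=${1%%^*}   # drop a trailing ^{commit-sha}, if present
  case "$version" in
    refs/heads/*) echo "branch ${version#refs/heads/}" ;;
    refs/tags/*)  echo "tag ${version#refs/tags/}" ;;
    *)
      # A bare 40-character lowercase hex string is treated as a commit SHA
      if printf '%s' "$version" | grep -Eq '^[0-9a-f]{40}$'; then
        echo "commit $version"
      else
        echo "unknown $version"
      fi
      ;;
  esac
}
```

The two words could then be split into separate variables, for example with read -r ref_type ref_name.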

rarylson commented on June 30, 2024

Update - Single build triggered by CodePipeline - CODEBUILD_CLONE_REF - from branch develop

Now I also tested how CodeBuild behaves for builds started from CodePipeline. I'm triggering the pipeline on pushes to the develop branch to make it easier to test. I'm starting w/ mode CODEBUILD_CLONE_REF.

The first discovery was that it's not possible to set "Enable session connection" from CodePipeline, so I couldn't use the integration w/ Session Manager (which makes debugging much easier). So I had to put the debug commands directly in the buildspec (like when I tested batch builds).

This is the source artifact (passed to CodeBuild by CodePipeline via S3):

{
  "AccountId": "AWS_ACCOUNT",
  "FullRepositoryName":"MY_REPO_HERE",
  "CloneUrl":"https://git-codecommit.us-east-1.amazonaws.com/v1/repos/MY_REPO_HERE",
  "CommitId":"MY_COMMIT_SHA_HASH",
  "BranchName":"develop"
}

So instead of being the zipped source code, it's a JSON w/ the info that allows CodeBuild to clone the repo directly (CODEBUILD_CLONE_REF).

And this is the output of the debug commands on the build:

[Container] 2022/11/12 23:36:57 Running command env
CODEBUILD_SOURCE_VERSION=arn:aws:s3:::codepipeline-us-east-1-XXXXXXXXX/MY_REPO/SourceArti/MY_ARTIFACT_NAME
CODEBUILD_BUILD_ID=update-conf-py-DO-NOT-USE-tests:MY_BUILD_ID_HERE
CODEBUILD_RESOLVED_SOURCE_VERSION=f14549a89e9ede8c070cd0046291d3a9974ba7c6
CODEBUILD_BUILD_NUMBER=MY_BUILD_NUMBER
CODEBUILD_INITIATOR=codepipeline/MY_PIPELINE_NAME
PWD=/codebuild/output/src689052430/src/git-codecommit.us-east-1.amazonaws.com/v1/repos/MY_ARTIFACT_NAME
[...]
SOURCEVARIABLES_COMMIT_ID=f14549a89e9ede8c070cd0046291d3a9974ba7c6
SOURCEVARIABLES_BRANCH_NAME=develop
SOURCEVARIABLES_REPOSITORY_NAME=MY_REPO
CODEPIPELINE_PIPELINE_EXECUTION_ID=PIPELINE_EXECUTION_UUID

[Container] 2022/11/12 23:36:57 Running command git show-ref --tags --heads --head
f14549a89e9ede8c070cd0046291d3a9974ba7c6 HEAD
f14549a89e9ede8c070cd0046291d3a9974ba7c6 refs/heads/develop
25c60c866e39ac6552ae9f1372575e60a4923559 refs/heads/master
380ae5cf950d927ce5381cf3aa16d59829b39282 refs/tags/v1.1.0.dev2
3bc611cde1ad220f0fad546a75b28f782a95ef56 refs/tags/v1.1.0.dev3
59c02de312a10e2f3dc77ce25565a3141c315ca5 refs/tags/v1.1.0.dev6

[Container] 2022/11/12 23:36:57 Running command git branch
* (no branch)
  develop
  master

[Container] 2022/11/12 23:36:57 Running command git log
commit f14549a89e9ede8c070cd0046291d3a9974ba7c6
Author: Rarylson Freitas <XXXXX@XXXXXXX>
Date:   Fri Nov 11 21:46:12 2022 -0300

    Bump version to test CodePipeline

It's similar to the batch build, as we lost the source reference here (CODEBUILD_SOURCE_VERSION is no longer the initial source reference).

CODEBUILD_SOURCE_VERSION now points to the artifact (in S3) generated by CodePipeline (but this is not a strong hint, as we could have started the build manually pointing to this S3 object). The only strong hint that the build was initiated by CodePipeline is CODEBUILD_INITIATOR, which starts with codepipeline/ (followed by the name of the pipeline).

Just like when we run CodeBuild batch builds against a non-head commit, we now have plenty of references (e.g. refs/heads/develop, refs/heads/master, v1.1.0.dev6). In this case, I was lucky, as the only reference that my commit belongs to is refs/heads/develop. But remember my earlier batch build example: if this commit were also the head of master, it wouldn't be possible to know whether the source reference that started the whole pipeline was the develop or the master branch.

You can also see other variables (e.g. CODEPIPELINE_PIPELINE_EXECUTION_ID and SOURCEVARIABLES_BRANCH_NAME). These variables were manually added by me. There is a feature in CodePipeline that allows us to pass variables from a past stage. In my case:

CODEPIPELINE_PIPELINE_EXECUTION_ID=#{codepipeline.PipelineExecutionId}
SOURCEVARIABLES_BRANCH_NAME=#{SourceVariables.BranchName}
[and so on ...]

So the only real solution I found to recover the source reference in a safe way (without the risk of edge cases) was to:

  • Pass #{codepipeline.PipelineExecutionId} or #{SourceVariables.BranchName} as a variable
  • Use this variable to get the reference
    • Option 1: Similar to the solution I used for batch builds. Use #{codepipeline.PipelineExecutionId} and the AWS CLI to get the source reference of the first stage of the pipeline. -> There are edge cases where this logic may break.
    • Option 2: Use #{SourceVariables.BranchName}. Your reference will be refs/heads/{BranchName}. I think option 2 is the best here. In my case, where I had to pass the branch name to Coveralls, the branch name is everything that I need.

The only drawback is that it's a design oriented by convention (the user should follow a convention of always exporting the variable SOURCEVARIABLES_BRANCH_NAME).

In my case, one workaround could be:

  post_build:
    commands:
      # Workaround for `CODEBUILD_SOURCE_VERSION` in CodePipeline
      # If using CodePipeline, per convention, `SOURCEVARIABLES_BRANCH_NAME` MUST be properly set
      # See: https://github.com/thii/aws-codebuild-extras/issues/3
      - |
        if [ "${CODEBUILD_INITIATOR%/*}" = 'codepipeline' ]; then
          # Fail the build if the convention variable was not passed in
          [ -n "$SOURCEVARIABLES_BRANCH_NAME" ] || exit 1
          export CODEBUILD_SOURCE_VERSION="refs/heads/$SOURCEVARIABLES_BRANCH_NAME"
        fi
      # Set correct env vars expected by coveralls
      # See: https://docs.coveralls.io/supported-ci-services
      - CI_NAME=codebuild CI_BRANCH=${CODEBUILD_SOURCE_VERSION##*/} coveralls

Update:

It's not really easy to just use CodeBuild / CodePipeline.

I observed that the code above doesn't work if we use CodePipeline to start a batch build. The reason is that CODEBUILD_INITIATOR starts with codepipeline/ on the first build job, but points to the parent batch on the child builds.

So, for this case, I came up with the following workaround:

  post_build:
    commands:
      # Workarounds for `CODEBUILD_SOURCE_VERSION` in CodePipeline or batch builds
      # If using CodePipeline, per convention, `SOURCEVARIABLES_BRANCH_NAME` MUST be properly set.
      # See: https://github.com/thii/aws-codebuild-extras/issues/3
      - |
        if [ -n "$SOURCEVARIABLES_BRANCH_NAME" ]; then
          export CODEBUILD_SOURCE_VERSION="refs/heads/$SOURCEVARIABLES_BRANCH_NAME"
        elif [ -n "$CODEBUILD_BATCH_BUILD_IDENTIFIER" ]; then
          # ARN of the parent batch that started this child build
          batch_arn=$(aws codebuild batch-get-builds --ids "$CODEBUILD_BUILD_ID" \
                  --query "builds[0].buildBatchArn" --output text)
          # Source version that was originally requested for the batch
          source_version=$(aws codebuild batch-get-build-batches \
                  --ids "$batch_arn" --query "buildBatches[0].sourceVersion" \
                  --output text)
          export CODEBUILD_SOURCE_VERSION=$source_version
        fi
      # Set correct env vars expected by coveralls
      # See: https://docs.coveralls.io/supported-ci-services
      - CI_NAME=codebuild CI_BRANCH=${CODEBUILD_SOURCE_VERSION##*/} coveralls

Some additional side info:

At least for CodeCommit, it's possible to start the CodePipeline after a tag push (you can customize the EventBridge event that triggers the pipeline, and it may be a tag push, as CodeCommit publishes these events). However, the tag value doesn't matter to CodePipeline, as it always uses the configuration of the source stage, which is always a branch. Also, there is no such thing as #{SourceVariables.TagName} that would let us recover the initial source reference.

In my case, my goal was to build my project every time I pushed a tag of format v9.9.9[.SUFFIX]. But CodePipeline always builds from the specified branch and uses its HEAD. It'll work as long as I always push the tag to the HEAD of the specified branch (master).

Also, when starting a build via CodePipeline that runs all of my tests, I can't send the tag information in CI_BRANCH to coveralls (as currently happens in my projects that use GitHub Actions). Coveralls will receive the branch instead of the tag.

rarylson commented on June 30, 2024

Very sorry for the high number of comments on this issue (sorry if I'm polluting the discussion too much).

So just to summarize everything (my personal point of view):

  • I think the way aws-codebuild-extras calculates the CODEBUILD_GIT_BRANCH var, and several others, is wrong
    • It may lead to several inconsistent behaviors. There are plenty of examples in my comments above.
    • Also, please avoid using variables starting with CODEBUILD_. The prefix is reserved and may break your builds in future versions. I would prefer to use EXTRA_* or something similar.
  • On the other hand, I agree that CodePipeline/CodeCommit/CodeBuild together still have some integration limitations.
    • It's just easier and more elegant to implement a lot of non-trivial use cases via GitHub Actions (for example). Variables like GITHUB_REF, GITHUB_REF_NAME and GITHUB_REF_TYPE are easy to understand and use.
    • I would love it if CodeBuild had variables like CODEBUILD_ORIGINAL_REF, CODEBUILD_ORIGINAL_REF_NAME and CODEBUILD_ORIGINAL_REF_TYPE (ORIGINAL because it would be good to have them passed through batch builds or CodePipeline integrations).
  • It's kind of hard to understand how CodeBuild behaves for each use case. One reason is that it's very flexible, as it integrates w/ CodeCommit, S3, CodePipeline and external providers.
    • But I still think there is a lot of room for improvement and for making the service easier to use.
    • I hope my comments above help you understand how the service behaves for each one of these integration types.
  • Note that CodeBuild has special limitations when using batch builds. This is a special case that requires extra caution until the service improves.
  • I really recommend you try the CodeBuild integration w/ Session Manager. It has some limitations when used together w/ batch builds, but I had a very good experience with it.
    • It's really much easier to debug using it.
  • I created a project with the goal of testing the whole AWS CodeSuite (including CodeBuild, CodePipeline, CodeCommit and CodeArtifact). It may be useful if you want to dive deep and better understand the service (including its limitations). Link is: https://github.com/rarylson/update-conf-py-DO-NOT-USE/
    • I would recommend you take a look at the folders .codepipeline and extras.
  • Avoid recalculating the hard way what is already available in CodeBuild's built-in env vars.
    • Some examples are parsing CODEBUILD_BUILD_ARN to get the account ID, or checking if CODEBUILD_BUILD_ID exists instead of defining CODEBUILD and CI.
  • Finally, I would recommend you dive deep into the comments above, and rethink what the expected behavior is for variables like CODEBUILD_GIT_BRANCH, CODEBUILD_GIT_TAG and so on.
