Coder Social home page Coder Social logo

GH Artifact lookup times out about craft HOT 23 OPEN

mydea avatar mydea commented on June 18, 2024
GH Artifact lookup times out

from craft.

Comments (23)

asottile-sentry avatar asottile-sentry commented on June 18, 2024

from the logs it looks like craft sees your CI as successful before the artifact upload job has started -- once it sees success it moves on to trying to locate artifacts from the successful CI

you can configure the necessary jobs here

from craft.

mydea avatar mydea commented on June 18, 2024

Ah, I see, so basically this is completely wrong the way it is set up currently 😅

So I need to add this https://github.com/getsentry/sentry-javascript/actions/runs/5715683859/job/15486522728 as the required context, so:

statusProvider:
  name: github
  config:
    contexts:
      - All required tests passed or skipped

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

@asottile-sentry can you clarify what contexts should be set to? We tried the name property of the job but that's not it as craft tells us it can't find the context.

For reference, here's the job we want to wait on:
https://github.com/getsentry/sentry-javascript/blob/754b08279c171ab730224d2432e2e89327fc5932/.github/workflows/build.yml#L858-L873

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

Or is it enough to just set without any context?

statusProvider:
  name: github

from craft.

asottile-sentry avatar asottile-sentry commented on June 18, 2024

I am not sure -- I'm just going by the docs :S

from craft.

BYK avatar BYK commented on June 18, 2024

@Lms24 newest versions of Craft use github as the default status provider. Let me look into your runs.

from craft.

BYK avatar BYK commented on June 18, 2024

Very first step would be to change this line: https://github.com/getsentry/sentry-javascript/blob/99e347907812873cc0c832aa25968c3db6587af1/.craft.yml#L1

To say 1.4.2 instead of 0.23.1. I don't remember when we change the defaults for the context but the more recent version the better.

Moreover, the issue you originally reported doesn't seem to be related to contexts. I'll dig a bit more and report back here.

from craft.

BYK avatar BYK commented on June 18, 2024

If you look at the logs starting from here: https://github.com/getsentry/publish/actions/runs/5715907847/job/15486244767#step:10:80

It actually just waits for all the checks to pass. The context parameter is to give it an explicit list of contexts to pass (so you don't have to wait for all).

from craft.

BYK avatar BYK commented on June 18, 2024

So here in the artifact upload job for the failed publish, it says it successfully uploaded the artifacts at 23 past the hour: https://github.com/getsentry/sentry-javascript/actions/runs/5715683859/job/15485977459#step:6:970

The publish job fails at 27 minutes past the hour: https://github.com/getsentry/publish/actions/runs/5715907847/job/15486244767#step:10:133

That means for some reason GitHub API did not make the uploaded artifacts available for about 4 minutes. That's quite long. It maybe because the artifacts seem quite large.

I googled a bit to see if there's any mention of an expected delay for artifacts to be available but failed to find one. You may wanna raise this issue with GitHub support for investigation. In the meantime, either increasing the number of tries or extending the delay as you originally suggested makes sense for a workaround.

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

@BYK thanks for looking into this!

To say 1.4.2 instead of 0.23.1. I don't remember when we change the defaults for the context but the more recent version the better.

I'm confused now. I thought we just always use the latest craft version when starting a release via getsentry/publish? We only recently were blocked by a bug in the latest craft version. So why would it take an older version just for the status provider?

It actually just waits for all the checks to pass. The context parameter is to give it an explicit list of contexts to pass (so you don't have to wait for all).

Ok, so this, plus your last reply, plus the fact that GH is already the default status provider, suggests to me that our config doesn't need adjustments but it's a GH problem 🤔 The weird thing here is that our initial publish attempts fail often. Really often. I'm wondering how often that's overall because of test flakes vs. this artifact retrieval problem.

In the meantime, either increasing the number of tries or extending the delay as you originally suggested makes sense for a workaround.

Still seems like a good idea to me.

from craft.

BYK avatar BYK commented on June 18, 2024

My pleasure!

I'm confused now. I thought we just always use the latest craft version when starting a release via getsentry/publish? We only recently were blocked by a bug in the latest craft version. So why would it take an older version just for the status provider?

Sorry for the confusion I caused. Some years ago, we did not default to GitHub as the status provider. That was changed around version 0.21.0 and we added a check: if the config file mentioned a version earlier than 0.21.0, don't default to anything. For anything else, use GitHub. So this file was already mentioning 0.23.1, hence past that version, hence nothing to change or worry about. Moreover, that epoch check was removed from the code a long time ago too 😅

Ok, so this, plus your last reply, plus the fact that GH is already the default status provider, suggests to me that our config doesn't need adjustments but it's a GH problem 🤔

Exactly! If you had an issue with your config, you'd have known about it a very long time ago. .craft.yml does not seem to get many updates.

The weird thing here is that our initial publish attempts fail often. Really often. I'm wondering how often that's overall because of test flakes vs. this artifact retrieval problem.

That is very easy to check and quantify. Just go check the logs and see why they fail. This one you linked was due to GitHub not making artifacts available quickly enough. If this is frequent, I'd blame that as if your tests fail I think you get another fail email for the failed workflow on the release branch.

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

The same thing happened again. Opened #484 to fix this. I spent too much time this week trying to fix releases. This needs to be improved.

from craft.

BYK avatar BYK commented on June 18, 2024

The artifacts however are available here: https://github.com/getsentry/sentry-javascript/actions/runs/5752023602

I'd definitely raise this with GitHub Support. Again, I suspect it may have todo something with the number of files or the size of the unzipped artifacts.

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

Hmm I'm not sure if we're missing something here. I think there might still be some problem in the status provider. Looking at the run I linked, we can see the following time stamps:

So we started downloading artifacts before the action even completed.

Shouldn't the status provider in Craft only start downloading artifacts once the entire action completed (by default)? I have a feeling that the artifacts might only be available once the action completed. Does this make sense?

from craft.

BYK avatar BYK commented on June 18, 2024

Shouldn't the status provider in Craft only start downloading artifacts once the entire action completed (by default)? I have a feeling that the artifacts might only be available once the action completed. Does this make sense?

Aha, that makes sense! Then I guess calculating the combined status of the commit step may some issues. That part was always a bit tricky. Relevant code is here:

} else {
logger.debug(
'No config provided for GitHub status provider, calculating the combined status...'
);
let commitApiStatusResult;
if (
revisionStatus.total_count === 0 &&
revisionStatus.state === 'pending'
) {
// Edge case, this is what GitHub returns when there are no registered legacy checks.
logger.debug('No legacy checks detected, checking for runs...');
const hasPendingActiveSuites = revisionCheckSuites.check_suites.some(
suite =>
// Need the any cast as octokit lacks this prop in its types
(suite as any).latest_check_runs_count > 0 &&
suite.status === 'queued'
);
if (revisionChecks.total_count > 0) {
logger.debug('Check runs exist, continuing...');
commitApiStatusResult = CommitStatus.SUCCESS;
} else if (hasPendingActiveSuites) {
logger.debug('Pending check suites exist, continuing...');
commitApiStatusResult = CommitStatus.PENDING;
} else {
logger.warn('No valid build contexts detected, did any checks run?');
commitApiStatusResult = CommitStatus.NOT_FOUND;
}

We may wanna do some digging into the API and modernize that code, especially due to the "legacy checks" part.

from craft.

asottile-sentry avatar asottile-sentry commented on June 18, 2024

@Lms24 it seems in that case then a job would need to be added to your workflow which requires all others to finish -- then you'd use that as the indicator for craft's status

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

I guess it's worth trying as long as sentry-javascript is the only affected repo. But it means we're adapting to broken behaviour.

from craft.

asottile-sentry avatar asottile-sentry commented on June 18, 2024

I think what's happening is there's a point where everything is passed (all statuses green, none pending) and craft can't know that there's more things to run

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

Soo, I just tried to set

statusProvider:
  name: github
  config:
    contexts:
      - job_required_jobs_passed

but the release failed again.

I'm going to try jobs.job_required_jobs_passed next which should be a valid GH context but I have no idea if this is even working correctly.

When I query the API endpoint manually with

gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/getsentry/sentry-javascript/commits/583e0bd53968b56d0f63c132c32b989dbcb999cb/status

I get an empty array for statuses which should contain the context we're waiting on.

{
  "state": "pending",
  "statuses": [],
  // ...
}

Not sure if this endpoint even works correctly. For instance, it always returns pending and this was already reported here.

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

Update: I tried jobs.job_required_jobs_passed in dry run mode and it didn't work either. Not sure what's happening here but if I had to guess that endpoint seems broken.

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

Given that there's no way to make this work with our current status checks:

I'm wondering if we should use the Actions endpoint instead of the commit based endpoints: https://docs.github.com/en/rest/actions/workflow-jobs?apiVersion=2022-11-28#get-a-job-for-a-workflow-run

Seems like this is what we need to poll the status of a specific job. Now what's left to figure out is

  • how to get the run id
  • how to not break everyone by changing stuff around 🙃

from craft.

BYK avatar BYK commented on June 18, 2024

Actions endpoint should it be used here as that's specific to GitHub actions but the GitHub status provider (and GitHub status checks) are independent from GitHub Actions.

You may wanna try graphql endpoints.

from craft.

Lms24 avatar Lms24 commented on June 18, 2024

Alright folks, I'm done with trying to fix this. Given that there's a somewhat working but annoying workaround (namely, telling our managers to wait ~30min with adding the accept label in getsentry/publish) I can't justify working on this any longer. My gut feeling tells me that we can't be the only repo that's affected by this but what can I do...

To whoever picks this up: If we can't easily change the current github status provider, maybe the path of least friction/resistance is to create a second Github (Action) status provider that uses working endpoints. Individual repos could opt-in to use this provider instead and we wouldn't risk breaking existing publishing configs who rely on the older endpoints (if there are any).

from craft.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.