Coder Social home page Coder Social logo

error-tracker's Introduction

Error reporting

This is not an official Google product

Receives error reports emitted by AMP HTML runtime library and sends them to the Google Cloud Error Logging service.

This tool does not collect any user data or information.

Setup

  1. Enable Google Cloud Logging API.
  2. Authenticate with Google Cloud: $ gcloud auth login
  3. Start the server: $ npm start

Deployments

This application runs on Google Cloud Functions. There are three endpoints that execute the same functionality:

  • /r - 90% of traffic goes to this endpoint
  • /r-beta - 10% of traffic goes to this endpoint
  • /r-dev - only manual traffic goes to this endpoint

Note that amphtml, by default, sends reports to https://us-central1-amp-error-reporting.cloudfunctions.net/r and to https://us-central1-amp-error-reporting.cloudfunctions.net/r-beta. This is considered the canonical error reporting service.

Deploying to /r-dev

Any developer with a Google Cloud Project that was set up as above can deploy to the /r-dev endpoint of their project by running npm run deploy-dev. This action will directly deploy the function to GCP.

Deploying to /r-beta

Commits merged to this repository's main branch are automatically deployed to the /r-beta endpoint on the canonical error reporting service using a Cloud Build action, defined in the cloudbuild.yaml config file.

Deploying to /r

This action can only be performed by GitHub users with write permission on this repository. To deploy to the production/stable endpoint /r, run npm run deploy-stable. This will create and push a Git tag of the form deploy-stable-YYMMDDHHMMSS, which in turn triggers a Cloud Build action similar to the beta environment.

License

Licensed under the Apache 2.0 license
http://www.apache.org/licenses/LICENSE-2.0

error-tracker's People

Contributors

danielrozenberg avatar erwinmombay avatar greenkeeper[bot] avatar jridgewell avatar kristoferbaxter avatar lannka avatar nainar avatar raphaeljunior avatar rcebulko avatar renovate[bot] avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

error-tracker's Issues

Cannot read property 'includes' of undefined

TypeError: Cannot read property 'includes' of undefined
    at /app/routes/error-tracker.js:74:46
    at Array.some (native)
    at ignoreMessageOrException (/app/routes/error-tracker.js:73:25)
    at getHandler (/app/routes/error-tracker.js:128:7)

Move Stable throttling to client

Context: 90% of errors reported by clients on the Stable release channel are throttled by the backend service and never logged. By default, GCP has a 10 GHz-seconds limit on total CPU allocation for Cloud Functions invocations. A recent surge in errors from a (fixed) high-frequency error pushed us over the edge, and we hit our daily quota for CPU allocation.

For now, I've requested a quota increase to better handle load spikes like this. To reduce some of that request load, we can move the 90% throttling of Stable traffic to the client. This can be a very simple check implemented by the runtime that would drastically reduce backend load.

/cc @jridgewell

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

cloudbuild
cloudbuild.yaml
github-actions
.github/workflows/ci.yml
  • actions/checkout v4
  • actions/setup-node v4
npm
package.json
  • @google-cloud/logging 11.0.0
  • @google-cloud/storage 7.7.0
  • @jridgewell/trace-mapping 0.3.22
  • express 4.18.2
  • http-status-codes 2.3.0
  • lodash.debounce 4.0.8
  • node-fetch 3.3.2
  • safe-decode-uri-component 1.2.2-native
  • @google-cloud/functions-framework 3.3.0
  • @types/express 4.17.21
  • @types/lodash.debounce 4.0.9
  • @typescript-eslint/eslint-plugin 6.19.1
  • @typescript-eslint/parser 6.19.1
  • chai 5.0.3
  • eslint 8.56.0
  • eslint-config-prettier 9.1.0
  • eslint-plugin-chai-expect 3.0.0
  • eslint-plugin-import 2.29.1
  • eslint-plugin-sort-destructure-keys 1.5.0
  • eslint-plugin-prettier 5.1.3
  • mocha 10.2.0
  • nock 13.5.0
  • prettier 3.2.4
  • sinon 17.0.1
  • source-map 0.7.4
  • superagent 8.1.2

  • Check this box to trigger a request for Renovate to run again on this repository

Stack trace unminify throws "memory access out of bounds" in wasm code

RuntimeError: memory access out of bounds
    at wasm-function[18]:5277
    at wasm-function[27]:66
    at BasicSourceMapConsumer._parseMappings (/app/node_modules/source-map/lib/source-map-consumer.js:329:47)
    at BasicSourceMapConsumer._getMappingsPtr (/app/node_modules/source-map/lib/source-map-consumer.js:315:12)
    at _wasm.withMappingCallback (/app/node_modules/source-map/lib/source-map-consumer.js:510:14)
    at Object.withMappingCallback (/app/node_modules/source-map/lib/wasm.js:95:11)
    at BasicSourceMapConsumer.originalPositionFor (/app/node_modules/source-map/lib/source-map-consumer.js:508:16)
    at unminifyFrame (/app/utils/unminify.js:79:49)
    at stack.map (/app/utils/unminify.js:153:36)
    at Array.map (<anonymous>)

go/ampe/CLTN18zhofuIDg

From the Oxidizing Source Maps with Rust and WebAssembly article, the author seems aware of this:

With the WebAssembly implementation, Chrome would erroneously throw RuntimeError: memory access out of bounds. Using Chrome’s debugger, the supposed out-of-bounds access was happening in an instruction sequence that does not exist in the .wasm file. All other browser’s WebAssembly implementations successfully ran the benchmark with this input, so I am inclined to believe it is a bug in the Chrome implementation.

We should consider downgrading the source-map dep to see if it'll fix the issue.

TypeError: Cannot read property 'startsWith' of undefined

TypeError: Cannot read property 'startsWith' of undefined
    at getHandler (/app/routes/error-tracker.js:77:14)
    at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
    at next (/app/node_modules/express/lib/router/route.js:137:13)
    at Route.dispatch (/app/node_modules/express/lib/router/route.js:112:3)
    at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
    at /app/node_modules/express/lib/router/index.js:281:22
    at Function.process_params (/app/node_modules/express/lib/router/index.js:335:12)
    at next (/app/node_modules/express/lib/router/index.js:275:10)
    at expressInit (/app/node_modules/express/lib/middleware/init.js:40:5)
    at Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)

I2I: Rename service bucket from "Development" to "1%"

Currently, the service buckets for Beta and Experimental are called "CDN/Origin Development". The meaning of "Development" isn't clear to all engineers. A more obvious choice would be "CDN/Origin 1%" (or "CDN/Origin 1-Percent"), since that's how most people think about the Beta/Experimental release channels anyway.

/cc @ampproject/wg-infra @jridgewell

Possible bug or spec?

Hi there,

I was reading the source code of this repository for a project of mine and came across this line of code: routes/error-tracker.js#L102
If you follow where the value of req.body flows in the code, you can can see that it ends in a call to deleter() in utils/cache.js. So my question is this: which function is responsible for sanitizing req? It can be potentially exploited (or just result in an undesired behavior). I could not find information about it in the repository so I apologize if I misread or overlooked something.

Thanks

I2I: Error Reporting Migration Plan

With the migration to Cloud Functions, several things have to happen to fully migrate error reporting and introduce new features. Since Cloud Functions does not provide built-in traffic migration or versioning, we have a stable and beta channel with different URLs that the Runtime will divert between to allow testing new error reporting deployments. Stable is promoted explicitly by pushing a git tag, while beta releases are built automatically from master on each commit.

The new error reporting endpoints live at new URLs:

  • stable: https://us-central1-amp-error-reporting.cloudfunctions.net/r
  • beta: https://us-central1-amp-error-reporting.cloudfunctions.net/r-beta

Migrating to the new error reporting conventions includes a handful of changes:

  • introducing the stable Cloud Functions endpoint /r to the AMP Runtime and slowly migrating error reporting traffic over
  • introducing the beta endpoint /r-beta to the AMP Runtime and migrating a slice of traffic over
  • changing service bucket names and version identifiers to more useful, readable groupings
  • splitting expected errors into a separate project for monitoring

To minimize risk of something breaking, and to minimize the impact of the changes on developer workflow, I propose the following plan.

Week of March 30

  • introduce the stable endpoint as an AMP Runtime experiment (ampproject/amphtml#27501) for 20% of traffic in Beta/Experimental
  • introduce the stable endpoint as an AMP Runtime experiment (ampproject/amphtml#27501) for 20% of traffic in Beta/Experimental opt-in

Week of April 6

  • new endpoint traffic of 20% reaches Beta/Experimental 1%

This week allows us to confirm the new reporting endpoint reports correctly and scales as needed.

Week of April 13

  • new endpoint traffic of 20% reaches Stable
  • ramp traffic to the new endpoint to 100% in Beta/Experimental (ampproject/amphtml#27682)
  • modify split for the beta endpoint: 10% of new endpoint traffic (ampproject/amphtml#27682)
  • split expected errors into a separate logging project in beta endpoint (#124)

This week allows us to verify that the new reporting endpoint can scale to production traffic, and validates the 90-10 split to the beta endpoint.

Week of April 20

  • new endpoint traffic of 100% (90%/10%) reaches Stable

By this point, we have all errors being reported through the new endpoints, so we can start making changes to service buckets and version names while maintaining enough reporting volume in both the legacy and new buckets to ensure a useful amount of data in one or both at all times during the transition.

Week of April 27

  • promote expected error split into stable endpoint
  • rename version IDs to readable names in the beta endpoint (#127)
  • introduce service bucket renaming into beta endpoint (#128)

Week of May 4

  • promote service bucket and version renaming into stable endpoint
  • party 🎉

Week of May 11

  • Deep Dive on all Error Reporting process & infrastructure changes

/cc @rsimha @ampproject/wg-runtime

Errors during logging

Currently when stack driver logging experiences an error while logging errors reported to the Error tracker, it dumps the errors it was trying to log. The on error implementation of logging should handle these errors in a safe way without dumping them.

Rename version from RTV to readable string

Right now RTVs are used to reference release versions:

  • in release tracking issues
  • in error reporting dropdowns
  • in the console message in AMP pages
  • in discussions about regressions
  • in the release dashboard

In many of these contexts, a readable form of this version could be more useful for communication, understanding, and cross referencing between threads, errors, etc. I'm leaning towards a mapping like the following:

'052004030010070' --> '04-03 Nightly-Control (0010)'
'012004030010000' --> '04-03 Stable (0010)'
'012004030010001' --> '04-03 Stable (0010+1)'
'012004030010002' --> '04-03 Stable (0010+2)'
'022004030010070' --> '04-03 Control (0010)'
'002004172112280' --> '04-17 Experimental (2112)'
'032004172112280' --> '04-17 Beta (2112)'
'042004210608300' --> '04-21 Nightly (0608)'

These version strings

  • sort chronologically by default
  • indicate cherry-picks via +1, +2, etc.
  • used the hour/minute digits for a "fingerprint" of an RTV
  • are simple enough to remember and type without needing to copy-paste

These strings should be easier to remember, understand/parse, and communicate about. The fingerprints could act as a shorthand in discussions.

"We promoted Stable from 0010 to 2112, but 2112 was borked so we rolled back to 0010 until we could validate cherry-picking #1337 to 0010+1 fixed the issue. The bugfix #1337 is already in 0608"

I think some version of this stringification would improve clarity in most or all of the locations above. In some places (ex. release tracking issue), it would make sense to include both the stringified form and the raw RTV.

I2I: Migrate error reporting handler to serverless deployment

Background

Currently, this application is run on Google App Engine (Flexible) with automatic instance scaling between 4 and 100 instances (see app.yaml), typically floating around ~20 instances.

As a result of settings that have been deprecated since the error-tracker was last deployed, it's currently in a state where re-deploying causes a breakage related to GAE's readiness and liveness checks. @jridgewell and I have tried without success to identify the source. For an example of the impact of this, LTS errors can't currently be logged, since the update to support them is seemingly impossible to deploy

Proposal

One way to simplify the deployment process and unblock updates to this app is to move to a serverless deployment to Cloud Functions. As can be seen in the app entrypoint, the server defines very few endpoints (health checks which respond 200 and /r which delegates work to an errorTracker handler). It would be relatively straightforward to instead package these handler functions as two Cloud Functions--or even just one, since the health check would no longer be relevant.

Challenges

1. Caching the source maps and RTV diversions

Currently each GAE instance of the app fetches https://cdn.ampproject.org/rtv/metadata to identify which RTVs are currently being served, and drop errors from all older RTVs. The app uses a 5-50min cache for the diversion list, and these caches can at times be out-of-sync across the instances. It also caches source maps for 2 weeks to provide unminified line numbers in stack traces.

Cloud Functions are executed independently with no guarantees about the environment or local state. Initially, some form of Cloud Storage or Memorystore were considered to hold this RTV cache and share it across functions. However, according to the Optimizing Networking section, global state is often maintained between function invocations.

Cloud Functions spins up as many instances as required to handle the load, and only the exported function is called. Dependency loading, and creation/instantiation of global variables, happens only a) at the first function deployment and b) when new instances are started to scale up. In other words, if the cache is declared in the global scope outside the handler function, it can still be shared by function invocations on that instance. We will need to test/monitor the behavior of this scaling, but it appears the behavior will be similar to GAE instance scaling.

The same logic applies to the keys fetched at startup, though it may be possible to instead embed the keys directly in the Cloud Function configuration in Pantheon, eliminating the need to fetch them explicitly

2. New handler URL

Cloud Functions have a distinct handler URL of the form https://[REGION]-[PROJECT_NAME].cloudfunctions.net/[FUNCTION_NAME]. This means to actually use a serverless deployment, error-reporting code in amphtml (and any other projects that report to the app currently) would need to be updated to use the new URL.

We would likely want to do this behind a flag in the Experimental build, or configure error reporting to report to the new URL X% of the time and ramp up, so we can migrate to the new deployment gradually. Despite the clients reporting to two different endpoints, the errors themselves would still end up in the same place, so this migration should not impact our ability to monitor or assess error levels.

One alternative was to forward requests from the App Engine app to the Cloud Functions URL, but since we can't update the GAE app, that idea is moot.

/cc @ampproject/wg-infra @ampproject/wg-runtime

An in-range update of safe-decode-uri-component is breaking the build 🚨

The dependency safe-decode-uri-component was updated from 1.2.1-native to 1.2.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

safe-decode-uri-component is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 22 commits ahead by 22, behind by 17.

There are 22 commits in total.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Version 10 of node.js has been released

Version 10 of Node.js (code name Dubnium) has been released! 🎊

To see what happens to your code in Node.js 10, Greenkeeper has created a branch with the following changes:

  • Added the new Node.js version to your .travis.yml
  • The engines config in 1 of your package.json files was updated to the new Node.js version

If you’re interested in upgrading this repo to Node.js 10, you can open a PR with these changes. Please note that this issue is just intended as a friendly reminder and the PR as a possible starting point for getting your code running on Node.js 10.

More information on this issue

Greenkeeper has checked the engines key in any package.json file, the .nvmrc file, and the .travis.yml file, if present.

  • engines was only updated if it defined a single version, not a range.
  • .nvmrc was updated to Node.js 10
  • .travis.yml was only changed if there was a root-level node_js that didn’t already include Node.js 10, such as node or lts/*. In this case, the new version was appended to the list. We didn’t touch job or matrix configurations because these tend to be quite specific and complex, and it’s difficult to infer what the intentions were.

For many simpler .travis.yml configurations, this PR should suffice as-is, but depending on what you’re doing it may require additional work or may not be applicable at all. We’re also aware that you may have good reasons to not update to Node.js 10, which is why this was sent as an issue and not a pull request. Feel free to delete it without comment, I’m a humble robot and won’t feel rejected 🤖


FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

An in-range update of @google-cloud/logging is breaking the build 🚨

The dependency @google-cloud/logging was updated from 5.1.3 to 5.2.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

@google-cloud/logging is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

Release Notes for v5.2.0

Features

  • add path template parsing for billing, organizations, and folders (#529) (1e8c67f)
Commits

The new version differs by 3 commits.

  • 26bac83 chore: release 5.2.0 (#530)
  • 1e8c67f feat: add path template parsing for billing, organizations, and folders (#529)
  • 4a1883f build: use config file for linkinator (#524)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.