spotify / lighthouse-audit-service Goto Github PK

License: Apache License 2.0

JavaScript 3.49% TypeScript 94.93% Dockerfile 1.41% Shell 0.18%

lighthouse-audit-service's Issues

Rethink how audits are grouped

Audits are currently grouped most directly by "website." This is useful to some extent, but inside Spotify audits are run more like named entities, something like:

- name: authenticated-mobile-homepage
  url: https://some-site.com
  authenticated: true
  screen-size: mobile
- name: unauthenticated-desktop-list-view
  url: https://some-site.com/list/foo
  authenticated: false
  screen-size: desktop

We should allow this type of setup more directly, potentially by having some kind of stored audit id with the ability to re-run and schedule that audit. Anything without an existing audit id could probably have the url become the id and then we could deprecate or push people away from the website thing.
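
A minimal sketch of the id fallback described above. The `AuditDefinition` shape mirrors the YAML example, and both it and the `auditKey` helper are hypothetical names used for illustration, not existing types in the service:

```typescript
// Hypothetical shape mirroring the YAML example above; not an existing type in the service.
interface AuditDefinition {
  name?: string;
  url: string;
  authenticated?: boolean;
  screenSize?: "mobile" | "desktop";
}

// Use the stored name as the audit id when present; otherwise fall back to the URL.
function auditKey(def: AuditDefinition): string {
  return def.name !== undefined && def.name.length > 0 ? def.name : def.url;
}

const definitions: AuditDefinition[] = [
  {
    name: "authenticated-mobile-homepage",
    url: "https://some-site.com",
    authenticated: true,
    screenSize: "mobile",
  },
  // No name given, so the URL becomes the id.
  { url: "https://some-site.com/list/foo" },
];

const auditKeys = definitions.map(auditKey);
```

Under this assumption, existing URL-only audits keep working while named audits can be re-run or scheduled by their name.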

Get single audit by id as JSON

GET /v1/audits/$AUDIT_ID

Content-Type: application/json

Request to publish the Spotify Docker image with Lighthouse updated to 9.0 as per upstream

LHError: PROTOCOL_TIMEOUT

Every time I try to run an audit I get the following error:
LHError: PROTOCOL_TIMEOUT

I'm not sure what to do about this. Any help would be appreciated.

Reduce docker image size using multi-stage

The current Docker image size is about 670 MB.

Can it be reduced using a multi-stage Docker build?

OpenAPI/Swagger API documentation

We need Swagger and/or OpenAPI documentation for the endpoints in lighthouse-audit-service. It would be preferable to generate these somehow, considering we already have TypeScript types annotating the expected requests and responses.

Lots of vulnerabilities for image v1.0.2

Dear maintainers,

We have just run a vulnerability scan on the latest (v1.0.2) lighthouse-audit-service image, and it reports lots of vulnerabilities.

I believe many of them are caused by using the node:12 base image.

Could you please harden the image so that it is not a threat to use in our and other environments? For example, see https://snyk.io/blog/10-best-practices-to-containerize-nodejs-web-applications-with-docker/

Or what do you suggest?

Found some outdated NPM packages in package.json

I have found some old packages in package.json. Can I work on this and update them?

Filter audit list by URL prefix

GET /v1/audits?limit=N&offset=M&urlPrefix=https://www.spotify.com
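
A client building this request would need to percent-encode the prefix. The sketch below assumes only the three parameter names from the request line above; `AuditListQuery` and `buildAuditListPath` are hypothetical helper names:

```typescript
// Hypothetical helper; the parameter names come from the request line above.
interface AuditListQuery {
  limit: number;
  offset: number;
  urlPrefix?: string;
}

function buildAuditListPath(query: AuditListQuery): string {
  const parts = [`limit=${query.limit}`, `offset=${query.offset}`];
  if (query.urlPrefix !== undefined) {
    // Percent-encode the prefix so characters such as ":" and "/" survive in a query string.
    parts.push(`urlPrefix=${encodeURIComponent(query.urlPrefix)}`);
  }
  return `/v1/audits?${parts.join("&")}`;
}

const listPath = buildAuditListPath({
  limit: 25,
  offset: 0,
  urlPrefix: "https://www.spotify.com",
});
// listPath is "/v1/audits?limit=25&offset=0&urlPrefix=https%3A%2F%2Fwww.spotify.com"
```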

Is this project dead?

Is it still being maintained? We are thinking about adopting lighthouse plugin in Backstage and wonder whether it is the best direction forward.

LHR summary on ListItem

Currently, we don't pass any LHR data in the Audit List. We could provide things like the category scores, which would be pretty useful. It would also be cool to be able to filter on those (e.g. show me audits with SEO scores less than 0.7 from the past week).
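
As a rough illustration of that filter, here is a sketch over an assumed list-item shape. `AuditSummary`, its field names, and `filterByScore` are hypothetical; the only thing taken from Lighthouse is that category scores are numbers between 0 and 1, or null:

```typescript
// Hypothetical list-item shape; only a creation time and per-category scores are assumed.
interface AuditSummary {
  id: string;
  timeCreated: Date;
  categories: Record<string, { score: number | null }>;
}

// Keep audits created at or after `since` whose score in `category` is below `threshold`.
function filterByScore(
  items: AuditSummary[],
  category: string,
  threshold: number,
  since: Date
): AuditSummary[] {
  return items.filter((item) => {
    const entry = item.categories[category];
    if (entry === undefined || entry.score === null) return false;
    return item.timeCreated.getTime() >= since.getTime() && entry.score < threshold;
  });
}

const referenceTime = new Date("2020-06-08T00:00:00Z");
const weekAgo = new Date(referenceTime.getTime() - 7 * 24 * 60 * 60 * 1000);

const sampleAudits: AuditSummary[] = [
  { id: "a", timeCreated: new Date("2020-06-07T00:00:00Z"), categories: { seo: { score: 0.65 } } },
  { id: "b", timeCreated: new Date("2020-06-07T00:00:00Z"), categories: { seo: { score: 0.9 } } },
  { id: "c", timeCreated: new Date("2020-05-01T00:00:00Z"), categories: { seo: { score: 0.5 } } },
];

// Only audit "a" is both recent and below the threshold.
const lowSeo = filterByScore(sampleAudits, "seo", 0.7, weekAgo);
```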

Reconcile with Lighthouse CI

https://github.com/GoogleChrome/lighthouse-ci

Lighthouse CI is a very similar effort which has the support of Google, so we ought to figure out where the overlap is and either fold our work into that or make the differentiation clear.

Support pagination on API responses

Routes which return list-based results do not do any pagination, leading to out-of-memory issues for websites with a large audit history.

Pagination should be implemented on all API routes which return a list of items, with a sensible default (to integrate gracefully with the existing Backstage UI plugin).

My use case:
We automate Lighthouse audits for our website three times a day (as we deploy multiple times per day). However, the array has become so large that the lighthouse-audit-service pod is crashing because it doesn't have enough memory to handle the response from /v1/audits/:auditId/website

I propose that pagination be added to every API route which returns an array, with a default limit of 25 items per page.
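
A sketch of how raw query-string values might be normalized into paging parameters with the proposed default of 25. The `parsePaging` helper and the `MAX_LIMIT` cap are assumptions for illustration, not part of the service:

```typescript
const DEFAULT_LIMIT = 25; // default proposed in the issue
const MAX_LIMIT = 100; // hypothetical upper bound, not taken from the issue

// Turn raw query-string values into safe paging parameters.
function parsePaging(rawLimit?: string, rawOffset?: string): { limit: number; offset: number } {
  const limit = Number.parseInt(rawLimit ?? "", 10);
  const offset = Number.parseInt(rawOffset ?? "", 10);
  return {
    // Missing, non-numeric, or non-positive limits fall back to the default.
    limit: Number.isNaN(limit) || limit < 1 ? DEFAULT_LIMIT : Math.min(limit, MAX_LIMIT),
    // Missing, non-numeric, or negative offsets fall back to zero.
    offset: Number.isNaN(offset) || offset < 0 ? 0 : offset,
  };
}

const defaultPaging = parsePaging();
const cappedPaging = parsePaging("500", "50");
const invalidPaging = parsePaging("abc", "-3");
```

To address the memory problem, the resulting values would need to be applied in the database query (for example as SQL LIMIT and OFFSET) rather than by slicing a fully loaded array.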

Put coverage guardrails in place to maintain high coverage

Document library/node usage

[Security] SSRF via the ExtraHeaders parameter allows dangerous internal HTTP calls

Describe the bug
Hi team,
We found that lighthouse-audit-service, which is used by Backstage as a plugin, can be used to send HTTP requests to arbitrary URLs.
Auditing websites is what Lighthouse is for, but this is dangerous because it can be used to send HTTP requests to the internal network, including calls to the GCP metadata server to obtain sensitive information such as OAuth tokens.

To Reproduce

Prepare a server to be audited that redirects to the desired internal endpoint.
[Sample redirect handler not shown]

Send an audit request and add the additional ExtraHeaders parameter, so that the extra header is included every time lighthouse-audit-service sends an HTTP request.
[Image not shown]
When the audit is done, the response of the internal HTTP call can be fetched from the final-screenshot variable.

GCP and other cloud providers protect against SSRF by validating a request header, but since lighthouse-audit-service allows the ExtraHeaders parameter, an attacker can add any header they want.

The README.md mentions that this project was built with Backstage in mind, so we first reported it to Backstage; after discussion, the Backstage team referred us to spotify/lighthouse-audit-service.

Thank you
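
One possible partial mitigation is to strip the headers that cloud metadata services require before passing extra headers to Lighthouse. This is a sketch only: `sanitizeExtraHeaders` is a hypothetical helper, the denylist is illustrative and incomplete, and it does not stop redirects to internal addresses by itself:

```typescript
// Header names that cloud metadata services expect on requests (lower-cased for comparison):
// "Metadata-Flavor" (GCP), "Metadata" (Azure), "X-aws-ec2-metadata-token" (AWS IMDSv2).
// Illustrative denylist, not an exhaustive one.
const BLOCKED_HEADERS: string[] = ["metadata-flavor", "metadata", "x-aws-ec2-metadata-token"];

// Hypothetical helper: return a copy of the caller-supplied headers without blocked names.
function sanitizeExtraHeaders(extraHeaders: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const key of Object.keys(extraHeaders)) {
    if (BLOCKED_HEADERS.indexOf(key.toLowerCase()) === -1) {
      safe[key] = extraHeaders[key];
    }
  }
  return safe;
}

const cleanedHeaders = sanitizeExtraHeaders({
  "Metadata-Flavor": "Google",
  Cookie: "session=abc",
});
// cleanedHeaders keeps only the Cookie header.
```

A denylist like this is easy to bypass as new header names appear, so restricting which destination hosts may be audited would likely be needed as well.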

Running several audits in parallel results in FAILED audit

Hi there!

I am trying to set up the Lighthouse service for our internal project; we are planning to run audits constantly for different URLs. I am using the official image:

services:
  lh2:
    image: spotify/lighthouse-audit-service:latest
    environment:
      LAS_PORT: 1234
      PGUSER: user
      PGHOST: db
      PGPASSWORD: password
      PGDATABASE: dbname
    ports:
      - 1234:1234

When I do several POST requests with the following payload:

{
    "url": "https://www.google.com/",
    "options":
    {
        "awaitAuditCompleted": true,
        "chromePort": <randomPortNumber>
    }
}

I only ever get results for the first request. The rest fail with the following error message in the logs:
error: failed while running lighthouse audit.

Is this normal, or am I doing something wrong?

Thanks, Ivan

Reconcile if there

View audits, paginated

GET /v1/audits?limit=N&offset=M

Create publishing script based on semantic-release

Document setting up the server

Investigate future Lighthouse 6 incompatibilities

An alpha release of Lighthouse 6 was released on March 11, so we should start looking into what the situation will be if we end up straddling between Lighthouse 5.6 and Lighthouse 6 LHRs. Also, TypeScript seems to be coming to Lighthouse, which is important to look into in terms of whether we can cut our custom types loose.
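
As a starting point for that investigation, stored reports could be partitioned by Lighthouse major version. The sketch assumes only the `lighthouseVersion` string on each LHR; the helper names and the sample version strings are illustrative:

```typescript
// Only the `lighthouseVersion` field of the LHR is assumed here.
interface VersionedLhr {
  lighthouseVersion: string;
}

// parseInt reads the leading integer, so "5.6.0" yields 5.
function lighthouseMajor(lhr: VersionedLhr): number {
  const major = Number.parseInt(lhr.lighthouseVersion, 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognized Lighthouse version: ${lhr.lighthouseVersion}`);
  }
  return major;
}

// Count how many stored reports exist per major version.
function countByMajor(lhrs: VersionedLhr[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const lhr of lhrs) {
    const major = lighthouseMajor(lhr);
    counts[major] = (counts[major] ?? 0) + 1;
  }
  return counts;
}

// Sample version strings for illustration only.
const versionCounts = countByMajor([
  { lighthouseVersion: "5.6.0" },
  { lighthouseVersion: "6.0.0-alpha.0" },
  { lighthouseVersion: "5.6.0" },
]);
```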

I set up Lighthouse through Docker Compose, and when I navigate to localhost it shows "Cannot GET"

Export 'persistAudit' from audit api module to support custom lighthouse scripts

Hello, I've got a custom lighthouse script which runs outside of the audit service, however I would like to have it reported / tracked by the audit service.

To do this I'm installing this package as a dependency of my lighthouse script, but I need to make use of the 'persistAudit' function from the audit api module: https://github.com/spotify/lighthouse-audit-service/blob/master/src/api/audits/index.ts

Seeing as the other functions are exported, can this be added to the export list? It would allow me to upload the results of an independent lighthouse run to the lighthouse audit service database... Here is my relevant code snippet:

const lhAudit = require('@spotify/lighthouse-audit-service');
const uuid = require('uuid');
const pg = require("pg");
const lighthouse = require('lighthouse');

/* REDACTED */

// A custom run of lighthouse using the official lighthouse package
const result = await lighthouse(url, flags, config);

if (process.env.USE_DB === "true") {
    // Report to lhAuditService
    const audit = lhAudit.Audit.build({
        id: uuid.v4(),
        url: result?.lhr.requestedUrl,
        timeCreated: new Date(result?.lhr.fetchTime),
        timeCompleted: nowDate,
        report: result?.lhr
    });
    await lhAudit.persistAudit(        // This is what I want to do
        new pg.Pool({
            host: process.env.DB_HOST,
            port: process.env.DB_PORT,
            database: process.env.DB_NAME,
            user: process.env.DB_USERNAME,
            password: process.env.DB_PASSWORD
        }),
        audit
    );
}

As a workaround I've copied the source for persistAudit to this script.

Run a Lighthouse audit and store the result in the db

POST /v1/audits
{ "url": $URL }

WebSocket api for running audits

Hey guys! It would be pretty nice to have a WebSocket API for Lighthouse audits. Right now the list of audits can display status, but it doesn't update until the page is reloaded, so I'd like to get the actual status of current audits in real time. If you are looking for contributions I can handle it, but please let me know whether it's a good idea.

Integration with Backstage catalog service

Problem

At Spotify, we need the Lighthouse audits to tightly integrate with Backstage's catalog service, particularly when used with the Backstage plugin.

As a developer viewing a website component, I want to see all audits run for my website.
As a developer viewing a website component, I want to run new audits for my website.

Proposed Solution

Note: This solution could be a path forward for many of the services we open source which have a hard dependency on the currently internal catalog service @ Spotify.

I would propose that lighthouse-audit-service continue to work without a catalog id, and instead add a Kubernetes-like metadata field to every entry. This could be a JSON field in the Postgres DB, and you could look up items by their metadata. We could support nuanced operators to allow for startsWith and endsWith lookups.

# obviously, this would be url encoded; not encoding for the sake of readability
http://lighthouse/v1/audits?metadata={"foo": { "op": "=", "value": "bar" }}
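
A sketch of how such a lookup could be encoded and evaluated. The `=`, `startsWith`, and `endsWith` operators come from the proposal above; the type names, `buildMetadataUrl`, and `matchesMetadata` are hypothetical, and a real implementation would more likely push the matching into a Postgres JSON query:

```typescript
type MetadataOp = "=" | "startsWith" | "endsWith";

interface MetadataCondition {
  op: MetadataOp;
  value: string;
}

type MetadataQuery = Record<string, MetadataCondition>;

// Serialize the query as URL-encoded JSON in the `metadata` parameter.
function buildMetadataUrl(base: string, query: MetadataQuery): string {
  return `${base}/v1/audits?metadata=${encodeURIComponent(JSON.stringify(query))}`;
}

// In-memory evaluation: every condition in the query must hold for the entry's metadata.
function matchesMetadata(metadata: Record<string, string>, query: MetadataQuery): boolean {
  return Object.keys(query).every((key) => {
    const actual = metadata[key];
    const cond = query[key];
    if (actual === undefined || cond === undefined) return false;
    switch (cond.op) {
      case "=":
        return actual === cond.value;
      case "startsWith":
        return actual.startsWith(cond.value);
      case "endsWith":
        return actual.endsWith(cond.value);
      default:
        return false;
    }
  });
}

const metadataUrl = buildMetadataUrl("http://lighthouse", { foo: { op: "=", value: "bar" } });
const prefixMatch = matchesMetadata({ foo: "bar" }, { foo: { op: "startsWith", value: "ba" } });
```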

Then, when Lighthouse audits are created via Backstage, we would create them with the catalog entry's id as metadata.

In the Backstage plugin, we'd still have the top-level view, but we would also add component-level and things-I-own-level views for viewing audits and their trends.

Trigger new audit

POST /v1/audits
{ "url": $URL }

Get single audit by id as HTML

GET /v1/audits/$AUDIT_ID

Content-Type: text/html

Delete audit by id

DELETE /v1/audits/$AUDIT_ID

Create `website` list and get api

{
  "url": String,
  "time_last_audited": Date, // latest time_created
  "audits": {
    "items": [{
      "id": String,
      "iframe_url": String, // url that can be used in the iframe
      "url": String,
      "status": Status,
      "time_created": Date,
      "time_completed": Date?,
      "categories": LHR.Categories?, // categories with their scores
      "report": LHR
    }],
    "total": Int,
    "limit": Int,
    "offset": Int
  }
}

List is the same, with report omitted.
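
The relationship between the two responses can be sketched in TypeScript. The field names follow the shape above; the `AuditStatus` values other than FAILED are assumptions, and `report` is typed loosely and made optional here only for the sake of the example:

```typescript
// FAILED appears elsewhere in these issues; the other status values are assumptions.
type AuditStatus = "RUNNING" | "COMPLETED" | "FAILED";

interface WebsiteAuditItem {
  id: string;
  iframe_url: string;
  url: string;
  status: AuditStatus;
  time_created: string;
  time_completed?: string;
  report?: unknown; // the full LHR in the "get" response
}

// The list response uses the same item shape with `report` left out.
type WebsiteAuditListItem = Omit<WebsiteAuditItem, "report">;

function toListItem(item: WebsiteAuditItem): WebsiteAuditListItem {
  // Destructure `report` away and keep the remaining fields.
  const { report, ...rest } = item;
  return rest;
}

const fullItem: WebsiteAuditItem = {
  id: "a1",
  iframe_url: "/v1/audits/a1",
  url: "https://some-site.com",
  status: "COMPLETED",
  time_created: "2020-06-07T00:00:00Z",
  time_completed: "2020-06-07T00:01:00Z",
  report: { lighthouseVersion: "5.6.0" },
};

const listItem = toListItem(fullItem);
```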
