spotify / lighthouse-audit-service
License: Apache License 2.0
Audits are currently grouped most directly by "website." This is useful to some extent, but inside Spotify audits are run more like named entities, something like:
- name: authenticated-mobile-homepage
  url: https://some-site.com
  authenticated: true
  screen-size: mobile
- name: unauthenticated-desktop-list-view
  url: https://some-site.com/list/foo
  authenticated: false
  screen-size: desktop
We should support this type of setup more directly, potentially by storing an audit id with the ability to re-run and schedule that audit. Anything without an existing audit id could probably use its URL as the id, and then we could deprecate the website grouping or steer people away from it.
GET /v1/audits/$AUDIT_ID
Content-Type: application/json
The current Docker image size is about 670MB.
Can it be reduced using a multi-stage Docker build?
We need Swagger and/or OpenAPI documentation for the endpoints in lighthouse-audit-service. It would be preferable to generate it, since we already have TypeScript types annotating the expected requests and responses.
Hello,
We have just run a vulnerability scan on the latest (v1.0.2) lighthouse-audit-service image and it reports a lot of vulnerabilities.
I believe many of them are caused by the node:12 base image.
Could you please harden the image so that it is not a threat to use in our and other environments? For example, see https://snyk.io/blog/10-best-practices-to-containerize-nodejs-web-applications-with-docker/
Or what do you suggest?
I have found some old packages in package.json. Can I work on this and update them?
GET /v1/audits?limit=N&offset=M&urlPrefix=https://www.spotify.com
Is this project still being maintained? We are thinking about adopting the Lighthouse plugin in Backstage and wonder whether it is the best direction forward.
Currently, we don't pass any LHR data in the Audit List. We could provide things like the category scores, which would be pretty useful. It would also be cool to be able to filter on those (for example, show me audits with SEO scores below 0.7 from the past week).
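The filter described above could look roughly like the following sketch. The `AuditSummary` shape and the `auditsBelowScore` name are hypothetical and are not part of lighthouse-audit-service; the only thing taken from Lighthouse itself is that each category carries a numeric (or null) `score`.

```typescript
// Hypothetical list-item shape: an audit plus its LHR category scores.
interface AuditSummary {
  id: string;
  url: string;
  timeCreated: Date;
  categories?: Record<string, { score: number | null }>;
}

// Return audits created at or after `since` whose score for `category`
// is strictly below `threshold`. Audits without a numeric score are skipped.
function auditsBelowScore(
  audits: AuditSummary[],
  category: string,
  threshold: number,
  since: Date,
): AuditSummary[] {
  return audits.filter((audit) => {
    const score = audit.categories?.[category]?.score;
    return (
      typeof score === "number" &&
      score < threshold &&
      audit.timeCreated.getTime() >= since.getTime()
    );
  });
}
```

In a real implementation the same condition would more likely be expressed in the database query, so that only matching rows are loaded.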
https://github.com/GoogleChrome/lighthouse-ci
Lighthouse CI is a very similar effort which has the support of Google, so we ought to figure out where the overlap is and either fold our work into that or make the differentiation clear.
Routes which return list-based results do not do any pagination, leading to out-of-memory issues for websites with a large audit history.
Pagination should be implemented on all API routes which return a list of items, with a sensible default (to integrate gracefully with the existing Backstage UI plugin).
My use case:
We automate Lighthouse audits for our website 3 times a day (as we deploy multiple times per day). However, the array has become so large that the lighthouse-audit-service pod crashes because it doesn't have enough memory to handle the response from /v1/audits/:auditId/website.
I propose adding pagination to every API route which returns an array, with a default limit of 25 items per page.
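As a rough illustration of the proposal, here is a minimal sketch of limit/offset handling with a default of 25. The names `paginate`, `Page`, and `parseNonNegativeInt` are hypothetical, and the upper bound of 100 is an assumption that does not come from the issue.

```typescript
// Hypothetical response envelope for any list route.
interface Page<T> {
  items: T[];
  total: number;
  limit: number;
  offset: number;
}

const DEFAULT_LIMIT = 25; // default proposed in the issue
const MAX_LIMIT = 100; // assumed cap, not from the issue

// Parse a query-string value into a non-negative integer, or use the fallback.
function parseNonNegativeInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isNaN(n) || n < 0 ? fallback : n;
}

// Apply limit/offset to a list and report the paging values that were used.
function paginate<T>(all: T[], rawLimit?: string, rawOffset?: string): Page<T> {
  const limit = Math.min(parseNonNegativeInt(rawLimit, DEFAULT_LIMIT), MAX_LIMIT);
  const offset = parseNonNegativeInt(rawOffset, 0);
  return {
    items: all.slice(offset, offset + limit),
    total: all.length,
    limit,
    offset,
  };
}
```

The in-memory slice only illustrates the response shape. To address the memory problem described here, the same two numbers would presumably be passed to the database query as LIMIT and OFFSET so that the full list is never loaded.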
Describe the bug
Hi team,
We found that lighthouse-audit-service, which is used by Backstage as a plugin, can be used to send HTTP requests to arbitrary URLs.
Yes, Lighthouse is meant to audit websites, but this is dangerous because it can be used to send HTTP requests to the internal network, including calls to the GCP metadata server to obtain sensitive information such as OAuth tokens.
To Reproduce
GCP and other cloud providers protect against SSRF by validating request headers, but since lighthouse-audit-service accepts an ExtraHeaders parameter, an attacker can add any header they want.
As mentioned in README.md, this project was built with Backstage in mind, so we reported it to Backstage, but after discussion the Backstage team referred us to spotify/lighthouse-audit-service.
Thank you
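One possible mitigation for the report above is to validate the audit URL before handing it to Lighthouse. The sketch below is an assumption about what such a guard might look like; `isAuditUrlAllowed` and the other names are hypothetical and do not exist in lighthouse-audit-service. It only inspects the URL text, so it does not cover DNS names that resolve to internal addresses, redirects, or the extra-headers problem.

```typescript
// Hypothetical blocklist of well-known internal hostnames.
const BLOCKED_HOSTNAMES = new Set(["localhost", "metadata.google.internal"]);

// Return the four octets if `host` is a dotted-quad IPv4 literal, else null.
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// Loopback, private (RFC 1918), "this network", and link-local ranges.
function isPrivateIPv4([a, b]: number[]): boolean {
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254) // includes the 169.254.169.254 metadata address
  );
}

function isAuditUrlAllowed(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (BLOCKED_HOSTNAMES.has(host)) return false;
  if (host.startsWith("[")) return false; // IPv6 literals rejected outright in this sketch
  const octets = parseIPv4(host);
  return octets === null || !isPrivateIPv4(octets);
}
```

A stricter variant would flip this into an allowlist of hostnames the service is permitted to audit, and would also restrict which extra headers callers may set.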
Hi there!
I am trying to set up the Lighthouse service for our internal project; we are planning to run audits for different URLs constantly. I am using the official image:
services:
  lh2:
    image: spotify/lighthouse-audit-service:latest
    environment:
      LAS_PORT: 1234
      PGUSER: user
      PGHOST: db
      PGPASSWORD: password
      PGDATABASE: dbname
    ports:
      - 1234:1234
When I do several POST requests with the following payload:
{
  "url": "https://www.google.com/",
  "options": {
    "awaitAuditCompleted": true,
    "chromePort": <randomPortNumber>
  }
}
I only ever get results for the first successful response. The rest fail with the following error message in the logs:
error: failed while running lighthouse audit.
Is this normal, or am I doing something wrong?
Thanks, Ivan
GET /v1/audits?limit=N&offset=M
An alpha release of Lighthouse 6 was released on March 11, so we should start looking into what the situation will be if we end up straddling between Lighthouse 5.6 and Lighthouse 6 LHRs. Also, TypeScript seems to be coming to Lighthouse, which is important to look into in terms of whether we can cut our custom types loose.
Hello, I've got a custom lighthouse script which runs outside of the audit service, however I would like to have it reported / tracked by the audit service.
To do this I'm installing this package as a dependency of my lighthouse script, but I need to make use of the 'persistAudit' function from the audit api module: https://github.com/spotify/lighthouse-audit-service/blob/master/src/api/audits/index.ts
Seeing as the other functions are exported, can this be added to the export list? It would allow me to upload the results of an independent lighthouse run to the lighthouse audit service database... Here is my relevant code snippet:
const lhAudit = require('@spotify/lighthouse-audit-service');
const uuid = require('uuid');
const pg = require("pg");
const lighthouse = require('lighthouse');

/* REDACTED */

// A custom run of lighthouse using the official lighthouse package
const result = await lighthouse(url, flags, config);

if (process.env.USE_DB === "true") {
  // Report to lhAuditService
  const audit = lhAudit.Audit.build({
    id: uuid.v4(),
    url: result?.lhr.requestedUrl,
    timeCreated: new Date(result?.lhr.fetchTime),
    timeCompleted: nowDate,
    report: result?.lhr
  });

  await lhAudit.persistAudit( // This is what I want to do
    new pg.Pool({
      host: process.env.DB_HOST,
      port: process.env.DB_PORT,
      database: process.env.DB_NAME,
      user: process.env.DB_USERNAME,
      password: process.env.DB_PASSWORD
    }),
    audit
  );
}
As a workaround I've copied the source for persistAudit to this script.
POST /v1/audits
{ "url": $URL }
Hey guys! It would be pretty nice to have a WebSocket API for Lighthouse audits. Currently the list of audits displays status, but it doesn't update until the page is reloaded, so I would like to get the actual status of current audits in real time. If you are looking for contributions I can handle it, but please let me know whether it's a good idea.
At Spotify, we need the Lighthouse audits to tightly integrate with Backstage's catalog service, particularly when used with the Backstage plugin.
Note: This solution could be a path forward for many of the services we open source which have a hard dependency on the currently internal catalog service @ Spotify.
I would propose that lighthouse-audit-service continue to work without a catalog id and instead add a Kubernetes-like metadata field to every entry. This could be a JSON field in the Postgres DB, and you could look up items by their metadata. We could support nuanced operators to allow for startsWith and endsWith lookups.
# obviously, this would be url encoded; not encoding for the sake of readability
http://lighthouse/v1/audits?metadata={"foo": { "op": "=", "value": "bar" }}
Then, when Lighthouse audits are created via Backstage, we would create them with the catalog entry's id as metadata.
In the Backstage plugin, we'd still have the top-level view, but we would also add component-level and things-I-own-level views for viewing audits and their trends.
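To make the metadata lookup proposal concrete, here is a minimal sketch of building the URL-encoded query and of evaluating it against an entry's metadata. All names (`MetadataQuery`, `buildAuditsUrl`, `matchesMetadata`) are hypothetical, and the operator spellings other than "=" are assumptions based on the startsWith/endsWith idea in the proposal.

```typescript
// Hypothetical operator set for the proposed metadata filter.
type MetadataOp = "=" | "startsWith" | "endsWith";

interface MetadataCondition {
  op: MetadataOp;
  value: string;
}

type MetadataQuery = Record<string, MetadataCondition>;

// Build the list URL with the query JSON-encoded into the `metadata` parameter.
// URLSearchParams takes care of the URL encoding.
function buildAuditsUrl(base: string, query: MetadataQuery): string {
  const url = new URL("/v1/audits", base);
  url.searchParams.set("metadata", JSON.stringify(query));
  return url.toString();
}

// True when every condition in the query holds for the given metadata.
function matchesMetadata(metadata: Record<string, string>, query: MetadataQuery): boolean {
  return Object.entries(query).every(([key, { op, value }]) => {
    const actual = metadata[key];
    if (actual === undefined) return false;
    switch (op) {
      case "=":
        return actual === value;
      case "startsWith":
        return actual.startsWith(value);
      case "endsWith":
        return actual.endsWith(value);
      default:
        return false;
    }
  });
}
```

For example, `buildAuditsUrl("http://lighthouse", { foo: { op: "=", value: "bar" } })` produces the encoded form of the readable URL shown in the proposal. With a JSON column in Postgres, the matching would more plausibly be pushed into the SQL query than done in application code.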
POST /v1/audits
{ "url": $URL }
GET /v1/audits/$AUDIT_ID
Content-Type: text/html
DELETE /v1/audits/$AUDIT_ID
{
  "url": String,
  "time_last_audited": Date, // latest time_created
  "audits": {
    "items": [{
      "id": String,
      "iframe_url": String, // url that can be used in the iframe
      "url": String,
      "status": Status,
      "time_created": Date,
      "time_completed": Date?,
      "categories": LHR.Categories?, // categories with their scores
      "report": LHR
    }],
    "total": Int,
    "limit": Int,
    "offset": Int
  }
}
List is the same, with report omitted.
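The response shape above can be transcribed into TypeScript types, as in the sketch below. The type and function names are hypothetical. The source does not list the values of `Status` or define the LHR structure, so both are left as loose placeholders, and representing dates as strings on the wire is an assumption.

```typescript
type Status = string; // possible values are not listed in the source
type LHR = Record<string, unknown>; // stand-in for the Lighthouse result object
type LHRCategories = Record<string, unknown>; // stand-in for LHR.Categories

interface AuditItem {
  id: string;
  iframe_url: string; // url that can be used in the iframe
  url: string;
  status: Status;
  time_created: string; // assumed to be serialized as a date string
  time_completed?: string;
  categories?: LHRCategories; // categories with their scores
  report: LHR;
}

// "List is the same, with report omitted."
type AuditListItem = Omit<AuditItem, "report">;

interface Paginated<T> {
  items: T[];
  total: number;
  limit: number;
  offset: number;
}

interface WebsiteResponse {
  url: string;
  time_last_audited: string; // latest time_created
  audits: Paginated<AuditItem>;
}

// Drop the full report to produce the lighter list representation.
function toListItem(audit: AuditItem): AuditListItem {
  const { report: _omitted, ...rest } = audit;
  return rest;
}
```

Deriving the list item with `Omit` keeps the two shapes in step if fields are added to the full audit later.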