projectwallace / extract-css-core
Extract all CSS from a given url, both server side and client side rendered.
Home Page: https://www.projectwallace.com/get-css
License: MIT License
At the moment, browserOverride is a complex set of if-statements and a combination of fields, but it should really accept a fully configured browser instance.
Example:
const browser = await puppeteer.launch(browserOptions) // launch() returns a Promise
const css = await extractCss(url, {browserOverride: browser})
We should build in some sanity checks that the browser at least has the correct interface, i.e. the following methods:
newPage()
close()
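A minimal sketch of such a sanity check, assuming it runs before the browser is used. The helper name assertBrowserInterface and the error messages are hypothetical and not part of extract-css-core:

```javascript
// Hypothetical helper: the name and messages are illustrative only.
function assertBrowserInterface(browser) {
  // The two methods named in this issue
  const requiredMethods = ['newPage', 'close']

  if (browser === null || typeof browser !== 'object') {
    throw new TypeError('browserOverride must be a browser instance')
  }

  // Collect every required method that is absent or not callable
  const missing = requiredMethods.filter(
    method => typeof browser[method] !== 'function'
  )

  if (missing.length > 0) {
    throw new TypeError(
      `browserOverride is missing required method(s): ${missing.join(', ')}`
    )
  }

  return browser
}
```

A check like this would also reject an un-awaited puppeteer.launch() Promise, because a Promise has neither newPage() nor close().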
Sometimes it's pretty interesting to know where styles may have come from. Some possible options:
<link rel="stylesheet"> in HTML
<link rel="stylesheet"> generated by JS
<style> in HTML
<style> generated by JS
<div style=""> in HTML
<div style=""> generated by JS
element.style.color = 'red' in JS
myStyle.insertRule('#blanc { color: white }', 0); (CSSStyleSheet.insertRule())
@import rules for most of the above mentioned cases
This is a regression in 3.0.0:
Any minified CSS on the page is now returned un-minified by this package. The CSS should stay minified, because otherwise a CSS analyzer would treat rgb(0, 0, 0) differently than rgb(0,0,0).
const css = await extractCss('my-url', {
  inlineStyles: 'include' // or 'exclude'
})
The naming could be better, though.
await extract('https://cdn.jsdelivr.net/npm/tailwindcss/dist/tailwind.min.css')
Actual result: ''
Expected result: Lots of CSS
CSS is inlined in <head> with a <style> tag. Extract-css-core reports the whole thing twice, but with different formatting for colors: the first time it's #3d515b (as authored), the second time it's rgb(61, 81, 91) (see Project Wallace commit). https://projectwallace.com/get-css also sees 1 <link>-tag or @import and 1 <style>-tag or CSS-in-JS.
Puppeteer v3+ only supports Node v10+, making this a breaking change.
I could imagine that in some cases it's not interesting to get the inline styles from a page.
const cssWithoutInlineStyles = await extractCss('test.url', {
includeInlineStyles: false,
includeJsStylesheetsApi: true,
includeLinks: true,
includeStyleTags: true
})
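One way these flags could be applied is sketched below, assuming the extracted CSS is available as a list of items that each carry a type field (as in the { type: 'inline' } / { type: 'style' } report format proposed in another issue). The helper filterByOrigin, the OPTION_BY_TYPE mapping, and the 'link' and 'js-api' type names are hypothetical:

```javascript
// Hypothetical mapping from an item's `type` to the proposed option name.
// Neither the type names nor this helper exist in extract-css-core.
const OPTION_BY_TYPE = {
  inline: 'includeInlineStyles',
  'js-api': 'includeJsStylesheetsApi',
  link: 'includeLinks',
  style: 'includeStyleTags'
}

function filterByOrigin(items, options = {}) {
  return items.filter(item => {
    const option = OPTION_BY_TYPE[item.type]
    // Unknown types and unset options default to "include";
    // only an explicit `false` excludes an origin.
    return option === undefined || options[option] !== false
  })
}
```

Defaulting every flag to "include" would keep the current behaviour for callers that pass no options.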
I want to override the user agent so I can tell a website that it's my custom project crawling them, not just Chrome.
Example implementation from Puppeteer docs:
page.setUserAgent(userAgent)
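A rough sketch of how this could be wired in, assuming the user agent is passed as an option when a page is opened. The helper openPage and the userAgent option name are hypothetical; only browser.newPage() and page.setUserAgent() are Puppeteer calls:

```javascript
// Hypothetical helper; `userAgent` as an option name is an assumption.
async function openPage(browser, {userAgent} = {}) {
  const page = await browser.newPage()

  // Only override the UA when the caller explicitly provides one,
  // so the browser default is used otherwise.
  if (typeof userAgent === 'string' && userAgent.length > 0) {
    await page.setUserAgent(userAgent)
  }

  return page
}
```

The user agent is set before any navigation, so it would apply to the first request the page makes.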
I haven't found a way yet to extract CSS generated by styled-components (and probably others). Any help would be much appreciated.
Add support for Web Components, both open and closed. Maybe it already works, but at least it should be covered by tests.
Example: https://css-tricks.com/encapsulating-style-and-structure-with-shadow-dom/
https://travis-ci.org/bartveneman/extract-css-core/builds/625868162
✔ it rejects on an invalid url (1.4s)
✔ it finds JS generated <style /> CSS (1.5s)
✖ it finds css-in-js, like Styled Components
✔ it combines server generated <link> and <style> tags with client side created <link> and <style> tags (2.5s)
✔ it fetches css from a page with CSS in server generated <style> inside the <head> (2.6s)
✔ it finds JS generated <link /> CSS (2.6s)
✔ it rejects if the url has an HTTP error status (2.6s)
✔ it fetches css from a page with CSS in a server generated <link> inside the <head> (2.6s)
106: t.is(actual, expected)
Difference:
- 'html { color: rgb(255, 0, 0); }'
+ 'html { color: rgb(255, 0, 0); }.hJHBhT { color: blue; font-family: sans-serif; font-size: 3em; }'
It should be fairly easy to find inline styles, and it could certainly be very interesting to see their results, but how should they be reported? Inline styles don't have selectors, but merely declarations.
Scraping inline styles from a page:
[...document.querySelectorAll('[style]')].map(el => el.getAttribute('style')).join('')
We could generate a unique selector for each inline style attribute, but it could interfere with the resulting CSS statistics:
const nanoid = require('nanoid'); // nanoid v2 default export; v3+ needs `const {nanoid} = require('nanoid')`
[...document.querySelectorAll('[style]')].map(el => {
  // Create a custom [x-inline-style-*] selector that wraps all declarations
  return `[x-inline-style-${nanoid()}] {
    ${el.getAttribute('style')}
  }`
})
Report where in the DOM the <style> and <x style="..."> were found. This breadcrumb could be generated by looking at the target DOM node, traversing up the tree (while (node.parentNode)) and generating a selector for each node from its nodeName, className and ID.
[
{
href: undefined,
breadcrumb: ['html', 'body', 'thing', 'p'],
type: 'inline',
css: '[x-inline-style] { color: red; }'
},
{
href: undefined,
breadcrumb: ['html', 'head', 'style'],
type: 'style',
css: 'p { }'
}
]
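The breadcrumb traversal described above could be sketched as follows. The helper names describeNode and getBreadcrumb and the exact selector format (nodeName#id.class) are assumptions for illustration; the code relies only on the standard nodeName, id, className and parentNode DOM properties:

```javascript
// Hypothetical helpers: names and the selector format are illustrative.
function describeNode(node) {
  let selector = node.nodeName.toLowerCase()

  if (node.id) {
    selector += `#${node.id}`
  }

  // className is not a plain string on SVG elements, so guard the type
  if (typeof node.className === 'string' && node.className.trim() !== '') {
    selector += '.' + node.className.trim().split(/\s+/).join('.')
  }

  return selector
}

function getBreadcrumb(node) {
  const breadcrumb = []

  // Walk up until the document node, which has no parentNode
  while (node && node.parentNode) {
    breadcrumb.unshift(describeNode(node))
    node = node.parentNode
  }

  return breadcrumb
}
```

In a browser this would be called with the <style> element or the element carrying the style attribute, for example getBreadcrumb(document.querySelector('[style]')).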
To use this module inside an AWS Lambda, I must be able to provide puppeteer-core and aws-lambda-chrome to make it run smoothly.