projectwallace / extract-css-core

Extract all CSS from a given url, both server side and client side rendered.

Home Page: https://www.projectwallace.com/get-css

License: MIT License

Language: JavaScript 100.00%

Topics: scrape, css, extract, wallace, extract-css, js-styling, inline-styling

extract-css-core's People

Contributors

bartveneman, dependabot[bot]

extract-css-core's Issues

browserOverride should accept a browser instance instead of a brittle config object

Currently, browserOverride is handled by a complex set of if-statements over a combination of fields. It should instead accept a fully configured browser instance.

Example:

// puppeteer.launch() returns a Promise, so it must be awaited
const browser = await puppeteer.launch(browserOptions)
const css = await extractCss(url, {browserOverride: browser})

We should build in sanity checks that the browser instance exposes at least the following methods:

  • newPage()
  • close()
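
A minimal sketch of such a sanity check, assuming a simple duck-typing approach. The helper name `assertBrowserInterface` and the error message are hypothetical and not part of extract-css-core:

```javascript
// Hypothetical helper: checks only that the required methods exist,
// not that they behave like a real Puppeteer browser.
const REQUIRED_BROWSER_METHODS = ['newPage', 'close']

function assertBrowserInterface(browser) {
  const missing = REQUIRED_BROWSER_METHODS.filter(
    method => !browser || typeof browser[method] !== 'function'
  )

  if (missing.length > 0) {
    throw new TypeError(
      `browserOverride is missing required method(s): ${missing.join(', ')}`
    )
  }

  return browser
}
```

A check like this would also reject a Promise that was passed in by accident (for example an un-awaited puppeteer.launch() call), because a Promise has neither newPage() nor close().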

Report where styles come from

Sometimes it's pretty interesting to know where styles may have come from. Some possible options:

  • <link rel="stylesheet"> in HTML
  • <link rel="stylesheet"> generated by JS
  • <style> in HTML
  • <style> generated by JS
  • <div style=""> in HTML
  • <div style=""> generated by JS
  • element.style.color = 'red' in JS
  • CSSStyleSheet.insertRule() in JS, e.g. myStyle.insertRule('#blanc { color: white }', 0)
  • @import rules for most of the above mentioned cases
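
As a rough illustration of how several of the sources listed above could be tagged, here is a sketch that runs in the page context. The function name `collectStyleOrigins` and the shape of the returned objects are assumptions. It does not tell server-rendered and JS-generated elements apart, and it does not follow @import rules:

```javascript
// Sketch only: not the package's API. `doc` is expected to be a Document.
function collectStyleOrigins(doc) {
  const origins = []

  // <link rel="stylesheet"> and <style> elements both show up in
  // document.styleSheets, whether they were in the HTML or created by JS.
  for (const sheet of Array.from(doc.styleSheets)) {
    let css = ''
    try {
      // cssRules also contains rules added via CSSStyleSheet.insertRule()
      css = Array.from(sheet.cssRules).map(rule => rule.cssText).join('')
    } catch (error) {
      // Reading cssRules of a cross-origin stylesheet throws a SecurityError
    }

    origins.push({
      type: sheet.href ? 'link' : 'style', // <style> sheets have no href
      href: sheet.href || undefined,
      css
    })
  }

  // Assigning to element.style in JS is reflected in the style attribute,
  // so both kinds of inline styles are found by the same selector.
  for (const element of Array.from(doc.querySelectorAll('[style]'))) {
    origins.push({
      type: 'inline',
      href: undefined,
      css: element.getAttribute('style')
    })
  }

  return origins
}
```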

Prior art

  • get-css does this, but not for JS-generated CSS, I think

CSS is no longer minified

This is a regression in 3.0.0:

Any minified CSS on the page is now returned un-minified by this package. The CSS should be returned as authored, because otherwise a CSS analyzer would treat rgb(0, 0, 0) and rgb(0,0,0) as two different values.
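
To illustrate why this matters, here is a small sketch in which a deliberately naive, hypothetical stand-in for an analyzer collects rgb() values as raw strings. Real analyzers work differently, but any string-based comparison has the same problem:

```javascript
// Naive stand-in for an analyzer: collect every rgb() value as a raw string
function uniqueRgbValues(css) {
  const matches = css.match(/rgb\([^)]*\)/g) || []
  return [...new Set(matches)]
}

// The same color used twice, as authored (minified)
const authored = 'a{color:rgb(0,0,0)}b{color:rgb(0,0,0)}'
// The same CSS, but with the first rule re-serialized with extra whitespace
const reserialized = 'a { color: rgb(0, 0, 0); }b{color:rgb(0,0,0)}'

uniqueRgbValues(authored).length // 1
uniqueRgbValues(reserialized).length // 2
```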

Add option to ignore inline styles/css-in-js/regular css

I could imagine that in some cases it's not interesting to get the inline styles from a page.

const cssWithoutInlineStyles = await extractCss('test.url', {
  includeInlineStyles: false,
  includeJsStylesheetsApi: true,
  includeLinks: true,
  includeStyleTags: true
})
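
One way these options could be applied internally, sketched under the assumption that every piece of extracted CSS is first tagged with a `type`. The type names ('inline', 'js-api', 'link', 'style'), the defaults, and the helper name `filterEntries` are all hypothetical:

```javascript
// Assumed defaults: everything is included unless explicitly turned off
const DEFAULT_OPTIONS = {
  includeInlineStyles: true,
  includeJsStylesheetsApi: true,
  includeLinks: true,
  includeStyleTags: true
}

// Assumed mapping from an entry's type to the option that controls it
const OPTION_FOR_TYPE = {
  inline: 'includeInlineStyles',
  'js-api': 'includeJsStylesheetsApi',
  link: 'includeLinks',
  style: 'includeStyleTags'
}

function filterEntries(entries, options = {}) {
  const settings = {...DEFAULT_OPTIONS, ...options}
  // Entries with an unknown type are kept, because no option disables them
  return entries.filter(entry => settings[OPTION_FOR_TYPE[entry.type]] !== false)
}
```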

Allow User-Agent string to be set

I want to override the specific user-agent so I can tell a website that it's not just Chrome crawling them, but my custom project UA.

Example implementation from Puppeteer docs:

page.setUserAgent(userAgent)
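
A minimal sketch of where that call could sit, assuming the user agent arrives as an optional `userAgent` setting. Both the option name and the helper `openPage` are hypothetical; `newPage()`, `setUserAgent()` and `goto()` are existing Puppeteer methods:

```javascript
// Hypothetical helper: opens a page, optionally with a custom User-Agent
async function openPage(browser, url, {userAgent} = {}) {
  const page = await browser.newPage()

  if (userAgent) {
    // Set before navigating so the very first request carries the custom UA
    await page.setUserAgent(userAgent)
  }

  await page.goto(url)
  return page
}
```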

Build for master branch fails

https://travis-ci.org/bartveneman/extract-css-core/builds/625868162

✔ it rejects on an invalid url (1.4s)
✔ it finds JS generated <style /> CSS (1.5s)
✖ it finds css-in-js, like Styled Components 
✔ it combines server generated <link> and <style> tags with client side created <link> and <style> tags (2.5s)
✔ it fetches css from a page with CSS in server generated <style> inside the <head> (2.6s)
✔ it finds JS generated <link /> CSS (2.6s)
✔ it rejects if the url has an HTTP error status (2.6s)
✔ it fetches css from a page with CSS in a server generated <link> inside the <head> (2.6s)

106:   t.is(actual, expected)

Difference:

  - 'html { color: rgb(255, 0, 0); }'
  + 'html { color: rgb(255, 0, 0); }.hJHBhT { color: blue; font-family: sans-serif; font-size: 3em; }'

How to deal with inline styles?

It should be fairly easy to find inline styles, and it could certainly be very interesting to analyze them, but how should they be reported? Inline styles have no selectors, only declarations.

Scraping inline styles from a page:

[...document.querySelectorAll('[style]')].map(el => el.getAttribute('style')).join('')

We could generate a unique selector for each inline style attribute, but it could interfere with the resulting CSS statistics:

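const nanoid = require('nanoid') // nanoid v2; v3+ uses: const {nanoid} = require('nanoid')
const rules = [...document.querySelectorAll('[style]')].map(el => {
  // Create a custom [x-inline-style-*] selector that wraps all declarations.
  // Comments stay outside the template literal, or they would end up in the CSS.
  return `[x-inline-style-${nanoid()}] {
    ${el.getAttribute('style')}
  }`
})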

Add element breadcrumbs for style tags and inline styles

Report where in the DOM the <style> and <x style="..."> were found. This breadcrumb could be generated by starting at the target DOM node, traversing up the tree (while (node.parentNode)), and generating a selector for each node from its nodeName, className and ID.

[
  {
    href: undefined,
    breadcrumb: ['html', 'body', 'thing', 'p'],
    type: 'inline',
    css: '[x-inline-style] { color: red; }'
  },
  {
    href: undefined,
    breadcrumb: ['html', 'head', 'style'],
    type: 'style',
    css: 'p { }'
  }
]
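
The traversal described above could be sketched as follows. The helper names `describeNode` and `getBreadcrumb` are hypothetical, and the `#id` / `.class` segment format is an assumption about how nodeName, className and ID might be combined:

```javascript
// Hypothetical helper: builds one selector-like segment for an element
function describeNode(node) {
  let segment = node.nodeName.toLowerCase()
  if (node.id) segment += `#${node.id}`
  // className is not a string on SVG elements, so check the type first
  if (typeof node.className === 'string' && node.className.trim() !== '') {
    segment += '.' + node.className.trim().split(/\s+/).join('.')
  }
  return segment
}

function getBreadcrumb(node) {
  const breadcrumb = []
  // nodeType 1 is an element; the loop stops at the document node (nodeType 9)
  for (let current = node; current && current.nodeType === 1; current = current.parentNode) {
    breadcrumb.unshift(describeNode(current))
  }
  return breadcrumb
}
```

In the page context this could be called as getBreadcrumb(element) for each <style> element or element with a style attribute.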
