corban-lee / scalorize

A web development tool that takes screenshots of a given website at various resolutions in order to help discover scaling problems.

License: MIT License

Python 16.41% JavaScript 35.22% HTML 25.30% CSS 14.31% SCSS 8.76%

Contributors: corban-lee

scalorize's Issues

Exception while scraping getbootstrap.com with firefox

Scraping https://getbootstrap.com with Firefox via geckodriver.
I had already partially scraped the site with the Edge driver.

127.0.0.1 - - [07/Jun/2023 09:56:43] "GET /output/getbootstrap.com/2.3.2/1920%20firefox.png HTTP/1.1" 200 -
Debugging middleware caught exception in streamed response at a point where response headers were already sent.
Traceback (most recent call last):
  File "d:\Projects\WebShot\venv\Lib\site-packages\werkzeug\wsgi.py", line 289, in __next__
    return self._next()
           ^^^^^^^^^^^^
  File "d:\Projects\WebShot\venv\Lib\site-packages\werkzeug\wrappers\response.py", line 32, in _iter_encoded
    for item in iterable:
  File "D:\Projects\WebShot/WebShot\scrape.py", line 108, in stream_screenshots_generator
    for screenshot in screenshots:
  File "D:\Projects\WebShot/WebShot\scrape.py", line 144, in process_url
    self.scrape_urls_to_queue(driver, url)
  File "D:\Projects\WebShot/WebShot\scrape.py", line 164, in scrape_urls_to_queue
    hrefs = [anchor_tag.get_attribute("href") for anchor_tag in self.get_anchor_tags(driver)]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\WebShot/WebShot\scrape.py", line 164, in <listcomp>
    hrefs = [anchor_tag.get_attribute("href") for anchor_tag in self.get_anchor_tags(driver)]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\Projects\WebShot\venv\Lib\site-packages\selenium\webdriver\remote\webelement.py", line 178, in get_attribute
    attribute_value = self.parent.execute_script(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\Projects\WebShot\venv\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 500, in execute_script
    return self.execute(command, {"script": script, "args": converted_args})["value"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\Projects\WebShot\venv\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 440, in execute
    self.error_handler.check_response(response)
  File "d:\Projects\WebShot\venv\Lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 245, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: The element with the reference 3a707de3-efd1-48a7-8cd2-ca4705722dfb is stale; either its node document is not the active document, or it is no longer connected to the DOM
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:183:5
StaleElementReferenceError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:495:5
element.getKnownElement@chrome://remote/content/marionette/element.sys.mjs:508:11
deserializeJSON@chrome://remote/content/marionette/json.sys.mjs:233:33
cloneObject/result<@chrome://remote/content/marionette/json.sys.mjs:50:52
cloneObject@chrome://remote/content/marionette/json.sys.mjs:50:25
deserializeJSON@chrome://remote/content/marionette/json.sys.mjs:244:16
cloneObject@chrome://remote/content/marionette/json.sys.mjs:56:24
deserializeJSON@chrome://remote/content/marionette/json.sys.mjs:244:16
json.deserialize@chrome://remote/content/marionette/json.sys.mjs:248:10
receiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:85:30
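
The traceback shows the exception being raised while `get_attribute("href")` is called on each anchor element in turn. If the page changes between locating the anchors and reading them, an element reference can go stale. One possible workaround, sketched below, is to read every href in a single `execute_script` call so that no element references are held on the Python side. `collect_hrefs` is a hypothetical helper, not a function from scrape.py, and whether it fits the existing queueing code is an assumption.

```python
def collect_hrefs(driver):
    """Return every anchor href on the current page as a list of strings.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver;
    only its `execute_script` method is used.
    """
    # The browser resolves and returns all hrefs in one round trip, so
    # Python never holds WebElement references that could go stale.
    script = (
        "return Array.from(document.querySelectorAll('a[href]'), "
        "function (a) { return a.href; });"
    )
    hrefs = driver.execute_script(script) or []
    # Drop anything that is not a non-empty string.
    return [href for href in hrefs if isinstance(href, str) and href]
```

An alternative would be to catch `StaleElementReferenceException` and re-query the anchors, at the cost of extra round trips.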

Add Settings

  • Always show files before folders
  • Resolutions (list of "1920x1080", "360x640", etc.)
  • Browser (chrome, firefox, edge, etc.)
  • Folders start as collapsed/expanded
  • Allow external domains (allow scraping domains outside of entered URL, may cause infinite loops, check this)
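
As a rough illustration of how the settings listed above might be modelled on the server side, here is a minimal sketch. Every field name and default below is an assumption made for the example, not something taken from the project.

```python
from dataclasses import dataclass, field


def parse_resolution(text):
    """Parse a 'WIDTHxHEIGHT' string such as '1920x1080' into (width, height)."""
    width, sep, height = text.lower().partition("x")
    if not sep or not width.strip().isdigit() or not height.strip().isdigit():
        raise ValueError(f"invalid resolution: {text!r}")
    return int(width), int(height)


@dataclass
class ScrapeSettings:
    # All field names and defaults are illustrative assumptions.
    files_before_folders: bool = True
    resolutions: list = field(default_factory=lambda: ["1920x1080", "360x640"])
    browser: str = "chrome"
    folders_collapsed: bool = False
    allow_external_domains: bool = False

    def resolution_sizes(self):
        """Return the configured resolutions as (width, height) tuples."""
        return [parse_resolution(r) for r in self.resolutions]
```

Validating resolution strings at the point they are entered keeps malformed values away from the browser driver.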

Themes Dropdown

Make the new themes dropdown functional.
The buttons should act like radio buttons, and the setting should be saved to local storage.

Permission Denied Exceptions

Investigate and correct these errors.
It might be trying to serve a file that doesn't exist.

[2023-06-20 21:51:43 +0100] [21876] [INFO] 127.0.0.1:55597 GET /output/getbootstrap.com/docs/4.5/getting-started/contents/ 1.1 500 - 13186
[2023-06-20 21:51:44 +0100] [21876] [ERROR] Error in ASGI Framework
Traceback (most recent call last):
  File "d:\Projects\WebShot\venv\Lib\site-packages\hypercorn\asyncio\task_group.py", line 23, in _handle
    await app(scope, receive, send, sync_spawn, call_soon)
  File "d:\Projects\WebShot\venv\Lib\site-packages\hypercorn\app_wrappers.py", line 33, in __call__
    await self.app(scope, receive, send)
  File "d:\Projects\WebShot\venv\Lib\site-packages\quart\app.py", line 1902, in __call__
    await self.asgi_app(scope, receive, send)
  File "d:\Projects\WebShot\venv\Lib\site-packages\quart\app.py", line 1928, in asgi_app
    await asgi_handler(receive, send)
  File "d:\Projects\WebShot\venv\Lib\site-packages\quart\asgi.py", line 51, in __call__
    _raise_exceptions(done)
  File "d:\Projects\WebShot\venv\Lib\site-packages\quart\asgi.py", line 353, in _raise_exceptions
    raise task.exception()
  File "d:\Projects\WebShot\venv\Lib\site-packages\quart\asgi.py", line 102, in handle_request
    await asyncio.wait_for(self._send_response(send, response), timeout=timeout)
  File "C:\Python311\Lib\asyncio\tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "d:\Projects\WebShot\venv\Lib\site-packages\quart\asgi.py", line 130, in _send_response
    async with response.response as response_body:
  File "d:\Projects\WebShot\venv\Lib\site-packages\quart\wrappers\response.py", line 149, in __aenter__
    self.file = await self.file_manager.__aenter__()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\Projects\WebShot\venv\Lib\site-packages\aiofiles\base.py", line 98, in __aenter__
    self._obj = await self._coro
                ^^^^^^^^^^^^^^^^
  File "d:\Projects\WebShot\venv\Lib\site-packages\aiofiles\threadpool\__init__.py", line 97, in _open
    f = yield from loop.run_in_executor(executor, cb)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: 'output\\getbootstrap.com\\docs\\4.5\\getting-started\\contents'
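
The requested URL ends in a slash and the failing path (`...\\contents`) has no file extension, which suggests, though it is not confirmed, that the server is trying to open a directory as a file. On Windows, `open()` on a directory raises `PermissionError` with errno 13. A minimal sketch of a guard that could run before the file is served follows; `resolve_output_file` is a hypothetical helper and the `output` root is assumed from the log line.

```python
from pathlib import Path


def resolve_output_file(output_root, requested_path):
    """Return the file to serve, or None if the request should be a 404."""
    root = Path(output_root).resolve()
    candidate = (root / requested_path).resolve()
    # Refuse paths that escape the output directory.
    if root != candidate and root not in candidate.parents:
        return None
    # Directories and missing paths cannot be opened as files.
    if not candidate.is_file():
        return None
    return candidate
```

A route handler could call this first and return a 404 response when it gets `None`, instead of letting the file open fail partway through the response.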

'Memory Mode' Setting is Reversed

Enabling memory mode should store the image data in memory, and disabling this setting should instead cause images to be written to and served from disk. Currently, the reverse is true: enabled writes to disk and disabled stores in memory.

This should be fixed.

Setup Web Scraper Logging

A logging system must be put in place to help identify and correct potential problems with the web scraper.
Logs must be written to a log file; this might require research into asynchronous support.
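
One standard-library option for keeping file writes out of the request path is `logging.handlers.QueueHandler` paired with `QueueListener`: log calls only enqueue a record, and a background thread writes to the file. This is a minimal sketch; the logger name `scraper`, the log format, and the `setup_scraper_logging` helper are assumptions for the example.

```python
import logging
import logging.handlers
import queue


def setup_scraper_logging(log_path):
    """Return (logger, listener); call listener.stop() on shutdown to flush."""
    log_queue = queue.Queue()
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    # The listener's background thread performs the blocking file writes.
    listener = logging.handlers.QueueListener(log_queue, file_handler)
    listener.start()

    logger = logging.getLogger("scraper")  # logger name is an assumption
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    # Callers only pay the cost of putting a record on the queue.
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    return logger, listener
```

Because logging calls return as soon as the record is queued, the same logger can be used from both synchronous scraper code and async request handlers.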

Redesign 'Actively Scraping' Toast

Currently, it just shows the domain, browser being used and elapsed time in seconds.

This should be updated with a new design.

Design Specifications:

  • Must show the domain
  • Must show elapsed time in one form or another
  • Must indicate what browser is being used
  • Add a button to end the task (ignore functionality for the time being)
  • Show some more information, such as the full address of the last scraped item.
  • Design may be something other than a toast, such as a banner or other prominent indicator.

Tasks:

  • Create a new design
  • Implement the new design

High Memory Usage

High memory usage is to be expected with this application, but optimisations should still be sought.

Any changes in memory usage, positive or negative, should be recorded in this issue.
Try to reduce memory usage where possible.
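
To make before-and-after figures comparable when recording them here, a small measurement helper based on the standard `tracemalloc` module could be used. This is only a sketch: `measure_peak_memory` is a hypothetical name, and `tracemalloc` reports Python-level allocations only, so memory held by the browser and driver processes would need to be measured separately.

```python
import tracemalloc


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) for Python-level allocations."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```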

Mismatched Filetree Item Hrefs and Shelf IDs

Clicking on a filetree item should scroll the page to the shelf with the matching ID. This sometimes works, but other times does not.

This happens because the ID is constructed differently for the filetree and the shelves. Also, some filetree items are constructed for non-existent shelves; these should maybe show a pop-up message or act as the dropdown button?
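
One way to rule out mismatches is to build the ID in a single place and use the result for both the filetree href and the shelf element. The sketch below assumes IDs are derived from the page URL; the `shelf-` prefix and the `element_id_for` name are illustrative, not the project's existing scheme.

```python
import re
from urllib.parse import urlsplit


def element_id_for(url):
    """Build one DOM-safe ID from a page URL, for both filetree and shelf."""
    parts = urlsplit(url)
    raw = f"{parts.netloc}{parts.path}".strip("/").lower()
    # Collapse every run of characters outside [a-z0-9] into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", raw).strip("-")
    return f"shelf-{slug}" if slug else "shelf-root"
```

With this, a filetree link would point at `"#" + element_id_for(url)` and the shelf would use `element_id_for(url)` as its `id`, so the two cannot drift apart.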

Screenshot Duplicates

Occasionally, certain shelves show duplicate images.

I believe that somehow the same page is being queued twice, or a redirect is bypassing the visited links checks.
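
If either hypothesis holds, one possible mitigation is to normalise URLs before the visited check and to record the post-redirect address as well as the requested one. The sketch below is an assumption about how that could look; `normalize_url` and `VisitedUrls` are hypothetical names, and `final_url` would be something like Selenium's `driver.current_url` read after the page has loaded.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


class VisitedUrls:
    def __init__(self):
        self._seen = set()

    def add_if_new(self, requested_url, final_url=None):
        """Record both URLs; return True only if the page was not seen before."""
        keys = {normalize_url(requested_url)}
        if final_url:
            keys.add(normalize_url(final_url))
        is_new = self._seen.isdisjoint(keys)
        self._seen |= keys
        return is_new
```

A page would then be screenshotted only when `add_if_new` returns `True`, which covers both the "queued twice" case and the redirect case.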

Cross-Browser Screenshot Inconsistency

Screenshots requested with the Chrome driver at 1920x1080 are fine.
Screenshots requested with the Firefox driver at 1920x1080 come out at 1920x995.

Find and correct this issue.
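
A plausible but unconfirmed explanation for the missing 85 pixels is that the requested size is applied to the outer window, so the browser's own toolbars reduce the page viewport. Under that assumption, the window can be enlarged by the measured difference. `set_viewport_size` is a hypothetical helper; it relies only on Selenium's `set_window_size`, `get_window_size` and `execute_script`.

```python
def set_viewport_size(driver, width, height):
    """Resize the window so the page viewport, not the outer window, is width x height.

    Assumes `driver` is a Selenium WebDriver.
    """
    driver.set_window_size(width, height)
    inner_width, inner_height = driver.execute_script(
        "return [window.innerWidth, window.innerHeight];"
    )
    outer = driver.get_window_size()
    # Browser chrome (toolbars, borders) is the gap between outer and inner size.
    extra_width = outer["width"] - inner_width
    extra_height = outer["height"] - inner_height
    driver.set_window_size(width + extra_width, height + extra_height)
```

Calling `set_viewport_size(driver, 1920, 1080)` before taking the screenshot should give the same viewport in each browser, provided the screen is large enough for the enlarged window.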

Save Settings in LocalStorage

These settings aren't saved when the page is reloaded.
Ensure they are stored in local storage when selected, and loaded correctly with the page.

  • Concurrent Processes Limit (Currently resets to 10 on each refresh)
  • Full Page Screenshot (Currently resets to False on each refresh)
  • Memory Mode (Currently resets to False on each refresh)
  • Block Foreign Domains (Currently resets to False on each refresh)
  • Selected Resolutions (Currently, any/all resolutions are de-selected on page refresh)
