poppingxanax / justclone

Automate website scraping and resource extraction with this Go script, effortlessly downloading CSS, JS, and image files while preserving website structure and providing scraping statistics.

Go 100.00%
clone clone-website cloner cloning go golang html html-scraper scraper

justclone's Introduction

Website Scraping Script

This script is designed to scrape a website and download its CSS files, JS files, and images. It also updates the HTML file with local references to the downloaded files.
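The reference rewriting described above could look roughly like the following. This is a minimal sketch using only the standard library, not the script's actual implementation (the project lists goquery as a dependency). The names localRef and rewriteRefs are hypothetical, and the sketch assumes references appear in the HTML as double-quoted attribute values.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// localRef resolves ref against pageURL and maps it to a relative local
// path inside dir, e.g. "css/site.css". Query strings are dropped.
func localRef(pageURL, ref, dir string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	u, err := base.Parse(ref) // resolves relative references against the page URL
	if err != nil {
		return "", err
	}
	name := path.Base(u.Path)
	if name == "." || name == "/" {
		return "", fmt.Errorf("no file name in %q", ref)
	}
	return path.Join(dir, name), nil
}

// rewriteRefs replaces every double-quoted occurrence of each reference
// in html with its local path (hypothetical helper, plain string replacement).
func rewriteRefs(html, pageURL, dir string, refs []string) (string, error) {
	for _, ref := range refs {
		local, err := localRef(pageURL, ref, dir)
		if err != nil {
			return "", err
		}
		html = strings.ReplaceAll(html, `"`+ref+`"`, `"`+local+`"`)
	}
	return html, nil
}

func main() {
	page := `<link rel="stylesheet" href="/static/site.css">`
	out, err := rewriteRefs(page, "https://example.com/", "css", []string{"/static/site.css"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Plain string replacement keeps the sketch short. A real implementation would more likely edit the attribute values through an HTML parser so that unrelated text containing the same string is left alone.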

Features

  • Downloads CSS files, JS files, and images from a website
  • Replaces the URLs in the HTML with local file references
  • Creates separate directories for CSS, JS, and images
  • Handles redirects and follows them to the final destination
  • Provides scraping statistics including the total number of CSS files, JS files, and images found

Requirements

  • Go 1.16 or higher
  • go get github.com/common-nighthawk/go-figure
  • go get github.com/PuerkitoBio/goquery
  • go get github.com/fatih/color

Usage

  1. Clone the repository or download the script file.
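  2. Build the project by running go build in the repository directory.
  3. Run the resulting executable using ./main. go build names the binary after the module or directory, so it may be called ./justclone instead.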
  4. Wait for the script to complete the scraping process.
  5. The downloaded files will be stored in separate directories (css, js, imgs, etc.) under a directory named after the website's domain.
  6. The updated HTML file with local references will be saved as index.html in the website's directory.

If the executable cannot be run because of a permission error, you may need to run chmod +x justclone first.
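The output layout from steps 5 and 6, together with the per-type statistics, could be sketched as follows. This is a hedged illustration under stated assumptions, not the script's actual logic: assets are classified by file extension only, the extension list is illustrative, and assetDir and outputPath are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// assetDir picks a subdirectory from the file extension.
// An empty result means the type is not recognized.
func assetDir(assetPath string) string {
	switch strings.ToLower(path.Ext(assetPath)) {
	case ".css":
		return "css"
	case ".js":
		return "js"
	case ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico":
		return "imgs"
	}
	return ""
}

// outputPath returns "<host>/<dir>/<file>" for an asset referenced by a page.
// It joins with forward slashes; use path/filepath for OS-specific separators.
func outputPath(pageURL, assetURL string) (string, error) {
	page, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	asset, err := page.Parse(assetURL) // resolve relative to the page
	if err != nil {
		return "", err
	}
	dir := assetDir(asset.Path)
	if dir == "" {
		return "", fmt.Errorf("unrecognized asset type: %q", assetURL)
	}
	return path.Join(page.Hostname(), dir, path.Base(asset.Path)), nil
}

func main() {
	page := "https://example.com/"
	assets := []string{"/static/site.css", "/js/app.js", "/img/logo.png"}

	stats := map[string]int{} // number of assets found per directory
	for _, a := range assets {
		out, err := outputPath(page, a)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		stats[assetDir(a)]++
		fmt.Println(a, "->", out)
	}
	fmt.Println("stats:", stats)
}
```

Classifying by extension is the simplest option. Assets served without an extension would need a check of the response's Content-Type header instead.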

Todo List

  • Proxy Support ❌
  • Browser mode (for scraping sites with JS related challenges) ❌
  • User-Agent use ❌
  • HTML Parsing Improvements ❌
  • Metadata Extraction ❌
  • Interactive Mode (add an interactive mode where users can dynamically input URLs to scrape without relaunching the application each time) ❌
  • Cache improvements ❌
  • Better error logging ❌
  • Rate limit bypassing ❌
  • Authentication Support (if the website requires authentication or session management, add support for handling login credentials and maintaining authenticated sessions during the scraping process) ❌
  • Pre-set cookie(s) ❌

Limitations

  • The script may encounter connection issues with certain websites, especially if they have strict security measures or block scraping activities. In such cases, it may fail to download certain files or raise connection errors.
  • The script may not handle all possible edge cases or complex website structures. It is designed as a basic scraping tool and may require modifications for specific use cases.

Disclaimer

This script is provided as-is without any warranty. Use it responsibly and make sure to comply with the website's terms of service and legal requirements when scraping websites.
