ia-get's Introduction

ia-get

File downloader for archive.org

Made with 💝 by 🤖

Usage 📖

Simply pass the URL of an archive.org details page you want to download and ia-get will automatically get the XML metadata and download all files to the current working directory.

ia-get https://archive.org/details/<identifier>
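As a rough illustration of the first step, the sketch below derives a metadata location from a details URL. The function name `files_xml_url` is hypothetical, and the `https://archive.org/download/<identifier>/<identifier>_files.xml` layout is an assumption about where the XML file index lives; ia-get's own code may do this differently.

```rust
/// Hypothetical helper: map a details URL to the assumed location of the
/// item's XML file index. Returns `None` if the URL is not a details URL.
fn files_xml_url(details_url: &str) -> Option<String> {
    // Everything after the fixed prefix is treated as the item identifier.
    let identifier = details_url
        .strip_prefix("https://archive.org/details/")?
        .trim_end_matches('/');
    if identifier.is_empty() || identifier.contains('/') {
        return None;
    }
    // Assumed layout: /download/<identifier>/<identifier>_files.xml
    Some(format!(
        "https://archive.org/download/{0}/{0}_files.xml",
        identifier
    ))
}
```

Calling `files_xml_url("https://archive.org/details/deftributetozzap64")` would yield a URL under `/download/deftributetozzap64/`, while anything that is not an archive.org details URL yields `None`.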

Why? 🤔💭

I wanted to download high-quality scans of ZZap!64 magazine and some read-only memory from archive.org. Archives of this type often include many large files. Torrents are not always provided and, when they are available, they do not index all the available files in the archive.

Archive.org publishes an XML document for every page that indexes every file available. So I co-authored ia-get to automate the download process.

Features ✨

  • 🔽 Reliably download files from the Internet Archive
  • 🌳 Preserves the original directory structure
  • 🔄 Automatically resumes partial or failed downloads
  • 🔏 Hash checks to confirm file integrity
  • 🌱 Can be run multiple times to update existing downloads
  • 📊 Gets all the metadata for the archive
  • 📦️ Available for Linux 🐧 macOS 🍏 and Windows 🪟
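One way the resume and re-run behaviour in the list above could be decided per file is sketched below. This is not taken from ia-get's source: `Action` and `plan_download` are hypothetical names, and the sketch assumes the expected size of each file is known from the metadata. The only external fact it relies on is the standard HTTP `Range: bytes=N-` request header.

```rust
/// What to do with one file, given what is already on disk.
#[derive(Debug, PartialEq)]
enum Action {
    /// Local copy has the expected size; only the hash check remains.
    Verify,
    /// Local copy is partial; request the remaining bytes.
    Resume { range_header: String },
    /// Nothing usable on disk; download from the start.
    Download,
}

/// Hypothetical planner: compare the local size (if any) with the
/// expected size from the metadata.
fn plan_download(local_len: Option<u64>, expected_len: u64) -> Action {
    match local_len {
        Some(len) if len == expected_len => Action::Verify,
        Some(len) if len > 0 && len < expected_len => Action::Resume {
            // Standard HTTP Range value asking for bytes from `len` onward.
            range_header: format!("bytes={}-", len),
        },
        // Missing, empty, or larger than expected: start over.
        _ => Action::Download,
    }
}
```

Treating an oversized local file as a fresh download is a design choice of this sketch; a size match alone does not prove integrity, which is why the hash check would still run afterwards.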

Sharing is caring 🤝

You can use ia-get to download files from archive.org, including all the metadata and the .torrent file, if there is one. You can then start seeding the torrent using a pristine copy of the archive and a complete file set.

A.I. Driven Development 🤖

This program is an experiment 🧪 and has been (mostly) written using Chatty Jeeps. When I started this project I had no experience 👶 with Rust and was curious to see if I could use AI tools to assist in developing a program in a language I do not know.

As featured on Linux Matters podcast! 🎙️ I am a presenter on Linux Matters and we discussed how the initial version of the program was created using Chatty Jeeps (ChatGPT-4) in Episode 16 - Blogging to the Fediverse.

I discussed that process, and the successes and drawbacks. We will be discussing the latest version of the project in a future episode.

Since that initial MVP, I've used Unfold.ai to add features and improve the code 🧑‍💻 All commits since Oct 27, 2023 that were co-authored by AI have full details of the AI contribution in the commit message. I've picked up some Rust along the way and some refactoring came directly from my own brain 🧠

Demo 🧑‍💻

Development 🏗️

Such as it is.

cargo build

Tests 🤞

I used these commands to test ia-get during development.

ia-get https://archive.org/details/deftributetozzap64
ia-get https://archive.org/details/zzapp_64_issue_001_600dpi

ia-get's People

Contributors

flexiondotorg, dependabot[bot], lucperkins

ia-get's Issues

ia-get feedback and request to contribute PRs

Hi Martin,

We met in Riga and sat in the hotel bar with Popey and others until the early morning talking about Ubuntu, Ubuntu Hideout, Bottom, Danger Brothers, and the other exploits of Rik Mayall and Ade Edmondson. It was lovely to meet you both and share a few laughs. (This night)

Feedback

Regarding: Linux Matters Episode 20

The application itself looks good! I appreciate that you are conducting an experiment on the merits of AI Driven Development™, but since you mentioned contributions on Episode 20 of Linux Matters I thought I would offer these thoughts. They might be of interest to you to see where an external programmer can spot contextual/ecosystem deficits in the generated answers.
I would like to create a PR for as many of the following things as you are comfortable allowing contributors to address.

Documentation

I would like to contribute comment-documentation for the functions, structures and Rust module which will be converted into proper Rust documentation when cargo doc is executed.
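For illustration, documentation comments of the kind proposed here might look like the sketch below. `FileEntry` and `total_size` are hypothetical names invented for this example and are not taken from ia-get; the point is only the `///` comment style that rustdoc picks up.

```rust
/// A single file entry from an archive's file index.
///
/// Hypothetical struct, shown only to illustrate doc comments.
#[derive(Debug, PartialEq)]
struct FileEntry {
    /// Path of the file relative to the archive root.
    name: String,
    /// Expected size of the file in bytes.
    size: u64,
}

/// Returns the combined size in bytes of all `files`.
///
/// An empty slice yields `0`.
fn total_size(files: &[FileEntry]) -> u64 {
    files.iter().map(|file| file.size).sum()
}
```

The first line of each `///` block becomes the item's summary in the generated pages, so it is worth keeping it to one short sentence.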

is_url_accessible error management

is_url_accessible returns a Result, but no error is ever propagated from the function: it only ever returns Ok(true) or Ok(false). This means that the error handling at both call-sites never reaches the Err branches.

This can be corrected as follows:

use std::error::Error;  // Needed for Box<dyn Error>

async fn is_url_accessible(url: &str) -> Result<bool, Box<dyn Error>> {
    let client = reqwest::Client::new();
    let response = client.get(url).send().await?;  // ? will propagate errors
    Ok(response.status().is_success()) // If your goal is to return `true` and `false` as Ok, then this will work.
}

.is_success() checks for an HTTP response code in the range 200-299, so I suspect you would actually want to handle the false case as an error as well, e.g.:

async fn is_url_accessible(url: &str) -> Result<bool, Box<dyn Error>> {
    let client = reqwest::Client::new();
    let response = client.get(url).send().await?;
    response.error_for_status()?;  // Propagate an error if HTTP response code is 400 - 599
    Ok(true)  // Only return Ok if no transport or HTTP error is encountered.
}

  • This will correct some currently hidden errors
  • Your existing error handling logic at the call-sites will be compatible with this although further updates can be made to improve the function return semantics:

async fn is_url_accessible(url: &str) -> Result<(), Box<dyn Error>> {
    let client = reqwest::Client::new();
    let response = client.get(url).send().await?;
    response.error_for_status()?;
    Ok(())
}
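One subtlety when moving from `.is_success()` to `.error_for_status()` is that the two checks are not exact opposites: the first accepts only 200-299, the second rejects only 400-599. The std-only sketch below models that difference with a hypothetical `classify` function and `StatusClass` enum; it does not call reqwest, it only mirrors the ranges stated above.

```rust
/// How the two checks discussed above would see a status code.
#[derive(Debug, PartialEq)]
enum StatusClass {
    /// 200-299: `is_success()` is true and `error_for_status()` is Ok.
    Success,
    /// 400-599: `is_success()` is false and `error_for_status()` is Err.
    ClientOrServerError,
    /// Anything else: `is_success()` is false, yet `error_for_status()` is Ok.
    Other,
}

/// Hypothetical helper mirroring the ranges described in the text.
fn classify(status: u16) -> StatusClass {
    match status {
        200..=299 => StatusClass::Success,
        400..=599 => StatusClass::ClientOrServerError,
        _ => StatusClass::Other,
    }
}
```

A response in the `Other` group, such as a redirect that the client does not follow, would have produced Ok(false) in the original function but passes the `error_for_status()` version. Whether that matters in practice depends on how the client is configured.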

The call-sites then manage the errors by matching Ok(()) or Err(e) e.g.

    match is_url_accessible(details_url).await {
        Ok(()) => println!("╰╼ Archive.org URL online: 🟢"),  // Ok(true)/ Ok(false)/Err(e) semantics replaced by Ok(())/Err(e)
        Err(e) => {
            println!("├╼ Archive.org URL online: 🔴");
            panic!  ("╰╼ Exiting due to error: {}", e);
        }
    }

and

    match is_url_accessible(&xml_url).await {
        Ok(()) => println!("├╼ Archive.org XML online: 🟢"),
        Err(e) => {
            println!("├╼ Archive.org XML online: 🔴");
            panic!  ("╰╼ Exiting due to error: {}", e);
        }
    }

is_url_accessible request type

Instead of performing a GET request, it might be better internet citizenry to perform a HEAD request since no response body is required.

    let response = client.head(url).send().await?;

Test cases for PATTERN

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn check_valid_pattern() {
        let regex = Regex::new(&PATTERN).expect("Create regex");
        assert!(regex.is_match("https://archive.org/details/Valid-Pattern"))
    }

    #[test]
    fn check_invalid_pattern() {
        let regex = Regex::new(&PATTERN).expect("Create regex");
        assert!(!regex.is_match("https://archive.org/details/Invalid-Pattern-*"))
    }
}

PATTERN syntax

The escape sequences \/ are unnecessary and can be replaced with / only:

static PATTERN: &str = r"^https://archive\.org/details/[a-zA-Z0-9_-]+$";

Valid escape sequences are listed in the regex crate's syntax documentation.

The aforementioned test cases demonstrate this works.
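To show what PATTERN is meant to accept without pulling in the regex crate, here is a std-only sketch that aims to accept the same strings. The function name `is_details_url` is hypothetical, and this is an illustration of the rule rather than a suggested replacement for the regex.

```rust
/// Hypothetical std-only check intended to mirror
/// `^https://archive\.org/details/[a-zA-Z0-9_-]+$`.
fn is_details_url(url: &str) -> bool {
    match url.strip_prefix("https://archive.org/details/") {
        // `+` in the pattern requires at least one identifier character.
        Some(identifier) => {
            !identifier.is_empty()
                && identifier
                    .bytes()
                    .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
        }
        // Wrong scheme, host, or path prefix.
        None => false,
    }
}
```

The two inputs from the test cases above behave the same way here: `Valid-Pattern` is accepted, and `Invalid-Pattern-*` is rejected because `*` is outside the allowed character set.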

Request user-agent

It might be considered good internet citizenry to set the application user-agent to IA-Get for requests.
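To make the HEAD and user-agent suggestions concrete, the sketch below shows roughly what such a request looks like on the wire. It is a std-only illustration with a hypothetical `head_request` function; in ia-get itself the HTTP client would build this, for example via `client.head(url)` and a user agent set on reqwest's client builder.

```rust
/// Hypothetical helper: render a minimal HTTP/1.1 HEAD request as text.
/// A HEAD request asks for the headers only, so no body is transferred.
fn head_request(host: &str, path: &str, user_agent: &str) -> String {
    format!(
        "HEAD {} HTTP/1.1\r\nHost: {}\r\nUser-Agent: {}\r\nConnection: close\r\n\r\n",
        path, host, user_agent
    )
}
```

For example, `head_request("archive.org", "/details/deftributetozzap64", "IA-Get")` starts with the line `HEAD /details/deftributetozzap64 HTTP/1.1` and carries a `User-Agent: IA-Get` header, which lets the server operator identify the tool in their logs.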

Packaging

I would like to help create a snapcraft.yaml. I have an existing file which I should be able to recraft.


I hope these thoughts are received in the good spirit in which they are intended and that, if you decide to incorporate any of them, you would permit me to submit the commits for them.

Best regards

Daniel
