
backup's Introduction

Backup

A tool to back up data from IPFS to an S3 bucket.

Usage

Drop a .env file in the project root and populate:

DATA_URL=<value> # URL to ndjson file of objects with a CID property for backing up
VERIFIER_URL=<value> # URL to linkdex API
S3_REGION=<value>
S3_BUCKET_NAME=<value>
S3_ACCESS_KEY_ID=<value> # optional
S3_SECRET_ACCESS_KEY=<value> # optional
S3_ENDPOINT=<url> # optional, used to test against minio
CONCURRENCY=<number> # optional
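
As a rough illustration of how these variables might be read and checked at startup, here is a minimal sketch. The helper name `loadConfig`, the shape of the returned object, and the fallback of 1 for CONCURRENCY are assumptions for illustration and are not taken from this repo. The variables not marked optional above are treated as required.

```javascript
// Hypothetical helper: reads the variables listed above from an env-like object
// (for example process.env) and fails early when a required one is missing.
function loadConfig (env) {
  // Assumption: everything not marked "optional" in the list above is required.
  const required = ['DATA_URL', 'VERIFIER_URL', 'S3_REGION', 'S3_BUCKET_NAME']
  const missing = required.filter((name) => !env[name])
  if (missing.length > 0) {
    throw new Error(`missing required environment variables: ${missing.join(', ')}`)
  }

  // Assumption: fall back to 1 when CONCURRENCY is unset; the tool's real default may differ.
  const concurrency = env.CONCURRENCY ? Number.parseInt(env.CONCURRENCY, 10) : 1
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error(`CONCURRENCY must be a positive integer, got: ${env.CONCURRENCY}`)
  }

  return {
    dataURL: new URL(env.DATA_URL), // throws if the value is not a valid URL
    verifierURL: new URL(env.VERIFIER_URL),
    s3: {
      region: env.S3_REGION,
      bucketName: env.S3_BUCKET_NAME,
      accessKeyId: env.S3_ACCESS_KEY_ID, // optional
      secretAccessKey: env.S3_SECRET_ACCESS_KEY, // optional
      endpoint: env.S3_ENDPOINT // optional, e.g. a local minio URL
    },
    concurrency
  }
}

// Usage: const config = loadConfig(process.env)
```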

Start the backup:

npm start

Use DEBUG=* to get detailed debugging info.

The tool writes complete CAR files to the S3 bucket at a path like complete/<CID>.car, where CID is a normalized, v1 base32-encoded CID.
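
To make the key format concrete, the sketch below derives such a path from a CID string using only Node.js built-ins. It is an illustration, not the tool's own code: the names `normalizeCid` and `carKey` are hypothetical, and it assumes the input is either a CIDv0 (base58btc, sha2-256, starting with `Qm`) or already a lowercase base32 CIDv1. A real implementation would more likely use a CID library that handles every codec and multibase.

```javascript
const BASE58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
const BASE32 = 'abcdefghijklmnopqrstuvwxyz234567'

// Decodes a base58btc string into bytes.
function base58Decode (str) {
  let n = 0n
  for (const ch of str) {
    const i = BASE58.indexOf(ch)
    if (i === -1) throw new Error(`invalid base58 character: ${ch}`)
    n = n * 58n + BigInt(i)
  }
  let hex = n.toString(16)
  if (hex.length % 2) hex = '0' + hex
  const body = n === 0n ? Buffer.alloc(0) : Buffer.from(hex, 'hex')
  // Each leading '1' in base58 stands for one leading zero byte.
  let zeros = 0
  while (zeros < str.length && str[zeros] === '1') zeros++
  return Buffer.concat([Buffer.alloc(zeros), body])
}

// Encodes bytes as lowercase RFC 4648 base32 without padding.
function base32Encode (bytes) {
  let bits = 0
  let value = 0
  let out = ''
  for (const byte of bytes) {
    value = (value << 8) | byte
    bits += 8
    while (bits >= 5) {
      out += BASE32[(value >>> (bits - 5)) & 31]
      bits -= 5
    }
    value &= (1 << bits) - 1 // keep only the bits not yet emitted
  }
  if (bits > 0) out += BASE32[(value << (5 - bits)) & 31]
  return out
}

// Hypothetical helper: returns the CID as a v1, base32 string.
function normalizeCid (cid) {
  if (cid.startsWith('Qm')) {
    const multihash = base58Decode(cid)
    if (multihash.length !== 34 || multihash[0] !== 0x12 || multihash[1] !== 0x20) {
      throw new Error(`not a sha2-256 CIDv0: ${cid}`)
    }
    // CIDv1 bytes = version (0x01) + codec (0x70, dag-pb) + multihash; 'b' is the base32 multibase prefix.
    return 'b' + base32Encode(Buffer.concat([Buffer.from([0x01, 0x70]), multihash]))
  }
  // Assumption: anything that already looks like lowercase base32 is a v1 CID and is passed through.
  if (/^b[a-z2-7]+$/.test(cid)) return cid
  throw new Error(`unsupported CID encoding: ${cid}`)
}

// Hypothetical helper: builds the bucket key, e.g. 'complete/bafybei….car' for a CIDv0 input.
function carKey (cid) {
  return `complete/${normalizeCid(cid)}.car`
}
```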

Docker

There's a Dockerfile that runs the tool in docker.

docker build -t backup .
docker run -d backup

Test

With docker running on your machine you can run the tests with

npm test

peers.json

This file contains the Kubo peering config, which lists all of our cluster nodes.

You can update it by running

npm run make-peers

backup's People

Contributors

alanshaw, francardoso93, olizilla

backup's Issues

Tasks sometimes fail on start

3/31/2023, 3:47:29 PM | node:internal/deps/undici/undici:11279 | backup
3/31/2023, 3:47:29 PM | fetchParams.controller.controller.error(new TypeError("terminated", { | backup
3/31/2023, 3:47:29 PM | ^ | backup
3/31/2023, 3:47:29 PM | TypeError: terminated | backup
3/31/2023, 3:47:29 PM | at Fetch.onAborted (node:internal/deps/undici/undici:11279:53) | backup
3/31/2023, 3:47:29 PM | at Fetch.emit (node:events:525:35) | backup
3/31/2023, 3:47:29 PM | at Fetch.terminate (node:internal/deps/undici/undici:10534:14) | backup
3/31/2023, 3:47:29 PM | at Object.onError (node:internal/deps/undici/undici:11374:36) | backup
3/31/2023, 3:47:29 PM | at Request.onError (node:internal/deps/undici/undici:8168:31) | backup
3/31/2023, 3:47:29 PM | at errorRequest (node:internal/deps/undici/undici:10220:17) | backup
3/31/2023, 3:47:29 PM | at TLSSocket.onSocketClose (node:internal/deps/undici/undici:9668:9) | backup
3/31/2023, 3:47:29 PM | at TLSSocket.emit (node:events:525:35) | backup
3/31/2023, 3:47:29 PM | at node:net:322:12 | backup
3/31/2023, 3:47:29 PM | at TCP.done (node:_tls_wrap:588:7) { | backup
3/31/2023, 3:47:29 PM | [cause]: SocketError: other side closed | backup
3/31/2023, 3:47:29 PM | at TLSSocket.onSocketEnd (node:internal/deps/undici/undici:9647:26) | backup
3/31/2023, 3:47:29 PM | at TLSSocket.emit (node:events:525:35) | backup
3/31/2023, 3:47:29 PM | at endReadableNT (node:internal/streams/readable:1359:12) | backup
3/31/2023, 3:47:29 PM | at process.processTicksAndRejections (node:internal/process/task_queues:82:21) { | backup
3/31/2023, 3:47:29 PM | code: 'UND_ERR_SOCKET', | backup
3/31/2023, 3:47:29 PM | socket: { | backup
3/31/2023, 3:47:29 PM | localAddress: '10.5.4.173', | backup
3/31/2023, 3:47:29 PM | localPort: 54664, | backup
3/31/2023, 3:47:29 PM | remoteAddress: '104.18.23.52', | backup
3/31/2023, 3:47:29 PM | remotePort: 443, | backup
3/31/2023, 3:47:29 PM | remoteFamily: 'IPv4', | backup
3/31/2023, 3:47:29 PM | timeout: undefined, | backup
3/31/2023, 3:47:29 PM | bytesWritten: 244, | backup
3/31/2023, 3:47:29 PM | bytesRead: 1672024 | backup
3/31/2023, 3:47:29 PM | } | backup
3/31/2023, 3:47:29 PM | } | backup
3/31/2023, 3:47:29 PM | } | backup
3/31/2023, 3:47:29 PM | Node.js v18.15.0

Migrate data from IPFS Cluster

We need to migrate data off of IPFS Cluster that is not already in buckets. Rough plan:

Determine list(s) of CIDs to migrate

  • Consider both upload.backup_urls and the backup table when determining whether to back up
  • web3.storage also needs to consider psa_pin_request

Changes needed to this repo

  • Read an ndjson list of things to back up
  • Start from a given index (for restarts)
  • Replace updating the DB with logging success/failure
  • Output results
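
The repo changes listed above could look roughly like the following sketch. It is not the implementation that was merged: `parseNdjson`, `backupAll`, the `backupOne` callback, and the lowercase `cid` property name are all assumptions made for illustration.

```javascript
// Parses ndjson text: one JSON object per non-empty line.
function * parseNdjson (text) {
  for (const line of text.split('\n')) {
    const trimmed = line.trim()
    if (trimmed !== '') yield JSON.parse(trimmed)
  }
}

// Hypothetical driver: backs up each item from startIndex onwards and logs
// success or failure per item instead of updating a database.
// Assumption: each item carries its CID in a `cid` property.
async function backupAll (items, backupOne, { startIndex = 0, log = console.log } = {}) {
  const results = { ok: 0, failed: 0 }
  let index = 0
  for (const item of items) {
    if (index >= startIndex) {
      try {
        await backupOne(item.cid)
        results.ok++
        log(JSON.stringify({ index, cid: item.cid, status: 'ok' }))
      } catch (err) {
        results.failed++
        log(JSON.stringify({ index, cid: item.cid, status: 'failed', error: err.message }))
      }
    }
    index++
  }
  return results
}
```

Logging the index with every result means a restart can pass the last logged index plus one as `startIndex`.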

Some errors are not for modifying

Seeing

2023-02-23T14:30:10 2023-02-23T14:30:10.248Z backup:nft-2.json failed to backup xxx TypeError: Cannot set property code of  which has only a getter
2023-02-23T14:30:10     at withChunkTimeout (file:///home/circleci/app/ipfs-client.js:113:16)
2023-02-23T14:30:10     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
2023-02-23T14:30:10     at async IpfsClient.dagExport (file:///home/circleci/app/ipfs-client.js:54:5)

from

err.code = 'ERR_TIMEOUT'

as we try to modify the error.code property
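
One possible way around this, sketched here rather than taken from the repo, is to leave the caught error untouched and wrap it in a new error that carries the code. The helper name `toTimeoutError` is hypothetical. The sketch assumes the original error may expose `code` through a getter only, as `DOMException` does, and it relies on the `cause` option of the `Error` constructor.

```javascript
// Hypothetical helper: wraps an error instead of assigning to its `code` property,
// which throws in strict mode (and in ES modules) when `code` only has a getter.
function toTimeoutError (err) {
  const wrapped = new Error(`timed out: ${err.message}`, { cause: err })
  wrapped.code = 'ERR_TIMEOUT' // safe: `code` is a plain own property on a fresh Error
  return wrapped
}

// Usage inside a catch block:
//   catch (err) { throw toTimeoutError(err) }
```

Callers can still check `err.code === 'ERR_TIMEOUT'`, and the original error stays available as `err.cause`.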

Process hangs without feedback

When running as a container in ECS, this process has a tendency to lock up. The container is still running, the process is still up, but it stops logging.

I think there are a couple of suspects here:

  1. Running node as PID 1.

Node.js was not designed to run as PID 1 which leads to unexpected behaviour when running inside of Docker. For example, a Node.js process running as PID 1 will not respond to SIGINT (CTRL-C) and similar signals.
– https://github.com/nodejs/docker-node/blob/main/docs/BestPractices.md#handling-kernel-signals

  2. npm-run-all does not exit when the node script exits. It looks like the ipfs daemon process remains alive, so the whole process keeps running even though it no longer does any work.

Ideally backup should exit if the node script exits.
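
Both suspects can be approached from inside the node script. The sketch below is an assumption about how that might look, not a fix taken from this repo: it registers explicit handlers for SIGINT and SIGTERM and exits explicitly once the main function settles. The helper names `exitOnSignals` and `runAndExit` and the `cleanup` callback are hypothetical. The process object is passed in as a parameter so the helpers can be exercised with a stand-in.

```javascript
// Hypothetical helper: when node runs as PID 1 the default signal behaviour
// is not applied, so register handlers that clean up and exit explicitly.
function exitOnSignals (proc, cleanup) {
  // Conventional exit codes for termination by signal: 128 + signal number.
  const exitCodes = { SIGINT: 130, SIGTERM: 143 }
  for (const signal of Object.keys(exitCodes)) {
    proc.once(signal, async () => {
      try {
        await cleanup(signal) // e.g. stop a child ipfs daemon
      } finally {
        proc.exit(exitCodes[signal])
      }
    })
  }
}

// Hypothetical helper: exits as soon as the main function settles, so a
// lingering child process or open handle cannot keep the container alive.
async function runAndExit (proc, main) {
  try {
    await main()
    proc.exit(0)
  } catch (err) {
    console.error(err)
    proc.exit(1)
  }
}

// Usage:
//   exitOnSignals(process, async () => { /* stop the ipfs daemon */ })
//   runAndExit(process, main)
```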
