sourceforge-grab-rsync

More information about the archiving project can be found on the ArchiveTeam wiki: SourceForge

Setup instructions

Warning: some rsync targets may be 90 GB or larger. Use caution.

Be sure to replace YOURNICKHERE with the nickname you want shown on the tracker. You don't need to register it; just pick a nickname you like.

In most of the cases below, a web interface will be running at http://localhost:8001/. If you don't know or care what this is, you can just ignore it; otherwise, it gives you a fancy view of what's going on.

If anything goes wrong while running the commands below, please scroll down to the bottom of this page. There's troubleshooting information there.

Running with a warrior

Follow the instructions on the ArchiveTeam wiki for installing the Warrior, and select the "SourceForge Rsync" project in the Warrior interface.

Running without a warrior

To run this outside the warrior, clone this repository, cd into its directory and run:

pip install seesaw

then start downloading with:

run-pipeline pipeline.py --concurrent 1 YOURNICKHERE

For more options, run:

run-pipeline --help

If you don't have root access and/or your version of pip is very old, you can replace "pip install seesaw" with:

wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py ; python get-pip.py --user ; ~/.local/bin/pip install --user seesaw

so that pip and seesaw are installed in your home, then run

~/.local/bin/run-pipeline pipeline.py --concurrent 1 YOURNICKHERE
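
Since a --user install puts run-pipeline under ~/.local/bin, which is not always on your PATH, a small check like the one below can tell you which copy (if any) is available. This is only a sketch: the function name find_run_pipeline is made up for illustration and is not part of seesaw.

```shell
# find_run_pipeline: print the path of a usable run-pipeline, or fail.
# Hypothetical helper, not shipped with seesaw.
find_run_pipeline() {
    if command -v run-pipeline >/dev/null 2>&1; then
        # A system-wide install that is already on PATH
        command -v run-pipeline
    elif [ -x "$HOME/.local/bin/run-pipeline" ]; then
        # A per-user install from "pip install --user seesaw"
        printf '%s\n' "$HOME/.local/bin/run-pipeline"
    else
        return 1
    fi
}

if rp=$(find_run_pipeline); then
    echo "Found run-pipeline at: $rp"
else
    echo "run-pipeline not found; install seesaw first" >&2
fi
```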

Running with lots of disk space

If you have a lot of disk space, please consider running in large-rsync or medium-rsync mode. Medium mode (75 GB free recommended) allows your pipeline to download repositories up to 25 GB, and large mode (300 GB free recommended) allows your pipeline to download repositories up to 150 GB. The default pipeline will not download repositories larger than 5 GB.

Your pipeline will have an uncompressed copy and a compressed copy of the repository on disk at the same time for each count of concurrency, thus using approximately two times the repository size in disk space.
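
As a rough worked example of that rule, the sketch below multiplies the repository size by two and by the concurrency. The function name estimate_disk_gb is hypothetical, and the result is only an approximation, since the text above says "approximately two times".

```shell
# estimate_disk_gb REPO_GB CONCURRENCY
# Rough peak disk use in GB: one uncompressed plus one compressed copy
# per concurrent item. Hypothetical helper for illustration only.
estimate_disk_gb() {
    echo $(( 2 * $1 * $2 ))
}

estimate_disk_gb 25 1     # largest medium-mode repository, 1 concurrent item
estimate_disk_gb 150 1    # largest large-mode repository, 1 concurrent item
estimate_disk_gb 5 2      # largest default-mode repository, 2 concurrent items
```

With --concurrent 1 this prints 50 and 300 for the medium and large caps, which is in line with the 75 GB and 300 GB recommendations above. Raising --concurrent scales the estimate accordingly.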

To enable medium-rsync mode, cd into the sourceforge-grab-rsync directory and run:

touch MEDIUM-RSYNC

To enable large-rsync mode, cd into the sourceforge-grab-rsync directory and run:

touch LARGE-RSYNC
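
If you are unsure which mode fits your machine, a sketch like the following compares the free space in the current directory against the recommendations above (75 GB for medium, 300 GB for large). The function name suggest_mode is made up for this example. It only prints a suggestion and does not create any marker file.

```shell
# suggest_mode FREE_GB
# Prints which marker file the free space would support, based on the
# recommended figures in this README. Hypothetical helper.
suggest_mode() {
    gb=$1
    if [ "$gb" -ge 300 ]; then
        echo "LARGE-RSYNC"
    elif [ "$gb" -ge 75 ]; then
        echo "MEDIUM-RSYNC"
    else
        echo "default"
    fi
}

# Free space of the current directory's filesystem, in whole GB.
# df -Pk prints 1024-byte blocks; column 4 is the available space.
free_gb=$(df -Pk . | awk 'NR==2 { print int($4 / 1024 / 1024) }')
echo "Free space: ${free_gb} GB, suggested mode: $(suggest_mode "$free_gb")"
```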

Running multiple instances on different IPs

This feature requires seesaw version 0.0.16 or greater. Use pip install --upgrade seesaw to upgrade.

Use the --context-value argument to pass in bind_address=123.4.5.6 (replace the IP address with your own).

Example that runs 1 thread with no web interface and binds Wget to a specific IP address:

run-pipeline pipeline.py --concurrent 1 YOURNICKHERE --disable-web-server --context-value bind_address=123.4.5.6
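
To run one instance per address, the same command can be generated for each IP. The sketch below is a dry run that only prints the command lines; the helper name pipeline_cmd and the second address 123.4.5.7 are placeholders, and only the flags already shown above are used.

```shell
# pipeline_cmd NICK IP
# Prints (does not run) the run-pipeline command for one bind address.
# Hypothetical helper for illustration.
pipeline_cmd() {
    nick=$1
    ip=$2
    echo "run-pipeline pipeline.py --concurrent 1 $nick --disable-web-server --context-value bind_address=$ip"
}

# Replace these placeholder addresses with your own.
for ip in 123.4.5.6 123.4.5.7; do
    pipeline_cmd YOURNICKHERE "$ip"
done
```

Each printed command can then be started in its own terminal or screen session.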

Distribution-specific setup

For Debian/Ubuntu:

adduser --system --group --shell /bin/bash archiveteam
apt-get install -y git-core libgnutls-dev screen python-dev python-pip bzip2 zlib1g-dev rsync
pip install seesaw
su -c "cd /home/archiveteam; git clone https://github.com/ArchiveTeam/sourceforge-grab-rsync.git; cd sourceforge-grab-rsync;" archiveteam
screen su -c "cd /home/archiveteam/sourceforge-grab-rsync/; run-pipeline pipeline.py --concurrent 1 --address '127.0.0.1' YOURNICKHERE" archiveteam
[... ctrl+A D to detach ...]

Wget-lua is also available on ArchiveTeam's PPA for Ubuntu.

For CentOS:

Ensure that you have the CentOS equivalent of bzip2 installed as well. You might need the EPEL repository to be enabled.

yum -y install gnutls-devel python-pip zlib-devel rsync
pip install seesaw
[... pretty much the same as above ...]

For openSUSE:

zypper install screen python-pip libgnutls-devel bzip2 python-devel rsync
pip install seesaw
[... pretty much the same as above ...]

For OS X:

You need Homebrew. Ensure that you have the OS X equivalent of bzip2 installed as well.

brew install python gnutls
pip install seesaw
[... pretty much the same as above ...]

There is a known issue with some packaged versions of rsync. If you get errors during the upload stage, sourceforge-grab-rsync will not work with your rsync version.

This reportedly fixes it, assuming a working rsync is installed at /usr/local/bin/rsync:

alias rsync=/usr/local/bin/rsync

For Arch Linux:

Ensure that you have the Arch equivalent of bzip2 installed as well.

  1. Make sure you have python2-pip installed.
  2. Install the wget-lua package from the AUR (https://aur.archlinux.org/packages/wget-lua/).
  3. Run pip2 install seesaw.
  4. Modify the run-pipeline script in seesaw to point at #!/usr/bin/python2 instead of #!/usr/bin/python.
  5. useradd --system --group users --shell /bin/bash --create-home archiveteam
  6. screen su -c "cd /home/archiveteam/sourceforge-grab-rsync/; run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam

For FreeBSD:

Honestly, I have no idea. ./get-wget-lua.sh supposedly doesn't work due to differences in the tar that ships with FreeBSD. Another problem is the apparent absence of Lua 5.1 development headers. If you figure this out, please do let us know on IRC (irc.efnet.org #archiveteam).

Troubleshooting

Broken? These are some of the possible solutions:

wget-lua was not successfully built

If you get errors about wget.pod or something similar, only the documentation failed to compile; wget-lua itself compiled fine. Try this:

cd get-wget-lua.tmp
mv src/wget ../wget-lua
cd ..

The get-wget-lua.tmp name may be inaccurate. If you have a folder with a similar but different name, use that instead and please let us know on IRC what folder name you had!
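
Because the folder name can vary, a sketch like this looks for the built binary under any folder whose name starts with get-wget-lua. The get-wget-lua* pattern and the function name find_built_wget are assumptions for illustration, so adjust the pattern if your folder is named differently. It only reports what it finds and does not move anything.

```shell
# find_built_wget [DIR]
# Prints the first src/wget found under DIR/get-wget-lua*/, or fails.
# Hypothetical helper; the glob pattern is a guess at the folder name.
find_built_wget() {
    base=${1:-.}
    for candidate in "$base"/get-wget-lua*/src/wget; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if built=$(find_built_wget .); then
    echo "Built binary found at: $built"
    echo "To install it, run: mv \"$built\" ./wget-lua"
else
    echo "No built wget found under get-wget-lua*/src/" >&2
fi
```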

Optionally, if you know what you're doing, you may want to use wgetpod.patch.

Problem with gnutls or openssl during get-wget-lua

Please ensure that gnutls-dev(el) and openssl-dev(el) are installed.

ImportError: No module named seesaw

If you're sure that you followed the steps to install seesaw, permissions on your module directory may be set incorrectly. Try the following:

chmod o+rX -R /usr/local/lib/python2.7/dist-packages

Issues in the code

If you notice a bug and want to file a bug report, please use the GitHub issues tracker.

Are you a developer? Help write code for us! Look at our developer documentation for details.

Other problems

Have an issue not listed here? Join us on IRC and ask! We can be found at irc.efnet.org #coldstorage.

Contributors

arkiver2, chpwssn
