borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.

Home Page: https://www.borgbackup.org/

License: Other

Python 82.77% C 3.99% Shell 1.85% HTML 3.29% Cython 8.08% Ruby 0.01%
python python-3 compression dedupe ssh deduplication backup borgbackup encryption cython

borg's People

Contributors

abogical, anarcat, ape, bigtedde, bket, edgewood, elho, enkore, fantasya-pbem, finefoot, gu1nness, hansmi, hexagonrecursion, jborg, jdchristensen, jrast, m3nu, mh4ckt3mh4ckt1c4s, milkey-mouse, motwok, perguth, plasmapower, rayyanansari, ronnypfannschmidt, rugk, sanskritfritz, sourcejedi, textshell, thomaswaldmann, ypid

borg's Issues

Thoughts on "delete" for files/directories?

What are the thoughts on allowing the delete of specific files or entire directories WITHIN a repository or archive?

For example, to delete all the .tmp files in aprilBackup:
borg delete myRepository::aprilBackup -p "*.tmp"
(-p = pattern, or a delete-pattern file, or similar; I don't know what the exact notation would be)

To remove the .tmp files from the entire repository:
borg delete myRepository -p "*.tmp"
and so on.
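
A rough sketch of how such a pattern could select files, assuming shell-style matching on the file name only. The function name and the example paths are made up for illustration, and the -p option above is just a proposal, not an existing borg flag.

```python
import fnmatch
import posixpath


def select_for_delete(paths, pattern):
    """Split archive paths into (to_delete, to_keep) by matching the base name."""
    to_delete, to_keep = [], []
    for path in paths:
        name = posixpath.basename(path)
        # fnmatchcase: shell-style wildcards, case-sensitive on every platform
        if fnmatch.fnmatchcase(name, pattern):
            to_delete.append(path)
        else:
            to_keep.append(path)
    return to_delete, to_keep


doomed, kept = select_for_delete(
    ["home/user/a.tmp", "home/user/notes.txt", "var/cache/b.tmp"], "*.tmp")
print(doomed)
```

Actually shrinking the repository would additionally require dropping chunks that are no longer referenced by any archive, which this sketch does not cover.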

Yes, I'm aware that you should ideally exclude such files from the repository in the first place, but that is definitely not always possible or known in advance. There are many instances where you later realize that significant files have been backed up that are not wanted and (most importantly) are HIGHLY non-dedupable. You want to pull those out with a pattern and shrink your repository without removing entire archives.

(I've run across this in practice by backing up multiple FirefoxPortable instances. Obviously you exclude the CACHE, but what wasn't known immediately is that there are numerous other files that are essentially cache or temp in nature while not being named as such. These are also hard to dedup and large in size. Now that I know what they are I'd love to wipe them out of the repository, but can't easily.)

This then brings me to a second feature which would be diagnostic in nature and actually allow finding these 'dedupe hotspots', programmatically, across multiple backups. I've never seen such a thing in practice and will post it as an "issue" immediately following this one for separate discussion. (But finding such hotspots is useless if a user can't "delete" individual data in a repository or backup.)

drop python 3.2 and 3.3 compatibility?

being compatible with older python versions puts some burden on development:

  • tests have to be run on them also (slower test runs, we currently test for 4 python versions)
  • some stuff needs special treatment in the code (because it is broken, different or not present)
  • we can not use latest releases of some libraries and tools, because they already dropped 3.2 support

python 3.3: https://docs.python.org/3/whatsnew/3.3.html

  • new: lzma compression
  • new: unicode literals with u"" (this could be interesting if somebody would like to port to 2.7)
  • new: getting the terminal dimensions
  • new: print(..., flush=True), see sys.stdout.flush() in the code
  • new: xattr stdlib code, see #142
  • new: mock is in stdlib, see #145
  • new: os.replace, .posix_fadvise, .posix_fallocate, .sync, SEEK_HOLE/SEEK_DATA
  • new: shutil.disk_usage, stat.filemode, time.process_time
  • new: PyMemoryView_FromMemory
  • improved: nanosecond precision for stat/utime
  • venv comes with it

python 3.4: https://docs.python.org/3/whatsnew/3.4.html

  • new: Enum, pathlib, selectors, hashlib.pbkdf2_hmac
  • improved: ssl, bytes, bytearray, memoryview, hmac, glob, argparse, os, stat, threading
  • pip comes with it

Apple Keychain Integration?

Leaving a password in an environment variable or in a file secured only by permissions just makes me scream inside, so would you guys be OK with a pull request that adds Apple Keychain integration to store file keys or PBKDF2 keys?

Traceback when borg is not installed on remote machine

I tried to init a borg repo via ssh on a remote machine. I forgot to install borg there; the command failed with a traceback. An error message would be prettier:

zsh:1: command not found: borg
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/borg/archiver.py", line 857, in main
    exit_code = archiver.run(sys.argv[1:])
  File "/usr/lib/python3.4/site-packages/borg/archiver.py", line 813, in run
    return args.func(args)
  File "/usr/lib/python3.4/site-packages/borg/archiver.py", line 62, in do_init
    repository = self.open_repository(args.repository, create=True, exclusive=True)
  File "/usr/lib/python3.4/site-packages/borg/archiver.py", line 35, in open_repository
    repository = RemoteRepository(location, create=create)
  File "/usr/lib/python3.4/site-packages/borg/remote.py", line 144, in __init__
    version = self.call('negotiate', 1)
  File "/usr/lib/python3.4/site-packages/borg/remote.py", line 153, in call
    for resp in self.call_many(cmd, [args], **kw):
  File "/usr/lib/python3.4/site-packages/borg/remote.py", line 203, in call_many
    raise ConnectionClosed()
borg.remote.ConnectionClosed
borg: Error: Connection closed by remote host

borg uses the attic libraries

here's a funky one. because the attic directory hasn't been renamed, the egg file created by setup.py has the same module name as the borg file.

this means that when borg starts, it actually loads the attic code:

$ sudo ./setup.py install
$ borg
usage: borg [-h]
            {serve,init,check,change-passphrase,create,extract,delete,list,mount,info,prune,help}
            ...

Attic 0.13_23_ge9c27e8-py3.4-linux-x86_64.egg - Deduplicated Backups

there's still a lot of attic left in there, which makes it impossible to install borg and attic next to each other without messing around with virtualenv.

API documentation?

attic/borg has very few inline comments, but it would still be very useful to expose the API structure in the official documentation.

in fact, I wonder if it would be useful to spend some time crawling through that source code explaining each file and function so that we could get a better idea of what all the parts are doing...

then the sphinx doc engine could automatically generate nice API graphs that would be walkable.

the way i did that in one of my Python modules is like this:

Monkeysign API documentation
============================

GnuPG API
---------

.. automodule:: monkeysign.gpg
   :members:
   :undoc-members:

CLI Interface
-------------

.. automodule:: monkeysign.cli
   :members:
   :undoc-members:

GTK Interface
-------------

.. may fail if the GTK module isn't available, oh well.
.. automodule:: monkeysign.gtkui
   :members:
   :undoc-members:

it's mostly useful to document APIs, but I think it could be useful for borg as well.

opinions?

compatibility policy / support timeframe

one of the main contentious points of #1 is whether borg should be backwards-compatible with attic,
or how/if backwards-compatibility should be broken within borg itself.

so to clarify this, i wish to open a discussion specifically about this topic. this is basically a continuation of jborg/attic#215 and #1.

the original proposal from @ThomasWaldmann was:

  • Don't break it accidentally / without good reason / without warning.
  • Break it if above does not apply. needs more thoughts/discussion
  • As the fork is "new software" from the perspective of a Borg user or a Borg packaging distribution, there is no past we need to stay compatible with - we have the chance to break compatibility and change everything that we think needs changing.
  • Over time, we'll have more users and incompatible changes get harder.
  • Avoid getting into the "compatible forever" trap - we should maybe not assure compatibility of development versions nor spanning major releases.
  • When used for long-term archiving, special considerations and care are required. E.g. a development snapshot of Borg might be not the right thing for this. Also, Borg exists to be able to change things. So if you don't like or can't live with a changing software, don't use it.

In #25, I made an entry in the documentation (the FAQ) that summarizes the points from #1 as:

borg intends to be:

  • simple:
    • as simple as possible, but no simpler
    • do the right thing by default, but offer options
  • open:
    • welcome feature requests
    • accept pull requests of good quality and coding style
    • give feedback on PRs that can't be accepted "as is"
    • discuss openly, don't work in the dark
  • changing:
    • do not break compatibility accidentally, without a good reason
      or without warning
    • borg is not backwards-compatible with attic
    • major versions may not be compatible with older releases

About the last point: i would like to put forward a proposal that will make borg backups compatible from major version X to X+1.

That is, we limit on-disk changes between major releases: those changes should live in a feature branch for a while, then be merged in a development branch, which eventually becomes the X+1 version. The X+1 version can read (and if necessary, convert) backups made with the X version. Then everyone upgrades to the new version and the X+2 version can drop compatibility shims.

So in other words, version X+1 can read and convert backups from version X, but not write them. X cannot read or write backups made with version > X. X+2 cannot read, write or convert backups from version X.
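
The proposed rule can be written down as two small predicates. This is only a sketch of the proposal as worded above; the function names are made up and nothing like this exists in the code.

```python
def can_read(tool_major: int, repo_major: int) -> bool:
    """Version X+1 reads (and converts) backups from X; X+2 no longer does.

    A tool never reads backups made by a newer major version.
    """
    return repo_major in (tool_major, tool_major - 1)


def can_write(tool_major: int, repo_major: int) -> bool:
    """A tool only writes backups in its own major format."""
    return repo_major == tool_major
```

So with the tool at major version 2, can_read(2, 1) is true (read and convert), can_write(2, 1) is false, and can_read(3, 1) is false.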

I would personally prefer that the format always be future-proof, so that you can restore really old backups without problems. It can be pretty difficult to get older software running on a newer platform ("oooh, this was written for Python 2.1, how cute!"), so I would strongly advocate keeping backwards compatibility forever. However, I know how hard this can be, so I am ready to concede this can be broken at times. This should be considered an extreme case and used only when really necessary, and we should bundle multiple changes into one break to avoid doing that too often.

I would therefore also suggest using semantic versioning for the version numbers: version numbers would be X.Y.Z, where X is the major number described above and Y.Z are the regular release numbers used most of the time.

In that way, borg would be attic 2.0.0-alpha.1 (and we simply skipped borg 1.0). note that this would give us the freedom to break compatibility until the golden "2.0.0" release while we put out alphas.

Borg readability and coding style

I am not really fond of the readability of attic's code. Variable names like "t0" and "st" are nice to write but don't make the code very readable, and they break PEP 0008 as they clearly aren't words:

lowercase with words separated by underscores as necessary to improve readability.

However, changing variable names should be discussed as it breaks code compatibility with attic, see #1

borg stalls forever after interrupted first backup

hi, i made a remote (encrypted) backup, which got interrupted by a jammed uplink. (borg stalled even after the uplink was ok again, and i needed to cancel the backup. should i open a separate issue that borg doesn't recover from a temporarily jammed uplink?)
Now, when i do borg list -v ssh://xxx@SERVER/var/backup/borg, it seems to stall forever without output.
What can i do to provide more details? afaik, there's no --debug flag to borg.

Have pretty error msgs without losing debug capability

When I try to push to an existing archive an exception is raised. A nice error message would be better.

Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/borg/archiver.py", line 857, in main
    exit_code = archiver.run(sys.argv[1:])
  File "/usr/lib/python3.4/site-packages/borg/archiver.py", line 813, in run
    return args.func(args)
  File "/usr/lib/python3.4/site-packages/borg/archiver.py", line 107, in do_create
    numeric_owner=args.numeric_owner, progress=args.progress)
  File "/usr/lib/python3.4/site-packages/borg/archive.py", line 148, in __init__
    raise self.AlreadyExists(name)
borg.archive.AlreadyExists: laptop-2015-06-17
borg: Error: Archive laptop-2015-06-17 already exists
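
One possible shape for this, as a sketch only: known errors get a one-line message, and the traceback is printed only when a debug switch is on. The class names, the run_cli helper and the EXAMPLE_DEBUG variable are all hypothetical, not borg's actual code.

```python
import os
import sys
import traceback


class Error(Exception):
    """Hypothetical base class for errors that carry a user-facing message."""


class ArchiveAlreadyExists(Error):
    def __init__(self, name):
        super().__init__("Archive %s already exists" % name)


def run_cli(func, argv, debug=None, stderr=sys.stderr):
    """Run func(argv) and turn known errors into a message plus an exit code."""
    if debug is None:
        # hypothetical environment variable, only for this sketch
        debug = os.environ.get("EXAMPLE_DEBUG") == "1"
    try:
        return func(argv)
    except Error as exc:
        if debug:
            # keep the full traceback available for bug reports
            traceback.print_exc(file=stderr)
        print("borg: Error: %s" % exc, file=stderr)
        return 1
```

Unexpected exceptions (anything that is not an Error) still propagate with a full traceback, so real bugs stay debuggable.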

💰 there is a bounty for this

directory content is only implicitly known

when storing a directory, only the directory name and some metadata are stored in the repo, but not the names it contains.

each file is stored with full path, thus using that, they are restored into the correct directory.

but, at extraction time, we can not process a list of directory contents and KNOW when we are finished with it. we just encounter the directory member names while extracting files, but we never know when we are finished with a directory.

if operating strictly sequentially, we can deduce when we are finished with a directory because we know the traversal mechanism that created the archive. but if that strict sequential processing is not available (e.g. due to multithreaded processing in future), we can not set the directory mtime until the very end of the extract operation, because there could always be another directory member coming...
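
A minimal sketch of the "set directory mtimes at the very end" fallback, assuming items can arrive in any order. The item tuple layout and the function name are invented for this example and do not reflect the real archive format.

```python
import os


def extract_items(items, dest):
    """items: iterable of (relative_path, kind, mtime_ns, data), in any order."""
    dir_mtimes = {}
    for rel_path, kind, mtime_ns, data in items:
        path = os.path.join(dest, rel_path)
        if kind == "dir":
            os.makedirs(path, exist_ok=True)
            # do not apply the mtime yet: more members may still arrive,
            # and creating them would bump the directory mtime again
            dir_mtimes[path] = mtime_ns
        else:
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
            os.utime(path, ns=(mtime_ns, mtime_ns))
    # all members are in place now, so directory mtimes can be restored
    for path, mtime_ns in dir_mtimes.items():
        os.utime(path, ns=(mtime_ns, mtime_ns))
```

The cost is that every directory mtime has to be kept in memory until the whole extraction is done.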

improve efficiency

Borg currently does not use the given resources (CPU and I/O) to their full potential.

Even with no or fast compression (and no or fast encryption), not even 1 cpu core is fully loaded, and at the same time there is I/O bandwidth left (e.g. source and destination on local SSDs).

For an incremental backup, this is no deal breaker (as long as the changes are relatively small), but for the first backup or if the changes are bigger, this is an issue.

IVs (nonces) generation

Current borg code uses AES in CTR mode, which requires an IV that never repeats.

That can be either a COUNTER (thus the "CTR" in the mode name) or a RANDOM value.

COUNTER: this is the way the current code works: load a counter from repo manifest, encrypt stuff (and keep incrementing the counter), at the end: save the counter to repo manifest.
Pro:
a) no birthday paradox
b) no collisions / repetition - IF correctly managed

Problems:
a) if a backup crashes and the manifest is not written, the next backup will reuse the same counter start value - and that is an encryption security issue.
b) if encryption code runs in parallel (e.g. threads), it needs management of the different counter start values for the different workers.

RANDOM: future way of doing things?

Pro:
a) no "management" of counter start values, no problem for parallel execution
b) no storage of highest counter needed

Problems:
a) needs good randomness source (we also need that for key generation already)
b) source needs to keep up with demand

Needs many random bits to avoid collisions ("birthday problem"), NIST recommends a fixed value + lots of random bits as IV.

Related idea: create a new encryption key per chunk and always start CTR from 0.

Related idea: create a new encryption key per backup (if multithreaded: per crypter worker thread) and start ctr from 0 in each backup (in each worker thread).
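
To get a feeling for "many random bits", here is the usual birthday approximation as a small calculation. It is a back-of-the-envelope estimate, not a security analysis; the function name is made up.

```python
import math


def collision_probability(n_nonces: int, random_bits: int) -> float:
    """Approximate chance that at least two of n random nonces are equal.

    Uses p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits)).
    """
    space = 2 ** random_bits
    x = n_nonces * (n_nonces - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x
    return -math.expm1(-x)


for bits in (64, 96, 128):
    print(bits, collision_probability(2 ** 32, bits))
```

With 2^32 random nonces, 64 random bits give a collision chance of roughly 39%, while 96 bits push it down to about 1e-10.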

https://www.bountysource.com/issues/16559386-ivs-nonces-generation

interesting hashes / macs / ciphers / checksums

https://github.com/Cyan4973/xxHash - not a cryptographic hash fn, not for HMAC! So, maybe we could use it as a crc32 replacement (if we keep the crc32(header+all_data) approach). borg uses xxh64 at some places

siphash - cryptographic hash fn (internally used by python >= 3.4), but: only 64bits return value. a 128bit version is "experimental".

libsodium has some hashes / macs also. but not yet widespread on linux dists.

last but not least: sha512-256 is faster on 64bit CPUs than sha256.

advanced sparse file support

See there for the basics: jborg/attic#256

The current state in borg is that it has simple sparse file support, meaning that it does nothing special on "create", but offers the option to deal with all-zero chunks in 2 ways at "extract" time: a) write zeros to disk (default) b) just "seek" in the output file, creating a hole in a sparse file (--sparse).
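
Option b) can be sketched like this. It is an illustration of the idea only, with a made-up function name, not the actual extract code.

```python
def write_chunks_sparse(chunks, path):
    """Write chunks to path, seeking over all-zero chunks instead of writing them."""
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk.count(0) == len(chunk):
                # all-zero chunk: move the file position forward, leaving a hole
                # (if the filesystem supports sparse files)
                f.seek(len(chunk), 1)  # 1 == os.SEEK_CUR
            else:
                f.write(chunk)
        # if the file ends with a hole, the seek alone does not extend it;
        # truncate() sets the file size to the current position
        f.truncate()
```

The logical content is the same as with option a); only the on-disk allocation differs. Note that holes produced this way follow chunk boundaries, not the original file's hole layout.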

See the original attic ticket: while this always works correctly from a logical file content perspective, it is not extracting the data and hole sections of files exactly as they were when the archive was made.

Precise reproduction would need saving the type (data vs hole), length and in case of data, the binary data for each section in the file. SEEK_DATA and SEEK_HOLE support this.

Attic (and current Borg) just has a simple stream of binary file contents and as there is no type/length yet, it can't be added in a compatible way on the file contents level.

Update: I put a bounty on this. It is for implementing sparse file support at archive creation time (seeking over holes [not reading them as zeros], storing which parts of a file are holes / contain data) and reproducing sparse files hole/data layout precisely at archive extraction time.


💰 there is a bounty for this

try / document borg + ntfsclone

could be useful to clone windows systems (with ntfs filesystems)?
due to ntfsclone, would only save allocated blocks.

try it and document your results.

decompress, dedup, store, load, reassemble, recompress

JS had a crazy (and maybe not easy to implement) idea I just wanted to keep here:

the backup tool could recognize some popular compression formats and decompress them before running the data through the deduplication and storing them into repository. this would vastly increase chances for deduplication. e.g. .tar.gz / bz2 / xz if most files in the tar are the same or similar to stuff we already processed.

at restore time, it would have to reassemble / recompress the original file so that we get back a file that is identical to the original file.

cache rebuilds if same repo is used by multiple machines

if one backs up multiple machines to the same repo (which is what one tends to do if the machines share a lot of similar files, e.g. the same operating system), a backup on one machine invalidates the cache on the other and triggers a cache rebuild that analyzes all the archives in the repo - which is quite time consuming.

find out if there is a more efficient way to deal with this.

"Borg help patterns" missing or hard to find

The documentation usage page:
https://borgbackup.github.io/borgbackup/usage.html
has multiple instances of the following sentence:
"See “borg help patterns” for more help on exclude patterns."

The phrase/term itself has no hyperlink (which would be the best solution) and searching the page or the left table of contents shows no result or way to get to this documentation. Maybe it is right in front of me and I'm missing it somehow?

use py.test for testing, refactor tests

pytest is way superior (and also more comfortable) compared to unittest.

less work to write tests, prettier / easier readable tests, more useful output (esp. on failure).

the existing tests can be taken "as is" as a starting point, pytest is able to run them.

later, more pytest specific features can be used to further improve.

be careful!

borg.selftest is special because it must not depend on pytest. it runs every time borg is invoked by a user.

see there: #6157 (comment)

These must stay unittest-based and must not be converted to pytest:

from .testsuite.hashindex import HashIndexDataTestCase, HashIndexRefcountingTestCase, HashIndexTestCase
from .testsuite.crypto import CryptoTestCase
from .testsuite.chunker import ChunkerTestCase

SELFTEST_CASES = [
    HashIndexDataTestCase,
    HashIndexRefcountingTestCase,
    HashIndexTestCase,
    CryptoTestCase,
    ChunkerTestCase,
]

return codes

how shall we deal with return codes of borg?

currently it is:
0 == no error, normal termination
1 == some error

this could be way more informative. :)
0 == no error, normal termination
1 == some error that hasn't been assigned a better rc yet
2... == other separately identifiable conditions
128+N == killed by signal N
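
As a sketch, the scheme above could be expressed like this. The PARTIAL value is purely hypothetical, it is just one way to fill the "2..." slot.

```python
import signal
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0   # no error, normal termination
    ERROR = 1     # some error that has no better rc yet
    PARTIAL = 2   # hypothetical: some files could not be processed, rest ok


def exit_code_for_signal(signum):
    """Shell convention: a process killed by signal N is reported as 128+N."""
    return 128 + int(signum)


print(exit_code_for_signal(signal.SIGINT))
```

On Unix, Python also exposes the sysexits.h values as os.EX_USAGE, os.EX_NOPERM and so on, in case those turn out to be useful.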

what I am asking myself:

are there standard return codes for some stuff or does everybody invent them individually?
update: i found sysexits.h - maybe better than nothing.

what rc do we use if some files could not get processed (not found, permissions problems, ...), but the rest was ok?


💰 there is a bounty for this

use siphash?

cryptographic fast hashing algorithm internally used by python >= 3.4.

but: only 64bits return value. a 128bit version is "experimental".

platform / os testers wanted

anything except 64bit linux. enter your hardware and OS platform if you'd like to regularly help with testing.

check py3 style

Some places use python2 syntax / style, refactor them:
super(...)
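
For reference, the two spellings side by side (class names are made up for the example):

```python
class Base:
    def __init__(self, name):
        self.name = name


class OldStyle(Base):
    def __init__(self, name):
        # python 2 compatible spelling: class and instance given explicitly
        super(OldStyle, self).__init__(name)


class NewStyle(Base):
    def __init__(self, name):
        # python 3 only: the arguments are filled in automatically
        super().__init__(name)
```

Both behave the same on python 3; the short form just avoids repeating the class name.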

Use pylint / pyflakes / pep8 checker...

What Microsoft did with 'hotspot' chunks

As Attic / Borg grow and look at new ideas I wanted to mention something Microsoft did with their take on deduplication:
https://technet.microsoft.com/en-us/library/hh831434.aspx

Essentially they took the unique step (from what I've seen) of deciding that any data chunk which is referenced over 100 times (by being found in multiple files or multiple versions of files) will be stored more than once in the filesystem/repository/whatever.

I've seen other backup programs (bup?) which, instead, will use PAR as an add-on command/utility to accomplish something similar...meaning that if you have a deduping backup you need to make sure (and DARN sure when it's a 'hotspot') that you don't lose any bits at any time in history.

Common responses are "use a file system like XFS" to maintain integrity of the repository, etc. Choice of OS or device or system is often restricted or not open as a solution. I also think microsoft often manages (polar criticisms aside) to at times take a new look at something and find an improvement or new angle.

So...It might be interesting to consider taking 'hotspot' chunks whose loss would clearly destroy a large amount of backed up data and keep multiple records of them. "attic check" would check these. Perhaps a check would also happen during extraction.

different magics, accept/convert attic repos

Just wanted to write something about this change, that makes current borg code incompatible to attic repos (although currently the code likely would still work on them).

Magic strings of the repo files are BORG* now (not ATTIC*) and also the local cache path is .cache/borg/... now (not .cache/attic/...)
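
A tiny sketch of how a tool could refuse foreign files based on the magic prefix. Only the BORG / ATTIC prefixes are taken from the text above; the complete magic strings are not given here, and the function name is invented.

```python
def classify_by_magic(header: bytes) -> str:
    """Guess which tool wrote a file from the first bytes of its content."""
    if header.startswith(b"BORG"):
        return "borg"
    if header.startswith(b"ATTIC"):
        return "attic"
    return "unknown"
```

A tool would then only touch files it classifies as its own and stop with an error for everything else.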

See there: 159315e

The reason for this is to avoid interference. At any time, there might be some little change in either attic or borg that makes them incompatible in one or both directions. If both would use the same magic and you would use borg to access attic repos or vice versa (maybe even accidentally), it could just blow up or even permanently damage the repo files. In the worst case, you could not access the repo with either tool any more.


💰 there is a bounty for this

PBKDF2 iterations need to be far higher

I first ran into this issue investigating arq's crypto here: arqbackup/arq_restore#7

Currently we have iterations = 100000 (see key.py), which is a pretty low number. According to my tests with a calibration function, that number should be 7.8 to 78 times higher.

I would suggest immediately raising the number of iterations to around 1'000'000 and giving an option to raise it to 100'000'000, and then storing that generated key somewhere so people don't have to wait 7s for every borg command to execute.

I made a bit of test code to figure out what the number of iterations should be on a macbook pro 13" (2.6 GHz Intel Core i5) :

#import <CommonCrypto/CommonCrypto.h>

- (void) testSuite
{
    [self testForMsec:500];
    [self testForMsec:1000];
    [self testForMsec:2000];
    [self testForMsec:5000];
}

- (void) testForMsec:(uint32_t)msec
{
    NSString* strongPassword = @"I make aodso f  sadfoijo###ijfoj oiwej foawejf oiawe 28";
    NSString* password = @"weakPassword393";
    NSLog(@"msec: %u",msec);
    NSLog(@"strong:\t%u",[self roundsForPassword:strongPassword forMilliseconds:msec]);
    NSLog(@"weak:\t\t%u",[self roundsForPassword:password forMilliseconds:msec]);
}

- (uint) roundsForPassword:(NSString*)password forMilliseconds:(uint32_t)msec
{
    NSString* saltStr = @"01bd79c7219926ecad1216a224ee0fe77c82a3ea4addb7a18ad12009166d0e1e"; //a repo id
    NSData* salt = [saltStr dataUsingEncoding:NSUnicodeStringEncoding];
    uint rounds = CCCalibratePBKDF(kCCPBKDF2,
                                   [password length],
                                   [salt length],
                                   kCCPRFHmacAlgSHA256,
                                   CC_SHA256_DIGEST_LENGTH,
                                   msec);
    return rounds;
}

I got these results:

 msec: 500
 strong:    781250
 weak:      769230
 msec: 1000
 strong:    1587301
 weak:      1515151
 msec: 2000
 strong:    3030303
 weak:      3174603
 msec: 5000
 strong:    7812500
 weak:      7812500
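
A comparable calibration can be sketched in Python with hashlib.pbkdf2_hmac (python >= 3.4). This is not the CommonCrypto function used above, just a simple "time a probe run and extrapolate" estimate; the function name, probe size and salt are assumptions, and the numbers will differ per machine and implementation.

```python
import hashlib
import time


def calibrate_pbkdf2(target_msec, probe_iterations=20000):
    """Estimate how many PBKDF2-HMAC-SHA256 iterations fit into target_msec."""
    password = b"weakPassword393"
    salt = b"\x01" * 32  # stand-in for a repo id
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", password, salt, probe_iterations, dklen=32)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    # the cost is linear in the iteration count, so scale the probe result
    return max(1, int(probe_iterations * (target_msec / 1000.0) / elapsed))


for msec in (500, 1000, 2000, 5000):
    print(msec, calibrate_pbkdf2(msec))
```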

streams, resource forks, other unusual file contents

NTFS supports "streams" - multiple content streams associated with one file(name).

Mac OS X supports "resource forks", a similar mechanism.

If some windows / OS X users could comment on the importance of these mechanisms and of their support in a backup software, it would be very useful.

int32 chunk reference counter, is it a problem?

I wonder whether that counter could overflow.

It seems like Cython would raise an OverflowError when converting a too-long python integer to a 32 bit (C) int, so at least we would notice.

The smallest chunk attic can create seems to be 1kiB. So it seems the worst (and rather unlikely) case is an overflow if you have 2 TiB data made from 2^31 repetitions of that 1kiB chunk.
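
The arithmetic behind the 2 TiB figure, spelled out (assuming a signed 32 bit counter and the 1 kiB minimum chunk size mentioned above):

```python
INT32_MAX = 2 ** 31 - 1   # largest value of a signed 32 bit counter
MIN_CHUNK_SIZE = 1024     # 1 kiB, smallest chunk according to the text above

# the counter overflows at the (INT32_MAX + 1)-th reference to the same chunk
references_to_overflow = INT32_MAX + 1
bytes_to_overflow = references_to_overflow * MIN_CHUNK_SIZE

TiB = 2 ** 40
print(bytes_to_overflow / TiB)
```

With larger chunks the amount of identical data needed for an overflow grows proportionally.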

Is there any frequent chunk so this could be a problem in practice for multi-TB backups?

"Analyze" function to find (and remove) missed non-dedupable temp/cache hotspots

My previous issue post was to ask if a 'delete' command modification is possible to remove individual files or directories from within one or more archives (or entire repository). The feature discussed below is a method of finding non-dedupable 'hotspots' in backups (which would typically be missed/hidden cache or temp files) then deleting them to reclaim space.

I suggest consideration of a command such as "analyze" working on a repository level (or multiple archives..the more the better). This command would look for two things:

  1. Files (of fixed name and directory location) which, over multiple backups, have an extremely high non-dedupable ratio of data vs their size.

  2. Directories (of fixed name and location) which, over multiple backups, have a very high ratio of non-dedupable data vs their size.

You can see that such a scan/analyze will immediately reveal accidentally missed swap files, temp files, and temp directories. An administrator can use this command to search for (and upon further analysis) find and delete this data.

In the first case (1), if the file name and location stay the same between archives and yet the file keeps changing, so that every backup adds a massive amount of new data chunks for it, then you've almost certainly found some sort of temp file whose deletion from the backup will reclaim a large amount of space. For example, on backups of windows machines this test case would flag "pagefile.sys" (the windows swap file) as a huge red flag. Note that it isn't in a 'cache' directory and doesn't have a .TMP extension...yet this file is not necessary to back up, and its exclusion (or deletion post-backup with a 'delete' command) would allow massive size savings.

Case (2) is where you have temp files whose names keep changing randomly (so case (1) won't work) but whose location doesn't change. This would find hotspots like "c:\windows\temp"...again something that could be deleted and reclaimed from a backup database. (In this case the directory is clearly labeled 'temp', but this was just the first example I could think of. There are multiple instances on computers of temp directories using random file names which aren't immediately noticed by looking at their name.)
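
Both cases could be computed along these lines. This is a sketch under simplifying assumptions: each archive is given as a mapping from path to a list of (chunk_id, size) pairs, paths use "/" as separator, and the function name is made up.

```python
import posixpath
from collections import defaultdict


def new_data_ratio(archives, by_directory=False):
    """archives: list, oldest first, of {path: [(chunk_id, size), ...]}.

    Returns {key: new_bytes / total_bytes}. A chunk counts as "new" the
    first time its id shows up anywhere in the sequence of archives.
    key is the file path (case 1) or its directory (case 2).
    """
    seen = set()
    new_bytes = defaultdict(int)
    total_bytes = defaultdict(int)
    for archive in archives:
        for path, chunks in archive.items():
            key = posixpath.dirname(path) if by_directory else path
            for chunk_id, size in chunks:
                total_bytes[key] += size
                if chunk_id not in seen:
                    seen.add(chunk_id)
                    new_bytes[key] += size
    return {key: new_bytes[key] / total
            for key, total in total_bytes.items() if total}
```

A ratio close to 1.0 over many archives means almost nothing was deduplicated for that file or directory. A real "analyze" would also need a minimum number of archives and a minimum size before flagging anything, since a file that appears only once trivially has a ratio of 1.0.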

The analyze command specific parameters would need some testing to determine what to display and how to calc/display it. And any results would require further manual inspection before going off and deleting things obviously. But such a feature would do a good job of highlighting missed hotspots in large or complex backups.

Thoughts?

Dealing with attic issues

Here is a list of all open issues in attic, acquired with this dirty python script. I suggest we go through all of them and tick them off when fixed.

Done / tracked here / invalid / out-of-scope:

make compression a little more flexible

borg currently has hardcoded zlib level 6 compression (same as attic).

this is a throughput bottleneck if your I/O is rather fast (and you do not use encryption or your encryption is also fast, e.g. due to AES-NI or fast cpu).

i already made compression very flexible (offering different and better algorithms, different compression levels, very high speed for some), but it requires bigger code and format changes and thus currently lives in the experimental branch.

but there could be a rather simple change for 0.x.x releases, that doesn't change any storage format:
just making it possible to adjust the zlib compression level (but ALWAYS use zlib, even for 0 compression).
one could even read these repos with older borg versions as uncompress works on any zlib compressed data, no matter what the compression level was.

compression of a 100MB text file
================================

algo   time ratio (compressed/uncompressed)
--------------------------------------
zlib0  0.64 1.000
zlib1  2.18 0.505
zlib2  2.41 0.498
zlib3  3.02 0.491
zlib4  3.52 0.470
zlib5  4.79 0.457
zlib6  6.81 0.456 (borg / attic hardcoded default)
zlib7  8.23 0.456
zlib8 23.49 0.456
zlib9 30.91 0.456

I think especially level 0 and 1 would be interesting if one wants more speed (and either does not care for backup size [-> level 0] or can also live with slightly less compression than level 6 [-> level 1]).
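
A quick way to see both points (the level only affects the compressor, and decompression does not need to know it), using the stdlib zlib module. The sample data and the helper name are made up, and the ratios will not match the table above.

```python
import zlib


def compression_ratios(data, levels=(0, 1, 6, 9)):
    """Return {level: compressed_size / uncompressed_size} for zlib levels."""
    ratios = {}
    for level in levels:
        packed = zlib.compress(data, level)
        # decompress() takes no level argument: any zlib stream can be read
        assert zlib.decompress(packed) == data
        ratios[level] = len(packed) / len(data)
    return ratios


sample = b"some fairly repetitive text for the example\n" * 5000
for level, ratio in compression_ratios(sample).items():
    print(level, round(ratio, 3))
```

Level 0 still produces a valid zlib stream (stored blocks plus a small header and checksum), so its output is slightly larger than the input.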

I'd just add a compression level argument to borg create, so one can adjust the compression level on a per-backup basis.

installation by pip3 fails due to missing header

Debian 8.1, running on armv7. It seems that GCC can't find pyconfig.h. Let me know what other information you need.

By the way, I am not well-versed in python package management. I could be making a very basic error.

$ pip3 install borgbackup 
Downloading/unpacking borgbackup
  Getting page https://pypi.python.org/simple/borgbackup/
  URLs to search for versions for borgbackup:
  * https://pypi.python.org/simple/borgbackup/
  Analyzing links from page https://pypi.python.org/simple/borgbackup/
    Found link https://pypi.python.org/packages/source/b/borgbackup/borgbackup-0.23.0.tar.gz#md5=77fba21ce1d2bdedd7945f2f58ee1b4f (from https://pypi.python.org/simple/borgbackup/), version: 0.23.0
  Downloading from URL https://pypi.python.org/packages/source/b/borgbackup/borgbackup-0.23.0.tar.gz#md5=77fba21ce1d2bdedd7945f2f58ee1b4f (from https://pypi.python.org/simple/borgbackup/)
  Running setup.py (path:/tmp/pip-build-1n22fxxk/borgbackup/setup.py) egg_info for package borgbackup
    running egg_info
    creating pip-egg-info/borgbackup.egg-info
    writing requirements to pip-egg-info/borgbackup.egg-info/requires.txt
    writing pip-egg-info/borgbackup.egg-info/PKG-INFO
    writing dependency_links to pip-egg-info/borgbackup.egg-info/dependency_links.txt
    writing top-level names to pip-egg-info/borgbackup.egg-info/top_level.txt
    writing manifest file 'pip-egg-info/borgbackup.egg-info/SOURCES.txt'
    warning: manifest_maker: standard file '-c' not found

    reading manifest file 'pip-egg-info/borgbackup.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no previously-included files matching '*.pyc' found under directory 'docs'
    warning: no previously-included files matching '*.pyo' found under directory 'docs'
    no previously-included directories found matching 'docs/_build'
    writing manifest file 'pip-egg-info/borgbackup.egg-info/SOURCES.txt'
  Source in /tmp/pip-build-1n22fxxk/borgbackup has version 0.23.0, which satisfies requirement borgbackup
Downloading/unpacking msgpack-python>=0.4.6 (from borgbackup)
  Getting page https://pypi.python.org/simple/msgpack-python/
  URLs to search for versions for msgpack-python>=0.4.6 (from borgbackup):
  * https://pypi.python.org/simple/msgpack-python/
  Analyzing links from page https://pypi.python.org/simple/msgpack-python/
    Skipping link https://pypi.python.org/packages/2.7/m/msgpack-python/msgpack-python-0.3.0.win-amd64-py2.7.exe#md5=d482b1169ac997c1d34b817d3c85ccbd (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .exe
    Skipping link https://pypi.python.org/packages/2.7/m/msgpack-python/msgpack-python-0.3.0.win32-py2.7.exe#md5=25d5b77a7dcb6f4ee655a6f5d88313b6 (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .exe
    Skipping link https://pypi.python.org/packages/2.7/m/msgpack-python/msgpack_python-0.2.0-py2.7-win-amd64.egg#md5=a8d0d3ce2b02fdc0b053e835c14726c5 (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping link https://pypi.python.org/packages/2.7/m/msgpack-python/msgpack_python-0.2.0-py2.7-win32.egg#md5=d52bd856ca8c8d9a6ee86937e1b4c644 (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping link https://pypi.python.org/packages/2.7/m/msgpack-python/msgpack_python-0.3.0-py2.7-win-amd64.egg#md5=b2015ad4316ddbd8688693542769413d (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping link https://pypi.python.org/packages/2.7/m/msgpack-python/msgpack_python-0.3.0-py2.7-win32.egg#md5=d9fee3a6bb8aec510dbec5a55fbc3d16 (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping link https://pypi.python.org/packages/3.2/m/msgpack-python/msgpack_python-0.2.0-py3.2-win-amd64.egg#md5=4ac6e179b3dfe9a919e98e72fbe62965 (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping link https://pypi.python.org/packages/3.2/m/msgpack-python/msgpack_python-0.2.0-py3.2-win32.egg#md5=985a37940d2bb87637f851e614f3b96e (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping link https://pypi.python.org/packages/3.3/m/msgpack-python/msgpack-python-0.3.0.win-amd64-py3.3.exe#md5=327350979a89422556a94902fa982885 (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .exe
    Skipping link https://pypi.python.org/packages/3.3/m/msgpack-python/msgpack-python-0.3.0.win32-py3.3.exe#md5=7df5d4b72621f5c5f5a40177f953fa94 (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .exe
    Skipping link https://pypi.python.org/packages/3.3/m/msgpack-python/msgpack_python-0.2.3-py3.3-win-amd64.egg#md5=ff6fbd6170874140f76d9ac077ec4fef (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping link https://pypi.python.org/packages/3.3/m/msgpack-python/msgpack_python-0.2.3-py3.3-win32.egg#md5=011a6114a3f377dd4d1869d3237cf935 (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping link https://pypi.python.org/packages/3.3/m/msgpack-python/msgpack_python-0.3.0-py3.3-win-amd64.egg#md5=9bcfca39afa8219d12048f97a5d2a2df (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping link https://pypi.python.org/packages/3.3/m/msgpack-python/msgpack_python-0.3.0-py3.3-win32.egg#md5=b7032156929c5f424e583ec97a11521b (from https://pypi.python.org/simple/msgpack-python/); unknown archive format: .egg
    Skipping https://pypi.python.org/packages/3.4/m/msgpack-python/msgpack_python-0.4.2-cp34-none-win32.whl#md5=dda44ef5bd9dc0458fc3805507526e2b (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Skipping https://pypi.python.org/packages/3.4/m/msgpack-python/msgpack_python-0.4.2-cp34-none-win_amd64.whl#md5=dcb0ee896c2f1ea5c09b1af8b3bb8901 (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Skipping https://pypi.python.org/packages/cp27/m/msgpack-python/msgpack_python-0.4.3-cp27-none-win32.whl#md5=7d91f8e5e40bd6823a7477ce24670363 (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Skipping https://pypi.python.org/packages/cp27/m/msgpack-python/msgpack_python-0.4.3-cp27-none-win_amd64.whl#md5=37ab6db5247b2f8730615cd59fa6cb33 (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Skipping https://pypi.python.org/packages/cp27/m/msgpack-python/msgpack_python-0.4.6-cp27-none-win32.whl#md5=f1ddd501644867049836225ecc4b1198 (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Skipping https://pypi.python.org/packages/cp27/m/msgpack-python/msgpack_python-0.4.6-cp27-none-win_amd64.whl#md5=73b9b22edd0a42637afb5963585b8d5e (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Skipping https://pypi.python.org/packages/cp34/m/msgpack-python/msgpack_python-0.4.3-cp34-none-win32.whl#md5=07c2c9bbe330b7515a8f0b4316c2d03e (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Skipping https://pypi.python.org/packages/cp34/m/msgpack-python/msgpack_python-0.4.3-cp34-none-win_amd64.whl#md5=cb936061a79dee07657bf2eaaa7179fc (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Skipping https://pypi.python.org/packages/cp34/m/msgpack-python/msgpack_python-0.4.6-cp34-none-win32.whl#md5=d95ea9552d5d7767bc1c017120abeee3 (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Skipping https://pypi.python.org/packages/cp34/m/msgpack-python/msgpack_python-0.4.6-cp34-none-win_amd64.whl#md5=0947fb08fc6d2116220e7e93424464b7 (from https://pypi.python.org/simple/msgpack-python/) because it is not compatible with this Python
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.10.tar.gz#md5=a31f16d20ea8ec79cc8cba1103f951d8 (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.10
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.11.tar.gz#md5=900cfc1b085c0a4058bc67fae617415a (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.11
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.12.tar.gz#md5=121a203e961b566f2039f527f3556a5d (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.12
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.13.tar.gz#md5=a6781bf2b670963c0ff1316976e66b14 (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.13
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.3.tar.gz#md5=d9b486424706ba422ac1199898829263 (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.3
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.4.tar.gz#md5=f50410aca1ef48cd2181f9885feaf3d2 (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.4
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.6.tar.gz#md5=d4c9e0c6d03542659e8980cc011cb32e (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.6
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.7.tar.gz#md5=95750dae8f4ee2a365fd548fd5308908 (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.7
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.8.tar.gz#md5=bd1e2d8a755b38a808a6c8edbd6c32ba (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.8
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.9.tar.gz#md5=fea360812fd4bd485c07b03239f1ddd0 (from https://pypi.python.org/simple/msgpack-python/), version: 0.1.9
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.0.tar.gz#md5=cdac1d250cf9c0f0bd36abdfe2c96f8b (from https://pypi.python.org/simple/msgpack-python/), version: 0.2.0
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.1.tar.gz#md5=dbaf026487da6c5302c51a715e36a4e0 (from https://pypi.python.org/simple/msgpack-python/), version: 0.2.1
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.2.tar.gz#md5=5a74289d5c57ec52b54bed440453e5e9 (from https://pypi.python.org/simple/msgpack-python/), version: 0.2.2
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.3.tar.gz#md5=5d6b1c6b2f3dc7dc514f14a67ad75cec (from https://pypi.python.org/simple/msgpack-python/), version: 0.2.3
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.4.tar.gz#md5=c4bb313cd35b57319f588491b1614289 (from https://pypi.python.org/simple/msgpack-python/), version: 0.2.4
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.3.0.tar.gz#md5=10dec96c90992b0f6e38bdf0cc5a8e79 (from https://pypi.python.org/simple/msgpack-python/), version: 0.3.0
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.0.tar.gz#md5=8b9ce43619fd1428bf7baddf57e38d1a (from https://pypi.python.org/simple/msgpack-python/), version: 0.4.0
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.1.tar.gz#md5=3ff478e75e783f4e69c1a8d5ca63dea4 (from https://pypi.python.org/simple/msgpack-python/), version: 0.4.1
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.2.tar.gz#md5=e3a0fdfd864c72c958bb501d39b39caf (from https://pypi.python.org/simple/msgpack-python/), version: 0.4.2
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.3.tar.gz#md5=f3cc76a0653bffa19bf2b359783ad8a9 (from https://pypi.python.org/simple/msgpack-python/), version: 0.4.3
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.4.tar.gz#md5=86187bcd95b01753a5975424fe42ca81 (from https://pypi.python.org/simple/msgpack-python/), version: 0.4.4
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.5.tar.gz#md5=3b82bc542d5599896695512e7c32f42d (from https://pypi.python.org/simple/msgpack-python/), version: 0.4.5
    Found link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.6.tar.gz#md5=8b317669314cf1bc881716cccdaccb30 (from https://pypi.python.org/simple/msgpack-python/), version: 0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.10.tar.gz#md5=a31f16d20ea8ec79cc8cba1103f951d8 (from https://pypi.python.org/simple/msgpack-python/), version 0.1.10 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.11.tar.gz#md5=900cfc1b085c0a4058bc67fae617415a (from https://pypi.python.org/simple/msgpack-python/), version 0.1.11 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.12.tar.gz#md5=121a203e961b566f2039f527f3556a5d (from https://pypi.python.org/simple/msgpack-python/), version 0.1.12 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.13.tar.gz#md5=a6781bf2b670963c0ff1316976e66b14 (from https://pypi.python.org/simple/msgpack-python/), version 0.1.13 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.3.tar.gz#md5=d9b486424706ba422ac1199898829263 (from https://pypi.python.org/simple/msgpack-python/), version 0.1.3 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.4.tar.gz#md5=f50410aca1ef48cd2181f9885feaf3d2 (from https://pypi.python.org/simple/msgpack-python/), version 0.1.4 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.6.tar.gz#md5=d4c9e0c6d03542659e8980cc011cb32e (from https://pypi.python.org/simple/msgpack-python/), version 0.1.6 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.7.tar.gz#md5=95750dae8f4ee2a365fd548fd5308908 (from https://pypi.python.org/simple/msgpack-python/), version 0.1.7 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.8.tar.gz#md5=bd1e2d8a755b38a808a6c8edbd6c32ba (from https://pypi.python.org/simple/msgpack-python/), version 0.1.8 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.1.9.tar.gz#md5=fea360812fd4bd485c07b03239f1ddd0 (from https://pypi.python.org/simple/msgpack-python/), version 0.1.9 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.0.tar.gz#md5=cdac1d250cf9c0f0bd36abdfe2c96f8b (from https://pypi.python.org/simple/msgpack-python/), version 0.2.0 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.1.tar.gz#md5=dbaf026487da6c5302c51a715e36a4e0 (from https://pypi.python.org/simple/msgpack-python/), version 0.2.1 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.2.tar.gz#md5=5a74289d5c57ec52b54bed440453e5e9 (from https://pypi.python.org/simple/msgpack-python/), version 0.2.2 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.3.tar.gz#md5=5d6b1c6b2f3dc7dc514f14a67ad75cec (from https://pypi.python.org/simple/msgpack-python/), version 0.2.3 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.4.tar.gz#md5=c4bb313cd35b57319f588491b1614289 (from https://pypi.python.org/simple/msgpack-python/), version 0.2.4 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.3.0.tar.gz#md5=10dec96c90992b0f6e38bdf0cc5a8e79 (from https://pypi.python.org/simple/msgpack-python/), version 0.3.0 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.0.tar.gz#md5=8b9ce43619fd1428bf7baddf57e38d1a (from https://pypi.python.org/simple/msgpack-python/), version 0.4.0 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.1.tar.gz#md5=3ff478e75e783f4e69c1a8d5ca63dea4 (from https://pypi.python.org/simple/msgpack-python/), version 0.4.1 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.2.tar.gz#md5=e3a0fdfd864c72c958bb501d39b39caf (from https://pypi.python.org/simple/msgpack-python/), version 0.4.2 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.3.tar.gz#md5=f3cc76a0653bffa19bf2b359783ad8a9 (from https://pypi.python.org/simple/msgpack-python/), version 0.4.3 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.4.tar.gz#md5=86187bcd95b01753a5975424fe42ca81 (from https://pypi.python.org/simple/msgpack-python/), version 0.4.4 doesn't match >=0.4.6
  Ignoring link https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.5.tar.gz#md5=3b82bc542d5599896695512e7c32f42d (from https://pypi.python.org/simple/msgpack-python/), version 0.4.5 doesn't match >=0.4.6
  Downloading from URL https://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.4.6.tar.gz#md5=8b317669314cf1bc881716cccdaccb30 (from https://pypi.python.org/simple/msgpack-python/)
  Running setup.py (path:/tmp/pip-build-1n22fxxk/msgpack-python/setup.py) egg_info for package msgpack-python
    running egg_info
    creating pip-egg-info/msgpack_python.egg-info
    writing pip-egg-info/msgpack_python.egg-info/PKG-INFO
    writing dependency_links to pip-egg-info/msgpack_python.egg-info/dependency_links.txt
    writing top-level names to pip-egg-info/msgpack_python.egg-info/top_level.txt
    writing manifest file 'pip-egg-info/msgpack_python.egg-info/SOURCES.txt'
    warning: manifest_maker: standard file '-c' not found

    reading manifest file 'pip-egg-info/msgpack_python.egg-info/SOURCES.txt'
    writing manifest file 'pip-egg-info/msgpack_python.egg-info/SOURCES.txt'
  Source in /tmp/pip-build-1n22fxxk/msgpack-python has version 0.4.6, which satisfies requirement msgpack-python>=0.4.6 (from borgbackup)
Installing collected packages: borgbackup, msgpack-python
  Running setup.py install for borgbackup
    Running command /usr/bin/python3 -c "import setuptools, tokenize;__file__='/tmp/pip-build-1n22fxxk/borgbackup/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-xrel06ah-record/install-record.txt --single-version-externally-managed --compile
    running install
    running build
    got version from file /tmp/pip-build-1n22fxxk/borgbackup/borg/_version.py {'version': '0.23.0', 'full': 'c7da105fd0a75eec0454762df1dcc84510a2d813'}
    running build_py
    creating build
    creating build/lib.linux-armv7l-3.4
    creating build/lib.linux-armv7l-3.4/borg
    copying borg/xattr.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/archive.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/remote.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/cache.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/key.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/fuse.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/lrucache.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/archiver.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/__init__.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/helpers.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/repository.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/platform.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/_version.py -> build/lib.linux-armv7l-3.4/borg
    creating build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/xattr.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/archive.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/key.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/run.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/lrucache.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/crypto.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/archiver.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/__init__.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/helpers.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/repository.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/platform.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/chunker.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/hashindex.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/mock.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    running build_ext
    building 'borg.crypto' extension
    creating build/temp.linux-armv7l-3.4
    creating build/temp.linux-armv7l-3.4/borg
    arm-linux-gnueabihf-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include -I/usr/include/python3.4m -c borg/crypto.c -o build/temp.linux-armv7l-3.4/borg/crypto.o
    borg/crypto.c:8:22: fatal error: pyconfig.h: No such file or directory
     #include "pyconfig.h"
                          ^
    compilation terminated.
    error: command 'arm-linux-gnueabihf-gcc' failed with exit status 1
    Complete output from command /usr/bin/python3 -c "import setuptools, tokenize;__file__='/tmp/pip-build-1n22fxxk/borgbackup/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-xrel06ah-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    got version from file /tmp/pip-build-1n22fxxk/borgbackup/borg/_version.py {'version': '0.23.0', 'full': 'c7da105fd0a75eec0454762df1dcc84510a2d813'}
    running build_py
    creating build
    creating build/lib.linux-armv7l-3.4
    creating build/lib.linux-armv7l-3.4/borg
    copying borg/xattr.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/archive.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/remote.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/cache.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/key.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/fuse.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/lrucache.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/archiver.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/__init__.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/helpers.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/repository.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/platform.py -> build/lib.linux-armv7l-3.4/borg
    copying borg/_version.py -> build/lib.linux-armv7l-3.4/borg
    creating build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/xattr.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/archive.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/key.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/run.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/lrucache.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/crypto.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/archiver.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/__init__.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/helpers.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/repository.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/platform.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/chunker.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/hashindex.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    copying borg/testsuite/mock.py -> build/lib.linux-armv7l-3.4/borg/testsuite
    running build_ext
    building 'borg.crypto' extension
    creating build/temp.linux-armv7l-3.4
    creating build/temp.linux-armv7l-3.4/borg
    arm-linux-gnueabihf-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include -I/usr/include/python3.4m -c borg/crypto.c -o build/temp.linux-armv7l-3.4/borg/crypto.o
    borg/crypto.c:8:22: fatal error: pyconfig.h: No such file or directory
     #include "pyconfig.h"
                          ^
    compilation terminated.
    error: command 'arm-linux-gnueabihf-gcc' failed with exit status 1

----------------------------------------
Cleaning up...
Command /usr/bin/python3 -c "import setuptools, tokenize;__file__='/tmp/pip-build-1n22fxxk/borgbackup/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-xrel06ah-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-1n22fxxk/borgbackup
Exception information:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 295, in run
    requirement_set.install(install_options, global_options, root=options.root_path)
  File "/usr/lib/python3/dist-packages/pip/req.py", line 1436, in install
    requirement.install(install_options, global_options, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pip/req.py", line 707, in install
    cwd=self.source_dir, filter_stdout=self._filter_install, show_stdout=False)
  File "/usr/lib/python3/dist-packages/pip/util.py", line 716, in call_subprocess
    % (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command /usr/bin/python3 -c "import setuptools, tokenize;__file__='/tmp/pip-build-1n22fxxk/borgbackup/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-xrel06ah-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-1n22fxxk/borgbackup
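
A quick way to check whether the interpreter's C headers are installed at all is to ask Python where it expects pyconfig.h and see if the file exists. This is a diagnostic sketch only; the helper name `locate_pyconfig` is made up for this example. On Debian these headers are usually provided by the python3-dev package, but whether a missing package is the cause of the failure above is an assumption.

```python
import os
import sysconfig


def locate_pyconfig():
    """Return (path, exists) for the pyconfig.h this interpreter expects."""
    path = sysconfig.get_config_h_filename()
    return path, os.path.isfile(path)


if __name__ == "__main__":
    path, exists = locate_pyconfig()
    print(path, "found" if exists else "MISSING")
```

If the script reports the file as missing, C extensions such as borg.crypto cannot be compiled with that interpreter until the development headers are installed.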

Multithreading

I started some experimental multithreading code there:

https://github.com/thomaswaldmann/borg/tree/multithreading

especially:

ThomasWaldmann@240a27a

From Python c-api docs:
"the standard zlib and hashlib modules release the GIL when compressing or hashing data."
So the current compression and hashing should be fine for multithreading, as should Python's I/O (it releases the GIL before I/O operations).

Additionally, later changesets implement code to release the GIL there:

  • chunker
  • crypto
  • lz4 compression
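
As a rough illustration of the point about zlib and hashlib releasing the GIL, chunks can be hashed and compressed on a plain thread pool. This is a sketch, not the code from the multithreading branch; `process_chunk`, `process_chunks` and the worker count are assumptions for this example.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk: bytes, level: int = 6):
    """Hash and compress one chunk; both calls release the GIL on large inputs."""
    digest = hashlib.sha256(chunk).digest()
    compressed = zlib.compress(chunk, level)
    return digest, compressed


def process_chunks(chunks, workers: int = 4):
    """Process chunks on a thread pool, returning results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))


if __name__ == "__main__":
    data = [bytes([i]) * 1000000 for i in range(8)]
    for digest, compressed in process_chunks(data):
        print(digest.hex()[:16], len(compressed))
```

Because the heavy work happens inside C code that drops the GIL, several threads can make progress at once; pure-Python steps between those calls would still serialize.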

💰 there is a bounty for this

Discuss Goals

Ideas about potential goals for Borg

Borg is a fork of Attic and it was done to allow some different approaches to development, goals and policy (for details about how and why the fork happened, see jborg/attic#217 and the attic mailing list):

Openness

  • Borg is intended to be an effort by "The Borg Collective"
    (see AUTHORS) and you can be assimilated into it, if you like.
  • Welcome feature requests, discuss their general usefulness.
  • Accept pull requests of good quality and coding style,
    give feedback on PRs that can't be accepted "as is".
  • Discuss things openly, don't work in the dark.

As simple as possible, but not simpler

  • Nobody likes tools that are too complicated, ...
  • ... but nobody likes everything to be fixed and inflexible either.
  • Do the usually right thing by default, but offer other options.
  • Accept the fact that the usually right defaults might be totally unfit
    for some users / use cases.

Compatibility - boon and bane of backup software

  • Don't break it accidentally / without good reason / without warning.
  • Break it if the above does not apply (needs more thought/discussion).
  • As the fork is "new software" from the perspective of a Borg user or
    a Borg packaging distribution, there is no past we need to stay compatible
    with - we have the chance to break compatibility and change everything
    that we think needs changing.
  • Over time, we'll have more users and incompatible changes get harder.
  • Avoid getting into the "compatible forever" trap - we should maybe not
    assure compatibility of development versions nor spanning major releases.
  • When used for long-term archiving, special considerations and care are required.
    E.g. a development snapshot of Borg might not be the right thing for this.
    Also, Borg exists to be able to change things. So if you don't like or can't
    live with changing software, don't use it.

cython / c / ctypes mix

Feedback by waldi - can / should we simplify?

Notes: the only ctypes stuff is in module xattr (everything else is Python/Cython/C).

See also #113 and #142.

extended globbing / wildcard behaviour

to continue from jborg/attic#97

(To have * not match over directory boundaries, and add ** which does - which makes it far more logical)
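
The intended semantics can be sketched as a small standalone translator. This is an illustration only, not the patch proposed in this issue: the names `glob_to_regex` and `glob_match` are made up, character classes are left out for brevity, and having `?` stop at the separator is an assumption of this sketch.

```python
import re


def glob_to_regex(pattern: str, sep: str = "/") -> str:
    """Translate a glob where '*' stays within one path component and '**' does not."""
    not_sep = "[^" + re.escape(sep) + "]"
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        i += 1
        if c == "*":
            if i < n and pattern[i] == "*":
                out.append(".*")           # '**' crosses directory boundaries
                i += 1
            else:
                out.append(not_sep + "*")  # '*' stops at the separator
        elif c == "?":
            out.append(not_sep)            # assumption: '?' also stops at the separator
        else:
            out.append(re.escape(c))
    return "(?s)" + "".join(out) + r"\Z"


def glob_match(pattern: str, path: str, sep: str = "/") -> bool:
    """Return True if the whole path matches the pattern."""
    return re.match(glob_to_regex(pattern, sep), path) is not None


if __name__ == "__main__":
    print(glob_match("home/*/cache", "home/user/cache"))
    print(glob_match("home/*/cache", "home/user/x/cache"))
    print(glob_match("home/**/cache", "home/user/x/cache"))
```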

Patch got into the glob2 library initially but was later reverted as other functions that used it expected the old behaviour.

miracle2k/python-glob2#6

@ThomasWaldmann would you consider either of these options below - and if so which would you prefer?

A replacement function (note this is based on Python library code, so we need to check licence compatibility) - this would be my preferred way.

diff --git a/attic/helpers.py b/attic/helpers.py
index 2ad2806..ad03e6d 100644
--- a/attic/helpers.py
+++ b/attic/helpers.py
@@ -9,7 +9,6 @@ import stat
 import sys
 import time
 from datetime import datetime, timezone, timedelta
-from fnmatch import translate
 from operator import attrgetter
 import fcntl

@@ -243,12 +242,12 @@ class ExcludePattern(IncludePattern):
     """
     def __init__(self, pattern):
         if pattern.endswith(os.path.sep):
-            self.pattern = pattern+'*'+os.path.sep
+            self.pattern = pattern+'**'+os.path.sep
         else:
-            self.pattern = pattern+os.path.sep+'*'
+            self.pattern = pattern+os.path.sep+'**'
         # fnmatch and re.match both cache compiled regular expressions.
         # Nevertheless, this is about 10 times faster.
-        self.regex = re.compile(translate(self.pattern))
+        self.regex = re.compile(self.translate())

     def match(self, path):
         return self.regex.match(path+os.path.sep) is not None
@@ -256,6 +255,43 @@ class ExcludePattern(IncludePattern):
     def __repr__(self):
         return '%s(%s)' % (type(self), self.pattern)

+    def translate(self):
+        pat=self.pattern
+        i, n = 0, len(pat)
+        res = ''
+        while i < n:
+            c = pat[i]
+            i = i+1
+            if c == '*':
+                if i < n and pat[i] == '*':
+                    res = res + '.*'
+                    i = i+1
+                else:
+                    res = res + '[^\\' + os.path.sep + ']*'
+            elif c == '?':
+                res = res + '.'
+            elif c == '[':
+                j = i
+                if j < n and pat[j] == '!':
+                    j = j+1
+                if j < n and pat[j] == ']':
+                    j = j+1
+                while j < n and pat[j] != ']':
+                    j = j+1
+                if j >= n:
+                    res = res + '\\['
+                else:
+                    stuff = pat[i:j].replace('\\','\\\\')
+                    i = j+1
+                    if stuff[0] == '!':
+                        stuff = '^' + stuff[1:]
+                    elif stuff[0] == '^':
+                        stuff = '\\' + stuff
+                    res = '%s[%s]' % (res, stuff)
+            else:
+                res = res + re.escape(c)
+        return '(?ms)' + res + r'\Z'
+

 def is_cachedir(path):
     """Determines whether the specified path is a cache directory (and

Modify the regular expression that comes out of fnmatch's translate (somewhat hacky).

diff --git a/attic/helpers.py b/attic/helpers.py
index 2ad2806..926a0b6 100644
--- a/attic/helpers.py
+++ b/attic/helpers.py
@@ -243,12 +243,19 @@ class ExcludePattern(IncludePattern):
     """
     def __init__(self, pattern):
         if pattern.endswith(os.path.sep):
-            self.pattern = pattern+'*'+os.path.sep
+            self.pattern = pattern+'**'+os.path.sep
         else:
-            self.pattern = pattern+os.path.sep+'*'
+            self.pattern = pattern+os.path.sep+'**'
         # fnmatch and re.match both cache compiled regular expressions.
         # Nevertheless, this is about 10 times faster.
-        self.regex = re.compile(translate(self.pattern))
+        pattern = translate(self.pattern)
+        # rework the regular expression so that it is the equivalent of doing
+        # ** -> .* and * -> [^\/]* so ** matches any sequence of characters,
+        # and * matches any except a path separator.
+        pattern = pattern.replace('.*.*','**')
+        pattern = pattern.replace('.*','[^\\'+os.path.sep+']*')
+        pattern = pattern.replace('**','.*')
+        self.regex = re.compile(pattern)

     def match(self, path):
         return self.regex.match(path+os.path.sep) is not None
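
For comparison, the core of the replacement-function approach can be sketched outside the class. This is a minimal sketch, not the patch itself: glob_to_regex and its sep parameter are hypothetical names, bracket expressions are left out for brevity, and the inline flag is placed at the start of the pattern, which newer Python versions require. The string-replacement variant, by contrast, depends on the exact regex text that fnmatch's translate returns, which is an implementation detail and has differed between Python versions.

```python
import re


def glob_to_regex(pattern, sep="/"):
    """Translate a glob to a regex string (hypothetical helper).

    '**' matches anything, including the separator.
    '*'  matches anything except the separator.
    '?'  matches any single character, as in the first patch.
    Bracket expressions such as [abc] are NOT handled in this sketch.
    """
    not_sep = "[^" + re.escape(sep) + "]"
    parts = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        i += 1
        if c == "*":
            if i < n and pattern[i] == "*":
                parts.append(".*")           # '**' crosses directories
                i += 1
            else:
                parts.append(not_sep + "*")  # '*' stays in one component
        elif c == "?":
            parts.append(".")
        else:
            parts.append(re.escape(c))
    # (?s) lets '.' match newlines; \Z anchors at the end of the string.
    return "(?s)" + "".join(parts) + r"\Z"


single = re.compile(glob_to_regex("home/*/cache"))
double = re.compile(glob_to_regex("home/**/cache"))

print(single.match("home/alice/cache") is not None)
print(single.match("home/alice/tmp/cache") is not None)
print(double.match("home/alice/tmp/cache") is not None)
```

Building the regex from the glob directly keeps the translation independent of how the standard library happens to spell its output.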

💰 there is a bounty for this
