svn2svn's Introduction

svn2svn

Replicate (replay) changesets from one Subversion repository to another.

Features

  • Meant for replaying history into an "empty" target location. This could be an empty target repo or simply an empty folder/branch in the target repo.
  • Maintains logical history (ancestry) when possible, e.g. uses "svn copy" for renames.
  • Maintains original commit messages.
  • Optionally maintain source commit authors (svn:author) and commit timestamps (svn:date). Requires a "pre-revprop-change" hook script in the target repo, to be able to change the "svn:author" and "svn:date" revprops after target commits have been made.
  • Optionally maintain identical revision #'s between source vs. target repo. Effectively requires that you're replaying into an empty target repo, or rather that the first source repo revision to be replayed is greater than the last target repo revision. Creates blank "padding" revisions in the target repo as needed.
  • Optionally run an external shell script before each replayed commit, to give the ability to dynamically exclude or modify files as part of the replay.
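
The author and date options above depend on a "pre-revprop-change" hook in the target repo. A minimal sketch of such a hook in Python follows, assuming Subversion's standard calling convention for this hook (REPOS REV USER PROPNAME ACTION). The helper name is_allowed and the set of permitted properties are illustrative assumptions, not part of svn2svn; the script would be saved as hooks/pre-revprop-change in the target repo and made executable.

```python
#!/usr/bin/env python
"""Sketch of a pre-revprop-change hook for the target repo."""
import sys

# Assumption: only these revprops may be changed after a commit.
ALLOWED_PROPS = frozenset(["svn:author", "svn:date"])


def is_allowed(propname, action):
    """Return True if the revprop change should be accepted.

    Subversion passes ACTION as 'A' (add), 'M' (modify) or 'D' (delete).
    """
    return propname in ALLOWED_PROPS and action in ("A", "M")


def main(argv):
    # Subversion invokes the hook as: REPOS REV USER PROPNAME ACTION
    if len(argv) != 6:
        sys.stderr.write("usage: pre-revprop-change REPOS REV USER PROPNAME ACTION\n")
        return 1
    propname, action = argv[4], argv[5]
    if is_allowed(propname, action):
        return 0  # exit code 0 lets the change through
    sys.stderr.write("Changing revprop '%s' is not allowed\n" % propname)
    return 1  # any non-zero exit code rejects the change


# Only act when called with the five arguments Subversion supplies.
if __name__ == "__main__" and len(sys.argv) == 6:
    sys.exit(main(sys.argv))
```

Restricting the hook to the two properties that svnreplay needs keeps other revprops, such as svn:log, protected from later edits.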

Requirements

  • Python 2.6 or higher.
  • Subversion 1.6 or higher.
  • Written for a UNIX-type environment, e.g. Linux, Mac OS X, etc. For Windows-based usage, Cygwin is recommended for best compatibility.

Overview

This is a utility for replicating the revision history from a source path in a source SVN repository to a target path in a target SVN repository. In other words, it "replays the history" of a given source SVN repository/branch/path into a target SVN repository/branch/path.

This can be useful for creating a filtered version of a source SVN repository. For example, say that you have a huge SVN repository with a lot of old branch history which is taking up a lot of disk space and not serving much purpose going forward. You can use this utility to replay/filter just the "/trunk" SVN history into a new repository, so that things like "svn log" and "svn blame" will still show the correct (logical) history/ancestry, even though we end up generating new commits which will have newer commit dates and revision #'s (though this script can optionally maintain the original commit dates and revision #'s if desired).

While this replay process will obviously run faster when both the source and target repositories are local, none of this requires direct access to the repo server. You could access both the source and target repos over standard http://, svn://, svn+ssh://, etc. methods.

Usage

See svnreplay.py --help:

$ ./svnreplay.py --help
svn2svn, version 1.7.0
<http://nynim.org/code/svn2svn> <https://github.com/tonyduckles/svn2svn>

Usage: svnreplay.py [OPTIONS] source_url target_url

Replicate (replay) history from one SVN repository to another. Maintain
logical ancestry wherever possible, so that 'svn log' on the replayed repo
will correctly follow file/folder renames.

Examples:
  Create a copy of only /trunk from source repo, starting at r5000
  $ svnadmin create /svn/target
  $ svn mkdir -m 'Add trunk' file:///svn/target/trunk
  $ svnreplay -av -r 5000 http://server/source/trunk file:///svn/target/trunk
    1. The target_url will be checked-out to ./_wc_target
    2. The first commit to http://server/source/trunk at/after r5000 will be
       exported & added into _wc_target
    3. All revisions affecting http://server/source/trunk (starting at r5000)
       will be replayed to _wc_target. Any add/copy/move/replaces that are
       copy-from'd some path outside of /trunk (e.g. files renamed on a
       /branch and branch was merged into /trunk) will correctly maintain
       logical ancestry where possible.

  Use continue-mode (-c) to pick-up where the last run left-off
  $ svnreplay -avc http://server/source/trunk file:///svn/target/trunk
    1. The target_url will be checked-out to ./_wc_target, if not already
       checked-out
    2. All new revisions affecting http://server/source/trunk starting from
       the last replayed revision to file:///svn/target/trunk (based on the
       svn2svn:* revprops) will be replayed to _wc_target, maintaining all
       logical ancestry where possible.

Options:
      --version         show program's version number and exit
  -h, --help            show this help message and exit
  -v, --verbose         Enable additional output (use -vv or -vvv for more).
  -a, --archive         Archive/mirror mode; same as -UDP (see REQUIRES
                        below).
                        Maintain same commit author, same commit time, and
                        file/dir properties.
  -U, --keep-author     Maintain same commit authors (svn:author) as source.
                        (REQUIRES 'pre-revprop-change' hook script to allow
                        'svn:author' changes.)
  -D, --keep-date       Maintain same commit time (svn:date) as source.
                        (REQUIRES 'pre-revprop-change' hook script to allow
                        'svn:date' changes.)
  -P, --keep-prop       Maintain same file/dir SVN properties as source.
  -R, --keep-revnum     Maintain same rev #'s as source. Creates placeholder
                        target revisions (by modifying a 'svn2svn:keep-revnum'
                        property at the root of the target repo).
  -c, --continue        Continue from last source commit to target (based on
                        svn2svn:* revprops).
  -f, --force           Allow replaying into a non-empty target-repo folder.
  -r, --revision=ARG    Revision range to replay from source_url.
                        A revision argument can be one of:
                           START        Start rev # (end will be 'HEAD')
                           START:END    Start and ending rev #'s
                        Any revision # formats which SVN understands are
                        supported, e.g. 'HEAD', '{2010-01-31}', etc.
  -u, --log-author      Append source commit author to replayed commit
                        mesages.
  -d, --log-date        Append source commit time to replayed commit messages.
  -l, --limit=NUM       Maximum number of source revisions to process.
  -n, --dry-run         Process next source revision but don't commit changes
                        to target working-copy (forces --limit=1).
  -x, --verify          Verify ancestry and content for changed paths in
                        commit after every target commit or last target
                        commit.
  -X, --verify-all      Verify ancestry and content for entire target_url tree
                        after every target commit or last target commit.
      --pre-commit=CMD  Run the given shell script before each replayed
                        commit, e.g. to modify file-content during replay.
                        Called as: CMD [wc_path] [source_rev]
      --debug           Enable debugging output (same as -vvv).
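
As an illustration of the --pre-commit option, here is a small sketch of a script that could be passed as CMD. It relies only on the documented calling convention (CMD [wc_path] [source_rev]); the function name scrub_tree, the ".txt" filter and the host names being swapped are made-up placeholders, and how svnreplay treats other kinds of working-copy changes is not covered here.

```python
#!/usr/bin/env python
"""Sketch of a --pre-commit script that rewrites file content before each commit."""
import os
import sys


def scrub_tree(wc_path, old, new):
    """Replace bytes `old` with `new` in every *.txt file under wc_path.

    Returns the list of files that were rewritten.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(wc_path):
        # Never touch Subversion's administrative area.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            if not name.endswith(".txt"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            if old in data:
                with open(path, "wb") as f:
                    f.write(data.replace(old, new))
                changed.append(path)
    return changed


# svnreplay calls the script as: CMD wc_path source_rev
if __name__ == "__main__" and len(sys.argv) == 3:
    wc_path, source_rev = sys.argv[1], sys.argv[2]
    # Placeholder values; substitute whatever content needs filtering.
    scrub_tree(wc_path, b"internal.example.com", b"example.org")
```

It might then be used as: svnreplay -av --pre-commit=./scrub.py source_url target_url, where scrub.py is a hypothetical file name for the script above.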

Side Effects

  • The source repo is treated as strictly read-only. We do log/info/export/etc. actions from the source repo, to get the history to replay and to get the file contents at each step along the way.
  • You must have commit access to the target repo. Additionally, for some of the optional command-line args, you'll need access to the target repo to set up hook scripts, e.g. "pre-revprop-change".
  • This script will create some folders off of your current working directory:
    • "_wc_target": This is the checkout of the target_url, where we replay actions into and where we commit to the target repo. You can safely remove this directory after a run, and the script will do a fresh "svn checkout" (if needed) when starting the next time.
    • "_wc_target_tmp": This is a temporary folder, which is only created when using --keep-revnum mode and should only exist for brief periods of time. This is where we commit dummy/padding revisions to the target repo, by checking out the root folder of the target repo and modifying a "svn2svn:keep-revnum" property, i.e. a small change to trigger a commit, in a location that will likely go unnoticed in the final target repo.

Examples

Create a copy of only /trunk from source repo, starting at r5000

$ svnadmin create /svn/target
$ svn mkdir -m 'Add trunk' file:///svn/target/trunk
$ svnreplay -av -r 5000 http://server/source/trunk file:///svn/target/trunk
  1. The target_url will be checked-out to ./_wc_target.
  2. The first commit to http://server/source/trunk at/after r5000 will be exported & added into _wc_target.
  3. All revisions affecting http://server/source/trunk (starting at r5000) will be replayed to _wc_target. Any add/copy/move/replaces that are copy-from'd some path outside of /trunk (e.g. files renamed on a /branch and branch was merged into /trunk) will correctly maintain logical ancestry where possible.

Use continue-mode (-c) to pick-up where the last run left-off

$ svnreplay -avc http://server/source/trunk file:///svn/target/trunk
  1. The target_url will be checked-out to ./_wc_target, if not already checked-out.
  2. All new revisions affecting http://server/source/trunk starting from the last replayed revision to file:///svn/target/trunk (based on the "svn2svn:*" revprops) will be replayed to _wc_target, maintaining all logical ancestry where possible.

Credits

This project borrows a lot of code from the "hgsvn" project. Many thanks to the folks who have contributed to the hgsvn project, for writing code that is easily re-usable.

This project was originally inspired by this "svn2svn" project:
http://code.google.com/p/svn2svn/
That project didn't fully meet my needs, so I forked and ended-up rewriting the majority of the code.

svn2svn's People

Contributors

davidcie, tonyduckles

svn2svn's Issues

Windows: terminal color coding doesn't work under cmd.exe

When running under cmd.exe (CPython 2.7.2 on Windows XP 64), the color escape codes aren't interpreted and make debug output rather messy. I ended up just commenting out:

#if color in _colors:
#    msg = '%s%s%s' % ("\x1b["+_colors[color]+(";1" if bold else "")+"m", msg, "\x1b[0m")

in ui.py to get going. I'm not aware of the coding necessary for cmd.exe, so perhaps it's easiest to just check the SHELL environment variable or os.name and disable it.
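
One way to approach the suggestion above is to emit escape codes only when the output stream is a terminal and the platform is not native Windows. The sketch below is not the ui.py code: the colour table and the names supports_ansi and colorize are assumptions for illustration, and it assumes that a native Windows console (os.name == "nt") does not render ANSI codes.

```python
import os
import sys

# Hypothetical colour table; the real one lives in ui.py and may differ.
_colors = {"red": "31", "green": "32", "yellow": "33"}


def supports_ansi(stream=None):
    """Guess whether ANSI escape codes will be rendered on `stream`."""
    stream = sys.stdout if stream is None else stream
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    # Assumption: cmd.exe on native Windows prints escape codes literally.
    return is_tty and os.name != "nt"


def colorize(msg, color=None, bold=False, enabled=None):
    """Wrap `msg` in ANSI colour codes, or return it unchanged when disabled."""
    if enabled is None:
        enabled = supports_ansi()
    if enabled and color in _colors:
        return "\x1b[%s%sm%s\x1b[0m" % (_colors[color], ";1" if bold else "", msg)
    return msg
```

Under Cygwin's Python, os.name is "posix", so colours would stay enabled there, while output redirected to a file loses them on every platform.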

Windows: extracting paths from svn logs misses CRs

In svn2svn.py, I had to change any calls to

top_paths.strip("\n").split("\n")

to

top_paths.strip("\r\n").split("\r\n")

Otherwise, the carriage returns stay in and mess up subsequent svn commands. I'm not certain about the .split() portion, but it worked for me.
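
Regarding the uncertainty about .split(): a possible alternative that works for both Unix and Windows line endings is str.splitlines(). The helper below is a sketch with a made-up name (split_paths), not code from svn2svn.py.

```python
def split_paths(output):
    """Split command output into non-empty lines.

    str.splitlines() treats "\n", "\r\n" and "\r" as line endings, so no
    stray carriage returns are left on the returned paths.
    """
    return [line for line in output.splitlines() if line]
```

For example, split_paths("trunk/a\r\ntrunk/b\r\n") and split_paths("trunk/a\ntrunk/b\n") both give ['trunk/a', 'trunk/b']. Note that splitlines() also breaks on a few rarer line-boundary characters, which could matter for unusual path names.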

invalid syntax on <>

I'm getting a Python error. my-project/trunk does exist, but my-project/trunk/my-lib does not.

$ ./svnreplay.py --verbose --keep-author --keep-date --keep-prop file:///var/svn/my-lib/trunk file:///var/svn/my-project/trunk/my-lib
Traceback (most recent call last):
  File "./svnreplay.py", line 3, in <module>
    from svn2svn.run.svnreplay import main
  File "/var/svn/svn2svn/svn2svn/run/svnreplay.py", line 251
    is_diff = True if sum1 <> sum2 else False
                            ^
SyntaxError: invalid syntax
$ python --version
Python 3.4.6
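
For context, the "<>" operator is Python 2 syntax that was removed in Python 3, which is why Python 3.4.6 rejects the line in the traceback above. The comparison can be written with "!=", which both versions accept. The sketch below uses a hypothetical function name to show the equivalent expression; other Python 2-only constructs may remain elsewhere in the code, so this alone may not be enough to run under Python 3.

```python
def checksums_differ(sum1, sum2):
    """Return True when the two checksums are not equal."""
    # "!=" is valid in both Python 2 and Python 3, unlike the removed "<>".
    return sum1 != sum2
```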

Better KeyboardInterrupt handling in real_main()

Do better KeyboardInterrupt try/except handling in real_main(). It's fine for a process_svn_log_entry call to be interrupted, but other calls here should be considered atomic, e.g. commit_from_svn_log_entry.
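
One possible shape for this, sketched under assumptions: a context manager that holds back Ctrl-C while an "atomic" step runs and re-raises it once the step has finished. The class name DeferredInterrupt is invented here, and the approach only works when called from the main thread, since signal.signal() has that restriction.

```python
import signal


class DeferredInterrupt(object):
    """Hold back SIGINT (Ctrl-C) while a block runs, then re-raise it afterwards."""

    def __enter__(self):
        self.pending = False
        # signal.signal() may only be called from the main thread.
        previous = signal.signal(signal.SIGINT, self._remember)
        # A handler not installed from Python is reported as None.
        self._previous = signal.default_int_handler if previous is None else previous
        return self

    def _remember(self, signum, frame):
        # Record the interrupt instead of raising KeyboardInterrupt right away.
        self.pending = True

    def __exit__(self, exc_type, exc_value, traceback):
        signal.signal(signal.SIGINT, self._previous)
        if self.pending and exc_type is None:
            raise KeyboardInterrupt()
        return False  # never swallow an exception raised inside the block
```

A call such as commit_from_svn_log_entry could then be wrapped in "with DeferredInterrupt():", while process_svn_log_entry stays outside the block and remains interruptible.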

Incorrect non-ASCII paths handling

When working with a Subversion repo that has non-ASCII characters in its paths, one gets a KeyError in urllib.quote:

Traceback (most recent call last):
  File ".../svnreplay.py", line 5, in <module>
    sys.exit(main() or 0)
  File ".../svn2svn/run/svnreplay.py", line 1108, in main
    return real_main(args)
  File ".../svn2svn/run/svnreplay.py", line 896, in real_main
    svnclient.export(join_path(source_start_url, path_offset), source_start_rev, path_offset, force=True)
  File ".../svn2svn/svnclient.py", line 506, in export
    args += [safe_path(svn_url, rev_number), safe_path(path)]
  File ".../svn2svn/svnclient.py", line 43, in safe_path
    path = urllib.quote(path, ":/+")
  File ".../python2.7/urllib.py", line 1303, in quote
    return ''.join(map(quoter, s))
KeyError: u'\u0430'

This can be fixed by explicitly encoding unicode paths with a user-provided encoding when forming the safe path.
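
A minimal sketch of that idea, assuming the path should be encoded to bytes before quoting. The function name quote_path and the default "utf-8" encoding are illustrative assumptions; the real safe_path in svnclient.py takes different arguments and may do more.

```python
try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3


def quote_path(path, encoding="utf-8"):
    """Percent-encode `path`, encoding text to bytes first.

    Encoding up front avoids the KeyError that Python 2's urllib.quote
    raises for non-ASCII unicode characters.
    """
    if not isinstance(path, bytes):
        path = path.encode(encoding)
    return quote(path, ":/+")
```

With this, quote_path(u"/repo/\u0430") yields "/repo/%D0%B0" instead of raising.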

Multiple source roots?

I would like to extract multiple directories, e.g. trunk/foo and trunk/bar, keeping revision numbers, into a new repo. Is this possible with svn2svn?

Windows: rmtree fails

In svn2svn.py, rmtree should be imported from ..shell and all instances of shutil.rmtree changed to the rmtree override defined in shell.py.

Incorrect symbolic links transferring

I came across this issue when replaying an SVN repo with symbolic links.

The code in the in_svn function uses svn status to check whether a file is under Subversion control, but svn status, svn info and possibly other commands seem to follow symbolic links when called directly on them.

Traceback (most recent call last):
  File ".../svn2svn/run/svnreplay.py", line 957, in real_main
    process_svn_log_entry(log_entry, source_ancestors, commit_paths)
  File ".../svn2svn/run/svnreplay.py", line 718, in process_svn_log_entry
    sync_svn_props(source_url, source_rev, path_offset)
  File ".../svn2svn/run/svnreplay.py", line 391, in sync_svn_props
    run_svn(["propset", prop, source_props[prop], svnclient.safe_path(path_offset)])
  File ".../svn2svn/shell.py", line 208, in run_svn
    args=args, bulk_args=bulk_args, fail_if_stderr=fail_if_stderr, no_fail=no_fail)
  File ".../svn2svn/shell.py", line 141, in run_command
    return _run_raw_command(cmd, map(_transform_arg, args), fail_if_stderr, no_fail)
  File ".../svn2svn/shell.py", line 114, in _run_raw_command
    % (pipe.returncode, cmd_string, err, out))
ExternalCommandFailed: External program failed (return code 1): svn propset 'svn:special' '*' '.../symlink'
svn: E200009: Cannot set 'svn:special' on a directory ('.../_wc_target/.../symlink')

Furthermore, the cleanup code is not designed to work with symbolic links to directories.

Traceback (most recent call last):
  File ".../svn2svn/svnreplay.py", line 5, in <module>
    sys.exit(main() or 0)
  File ".../svn2svn/run/svnreplay.py", line 1110, in main
    return real_main(args)
  File ".../svn2svn/run/svnreplay.py", line 994, in real_main
    full_svn_revert()
  File ".../svn2svn/run/svnreplay.py", line 361, in full_svn_revert
    shell.rmtree(path)
  File ".../svn2svn/shell.py", line 72, in rmtree
    return shutil.rmtree(path, False, _rmtree_error_handler)
  File "/usr/lib64/python2.6/shutil.py", line 197, in rmtree
    onerror(os.path.islink, path, sys.exc_info())
  File "/usr/lib64/python2.6/shutil.py", line 195, in rmtree
    raise OSError("Cannot call rmtree on a symbolic link")
OSError: Cannot call rmtree on a symbolic link
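
The second traceback comes from shutil.rmtree refusing to operate on a symbolic link. One way the cleanup could sidestep this, sketched with a hypothetical helper name (remove_path) and assuming a POSIX system, is to unlink symlinks instead of recursing into them:

```python
import os
import shutil


def remove_path(path):
    """Delete a file, a symlink (without following it) or a directory tree."""
    # Check islink first: os.path.isdir() follows symlinks and would
    # report True for a link that points at a directory.
    if os.path.islink(path) or os.path.isfile(path):
        os.unlink(path)
    elif os.path.isdir(path):
        shutil.rmtree(path)
```

Only the link itself is removed, so the directory it points to is left untouched.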

svnreplay doesn't seem to handle comments with carriage returns (running on Windows)

Traceback (most recent call last):
  File "C:\stephen\working\svnreplay.py", line 5, in <module>
    sys.exit(main() or 0)
  File "C:\stephen\working\svn2svn\run\svnreplay.py", line 1106, in main
    return real_main(args)
  File "C:\stephen\working\svn2svn\run\svnreplay.py", line 910, in real_main
    target_rev = commit_from_svn_log_entry(source_start_log, target_revprops=target_revprops)
  File "C:\stephen\working\svn2svn\run\svnreplay.py", line 103, in commit_from_svn_log_entry
    rev_num = parse_svn_commit_rev(output) if output else None
  File "C:\stephen\working\svn2svn\run\svnreplay.py", line 46, in parse_svn_commit_rev
    return int(rev_num)
ValueError: invalid literal for int() with base 10: '2.\r'
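
The value '2.\r' suggests the revision number is being cut out of the "Committed revision 2." line by position, which breaks once a carriage return trails the line. A more tolerant approach could extract the digits with a regular expression. This is a sketch with a hypothetical function name (the body of parse_svn_commit_rev is not shown in the report), and it assumes the English "Committed revision N." message.

```python
import re

# Assumption: svn prints its English "Committed revision N." message.
_COMMITTED_RE = re.compile(r"Committed revision (\d+)\.")


def parse_committed_rev(output):
    """Return the revision number from 'svn commit' output, or None if absent."""
    match = _COMMITTED_RE.search(output)
    return int(match.group(1)) if match else None
```

Because only the digits are captured, trailing "\r", "\n" or other text on the line no longer affects the int() conversion.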

Add verify-mode

Add optional command-line arg(s) for verifying the replayed target history matches the (logical) source history.

  • Need a verify-all (post-commit) arg, for verifying a fully replayed target.
  • Maybe want a verify-changed (ideally pre-commit, but post-commit would work too), to verify individual replayed commits as we go. Might be useful for debug/testing, but likely not for real-world uses.

Incorrect handling of dir -> file replaces

When replacing a dir, the working copy must be updated before committing. svn2svn does this correctly for dir -> dir replaces, but not for cases where a dir is replaced by a file:

Traceback (most recent call last):
  File ".../svn2svn/run/svnreplay.py", line 960, in real_main
    target_rev = commit_from_svn_log_entry(log_entry, commit_paths, target_revprops=target_revprops)
  File ".../svn2svn/run/svnreplay.py", line 102, in commit_from_svn_log_entry
    output = run_svn(args)
  File ".../svn2svn/shell.py", line 208, in run_svn
    args=args, bulk_args=bulk_args, fail_if_stderr=fail_if_stderr, no_fail=no_fail)
  File ".../svn2svn/shell.py", line 141, in run_command
    return _run_raw_command(cmd, map(_transform_arg, args), fail_if_stderr, no_fail)
  File ".../svn2svn/shell.py", line 114, in _run_raw_command
    % (pipe.returncode, cmd_string, err, out))
ExternalCommandFailed: External program failed (return code 1): svn commit --force-log -m 'Replace dir with file' --with-revprop 'svn2svn:source_uuid=46727182-e339-4f52-8b88-d18f8a30a573' --with-revprop 'svn2svn:source_rev=3' --with-revprop 'svn2svn:source_url=file://...' dir
svn: E155011: Commit failed (details follow):
svn: E155011: File '.../_wc_target/dir' is out of date
svn: E160028: Directory '/dir' is out of date

Symbolic link treatment

In my SVN history I have a symbolic link that points to itself. The script crashes when replaying the SVN history.

While the script is performing svn export, this error message occurs:

svn: E000040: Can't check path '...': Too many levels of symbolic links

Need to URL-escape user-supplied source_url/target_url params

If the user-supplied source_url or target_url URLs aren't properly URL-encoded, some of the comparisons in real_main() will fail because the repos_url value obtained from "svn info" will be URL-encoded.

Should be able to use urllib.quote for this.
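
A sketch of how that could look, assuming only the path component needs escaping and that user input may or may not already be encoded. The function name normalize_url is made up for illustration.

```python
try:
    from urllib import quote, unquote  # Python 2
    from urlparse import urlsplit, urlunsplit
except ImportError:
    from urllib.parse import quote, unquote, urlsplit, urlunsplit  # Python 3


def normalize_url(url):
    """Percent-encode the path of `url`, leaving already-encoded input unchanged."""
    parts = urlsplit(url)
    # Decode first so that an already-encoded URL is not encoded twice.
    path = quote(unquote(parts.path), "/+")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Both "http://server/my repo/trunk" and "http://server/my%20repo/trunk" normalize to the encoded form, so either can be compared with the value reported by "svn info". One caveat: a path whose real name contains a literal sequence such as "%20" would be misread by the decode step.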

Initial padding revisions error

An excellent piece of work and just what I needed for disentangling some projects out of a massive repo!

It all seems to work well under Cygwin on my Windows machine.

One small error I think I've found is that when preserving revision numbers, the first lot of padding is skipped, because source_rev is used instead of source_start_rev. This meant the first commit from the source always went in as rev #1 in the destination, and then there was a bunch of padding to get it back in step for the next commit. The diff for the fix is:

--- C:/Temp/SVN Migration/svn2svn-master/svn2svn-master/svn2svn/run/svnreplay.py    Sat Sep 19 23:18:50 2015
+++ C:/Temp/SVN Migration/svn2svn-master/svn2svn-master/svn2svn/run/svnreplay - Copy.py Fri Jan 22 10:33:35 2016
@@ -876,8 +876,8 @@ def real_main(args):
         source_start_rev = int(source_start_log['revision'])
         ui.status("Starting at source revision %s.", source_start_rev, level=ui.VERBOSE)
         ui.status("", level=ui.VERBOSE)
-        if options.keep_revnum and source_rev > target_rev_last:
-            target_rev_last = keep_revnum(source_rev, target_rev_last, wc_target_tmp)
+        if options.keep_revnum and source_start_rev > target_rev_last:
+            target_rev_last = keep_revnum(source_start_rev, target_rev_last, wc_target_tmp)

         # For the initial commit to the target URL, export all the contents from
         # the source URL at the start-revision
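
To make the effect of the fix concrete, the arithmetic can be sketched as follows. This assumes that padding brings the target up to one revision below the source revision, so that the next real commit lands on the matching number; the function name padding_needed is hypothetical and the actual keep_revnum implementation is not shown in this report.

```python
def padding_needed(source_rev, target_rev_last):
    """Number of empty commits needed so the next commit becomes `source_rev`.

    Assumption: each padding commit advances the target repo by exactly
    one revision.
    """
    return max(0, source_rev - target_rev_last - 1)
```

For a start revision of 5000 and an empty target (last revision 0), this gives 4999 padding revisions before the first replayed commit, which is the step the original code skipped.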
