psf / gh-migration

gh-migration's Introduction

gh-migration

This repo is used to manage the migration from bugs.python.org to GitHub.

Relevant Documents

PEPs

PEP 581 -- Using GitHub Issues for CPython: outlines the rationale for the migration
PEP 588 -- GitHub Issues Migration Plan: outlines the initial migration plan
PEP 595 -- Improving bugs.python.org: includes issues with PEP 581/588, migration considerations, and a list of advantages that Roundup has over GitHub Issues

PEP 581 has been accepted, PEP 588 is still a draft (thus subject to changes), and PEP 595 has been withdrawn (but still contains valuable information).

Discourse threads

PEP 581 - Using GitHub Issues (Dec 2018, 37 msgs): early discussion about PEP 581
Proposal: Create “Bug Triage” team on GitHub (Mar 2019, 59 msgs): discussion about the triage team, labels, CODEOWNER file, etc.
What are next steps for PEP 581? (Feb 2019, 3 msgs): short discussion about triaging, components and priority labels
Using CLA Assistant for Python (Mar 2019, 25 msgs): discussion about using CLA-assistant
CLA-assistant is No-Go (Jul 2019, 32 msgs): discussion about using CLA-assistant

Mailing list threads

PEP 581: Using GitHub Issues for CPython (Mar 2019, python-dev, 7 msgs)
Steering Council Update for April 2019 (Apr 2019, python-committers, 12 msgs)
PEP 581 (Using GitHub issues for CPython) is accepted (May 2019, python-dev, 24 msgs)
PEP 595: Improving bugs.python.org (May 2019, python-dev, 13 msgs)
PEP 581/588 RFC: Collecting feedback about GitHub Issues (Aug 2019, python-dev, 3 msgs)
Re: PEP 581/588 RFC: Collecting feedback about GitHub Issues (Sep 2019, python-committers, 11 msgs)

Mailing list threads (historical)

PSF Infrastructure Committee's recommendation for a new issue tracker (Oct 2006, python-dev, 3 msgs)
PSF Infrastructure has chosen Roundup as the issue tracker for Python development (Oct 2006, python-dev, 3 msgs)

GitHub repos, issues, and projects

PEP 581/588 RFC: Collecting feedback about GitHub Issues (core-workflow repo): a long list of features/issues with comments and votes (from Aug 19th)
Consider whether or not to migrate bugs.python.org source code to this repo (bugs.python.org repo): contains info about different repos related to our fork of Roundup, b.p.o, and the other instances
Backup GitHub information (core-workflow repo): discussion about backing up GitHub data
Adding triagers role into CPython GitHub project (core-workflow repo)
Migrating to CLA Assistant project (core-workflow repo)

Wiki links

How to make the tracker read-only: https://wiki.roundup-tracker.org/ReadOnlyTracker
The Desired Tracker Features page discusses features we wanted for Roundup, and different ways to handle labels
The Tracker Development page contains (mostly outdated) info about setting up and maintaining Roundup and irker, and using roundup-admin
The Tracker Development Planning page contains other discussions

Zulip streams

https://python.zulipchat.com/#narrow/stream/130206-pep581

Twitter threads

https://twitter.com/mariatta/status/1128531347914407936 : Tweet about PEP 581 being approved, with some comments/concerns/suggestions
https://twitter.com/VictorStinner/status/1128476712084410373 : Tweet about PEP 581 being approved, with a comment about not being able to assign issues to non-coredevs

Blog posts

Python Core Sprint 2018: Part Two / PEP 581 (Sep 2018, Mariatta's blog)
Mariatta Wijaya: Let's Use GitHub Issues Already! (May 2019, PSF blog)

Other instances

In addition to bugs.python.org, we are also hosting two other instances:

Jython

Jython is already using GitHub issues
Still has a legacy instance of Roundup at https://bugs.jython.org/ (bjo)
Thread on the Jython ML that suggests switching to GitHub issues but keeping bugs.jython.org around
They are using the form on python.org to sign the CLA and bpo to check it.
They would be fine with a mostly read-only bjo

Roundup

The tracker for Roundup is at https://issues.roundup-tracker.org/
They are not planning to migrate
They depend on us for hosting, unless they find another place

gh-migration's Issues

Make bpo read-only

The first step of the migration is making bpo read-only. The Roundup wiki has a page about making a tracker read-only, but that approach requires both code changes and fiddling with roundup-admin.

A better approach would be:

~~Add the detector from the wiki to prevent changes~~ (apparently not needed)
Remove edit permissions (from Users and Developers) in schema.py
Update the HTML templates (note that if users don't have edit permissions, the forms shouldn't be visible already):
- ~~The register form for creating new users should be removed~~ (already gone if users don't have permissions)
- ~~The login form could be removed/hidden (unless needed by coordinators) if we decide users shouldn't log in anymore~~
- ~~The "Create new" link should be removed (might not be necessary if users can't login)~~ (already gone)
- A banner pointing to GitHub should be visible on most/all pages (psf/bpo-tracker-cpython#12)
- Each individual issue should have a link to the corresponding GH issue (psf/bpo-tracker-cpython#17)
- In the issue list, there should be a link to each corresponding GH issue (psf/bpo-tracker-cpython#17)

The banner pointing to GitHub will initially be more prominent and mention that the migration is in progress, informing users that it won't be possible to report and edit issues during the migration. After the migration is complete it will be updated to be less prominent.

This should be tested locally first, and the test should include:

Check that none of the roles can create/edit issues (unless explicitly allowed, e.g. for Admin/Coordinator)
Check that creation/edit is prevented through all different interfaces:
- Web
- Mail
- XMLRPC

These changes should be made available in a single PR that can be merged (and, if needed, rolled back) easily.

We should also determine what access level to leave to users, since they might be able to access/remove their bpo accounts, update/remove their email address, names, GH username, timezone, etc., and possibly even messages. Logged-in users also have access to summaries of the issues they created and the ones they are following, so they should still be able to review those summaries.

Replace the irker detector

The irker detector has been used to post updates to the #python-dev-notifs IRC channel (see the bugs.sls file).

Find a replacement
Set up the replacement
python/psf-salt#232

Add links from bpo to GitHub

At the end of the migration, GH will provide a file that maps the old bpo IDs to the corresponding GH IDs. We should:

edit the issue item schema to add an additional "GitHub id" field
update the field with the GH IDs provided in the file

The GH ID should then be used

at the top of each bpo issue to point to the corresponding GH issue
in a new GH id column in the issue list page

This can be accomplished by editing the issue.item.html template (adding the link at the top, in place of the editing form) and the issue.list.html template (adding a column with the GH ids next to the bpo ids, where each id links to the GH issue). This should be implemented as a separate PR to be merged after the migration.
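A minimal sketch of how the mapping file could be turned into links. The format of the file GH will provide is not known yet, so the two-column CSV with `bpo_id` and `gh_id` headers, and the helper names `load_mapping` and `github_link`, are assumptions made only for illustration. The two sample rows are pairs mentioned elsewhere in these issues.

```python
import csv
import io

# URL pattern of issues in the python/cpython repo.
GH_ISSUE_URL = "https://github.com/python/cpython/issues/{gh_id}"


def load_mapping(csv_text):
    """Parse 'bpo_id,gh_id' rows (assumed format) into a dict of int -> int."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["bpo_id"]): int(row["gh_id"]) for row in reader}


def github_link(bpo_id, mapping):
    """Return the GH issue URL for a bpo issue, or None if it is not mapped."""
    gh_id = mapping.get(bpo_id)
    if gh_id is None:
        return None
    return GH_ISSUE_URL.format(gh_id=gh_id)


sample = "bpo_id,gh_id\n46752,90908\n2116,46370\n"
mapping = load_mapping(sample)
print(github_link(46752, mapping))
```

The same lookup could feed both the link at the top of each issue and the extra column in the issue list.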

Map bpo users to GitHub users

On bpo users can specify their GitHub username. If they do so, their bpo issues/comments can be mapped to their GitHub users, however this only works for users that belong to the "python" organization.

For users with a GitHub username that don't belong to the "python" org and for users that haven't specified their GitHub usernames, a placeholder user (called mannequin) with either their GitHub or bpo username will be created.

The mannequin will only show the username, so:

even if the user exists outside of the "python" org:
- there is no direct way to know their real name
- clicking on their username will not open their user page
- they (probably) will not receive notifications for new comments
- they won't be able to edit their old comments
if the bpo username is used:
- there is no direct way to know their real name
- it will not be possible to know their GitHub username (if they have one)
- they will not receive notifications
- it might create confusion if the original author comments with a different GitHub username
- it might create confusion if a different GitHub user exists with the same username

Mannequins can be manually reclaimed after the import, but this might still be impossible if the users don't belong to the org. A possible workaround is to create a new org, add all the bpo users that have a GitHub username to that org (possibly without sending out notifications), perform the import there so that all the users get mapped, then copy all the issues to python/cpython and remove the new org. This might preserve the user mapping even if the users don't belong to the "python" org.
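The mapping rules described above can be sketched as a small function. This is only an illustration of the decision logic: `resolve_author` and the sample usernames are hypothetical, and the real importer applies these rules on its own.

```python
def resolve_author(bpo_username, github_username, org_members):
    """Return (kind, name) for the author of an imported issue or comment.

    kind is "user" when the author can be mapped to a real GitHub account,
    and "mannequin" when a placeholder has to be created.
    """
    if github_username and github_username in org_members:
        return ("user", github_username)
    # Everyone else becomes a mannequin, named after the GitHub username
    # when one was provided on bpo, otherwise after the bpo username.
    return ("mannequin", github_username or bpo_username)


# Hypothetical data, for illustration only.
org_members = {"core-dev"}
print(resolve_author("coredev", "core-dev", org_members))
print(resolve_author("someone", "someone-gh", org_members))
print(resolve_author("olduser", None, org_members))
```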

Some GH PR references were not ported correctly

python/cpython#90908 (comment) has two incorrect references. The first one goes to GH-75453, which is bpo-31270. It should be GH-31270 (https://bugs.python.org/issue46752#msg413260). Note that the reference is correct in the following 'New changeset' comment.

This is the only case I've seen so far, but I haven't systematically looked for more.

Issue (re)numbering

GitHub uses the same namespace for issues and PRs, and the current PR numbers already overlap with the original bpo numbers.

Current situation:

bpo issues (as of 2020-10-23):
- open: 7608
- closed: 46258
- total: 53866
Used bpo ranges:
- 1000-42000+ (~7500 open, ~40800 total, mostly contiguous)
- 207608-1779871 (178 open, 12914 total, non-contiguous, old SourceForge issues)
Used GitHub range (PRs):
- 1-22500+ (~1400 open, ~21500 closed, contiguous)

Questions and issues:

Is there a way to preserve the bpo numbering?
- Can we renumber PRs instead?
- Can we separate the namespace for issues and PRs?
- Does GitHub offer other solutions/options?
Should we renumber old SourceForge issues?
- could be condensed in a continuous block of ~13k issues
- can be placed just before/after the bpo block
If we renumber the issues, what pattern should we follow?
- 1-23k PRs, 87k-100k old SF issues (condensed, renumbered), 101k-142k bpo issues (original_number + 100k), >143k new GH issues/PRs
- 1-23k PRs, 27k-40k old SF issues, 41k-82k bpo issues (original_number + 40k), >82k new GH issues/PRs
- 1k-42k bpo issues (original_number), 43k-56k old SF issues, 57k-80k current PRs (if PRs can be renumbered), >80k new GH issues/PRs
- other options?
- Almost-to-scale representation of the three patterns, for the visually-inclined folks:
```
0          23k 27k   41k                     87k    101k                  143k
|PRPRPRPRPRP|_________________________________|SFSFSF|BPOBPOBPOBPOBPOBPOBPO|NEW...|
|PRPRPRPRPRP|__|SFSFSF|BPOBPOBPOBPOBPOBPOBPO|NEW...|
_|BPOBPOBPOBPOBPOBPOBPO|SFSFSF|PRPRPRPRPRP|NEW...|
 1k                   43k    57k         80k 
```
  Every char is ~2k issues. PR: current PRs; SF: old SourceForge issues; BPO: current BPO issues; NEW...: new issues; _: unused range.

Other considerations:

The numbers of all new issues/PRs will follow the highest number in the repo (max(issue_ids) + 1) [confirm with GH].
If we renumber bpo issues by using original_number + 40k or original_number + 100k it will be easier to find the corresponding issue without relying on a mapping.
Trying to do the same for old SF issues is probably not worth it (they are scattered over a wide range and have high IDs).
There are references to existing bpo issue numbers that might need to be updated if the issues are renumbered.
Updating issue references in other issues can be done while migrating.
Updating issue references in code comments might not be necessary (people can find the bpo issue and from there the corresponding GH issue).
If bpo is kept alive, a link to the corresponding GH issue can be added to the bpo issue, if not, a redirect script that maps old and new ids should be used.
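The offset-based patterns can be illustrated with a short sketch. The function names are hypothetical. The offsets and the start of the SF block come from the first two patterns listed above.

```python
def renumber_bpo(bpo_id, offset=100_000):
    """Map an original bpo id to its new id under an offset-based pattern."""
    return bpo_id + offset


def renumber_sf(sf_ids, start):
    """Condense scattered SourceForge ids into a contiguous block.

    The ids are sorted first so that the original ordering is preserved.
    Unlike the bpo ids, the result still needs an explicit mapping.
    """
    return {old: start + i for i, old in enumerate(sorted(sf_ids))}


print(renumber_bpo(1000))                  # first pattern: +100k
print(renumber_bpo(42000, offset=40_000))  # second pattern: +40k
print(renumber_sf([1779871, 207608], start=87_000))
```

With a fixed offset the new number can be computed in one's head, which is the advantage mentioned above. The condensed SF block has no such property.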

Update

After talking with GitHub, it appears that it is not possible to import the issues into our current repo using the current tools; they should be imported into a separate repo instead.
This solves the problem with the (re)numbering (except for the SF issues, which should probably be renumbered and condensed).
It is also possible to redirect users elsewhere using "issue templates" (e.g. VueJS uses this issue template file to redirect users to different pages -- see documentation here.)
❓ Can the redirect be automated? Will issues/PR references still work?

Notify bpo users once the migration is done

After the migration, we should send users an email informing them that the migration happened and listing issues that have been created by them, assigned to them, and followed by them.

In order to do this, we should write a tool that goes through all the users, gathers the data, formats the messages, and sends them to the users. This summary could be made available on bpo too, and the email could contain a link in addition to or instead of the lists (these summaries are already available in the sidebar for logged-in users).

This works well for occasional contributors who are involved in fewer than one or two dozen issues, since they can go through them, review them, and resubscribe manually (if they are still interested). However, it doesn't scale well for people who follow hundreds of issues; for them, a way to automatically resubscribe to the issues they were following would be better (see #5 under "nosy list").

Even if we find a way to preserve the nosy list during the migration, the email would still be useful because:

it will inform every bpo member that the migration happened (unless we want to limit the recipients to people that have been active recently)
it will give them a chance to review and possibly update their old issues
it will provide a quick way to find their issues on GH, even if they lost access to bpo

The exact wording and format of the email still need to be determined:

it should include a paragraph or two explaining that the issues have been migrated.
it should include 3 lists: created by you, assigned to you, followed by you. These match the 3 summaries available in the sidebar of bpo. Links to the summaries and the total number of issues in each list could also be included.
it should include the issue titles, links to the GH issues, and possibly links to the corresponding bpo issues too (not essential since they are already linked at the top of the GH issues).
it should encourage the users to review and possibly close old issues if they are no longer relevant.
it could list all issues, or be limited to a maximum amount in case someone is following hundreds or thousands of issues.
it could be sorted by different fields depending on the list, or just sorted by date/descending
it could be limited to issues that are still open, or also include closed issues (possibly in separate lists).
it could include additional metadata (possibly similar to the weekly report).

Given the number of users, we might have to take some care in sending out a large number of emails at once, since it might be seen as spam.
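A rough sketch of the formatting and batching parts of such a tool. The wording is still undecided, so the text, the function names, and the `limit` and `size` parameters are placeholders. Gathering the data from bpo and actually sending the mail are left out.

```python
def format_summary(username, created, assigned, followed, limit=50):
    """Build the body of the notification email for one user.

    Each list contains (title, github_url) pairs. At most `limit` issues
    are shown per list, followed by a count of the remaining ones.
    """
    sections = [
        ("Created by you", created),
        ("Assigned to you", assigned),
        ("Followed by you", followed),
    ]
    lines = [
        f"Hi {username},",
        "",
        "The issues on bugs.python.org have been migrated to GitHub.",
        "",
    ]
    for heading, issues in sections:
        lines.append(f"{heading} ({len(issues)}):")
        for title, url in issues[:limit]:
            lines.append(f"- {title}: {url}")
        if len(issues) > limit:
            lines.append(f"... and {len(issues) - limit} more")
        lines.append("")
    return "\n".join(lines)


def batches(recipients, size):
    """Yield the recipients in chunks, so the mails can be sent gradually."""
    for i in range(0, len(recipients), size):
        yield recipients[i:i + size]


# Placeholder title; the URL is one of the migrated issues mentioned in these notes.
issue = ("Example issue", "https://github.com/python/cpython/issues/90908")
print(format_summary("someone", [issue], [], [issue]))
```

Sending one batch at a time, with a pause in between, is one way to reduce the risk of the mails being flagged as spam.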

Subversion revisions references were not ported

For example:

https://bugs.python.org/issue2116#msg87822

Committed in r72662, r72670. Thanks!

python/cpython#46370 (comment)

Committed in r72662, r72670. Thanks!

Replace the weekly summary report

The roundup-summary script has been used to send weekly reports to the python-dev ML. The script is executed once a week by a cron job.

Find/write a replacement for the weekly summary report
Set up the replacement to send a weekly mail to python-dev

Adding a dashboard similar to the Django dashboard has been proposed on the TrackerDevelopmentPlanning wiki page and discussed. Custom views provided by GitHub issues could be a simpler alternative to the dashboard.

Archive bpo-related repos

Now that we migrated to GitHub, I think the following repos can be archived:

@ewdurbin: can you look into it? Am I missing any other repo?

Replace the sendmail script

The sendmail Roundup detector has been used to send mails to the new-bugs-announce mailing list and the python-bugs-list mailing list whenever a new issue was created or a new message posted respectively. The addresses are configured in the bugs.sls file.

Find a replacement
Set up the replacement

Replace the local_replace.py script

The local_replace.py extension has been used to automatically convert parts of messages posted to b.p.o into links, including: issues, messages, reference numbers, PEPs, files, tracebacks, etc. (see the triaging page of the devguide). Some of these links are already automatically handled by GitHub.

If we want to create links for the rest, a new tool should be written.

This tool will be used for new messages, but it might also be used either at import time, or after the issues have been imported to GitHub.

Map bpo issue metadata to GitHub fields/labels

This issue is about issue metadata (priority, versions, status, etc.), how/where to import them in GitHub, and what metadata to keep/add/remove/update. User/comment/file metadata will be discussed in a separate issue.

bpo tracks different metadata for each issue (see e.g. https://bugs.python.org/issue2771 ) including: title, comments, files (attachments), creator, creation, actor, activity, type, stage, components, versions, status, resolution, dependencies, superseder, assigned to, nosy list, priority, keywords, remote HG repos, linked PRs

The meaning of each field is explained in the devguide. The fields are defined in the schema.py of the bpo instance. The creator, creation (datetime), (last) actor, (last) activity (datetime) are common to all classes.

GitHub already has corresponding fields for the following: title, messages (comments), linked PRs, assigned to (assignees), creator (user) and creation (created_at).
- bpo stores messages as a list of IDs on the issue, GitHub has a separate list of comments linked to the issue
- GitHub issues have a body that contains the first comment
- Linked PRs seem to be generated automatically at runtime, not at import/export time
❓ Does GitHub have fields for (last) actor, (last) activity (datetime)? Do we need them?
- ✔️ there is an updated_at field (datetime), but no last actor. We probably don't need the last actor.

The other fields will need to be replaced with something else (mostly labels) or removed.

Labels in GitHub can be grouped with colors and/or with a prefix like priority-high, priority-medium, priority-low. GitHub is working on adding custom fields, but they will only be available in ~6 months.

Actions can be used to automate certain tasks in addition to or instead of bots (e.g. adding labels, closing stale issues, etc.).

Unused metadata that are not converted to labels (or anything else) can be stored in a comment so that they can be retrieved if needed (e.g. if we move away from GH).

On python/cpython there are currently 32 labels:

5 stage labels (yellow), apparently set by bedevere-bot: awaiting change review, awaiting changes, awaiting core review, awaiting merge, awaiting review
6 type-related (blue/red) labels: type-bugfix, type-documentation, type-enhancement, type-performance, type-security, type-tests
5 version-related (gray) labels for backports (used by bots): needs backport to 3.6-3.10
5 more labels used by bots: automerge, DO-NOT-MERGE, skip issue, skip news, test-with-buildbots
2 CLA-related labels (used by bots): CLA not signed, CLA signed
2 OS-related labels: OS-mac, OS-windows
7 more misc labels: invalid, ctypes, dependencies, expert-asyncio, spam, sprint, stale

This is the full list of all the fields we have in Roundup, and how we could convert them to GitHub Issues:

creator, creation, activity
title

The exporter creates an event when the title has been updated

comments

The exporter exports comment author, content, and date.
See #3 for more info on the msg content.

files (attachments)

Files will still be hosted on bpo
The exporter will create direct links to the files

assigned to

The exporter sets the Assignees field and creates events when the assignee changes

linked PRs

These can not be imported and the list can't be populated automatically
PRs are now listed in the table at the top of each imported issue

nosy list

To replace the nosy list users can (un)subscribe to individual issues, and can be @mentioned.
The nosy list users are listed/mentioned in the table at the top of each issue, but this doesn't affect subscriptions.
❓ How can we preserve the initial nosy list? @mention all nosy list users in the first message?
- ~~✅ it's possible to subscribe people to the issue without sending out any notification when the issue are imported, and enabling notification afterwards so that they will get updates.~~
- ❌ Subscribing other people is not possible, but it might be possible to retrigger mentions by editing the imported messages to have them notified.
- #12 might also help
❓ How can we replace the nosy autocomplete?
- ✅ probably not possible, but GitHub suggests reviewers and there is a CODEOWNERS file
❓ Can we automatically add people when a certain label is added?
- ✔️ this is now possible, see #16

dependencies

❓ What options do we have to track dependencies with GitHub? (Projects might be one way, but they are probably overkill for simpler cases -- other ways?)
- ❌ ~~currently there is no built-in support for dependencies, GitHub might add it later.~~
- ✔️ It is now possible to add a checkbox list of issues, and GitHub will track them as tasks (won't enforce closing all the dependencies before closing the issue though)
  - ❌ this doesn't work in tables, so either we list them in a table as a plain list with no checkboxes, or the list of deps should be moved after the table. Since these are bpo-xxxxx issues, even if they are moved after the table the checkboxes won't be updated automatically.
- Dependencies are now listed on the table at the top
- Projects/milestones could also be used to track complex issues that are broken down into multiple issues.

superseder

❓ Does GitHub have a way to mark an issue as duplicate?
- ✅ writing Duplicate of #xxxxx as a reply marks the issue as duplicate. A default "duplicate" reply can also be added to the saved replies (the icon with the left-pointing arrow on the top-right).
  - ❌ This doesn't work with bpo-xxxxx ref, so it can't be used for imported issues
    - ✅ we might be able to replace the bpo-xxxxx ref with a GH ref after the migration
- The superseder is now included in the table at the top

remote HG repos
- These are mostly outdated and haven't been migrated.

If the link still works, these should be converted to a PR (or a patch)
❓ Do we need to import the link into GitHub?
- there are currently 340 valid links and 228 unique ones
  - of the 228 unique ones, 88 are reachable, 125 are 404, and 14 are unreachable
  - of the 88 that are reachable, 55 are hg.python.org links, 26 are GH/Gist links (so invalid HG links, but might contain a valid patch/branch), and 7 link to other repos
- I could add a "linked repos" row to the table, a simple link to the bpo issue that says "There are repos with patches linked to the original issue", or just ignore them.

type

There are currently 7 types on bpo: behavior, crash, compile error, resource usage, security, performance, enhancement
There are currently 6 type-* labels on GitHub: type-bugfix, type-documentation, type-enhancement, type-performance, type-security, type-tests

so:

type-bugfix seems to replace behavior, crash, compile error
type-enhancement, type-performance, and type-security replace the corresponding fields
resource usage is gone (possibly included in type-performance)
type-tests and type-documentation are set automatically for test_*.py and *.rst files (not sure if they should be types -- they were components on bpo and got added in python/bedevere#108)
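The type mapping above could be expressed as a lookup table. This is a sketch of one possible choice, not a decision: in particular, folding resource usage into type-performance is only the possibility mentioned above, and `label_for_type` is a hypothetical helper.

```python
# bpo type -> GitHub label, following the list above.
BPO_TYPE_TO_LABEL = {
    "behavior": "type-bugfix",
    "crash": "type-bugfix",
    "compile error": "type-bugfix",
    "enhancement": "type-enhancement",
    "performance": "type-performance",
    "security": "type-security",
    # Assumption: 'resource usage' has no label of its own, so it is
    # folded into type-performance here.
    "resource usage": "type-performance",
}


def label_for_type(bpo_type):
    """Return the GitHub label for a bpo type, or None if the type is unset/unknown."""
    return BPO_TYPE_TO_LABEL.get(bpo_type)


print(label_for_type("crash"))
```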

stage

There are currently 6 stages: test needed, needs patch, patch review, commit review, backport needed, resolved
See also this (old) proposed structure and this discussion

The stage could use the existing stage labels. An awaiting triaging might be added.

status

There are currently 3 statuses: open, pending, closed
Events are now created for closed/reopened issues
Issues are labeled with the stale label when pending
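As a sketch, the status conversion could look like the function below. Treating pending issues as open on GitHub is an assumption (only the stale label is stated above), and `github_state` is a hypothetical name.

```python
def github_state(bpo_status):
    """Return (state, labels) to use on GitHub for a bpo status."""
    if bpo_status == "closed":
        return ("closed", [])
    if bpo_status == "pending":
        # Assumption: pending issues stay open and only get the stale label.
        return ("open", ["stale"])
    if bpo_status == "open":
        return ("open", [])
    raise ValueError(f"unknown bpo status: {bpo_status!r}")


print(github_state("pending"))
```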

components

There are currently 27 components: 2to3 (2.x to 3.x conversion tool), Argument Clinic, asyncio, Build, C API, Cross-Build, ctypes, Demos and Tools, Distutils, Documentation, email, Extension Modules, FreeBSD, IDLE, Installation, Interpreter Core, IO, Library (Lib), macOS, Regular Expressions, SSL, Subinterpreters, Tests, Tkinter, Unicode, Windows, XML
❓ People can be automatically added to the nosy list when a component is selected, can we automatically do the same with labels?
- ✔️ now we can, see #16

versions

There are currently 5 versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6
Versions need to be added/removed as new versions of Python are released/retired.
❓ Do we want to keep versions?

resolution

There are currently 11 resolutions: duplicate, fixed, not a bug, later, out of date, postponed, rejected, remind, wont fix, works for me, third party
❓ Do we want to keep resolutions?

priority

There are currently 6 priorities: release blocker, deferred blocker, critical, high, normal, low
We might be able to get rid of this field and use milestones for release/deferred blocker.
❓ Can we automatically warn release managers somehow?
- ✅ if we keep the release/deferred blocker labels we could set autonosy for the RMs (see #16)
- ✅ we could use milestones/projects to track release/deferred blockers for each release and the RMs can use/follow those more easily.

keywords

There are currently 17 keywords: 3.2regression, 3.3regression, 3.4regression, 3.5regression, 3.6regression, 3.7regression, 3.8regression, 3.9regression, buildbot, easy, easy (C), gsoc, needs review, newcomer friendly, patch, pep3121, security_issue
❓ Do we want to keep any of these?

Fate of Roundup and the instances

Roundup has a "core" and one or more tracker "instances". The fork of Roundup that we are currently hosting/running is used by 3 instances:

https://bugs.python.org/ (bpo), used to track CPython issues
https://bugs.jython.org/ (bjo), used to track Jython issues
https://issues.roundup-tracker.org/, used to track Roundup issues

Since the other two instances rely on us, we need to take this into account before we shut down Roundup/bpo.

Jython

Jython is already using GitHub issues
Still has a legacy instance of Roundup at https://bugs.jython.org/ (bjo)
They haven't migrated the old issues
They are also using bjo for the CLA
They would be fine with a mostly read-only bjo
See also this thread on the Jython ML that suggests switching to GitHub issues but keeping bugs.jython.org around

Roundup

The tracker for Roundup is at https://issues.roundup-tracker.org/
They are not planning to migrate
They depend on us for hosting, unless they find another place

CPython

After the migration to GitHub issues is completed, we have at least 3 options:

Make bpo read-only and keep Roundup running;
- the other instances can keep running;
- it will be possible to access old issues and their metadata;
- it will be possible to add messages with redirect links;
- it will be possible to search and filter old issues;
- we need to make it read-only;
Create a static mirror of bpo and shut down Roundup:
- other instances will need to find an alternative solution;
- it will be possible to access old issues and their metadata (HTML only);
- it will be possible to add messages with redirect links (before making it static);
- it will not be possible to search and filter old issues;
- we need to create the static mirror;
Create a script that redirects to the corresponding GH issue:
- other instances will need to find an alternative solution;
- it will not be possible to access/search old issues;
- we need to create a redirect script that maps the issue numbers;

Since this issue is non-blocking, we can adopt the first option and then switch to the second or third down the line, depending on what the other projects do.

Write a tool to export data from bpo

In order to import data into GitHub we need to export bpo data in a format compatible with the importer tool.

There are at least 5 ways to do this:

Using the Roundup Python API to directly access the db (see below);
Using roundup-admin to export the data and then parsing the output;
Using the REST API;
Using the XMLRPC interface;
Accessing the PostgreSQL DB directly.

The first option is likely the easiest solution. The script that generates the weekly "Summary of Python tracker issue" does something similar to access the database and extract data about the issues. The Roundup documentation has a table that summarizes the available functions.

By using one of these solutions, we can write a tool that extracts the data from bpo and rearranges them in the right format. The tool will also need to reformat the issues (see #3), rearrange the labels, and possibly make other changes. The first version of the tool doesn't need to include these changes -- they can be added once we have solved the other issues.

We should also take care of exporting attachments such as patches, sample scripts, screenshots, etc.

Update (2021-09-16)
I'm writing a tool using the first option above:

Add a page that redirects from bpo to GitHub

As a follow-up of #15, once we have the GitHub id as an attribute in the issue items, we need to create a new script accessible through a URL like bugs.python.org/redirect/BPO-ID that redirects to the corresponding GitHub issue. This could be deployed and tested with fake IDs even before the migration starts.

The plan is to replace #XXXXX issues references with BPO-XXXXX in messages and set GitHub autolinking to point to bugs.python.org/redirect/XXXXX, which in turn redirects back to the corresponding GitHub issue.
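The core of the redirect script can be sketched independently of Roundup. Here the mapping is a plain dict, whereas the real script would read the GitHub id from the issue item; the function name and the choice of status codes are assumptions.

```python
import re

# Matches paths like /redirect/46752
REDIRECT_RE = re.compile(r"^/redirect/(\d+)$")


def redirect(path, mapping):
    """Return (status, location) for a request path.

    `mapping` maps bpo ids to GitHub ids. Unknown ids and malformed
    paths result in a 404 with no location.
    """
    match = REDIRECT_RE.match(path)
    if match is None:
        return (404, None)
    gh_id = mapping.get(int(match.group(1)))
    if gh_id is None:
        return (404, None)
    return (302, f"https://github.com/python/cpython/issues/{gh_id}")


# Fake/sample ids can be used to test this before the migration starts.
print(redirect("/redirect/46752", {46752: 90908}))
```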

Message conversion and formatting

This issue is about converting and formatting the content (text) of the bpo messages (not the issue metadata) before importing them into GitHub.

bpo messages are raw text with no formatting, whereas GitHub issues use Markdown. If messages are imported directly, special characters in the bpo messages might be wrongly interpreted as Markdown formatting, resulting in erroneous rendering.

Possible solutions:

Import messages within code-block markup, to render it literally:
- quick and easy solution, but the result looks ugly
- SymPy used this approach (see e.g. this issue)
Import messages as normal text, but escape special characters
- can this be done reliably?
- are there already existing tools that can do it?
Detect and convert to Markdown links, code blocks, lists, etc.
- can this be done reliably?
- are there already existing tools that can do it?

Edit: I went with option 3. It's not perfect, but it seems to work well enough.

Other considerations:

On bpo, links to other issues, messages, PRs, PEPs, etc. are added at rendering time either by Roundup itself or by using regexes (see also the list of special links in the devguide).
- #XXXX, issueXXXX, issue XXXX refs should be replaced by bpo-XXXX and possibly replaced after the migration
- msgXXXX and msg XXXX could be converted to markdown links to the corresponding bpo issue.
  - not ideal but should give enough context to locate the message on GH manually
- fileXXXX and file XXXX are not used frequently and could be ignored
  - ~~however a link to the file can be added in the message that attached it~~
- PEPs can be left alone since autolinking can already turn them into links
  - we might want to convert PEP xxx to PEP-xxx or the autolinking won't work
  - missing leading 0s also break the link (see python/peps#2420)
- GHXXXX, GH XXXX, PRXXXX, PR XXXX, pull request XXXX, BPOXXXX, BPO XXXX should all be hyphenated or the autolinking won't work
- Old SVN refs (rXXXXX) link to https://hg.python.org/lookup/rXXXXX but are currently broken
  - These will be left unchanged
- Files (Lib/somefile.py, Modules/somemodule.c, Doc/somedocfile.rst) can be converted to markdown links
- Tracebacks are now within code blocks, so files in the tracebacks can't be converted into links
  - Tracebacks have been left unchanged
  - We could list them after the traceback, but probably it's not worth the effort
  - It might be interesting to have an action that does this down the line though
The same regexes can be used to convert all these links to Markdown.
Issue numbers can also be remapped from the bpo to the GH numbers during the same step.
- There is no way to know the new GH number in advance
- The transfer tool can rewrite references like #xxxx during the transfer but only for issues that have been transferred already
- Converting them to bpo-xxxx prevents rewrite and can use the bpo redirect added in #17
We might want to preserve somewhere the original (raw) text.
- We can leave this on bpo, otherwise we would have to duplicate all messages.
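
The reference rewriting described in the list above can be sketched with two regexes. This is not the actual conversion tool: the patterns, the function name, and the choice to turn "pull request XXXX" into `PR-XXXX` are assumptions made for illustration.

```python
import re

# "#1234", "issue1234", "issue 1234" -> "bpo-1234".
# The lookbehind skips cross-repo refs such as "python/peps#2420".
ISSUE_REF = re.compile(r"(?<![\w-])(?:#|issue\s?)(\d+)\b", re.IGNORECASE)

# "GH1234", "GH 1234", "PR1234", "PR 1234", "pull request 1234",
# "BPO1234", "BPO 1234" -> hyphenated form, so that autolinking can match.
HYPHENATE = re.compile(r"\b(GH|PR|BPO|pull request)\s?(\d+)\b", re.IGNORECASE)


def _hyphenate(match: re.Match) -> str:
    prefix, number = match.group(1), match.group(2)
    if prefix.lower() == "pull request":
        prefix = "PR"  # assumed spelling for the hyphenated form
    return f"{prefix}-{number}"


def convert_refs(text: str) -> str:
    """Rewrite bpo-style references into hyphenated, autolink-friendly ones."""
    text = ISSUE_REF.sub(r"bpo-\1", text)
    return HYPHENATE.sub(_hyphenate, text)
```

Converting issue references to `bpo-XXXX` first means the second pattern leaves them alone, since it only matches a prefix that is directly followed by digits or by a single space and digits.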

TODO:

Convert the messages to Markdown
Add links to issues/PRs/msg

Replace the stats page

The stats page on bugs.python.org is used to display graphs and statistics about the issues. The stats page uses a JSON file created by the roundup-summary script and the issuestats.py script.

If we want to keep this functionality, an equivalent page should be created. This could also be combined with the dashboard discussed on #6.

Set up autonosy on labels

On bpo, when certain labels are selected, people are assigned automatically based on the expert index of the Devguide. GitHub doesn't seem to offer this out of the box, but there are actions (e.g. https://github.com/marketplace/actions/issue-label-notifier) that can add this functionality.
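
The core of such an action is a lookup from labels to people. A minimal sketch, assuming a label-to-experts mapping derived from the expert index, could look like this. The mapping contents, usernames, and function name are hypothetical.

```python
from typing import Iterable, Mapping, Sequence


def people_to_notify(
    labels: Iterable[str],
    experts: Mapping[str, Sequence[str]],
    author: str = "",
) -> list[str]:
    """Return the sorted usernames to notify for the given issue labels."""
    names: set[str] = set()
    for label in labels:
        names.update(experts.get(label, ()))
    names.discard(author)  # no need to notify whoever opened the issue
    return sorted(names)


# Hypothetical label -> experts mapping (not the real expert index).
EXPERTS = {
    "expert-asyncio": ["alice", "bob"],
    "docs": ["carol"],
}
```

With this mapping, `people_to_notify(["expert-asyncio", "docs"], EXPERTS, author="bob")` returns `["alice", "carol"]`. Labels without an entry are ignored.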

Migration and risk management plans

This issue describes the migration plan, testing strategy, execution plan, and risk management plan. The list of steps is not final: new steps might be added, the time estimates should be made more accurate, and each step should be assigned to someone. This plan overrides PEP-588 and might eventually be turned into a PEP. For the time being, it is kept here for convenience.

This document uses the following terms:

(bpo) export: exporting issues from bpo (bugs.python.org) to a zip archive using a custom-made script
(ECI) import: importing the zip archive with the issues into a new repo on GitHub through the ECI (Enterprise Cloud Importer)
transfer: transferring issues from the repo where the issues got imported into an existing repo (e.g. python/cpython)
migration: the whole process including the three steps above and possibly additional minor steps

Migration plan

These are the steps required to migrate issues from bpo to GitHub:

1. Inform the users about the migration (~2w)
2. Start the migration by making bpo read-only
3. Export all issues from bpo (<1h -- ~22m without attachments)
4. Import issues in a new repo through the ECI (~~~25h~~ ~12h *)
5. Enable the issues tab on the cpython repo
6. Transfer issues to the cpython repo (~~~4-7d~~ ~20h **)
7. Possibly set up and run post-migration actions
8. Test everything and remove the issue template from the cpython repo
9. Inform the users that the migration happened

* Importing 500 issues (without attachments) on a Friday morning (Europe)/Thursday night (US) took 13m. We currently have almost 60k issues, so it should take around 25h. Earlier imports took about half of this time though, so it might depend on the server load. Further testing showed that it takes about 12h.

** The transfer has been optimized, and it now takes about 20h.
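
The ~25h figure in the first footnote is a linear extrapolation from the 500-issue sample. A minimal sketch of that arithmetic, using 60,000 as a round upper bound for "almost 60k" issues (the function name is illustrative):

```python
def estimate_hours(total_issues: int, sample_issues: int, sample_minutes: float) -> float:
    """Linearly extrapolate the duration of a sample import, in hours."""
    return total_issues / sample_issues * sample_minutes / 60


print(estimate_hours(60_000, 500, 13))  # 26.0
```

The measured ~12h for the full import suggests that throughput was roughly twice that of the 13-minute sample, consistent with the note that earlier imports took about half the time.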

Testing strategy

Each step of the previous list should be tested (if possible):

1. ✔️ Informing users is tested by telling them and seeing their reaction.
2. ✔️ Should be tested on a local instance of bpo. The test should verify that it's not possible to create new issues or edit existing ones (this includes both changing fields and adding new comments). Issue redirects can also be tested and enabled before the migration starts.
3. ✔️ This has been tested several times already, but a full test export should be performed shortly before the actual migration.
4. ✔️ Like step 3, this has also been tested and should be tested with a full import before the actual migration.
5. ✔️ The issue template config has been tested on a separate repo and on python/cpython.
6. ✔️ We already performed a test import with a subset of the issues (~500). We will perform more tests using small subsets until all the issues are ironed out, and we should perform a full test import before doing the actual migration.
7. ✔️ ~~GitHub Actions (e.g. updating issue references) can be tested on separate repos, and possibly added to the source tree before the migration starts.~~ We currently don't have any additional actions.
8. ✔️ This is just a matter of merging a PR that removes the issue template config file. (python/cpython#32106)
9. ✔️ This doesn't require testing for emails/social media, but it does for #12.

Execution plan

If all goes well, these are the actions that we will take:

Users should be informed through different means, including but not limited to mails to python-dev/python-committers, posts on Discourse, blog posts and other social media, and a banner on bpo.
- Discourse announcement and update (@ambv, @ezio-melotti)
- python-dev announcement (just a link to the Discourse thread) (@ambv)
- bpo banner: psf/bpo-tracker-cpython#10 (@ezio-melotti)
[Fri 25, evening] When the migration starts, the PR that makes bpo read-only will be merged and tested. The PR should also include a banner for bpo to explain to users that the migration is in progress.
- Merge psf/bpo-tracker-cpython#16 (@ezio-melotti)
- Disable python/cpython -> bpo webhook (@ezio-melotti)
- Test that bpo is read-only (@ezio-melotti, @ambv)
[Fri 25, evening] After the PR has been merged and deployed, and after verifying that bpo is read-only, the export tool will be used to produce a zip file.
- use the export tool to create the zip (@ezio-melotti)
[Fri 25, evening] The zip file will then be fed into the ECI. Given the number of issues, the ECI might time out and must be monitored to ensure that the import completes successfully. This will result in a new and separate repo that will include all the bpo issues.
- Import the archive into the ECI (@ezio-melotti)
- ~~Start a backup import ~4h in (@ezio-melotti)~~
  - GitHub says it will only increase the load and make the first import slower
- Save the migration ID/GUID of the import (@ezio-melotti)
- Get the name of the on-call GitHub engineer (@ezio-melotti)
- Monitor the import overnight until it's complete (@ezio-melotti, GitHub team)
  - If the import gives an error, use the "Retry" button to resume
  - If it gets stuck without errors, ping GitHub
[Sat 26, morning] At this point, we can enable the issues tab, with the issue template config already in place.
- Enable the issues tab (@ezio-melotti)
[Sat 26, morning] After everything is ready, we will inform GitHub. They will then start the issue transfer. This will need to be monitored in case of errors.
- Inform the GitHub team (@ezio-melotti)
- Start the transfer and monitor it until it's complete (GitHub team, @ezio-melotti, @ambv)
[Sun 27, morning] Once the transfer is complete, we might need to run some post-migration actions (e.g. to update issue references). We will also manually run some of the other installed actions to make sure they work properly. Note that some actions might need to be tested after the next step. (@ambv, @ezio-melotti)
- Retrieve issue mapping from the GitHub team (@ezio-melotti)
- Update the github field of all issues on bpo (@ezio-melotti)
- Merge psf/bpo-tracker-cpython#17 (@ezio-melotti)
- Update bpo-* autolinking on python/cpython (@ezio-melotti)
- TBD (@ambv, @ezio-melotti)
[Sun 27, morning] Once all the issues have been transferred and tested, the issue template config will be removed from the cpython repo, allowing users to create new issues.
- python/cpython#32106 (@ezio-melotti)
[Sun 27, afternoon] Pre-written messages will be sent out on MLs and social media to inform the users. The script required for #12 could be run now or later. Additional actions (e.g. weekly summary) could also be installed later.
- Update bpo banner: psf/bpo-tracker-cpython#12 (@ezio-melotti)
- Post a Discourse announcement (@ezio-melotti)
- Post a python-dev announcement (@ezio-melotti)
- Merge the devguide update PR (python/devguide#814) (@ambv, @ezio-melotti)
- Merge the docs.python.org issue links PR (python/cpython#32342) (@ezio-melotti)
- Remove the weekly summary cronjob on bpo (python/psf-salt#234) (@ezio-melotti)
- Remove irker on bpo (python/psf-salt#232) (@ezio-melotti)
- TBD

There are also a number of related changes that should be done:

After the migration, and once we have the bpo->GH mapping, we could:

replace bpo-* refs with actual GH-* refs (this enables the mouse-over popup)
replace the dependencies list with a checklist of GH issues (this enables task tracking)
replace the superseder with Duplicate of GH-* (this enables duplicates tracking)

These changes affect the "Last update" datetime, so we could do them lazily through a GitHub action whenever someone edits an existing issue.
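
Given the bpo->GH mapping, these rewrites amount to simple text substitutions. The sketch below is only an illustration: the function names and the mapping shape (bpo id to GitHub issue number) are assumptions, and references without a known mapping are left as they are.

```python
import re
from typing import Iterable, Mapping

BPO_REF = re.compile(r"\bbpo-(\d+)\b", re.IGNORECASE)


def replace_bpo_refs(text: str, mapping: Mapping[int, int]) -> str:
    """Replace bpo-* refs with GH-* refs when the GitHub number is known."""
    def repl(match: re.Match) -> str:
        gh_number = mapping.get(int(match.group(1)))
        return f"GH-{gh_number}" if gh_number is not None else match.group(0)
    return BPO_REF.sub(repl, text)


def dependencies_checklist(dependencies: Iterable[int], mapping: Mapping[int, int]) -> str:
    """Render a list of bpo dependency ids as a Markdown task list."""
    lines = []
    for bpo_id in dependencies:
        gh_number = mapping.get(bpo_id)
        ref = f"GH-{gh_number}" if gh_number is not None else f"bpo-{bpo_id}"
        lines.append(f"- [ ] {ref}")
    return "\n".join(lines)


def superseder_line(bpo_id: int, mapping: Mapping[int, int]) -> str:
    """Render the superseder field as a 'Duplicate of' line."""
    return replace_bpo_refs(f"Duplicate of bpo-{bpo_id}", mapping)
```

Keeping the functions pure (text in, text out) makes it easy to run them lazily from an action, one issue at a time, whenever an issue is edited.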

Risk management plan

This section discusses the failures we might encounter during each step of the migration and suggests ways to prevent and deal with them. None of these things are expected to happen, but we should have a plan B just in case.

Once we inform the users:
- They might protest, but at this point the migration is going to happen, so the best we can do is to address their feedback to the best of our ability.
When we make bpo read-only:
- If we fail to make bpo read-only, the migration will be delayed until we have verified that it is not possible to create/edit issues. This should also be tested on a local copy of the tracker beforehand.
- If we make bpo read-only, but people (or bots) somehow manage to create a few issues and/or messages some other way, we could just inform them and ask them to recreate them on GitHub once the migration is done (if it's just bot messages we could even ignore them).
Exporting issues from bpo:
- This is easy to test but if somehow a new/recent issue/message breaks the exporter, I could try to identify and fix the problem on the fly, causing a small delay. If the issue is too complex to fix quickly, we might reopen bpo and reschedule the migration.
- We depend heavily on the devguide documentation to ease the transition from bpo to GitHub Issues for users unfamiliar with GitHub Issues.
Import issues in the ECI:
- This is also easy to test, but time-consuming. We could also import the archive twice at the same time, so that if an import fails the other might succeed. If they both succeed we will also have a backup repo in case something goes wrong during the transfer.
- If the import times out (as often happens with big archives), a "Retry" button appears that will generally make the import resume. The timeouts also report a code and the migration ID, and these can be used by GitHub to investigate the issue.
- If the import fails because of a problem with the archive, either the problem should be fixed by opening and editing the archive manually, or by fixing the exporting tool and exporting a new archive. A full test import before the migration should help mitigate this risk.
- If the import fails because of a problem with the ECI and can't be resumed, we will have to restart the import.
- If the PC performing the import crashes or in case of blackout, it won't be possible to hit "Retry" from the ECI, but we could use the migration IDs to resume and complete the migration. The migration IDs should be saved beforehand. If this happens soon after the migration starts, it might be better to restart it from the ECI.
Possibly partially lock the cpython repo:
- Once we have decided if/how to do this, we should be able to test it on a separate repo, so it shouldn't fail as long as we document the steps and follow them.
- If locking doesn't work and people are somehow able to create issues, this will interfere with the numbering but I guess we will have to live with it (the numbering is changed anyway). As long as we advertise somehow that the migration is happening and users shouldn't create/edit issues, I think it's ok if those issues get lost.
Transfer issues to the cpython repo:
- This is handled entirely by the GitHub team, so we have little control over this. It seems they have a certain degree of control, and they can transfer in batches and/or resume/retry the transfer. Doing a full test transfer will ensure that there are no issues with problematic fields.
- If an issue can't be transferred, it might be possible to edit the source issue and try again. If the import stops at the first failure, we might be able to preserve the ID ordering, if not, it could also be transferred again at the end or even after the migration.
- Transferring deletes issues from the source repo, so -- unless there is a way to preserve them -- if something goes wrong and the transfer needs to be performed again, the archive will need to be imported again. This could be done preemptively so that after exporting the bpo issues we import the archive twice in two separate repos.
Possibly set up and run post-migration actions
- This depends on the actual actions being executed.
- Once the migration is completed successfully, every other non-critical action could be done afterward, and should only cause minor inconveniences.
Unlock the cpython repo and test everything
- If something went wrong, we could disable the issues tab and unlock the repo while we investigate. We might be able to fix the issue directly, or possibly we will have to lock it again for a short time to re-import a few issues. Worst case scenario we will have to wipe away all issues and redo the transfer from scratch. Having a script able to inspect/edit/remove one or more issues through the API (since if the issues tab is disabled we won't be able to do it from there) might be helpful.
Inform the users that the migration happened

After the migration is complete, we should be able to address any concerns that did not come up beforehand. Informing the users clearly, widely, and in advance will help ensure that people know about the migration, what is getting transferred, the duration of the downtime, and other details. This should help minimize surprises and hostile reactions.