psf/gh-migration
This repo is used to manage the migration from bugs.python.org to GitHub.
GitHub uses the same namespace for issues and PRs, and the current PR numbers already overlap with the original bpo numbers.
Current situation:
Questions and issues:
original_number + 100k (…), >143k new GH issues/PRs
original_number + 40k (…), >82k new GH issues/PRs
original_number (…), 43k-56k old SF issues, 57k-80k current PRs (if PRs can be renumbered), >80k new GH issues/PRs
0 23k 27k 41k 87k 101k 143k
|PRPRPRPRPRP|_________________________________|SFSFSF|BPOBPOBPOBPOBPOBPOBPO|NEW...|
|PRPRPRPRPRP|__|SFSFSF|BPOBPOBPOBPOBPOBPOBPO|NEW...|
_|BPOBPOBPOBPOBPOBPOBPO|SFSFSF|PRPRPRPRPRP|NEW...|
1k 43k 57k 80k
PR: current PRs; SF: old SourceForge issues; BPO: current BPO issues; NEW...: new issues; _: unused range.
Other considerations:
… (max(issue_ids) + 1) [confirm with GH].
With original_number + 40k or original_number + 100k, it will be easier to find the corresponding issue without relying on a mapping.
Update
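The offset schemes listed at the start of this issue can be compared with a small Python sketch. It shows whether a renumbered bpo issue would collide with existing PR numbers, and how the bpo number can be recovered without a mapping. The function names are hypothetical. The premise that current PRs occupy 1..max_pr_number and keep their numbers is an assumption.

```python
# Offsets for the candidate numbering schemes (original_number + offset).
OFFSETS = {"+100k": 100_000, "+40k": 40_000, "original": 0}


def gh_number(bpo_num: int, offset: int) -> int:
    """GitHub number a bpo issue would get under the given offset."""
    return bpo_num + offset


def to_bpo_number(gh_num: int, offset: int) -> int:
    """Recover the bpo number from the GitHub number, no mapping file needed."""
    return gh_num - offset


def collides_with_prs(bpo_num: int, offset: int, max_pr_number: int) -> bool:
    """True if the renumbered issue would land in 1..max_pr_number.

    Assumption (hypothetical): existing PRs occupy 1..max_pr_number and keep
    their numbers.
    """
    return gh_number(bpo_num, offset) <= max_pr_number
```

For example, taking 23_000 as an illustrative upper bound for current PR numbers (read off the diagram), bpo issue 2771 collides only under the "original" scheme.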
The first step of the migration is making BPO read-only. The Roundup wiki has a page about making a tracker read-only; however, this requires both code changes and fiddling with roundup-admin.
A better approach would be:
… (Users and Developers) in schema.py
The banner pointing to GitHub will initially be more prominent and mention that the migration is in progress, informing users that it won't be possible to report and edit issues during the migration. After the migration is complete it will be updated to be less prominent.
This should be tested locally first, and the test should include:
… (Admin/Coordinator)
These changes should be made available in a single PR that can be merged (and, if needed, rolled back) easily.
We should also determine what level of access to leave to users, since they might still need to access or remove their bpo accounts, update or remove their email address, name, GH username, timezone, etc., and possibly even their messages. Logged-in users also have access to summaries of the issues they created and the issues they follow, so they should still be able to review those summaries.
The sendmail Roundup detector has been used to send emails to the new-bugs-announce mailing list and the python-bugs-list mailing list whenever a new issue was created or a new message was posted, respectively. The addresses are configured in the bugs.sls file.
This issue describes the migration plan, testing strategy, execution plan, and risk management plan. This list of steps is not final: new steps might be added, the time estimates should be made more accurate, and each step should be assigned to someone. This plan supersedes PEP 588 and might eventually be turned into a PEP. For the time being, it is kept here for convenience.
This document uses the following terms:
These are the steps required to migrate issues from bpo to GitHub:
… cpython repo
… cpython repo (… cpython repo …)
* Importing 500 issues (without attachments) on a Friday morning (Europe)/Thursday night (US) took 13 minutes. We currently have almost 60k issues, so a full import should take around 25h. Earlier imports took about half this time, though, so the duration might depend on the server load. Further testing showed that it takes about 12h.
** The transfer has been optimized, and it now takes about 20h.
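The ~25h figure in the first footnote is a linear extrapolation from the timed sample. This sketch shows that arithmetic (the function name is illustrative). Because 60k is slightly above the real count, the result lands just over 25h. A linear model cannot capture server load, which is consistent with the different times measured later.

```python
def estimate_hours(total_issues: int, sample_size: int, sample_minutes: float) -> float:
    """Linearly extrapolate the total import time (in hours) from a timed sample."""
    total_minutes = total_issues * sample_minutes / sample_size
    return total_minutes / 60


# 500 issues took 13 minutes; bpo has almost 60k issues.
print(estimate_hours(60_000, 500, 13))  # 26.0
```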
Each step of the previous list should be tested (if possible):
3. … this has also been tested, and should be tested with a full import before the actual migration.
… python/cpython.
If all goes well, these are the actions that we will take:
… python-dev/python-committers, posts on Discourse, blog posts and other social media, and a banner on bpo.
… python/cpython -> bpo webhook (@ezio-melotti)
… github field of all issues on bpo (@ezio-melotti)
… bpo-* autolinking on python/cpython (@ezio-melotti)
… cpython repo, allowing users to create new issues.
There are also a number of related changes that should be made:
… .github/actions/ on python/cpython
After the migration, and once we have the bpo->GH mapping, we could:
Replace bpo-* refs with actual GH-* refs (this enables the mouse-over popup)
… Duplicate of GH-* (this enables duplicate tracking)
These changes affect the "Last update" datetime, so we could apply them lazily through a GitHub Action whenever someone edits an existing issue.
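Once the bpo->GH mapping is available, the bpo-* to GH-* rewrite could be a single regex substitution over the issue text. This minimal Python sketch assumes the mapping has already been loaded into a dict. The example pair (bpo-31270 -> GH-75453) comes from a reference discussed elsewhere in this repo. Running it inside a GitHub Action on issue edits is not shown.

```python
import re

# Matches bpo refs such as "bpo-31270" and captures the number.
BPO_REF = re.compile(r"\bbpo-(\d+)\b")


def rewrite_bpo_refs(text: str, bpo_to_gh: dict[int, int]) -> str:
    """Replace bpo-NNNN refs with GH-MMMM; refs missing from the mapping are kept."""
    def repl(match: re.Match) -> str:
        gh = bpo_to_gh.get(int(match.group(1)))
        return f"GH-{gh}" if gh is not None else match.group(0)

    return BPO_REF.sub(repl, text)
```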
This section discusses the failures we might encounter during each step of the migration and suggests ways to prevent and deal with them. None of these things are expected to happen, but we should have a plan B just in case.
Once we inform the users:
When we make bpo read-only:
Exporting issues from bpo:
Import issues in the ECI:
Possibly partially lock the cpython repo:
Transfer issues to the cpython repo:
Possibly set up and run post-migration actions
Unlock the cpython repo and test everything
Inform the users that the migration happened
As a follow-up to #15, once we have the GitHub ID as an attribute in the issue items, we need to create a new script accessible through a URL like bugs.python.org/redirect/BPO-ID that redirects to the corresponding GitHub issue. This could be deployed and tested with fake IDs even before the migration starts.
The plan is to replace #XXXXX issue references with BPO-XXXXX in messages and to set GitHub autolinking to point to bugs.python.org/redirect/XXXXX, which in turn redirects back to the corresponding GitHub issue.
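This is a minimal sketch of the redirect endpoint as a plain WSGI application. It assumes the bpo->GH mapping is available as a dict; in practice it would come from the GitHub ID stored on each issue item, per #15. It is not a Roundup extension, and all names are hypothetical. It only illustrates the behavior: /redirect/<bpo-id> answers with a 302 to the GitHub issue, and unknown IDs get a 404. That makes it easy to exercise with fake IDs.

```python
# Hypothetical mapping; in practice read from the GitHub ID stored on each bpo issue.
BPO_TO_GH = {31270: 75453}
GH_ISSUE_URL = "https://github.com/python/cpython/issues/{}"
PREFIX = "/redirect/"


def redirect_app(environ, start_response):
    """WSGI app: /redirect/<bpo-id> -> 302 to the GitHub issue, otherwise 404."""
    path = environ.get("PATH_INFO", "")
    bpo_id = path[len(PREFIX):] if path.startswith(PREFIX) else ""
    gh_id = BPO_TO_GH.get(int(bpo_id)) if bpo_id.isdigit() else None
    if gh_id is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown bpo issue\n"]
    start_response("302 Found", [("Location", GH_ISSUE_URL.format(gh_id))])
    return [b""]
```

For local testing, the app can be served with wsgiref.simple_server.make_server from the standard library.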
After the migration, we should send users an email informing them that the migration happened and listing issues that have been created by them, assigned to them, and followed by them.
In order to do this, we should write a tool that goes through all the users, gathers the data, formats the messages, and sends them to the users. This summary could be made available on bpo too, and the email could contain a link in addition to or instead of the lists (these summaries are already available in the sidebar for logged-in users).
This works well for occasional contributors who are involved in fewer than one or two dozen issues, since they can go through them, review them, and resubscribe manually (if they are still interested). However, it doesn't scale well for people who follow hundreds of issues, and having a way to resubscribe users to the issues they were following would be better (see #5 under "nosy list").
Even if we find a way to preserve the nosy list during the migration, the email would still be useful because:
The exact wording and format of the email still needs to be determined:
Given the number of users, we should take care when sending out a large number of emails at once, since they might be flagged as spam.
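A possible shape for the notification tool, as a hedged Python sketch. It groups issue IDs per user (created, assigned, followed), then sends the messages in small batches with a pause in between to reduce the risk of being flagged as spam. The issue dict layout, the batch size, the pause, and the send callable are all assumptions. They are not the Roundup schema or a real mail API.

```python
import time
from collections import defaultdict
from typing import Callable, Iterable


def build_summaries(issues: Iterable[dict]) -> dict[str, dict[str, list[int]]]:
    """Group issue ids per user under 'created', 'assigned', and 'following'.

    Each issue is a hypothetical dict with 'id', 'creator', 'assignee', and
    'nosy' keys (not the actual Roundup schema).
    """
    summaries = defaultdict(lambda: {"created": [], "assigned": [], "following": []})
    for issue in issues:
        summaries[issue["creator"]]["created"].append(issue["id"])
        if issue.get("assignee"):
            summaries[issue["assignee"]]["assigned"].append(issue["id"])
        for user in issue.get("nosy", []):
            summaries[user]["following"].append(issue["id"])
    return dict(summaries)


def send_in_batches(messages: list, send: Callable, batch_size: int = 100,
                    pause: float = 60.0) -> None:
    """Send messages in batches, sleeping between batches to avoid spam filters."""
    for start in range(0, len(messages), batch_size):
        for message in messages[start:start + batch_size]:
            send(message)
        if start + batch_size < len(messages):
            time.sleep(pause)
```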
The roundup-summary script has been used to send weekly reports to the python-dev ML. The script is executed once a week by a cron job.
Adding a dashboard similar to the Django dashboard has been proposed on the TrackerDevelopmentPlanning wiki page and discussed. Custom views provided by GitHub issues could be a simpler alternative to the dashboard.
The local_replace.py extension has been used to automatically convert parts of messages posted to b.p.o into links, including: issues, messages, reference numbers, PEPs, files, tracebacks, etc. (see the triaging page of the devguide). Some of these links are already automatically handled by GitHub.
If we want to create links for the rest, a new tool should be written.
This tool will be used for new messages, but it might also be used either at import time, or after the issues have been imported to GitHub.
The stats page on bugs.python.org is used to display graphs and statistics about the issues. The stats page uses a JSON file created by the roundup-summary script and the issuestats.py script.
If we want to keep this functionality, an equivalent page should be created. This could also be combined with the dashboard discussed in #6.
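If an equivalent stats page is built, one basic ingredient is the number of issues opened and closed per week. This sketch assumes the events have already been extracted as (date, kind) pairs. The output layout is hypothetical and does not reproduce the JSON file produced by roundup-summary or issuestats.py.

```python
from collections import Counter
from datetime import date


def weekly_counts(events: list[tuple[date, str]]) -> dict[str, dict[str, int]]:
    """Count events (e.g. 'opened', 'closed') per ISO week, keyed like '2022-W14'."""
    counts: dict[str, Counter] = {}
    for day, kind in events:
        year, week, _ = day.isocalendar()
        key = f"{year}-W{week:02d}"
        counts.setdefault(key, Counter())[kind] += 1
    # Sorted by week so the result can be dumped to JSON in chronological order.
    return {key: dict(counter) for key, counter in sorted(counts.items())}
```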
Now that we have migrated to GitHub, I think the following repos can be archived:
@ewdurbin: can you look into it? Am I missing any other repo?
In order to import data into GitHub we need to export bpo data in a format compatible with the importer tool.
There are at least 5 ways to do this:
The first option is likely the easiest solution. The script that generates the weekly "Summary of Python tracker Issues" does something similar to access the database and extract data about the issues. The Roundup documentation has a table that summarizes the available functions.
By using one of these solutions, we can write a tool that extracts the data from bpo and rearranges it in the right format. The tool will also need to reformat the issues (see #3), rearrange the labels, and possibly make other changes. The first version of the tool doesn't need to include these changes; they can be added once we have solved the other issues.
We should also take care of exporting attachments such as patches, sample scripts, screenshots, etc.
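This is a sketch of the per-issue conversion step of such a tool. It assumes the issue and its messages have already been read from the Roundup database into plain dicts, and that the first bpo message is the issue description. The input keys and the issue/comments output shape are hypothetical and would need to match whatever the importer actually expects. Attachments are not handled.

```python
from datetime import datetime, timezone


def iso_utc(ts: datetime) -> str:
    """Format an aware datetime as an ISO 8601 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_import_record(issue: dict, messages: list[dict]) -> dict:
    """Convert one exported bpo issue (hypothetical dict layout) to an import record.

    Assumes messages are in chronological order and the first one is the
    issue description.
    """
    first, *rest = messages
    return {
        "issue": {
            "title": issue["title"],
            "body": first["content"],
            "created_at": iso_utc(issue["creation"]),
            "closed": issue["status"] == "closed",
            "labels": issue.get("labels", []),
        },
        "comments": [
            {"created_at": iso_utc(m["date"]), "body": m["content"]} for m in rest
        ],
    }
```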
Update (2021-09-16)
I'm writing a tool using the first option above:
On bpo, when certain labels are selected, people are assigned automatically based on the expert index of the Devguide. GitHub doesn't seem to offer this out of the box, but there are actions (e.g. https://github.com/marketplace/actions/issue-label-notifier) that can add this functionality.
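The label-to-expert lookup that such an action performs can be sketched in a few lines of Python. The experts dict stands in for data derived from the Devguide's expert index. The usernames in the test are placeholders. How the result is turned into mentions or assignments is left out.

```python
def users_to_notify(labels: set[str], experts: dict[str, list[str]]) -> list[str]:
    """Return the experts for the given labels, de-duplicated, in a stable order."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for label in sorted(labels):
        for user in experts.get(label, []):
            seen.setdefault(user, None)
    return list(seen)
```

Labels such as expert-asyncio and OS-windows already exist on python/cpython, so they are natural keys for this kind of lookup.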
For example:
https://bugs.python.org/issue2116#msg87822
python/cpython#46370 (comment)
Committed in r72662, r72670. Thanks!
At the end of the migration, GH will provide a file that maps the old bpo IDs to the corresponding GH IDs. We should:
The GH ID should then be used
This can be accomplished by editing the issue.item.html template (adding the link at the top, instead of the editing form) and the issue.list.html template (adding a column with the GH IDs next to the bpo IDs, linking to the GH issue when clicked). This should be implemented as a separate PR to be merged after the migration.
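On the data side, the mapping file could be loaded once and used to build the GitHub link for each issue shown in the templates. This sketch assumes a CSV with bpo_id and gh_id columns. The real file format provided by GH may differ, and the template changes themselves are not shown.

```python
import csv
import io
from typing import Optional

GH_ISSUE_URL = "https://github.com/python/cpython/issues/{}"


def load_mapping(csv_text: str) -> dict[int, int]:
    """Parse a hypothetical 'bpo_id,gh_id' CSV into a bpo -> GH dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["bpo_id"]): int(row["gh_id"]) for row in reader}


def gh_url(bpo_id: int, mapping: dict[int, int]) -> Optional[str]:
    """GitHub URL for a bpo issue, or None if it is not in the mapping."""
    gh_id = mapping.get(bpo_id)
    return GH_ISSUE_URL.format(gh_id) if gh_id is not None else None
```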
On bpo users can specify their GitHub username. If they do so, their bpo issues/comments can be mapped to their GitHub users, however this only works for users that belong to the "python" organization.
For users with a GitHub username that don't belong to the "python" org and for users that haven't specified their GitHub usernames, a placeholder user (called mannequin) with either their GitHub or bpo username will be created.
The mannequin will only show the username, so:
Mannequins can be manually reclaimed after the import, but this might still be impossible if the users don't belong to the org. A possible workaround is to create a new org, add all the bpo users that have a GitHub username to that org (possibly without sending out notifications), perform the import there so that all the users get mapped, then copy all the issues to python/cpython and remove the new org. This might preserve the user mapping even if the users don't belong to the "python" org.
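The mapping rules described above fit in a small decision function. In this hedged sketch, org_members stands in for the membership of whichever org is used for the import. The comparison is case-insensitive because GitHub usernames are.

```python
from typing import Optional


def resolve_author(bpo_username: str, github_username: Optional[str],
                   org_members: set[str]) -> tuple[str, str]:
    """Return ('user', name) if the author can be mapped, else ('mannequin', name)."""
    members = {m.lower() for m in org_members}  # GitHub usernames are case-insensitive
    if github_username and github_username.lower() in members:
        return ("user", github_username)
    # Mannequins show the GitHub username if known, otherwise the bpo username.
    return ("mannequin", github_username or bpo_username)
```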
python/cpython#90908 (comment) has two incorrect references. The first one goes to GH-75453, which is bpo-31270. It should be GH-31270 (https://bugs.python.org/issue46752#msg413260). Note that the reference is correct in the following 'New changeset' comment.
This is the only case I've seen so far, but I haven't systematically looked for more.
The irker detector has been used to post updates to the #python-dev-notifs IRC channel (see the bugs.sls file).
Roundup has a "core" and one or more tracker "instances". The fork of Roundup that we are currently hosting/running is used by 3 instances:
Jython
Roundup
CPython
Since the other two instances rely on us, we need to take this into account before we shut down Roundup/bpo.
After the migration to GitHub issues is completed, we have at least 3 options:
Make bpo read-only and keep Roundup running;
Create a static mirror of bpo and shut down Roundup:
Create a script that redirects to the corresponding GH issue:
Since this issue is non-blocking, we can adopt the first option and then switch to the second or third down the line, depending on what the other projects do.
This issue is about converting and formatting the content (text) of the bpo messages (not the issue metadata) before importing them into GitHub.
bpo messages are raw text with no formatting, whereas GitHub issues use Markdown. If messages are imported directly, special characters in the bpo messages might be wrongly interpreted as Markdown formatting, resulting in erroneous rendering.
Possible solutions:
Edit: I went with option 3. It's not perfect, but it seems to work well enough.
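One simple way to keep plain-text messages from being misrendered is to backslash-escape the characters that most often trigger Markdown formatting. This sketch shows that general idea; it is not necessarily the option 3 mentioned above. The chosen character set is an assumption, and escaping too aggressively makes the raw text harder to read.

```python
import re

# Characters that frequently trigger unwanted Markdown formatting in plain text.
# "-" and "." are left unescaped so refs like bpo-1234 are not altered.
_MD_SPECIAL = re.compile(r"([\\`*_\[\]<>#|~])")


def escape_markdown(text: str) -> str:
    """Backslash-escape Markdown-significant characters in a plain-text message."""
    return _MD_SPECIAL.sub(r"\\\1", text)
```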
Other considerations:
#XXXX, issueXXXX, issue XXXX refs should be replaced by bpo-XXXX and possibly replaced after the migration
msgXXXX and msg XXXX could be converted to markdown links to the corresponding bpo issue.
fileXXXX and file XXXX are not used frequently and could be ignored
… PEP xxx to PEP-xxx or the autolinking won't work
… 0s also break the link (see python/peps#2420)
GHXXXX, GH XXXX, PRXXXX, PR XXXX, pull request XXXX, BPOXXXX, BPO XXXX should all be hyphenated or the autolinking won't work
… rXXXXX) link to https://hg.python.org/lookup/rXXXXX but are currently broken
… Lib/somefile.py, Modules/somemodule.c, Doc/somedocfile.rst) can be converted to markdown links
… #xxxx during the transfer but only for issues that have been transferred already
… bpo-xxxx prevents rewrite and can use the bpo redirect added in #17
TODO:
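Several of the reference conversions listed under "Other considerations" can be expressed as ordered regex substitutions. This partial Python sketch makes a few assumptions:

* msg refs link to the bpo message page, since a message ref alone doesn't carry the issue number.
* PR and pull request refs become GH-NNN, because GitHub autolinks the GH- prefix. Mapping PRs to it is an assumption.
* Revision refs and file paths are left alone.
* Code blocks and tracebacks are not skipped.

```python
import re

BPO_MSG_URL = "https://bugs.python.org/msg{}"

# (pattern, replacement) pairs applied in order.
_RULES = [
    # "#1234", "issue1234", "issue 1234" -> "bpo-1234" (4+ digits, not inside URLs)
    (re.compile(r"(?<![/\w])(?:#|issue\s?)(\d{4,})\b", re.IGNORECASE), r"bpo-\1"),
    # "msg1234", "msg 1234" -> Markdown link to the bpo message page
    (re.compile(r"(?<![/\w])msg\s?(\d+)\b"),
     lambda m: f"[msg{m.group(1)}]({BPO_MSG_URL.format(m.group(1))})"),
    # "PEP 8", "PEP 0008" -> "PEP-8" (spaces and leading zeros break autolinking)
    (re.compile(r"\bPEP\s?0*(\d+)\b"), r"PEP-\1"),
    # "GH 123", "GH123", "BPO 123", "BPO123" -> hyphenated, keeping the prefix
    (re.compile(r"\b(GH|BPO)\s?(\d+)\b", re.IGNORECASE), r"\1-\2"),
    # "PR 123", "pull request 123" -> "GH-123" (assumed target form)
    (re.compile(r"\b(?:PR|[Pp]ull request)\s?(\d+)\b"), r"GH-\1"),
]


def convert_refs(text: str) -> str:
    """Apply each rewrite rule in order and return the converted message text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```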
This issue is about issue metadata (priority, versions, status, etc.), how/where to import them in GitHub, and what metadata to keep/add/remove/update. User/comment/file metadata will be discussed in a separate issue.
bpo tracks different metadata for each issue (see e.g. https://bugs.python.org/issue2771 ) including: title, comments, files (attachments), creator, creation, actor, activity, type, stage, components, versions, status, resolution, dependencies, superseder, assigned to, nosy list, priority, keywords, remote HG repos, linked PRs
The meaning of each field is explained in the devguide. The fields are defined in the schema.py of the bpo instance. The creator, creation (datetime), (last) actor, (last) activity (datetime) are common to all classes.
GitHub already has corresponding fields for the following: title, messages (comments), linked PRs, assigned to (assignees), creator (user), and creation (created_at).
❓ Does GitHub have fields for (last) actor, (last) activity (datetime)? Do we need them?
The other fields will need to be replaced with something else (mostly labels) or removed.
Labels in GitHub can be grouped with colors and/or with a prefix like priority-high, priority-medium, priority-low. GitHub is working on adding custom fields, but they will only be available in ~6 months.
Actions can be used to automate certain tasks in addition to or instead of bots (e.g. adding labels, closing stale issues, etc.).
Unused metadata that are not converted to labels (or anything else) can be stored in a comment so that they can be retrieved if needed (e.g. if we move away from GH).
On python/cpython there are currently 32 labels:
awaiting change review, awaiting changes, awaiting core review, awaiting merge, awaiting review
type-bugfix, type-documentation, type-enhancement, type-performance, type-security, type-tests
needs backport to 3.6-3.10
automerge, DO-NOT-MERGE, skip issue, skip news, test-with-buildbots
CLA not signed, CLA signed
OS-mac, OS-windows
invalid, ctypes, dependencies, expert-asyncio, spam, sprint, stale
This is the full list of all the fields we have in Roundup, and how we could convert them to GitHub Issues:
… bpo-xxxxx issues, even if they are moved after the table the checkboxes won't be updated automatically.
Duplicate of #xxxxx as a reply marks the issue as duplicate. A default "duplicate" reply can also be added to the saved replies (the icon with the left-pointing arrow on the top-right).
… bpo-xxxxx ref, so it can't be used for imported issues
… bpo-xxxxx ref with a GH ref after the migration
… 404, and 14 are unreachable
… so:
… test_*.py and *.rst files (not sure if they should be types; they were components on bpo and got added in python/bedevere#108)
The stage could use the existing stage labels. An "awaiting triaging" label might be added.
There are currently 3 statuses: open, pending, closed
Events are now created for closed/reopened issues
Issues are labeled with the stale label when pending
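This sketch shows how bpo metadata could be split into prefixed labels and a metadata comment, following the priority-high style prefixes mentioned above. Which fields become labels and which go into the comment is an illustrative assumption, not a decision recorded in this issue.

```python
from typing import Any

# Single-valued bpo fields turned into prefixed labels (hypothetical scheme).
PREFIXED_FIELDS = {"priority": "priority-", "type": "type-"}
# Fields with no label equivalent, preserved in a metadata comment instead.
COMMENT_FIELDS = ("stage", "resolution", "keywords")


def convert_metadata(issue: dict[str, Any]) -> tuple[list[str], str]:
    """Split bpo metadata into GitHub labels and a plain-text metadata comment."""
    labels = [
        prefix + str(issue[field]).replace(" ", "-")
        for field, prefix in PREFIXED_FIELDS.items()
        if issue.get(field)
    ]
    labels += list(issue.get("components", []))  # one label per component
    lines = []
    for field in COMMENT_FIELDS:
        value = issue.get(field)
        if value:
            text = ", ".join(value) if isinstance(value, list) else str(value)
            lines.append(f"{field}: {text}")
    return labels, "\n".join(lines)
```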