microsoft/scalar
Scalar: A set of tools and extensions for Git to allow very large monorepos to run on Git without a virtualization layer
License: MIT License
Repo registration (with Scalar.Service) should happen during scalar clone rather than mount.
Additionally, the clone verb itself should update the registration file (or each clone should create its own file) and Scalar.Service should read the file(s) to discover which repos have been registered.
Additionally, Scalar.Service should also remove repos when it finds they're no longer on disk.
It would be ideal if the versioning scheme gave a better indication of the time of release and the milestone associated with the bits.
I propose the following, where {build} = counter(SourceRef, 0)
Source ref | Version template | Version example | Build Number |
---|---|---|---|
refs/heads/releases/19.08.157 | {yy}.{MM}.{Milestone}.{build} | 19.08.157.1 | Release-19.08.157.1 |
refs/pull/25/merge | 10.20.{PRNum}.{build} | 10.20.25.1 | PR-25.1 |
refs/heads/master | {yy}.{MM}.{dd}.{build} | 19.08.10.33 | CI-master.33 |
refs/tags/tagname | {yy}.{MM}.{dd}.{build} | 19.08.10.34 | CI-tagname.34 |
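The mapping in the table above can be sketched as a small function. This is only an illustration under stated assumptions: `version_for_ref` is a hypothetical name, the counter value is passed in as `build` rather than computed by the build system, and the release case assumes the `{yy}.{MM}.{Milestone}` part is taken directly from the branch name, as the example row suggests.

```python
from datetime import date


def version_for_ref(source_ref: str, build: int, today: date) -> tuple[str, str]:
    """Return (version, build_number) for a source ref, following the proposed table.

    `build` stands in for counter(SourceRef, 0); how it is computed is not modeled here.
    """
    if source_ref.startswith("refs/heads/releases/"):
        # Assumption: the release branch name already carries yy.MM.Milestone.
        release = source_ref[len("refs/heads/releases/"):]
        version = f"{release}.{build}"
        return version, f"Release-{version}"

    if source_ref.startswith("refs/pull/"):
        # refs/pull/<PRNum>/merge
        pr_num = source_ref.split("/")[2]
        return f"10.20.{pr_num}.{build}", f"PR-{pr_num}.{build}"

    # Remaining rows: branches (refs/heads/<name>) and tags (refs/tags/<name>).
    name = source_ref.split("/", 2)[2]
    version = f"{today:%y}.{today:%m}.{today:%d}.{build}"
    return version, f"CI-{name}.{build}"
```

Passing the date in explicitly keeps the function deterministic, which makes it easy to check against the example rows.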
Can we get rid of LibGit2 entirely? Here are some tradeoffs:
CommitAndRootTreeExists() exists for a VFS for Git reason: to see if we need to prefetch the folders at a commit on clone time so we can generate an index before projecting. This isn't needed any more.
LooseObjectsStep checks for corrupt loose objects. We'll have fewer objects with the batched read-object, and we could teach git pack-objects to clear corrupt objects, perhaps.
Outside of that last one, many of these changes are super small and don't have a huge impact on the full story.
The core.gvfs config setting does a lot of things, including blocking unwanted commands.
This setting was dropped as part of the rename effort (#38) and should be put back for now.
However, that config option does a lot of things that we may not want it to do in the Scalar world. Update Git to split those actions apart based on other config options, or add a core.scalar setting for our situation.
For example, we still want to block git gc, but that could be part of core.virtualizeobjects instead.
I'm not sure what instance of "GVFS" in the codebase causes the installer to write into C:\Program Files\GVFS, but it requires the GVFS.Service and other GVFS.Mount processes to be terminated for the Scalar installer to work.
UPDATE
The upgrade steps are run as part of scalar mount, which will be going away as part of the mount removal process.
We should remove the upgrade code as part of removing the mount process, and if in the future we need to perform disk layout upgrades, that work will need to be driven by the service and/or the installer.
We no longer need back-compat logic for previous GVFS disk layouts. We will dramatically change the way we store the repo config, and hopefully do so before we ship to EA.
At some point, we will not allow breaking changes and then will need upgrade logic. Should we delete the disk layout code now and then redesign/reimplement the upgrade logic when we need it?
cc: @mjcheetham
After adding a large set of cones to the repo using scalar sparse --add-stdin, the first git status took a long time:
~/ScalarTests/repo/src>git status
On branch master
Your branch is up to date with 'origin/master'.
It took 17.52 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean
If we had the sparse verb call git status before it finishes, users would have a better experience running git status for the first time.
New name TBD.
Related to #11
There is code in HooksInstaller (e.g. MergeHooksData) that is specific to GitHooksLoader and can be removed.
This is part of the work required to eliminate the mount process.
Special care will need to be taken regarding ACLs. The service runs with elevation, and we need to make sure that non-elevated git processes are still able to use the files produced by the maintenance tasks.
As an alternative, we should investigate running Scalar.Service as the user rather than as admin.
In order to have partial resumability in the event of a network failure, we should limit the number of objects we request in one go and ask for multiple batches rather than one large batch.
This also opens the door for parallelization later.
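A minimal sketch of the batching idea, not Scalar's implementation. The names `batches` and `download_in_batches` and the `download_batch` callback are hypothetical, and a network failure is assumed to surface as an `OSError` (which includes `ConnectionError`). On failure, the function returns the ids that still need to be requested, so a retry only repeats the unfinished work.

```python
from typing import Callable, Iterable, Iterator, List


def batches(object_ids: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size object ids."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for oid in object_ids:
        batch.append(oid)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def download_in_batches(
    object_ids: List[str],
    batch_size: int,
    download_batch: Callable[[List[str]], None],
) -> List[str]:
    """Request objects batch by batch; return the ids that were NOT downloaded.

    An empty result means everything succeeded. After a failure, the caller
    can resume by calling this function again with the returned list.
    """
    done = 0
    for batch in batches(object_ids, batch_size):
        try:
            download_batch(batch)
        except OSError:
            # Assumed failure mode: stop here and report what remains.
            break
        done += len(batch)
    return object_ids[done:]
```

Because each batch is independent, the same split could later be handed to a worker pool for parallel downloads.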
Goal: provide a user-friendly experience around configuring the sparse checkout.
Scope: Verb to add and remove entire directories from sparse enlistment.
Non-goals: High-performance application of sparse-checkout file.
Rather than sending a message to the mount process, investigate simply acquiring the objects inline.
Should rely largely on yaml from #3. Publishes installers.
While acquiring a set of objects prior to a workdir changing operation like checkout or reset, we should show progress similar to fetch's.
Steps to reproduce:
git add all of the changes made above
git restore --staged .
git status
Result:
Segmentation fault: 11
The GitHooksLoader should not be needed, as we only have the read-object hook, which is already native.
Copied from microsoft/VFSForGit#1447
Currently the log file only includes the overall time, and not the time of the git command specifically.
Set scalar.telemetry-pipe in the installer to get telemetry from scalar daemons.
This is a peer of gvfs.telemetry-pipe.
scalar.telemetry-pipe=scalar-c780ac06-135a-4e9e-ab6c-d41e2d265baa
We do NOT need the corresponding scalar.telemetry-id (like we have in gvfs).
The functional tests use an old copy of the GVFS repo, so all of the paths use GVFS in the names. Those paths were modified automatically as part of the rename operation (#38).
As functional tests are re-added, we will need to revert the changes to those paths, but it will be a manual process.
When installing the product, we need to manually unmount all scalar mount processes.
While we do that
The README is a leftover from VFS for Git. It needs updating. Perhaps it should just point to the roadmap for now?
BUG: if we run git sparse-checkout set A A/B, then A is registered as a recursive closure AND A/B is marked as a recursive closure. This also means that A is marked as a parent path.
As a result, Git complains that the patterns are not cone-style and reverts to the slow pattern matching algorithm.
To fix, consider removing paths from the "parent" list if they are in the "recursive" list. Further: remove children of recursive paths from the recursive list.
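The proposed fix can be sketched as follows. This is an illustration in Python rather than Git's C implementation. `normalize_cone` is a hypothetical helper, and the two sets it returns stand in for the "recursive" and "parent" lists described above.

```python
from typing import Iterable, List, Set, Tuple


def _ancestors(path: str) -> List[str]:
    """Return all proper ancestor directories of path, e.g. A/B/C -> [A, A/B]."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]


def normalize_cone(paths: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (recursive, parents) with redundant entries removed."""
    requested = {p.strip("/") for p in paths if p.strip("/")}

    # Remove children: a path is redundant if one of its ancestors is
    # already requested recursively.
    recursive = {
        p for p in requested
        if not any(a in requested for a in _ancestors(p))
    }

    # Parents are the ancestors of the remaining recursive paths. Subtracting
    # `recursive` is defensive: no path should be in both lists.
    parents = {a for p in recursive for a in _ancestors(p)} - recursive
    return recursive, parents
```

With the input from the bug report, `normalize_cone(["A", "A/B"])` keeps only `A` in the recursive set and leaves the parent set empty.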
We need to investigate what we can do about git add in our target enlistment.
git add -p from src took 31s.
git add . from src took 43s.
These are no-op adds with fsmonitor and untracked cache.
scalar sparse --add needs progress indicators. These progress indicators need to be in two places (at least):
BlobPrefetcher needs to provide feedback as it discovers and downloads blobs.
git read-tree -mu HEAD needs to provide feedback as it populates the working directory.
These are very different solutions, so this issue will track git read-tree -mu HEAD.
We need to produce a simple scripted installation for macOS that pulls together scalar, git, gcm core, watchman, and internal tooling and correctly configures everything. This is to support demo scenarios and automation like perf and large build runs.
For optimal performance, we need git status to run in O(modified) time. The fsmonitor feature exists in Git, and we should take advantage of it.
@dscho is working on this, but I can't assign it to him for some reason.
We'll want to gather objects in bulk before a workdir changing operation like checkout.
To give us high confidence that customer satisfaction will be greater than it would be on VFS for Git, I think we want to measure identical scenarios on both VFS for Git sparse mode and scalar. Ideally we can also configure what part of the cone is available, so we can compare and contrast different sizes of enlistments. We should use representative sparse enlistments for different segments of our customer base.
Thoughts on the approach or execution?
If we supply the same set of paths to git sparse-checkout add and scalar prefetch --stdin-folders-list, the prefetch command gets a smaller set of files than the sparse-checkout requires when writing files to disk. This leads to a very slow first checkout, even after prefetching.
The real solution is described in #36.
However, it may be worth a temporary fix to the BlobPrefetcher to match a few more paths and speed this up in the short term.
This cleanup task will make it easier to remove the mount process.
We'll fully embrace the new project model and drop remaining support for .NET Framework.
We have the SparseVerb from VFS for Git. Update it to be a small layer over git sparse-checkout add with an additional scalar prefetch --folders-list first. That prefetch, along with #62, will make the expansion much faster.
scalar sparse --add needs progress indicators. These progress indicators need to be in two places (at least):
BlobPrefetcher needs to provide feedback as it discovers and downloads blobs.
git read-tree -mu HEAD needs to provide feedback as it populates the working directory.
These are very different solutions, so this issue will track BlobPrefetcher.
After #76, update the sparse-mode functional tests to work with a sparse scalar clone.
If the mount process is not running, the clone verb will be unable to download the objects it needs to complete the checkout.
Work in microsoft/vfsforgit sometimes needs a corresponding change here in microsoft/scalar.
Add a comment linking to the PR(s) that need porting to Scalar.
(Use emoji reactions on the comment to indicate that you are working on it, that the item is done, or that it is not needed.)
Create a functional test set that follows a typical workflow around a sparse enlistment:
scalar clone --sparse=true
scalar sparse add
May be combined with #76.
This issue is to facilitate discussion.
In microsoft/git#171, we introduce the git sparse-checkout builtin. This has the features we need to get moving on the sparse clones in Scalar, but it is not ready for merging into vfs-2.22.0. In particular, we need to get feedback from the mailing list before we take a hard dependency on it, especially in the shipped version with microsoft/vfsforgit.
Here is my proposal:
Create a new feature branch features/sparse-checkout in microsoft/git.
The feature branch will include all updates to sparse-checkout (#8) and batch object downloading (#7, #36).
As vfs-2.22.0 advances, we can dual-checkin if it is a critical change. This should happen rarely as we are mostly doing upstream-first development in Git for VFS.
As git/git and git-for-windows/git ship new versions, microsoft/git gets a new vfs-2.XX.0 branch. The features/sparse-checkout branch will then be rebased on top of that using a force-push.
As features/sparse-checkout updates, we generate installers with suffix -sc to indicate this is something to consume in Scalar but not VFS for Git.
This setup should allow us to merge PRs like #54 and start working on functional tests, follow-up features, and perf tests.
/cc @jrbriggs, @wilbaker, @jeffhostetler, @kewillford, @jeschu1, @mjcheetham, @garimasi514, @nickgra.
Rather than fall back on read-object one-by-one, let's precompute what's needed, a la partial.
There is some code specifically for 'old' clones that we should remove.
Ex.
scalar/Scalar/CommandLine/ScalarVerb.cs
Line 807 in 8372f6b
Please audit the code and ensure any unnecessary code is removed.
During a scalar clone, we can go ahead and set up the default fsmonitor hook if we detect that watchman is installed. This is orthogonal to #66, as we can assume the demo machine already has watchman installed independently. When #66 is complete, then the check will be redundant, but the hook placement will still work.
Customers will always be configured to use Watchman in Scalar repos, and so our functional tests should configure/use Watchman as well.
Must have:
macOS build + unit tests
Windows build + unit tests
The current algorithm for matching in the sparse-checkout file will not scale to thousands of patterns over millions of files. We need something better.
Match the prefix-matching pattern from the VFS for Git Sparse Mode.
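One way prefix matching can avoid scanning every pattern is to keep the directories in hash sets and test only the ancestors of each path. The sketch below is an assumption-laden illustration, not the VFS for Git or Git implementation. `is_included` is a hypothetical name, `recursive` holds directories included with all their contents, `parents` holds their ancestor directories (whose immediate files are included), and files at the repository root are assumed to be always included.

```python
from typing import Set


def is_included(path: str, recursive: Set[str], parents: Set[str]) -> bool:
    """Decide whether a file path falls inside the sparse set.

    Cost is proportional to the depth of the path, not to the number
    of patterns, because each check is a set lookup.
    """
    parts = path.strip("/").split("/")
    directory = "/".join(parts[:-1])

    # Assumption: root-level files and files directly inside a parent
    # directory are always part of the checkout.
    if directory == "" or directory in parents:
        return True

    # Walk up the ancestor directories, deepest first, looking for a
    # recursively included directory.
    for i in range(len(parts) - 1, 0, -1):
        if "/".join(parts[:i]) in recursive:
            return True
    return False
```

For example, with `recursive = {"A/B"}` and `parents = {"A"}`, the file `A/B/c/d.txt` is included through its ancestor `A/B`, while `A/C/y.txt` is not.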