emgarten / nuget.catalogreader
NuGet v3 catalog reader
License: MIT License
Thanks for providing a very usable and useful project! I followed the quick start steps and tried to execute NuGetMirror nupkgs https://api.nuget.org/v3/index.json -o d:\tmp
without the ignore errors switch. After a bit, it stopped due to this exception:
Unable to download Tabster.Core 1.0.0
- [System.Xml.XmlException] System does not support 'Windows-1252' encoding. Line 1, position 31.
[System.Xml.XmlException] System does not support 'Windows-1252' encoding. Line 1, position 31.
Not sure if this is an expected exception; using --ignore-errors did help. Could you mention this switch prominently in the README itself, or include a link to your blog post, which is what gave me the clue about using --ignore-errors?
Environment: Microsoft Windows Server 2019 Datacenter 10.0.17763 Build 17763. en-US.
Today, the caching mechanism of CatalogReader is based on time. The catalog is an append-only structure, so caching can be done in a smarter way.
I can think of these options for improving this:
1. ... commitTimeStamp ... forever
For the NuGet.org catalog implementation, this should be sufficient since catalog items never change and only the last page of the catalog changes. Since there is no way to compare catalog pages other than by commitTimeStamp, we have to treat all pages with this MAX commitTimeStamp value as the "last" page. In reality, there is only ever one page with the MAX commitTimeStamp, since a bit of time always passes between two commits.
However, both CatalogReader and NuGet.org's CommitCollector handle any time when a page or catalog item gets a new commitTimeStamp, even if it's a catalog item that already exists or a page that isn't the last. We would be losing this flexibility. This may be acceptable, but since the catalog is not officially spec'd and there may be other implementations out there, it's hard to say whether this is a good idea.
2. ... commitId as part of the cache key and cache pages and items forever
This retains the flexibility lost in option 1 but bloats the HTTP cache. There will be N copies of each page in the cache, where N is the number of different commits observed by the reader on that page.
This is probably the simplest solution.
3. ... commitId for all pages and items in an external store (JSON file?)
This avoids the bloat of option 2 but adds complexity, since now we have to invent a new data store thingy.
What are your thoughts?
Also, am I missing something here?
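To make option 2 concrete, here is a minimal Python sketch (not CatalogReader's actual code, which is C#). It assumes each index entry exposes "@id" and "commitId"; the names cache_key, get_page and the fetch callback are made up for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical in-memory cache keyed by (URL, commitId).
Cache = Dict[Tuple[str, str], dict]


def cache_key(url: str, commit_id: str) -> Tuple[str, str]:
    """Build a key that changes whenever the page or item is re-committed."""
    return (url, commit_id)


def get_page(cache: Cache, index_entry: dict, fetch: Callable[[str], dict]) -> dict:
    """Return the page for an index entry, fetching only on a cache miss.

    index_entry is assumed to look like {"@id": <page URL>, "commitId": <id>}.
    fetch is a caller-supplied (hypothetical) function that downloads a URL.
    """
    key = cache_key(index_entry["@id"], index_entry["commitId"])
    if key not in cache:
        cache[key] = fetch(index_entry["@id"])
    return cache[key]
```

A re-committed page gets a new key, so the old copy is never overwritten. That is exactly the N-copies bloat described above.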
I like option 1 the best. When I get to documenting the V3 protocol, I hope to mandate that the only mutable catalog page is the last and that catalog items are immutable.
/cc @emgarten
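For reference, the "every page except the newest is immutable" rule could be sketched like this in Python. It is only an illustration under assumptions: index entries are dicts with "@id" and "commitTimeStamp", timestamps are UTC strings ending in "Z" with optional fractional seconds, and the function names are invented here.

```python
from datetime import datetime, timezone
from typing import List


def parse_commit_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 2020-01-02T00:00:00.1234567Z.

    Fractions are truncated to microseconds, so two commits that differ only
    in the seventh digit compare equal (both are then treated as "last").
    """
    whole, _, fraction = value.rstrip("Z").partition(".")
    micros = int((fraction + "000000")[:6])
    base = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def pages_safe_to_cache_forever(index_items: List[dict]) -> List[str]:
    """Return URLs of pages whose commitTimeStamp is not the maximum."""
    if not index_items:
        return []
    times = [parse_commit_time(item["commitTimeStamp"]) for item in index_items]
    newest = max(times)
    return [item["@id"] for item, t in zip(index_items, times) if t != newest]
```

Every page sharing the maximum timestamp is kept out of the permanent cache, which matches the "treat all pages with the MAX value as the last page" point made earlier.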
It would be great if a NuGet package were available for this project. I am thinking of designing a small Blazor wasm app, similar to nuget.org, that lists all packages in an Azure feed. I would host it as a static website in the same storage account, and if the static site works well I will make it a public solution. That way we would have Sleet and, side by side, a Sleet browser.
Let's say I want to discover all the packages that depend on another package, for example Newtonsoft.Json. Is it possible to do that sort of query with CatalogReader?
Thanks!
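One way to approach the "who depends on X" question, once the package detail documents have been read, is a plain scan over their dependency groups. The Python sketch below does not use CatalogReader's API. It assumes each leaf is a dict shaped like a nuget.org package-details leaf (dependencyGroups, each with a dependencies list of {"id": ...}); the function names are hypothetical.

```python
from typing import Iterable, List


def depends_on(leaf: dict, dependency_id: str) -> bool:
    """True if any dependency group of the leaf lists dependency_id."""
    wanted = dependency_id.lower()  # package IDs are compared case-insensitively
    for group in leaf.get("dependencyGroups", []):
        for dep in group.get("dependencies", []):
            if dep.get("id", "").lower() == wanted:
                return True
    return False


def find_dependents(leaves: Iterable[dict], dependency_id: str) -> List[str]:
    """Return sorted 'id version' strings for leaves that depend on dependency_id."""
    hits = {
        f'{leaf["id"]} {leaf["version"]}'
        for leaf in leaves
        if depends_on(leaf, dependency_id)
    }
    return sorted(hits)
```

This is a full scan, so it only makes sense against a local copy of the leaves rather than per-query HTTP requests.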
Hi,
The website from which the schema is retrieved schema.emgarten.com seems to be down and causes an InvalidDataException when launching the app.
I haven't found a good tool to do this yet but I would love for there to be an option like "--include-dependencies" that when combined with "-i" would pull down the whole chain of dependencies for any included id. I work on an isolated network and mirroring the entire repository is possibly overkill so it would be great if you supported this. It could even go one step further and allow you to specify the target platform(s) for which to pull the dependencies.
Example:
nugetmirror nupkgs https://api.nuget.org/v3/index.json -o c:\packages -i BenchmarkDotnet --include-dependencies netcoreapp11,netframework46
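The core of such an option would be a transitive walk over dependencies. Below is a rough Python sketch of that walk; --include-dependencies does not exist in the tool as far as this thread shows. The graph here is a made-up dict of package id to direct dependency ids, and framework filtering is left out.

```python
from collections import deque
from typing import Dict, List, Set


def dependency_closure(roots: List[str], graph: Dict[str, List[str]]) -> Set[str]:
    """Return the roots plus everything reachable through dependencies.

    graph maps a package id to its direct dependency ids (hypothetical data).
    Already-seen ids are skipped, so dependency cycles terminate.
    """
    seen: Set[str] = set()
    queue = deque(roots)
    while queue:
        package_id = queue.popleft()
        if package_id in seen:
            continue
        seen.add(package_id)
        queue.extend(graph.get(package_id, []))
    return seen


# Example with invented dependency data:
example_graph = {
    "BenchmarkDotnet": ["LibA", "LibB"],
    "LibA": ["LibC"],
    "LibC": ["LibA"],  # cycle on purpose
}
closure = dependency_closure(["BenchmarkDotnet"], example_graph)
```

A real implementation would also need to pick versions from each dependency's version range, which this sketch ignores.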
Hi
I have been trying out your app. I left it running for a while (1 hr or so), then came back and had 34,772 files and 17,451 folders.
I stopped the app and deleted the folders, but left just the cursor.
I then started it again and made a coffee (that's the important part). When I came back, it was downloading what looks like all of the files again into the tmp folder. I am currently on 10868 and decided to raise it as a question. Is there a better way to restart the app after a failure or an outage, or after simply moving old files off the disk?
P.S. Love what it does. Thanks for the effort of making it public for people like me to try :)
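As background on cursors in general (this is not a description of how NuGetMirror itself decides what to download): a catalog cursor is usually just the commit time of the last processed entry, and a resumed run takes only entries committed after it. A minimal Python sketch, with an invented Entry tuple and function name:

```python
from datetime import datetime, timezone
from typing import List, Tuple

Entry = Tuple[datetime, str]  # (commit time, package identity), hypothetical shape


def pending_entries(entries: List[Entry], cursor: datetime) -> Tuple[List[Entry], datetime]:
    """Return entries committed after the cursor, oldest first, and the new cursor.

    The cursor only moves forward; if nothing is newer it is returned unchanged.
    """
    newer = sorted(entry for entry in entries if entry[0] > cursor)
    new_cursor = newer[-1][0] if newer else cursor
    return newer, new_cursor
```

In a scheme like this the cursor should be saved only after the returned entries were processed successfully, so an interrupted run repeats work instead of skipping it.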
Hi, I'd like to use nugetmirror to download all nugets in nuget.org that have more than a configurable amount of downloads with their dependencies.
Is there any way to do this with the current CLI or with small changes to the existing code?
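The catalog itself does not carry download counts, so one possible approach is to take the ids from search results first and feed them to the mirror as include filters. A small Python sketch of the filtering step, assuming each result is a dict with "id" and "totalDownloads" (as nuget.org's search results provide); popular_ids is a hypothetical helper.

```python
from typing import Iterable, List


def popular_ids(search_results: Iterable[dict], min_downloads: int) -> List[str]:
    """Return ids of packages whose totalDownloads meets the threshold."""
    return [
        result["id"]
        for result in search_results
        if result.get("totalDownloads", 0) >= min_downloads
    ]
```

Results without a download count are treated as zero here, so they are dropped for any positive threshold.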
The NuGetMirror executable could upload directly to Amazon S3.
For example, someone who wants to resolve a dependency hell issue across all versions of .NET Core (1.0, 1.1, 2.0) and all the included templates (mvc, console, xunit, etc.)
might look for a way to download all versions of all packages uploaded by specific official organizations, such as Microsoft, aspnet, EntityFramework, or dotnetframework, instead of mirroring the whole 990k-package NuGet repository (which is also full of junk packages, with no cleanup in sight).
I would like to dive into the code in a fork and try to implement such a feature, but I do not have the time currently, so for now I just want to open a discussion on the matter.
Is it possible to add support for feeds that require authentication?
feed example: https://nuget.telerik.com/v3/index.json
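For illustration, if the feed accepts HTTP Basic authentication (an assumption; the thread does not say which scheme this feed uses), the request to the service index would only need an extra header. A Python sketch with a hypothetical helper name and placeholder credentials:

```python
import base64
import urllib.request


def build_authenticated_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a request carrying an HTTP Basic Authorization header.

    Nothing is sent here; the caller decides when to open the request.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder credentials, for illustration only:
request = build_authenticated_request(
    "https://nuget.telerik.com/v3/index.json", "user", "pass"
)
```

The same header would have to be attached to every follow-up request (catalog pages, nupkg downloads), not only to the index.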