emgarten / nuget.catalogreader

44.0 6.0 11.0 451 KB

NuGet v3 catalog reader

License: MIT License

Batchfile 0.10% PowerShell 2.14% Shell 0.68% C# 97.07%

nuget mirror feeds

nuget.catalogreader's People

nkolev92 jainaashish jmarolf asaf400 rytmis mazeadmin noamkfir aaronontheweb lyqwaterway anamnavi arvindshmicrosoft

nuget.catalogreader's Issues

Upgrading to LTS dotnet?

XmlException: System does not support 'Windows-1252'

Thanks for providing a very usable and useful project! I followed the quick start steps and tried to execute NuGetMirror nupkgs https://api.nuget.org/v3/index.json -o d:\tmp without the ignore errors switch. After a bit, it stopped due to this exception:

Unable to download Tabster.Core 1.0.0
        - [System.Xml.XmlException] System does not support 'Windows-1252' encoding. Line 1, position 31.
[System.Xml.XmlException] System does not support 'Windows-1252' encoding. Line 1, position 31.

I am not sure whether this is an expected exception; using --ignore-errors did help. Could you mention this switch prominently in the README itself, or include a link to your blog post, which gave me the clue about using --ignore-errors?

Environment: Microsoft Windows Server 2019 Datacenter 10.0.17763 Build 17763. en-US.

Smarter caching on catalog pages and items?

Today, the caching mechanism of CatalogReader is based on time. Catalog is an append-only structure so caching can be done in a smarter way.

I can think of these options for improving this:

1. Cache items (leafs) forever and pages with a non-MAX `commitTimeStamp` forever

For the NuGet.org catalog implementation, this should be sufficient since catalog items never change and only the last page of the catalog changes. Since there is no way to compare catalog pages other than commitTimeStamp, we have to treat all pages with this MAX commitTimeStamp value as the "last" page. In reality, there is only ever one page with the MAX commitTimeStamp since a bit of time always passes between two commits.

However, both CatalogReader and NuGet.org's CommitCollector handle any time when a page or catalog item gets a new commitTimeStamp, even if it's a catalog item that already exists or a page that isn't the last. We would be losing this flexibility. This may be acceptable but since the catalog is not officially spec'd and there may be other implementations out there, it's hard to say whether this is a good idea.
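A minimal sketch of the rule in option 1, written in Python for illustration only. The function name and the shape of `pages` are hypothetical and are not part of CatalogReader; the page names and timestamps are made up.

```python
from datetime import datetime, timezone


def cacheable_forever(pages):
    """Return the URLs of pages that option 1 would cache indefinitely.

    `pages` is a list of (url, commit_timestamp) pairs read from the catalog
    index. Every page that shares the maximum commitTimeStamp is treated as a
    possible "last" page and is therefore excluded.
    """
    if not pages:
        return set()
    newest = max(ts for _, ts in pages)
    return {url for url, ts in pages if ts < newest}


# Made-up example data.
pages = [
    ("page0.json", datetime(2018, 1, 1, tzinfo=timezone.utc)),
    ("page1.json", datetime(2018, 6, 1, tzinfo=timezone.utc)),
    ("page2.json", datetime(2019, 1, 1, tzinfo=timezone.utc)),
]
print(sorted(cacheable_forever(pages)))  # ['page0.json', 'page1.json']
```

Leaves need no such check under this option; they would simply never be re-fetched.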

2. Use the `commitId` as part of the cache key and cache pages and items forever

This retains the flexibility lost in option 1 but bloats the HTTP cache. There will be N copies of each page in the cache, where N is the number of different commits observed by the reader on that page.

This is probably the simplest solution.
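One way option 2 could look, as a rough Python sketch. The `cache_key` helper and the hashing scheme are assumptions made for illustration; CatalogReader's actual cache layout may differ.

```python
import hashlib


def cache_key(url: str, commit_id: str) -> str:
    """Build a cache file name that changes whenever the commitId changes."""
    digest = hashlib.sha256(f"{url}|{commit_id}".encode("utf-8")).hexdigest()
    return f"{digest}.json"


# The same page observed under two different commits gets two cache entries,
# which is the bloat described above.
first = cache_key("https://example.test/catalog/page0.json", "commit-1")
second = cache_key("https://example.test/catalog/page0.json", "commit-2")
print(first != second)  # True
```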

3. Store the `commitId` for all pages and items in an external store (JSON file?)

This avoids the bloat of option 2 but has additional complexity since now we have to invent a new data store thingy.
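A small sketch of what such an external store might look like, assuming a single JSON file that maps each page or item URL to the last commitId seen for it. The class, file name, and URL are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class CommitIdStore:
    """Persist the last observed commitId per URL in one JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.ids = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.ids = {}

    def is_stale(self, url, commit_id):
        # A cached copy is stale when no commitId is recorded or it differs.
        return self.ids.get(url) != commit_id

    def record(self, url, commit_id):
        self.ids[url] = commit_id
        self.path.write_text(json.dumps(self.ids, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "commits.json"
    url = "https://example.test/catalog/page0.json"

    store = CommitIdStore(store_path)
    first_check = store.is_stale(url, "commit-1")      # nothing recorded yet
    store.record(url, "commit-1")

    reloaded = CommitIdStore(store_path)               # simulates a restart
    second_check = reloaded.is_stale(url, "commit-1")  # unchanged commit
    third_check = reloaded.is_stale(url, "commit-2")   # page was re-committed

print(first_check, second_check, third_check)  # True False True
```

Only one cached copy per URL is needed here, at the cost of keeping the JSON file consistent with the HTTP cache.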

Conclusion

What are your thoughts?

Also, am I missing something here?

I like option 1 the best. When I get to documenting the V3 protocol, I hope to mandate that the only mutable catalog page is the last and that catalog items are immutable.

/cc @emgarten

NuGet Package Availability

It would be great if a NuGet package were available for this project. I am thinking of building a small Blazor WebAssembly app, similar to nuget.org, that lists all packages in an Azure feed. I would host it as a static website in the same storage account, and if the static site works well I will publish it as a public solution. That way we would have Sleet and, side by side, a Sleet browser.

Finding packages that depend upon another package.

Let's say I want to discover all the packages that depend on another package, for example Newtonsoft.Json. Is it possible to do that sort of query with CatalogReader?

Thanks!
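For illustration, this kind of query can be expressed as a filter over catalog leaf documents. The Python sketch below assumes each leaf has already been downloaded and parsed into a dict with the `dependencyGroups` / `dependencies` / `id` layout used by NuGet.org package details leaves; it does not use the CatalogReader API, and the sample leaf is made up.

```python
def depends_on(leaf: dict, package_id: str) -> bool:
    """Return True if any dependency group of the leaf lists package_id."""
    wanted = package_id.lower()  # package ids are compared case-insensitively
    for group in leaf.get("dependencyGroups", []):
        for dependency in group.get("dependencies", []):
            if dependency.get("id", "").lower() == wanted:
                return True
    return False


# Hypothetical leaf for a package named Example.Package.
leaf = {
    "id": "Example.Package",
    "version": "1.0.0",
    "dependencyGroups": [
        {
            "targetFramework": ".NETStandard2.0",
            "dependencies": [{"id": "Newtonsoft.Json", "range": "[12.0.1, )"}],
        }
    ],
}
print(depends_on(leaf, "newtonsoft.json"))  # True
```

Answering the question for a whole feed this way means reading every leaf, so it is a full scan and not an indexed lookup.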

Unable to find a service of type Catalog/3.0.0

Hi,

The website from which the schema is retrieved, schema.emgarten.com, seems to be down, which causes an InvalidDataException when launching the app.
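As background for the error in the title: a v3 feed advertises its catalog as a resource of type Catalog/3.0.0 in the service index (index.json). The Python sketch below shows that lookup on an already-parsed service index. It is an illustration only, not CatalogReader's code, and it does not diagnose the failure reported here; the sample URLs are placeholders.

```python
def find_resource(service_index: dict, resource_type: str):
    """Return the @id of the first resource with the given @type, or None."""
    for resource in service_index.get("resources", []):
        if resource.get("@type") == resource_type:
            return resource.get("@id")
    return None


# Placeholder service index with made-up URLs.
service_index = {
    "version": "3.0.0",
    "resources": [
        {"@id": "https://example.test/v3/flatcontainer/", "@type": "PackageBaseAddress/3.0.0"},
        {"@id": "https://example.test/v3/catalog0/index.json", "@type": "Catalog/3.0.0"},
    ],
}
print(find_resource(service_index, "Catalog/3.0.0"))
```

A feed whose service index has no such entry would return None here.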

Support including dependencies when specifying Ids to include

I haven't found a good tool to do this yet but I would love for there to be an option like "--include-dependencies" that when combined with "-i" would pull down the whole chain of dependencies for any included id. I work on an isolated network and mirroring the entire repository is possibly overkill so it would be great if you supported this. It could even go one step further and allow you to specify the target platform(s) for which to pull the dependencies.

Example:

nugetmirror nupkgs https://api.nuget.org/v3/index.json -o c:\packages -i BenchmarkDotnet --include-dependencies netcoreapp11,netframework46
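The requested behaviour amounts to computing a transitive closure over package dependencies. A minimal Python sketch, assuming a hypothetical `get_dependencies` callback that returns the direct dependency ids of a package; the ids A, B, and C are placeholders, and the target-framework filtering mentioned above is left out.

```python
from collections import deque


def dependency_closure(roots, get_dependencies):
    """Return the lower-cased ids of the roots plus everything they pull in."""
    seen = set()
    queue = deque(roots)
    while queue:
        package_id = queue.popleft()
        key = package_id.lower()
        if key in seen:
            continue  # already visited; this also guards against cycles
        seen.add(key)
        queue.extend(get_dependencies(package_id))
    return seen


# Placeholder dependency graph: A -> B -> C, and C -> A forms a cycle.
graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
closure = dependency_closure(["A"], lambda package_id: graph.get(package_id, []))
print(sorted(closure))  # ['a', 'b', 'c']
```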

Possible bug?

Hi,
I have been trying out your app. I left it running for a while (1 hr or so), then came back to 34,772 files and 17,451 folders.

I stopped the app and deleted the folders, leaving just the cursor.

I then started it again and made a coffee (that's the important part). When I came back, it was downloading what looks like all of the files again into the tmp folder. I am currently on 10868 and decided to raise it as a question. Is there a better way to restart the app after a failure or an outage, or after simply moving old files off the disk?

P.S. I love what it does. Thanks for the effort of making it public for people like me to try :)

Mirroring most downloaded nugets from nuget.org

Hi, I'd like to use nugetmirror to download all packages on nuget.org that have more than a configurable number of downloads, along with their dependencies.
Is there any way to do this with the current CLI or with small changes to the existing code?

Support S3 Mirroring

The NuGetMirror executable could upload directly to Amazon S3.

[Suggestion] Filtering packages by profile (uploader)

For example, someone trying to resolve a dependency hell issue across all versions of .NET Core (1.0, 1.1, 2.0) and all the included templates (mvc, console, xunit, etc.) might look for a way to download all versions of all packages uploaded by specific official organizations, like Microsoft, aspnet, EntityFramework, dotnetframework, instead of mirroring the whole 990k-package NuGet repository (which is also full of junk packages, with no cleanup in sight).

I would dive into the code in a fork and attempt to implement such a feature, but I do not have the time currently, so for now I just want to open a discussion on the matter.

Feed with authentication

Is it possible to add support for feeds with authentication?
Feed example: https://nuget.telerik.com/v3/index.json
