
nuget.catalogreader's People

Contributors

dee-see, dependabot[bot], emgarten, jainaashish, joelverhagen, nkolev92


nuget.catalogreader's Issues

XmlException: System does not support 'Windows-1252'

Thanks for providing such a usable and useful project! I followed the quick start steps and ran NuGetMirror nupkgs https://api.nuget.org/v3/index.json -o d:\tmp without the --ignore-errors switch. After a while, it stopped with this exception:

Unable to download Tabster.Core 1.0.0
        - [System.Xml.XmlException] System does not support 'Windows-1252' encoding. Line 1, position 31.
[System.Xml.XmlException] System does not support 'Windows-1252' encoding. Line 1, position 31.

I'm not sure whether this exception is expected; using --ignore-errors did help. Could you mention this switch prominently in the README itself, or link to your blog post, which is what gave me the idea to use --ignore-errors?

Environment: Microsoft Windows Server 2019 Datacenter 10.0.17763 Build 17763. en-US.
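For context: on .NET Core, legacy code pages such as Windows-1252 are unavailable unless a provider is registered with Encoding.RegisterProvider(CodePagesEncodingProvider.Instance). That may be what triggers this XmlException, although it has not been checked against NuGetMirror's code. The Python sketch below illustrates the general idea of honoring a legacy encoding declared in a .nuspec: decode the bytes with the declared code page, then give the parser text that no longer carries the declaration. The helper name parse_nuspec_bytes is hypothetical and not part of NuGetMirror.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding attribute of an XML declaration at the very start of the bytes.
_DECL_ENCODING = re.compile(rb'^<\?xml[^>]*encoding=[\'"]([A-Za-z0-9._-]+)[\'"]')
# Matches a whole XML declaration at the start of decoded text.
_DECL = re.compile(r'^<\?xml[^>]*\?>')


def parse_nuspec_bytes(data: bytes) -> ET.Element:
    """Hypothetical helper: parse XML bytes whose declaration may name a
    legacy code page such as Windows-1252."""
    match = _DECL_ENCODING.match(data)
    # Without a declared encoding, assume UTF-8 and tolerate a leading BOM.
    encoding = match.group(1).decode("ascii") if match else "utf-8-sig"
    # Python's codec registry resolves "Windows-1252" to cp1252.
    text = data.decode(encoding)
    # Drop the declaration so the parser does not re-interpret the encoding.
    text = _DECL.sub("", text, count=1)
    return ET.fromstring(text)
```

In Python no registration step is needed for Windows-1252; an unknown encoding name raises LookupError, which a mirror would probably want to log and skip, in the spirit of --ignore-errors.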

Smarter caching on catalog pages and items?

Today, CatalogReader's caching is purely time-based. Because the catalog is an append-only structure, caching can be done more intelligently.

I can think of these options for improving this:

1. Cache items (leaves) forever, and cache pages whose commitTimeStamp is below the maximum forever

For NuGet.org's catalog implementation, this should be sufficient, since catalog items never change and only the last page of the catalog changes. Because commitTimeStamp is the only way to compare catalog pages, we have to treat every page with the maximum commitTimeStamp as the "last" page. In practice, only one page ever has the maximum commitTimeStamp, since some time always passes between two commits.

However, both CatalogReader and NuGet.org's CommitCollector handle the case where a page or catalog item gets a new commitTimeStamp, even when it is an existing catalog item or a page that isn't the last. Option 1 would give up this flexibility. That may be acceptable, but since the catalog is not officially specified and other implementations may exist, it's hard to say whether it is a good idea.

2. Use the commitId as part of the cache key and cache pages and items forever

This retains the flexibility lost in option 1 but bloats the HTTP cache. There will be N copies of each page in the cache, where N is the number of different commits observed by the reader on that page.

This is probably the simplest solution.

3. Store the commitId for all pages and items in an external store (JSON file?)

This avoids the bloat of option 2 but has additional complexity since now we have to invent a new data store thingy.
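A minimal Python sketch of the three options, assuming the reader already knows each document's commitId and commitTimeStamp from its parent listing (the catalog index lists them for every page, and each page lists them for its items). CatalogRef, cacheable_forever, cache_key and CommitIdStore are hypothetical names, not CatalogReader APIs.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class CatalogRef:
    """A page or leaf reference as listed by its parent document (hypothetical)."""
    url: str
    kind: str  # "page" or "leaf"
    commit_id: str
    commit_timestamp: datetime


def cacheable_forever(ref: CatalogRef, max_commit_timestamp: datetime) -> bool:
    """Option 1: leaves never change; a page may change only while it holds
    the maximum commitTimeStamp, i.e. while it may still be the last page."""
    if ref.kind == "leaf":
        return True
    return ref.commit_timestamp < max_commit_timestamp


def cache_key(ref: CatalogRef) -> str:
    """Option 2: fold commitId into the key, so a re-committed document gets a
    new cache entry; older entries are never overwritten (the bloat)."""
    return hashlib.sha256(f"{ref.url}|{ref.commit_id}".encode("utf-8")).hexdigest()


class CommitIdStore:
    """Option 3: a JSON file mapping URL -> commitId of the cached copy, so
    only one copy per URL is kept and revalidated against the parent listing."""

    def __init__(self, path: str) -> None:
        self.path = path
        try:
            with open(path, encoding="utf-8") as f:
                self.seen: Dict[str, str] = json.load(f)
        except FileNotFoundError:
            self.seen = {}

    def is_current(self, ref: CatalogRef) -> bool:
        # The cached body for ref.url is valid only if it came from this commit.
        return self.seen.get(ref.url) == ref.commit_id

    def record(self, ref: CatalogRef) -> None:
        self.seen[ref.url] = ref.commit_id

    def save(self) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.seen, f)
```

Option 1 needs only the newest commitTimeStamp from the index; option 2 trades disk space for never having to invalidate anything; option 3 keeps a single copy per URL at the cost of maintaining the extra file.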

Conclusion

What are your thoughts?

Also, am I missing something here?

I like option 1 the best. When I get to documenting the V3 protocol, I hope to mandate that the only mutable catalog page is the last and that catalog items are immutable.

/cc @emgarten

NuGet Package Availability

It would be great if a NuGet package were available for this project. I'm thinking of designing a small Blazor WebAssembly app, similar to nuget.org, that lists all packages in an Azure feed. I would host it as a static website in the same storage account, and if the static site turns out well, I will make it a public solution. That way we would have Sleet and, alongside it, a Sleet browser.

Support including dependencies when specifying Ids to include

I haven't found a good tool for this yet, but I would love an option like "--include-dependencies" that, combined with "-i", pulls down the whole dependency chain of every included id. I work on an isolated network, and mirroring the entire repository is probably overkill, so it would be great if you supported this. It could even go a step further and let you specify the target platform(s) for which to pull dependencies.

Example:

nugetmirror nupkgs https://api.nuget.org/v3/index.json -o c:\packages -i BenchmarkDotnet --include-dependencies netcoreapp11,netframework46
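One way such an option could work is a breadth-first walk over the dependency graph, collecting every package reachable from the ids passed with -i for each requested target framework. This is a hypothetical Python sketch: get_dependencies stands in for reading a package's dependency groups and picking one concrete version per version range, which real NuGet resolution (lowest applicable version, framework compatibility rules) makes considerably more involved.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Set, Tuple

# (package id, version); ids are lowercased because NuGet ids are case-insensitive.
Package = Tuple[str, str]


def dependency_closure(
    roots: Iterable[Package],
    get_dependencies: Callable[[Package, str], List[Package]],
    frameworks: Iterable[str],
) -> Set[Package]:
    """Return the roots plus every package reachable through their
    dependencies for any of the requested target frameworks."""
    frameworks = list(frameworks)
    seen: Set[Package] = set()
    queue: Deque[Package] = deque()

    def visit(pkg_id: str, version: str) -> None:
        key = (pkg_id.lower(), version)
        if key not in seen:  # queue each package/version pair only once
            seen.add(key)
            queue.append(key)

    for pkg_id, version in roots:
        visit(pkg_id, version)
    while queue:
        current = queue.popleft()
        for framework in frameworks:
            for dep_id, dep_version in get_dependencies(current, framework):
                visit(dep_id, dep_version)
    return seen
```

Passing a single framework restricts the walk to that framework's dependency groups, and the seen set keeps shared or circular dependencies from being fetched twice.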

Possible bug?

Hi,
I had been trying out your app and left it running for a while (an hour or so); when I came back it had produced 34,772 files in 17,451 folders.

I stopped the app and deleted the folders, leaving only the cursor.

I then started it again and made a coffee (that's the important part). When I came back, it seemed to be downloading all of the files into the tmp folder again. It's currently at 10868, so I decided to raise this as a question. Is there a better way to restart the app after a failure or an outage, or after simply moving old files off the disk?

P.S. Love what it does; thanks for the effort of making it public for people like me to try :)
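For reference, a common pattern for resuming a catalog-based mirror is a cursor holding the commitTimeStamp of the last commit that was fully processed; on restart, only items committed after it are fetched. The Python sketch below is a generic illustration with hypothetical names (load_cursor, save_cursor, process_since), not a description of how NuGetMirror stores or advances its cursor or uses its tmp folder.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List, Tuple

# (commitTimeStamp, package identity) as read from catalog pages; timestamps are UTC-aware.
CatalogItem = Tuple[datetime, str]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def load_cursor(path: Path) -> datetime:
    """Read the last fully processed commitTimeStamp; start from the epoch if absent."""
    if not path.exists():
        return EPOCH
    return datetime.fromisoformat(json.loads(path.read_text(encoding="utf-8"))["cursor"])


def save_cursor(path: Path, value: datetime) -> None:
    path.write_text(json.dumps({"cursor": value.isoformat()}), encoding="utf-8")


def process_since(
    items: List[CatalogItem],
    cursor: datetime,
    download: Callable[[str], None],
    persist: Callable[[datetime], None],
) -> datetime:
    """Download every item committed after `cursor`. The cursor is persisted
    only once all items of a commit have succeeded, so a crash mid-commit
    replays that commit instead of skipping part of it."""
    pending = sorted(item for item in items if item[0] > cursor)
    for index, (timestamp, identity) in enumerate(pending):
        download(identity)
        last_of_commit = index + 1 == len(pending) or pending[index + 1][0] != timestamp
        if last_of_commit:
            cursor = timestamp
            persist(cursor)
    return cursor
```

A caller would start from load_cursor(path) and pass lambda ts: save_cursor(path, ts) as persist. In this scheme, deleting downloaded files does not by itself trigger a re-download; only moving the cursor back does.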

Mirroring most downloaded nugets from nuget.org

Hi, I'd like to use NuGetMirror to download every package on nuget.org that has more than a configurable number of downloads, along with its dependencies.
Is there any way to do this with the current CLI or with small changes to the existing code?
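The catalog itself carries no download counts, but nuget.org's search service (the SearchQueryService resource in the service index) returns a totalDownloads value for each package in the data array of its responses. Assuming those results have already been fetched, the threshold filter could be sketched in Python as below; popular_package_ids is a hypothetical helper, its output could then be passed as ids to include with -i, and dependencies would still have to be resolved separately.

```python
from typing import Any, Dict, Iterable, List


def popular_package_ids(results: Iterable[Dict[str, Any]], min_downloads: int) -> List[str]:
    """Return ids with totalDownloads >= min_downloads, most downloaded first.

    `results` is assumed to be the `data` items of search API responses.
    """
    def downloads(item: Dict[str, Any]) -> int:
        # Missing counts are treated as zero downloads.
        return int(item.get("totalDownloads", 0))

    selected = [item for item in results if downloads(item) >= min_downloads]
    # Sort by descending downloads, then by case-insensitive id for stable output.
    selected.sort(key=lambda item: (-downloads(item), item["id"].lower()))
    return [item["id"] for item in selected]
```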

[Suggestion] Filtering packages by profile (uploader)

For example, someone trying to resolve a dependency-hell issue across all versions of .NET Core (1.0, 1.1, 2.0) and all the included templates (mvc, console, xunit, etc.) might want to download all versions of all packages uploaded by specific official organizations, such as Microsoft, aspnet, EntityFramework, and dotnetframework, instead of mirroring the whole 990k-package NuGet repository (which is also full of junk packages, with no cleanup in sight).

I would dive into the code in a fork and try to implement such a feature, but I don't have the time right now, so for now I just want to open a discussion on the matter.
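The closest public equivalent of "uploader" is probably the package owner. Assuming search results have been fetched and include an owners field, an owner filter could be sketched in Python as follows. ids_owned_by is hypothetical; the code accepts either a single string or a list because the field's shape should not be taken for granted, and the matching owner names are whatever accounts nuget.org reports, not necessarily the organization names listed above.

```python
from typing import Any, Dict, Iterable, List


def ids_owned_by(results: Iterable[Dict[str, Any]], owners: Iterable[str]) -> List[str]:
    """Return ids of search results owned by any of the given accounts.

    Owner names are compared case-insensitively; `owners` on each result may
    be a single string, a list of strings, or missing.
    """
    wanted = {name.lower() for name in owners}
    selected = []
    for item in results:
        listed = item.get("owners") or []
        if isinstance(listed, str):
            listed = [listed]
        if wanted & {name.lower() for name in listed}:
            selected.append(item["id"])
    return selected
```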
