Comments (8)

DanielKeep avatar DanielKeep commented on May 24, 2024

I thought about this a little when I was working out how to get binary dependencies for a Rust project. In the end, I decided that the build script should try the following, in order (note: this was in Python, before Cargo had build scripts):

  • Check a standard drop location to see if the necessary files are already present (local override).
  • Run any system-specific locators that might help (pkg-config on *nix, shrug and give up on Windows).
  • Try to download a pre-compiled binary from the official website, for the current platform, to a reasonable cache location.
  • Try to check out the source from the official repository, cross its fingers, and hope the user has the necessary software to build it (probably after prompting them).
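
The fallback order above could be sketched in Rust roughly as follows. This is only an illustration under stated assumptions, not the original Python script: `Located`, `locate`, and `local_override` are hypothetical names, and the pkg-config, download, and source-build steps are left to the caller as closures rather than implemented here.

```rust
use std::path::{Path, PathBuf};

/// Where a usable copy of the native library was found (hypothetical type).
#[derive(Debug, PartialEq)]
enum Located {
    LocalOverride(PathBuf),
    System(PathBuf),
    Downloaded(PathBuf),
    BuiltFromSource(PathBuf),
}

/// Try each strategy in order and return the first one that succeeds.
/// Later strategies are not run once an earlier one has returned `Some`.
fn locate(strategies: &[&dyn Fn() -> Option<Located>]) -> Option<Located> {
    strategies.iter().find_map(|strategy| strategy())
}

/// Step 1: check a standard drop location for a file that is already present.
fn local_override(drop_dir: &Path, file_name: &str) -> Option<Located> {
    let candidate = drop_dir.join(file_name);
    if candidate.is_file() {
        Some(Located::LocalOverride(candidate))
    } else {
        None
    }
}
```

A build script would pass `local_override` first, followed by closures for the system locator, the binary download, and the source build, so that the cheapest and least surprising options win.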

I've always felt that just compiling the source is dicey as Windows doesn't have a C compiler by default. Since Rust no longer depends on GCC, you can't even assume that is present on Windows. Besides which, it basically ignores any version installed on the system, which might cause surprising behaviour ("but, I updated libsplang on my system to close the security vulnerability; how'd I get exploited?!", or "why can't prog-a and prog-b share files? They're both using libsplang!").

It might be worth having a standard sysdep package that abstracts all this, so it doesn't have to be re-engineered for every project.

from crates.io.

alexcrichton avatar alexcrichton commented on May 24, 2024

Another possible route here would be to compress with xz or bzip2. For me it shaves 10MB off the size of the cld2 directory packed up. In general though @steveklabnik was right on reddit in that we don't want to let this get out of control too fast.

emk avatar emk commented on May 24, 2024

@DanielKeep: I'd use system packages for cld2, but it's not a very widely-packaged library. Plus, I need a build solution for Heroku, where I have no control over the installed libraries.

lifthrasiir has just sent me emk/rust-cld2#1, which removes cld2's documentation, deletes some unused data tables, and strips comments from the source code (which substantially boosts compression performance). This gets the rust-cld2 crate under 10MB, at least for this version, though the recent update to the upstream project may break it.

Is there any way to run a custom script during the packaging process? If not, maybe I need to fork cld2 and produce a stripped down git repo. Or cache tarballs on S3, but I'm trying to avoid that.

I'd love to find a good solution here.

lifthrasiir avatar lifthrasiir commented on May 24, 2024

@alexcrichton If a crate contains data whose inherent entropy exceeds 10MB, we are left with no choice but workarounds.

In the particular case of cld2, the main source of excess entropy is a comment (with UTF-8-encoded words for each entry), and removing comments really helps, but the table itself already exceeds 10MB and no common general-purpose compressor can easily pack it. (My estimate is that the actual entropy is some 7 or 8MB, as about 40% of the data can be somewhat correlated with the rest. But that correlation wouldn't be very easy to infer.)
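
For a rough feel of what "inherent entropy" means here, a minimal sketch of an order-0 (per-byte) Shannon entropy estimate is shown below. This is not how the 7 or 8MB figure above was derived: it treats each byte independently, so it ignores exactly the kind of cross-entry correlation mentioned above, and the function names are hypothetical.

```rust
/// Order-0 Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
/// Each byte is treated as an independent symbol, so correlations
/// between bytes or between table entries are not taken into account.
fn shannon_entropy_bits_per_byte(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each of the 256 possible byte values occurs.
    let mut counts = [0u64; 256];
    for &byte in data {
        counts[byte as usize] += 1;
    }
    let total = data.len() as f64;
    counts
        .iter()
        .filter(|&&count| count > 0)
        .map(|&count| {
            let p = count as f64 / total;
            -p * p.log2()
        })
        .sum()
}

/// Approximate size in bytes that a memoryless (order-0) coder would need.
fn estimated_min_size_bytes(data: &[u8]) -> f64 {
    shannon_entropy_bits_per_byte(data) * data.len() as f64 / 8.0
}
```

Running this over a packed table file gives only a crude figure; a compressor that models context can do better than this estimate, and one that does not cannot do much better.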

alexcrichton avatar alexcrichton commented on May 24, 2024

@lifthrasiir we've got to draw the line somewhere on package upload size, or otherwise it'll get out of hand. Some crates will always fall on the other side of the line (and this may be one of them).

emk avatar emk commented on May 24, 2024

Yeah, I can see there's an obvious tension between:

  1. Wanting reproducible builds coming entirely from inside crates.io.
  2. Keeping crate sizes reasonable.
  3. Packaging libraries according to the *-sys convention (and therefore being able to easily deploy them to Heroku, etc).

cld2 is a very interesting case, because it legitimately needs large data tables to do its job, and the official version is an unpackaged Subversion repository. On the other hand, it's a pretty useful library and I have some server-side Rails projects that use it quite successfully in production.

Then there are the semi-evil solutions, including breaking cld2 up into multiple packages by language detected, or some such. I'm going to try to figure out how these tables fit together, and see if I can find a clever solution.

emk avatar emk commented on May 24, 2024

Using @lifthrasiir's well-researched patch as a starting point, I've created a new git mirror of the upstream cld2 repository, stripped the comments as proposed, and built an exclude list in my Cargo.toml file. With all these tweaks, the cld2-sys package is now down to 6.5MB.

There are a bunch of table files which aren't getting included in the current build, and I'll need to look into those later. So maybe we'll see this problem again in the future.

But at least for now, for this one package, we appear to have a workable solution. Thank you to everybody who helped out, especially to @lifthrasiir for figuring out how to cut down the package size.

alexcrichton avatar alexcrichton commented on May 24, 2024

With the change I just merged, just contact me over IRC/email/whatnot and I can raise the limit for crates individually.
