jump's Issues

Extract CI integration tests into a form that can easily run in the dev cycle.

This may be as simple as requiring bash (Windows developers will tend to have at least the bash environment that comes with Git there) and extracting the CI integration test steps as scripts that can be run manually, or perhaps via a pseudo-crate run target like the one the package crate provides for packaging scie-jump.

Support a `${scie.boot}` placeholder for app scratch space in the `~/.nce`

This would allow having additional commands that perform post-install steps. The idea would be to use the fingerprint of the lift manifest itself as the value of ${scie.boot} and then, for each additional command name that requests the placeholder, use an atomic directory (or maybe just an atomic marker file; we just want to ensure the side effect is performed only once) at the top of that directory. The motivation is pre-installing PEX venvs to avoid the ~50ms overhead of the PEX zip (or unzipped zip) redirector, similar to how the Pex --sh-boot option operates.

For example:

{
  "command": {
    "exe": "${python}/bin/python",
    "args": [
      "${scie.boot.venv}/venv/pex"
    ]
  },
  "additional_commands": {
    "venv": {
      "env": {
        "PEX_TOOLS": "1"
      },
      "exe": "${python}/bin/python",
      "args": [
        "${pex}", "venv", "--bin-path", "prepend", "--compile", "--rm", "pex", "${scie.boot}/venv"
      ]
    }
  }
}
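The "perform the side effect only once" idea can be sketched with an atomic directory: `std::fs::create_dir` succeeds for exactly one caller and reports `AlreadyExists` to everyone else. This is a minimal std-only sketch, assuming a hypothetical `run_once` helper and a `<boot_dir>/<command name>` marker layout; `boot_dir` stands in for whatever `${scie.boot}` would resolve to.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Runs `side_effect` at most once per `command` under `boot_dir` (hypothetical helper).
///
/// `fs::create_dir` is atomic: only one caller can create the marker directory,
/// and every later caller gets `ErrorKind::AlreadyExists`.
pub fn run_once<F>(boot_dir: &Path, command: &str, side_effect: F) -> io::Result<bool>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    fs::create_dir_all(boot_dir)?;
    let marker = boot_dir.join(command);
    match fs::create_dir(&marker) {
        Ok(()) => {
            // We claimed the marker: perform the post-install step exactly once.
            if let Err(e) = side_effect(marker.as_path()) {
                // Release the claim so a later run can retry the failed step.
                let _ = fs::remove_dir_all(&marker);
                return Err(e);
            }
            Ok(true)
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```

One caveat of this sketch: a concurrent second caller returns `Ok(false)` while the first is still working, so a real implementation would also need a completion marker or lock that waiters can block on.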

The `scie-jump` unconditionally fills in size, hash and type information for files at runtime.

The scie-jump boot-pack is designed to be a more reasonable tool than cat for building scies. It trusts you've assembled the files you want and calculates their sizes, hashes and types as well as filling in its own "jump" information. Although the "jump" information is strictly required at runtime, the file information currently is not. If the scie-jump happens to find files of the right name loose on the file system next to it and its lift manifest doesn't already have fully specified file information, it will just fill that information in on the fly, accepting the hash and size of the files next to it.

Support hiding commands from `SCIE_BOOT="bad boot command" ...` output.

Sometimes scies use commands recursively as part of their initial binding setup, but these commands are not meant for end-user use. The scie-pants pants scie provides an example of this:

$ SCIE_BOOT=bad pants
Error: Isolates your Pants from the elements.

Please select from the following boot commands:

scie-pants
bootstrap-tools
pants
pants-debug
update

You can select a boot command by passing it as the 1st argument or else by setting the SCIE_BOOT environment variable.

In this case the boot commands should all be hidden save for the default unnamed command, update and bootstrap-tools. It seems like either having the contract that commands with no description are hidden or else adding a hidden bool to the lift manifest JSON is in order.
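Either contract amounts to a filter applied when building the `SCIE_BOOT` help listing. A sketch, assuming a hypothetical simplified `Command` view with an optional `description` and an explicit `hidden` flag (the `hidden` field does not exist in the lift manifest today):

```rust
/// Hypothetical, simplified view of a lift manifest command for listing purposes.
pub struct Command {
    pub name: String,
    pub description: Option<String>,
    /// Hypothetical explicit opt-out from the `SCIE_BOOT` help listing.
    pub hidden: bool,
}

/// The two candidate contracts floated above.
pub enum HidingContract {
    /// Commands without a description are treated as internal.
    NoDescriptionMeansHidden,
    /// Only commands that set `hidden: true` are omitted.
    ExplicitHiddenFlag,
}

/// Returns the names of commands that should appear in the help listing.
pub fn visible_commands<'a>(commands: &'a [Command], contract: &HidingContract) -> Vec<&'a str> {
    commands
        .iter()
        .filter(|cmd| match contract {
            HidingContract::NoDescriptionMeansHidden => cmd.description.is_some(),
            HidingContract::ExplicitHiddenFlag => !cmd.hidden,
        })
        .map(|cmd| cmd.name.as_str())
        .collect()
}
```

The description-based contract needs no schema change, but it couples documentation to visibility; the explicit flag keeps the two independent.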

.env loading fails partially and silently on invalid files

The current use of dotenvy for applying .env files results in:

  1. errors being silently ignored (e.g. lines that can't be parsed), which is very confusing for users of a scie-jump packaged binary, like scie-pants
  2. (potentially not a problem) a .env file being partially applied, where all lines before the first error are applied, and all lines (even valid ones) after the first error are ignored (due to dotenvy's immediate bail out on error https://github.com/allan2/dotenvy/blob/08e35eea693ca12c4038a226bae2e3144445ba69/dotenv/src/iter.rs#L33)

jump/jump/src/lib.rs

Lines 181 to 186 in f44ca18

if lift.load_dotenv {
    let _timer = timer!(Level::Debug; "jump::load_dotenv");
    if let Ok(dotenv_file) = dotenvy::dotenv() {
        debug!("Loaded env file from {path}", path = dotenv_file.display());
    }
}

This was observed in practice in pantsbuild/scie-pants#307.
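One way to address both points is to parse the whole `.env` file up front, collect every bad line with its line number, and apply nothing unless the file parses cleanly. This is a simplified sketch that deliberately does not use dotenvy; it assumes a hypothetical `parse_dotenv_strict` helper and supports only blank lines, `#` comments and unquoted `KEY=VALUE` lines.

```rust
use std::collections::BTreeMap;

/// Parses `.env` content all-or-nothing (simplified grammar, not dotenvy's).
///
/// Every invalid line is reported with its 1-based line number instead of being
/// skipped, so the caller can fail loudly rather than apply a partial file.
pub fn parse_dotenv_strict(content: &str) -> Result<BTreeMap<String, String>, Vec<String>> {
    let mut vars = BTreeMap::new();
    let mut errors = Vec::new();
    for (index, line) in content.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        match line.split_once('=') {
            Some((key, value)) if is_valid_key(key.trim()) => {
                vars.insert(key.trim().to_string(), value.trim().to_string());
            }
            _ => errors.push(format!("line {}: cannot parse {:?}", index + 1, line)),
        }
    }
    if errors.is_empty() {
        Ok(vars)
    } else {
        Err(errors)
    }
}

/// Accepts POSIX-style names: a letter or `_`, then letters, digits or `_`.
fn is_valid_key(key: &str) -> bool {
    let mut chars = key.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

The caller would only export the returned variables on `Ok`, and would surface the full error list on `Err`, which removes both the silent-skip and the partial-application behaviour.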

Add support for `${scie.env.X}`.

Although node support currently works, this is what the lift manifest looks like for cowsay installed via npx cowsay, with the resulting node_modules directory zipped up and used as the scie trailer zip:

{
  "scie": {
    "root": "~/.nce",
    "size": 2164160,
    "version": "0.1.0"
  },
  "files": [
    {
      "type": "archive",
      "name": "node",
      "size": 23692504,
      "fingerprint": {
        "algorithm": "sha256",
        "hash": "9429e26d9a35cb079897f0a22622fe89ff597976259a8fcb38b7d08b154789dc"
      },
      "archive_type": "tar.xz"
    },
    {
      "type": "archive",
      "name": "node_modules",
      "size": 406840,
      "fingerprint": {
        "algorithm": "sha256",
        "hash": "d7fa4e96a282573a763173902955d1f77bde176b58cba3abae5bd3bb6129c183"
      },
      "archive_type": "zip"
    }
  ],
  "command": {
      "env": {
          "NODE_PATH": "${node_modules}"
      },
      "exe": "${node}/node-v18.12.0-linux-x64/bin/node",
      "args": [
        "${node_modules}/node_modules/cowsay/cli.js"
      ]
  }
}

Here the command has to bind to the underlying js entry point file that implements cowsay. This is because the js console script equivalent is a #!/usr/bin/env node shebang script, which only works if the scie's node is on the PATH. The current env var handling always treats env vars defined in the config as defaults and leaves any value already present in the environment undisturbed. There probably needs to be a way to specify fallback vs. trump semantics, and then trumping could do this:

{
  "command": {
    "env": {
      "NODE_PATH": "${node_modules}",
      "=PATH": "${node}/node-v18.12.0-linux-x64/bin:${scie.env.PATH}"
    },
    "exe": "${node}/node-v18.12.0-linux-x64/bin/npx",
    "args": [
      "cowsay"
    ]
  }
}

Here I've used a strawman = prefix on PATH to indicate that it should override. This reads naturally and is safe, since POSIX specifies that "These strings have the form name=value; names shall not contain the character '='.": https://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap08.html.
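The fallback vs. trump semantics of the strawman `=` prefix can be sketched over a plain map of environment variables. This assumes a hypothetical `apply_env` helper; it only expands `${scie.env.NAME}` (to the inherited value, or empty if unset) and leaves every other placeholder, such as `${node}`, untouched, so it is not scie-jump's real placeholder engine.

```rust
use std::collections::HashMap;

/// Applies lift-manifest env entries: `"=NAME"` overrides any inherited value,
/// while a plain `"NAME"` only fills in a default (hypothetical sketch).
pub fn apply_env(env: &mut HashMap<String, String>, config: &[(&str, &str)]) {
    // `${scie.env.X}` always refers to the environment as inherited, before overrides.
    let inherited = env.clone();
    for (key, value) in config {
        let value = substitute_scie_env(value, &inherited);
        match key.strip_prefix('=') {
            Some(name) => {
                env.insert(name.to_string(), value);
            }
            None => {
                env.entry(key.to_string()).or_insert(value);
            }
        }
    }
}

/// Replaces each `${scie.env.NAME}` with the inherited value of NAME (empty if unset).
fn substitute_scie_env(value: &str, inherited: &HashMap<String, String>) -> String {
    const PREFIX: &str = "${scie.env.";
    let mut out = String::new();
    let mut rest = value;
    while let Some(start) = rest.find(PREFIX) {
        let after = &rest[start + PREFIX.len()..];
        // An unterminated placeholder is copied through verbatim below.
        let Some(end) = after.find('}') else { break };
        out.push_str(&rest[..start]);
        out.push_str(inherited.get(&after[..end]).map(String::as_str).unwrap_or(""));
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    out
}
```

With `PATH=/usr/bin` inherited, `"=PATH": "/nce/node/bin:${scie.env.PATH}"` yields `/nce/node/bin:/usr/bin`, while a user-set `NODE_PATH` survives an un-prefixed `NODE_PATH` default.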

Add (or default to) a "strict" mode which requires specifying size/hash

For files, you can supply a "size" and sha256 "hash". Without these the boot-pack will calculate them, but you may want to set them in advance as a security precaution. The scie-jump will refuse to operate on any file whose size or hash do not match those specified.

Just thinking about long-term updates and supply-chain attacks: it might be nice to have the ability to enforce adding that information.

Not urgent, either; it might even be useful to wait for some stability and more usage to see if there is another angle.
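A strict mode could be a pre-check the boot-pack runs before calculating anything: every file must already declare both fields. A sketch, assuming a hypothetical simplified `FileEntry` with optional `size` and `hash` and a hypothetical `check_strict` helper:

```rust
/// Simplified file entry; `size` and `hash` are optional, as in the lift manifest.
pub struct FileEntry {
    pub name: String,
    pub size: Option<u64>,
    pub hash: Option<String>,
}

/// Hypothetical strict-mode check: reports every file missing a size and/or hash.
pub fn check_strict(files: &[FileEntry]) -> Result<(), Vec<String>> {
    let missing: Vec<String> = files
        .iter()
        .filter_map(|file| {
            let mut absent = Vec::new();
            if file.size.is_none() {
                absent.push("size");
            }
            if file.hash.is_none() {
                absent.push("hash");
            }
            (!absent.is_empty()).then(|| format!("{}: missing {}", file.name, absent.join(" and ")))
        })
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

Reporting all offending files at once, rather than failing on the first, keeps the fix-up loop short when a manifest has many entries.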

Support env var defaults.

This would allow, for example, a scie to opt in to a {scie.lift} override via an env var, making it possible to plug in alternate URLs for ptex downloads behind corporate firewalls.

Consider supporting VAR="with space and $substitution" in `.env`

Continuing #163, it seems that double quotes aren't supported, so it is inconvenient to use .env to set a variable that has both a space in it and does a $ env var substitution, because all the spaces have to be individually \ escaped.

In particular, it'd be nice to have this table have 4 ✅:

.env line | Behaviour                      | Behaviour is desirable | Comment
--------- | ------------------------------ | ---------------------- | -------
A="x $y"  | Error                          |                        | This is the most natural syntax, and would be nice
A=x\ $y   | Accepted, with substitution    |                        | Annoying for a human to have to write and read all the \s
A='x $y'  | Accepted, without substitution |                        | Matches the same line in a shell
A=x $y    | Error                          |                        | If accepted, this would have a different meaning to the same line in a shell

This is effectively a limitation of dotenvy, e.g. allan2/dotenvy#11. I don't know if another library has different behaviour.
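For comparison, shell-like quoting for the value side of a `.env` line is small enough to sketch directly. This assumes a hypothetical `parse_value` helper with a caller-supplied variable `lookup`; it handles only `$NAME` (not `${NAME}`) and is not dotenvy's grammar. It targets the behaviour the table asks for: double quotes allow spaces with substitution, single quotes are literal, backslash escapes work unquoted, and an unescaped space in an unquoted value is an error.

```rust
/// Parses the right-hand side of a `.env` line with shell-like quoting (sketch).
pub fn parse_value<F>(raw: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let quoted = |q: char| raw.len() >= 2 && raw.starts_with(q) && raw.ends_with(q);
    if quoted('\'') {
        // Single quotes: literal, no escapes and no substitution.
        Ok(raw[1..raw.len() - 1].to_string())
    } else if quoted('"') {
        // Double quotes: spaces allowed, `$NAME` substituted.
        expand(&raw[1..raw.len() - 1], true, &lookup)
    } else if raw.starts_with('\'') || raw.starts_with('"') {
        Err(format!("unterminated quote in {raw:?}"))
    } else {
        // Unquoted: spaces must be backslash-escaped, `$NAME` substituted.
        expand(raw, false, &lookup)
    }
}

fn expand<F>(s: &str, allow_space: bool, lookup: &F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::new();
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next() {
                Some(escaped) => out.push(escaped),
                None => return Err(format!("trailing backslash in {s:?}")),
            },
            '$' => {
                let mut name = String::new();
                while let Some(&n) = chars.peek() {
                    if n.is_ascii_alphanumeric() || n == '_' {
                        name.push(n);
                        chars.next();
                    } else {
                        break;
                    }
                }
                if name.is_empty() {
                    out.push('$');
                } else {
                    out.push_str(&lookup(name.as_str()).unwrap_or_default());
                }
            }
            c if c.is_whitespace() && !allow_space => {
                return Err(format!("unescaped whitespace in unquoted value {s:?}"));
            }
            c => out.push(c),
        }
    }
    Ok(out)
}
```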

Implement boot-pack support.

Either running a bare scie-jump stub with no trailer or exporting SCIE_LIFT=<anything> in the environment should activate an internal boot-pack that takes 1 required positional argument: the path to the lift manifest to boot-pack. The parent directory of the manifest will be set as the CWD and all files in the manifest will be resolved relative to that directory. A --single-lift-line (or similar, but Mad River Glen!) option will add the lift manifest trailer as a single trailing line so that the resulting scie can be inspected with tail -1 <scie binary> | jq .

Although plain old cat can be used to do all this, cat on Windows assumes text files unless you've gone through some hoops to install a Unix version of the tool; so this will be quite handy there, but also will provide a uniform interface to performing the final pack operation.
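The pack step itself is a byte-for-byte concatenation, which is exactly what `cat` gets wrong on Windows. A sketch of how the `--single-lift-line` variant might write its output, assuming a hypothetical `boot_pack` helper that receives the parts already in order and a manifest that is already compact JSON (a real implementation would re-serialize it). Writing a newline before the manifest is also an assumption, made so that `tail -1` sees only the JSON.

```rust
use std::io::{self, Write};

/// Concatenates the scie-jump and payload parts, then appends the lift manifest.
/// With `single_lift_line`, the manifest is guaranteed to be the final line.
pub fn boot_pack<W: Write>(
    out: &mut W,
    parts: &[&[u8]],
    manifest_json: &str,
    single_lift_line: bool,
) -> io::Result<()> {
    for part in parts {
        // Raw binary copy: no text-mode newline translation.
        out.write_all(part)?;
    }
    if single_lift_line {
        if manifest_json.contains('\n') {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "manifest must be compact, single-line JSON",
            ));
        }
        // Start the manifest on a fresh line so `tail -1` sees only the JSON.
        out.write_all(b"\n")?;
    }
    out.write_all(manifest_json.as_bytes())
}
```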

Support unpacked execution.

Instead of a bare scie-jump executing a boot-pack of a lift manifest, it should gain the ability to also execute the lift manifest directly. The interface would be the same as the boot-pack in terms of an optional argument indicating the lift manifest to use, defaulting to the sibling lift.json. This would allow a scie to be tested and experimented with before final packaging as a monolithic scie binary. This notably would support Pants-style cache-efficient sandboxes while retaining the ability to run a final boot-pack in a packaging step.
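Resolving which manifest to run is the only new interface piece: an explicit argument wins, otherwise the sibling `lift.json` is used. A sketch with a hypothetical `lift_manifest_path` helper, where `jump_exe` would come from `std::env::current_exe()`:

```rust
use std::path::{Path, PathBuf};

/// Picks the lift manifest for unpacked execution (hypothetical helper).
pub fn lift_manifest_path(arg: Option<&str>, jump_exe: &Path) -> PathBuf {
    match arg {
        Some(path) => PathBuf::from(path),
        // Default: `lift.json` in the same directory as the scie-jump binary.
        None => jump_exe.with_file_name("lift.json"),
    }
}
```

As with the boot-pack, file entries would then be resolved relative to the manifest's parent directory.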

Verify fingerprints on 1st extract.

To ensure the ~/.nce is not corrupted, the fingerprint claimed in the lift manifest should match the actual fingerprint or fail fast. This will ensure the ~/.nce cache is a reliable CAS.
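The check itself is simple; what matters is doing it before the extracted file becomes visible in the CAS. A sketch with a hypothetical `verify` helper: the cheap size comparison runs first so mismatches fail fast, and `digest` stands in for the real SHA-256 function (the Rust standard library has none).

```rust
/// Verifies bytes against the size and fingerprint claimed in the lift manifest.
pub fn verify<D>(bytes: &[u8], expected_size: usize, expected_hash: &str, digest: D) -> Result<(), String>
where
    D: Fn(&[u8]) -> String,
{
    // Cheap check first: a size mismatch needs no hashing at all.
    if bytes.len() != expected_size {
        return Err(format!(
            "size mismatch: expected {expected_size} bytes, found {}",
            bytes.len()
        ));
    }
    let actual = digest(bytes);
    if actual != expected_hash {
        return Err(format!("fingerprint mismatch: expected {expected_hash}, found {actual}"));
    }
    Ok(())
}
```

Pairing this with extraction to a temporary location and an atomic rename into ~/.nce would keep a failed verification from ever leaving a corrupt entry behind.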

The help screen for a bad SCIE_BOOT could be better.

Here I use the scie-pants pants scie to demonstrate:

$ SCIE_BOOT=bad pants
Error: Isolates your Pants from the elements.

Please select from the following boot commands:

scie-pants
bootstrap-tools
pants
pants-debug
update

You can select a boot command by passing it as the 1st argument or else by setting the SCIE_BOOT environment variable.

Now, in this case, the error should be pretty obvious since I typed SCIE_BOOT=bad pants which is right there in the console along with the error message. Things are more mysterious though if I previously exported SCIE_BOOT=bad into the environment, in which case you just see:

$ pants
Error: Isolates your Pants from the elements.

Please select from the following boot commands:

scie-pants
bootstrap-tools
pants
pants-debug
update

You can select a boot command by passing it as the 1st argument or else by setting the SCIE_BOOT environment variable.

Better would probably be something like:

$ pants
Error: SCIE_BOOT="bad" was found in the environment but "bad" does not correspond to any pants subcommands.

Please select from the following boot commands:
<default> (when SCIE_BOOT is not in the environment): Isolates your Pants from the elements.
scie-pants
bootstrap-tools
pants
pants-debug
update

You can select a boot command by passing it as the 1st argument or else by setting the SCIE_BOOT environment variable.
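Since SCIE_BOOT always reaches the scie through the environment (even `SCIE_BOOT=bad pants` just sets it for that process), the improved message can always name the variable and its value. A sketch of the proposed layout, assuming a hypothetical `bad_boot_message` helper and the generic wording "any boot command":

```rust
/// Formats the improved "bad SCIE_BOOT" help screen proposed above (hypothetical).
pub fn bad_boot_message(requested: &str, default_description: &str, commands: &[&str]) -> String {
    let mut msg = format!(
        "Error: SCIE_BOOT=\"{requested}\" was found in the environment but \"{requested}\" \
         does not correspond to any boot command.\n\n"
    );
    msg.push_str("Please select from the following boot commands:\n");
    msg.push_str(&format!(
        "<default> (when SCIE_BOOT is not in the environment): {default_description}\n"
    ));
    for name in commands {
        msg.push_str(name);
        msg.push('\n');
    }
    msg.push_str(
        "\nYou can select a boot command by passing it as the 1st argument or else by \
         setting the SCIE_BOOT environment variable.",
    );
    msg
}
```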

Start end of central directory discovery for the final zip 18 bytes from the end of the file.

Since the smallest zip end of central directory record is 22 bytes (the size when no zip comment is set; comments are uncommon) and its 4-byte signature sits at the start of the record, the search can start 18 bytes before the end of the file. Currently it starts at the very end of the file, which needlessly examines those final 18 bytes every time:

jump/jump/src/jmp.rs

Lines 22 to 33 in 639f589

let offset_from_eof = data
    .iter()
    .rev()
    .take(max_scan)
    .tuple_windows::<(_, _, _, _)>()
    .position(|chunk| EOCD_SIGNATURE == chunk)
    .ok_or_else(|| {
        format!(
            "Failed to find application zip end of central directory record within the last \
            {max_scan} bytes of the file. Invalid NCE."
        )
    })?;
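A std-only sketch of the proposed starting point, assuming a hypothetical `find_eocd` helper that returns the absolute offset of the signature rather than the current code's reversed `offset_from_eof` (which uses itertools' `tuple_windows`). The signature `PK\x05\x06` can never start later than `len - 22`, so the backwards scan begins there.

```rust
/// End of central directory signature, `PK\x05\x06`, in file byte order.
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// An EOCD record with an empty comment: the smallest possible record.
const MIN_EOCD_SIZE: usize = 22;

/// Finds the start of the last EOCD record within the final `max_scan` bytes.
pub fn find_eocd(data: &[u8], max_scan: usize) -> Option<usize> {
    // The latest possible signature start skips the final 18 bytes (22 - 4).
    let last_start = data.len().checked_sub(MIN_EOCD_SIZE)?;
    let first_start = data.len().saturating_sub(max_scan);
    (first_start..=last_start)
        .rev()
        .find(|&start| data[start..start + 4] == EOCD_SIGNATURE)
}
```

Scanning from the end still finds the last record first, which is the one belonging to the final zip.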

Support a `{scie.lift}` placeholder.

If requested, the lift manifest will be extracted to the {scie.boot} and that path substituted. Both commands and bindings can use this to read the config. This is definitely needed for #19 and other cases where the custom metadata feature is used.

Reclaim the scie-tote after extraction from it.

The scie-tote must be extracted to handle any zip archives contained within it. Unlike blobs and tarballs, these need a seekable stream, and we'd have to implement this on top of, or as a replacement for, the zip crate. The expedient path has been to just use the zip crate to extract the scie-tote and then operate on the raw files extracted from it instead:

jump/jump/src/installer.rs

Lines 177 to 201 in 12d7144

if !scie_tote.is_empty() {
    let tote_file = context.files.last().ok_or_else(|| {
        format!(
            "Expected the last file to be the scie-tote holding these files: {scie_tote:#?}"
        )
    })?;
    let scie_tote_dst = context.get_path(tote_file);
    let bytes = &payload[(location - tote_file.size)..location];
    unpack(tote_file.file_type, Cursor::new(bytes), &scie_tote_dst)?;
    for file in scie_tote {
        let dst = context.get_path(file);
        if let Some(file_type) = file_type_to_unpack(file, &dst) {
            let src_path = scie_tote_dst.join(&file.name);
            let src = std::fs::File::open(&src_path).map_err(|e| {
                format!(
                    "Failed to open {file:?} at {src} from the unpacked scie-tote: {e}",
                    src = src_path.display()
                )
            })?;
            unpack(file_type, &src, &dst)?;
        } else {
            debug!("Cache hit {dst} for {file:?}", dst = dst.display())
        };
    }
}

This seems like a fine choice except that the scie-tote extraction is currently performed in the ~/.nce, so it's permanent and leaks the space of its members, which are separately stored under their own cache keys in the ~/.nce. This should be straightforward to fix by instead extracting the scie-tote to a temporary directory that is cleaned up after processing.
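The usual Rust shape for "cleaned up after processing" is an RAII guard whose `Drop` removes the directory; the `tempfile` crate's `TempDir` behaves this way. A std-only sketch, assuming a hypothetical `ScratchDir` type and a PID-plus-timestamp naming scheme:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Scratch directory removed on drop, so scie-tote extraction never leaks into ~/.nce.
pub struct ScratchDir(PathBuf);

impl ScratchDir {
    pub fn new(prefix: &str) -> io::Result<Self> {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos())
            .unwrap_or(0);
        let path = std::env::temp_dir().join(format!("{prefix}-{}-{nanos}", std::process::id()));
        // `create_dir` (not `create_dir_all`) fails rather than reuse an existing directory.
        fs::create_dir(&path)?;
        Ok(ScratchDir(path))
    }

    pub fn path(&self) -> &Path {
        &self.0
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best effort: a cleanup failure should not mask the extraction result.
        let _ = fs::remove_dir_all(&self.0);
    }
}
```

The installer would unpack the scie-tote under `scratch.path()`, unpack each member into its own ~/.nce cache key, and let the guard go out of scope afterwards.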

Support conditionally setting arguments to a command

In pantsbuild/scie-pants#249, we encountered a circumstance where a command has an arg that needs to be conditionally set: a binding computes an env var that is sometimes set to --python-repos-find-links=... and sometimes empty. If the env var is empty, it'd be better for the command to not receive an argument at all: invoking pants "" ... causes issues with pants specifically, and in general passing a spurious empty string is likely to cause problems for any program.

It'd be handy to be able to (optionally) make an argument conditional, so that these sorts of "maybe set an argument" use cases are easier to handle.

AFAICT, this isn't currently possible with scie-jump: there's a reified argument for every argument in the manifest, and a string like "{scie.bindings.configure:PANTS_SHA_FIND_LINKS}" just becomes an empty string.

jump/jump/src/context.rs

Lines 270 to 274 in 1403e1f

for arg in &cmd.args {
    let (reified_arg, needs_manifest) = self.reify_string(&env, arg)?;
    needs_lift_manifest |= needs_manifest;
    args.push(reified_arg.into());
}

jump/jump/src/context.rs

Lines 576 to 581 in 1403e1f

let value = binding_env
    .get(&parsed_env.name)
    .map(String::to_owned)
    .or(parsed_env.default)
    .unwrap_or_default();
reified.push_str(&value)

Suppose there's a manifest like the following; we'd like to be able to annotate the ARGUMENT arg so that it disappears completely in some cases (particularly when ARGUMENT is empty, but a more general mechanism would be okay too):

...
"commands": {
  "": {
    "exe": ".../executable",
    "args": ["{scie.bindings.some_binding:ARGUMENT}"]
  }
}
...

In shell terms:

ARGUMENT=""

.../executable "$ARGUMENT" # current behaviour: passing a "" argument

.../executable $ARGUMENT # requested behaviour: no argument at all

Workaround that we'll likely use in pantsbuild/scie-pants#250: instead of setting the env var to nothing, set it to a "no-op" argument, so that we're never passing an empty arg to the executable.

(This may apply to arguments in bindings in addition to commands?)
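In code, the request is a per-argument choice between the current "always pass" behaviour and "omit if empty". A sketch, assuming a hypothetical `Arg` enum (the lift manifest has no such annotation today) and a caller-supplied `reify` function standing in for placeholder expansion:

```rust
/// A command argument, as it might be modelled if the manifest could mark it optional.
pub enum Arg {
    /// Always passed, even when it reifies to "" (current behaviour, like "$ARGUMENT").
    Always(String),
    /// Dropped entirely when it reifies to "" (requested behaviour, like $ARGUMENT).
    OmitIfEmpty(String),
}

/// Builds the final argv from argument templates.
pub fn reify_args<F>(args: &[Arg], reify: F) -> Vec<String>
where
    F: Fn(&str) -> String,
{
    args.iter()
        .filter_map(|arg| match arg {
            Arg::Always(template) => Some(reify(template.as_str())),
            Arg::OmitIfEmpty(template) => {
                Some(reify(template.as_str())).filter(|value| !value.is_empty())
            }
        })
        .collect()
}
```

Unlike shell word splitting, this only drops empty values; it never splits a non-empty value on whitespace.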

Avoid digesting lift manifest on every scie run.

Currently, the lift manifest is digested on every run of a scie even if it already has a hash recorded:

jump/jump/src/lift.rs

Lines 173 to 204 in 12d7144

fn load(
    manifest_path: &Path,
    data: &[u8],
    reconstitute: bool,
) -> Result<(Option<Jump>, Lift), String> {
    let config = Config::parse(data)?;
    let resolve_base = manifest_path
        .parent()
        .unwrap_or_else(|| Path::new(""))
        .canonicalize()
        .map_err(|e| {
            format!(
                "Failed to resolve an absolute path for the parent directory of the lift \
                manifest {manifest}: {e}",
                manifest = manifest_path.display()
            )
        })?;
    let lift = config.scie.lift;
    let files = assemble(&resolve_base, lift.files, reconstitute)?;
    Ok((
        config.scie.jump,
        Lift {
            name: lift.name,
            description: lift.description,
            base: lift.base,
            boot: lift.boot,
            size: data.len(),
            hash: fingerprint::digest(data),
            files,
        },
    ))
}

Although informal timings put the cost at ~4µs, it's hashing that is not needed. Much like the file entries themselves, the hash only needs to be checked on extraction, which, for lift manifests, is only when the scie.boot is involved (see #7).

Typical informal timings in a noop dispatch timer test rig of:

{
  "scie": {
    "lift": {
      "name": "timer",
      "boot": {
        "commands": {
          "": {
            "exe": "/usr/bin/true",
            "additional_files": [
                "node-v18.12.0-linux-x64.tar.xz"
            ]
          }
        }
      },
      "files": [
        {
          "name": "node-v18.12.0-linux-x64.tar.xz"
        }
      ]
    }
  }
}
$ time RUST_LOG=debug ./timer
[DEBUG TimerFinished] assemble(), Elapsed=2.648µs
[DEBUG TimerFinished] digest(), Elapsed=3.395µs
[DEBUG TimerFinished] load_scie(), Elapsed=60.023µs
[DEBUG TimerFinished] new(), Elapsed=4.722µs
[DEBUG TimerFinished] unpack_archive(), Elapsed=2.554µs
[DEBUG jump::installer] Cache hit /home/jsirois/.nce/9429e26d9a35cb079897f0a22622fe89ff597976259a8fcb38b7d08b154789dc/node-v18.12.0-linux-x64.tar.xz for File { name: "node-v18.12.0-linux-x64.tar.xz", key: Some("node"), size: 0, hash: "9429e26d9a35cb079897f0a22622fe89ff597976259a8fcb38b7d08b154789dc", file_type: Archive(CompressedTar(Xz)), always_extract: false }
[DEBUG TimerFinished] prepare(), Elapsed=14.846µs
[DEBUG TimerFinished] prepare_boot(), Elapsed=103.26µs

real    0m0.001s
user    0m0.001s
sys     0m0.000s
$ RUST_LOG=debug time -v ./timer
[DEBUG TimerFinished] assemble(), Elapsed=2.645µs
[DEBUG TimerFinished] digest(), Elapsed=3.436µs
[DEBUG TimerFinished] load_scie(), Elapsed=50.202µs
[DEBUG TimerFinished] new(), Elapsed=4.473µs
[DEBUG TimerFinished] unpack_archive(), Elapsed=2.45µs
[DEBUG jump::installer] Cache hit /home/jsirois/.nce/9429e26d9a35cb079897f0a22622fe89ff597976259a8fcb38b7d08b154789dc/node-v18.12.0-linux-x64.tar.xz for File { name: "node-v18.12.0-linux-x64.tar.xz", key: Some("node"), size: 0, hash: "9429e26d9a35cb079897f0a22622fe89ff597976259a8fcb38b7d08b154789dc", file_type: Archive(CompressedTar(Xz)), always_extract: false }
[DEBUG TimerFinished] prepare(), Elapsed=14.854µs
[DEBUG TimerFinished] prepare_boot(), Elapsed=93.324µs
        Command being timed: "./timer"
        User time (seconds): 0.00
        System time (seconds): 0.00
        Percent of CPU this job got: 92%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2816
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 164
        Voluntary context switches: 1
        Involuntary context switches: 0
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
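Deferring the digest until something actually needs it fits `std::cell::OnceCell`: the hash is computed on first access and cached afterwards, while a noop dispatch never pays for it. A sketch, assuming a hypothetical `LazyLift` type (not jump's `Lift`) and a `digest` closure standing in for `fingerprint::digest`:

```rust
use std::cell::OnceCell;

/// Lift manifest bytes whose fingerprint is computed only on demand (sketch).
pub struct LazyLift<'a> {
    data: &'a [u8],
    hash: OnceCell<String>,
}

impl<'a> LazyLift<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        LazyLift { data, hash: OnceCell::new() }
    }

    /// The size is free: no hashing needed.
    pub fn size(&self) -> usize {
        self.data.len()
    }

    /// Computes the hash on first use only; later calls return the cached value.
    pub fn hash(&self, digest: impl Fn(&[u8]) -> String) -> &str {
        self.hash.get_or_init(|| digest(self.data))
    }
}
```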

Add support for the scie modifying itself on 1st run.

The basic idea is to have the scie do all its prep work (extract all files, run all prep commands; see #7) and then rewrite itself to be just the scie-boot, an empty zip and the lift manifest trailer, which is currently a ~2MB sandwich. If the ~/.nce were something like /usr/local/nce and the scie itself were housed in /usr/bin/<scie>, you could imagine a very simple way to make a Unix self-installing executable. It could be installed via sudo sh -c 'export scie="<scie>" && cp $scie /usr/bin/$(basename $scie) && SCIE=install /usr/bin/$(basename $scie) --strip'.

There are tricky bits for this to be semi-robust. Although all the scies in /usr/bin would have their manifests still, for uninstall you'd want copies of these manifests somewhere in /usr/local/nce to be able to check dependents of the various CAS objects before deleting them. Further, this operation would need a global lock, although the atomic directory support slated for #2 could probably be leveraged for that.

When argv0 comes from the PATH but matches a file in CWD which is not on the PATH, chaos ensues.

The problem here is this code which enables BusyBox via-symlinks:

jump/src/main.rs

Lines 57 to 63 in b5b81a6

fn find_current_exe() -> Result<PathBuf, Exit> {
    if let Some(arg) = env::args().next() {
        let argv0 = PathBuf::from(arg);
        if argv0.is_file() {
            return Ok(argv0);
        }
    }

The concrete case came from scie-pants where it is a scie binary installed on the PATH as pants. Existing Pants-using repos have a ./pants script checked in at the root of the repo. This is normally not on the PATH and instead invoked as ./pants. With scie-pants on the PATH as pants, invoking pants in the same directory as an existing ./pants script confuses the current argv0 code which sees an argv0 of "pants" and a local file also named "pants" and wrongly assumes the two are the same. This leads to an error trying to read the "pants" script in CWD as a scie - it is not.
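One possible fix is to resolve argv0 the way the shell did: an argv0 containing a path separator is a path, but a bare name like `pants` was found via PATH and must be looked up there, never in CWD. Searching PATH (rather than falling back to `std::env::current_exe()`, which may resolve symlinks) keeps the BusyBox-via-symlink name intact. A sketch with hypothetical `argv0_is_path` and `resolve_argv0` helpers:

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// `pants` is a bare command name; `./pants` and `/usr/bin/pants` are paths.
pub fn argv0_is_path(argv0: &str) -> bool {
    Path::new(argv0).components().count() > 1
}

/// Resolves argv0 like a shell would. `path_var` is the PATH value, e.g.
/// `env::var_os("PATH").as_deref()`. As in POSIX shells, an empty PATH entry
/// does mean CWD; otherwise a same-named file in CWD is never matched.
pub fn resolve_argv0(argv0: &str, path_var: Option<&OsStr>) -> Option<PathBuf> {
    if argv0_is_path(argv0) {
        let path = PathBuf::from(argv0);
        return path.is_file().then_some(path);
    }
    let path_var = path_var?;
    env::split_paths(path_var)
        .map(|dir| dir.join(argv0))
        .find(|candidate| candidate.is_file())
}
```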

Support placeholders for files.

The idea is to extend placeholders from parameterizing commands to also parameterizing files.

Like so:

{
  "scie": {
    "lift": {
      "commands": {
        "exe": "{node}/node-v18.12.0-linux-arm64/bin/node"
      },
      "bindings": {
        "fetch": {
          "exe": "{ptex}"
        }
      },
      "files": [
        {
          "name": "ptex"
        },
        {
          "name": "node-v18.12.0-linux-arm64.tar.xz={scie.file.fetch}",
          "key": "node",
          "hash": "...",
          "size": "...",
          "type": "tar.xz"
        }
      ]
    }
  },
  "ptex": {
    "node-v18.12.0-linux-arm64.tar.xz": "https://nodejs.org/dist/v18.12.0/node-v18.12.0-linux-arm64.tar.xz"
  }
}

The idea is a new placeholder {scie.file.<binding>}. A file whose name ends in ={scie.file.<binding>} will not be processed by the boot-pack but instead be processed at install time. The protocol would be something like:

./<binding> <file json blob with stripped name> > ~/.nce/<hash>/<stripped name>

So if the binding command can accept file json and emit that file on stdout, that's all that's needed to hydrate files from somewhere other than the scie itself. Since the config is strict for the scie lift manifest but agnostic for other top-level keys, this allows the configured fetch binding to store extra metadata about where the file should be fetched from. This would allow an application like ptex to be written separately from the scie-jump and included in scies to turn them into ptexes: portable thin executables.
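On the scie-jump side, the first step is recognizing such names. A sketch, assuming a hypothetical `split_file_placeholder` helper that returns the stripped file name plus the binding expected to produce it at install time (or `None` for ordinary files the boot-pack handles):

```rust
/// Splits `<name>={scie.file.<binding>}` into the stripped name and the binding.
pub fn split_file_placeholder(name: &str) -> (&str, Option<&str>) {
    if let Some((stripped, placeholder)) = name.rsplit_once('=') {
        if let Some(binding) = placeholder
            .strip_prefix("{scie.file.")
            .and_then(|rest| rest.strip_suffix('}'))
        {
            if !binding.is_empty() {
                return (stripped, Some(binding));
            }
        }
    }
    // No (well-formed) placeholder: an ordinary file.
    (name, None)
}
```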

Twitter had a use case for this in the context of distributing very large PEXes in their datacenters. If the scie conceptually contained a packed layout PEX, it could be shipped as a ptexed scie that just had the scie-jump and ptex binary blobs inside with all the rest of the application content, up to and including the Python interpreter, provided lazily by ptex on 1st boot.

With the syntax approach sketched out here the config format stays the same - only a new placeholder is introduced; so this idea can be deferred without worry of breaking the format.

Consider moving CAS to platform-standard cache directories

Ran across this project via pantsbuild.slack.com lurking.

Small suggestion: instead of ~/.nce, store the CAS in $XDG_CACHE_HOME/{scie,nce,whatever} on systems that follow XDG, in ~/Library/Caches/{scie,nce,whatever} on macOS, and probably somewhere under %localappdata% on Windows (I'm less sure about that one).

This makes it clear it's fine to blow away (eg, to free up space), and also removes some clutter from $HOME. It's probably easier to change this before widespread use of scie-packaged programs. :-)

Edit: I just saw #9 which probably conflicts with the idea that the CAS can be blown away. In that case, the equivalent of $XDG_DATA_HOME might be better?
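The suggested lookup is small. A sketch, assuming a hypothetical `cas_root` helper, an `nce` subdirectory name, and %LOCALAPPDATA% on Windows; `os` matches the values of `std::env::consts::OS` and `get` is an env lookup such as `|k| std::env::var(k).ok()`. Per the XDG Base Directory spec, an unset or empty `XDG_CACHE_HOME` falls back to `$HOME/.cache`; swapping in `XDG_DATA_HOME` (default `$HOME/.local/share`) would follow the same shape if the CAS must not be treated as disposable.

```rust
use std::path::PathBuf;

/// Chooses a platform-standard cache root for the CAS instead of ~/.nce (sketch).
pub fn cas_root(os: &str, get: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    // Treat empty variables as unset, as the XDG spec requires.
    let non_empty = |key: &str| get(key).filter(|value| !value.is_empty()).map(PathBuf::from);
    let base = match os {
        "macos" => non_empty("HOME")?.join("Library").join("Caches"),
        "windows" => non_empty("LOCALAPPDATA")?,
        _ => non_empty("XDG_CACHE_HOME")
            .or_else(|| non_empty("HOME").map(|home| home.join(".cache")))?,
    };
    Some(base.join("nce"))
}
```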

`SCIE=split` fails for scie-tote / files with a non-scie `"source"`.

For example, trying to split a scie-pants scie nets:

$ SCIE=split scie-pants /tmp/scie-pants-split
Error: Failed to open scie-tote zip: invalid Zip archive: Invalid zip header
$ ls -l /tmp/scie-pants-split/
total 7280
-rw-r--r-- 1 jsirois jsirois       0 Dec 19 12:27 cpython-3.8.15+20221106-aarch64-apple-darwin-install_only.tar.gz
-rw-r--r-- 1 jsirois jsirois 5780260 Dec 19 12:27 cpython-3.8.15+20221106-aarch64-unknown-linux-gnu-install_only.tar.gz
-rw-r--r-- 1 jsirois jsirois       0 Dec 19 12:27 cpython-3.8.15+20221106-x86_64-apple-darwin-install_only.tar.gz
-rw-r--r-- 1 jsirois jsirois       0 Dec 19 12:27 cpython-3.8.15+20221106-x86_64-unknown-linux-gnu-install_only.tar.gz
-rw-r--r-- 1 jsirois jsirois       0 Dec 19 12:27 cpython-3.9.15+20221106-aarch64-apple-darwin-install_only.tar.gz
-rw-r--r-- 1 jsirois jsirois       0 Dec 19 12:27 cpython-3.9.15+20221106-aarch64-unknown-linux-gnu-install_only.tar.gz
-rw-r--r-- 1 jsirois jsirois       0 Dec 19 12:27 cpython-3.9.15+20221106-x86_64-apple-darwin-install_only.tar.gz
-rw-r--r-- 1 jsirois jsirois       0 Dec 19 12:27 cpython-3.9.15+20221106-x86_64-unknown-linux-gnu-install_only.tar.gz
-rwxr-xr-x 1 jsirois jsirois 1668304 Dec 19 12:27 scie-jump

Unexpected dependency to homebrew xz library from a scie executable

> otool -L ./backendai-install-macos-aarch64
./backendai-install-macos-aarch64:
        /opt/homebrew/opt/xz/lib/liblzma.5.dylib (compatibility version 8.0.0, current version 8.9.0)
        /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

I discovered that executing my scie-built executable on a "fresh" macOS setup fails due to a missing dylib: /opt/homebrew/opt/xz/lib/liblzma.5.dylib.

I think liblzma should be statically linked into the scie binary.

The scie-tote is always extracted.

As demonstrated with the 0.3.8 release using the example in ptex which yields a scie of:

$ curl -sSL https://github.com/a-scie/jump/releases/download/v0.3.8/scie-jump-linux-x86_64 > example/scie-jump-linux-x86_64
$ rm skinny-scie && example/scie-jump-linux-x86_64 example/lift.json
example/lift.json: /home/jsirois/dev/a-scie/ptex/skinny-scie
$ zipinfo skinny-scie
Archive:  skinny-scie
Zip file size: 5729378 bytes, number of entries: 1
warning [skinny-scie]:  1459568 extra bytes at beginning or within zipfile
  (attempting to process anyway)
-rwxr-xr-x  4.6 unx  4268672 b- stor 80-Jan-01 00:00 ptex-linux-x86_64
1 file, 4268672 bytes uncompressed, 4268672 bytes compressed:  0.0%

1st run:

$ rm -rf ~/.nce && RUST_LOG=debug ./skinny-scie
[DEBUG TimerFinished] assemble(), Elapsed=1.97µs
[DEBUG TimerFinished] digest(), Elapsed=4.005µs
[DEBUG TimerFinished] load_scie(), Elapsed=72.826µs
[DEBUG TimerFinished] new(), Elapsed=1.781µs
[DEBUG TimerFinished] digest_reader(), Elapsed=6.302608ms
[DEBUG jump::installer] The zip destination /tmp/.tmpyGSDOa/scie-tote of size 4268804 had expected hash
[DEBUG TimerFinished] unpack_archive(), Elapsed=10.431485ms
[DEBUG TimerFinished] digest_reader(), Elapsed=6.356938ms
[DEBUG jump::installer] The blob destination /home/jsirois/.nce/85b1348e197f83cc6c0e607849dc8fa2ac96ffc0885d3f506edbe5d714670adc/ptex-linux-x86_64 of size 4268672 had expected hash
[DEBUG TimerFinished] unpack_blob(), Elapsed=8.304434ms
[INFO  jump::installer] Loading cpython-3.10.8+20221106-x86_64-unknown-linux-gnu-install_only.tar.gz via "/home/jsirois/.nce/85b1348e197f83cc6c0e607849dc8fa2ac96ffc0885d3f506edbe5d714670adc/ptex-linux-x86_64"...
Downloaded 27555856 of 27555856 bytes (100%) from https://github.com/indygreg/python-build-standalone/releases/download/20221106/cpython-3.10.8+20221106-x86_64-unknown-linux-gnu-install_only.tar.gz
[DEBUG TimerFinished] digest_reader(), Elapsed=45.468181ms
[DEBUG jump::installer] The tar.gz destination /home/jsirois/.nce/6c8db44ae0e18e320320bbaaafd2d69cde8bfea171ae2d651b7993d1396260b7/cpython-3.10.8+20221106-x86_64-unknown-linux-gnu-install_only.tar.gz of size 27555856 had expected hash
[DEBUG TimerFinished] unpack_tar(), Elapsed=237.829069ms
[DEBUG TimerFinished] unpack_archive(), Elapsed=5.597074065s
[DEBUG TimerFinished] install(), Elapsed=5.616264025s
[DEBUG TimerFinished] prepare_boot(), Elapsed=5.616678143s
Hello World!

Subsequent run with a warm cache:

$ RUST_LOG=debug ./skinny-scie
[DEBUG TimerFinished] assemble(), Elapsed=1.616µs
[DEBUG TimerFinished] digest(), Elapsed=3.423µs
[DEBUG TimerFinished] load_scie(), Elapsed=74.288µs
[DEBUG TimerFinished] new(), Elapsed=2.058µs
[DEBUG jump::atomic] The atomic file at /home/jsirois/.nce/066d340e05b4c1200817152a7ab9dbf2f0b0661925566cffcdf03887316646be/lift.json has already been established.
[DEBUG TimerFinished] digest_reader(), Elapsed=6.578561ms
[DEBUG jump::installer] The zip destination /tmp/.tmpaWJPLo/scie-tote of size 4268804 had expected hash
[DEBUG TimerFinished] unpack_archive(), Elapsed=8.538348ms
[DEBUG jump::atomic] The atomic file at /home/jsirois/.nce/85b1348e197f83cc6c0e607849dc8fa2ac96ffc0885d3f506edbe5d714670adc/ptex-linux-x86_64 has already been established.
[DEBUG TimerFinished] unpack_blob(), Elapsed=5.037µs
[DEBUG jump::atomic] The atomic directory at /home/jsirois/.nce/6c8db44ae0e18e320320bbaaafd2d69cde8bfea171ae2d651b7993d1396260b7/cpython-3.10.8+20221106-x86_64-unknown-linux-gnu-install_only.tar.gz has already been established.
[DEBUG TimerFinished] unpack_archive(), Elapsed=16.87µs
[DEBUG TimerFinished] install(), Elapsed=9.097267ms
[DEBUG TimerFinished] prepare_boot(), Elapsed=9.375412ms
Hello World!

Note that 8.538348ms is spent needlessly unpacking the scie-tote to /tmp/.tmpaWJPLo/scie-tote, only to determine just afterwards that the ptex-linux-x86_64 contained within was already extracted to the ~/.nce.

If no file is needed from the scie-tote, the scie-tote should never be extracted. The eager extraction happens here:

jump/jump/src/installer.rs

Lines 233 to 248 in 69f70de

FileEntry::ScieTote((tote_file, entries)) => {
    let scie_tote_tmpdir = tempfile::TempDir::new().map_err(|e| {
        format!(
            "Failed to create a temporary directory to extract the scie-tote to: {e}"
        )
    })?;
    let scie_tote_path = scie_tote_tmpdir.path().join(&tote_file.name);
    let bytes = &payload[location..(location + tote_file.size)];
    unpack(
        tote_file.file_type,
        tote_file.executable.unwrap_or(false),
        || Ok((Cursor::new(bytes), ())),
        tote_file.hash.as_str(),
        &scie_tote_path,
    )?;
    for (file, dst) in entries {
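The fix amounts to checking the members before unpacking the tote rather than after. A sketch, assuming hypothetical `pending_tote_entries` and `install_from_tote` helpers; a plain existence check stands in for the real atomic-directory "already established" bookkeeping, and `extract_tote` stands in for the unpack-to-tempdir step.

```rust
use std::io;
use std::path::PathBuf;

/// A scie-tote member name paired with its ~/.nce destination (simplified).
pub type ToteEntry = (String, PathBuf);

/// Returns the entries whose destination is not yet established in the CAS.
pub fn pending_tote_entries(entries: &[ToteEntry]) -> Vec<&ToteEntry> {
    entries.iter().filter(|(_, dst)| !dst.exists()).collect()
}

/// Extracts the scie-tote only when at least one member is actually missing;
/// returns how many members were pending.
pub fn install_from_tote<E>(entries: &[ToteEntry], extract_tote: E) -> io::Result<usize>
where
    E: FnOnce(&[&ToteEntry]) -> io::Result<()>,
{
    let pending = pending_tote_entries(entries);
    if pending.is_empty() {
        // Warm cache: skip unpacking the scie-tote altogether.
        return Ok(0);
    }
    extract_tote(pending.as_slice())?;
    Ok(pending.len())
}
```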
