Comments (22)

tsibley commented on June 6, 2024

This looks resolved by the just-released nextstrain-base 20230901T214523Z.

from mpox.

tsibley commented on June 6, 2024

Weird. I can reproduce this locally. Notably, when I run nextstrain update conda after initial setup, we do install the latest version (20230830T164409Z). This is what I'd expect since we explicitly resolve the version to update to ourselves. So signs point to Micromamba not resolving to the latest version on the initial micromamba create for some reason.

It would be good to understand what's going on here—is it a Micromamba bug? are we using it wrong?—but I imagine regardless of what's happening, we might still want to change nextstrain setup conda to use the same logic for figuring out the latest version as nextstrain update does rather than leaving it to Micromamba.
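
A minimal sketch of what "resolve the latest version ourselves" could look like, assuming the channel's repodata.json has already been downloaded and parsed. This is not necessarily how nextstrain update does it; the function name latest_spec is hypothetical, and the older entry's build string and both filenames in the sample are made up for illustration.

```python
from typing import Optional


def latest_spec(repodata: dict, name: str = "nextstrain-base") -> Optional[str]:
    """Return an exact "name ==version build" spec for the newest build of name."""
    # Channel indexes list packages under "packages" (.tar.bz2) and
    # "packages.conda" (.conda); either key may be absent.
    entries = [
        entry
        for key in ("packages", "packages.conda")
        for entry in repodata.get(key, {}).values()
        if entry.get("name") == name
    ]
    if not entries:
        return None
    # Assumption: versions are fixed-width UTC timestamps such as
    # 20230830T164409Z, so plain string order matches release order.
    newest = max(entries, key=lambda e: (e["version"], e.get("build_number", 0)))
    return f'{name} =={newest["version"]} {newest["build"]}'


# Hand-written sample shaped like a repodata.json excerpt (illustrative only).
sample = {
    "packages": {
        "nextstrain-base-old.tar.bz2": {
            "name": "nextstrain-base",
            "version": "20230717T174555Z",
            "build": "h0000000_0_locked",  # made-up build string
            "build_number": 0,
        },
        "nextstrain-base-new.tar.bz2": {
            "name": "nextstrain-base",
            "version": "20230830T164409Z",
            "build": "hb0f4dca_0_locked",
            "build_number": 0,
        },
    }
}

print(latest_spec(sample))
```

Passing the resulting exact spec to the solver makes a failure loud instead of letting the solver fall back to an older version silently.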

joverlee521 commented on June 6, 2024

Quick fix for the CI while we figure out the underlying issue.

tsibley commented on June 6, 2024

Anaconda appears to have incorrect indexing of (both builds of) suitesparse 5.10.1.

Compare the metadata for the distribution (used post-install to solve deps for subsequent installs):

$ curl https://api.anaconda.org/release/conda-forge/suitesparse/5.10.1 | jq '
>   .distributions | map(
>       select(.attrs.subdir == "linux-64")
>     | [
>       .full_name,
>       (.attrs.depends | map(select(startswith("metis "))) | .[0])
>     ]
>   )
> '
[
  [
    "conda-forge/suitesparse/5.10.1/linux-64/suitesparse-5.10.1-h9e50725_1.tar.bz2",
    [
      "metis >=5.1.0,<5.2.0a0"
    ]
  ],
  [
    "conda-forge/suitesparse/5.10.1/linux-64/suitesparse-5.10.1-hd8046ac_0.tar.bz2",
    [
      "metis >=5.1.0,<5.2.0a0"
    ]
  ]
]

with the metadata in the channel index (used pre-install to solve deps):

$ curl https://conda.anaconda.org/conda-forge/linux-64/repodata.json.zst | zstdcat | jq '
>   .packages | to_entries | map(
>       select(.key | contains("suitesparse-5.10.1"))
>     | [
>       .key,
>       (.value.depends | map(select(startswith("metis "))))
>     ]
>   )
> '
[
  [
    "suitesparse-5.10.1-h9e50725_1.tar.bz2",
    [
      "metis >=5.1.0,<5.1.1.0a0"
    ]
  ],
  [
    "suitesparse-5.10.1-hd8046ac_0.tar.bz2",
    [
      "metis >=5.1.0,<5.1.1.0a0"
    ]
  ]
]

This is why initial install of nextstrain-base ==20230830T164409Z fails but upgrade to that same version succeeds: the former uses the channel index metadata for suitesparse, the latter the locally installed distribution metadata (e.g. ${prefix}/conda-meta/suitesparse-5.10.1-h9e50725_1.json).

I confirmed that the distribution metadata API is indeed returning the metadata from the actual distribution:

$ curl https://conda.anaconda.org/conda-forge/linux-64/suitesparse-5.10.1-h9e50725_1.tar.bz2 \
> | tar -xjO info/index.json \
> | jq '.depends | map(select(startswith("metis ")))'
[
  "metis >=5.1.0,<5.2.0a0"
]

$ curl https://conda.anaconda.org/conda-forge/linux-64/suitesparse-5.10.1-h9e50725_1.tar.bz2 \
> | tar -xjO info/recipe/meta.yaml \
> | yq '.requirements.run | map(select(startswith("metis ")))'
[
  "metis >=5.1.0,<5.2.0a0",
  "metis >=5.1.0,<5.2.0a0"
]

and that it's all the same as what's in the local install:

$ jq '.depends | map(select(startswith("metis ")))' "$NEXTSTRAIN_HOME"/runtimes/conda/env/conda-meta/suitesparse-5.10.1-h9e50725_1.json
[
  "metis >=5.1.0,<5.2.0a0"
]
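
The by-hand jq comparison above can also be expressed as a small check. This is only a sketch: the function name constraint_mismatches is hypothetical, and the two depends lists are copied from the outputs shown above rather than fetched live.

```python
def constraint_mismatches(index_depends, dist_depends):
    """Map dependency name -> (index constraint, distribution constraint) where they differ."""

    def by_name(depends):
        # A depends entry looks like "metis >=5.1.0,<5.2.0a0":
        # the package name, a space, then the version constraint.
        parsed = {}
        for dep in depends:
            name, _, constraint = dep.partition(" ")
            parsed[name] = constraint
        return parsed

    index = by_name(index_depends)
    dist = by_name(dist_depends)
    return {
        name: (index.get(name), dist.get(name))
        for name in sorted(index.keys() | dist.keys())
        if index.get(name) != dist.get(name)
    }


# Values copied from the channel index and the distribution metadata above.
index_depends = ["metis >=5.1.0,<5.1.1.0a0"]
dist_depends = ["metis >=5.1.0,<5.2.0a0"]

mismatches = constraint_mismatches(index_depends, dist_depends)
print(mismatches)
```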

In short:

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.
Leon Bambrick, via Martin Fowler

and we're hitting caching issues. (An index is a cache.)

tsibley commented on June 6, 2024

It gets messier: the difference in the index vs. distribution metadata is not accidental, but intentional.

I went to read about channel indexing and noticed this step (emphasis mine):

For each subdir:

  1. Look at all the packages that exist in the subdir.
  2. Generate a list of packages to add/update/remove.
  3. Remove all packages that need to be removed.
  4. For all packages that need to be added/updated:
    • Extract the package to access metadata, including full package name, file modification time (mtime), size, and index.json.
    • Aggregate package metadata to repodata collection.
  5. Apply repodata hotfixes (patches).
  6. Compute and save the reduced current_index.json index.

That raised my eyebrows. So I read further about repodata patching, which mentioned how conda-forge applies repodata patches using https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/.

Any sign of suitesparse or metis in there? Oh, you bet!

conda-forge/conda-forge-repodata-patches-feedstock@2a2c288

Committed just a couple days ago. So this is intentional, to fix an actual ABI breakage, but it has the side-effect of breaking a previously-working combination of packages. This unfortunate risk is noted by the Conda docs linked to above:

Hotfixing is tricky, as it has the potential to break environments that have worked, but it is also sometimes necessary to fix environments that are known not to work.
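
To make the hotfix step concrete, here is a simplified sketch of how per-package overrides in a patch could be layered onto index entries. It only handles field overrides; real patch instructions can also remove or revoke packages, which is left out. The function name apply_patch_instructions and the sample data are illustrative, with the metis constraints taken from the outputs earlier in this thread.

```python
import copy


def apply_patch_instructions(repodata: dict, instructions: dict) -> dict:
    """Return a copy of repodata with per-package field overrides applied."""
    patched = copy.deepcopy(repodata)
    for key in ("packages", "packages.conda"):
        for filename, overrides in instructions.get(key, {}).items():
            # Only patch packages that actually exist in this index.
            if filename in patched.get(key, {}):
                patched[key][filename].update(overrides)
    return patched


# What the package itself declares (as extracted from info/index.json).
repodata = {
    "packages": {
        "suitesparse-5.10.1-h9e50725_1.tar.bz2": {
            "name": "suitesparse",
            "version": "5.10.1",
            "depends": ["metis >=5.1.0,<5.2.0a0"],
        }
    }
}

# Illustrative hotfix that tightens the metis constraint in the index only.
instructions = {
    "packages": {
        "suitesparse-5.10.1-h9e50725_1.tar.bz2": {
            "depends": ["metis >=5.1.0,<5.1.1.0a0"],
        }
    }
}

patched = apply_patch_instructions(repodata, instructions)
print(patched["packages"]["suitesparse-5.10.1-h9e50725_1.tar.bz2"]["depends"])
```

The package archive is never rewritten, which is why the distribution metadata and the index can disagree after a hotfix.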

I think if we rebuild conda-base again, now that the hotfixing is in place, we'll be ok for new installs again. Will confirm that next.

tsibley commented on June 6, 2024

So assuming that rebuild mostly* resolves the issue for now, how do we avoid similar issues in the future?

One way might be to have scheduled CI in conda-base that regularly tests whether the latest package version is still initially installable (similar to how Nextstrain CLI regularly tests if its standalone installers still work, since they're also dependent on external resources). If that test breaks, we get an early warning to see what's up. If we're really fancy, we could potentially even try to detect certain kinds of breakage, like the one here, and automatically remediate them by kicking off another package build.
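
A rough sketch of such a scheduled check, assuming micromamba is on PATH in the CI job and that a failed dry-run solve exits non-zero. The channel list, the environment name, and the function names are assumptions for illustration, not taken from the conda-base repository.

```python
import subprocess

# Assumption: the channels the runtime is normally created from.
CHANNELS = ("nextstrain", "conda-forge", "bioconda")


def dry_run_command(spec, channels=CHANNELS):
    """Build a micromamba command that solves for spec without installing anything."""
    cmd = [
        "micromamba", "create",
        "--dry-run", "--yes",
        "--name", "installability-check",  # throwaway name; never created on a dry run
        "--override-channels",
    ]
    for channel in channels:
        cmd += ["--channel", channel]
    cmd.append(spec)
    return cmd


def is_installable(spec, run=subprocess.run):
    """Return True if a fresh environment containing spec can be solved."""
    result = run(dry_run_command(spec), capture_output=True, text=True)
    return result.returncode == 0
```

In a scheduled job this would be called with an exact spec for the latest release, for example is_installable("nextstrain-base ==20230830T164409Z"), and the job would fail when it returns False. The run parameter exists so the check can be exercised without micromamba installed.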

* nextstrain-base versions between (20230717T174555Z, 20230830T164409Z] are still forever broken for initial installs, but could be upgraded to.
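
The half-open range in the footnote can be spelled out as a check. This sketch assumes version strings are fixed-width UTC timestamps, so string comparison matches chronological order; the function name is hypothetical.

```python
LAST_GOOD = "20230717T174555Z"    # excluded: still installs fresh
LAST_BROKEN = "20230830T164409Z"  # included: built just before the hotfix landed


def broken_for_initial_install(version: str) -> bool:
    """True for versions in the range (LAST_GOOD, LAST_BROKEN]."""
    return LAST_GOOD < version <= LAST_BROKEN


for v in ("20230717T174555Z", "20230830T164409Z", "20230901T214523Z"):
    print(v, broken_for_initial_install(v))
```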

victorlin commented on June 6, 2024

Hmm, that's helpful info. It doesn't explain the behavior in the ncov run though? That one resolved to nextstrain-base 20230830T164409Z during nextstrain setup conda.

tsibley commented on June 6, 2024

Yeah. Weird.

joverlee521 commented on June 6, 2024

Seeing the same issue in the seasonal-flu CI now, so no longer just limited to this repo.

tsibley commented on June 6, 2024

I can reproduce this locally even with the standalone install of Nextstrain CLI by setting up a new Conda runtime from scratch, which makes sense given we think this is a Micromamba issue.

tsibley commented on June 6, 2024

Thanks for the hot fix for CI!

I started digging into what's going on inside Micromamba by doing roughly this:

$ cd "$(mktemp -d)"
$ export NEXTSTRAIN_HOME=$PWD
$ nextstrain debugger
(Pdb) interact
>>> from nextstrain.cli.runner.conda import micromamba, setup_micromamba
>>> setup_micromamba()
>>> micromamba("create", "-vvv", "--dry-run", "nextstrain-base")

I checked whether the package index it's using, https://conda.anaconda.org/nextstrain/linux-64/repodata.json, contains the latest package version. It does. Then I diffed the two index entries to see if anything stood out, but nothing does.

Next to dig into the actual solver logs.

tsibley commented on June 6, 2024

The solver starts by considering the latest version, 20230830T164409Z. It finds some conflict when solving deps between the suitesparse 5.10.1 and metis 5.1.1 packages, even though it should be fine to just install the exact versions listed in the nextstrain-base spec. That conflict is resolved by the solver by ruling out 20230830T164409Z and repeating the process with the next highest version all the way down the line until it gets to 20230717T174555Z, which is the latest version with metis 5.1.0.

Since nextstrain-base is the only package not fully-constrained by a version and build in this solving operation, it's likely the only flexibility the solver has to address dep resolution conflicts.
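
The fallback behaviour can be illustrated with a toy model. This is not how libsolv works internally; it only mimics the outcome. Each nextstrain-base candidate pins an exact metis version, and the only question is whether that pin fits the metis range suitesparse 5.10.1 accepts according to the index. The "0a0" pre-release suffix on the upper bounds is approximated by dropping it, and only the two versions named in this thread are listed (the versions in between are omitted).

```python
def parse(version: str) -> tuple:
    """Turn "5.1.1" into (5, 1, 1) for simple numeric comparison."""
    return tuple(int(part) for part in version.split("."))


# Candidates newest first: (nextstrain-base version, pinned metis version).
CANDIDATES = [
    ("20230830T164409Z", "5.1.1"),
    ("20230717T174555Z", "5.1.0"),
]

# metis range required by suitesparse 5.10.1, as [lower, upper).
INDEX_RANGE = (parse("5.1.0"), parse("5.1.1"))    # >=5.1.0,<5.1.1.0a0 (channel index)
PACKAGE_RANGE = (parse("5.1.0"), parse("5.2.0"))  # >=5.1.0,<5.2.0a0 (package metadata)


def first_solvable(candidates, metis_range):
    """Return the newest candidate whose metis pin falls inside metis_range."""
    lower, upper = metis_range
    for version, metis in candidates:
        if lower <= parse(metis) < upper:
            return version
    return None


print(first_solvable(CANDIDATES, INDEX_RANGE))
print(first_solvable(CANDIDATES, PACKAGE_RANGE))
```

With the tighter range from the index the newest candidate is rejected and the older one wins, which matches the silent downgrade seen during setup.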

I'm guessing that some difference in the solver or resolution algorithm between the conda-base builds and this version of Micromamba is causing the former to produce something the latter thinks is in conflict. Since they should use broadly the same solver/algo (libmamba → libsolv), this would imply that using a newer Micromamba version might fix this.

But also, I expect pinning the nextstrain-base version on setup would also do the trick, and is more explicitly what we want anyway. Not doing it was kind of an oversight on my part in the nextstrain/cli#280 work. (Very understandable oversight though!)

tsibley commented on June 6, 2024

"But also, I expect pinning the nextstrain-base version on setup would also do the trick, and is more explicitly what we want anyway."

…but actually is not sufficient on its own:

>>> micromamba("create", "-vvv", "--dry-run", "nextstrain-base ==20230830T164409Z hb0f4dca_0_locked")
…
    Encountered problems while solving:
      - package nextstrain-base-20230830T164409Z-hb0f4dca_0_locked requires suitesparse ==5.10.1 h9e50725_1, but none of the providers can be installed
    
    The environment can't be solved, aborting the operation

…
info     libsolv  number of solvables: 642231, memory used: 35122 K
info     libsolv  number of ids: 234927 + 362461
info     libsolv  string memory used: 917 K array + 3573 K data,  rel memory used: 4247 K array
info     libsolv  string hash memory: 2048 K, rel hash memory : 4096 K
info     libsolv  provide ids: 31552
info     libsolv  provide space needed: 673785 + 724922
info     libsolv  shrunk whatprovidesdata from 673785 to 673785
info     libsolv  shrunk whatprovidesauxdata from 673785 to 642230
info     libsolv  whatprovides memory used: 2337 K id array, 5463 K data
info     libsolv  whatprovidesaux memory used: 917 K id array, 2508 K data
info     libsolv  createwhatprovides took 32 ms
info     libmamba Parsing MatchSpec nextstrain-base ==20230830T164409Z hb0f4dca_0_locked
info     libmamba Parsing MatchSpec nextstrain-base ==20230830T164409Z hb0f4dca_0_locked
info     libmamba Adding job: nextstrain-base ==20230830T164409Z hb0f4dca_0_locked
info     libsolv  solver started
info     libsolv  dosplitprovides=0, noupdateprovide=0, noinfarchcheck=0
info     libsolv  allowuninstall=1, allowdowngrade=1, allownamechange=1, allowarchchange=0, allowvendorchange=0
info     libsolv  dupallowdowngrade=1, dupallownamechange=1, dupallowarchchange=1, dupallowvendorchange=1
info     libsolv  promoteepoch=0, forbidselfconflicts=0
info     libsolv  obsoleteusesprovides=0, implicitobsoleteusesprovides=0, obsoleteusescolors=0, implicitobsoleteusescolors=0
info     libsolv  dontinstallrecommended=0, addalreadyrecommended=0 onlynamespacerecommended=0
info     libsolv  obsoletes data: 1 entries
info     libsolv  added 0 pkg rules for installed solvables
info     libsolv  added 0 pkg rules for updaters of installed solvables
info     libsolv  added 6019942 pkg rules for packages involved in a job
info     libsolv  added 0 pkg rules because of weak dependencies
info     libsolv  28943 of 642230 installable solvables considered for solving
info     libsolv  pruned rules from 6019943 to 6008658
info     libsolv    binary: 5881377
info     libsolv    normal: 127280, 9351992 literals
info     libsolv  pkg rule memory used: 140827 K
info     libsolv  pkg rule creation took 3585 ms
info     libsolv  job: install providing nextstrain-base ==20230830T164409Z hb0f4dca_0_locked
info     libsolv    - job Rule #6008666:
info     libsolv      nextstrain-base-20230830T164409Z-hb0f4dca_0_locked [5] (w1)
info     libsolv      next rules: 0 0
info     libsolv  choice rule creation took 3526 ms
info     libsolv  6008657 pkg rules, 2 * 4 update rules, 1 job rules, 0 infarch rules, 0 dup rules, 0 choice rules, 0 best rules, 0 yumobs rules
info     libsolv  0 black rules, 0 recommends rules, 104 repo priority rules
info     libsolv  overall rule memory used: 140830 K
info     libsolv  solving...
info     libsolv  ANALYZE UNSOLVABLE ----------------------
info     libsolv  Rule #3297886:
info     libsolv      !suitesparse-5.10.1-h9e50725_1 [281208] Install.level1
info     libsolv      metis-5.1.0-0 [141705] (w2) Conflict.level1
info     libsolv      metis-5.1.0-1 [141706] (w1) Conflict.level1
info     libsolv      metis-5.1.0-2 [141707] Conflict.level1
info     libsolv      metis-5.1.0-3 [141708] Conflict.level1
info     libsolv      metis-5.1.0-h470a237_3 [141709] Conflict.level1
info     libsolv      metis-5.1.0-h58526e2_1006 [141710] Conflict.level1
info     libsolv      metis-5.1.0-he1b5a44_1004 [141711] Conflict.level1
info     libsolv      metis-5.1.0-he1b5a44_1005 [141712] Conflict.level1
info     libsolv      metis-5.1.0-he1b5a44_1006 [141713] Conflict.level1
info     libsolv      metis-5.1.0-hf484d3e_1003 [141714] Conflict.level1
info     libsolv      metis-5.1.0-hfc679d8_3 [141715] Conflict.level1
info     libsolv      metis-5.1.0-h59595ed_1007 [349083] Conflict.level1
info     libsolv      next rules: 0 3297913
info     libsolv  Rule #2908664:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-0 [141705] (w2) Conflict.level1
info     libsolv      next rules: 0 2908677
info     libsolv  Rule #2908663:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-1 [141706] (w2) Conflict.level1
info     libsolv      next rules: 2908664 2908676
info     libsolv  Rule #2908662:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-2 [141707] (w2) Conflict.level1
info     libsolv      next rules: 2908663 2908675
info     libsolv  Rule #2908661:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-3 [141708] (w2) Conflict.level1
info     libsolv      next rules: 2908662 2908674
info     libsolv  Rule #2908660:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-h470a237_3 [141709] (w2) Conflict.level1
info     libsolv      next rules: 2908661 2908673
info     libsolv  Rule #2908659:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-h58526e2_1006 [141710] (w2) Conflict.level1
info     libsolv      next rules: 2908660 2908672
info     libsolv  Rule #2908658:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-he1b5a44_1004 [141711] (w2) Conflict.level1
info     libsolv      next rules: 2908659 2908671
info     libsolv  Rule #2908657:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-he1b5a44_1005 [141712] (w2) Conflict.level1
info     libsolv      next rules: 2908658 2908670
info     libsolv  Rule #2908656:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-he1b5a44_1006 [141713] (w2) Conflict.level1
info     libsolv      next rules: 2908657 2908669
info     libsolv  Rule #2908655:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-hf484d3e_1003 [141714] (w2) Conflict.level1
info     libsolv      next rules: 2908656 2908668
info     libsolv  Rule #2908654:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-hfc679d8_3 [141715] (w2) Conflict.level1
info     libsolv      next rules: 2908655 2908667
info     libsolv  Rule #2908653:
info     libsolv      !metis-5.1.1-h59595ed_1 [349085] (w1) Install.level1
info     libsolv      !metis-5.1.0-h59595ed_1007 [349083] (w2) Conflict.level1
info     libsolv      next rules: 2908654 2908666
info     libsolv  Rule #6008481:
info     libsolv      !nextstrain-base-20230830T164409Z-hb0f4dca_0_locked [5] (w1) Install.level1
info     libsolv      metis-5.1.1-h59595ed_1 [349085] (w2) Install.level1
info     libsolv      next rules: 6008482 0
info     libsolv  Rule #6008406:
info     libsolv      !nextstrain-base-20230830T164409Z-hb0f4dca_0_locked [5] (w1) Install.level1
info     libsolv      suitesparse-5.10.1-h9e50725_1 [281208] (w2) Install.level1
info     libsolv      next rules: 6008407 0
info     libsolv  JOB Rule #6008666:
info     libsolv      nextstrain-base-20230830T164409Z-hb0f4dca_0_locked [5] (w1) Install.level1
info     libsolv      next rules: 0 0
info     libsolv  enabledisablelearntrules called
info     libsolv  resolving job rules
info     libsolv  resolving installed packages
info     libsolv  deciding unresolved rules
info     libsolv  installing recommended packages
info     libsolv  deciding orphaned packages
info     libsolv  solver statistics: 0 learned rules, 1 unsolvable, 0 minimization steps
info     libsolv  done solving.

info     libsolv  solver took 28 ms
info     libsolv  final solver statistics: 1 problems, 0 learned rules, 1 unsolvable
info     libsolv  solver_solve took 7192 ms
info     libmamba Problem count: 1
error    libmamba Could not solve for environment specs
    Encountered problems while solving:
      - package nextstrain-base-20230830T164409Z-hb0f4dca_0_locked requires suitesparse ==5.10.1 h9e50725_1, but none of the providers can be installed

    The environment can't be solved, aborting the operation

tsibley commented on June 6, 2024

Ok, my reading of the libsolv details in the previous comment, plus double-checking the two suitesparse 5.10.1 packages available on conda-forge, has me thinking that boa (used in conda-base builds) is producing a bad solve for the versions of suitesparse and metis. Micromamba seems correct here. (But then again, it also does the upgrade to the latest conda-base just fine?? I'm still confused by that.)

I upgraded Micromamba to 1.5.0 (latest version), and it still doesn't like the latest package, but at least it has a better error message:

error    libmamba Could not solve for environment specs
    The following package could not be installed
    └─ nextstrain-base ==20230830T164409Z hb0f4dca_0_locked is not installable because it requires
       ├─ metis ==5.1.1 h59595ed_1, which can be installed;
       └─ suitesparse ==5.10.1 h9e50725_1, which requires
          └─ metis >=5.1.0,<5.1.1.0a0 , which conflicts with any installable versions previously reported.

This matches my reading of libsolv above.

tsibley commented on June 6, 2024

Still very confused how "install old, update to latest" works and how other CI jobs installed the latest just fine (e.g.). This feels like something changing at a distance.

tsibley commented on June 6, 2024

I'd thought maybe Nextstrain CLI 7.2.0's relatively recent upgrade of Micromamba 1.0.0 → 1.1.0 might have been implicated, but 1.0.0 exhibits the same issues locally and besides, 7.2.0 was released 2 weeks ago, well before recent CI jobs like the one linked above passed.

joverlee521 commented on June 6, 2024

"This feels like something changing at a distance."

Looks like new builds of metis 5.1.0 and 5.1.1 were released a couple days ago, maybe some changes in dependencies there?

Edit: Oh wait, I see. It is using the latest metis build but still able to install suitesparse. Huh...

tsibley commented on June 6, 2024

I think I have it figured out. Don't think it's our fault. Let me confirm.

tsibley commented on June 6, 2024

That conda-forge repodata patches change was merged 30 Aug at about 10:19 US/Pacific. To take effect it then would have to be built, uploaded, and finally used by Anaconda during index update.

Our latest nextstrain-base version (20230830T164409Z) started building at 9:44 and finished by around 9:55, so it wouldn't have seen the new hotfix patch to the repodata. orz
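
The timing can be sanity-checked from the version string itself. This sketch assumes the version is the build's start time as a UTC timestamp (it lines up with the 9:44 mentioned above) and uses a fixed UTC-7 offset for US/Pacific daylight time rather than a time zone database. The merge time is the approximate 10:19 quoted above.

```python
from datetime import datetime, timedelta, timezone

# US/Pacific was on daylight time (UTC-7) on 30 August.
PDT = timezone(timedelta(hours=-7), "PDT")


def version_to_time(version: str) -> datetime:
    """Parse a version like 20230830T164409Z as a UTC timestamp."""
    return datetime.strptime(version, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


build_started = version_to_time("20230830T164409Z").astimezone(PDT)
patch_merged = datetime(2023, 8, 30, 10, 19, tzinfo=PDT)  # approximate, per the comment above

print(build_started.strftime("%H:%M %Z"))  # 09:44 PDT
print(build_started < patch_merged)        # True
```

The build began roughly 35 minutes before the patch was even merged, so the solve it locked in predates the hotfix.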

tsibley commented on June 6, 2024

Rebuild is looking promising already.

image

tsibley commented on June 6, 2024

Closing this as this reported issue is resolved. We'd maybe like to do more to prevent it from happening in the future, but I opened a conda-base issue for that: nextstrain/conda-base#41

victorlin commented on June 6, 2024

#177 (comment): we might still want to change nextstrain setup conda to use the same logic for figuring out the latest version as nextstrain update does rather than leaving it to Micromamba.

nextstrain/cli#318
