nextstrain / flu_frequencies

Flu clade and mutation frequencies

Home Page: https://flu-frequencies.vercel.app

Python 21.66% TeX 1.55% Jupyter Notebook 1.48% Shell 0.50% Dockerfile 1.11% JavaScript 7.99% TypeScript 62.26% SCSS 3.44%
clades flu influenza mutations

flu_frequencies's Introduction

This repository is archived and contains the content used to build the documentation and splash page found in nextstrain.org. This content can now be found here.

License and copyright

Copyright 2014-2018 Trevor Bedford and Richard Neher.

Source code to Nextstrain is made available under the terms of the GNU Affero General Public License (AGPL). Nextstrain is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

flu_frequencies's People

Contributors

artpoon, corneliusroemer, huddlej, ivan-aksamentov, rneher

flu_frequencies's Issues

Unexpected polars panic exception when creating day count column in `fit_single_frequencies.py`

Current Behavior

When the flu frequencies workflow gets to the fit_single_frequencies.py step, recent versions of polars throw a panic exception with the following error message:

$ python scripts/fit_single_frequencies.py --metadata data/vic/combined_na.tsv --geo-categories region --frequency-category clade --min-date 2021-01-01 --days 14 --inclusive-clades flu --output-csv results/vic_na/region-frequencies.csv
thread '<unnamed>' panicked at crates/polars-core/src/series/iterator.rs:74:9:
assertion `left == right` failed: impl error
  left: 4
 right: 1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
Traceback (most recent call last):
  File "/Users/jlhudd/miniconda3/envs/flu_frequencies/lib/python3.10/site-packages/polars/expr/expr.py", line 3976, in __call__
    result = self.function(*args, **kwargs)
  File "/Users/jlhudd/miniconda3/envs/flu_frequencies/lib/python3.10/site-packages/polars/expr/expr.py", line 4299, in wrap_f
    return x.map_elements(
  File "/Users/jlhudd/miniconda3/envs/flu_frequencies/lib/python3.10/site-packages/polars/series/series.py", line 5270, in map_elements
    self._s.apply_lambda(function, pl_return_dtype, skip_nulls)
pyo3_runtime.PanicException: assertion `left == right` failed: impl error
  left: 4
 right: 1
Traceback (most recent call last):
  File "/Users/jlhudd/projects/nextflu-reports/who-2024-02/flu_frequencies/scripts/fit_single_frequencies.py", line 163, in <module>
    data, totals, counts, time_bins = load_and_aggregate(d, args.geo_categories, freq_cat,
  File "/Users/jlhudd/projects/nextflu-reports/who-2024-02/flu_frequencies/scripts/fit_single_frequencies.py", line 44, in load_and_aggregate
    d = d.with_columns([pl.col('date').map_elements(lambda x: to_day_count(x, start_date)).alias("day_count")])
  File "/Users/jlhudd/miniconda3/envs/flu_frequencies/lib/python3.10/site-packages/polars/dataframe/frame.py", line 8270, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
  File "/Users/jlhudd/miniconda3/envs/flu_frequencies/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1730, in collect
    return wrap_df(ldf.collect())
pyo3_runtime.PanicException: assertion `left == right` failed: impl error
  left: 4
 right: 1

I don't see any obvious changes to our input data between when this used to work and now. Downgrading polars to 0.20.3 allows the frequencies script to run without an error, suggesting that the issue first appeared in polars 0.20.4 (released Jan 12, 2024). This is all with Python 3.10.13 on an Intel Mac (macOS 12.6).

I confirmed that the error only occurs when calling the map_elements section of the failing expression above.

Possible solutions

As a band-aid, we could pin polars to 0.20.3 in the Conda environment.

As a longer-term solution, we might try to replace the officially discouraged map_elements call with a different approach.

Or we could switch to pandas.

All variant frequencies for South Asia region are 1

This seems to be due to a lack of data in this region. One way around this would be to add a level in the region-country hierarchy, so that this region would inherit global frequency estimates. Alternatively, we could allow frequency estimates for proximal countries to affect estimates for countries with completely missing data.
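
One way to picture the "inherit from the parent level" idea is a pseudo-count estimate that shrinks a region's observed frequency toward its parent's frequency. This is only an illustrative sketch and not the model used by the workflow; the function name pooled_frequency and the strength parameter are assumptions introduced here.

```python
def pooled_frequency(count: int, total: int, parent_freq: float,
                     strength: float = 10.0) -> float:
    """Shrink an observed frequency toward the parent-level estimate.

    Hypothetical illustration: `strength` acts like that many pseudo-samples
    drawn at the parent frequency. With no local data the parent estimate is
    returned unchanged; with lots of local data the local estimate dominates.
    """
    if total < 0 or not 0 <= count <= total:
        raise ValueError("require 0 <= count <= total")
    if strength <= 0:
        raise ValueError("strength must be positive")
    return (count + strength * parent_freq) / (total + strength)


# A region with 5 of 5 sequences in one clade no longer reports exactly 1.
sparse_region = pooled_frequency(count=5, total=5, parent_freq=0.25, strength=5.0)
# A region with no sequences falls back to the parent frequency.
empty_region = pooled_frequency(count=0, total=0, parent_freq=0.25)
```

The same form could in principle take the parent estimate from proximal countries rather than from a higher level of the hierarchy, which is the alternative mentioned above.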

Suggested modifications to web interface

Some possible enhancements to interface:

  • toggle between by-country and by-clade views - I think the user currently has to go up one level to do this; there is no one-to-one map between views (region versus variant)
  • change how clades are ordered - this can follow hierarchical nomenclature or decreasing order of frequency?
  • display clade frequencies as streamgraph?
  • collapse unselected clades into an "other" category?
  • clade frequency colour should map to location (region/country) checkbox interface
  • resize points by sample size?
  • it might be nice to have some deterministic system for assigning colours to clades and countries, such that related clades and adjacent countries have similar but distinguishable colours
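
As a rough sketch of the last suggestion, colours could be derived from a hash of the clade name so that the same clade always gets the same colour, with sibling clades sharing a base hue. The sketch is in Python for brevity, although the web app itself is mostly TypeScript. It assumes dot-separated hierarchical clade names such as V1A.3a.2; the function names and the spread parameter are hypothetical.

```python
import colorsys
import hashlib


def _unit_hash(text: str) -> float:
    """Map a string to a reproducible number in [0, 1)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def clade_hue(clade: str, spread: float = 0.08) -> float:
    """Hue in [0, 1): the parent clade sets the base, the full name nudges it."""
    parent, _, _ = clade.rpartition(".")
    base_hue = _unit_hash(parent or clade)
    offset = (_unit_hash(clade) - 0.5) * spread
    return (base_hue + offset) % 1.0


def clade_colour(clade: str) -> str:
    """Return a hex colour string for a clade name."""
    r, g, b = colorsys.hls_to_rgb(clade_hue(clade), 0.5, 0.65)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


# Siblings under the same parent get nearby hues.
sibling_colours = [clade_colour("V1A.3a.1"), clade_colour("V1A.3a.2")]
```

A hash alone does not guarantee that siblings stay distinguishable, since two names can land on nearly the same offset; spacing siblings evenly within the parent's hue band would address that at the cost of needing the full clade list up front. Ordering countries by adjacency would need a similar explicit ordering.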

Build local instance

  • Fairly naive build environment running Ubuntu 20.04.6, i.e., no npm or snakemake
  • cloned repo with git clone https://github.com/neherlab/flu_frequencies.git
  • checked out web branch (wanting to focus on contributing to front-end development)
  • created virtual environment for Python with python3 -m venv venv; source venv/bin/activate
  • used pip to install Python dependencies (pandas, matplotlib, polars)
  • unable to run frequencies.py, expecting different data files - seems that web branch is behind master
  • sudo apt install snakemake
  • installed Nextclade by following instructions at docs/dev/developer-guide.md - ran into problems with missing data folder, see nextstrain/nextclade#1140
  • sudo apt install npm (installed npm 6.14.4)
  • sudo npm install --global yarn (installed yarn v1.22.19)
  • running yarn attempted to install package dependencies, but this threw a large number of warnings for version discordance in dependencies, i.e., X has unmet peer dependency Y
  • I could run yarn add postcss etc. to manually add these dependencies to the project, but this modifies tracked files, i.e., yarn.lock and package.json, so it does not seem to be the right way to go about it
  • running npm run test also threw an error:
(venv) art@Kestrel:~/git/flu_frequencies/web$ npm run test

> [email protected] test /home/art/git/flu_frequencies/web
> yarn test:nowatch --watch --verbose 

yarn run v1.22.19
$ jest --config=config/jest/jest.config.js --passWithNoTests --watch --verbose
/home/art/git/flu_frequencies/web/node_modules/jest-cli/build/run.js:129
    if (error?.stack) {
              ^

SyntaxError: Unexpected token .
    at Module._compile (internal/modules/cjs/loader.js:723:23)
  • at this point I gave up with manual installation and attempted to build package from Dockerfile
  • installed docker with sudo apt install docker.io
  • sudo docker run hello-world runs OK
  • sudo docker build -f docker/docker-dev.dockerfile . fails at step 23 of 24:
The command 'bash -euxo pipefail -c set -euxo pipefail >/dev/null &&   if [ -z "$(getent group ${GID})" ]; then  
...
returned a non-zero code: 2

China region vs China country

In the web app, currently there are 2 "China"s in the list of locations. One is region ("China") and another is country ("CHN"). We need to disambiguate these.

Are these still different?

Options:

  • remove region
  • remove country
  • rename region to "China region" or similar
