ag1000g-phase1-vgsc-report's People

Contributors

alimanfoo, cclarkson

ag1000g-phase1-vgsc-report's Issues

Methods

Text for methods section sufficient for partner review.

Quantify EHH differences between core haplotypes

Add numbers and possibly a supp fig to address the question of whether EHH is significantly different between core haplotypes.

E.g., compute integrated haplotype homozygosity (IHH) for each core haplotype, compute confidence intervals somehow, maybe plot with error bars.
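A minimal NumPy sketch of the IHH idea, assuming haplotypes are a `(n_variants, n_haplotypes)` 0/1 array already restricted to carriers of one core haplotype and ordered outwards from the core on one side (call it once per side and sum for a two-sided IHH). The function names, the 0.05 EHH cutoff and the interval method are all illustrative assumptions, not existing project code. For the interval it swaps in repeated subsampling *without* replacement rather than a classical bootstrap: resampling with replacement duplicates haplotypes, which creates artificially identical pairs and inflates EHH. Subsampling to a common size also lets core haplotypes with very different frequencies be compared on an equal footing.

```python
import numpy as np


def ehh_decay(h):
    """EHH at each variant, moving away from the core in one direction.

    h: (n_variants, n_haplotypes) array of 0/1 alleles, all haplotypes
    carrying the same core haplotype, rows ordered from nearest to the
    core outwards. Needs at least 2 haplotypes.
    """
    n_var, n_hap = h.shape
    n_pairs = n_hap * (n_hap - 1) / 2
    ehh = np.empty(n_var)
    for i in range(n_var):
        # Group haplotypes that are identical from the core out to variant i.
        _, counts = np.unique(h[: i + 1].T, axis=0, return_counts=True)
        # EHH = fraction of haplotype pairs that are still identical.
        ehh[i] = np.sum(counts * (counts - 1) / 2) / n_pairs
    return ehh


def ihh(ehh, dist, min_ehh=0.05):
    """Integrate EHH over distance from the core with the trapezoid rule.

    dist: distance of each variant from the core (same order as ehh).
    Integration starts at the core (distance 0, EHH = 1) and stops at the
    first variant where EHH falls below min_ehh (hypothetical default).
    """
    below = np.nonzero(ehh < min_ehh)[0]
    stop = below[0] + 1 if below.size else len(ehh)
    x = np.concatenate([[0.0], dist[:stop]])
    y = np.concatenate([[1.0], ehh[:stop]])
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))


def ihh_interval(h, dist, n_sub, n_rep=200, seed=0, q=(2.5, 97.5)):
    """Point IHH plus a percentile interval from repeated subsampling.

    Each replicate draws n_sub haplotypes without replacement
    (2 <= n_sub <= n_haplotypes), so every core haplotype can be
    compared at the same sample size.
    """
    rng = np.random.default_rng(seed)
    reps = []
    for _ in range(n_rep):
        idx = rng.choice(h.shape[1], size=n_sub, replace=False)
        reps.append(ihh(ehh_decay(h[:, idx]), dist))
    lo, hi = np.percentile(reps, q)
    return ihh(ehh_decay(h), dist), float(lo), float(hi)
```

Non-overlapping intervals between two core haplotypes would be a rough indication of a difference; distances could be physical (bp) or genetic, whichever the existing EHH plots use.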

Supplementary Tables: Data for assay design

Produce supplementary data tables as described in sub-section about assay design. Two tables envisaged: (1) table of SNPs with information to help primer design and initial SNP selection; (2) table with all haplotypes and alleles sufficient for building a classifier.

Network F5 missing R254K

I think a label is missing for a red edge into the big node. No urgency to fix for partner review but raising so we don't forget.

Further map work

Following on from work done in #94, there are still some unresolved points, including:

  • Improving aesthetics of F1 origin area and all arrows
  • Figure needs labels for Norris/Clarkson
  • Figure labels for the three Cameroon sites (?)
  • There are arrows on both ends of the spreading outbreaks other than F1. This is ambiguous: it could mean we infer movement in both directions, or that we aren't sure of the direction. Is there a better way to represent uncertainty?

Quantify cluster correspondence

Add number(s) quantifying the correspondence between the clusters obtained via the networks and the clusters previously obtained via hierarchical clustering.
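One standard single-number summary for this is the adjusted Rand index (ARI), which is 1 for identical partitions (regardless of cluster names) and about 0 for chance-level agreement. Choosing ARI is a suggestion, not something the issue specifies; `sklearn.metrics.adjusted_rand_score` computes the same quantity if scikit-learn is in the environment. The dependency-free sketch below assumes two label sequences over the same haplotypes in the same order (e.g. network cluster vs. hierarchical cluster per haplotype; the example labels are hypothetical).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items (n >= 2)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    # Pairs co-clustered in both partitions (from the contingency table).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs co-clustered in each partition separately (row / column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are all-one-cluster or all-singletons.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Reporting the cross-tabulation of the two labelings alongside the ARI would show *which* clusters disagree, which a single number cannot.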

Nuance "20 SNPs"

The assay design section ends by noting how useful it is that these haplotypes can be evaluated in the field using just 20 SNPs. Make clear that this is only possible because of the preceding genomic analyses.

Revise text and supp figs refining haplotype clusters (i.e., S4/S5, L1)

  • Cut all text and figs relating to haplotype age analyses.
  • Replace Dxy supp figs with haplotype homozygosity supp figs.
  • Rewrite text on further analyses of haplotype clustering to explain haplotype homozygosity analysis, what it shows re S4/S5.
  • Cut text on I1527T haplotypes from results section on genetic backgrounds.

Remove data/phase2_samples.meta.txt

The notebook Figure_1_RelativeAges.ipynb uses sample metadata from phase 2. If this is needed, load it from the external release directory rather than copying it into the data folder.

Figure(s): Assay design

Figure(s) showing results for assay design sub-section. Possibly (1) figure showing information gain per SNP over the region, and/or (2) figure showing cross-validation results for decision trees with increasing numbers of SNPs.
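For option (1), one way to define "information gain per SNP" is the reduction in entropy of the haplotype cluster labels after splitting haplotypes on that SNP's allele, the same criterion an entropy-based decision tree uses to pick splits. The sketch assumes a `(n_variants, n_haplotypes)` allele array and one cluster label per haplotype; the function names are hypothetical. For option (2), scikit-learn's `DecisionTreeClassifier(criterion='entropy')` with `cross_val_score` over growing SNP subsets would be a natural fit, but that is not shown here.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(alleles, labels):
    """Reduction in label entropy after splitting haplotypes on one SNP."""
    alleles = np.asarray(alleles)
    labels = np.asarray(labels)
    conditional = 0.0
    for a in np.unique(alleles):
        mask = alleles == a
        # Weight each allele group's entropy by its share of haplotypes.
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional


def information_gain_per_snp(h, labels):
    """h: (n_variants, n_haplotypes) alleles; labels: cluster per haplotype."""
    return np.array([information_gain(row, labels) for row in h])
```

Plotting `information_gain_per_snp(h, labels)` against variant position would give the per-SNP profile over the region.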

Conda environment upgrades

There are new releases of graphviz Python wrapper (0.6.0) and SciPy (0.19.0). Worth updating at some point.

Methods

Initial pass on methods, break down into further issues as required.

Workaround haplotypes bug in phase 1 AR3.1

Nick has found a bug in the AR3.1 release such that the Zarr format files for the haplotypes are not correct and are not the same as the HDF5 format files for the haplotypes.

Currently our analyses use the Zarr format files. We can work around this by using the HDF5 format files instead everywhere. This will require modifying the setup modules to use the HDF5 files, then rerunning all notebooks that depend on the haplotypes (table 1, figure 1, ... everything basically).

Final edits to abstract and intro

Make any edits needed once the results and discussion have been revised, e.g., soften language about predicting phenotype (say "discuss putative phenotype" instead).

Network Multi-allelics and cluster membership

Currently the network code breaks if haplotypes containing multiallelic sites are used with network_method='mjn'.

Being able to discern network cluster/node membership would be useful to check concordance with other methods.

References

Check through text, any missing references needed before partner review?

Figure: Map

Revisit the map, make some tweaks to convey some additional uncertainties.

hapclust.locate_recombinants

Hi @alimanfoo

Here's the error I got when I tried to use locate_recombinants with the largest haplogroup. All other haplogroups worked fine.

    IndexError                                Traceback (most recent call last)
    in ()
          1 #remove recombinants - F3 won't build if max-dist is >1 - maybe too complex
    ----> 2 idx_rec = hapclust.locate_recombinants(cluster_haps, debug=False)
          3 #idx_rec

    ~/Git/agam-vgsc-report/agam-report-base/src/python/hapclust.py in locate_recombinants(h, debug)
        816 def locate_recombinants(h, debug=False):
        817     """Locate recombinant haplotypes via the four gamete test."""
    --> 818     count = count_gametes(np.asarray(h, dtype='i1'))
        819     d = np.all(count > 0, axis=(2, 3))
        820     # indices of recombinant haplotypes - N.B., keep track of different possible solutions

    ~/Git/agam-vgsc-report/agam-report-base/src/python/hapclust_opt.pyx in hapclust_opt.count_gametes (/home/chris/.pyxbld/temp.linux-x86_64-3.5/pyrex/hapclust_opt.c:2346)()

    IndexError: Out of bounds on buffer access (axis 3)
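In the traceback above, `locate_recombinants` reduces `count` over `axis=(2, 3)`, which suggests `count_gametes` returns a `(n_variants, n_variants, 2, 2)` array of gamete counts. One plausible cause of the axis-3 overflow (an unconfirmed guess) is that the largest haplogroup contains allele values other than 0 and 1, e.g. a second alternate allele at a multiallelic site or `-1` for missing data, which would index past the 2×2 gamete slots; the cast to `'i1'` does not check for this. The pure-NumPy sketch below is not the `hapclust_opt` Cython code; it reimplements the four-gamete count with an explicit input check so any offending variants are reported by index.

```python
import numpy as np


def count_gametes(h):
    """Count gametes 00, 01, 10, 11 for every pair of variants.

    h: (n_variants, n_haplotypes) array of 0/1 alleles.
    Returns an int array of shape (n_variants, n_variants, 2, 2).
    """
    h = np.asarray(h)
    bad = (h < 0) | (h > 1)
    if bad.any():
        rows = np.unique(np.nonzero(bad)[0])
        raise ValueError(
            f"non-biallelic or missing alleles at variant indices {rows.tolist()}"
        )
    n_var = h.shape[0]
    count = np.zeros((n_var, n_var, 2, 2), dtype=int)
    for a in (0, 1):
        for b in (0, 1):
            # Haplotypes carrying allele a at variant i and allele b at variant j.
            count[:, :, a, b] = (h == a).astype(int) @ (h == b).astype(int).T
    return count


def incompatible_pairs(h):
    """True where all four gametes occur (recombination or recurrent mutation)."""
    return np.all(count_gametes(h) > 0, axis=(2, 3))
```

If the check fires on the largest haplogroup, dropping or recoding the listed variants before calling `locate_recombinants` would be one workaround, which ties in with the multiallelic problem noted for the network code.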

Soften language on phenotype prediction throughout

E.g., DK flagged the sentence "large nodes consistent with positive selection, suggestive of a functional role in pyrethroid resistance...". Our work raises questions about functionality that will need a concerted experimental effort to resolve.

Discussion: DDT

Discussion paragraph on DDT. If prior selection for DDT resistance, how might this complicate the picture? Do we see any evidence for multiple phases of selection?

Results: Resistance outbreaks

Review text of the "Insecticide resistance outbreaks" sub-section, any changes to suggest prior to partner review?