sgravel / tracts

A set of tools for modelling ancestry patterns along the genome.

License: GNU General Public License v2.0

Python 64.62% Shell 0.07% Mathematica 35.32%

tracts's Introduction

Tracts

Tracts is a set of classes and definitions used to model migration histories based on ancestry tracts in admixed individuals. Time-dependent gene flow from multiple populations can be modeled.

Changes in Tracts 2

Tracts is now a python package rather than a single .py file. Follow the installation instructions below.
Tracts now uses the matrix exponential form of the phase-type distribution to calculate the tract-length distribution. This should not have resulted in changes to the interface; if it has, please report it in the issues.
Tracts no longer requires writing your own driver script. Instead, details about the simulation are read from a YAML file (examples below).
Demographic models also no longer have to be hand-coded. They are now specified in a Demes-like YAML file (examples below).
Minor Patches: Fixed an issue with fixing multiple parameters from ancestry.

Examples

The examples directory contains sample HapMap data and scripts to analyze them, including two different gene flow models. It also contains a 3-population model for 1000 Genomes Puerto Rican data.

Installation

To install:

Clone this repository
In your local copy, open a terminal.
Run pip install .

You can now import tracts as a Python package.

Tracts is currently not distributed on PyPI or Conda.

Setting up a demographic model

Tracts attempts to infer a population's migration history from the distribution of ancestry tracts. The space of all migration matrices is very large: with p migrant populations over g generations, there can be p*g different migration rates. In tracts, demographic models describe the migration matrix as a small number of migration events with flexible parameters. For example, a model can contain only a founding pulse of migration from two ancestral populations. In that case, the only parameters are the time of the pulse and the migration rate from each population. The YAML file for such a model would look like:

demes:
  - name: EUR
  - name: AFR
  - name: X
    ancestors: [EUR, AFR]
    proportions: [R, 1-R]
    start_time: tx

Here, tracts deduces that the sample population is X. The parameter for the time of founding is named tx. Since the founding proportions must sum to 1, there is only one other parameter in the model: R. The founding proportion from EUR is R, and the founding proportion from AFR is 1-R.
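To make the parameterization concrete, here is a minimal sketch of the migration matrix this founding-pulse model implies, using the layout described for the _mig output (most recent generation in row 0, one column per source population). The helper founding_pulse_matrix is hypothetical, not part of tracts, and it assumes tx is a whole number of generations.

```python
def founding_pulse_matrix(R, tx):
    """Hypothetical helper (not part of the tracts API).

    Builds the migration matrix implied by a single founding pulse.
    Row g holds migration g generations before present (row 0 is the most
    recent generation); columns are the sources [EUR, AFR].
    Assumes tx is a whole number of generations.
    """
    tx = int(tx)
    mig = [[0.0, 0.0] for _ in range(tx + 1)]
    # At the founding generation the whole population is made of migrants,
    # so the row sums to 1: R from EUR and 1 - R from AFR.
    mig[tx] = [R, 1.0 - R]
    return mig


mig = founding_pulse_matrix(R=0.2, tx=10)
for generation, row in enumerate(mig):
    print(generation, row)
```

Only the founding row is non-zero; every other generation receives no migrants in this model.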

To add more migration pulses to the model, add a pulses field to the YAML file:

pulses:
  - sources: [EUR]
    dest: X
    proportions: [P]
    time: t2

This represents a single pulse of migration from EUR to X. It occurs at time t2 with proportion P.

The full model would then look like:

demes:
  - name: EUR
  - name: AFR
  - name: X
    ancestors: [EUR, AFR]
    proportions: [R, 1-R]
    start_time: tx
pulses:
  - sources: [EUR]
    dest: X
    proportions: [P]
    time: t2
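Adding the pulse changes the expected ancestry proportions. The sketch below, a hypothetical helper rather than tracts code, walks a migration matrix forward in time, assuming each row gives the fraction of the admixed population replaced by migrants from each source in that generation. For the model above it recovers EUR = R(1-P) + P.

```python
def ancestry_proportions(mig):
    """Hypothetical helper: expected present-day ancestry proportions.

    mig[g][j] is the fraction of the admixed population replaced by migrants
    from source j, g generations before present (row 0 = most recent).
    The oldest non-zero row is the founding pulse, whose entries sum to 1.
    """
    props = [0.0] * len(mig[0])
    # Walk forward in time, from the oldest generation to the most recent.
    for row in reversed(mig):
        incoming = sum(row)
        # Residents keep their ancestry; migrants contribute their source's.
        props = [(1.0 - incoming) * p + m for p, m in zip(props, row)]
    return props


# Founding pulse (R, 1 - R) at tx, then a EUR-only pulse of size P at t2.
R, P, tx, t2 = 0.2, 0.1, 10, 5
mig = [[0.0, 0.0] for _ in range(tx + 1)]
mig[tx] = [R, 1.0 - R]
mig[t2] = [P, 0.0]

eur, afr = ancestry_proportions(mig)
print(eur, afr)  # EUR = R * (1 - P) + P, AFR = (1 - R) * (1 - P)
```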

The pulses field can also contain more than one pulse:

pulses:
  - sources: [EUR]
    dest: X
    proportions: [P]
    time: t2
  - sources: [EUR]
    dest: X
    proportions: [P]
    time: t3

Here, the proportion of both pulse migrations is the same, but they occur at different times. Tracts allows for the linking of parameters in this way. This model would have 5 parameters: R, tx, P, t2, t3. If the pulses had different rates, the model would have 6 parameters instead.
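Because parameters are identified by name, a name reused across events is a single parameter. Below is a minimal sketch of that counting rule, assuming the model has already been loaded from YAML into plain Python dicts and lists. model_parameters is a hypothetical helper, not the tracts API, and it assumes numeric constants are written as plain decimals.

```python
import re

# Fields whose values may be parameter names or expressions such as "1-R".
PARAM_FIELDS = ("proportions", "start_time", "end_time", "time", "rate")


def model_parameters(model):
    """Hypothetical helper: collect free-parameter names from a parsed model.

    Any identifier found in a parameter field counts as a parameter, so a
    name used twice (e.g. P in two pulses) is counted once. Plain decimal
    numbers contain no identifier and are treated as constants.
    """
    names = set()
    entries = model.get("demes", []) + model.get("pulses", []) + model.get("migrations", [])
    for entry in entries:
        for field in PARAM_FIELDS:
            value = entry.get(field)
            if value is None:
                continue
            for item in value if isinstance(value, list) else [value]:
                names.update(re.findall(r"[A-Za-z_]\w*", str(item)))
    return names


# The two-pulse model above, as a YAML loader would return it.
model = {
    "demes": [
        {"name": "EUR"},
        {"name": "AFR"},
        {"name": "X", "ancestors": ["EUR", "AFR"],
         "proportions": ["R", "1-R"], "start_time": "tx"},
    ],
    "pulses": [
        {"sources": ["EUR"], "dest": "X", "proportions": ["P"], "time": "t2"},
        {"sources": ["EUR"], "dest": "X", "proportions": ["P"], "time": "t3"},
    ],
}
print(sorted(model_parameters(model)))  # ['P', 'R', 't2', 't3', 'tx']
```

Renaming the second pulse's proportion (say to Q) would make it an independent parameter and raise the count to 6.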

Similar to pulses, continuous migrations can be specified in the migrations field:

migrations:
  - source: EUR
    dest: X
    rate: K
    start_time: t1
    end_time: t2
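A hypothetical sketch of how such a continuous migration could fill a migration matrix. It assumes whole-generation times counted backwards from the present, start_time further in the past than end_time (as in demes), and inclusive bounds; how tracts itself discretizes continuous migration is not described here.

```python
def add_continuous_migration(mig, source, rate, start_time, end_time):
    """Hypothetical helper: add a constant-rate migration to a matrix.

    mig[g][j] is migration from source column j, g generations ago.
    Every generation from end_time to start_time (inclusive, an assumption
    of this sketch) receives `rate` from column `source`.
    """
    if start_time < end_time:
        raise ValueError("start_time must be further in the past than end_time")
    for generation in range(end_time, start_time + 1):
        mig[generation][source] += rate
    return mig


# Continuous EUR (column 0) migration at rate K = 0.01 from t1 = 8 to t2 = 3.
mig = [[0.0, 0.0] for _ in range(11)]
add_continuous_migration(mig, source=0, rate=0.01, start_time=8, end_time=3)
```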

Driver File

Tracts is run by passing a driver YAML file to the function tracts.run_tracts(). The first part of the driver file tells tracts how to load the sample data:

samples:
  directory: .\G10\
  filename_format: "{name}_{label}.bed"
  individual_names: [
    "NA19700", "NA19701", "NA19704", "NA19703", "NA19819", "NA19818",
    "NA19835", "NA19834", "NA19901", "NA19900", "NA19909", "NA19908",
    "NA19917", "NA19916", "NA19713", "NA19982", "NA20127", "NA20126",
    "NA20357", "NA20356"
  ]
  labels: [A, B]
  chromosomes: 1-22

In this example, the samples are located in the 'G10' directory. The individual 'NA19700' has sample data in the files 'NA19700_A.bed' and 'NA19700_B.bed'.
The 'chromosomes' field tells tracts to use data from chromosomes 1 to 22. You can also specify a single chromosome or a list of chromosomes.
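The way filename_format, individual_names, labels and chromosomes combine can be sketched as follows. Both helpers are hypothetical illustrations of the naming rule described above, not functions exported by tracts.

```python
import os


def parse_chromosomes(spec):
    """Hypothetical helper: expand '1-22', a single chromosome, or a list."""
    if isinstance(spec, list):
        return [int(c) for c in spec]
    text = str(spec)
    if "-" in text:
        first, last = text.split("-")
        return list(range(int(first), int(last) + 1))
    return [int(text)]


def sample_files(directory, filename_format, individual_names, labels):
    """Hypothetical helper: map each individual to one file per label."""
    return {
        name: [os.path.join(directory, filename_format.format(name=name, label=label))
               for label in labels]
        for name in individual_names
    }


files = sample_files("G10", "{name}_{label}.bed", ["NA19700", "NA19701"], ["A", "B"])
print(files["NA19700"])  # the two haploid files for NA19700
print(parse_chromosomes("1-22")[:3], parse_chromosomes(13), parse_chromosomes([1, 2]))
```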

The details of the model are specified in a separate YAML file. The model_filename field tells tracts where to find this model YAML.

model_filename: pp.yaml

Tracts optimizes the parameters of the model to best match the distribution of ancestry tracts. Starting values for the parameters can be specified as numbers or ranges. Multiple repetitions can be run on the same data, and a seed can be used for repeatability.

start_params:
  R: 0.1-0.2
  tx: 10-11
  P:  0.03-0.05
  t2: 5.5
repetitions: 2
seed: 100
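A sketch of how ranged starting values, repetitions and a seed can fit together, assuming each repetition draws its own starting point uniformly from each range. draw_start_params is hypothetical; tracts may sample starting values differently.

```python
import random


def draw_start_params(spec, rng):
    """Hypothetical helper: one set of starting values for the optimizer.

    Strings of the form 'low-high' are sampled uniformly; plain numbers are
    used as-is. Assumes range bounds are non-negative.
    """
    params = {}
    for name, value in spec.items():
        if isinstance(value, str) and "-" in value:
            low, high = (float(x) for x in value.split("-"))
            params[name] = rng.uniform(low, high)
        else:
            params[name] = float(value)
    return params


# start_params as a YAML loader returns them: ranges become strings.
start_params = {"R": "0.1-0.2", "tx": "10-11", "P": "0.03-0.05", "t2": 5.5}
rng = random.Random(100)                                           # seed: 100
starts = [draw_start_params(start_params, rng) for _ in range(2)]  # repetitions: 2
```

Seeding a single random.Random instance makes the whole sequence of starting points reproducible across runs.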

Tracts also allows time parameters to be scaled, since some optimizers perform better when all parameters are on the same scale:

time_scaling_factor: 100
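A sketch of the idea, assuming the optimizer sees time parameters divided by time_scaling_factor and that the set of time parameters is known in advance. Both assumptions, and the helper names, are illustrative rather than taken from tracts.

```python
TIME_SCALING_FACTOR = 100
TIME_PARAMS = {"tx", "t2"}  # hypothetical: which parameters are times


def to_optimizer_space(params, factor=TIME_SCALING_FACTOR):
    """Divide time parameters by the factor so they sit near the proportions."""
    return {k: v / factor if k in TIME_PARAMS else v for k, v in params.items()}


def from_optimizer_space(params, factor=TIME_SCALING_FACTOR):
    """Undo the scaling before building the migration matrix."""
    return {k: v * factor if k in TIME_PARAMS else v for k, v in params.items()}


params = {"R": 0.15, "tx": 10.5, "P": 0.04, "t2": 5.5}
scaled = to_optimizer_space(params)  # tx -> 0.105, t2 -> 0.055
```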

Likewise, tracts below a certain length (in centimorgans) can be excluded from the analysis.

exclude_tracts_below_cM: 10

Ancestry Fixing

Input

Tracts input is a set of bed-style files describing the local ancestry of segments along the genome. Each file has 2 extra columns for the cM positions of the segments. There are two input files per individual (one for each haploid genome copy).

chrom   begin     end       assignment  cmBegin  cmEnd
chr13   0         18110261  UNKNOWN     0.0      0.19
chr13   18110261  28539742  YRI         0.19     22.193
chr13   28539742  28540421  UNKNOWN     22.193   22.193
chr13   28540421  91255067  CEU         22.193   84.7013
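A minimal stdlib reader for one such file, computing each tract's length in cM from cmBegin and cmEnd and applying a minimum length in the spirit of the exclude_tracts_below_cM option. read_tracts is hypothetical; it simply drops UNKNOWN segments, which may differ from how tracts handles them, and it expects a single tab between columns.

```python
import csv
import io


def read_tracts(handle, min_cM=0.0, skip=("UNKNOWN",)):
    """Hypothetical reader for one haploid bed-style file.

    Returns (chrom, assignment, length_cM) for each tract, dropping
    unassigned segments and tracts shorter than min_cM.
    """
    result = []
    for row in csv.DictReader(handle, delimiter="\t"):
        length = float(row["cmEnd"]) - float(row["cmBegin"])
        if row["assignment"] in skip or length < min_cM:
            continue
        result.append((row["chrom"], row["assignment"], length))
    return result


# The example rows above, written with single tabs between columns.
rows = [
    ["chrom", "begin", "end", "assignment", "cmBegin", "cmEnd"],
    ["chr13", "0", "18110261", "UNKNOWN", "0.0", "0.19"],
    ["chr13", "18110261", "28539742", "YRI", "0.19", "22.193"],
    ["chr13", "28539742", "28540421", "UNKNOWN", "22.193", "22.193"],
    ["chr13", "28540421", "91255067", "CEU", "22.193", "84.7013"],
]
text = "\n".join("\t".join(r) for r in rows) + "\n"
for tract in read_tracts(io.StringIO(text), min_cM=10):
    print(tract)
```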

Output

The 3-population example files produce output files such as:

 boot0_-252.11_bins	boot0_-252.11_liks	boot0_-252.11_ord	boot0_-252.11_pred
 boot0_-252.11_dat	boot0_-252.11_mig	boot0_-252.11_pars

boot0 means that this is bootstrap iteration 0, which by the convention used here is the fit to the real data (the two-population example has no bootstrap, so its outputs are named "out" and "out2" instead). -252.11 is the log-likelihood of the best-fit model.

_bins: the bins used in the discretization
_dat: the observed counts in each bin
_pred: the predicted counts in each bin, according to the model
_mig: the inferred migration matrix, with the most recent generation at the top, and one column per migrant population. Entry i,j in the matrix represents the proportion of individuals in the admixed population who originate from the source population j at generation i in the past.
_pars: the optimal parameters, i.e., if these parameters are passed to the admixture model, it returns the inferred migration matrix.
_liks: the likelihoods over the model parameter space, in the output format of scipy.optimize's "brute" function: the first number is the best likelihood, the top matrices define the grid of parameters used in the search, and the last matrix gives the likelihood at all grid points. See http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brute.html
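Since the run, log-likelihood and file type are encoded in the file names, they can be recovered with a few lines of Python. parse_output_name is a hypothetical helper that assumes the boot<i>_<loglik>_<kind> pattern shown above.

```python
import os


def parse_output_name(path):
    """Split a name like 'boot0_-252.11_mig' into (run, log-likelihood, kind).

    Hypothetical helper based on the naming scheme described above; it does
    not handle the 'out'/'out2' names used by the two-population example.
    """
    run, loglik, kind = os.path.basename(path).split("_", 2)
    return run, float(loglik), kind


names = ["boot0_-252.11_mig", "boot0_-252.11_pars", "boot0_-252.11_liks"]
parsed = [parse_output_name(n) for n in names]
print(parsed[0])  # ('boot0', -252.11, 'mig')
```

Splitting at most twice on "_" keeps the negative sign with the log-likelihood.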

Contact

See the example files for example usage. If something isn't clear, please let me know by filing a new issue or by emailing me.

FAQ

The distribution of tract lengths decreases as a function of tract length, but increases at the very last bin. This was not seen in the original paper. What is going on?

In tracts, the last bin represents the number of chromosomes with no ancestry switches. It does not correspond to a specific length value, and for this reason was not plotted in the tracts paper.
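A hypothetical illustration of this convention: ordinary tracts are counted in length bins, and chromosomes without any switch are appended as one extra count with no length attached. The function and its arguments are assumptions for illustration, not tracts' binning code.

```python
import bisect


def tract_length_histogram(lengths, bin_edges, n_unswitched):
    """Hypothetical illustration of the binning convention.

    lengths: tract lengths (cM) from chromosomes with at least one switch.
    bin_edges: increasing bin boundaries in cM; lengths outside are ignored.
    n_unswitched: chromosomes with no ancestry switch, reported as an extra
    final bin that has no length value attached.
    """
    counts = [0] * (len(bin_edges) - 1)
    for length in lengths:
        i = bisect.bisect_right(bin_edges, length) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts + [n_unswitched]


print(tract_length_histogram([5.0, 15.0, 25.0, 12.0], [0, 10, 20, 30], n_unswitched=3))  # [1, 2, 1, 3]
```

Plotting the final entry alongside the length bins would produce the apparent uptick at the end of the distribution.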

When I have a single pulse of admixture, I would expect an exponential distribution of tract lengths, but the expected distribution shows steps. Why is that?

"Tracts" takes into account the finite length of chromosomes. Since ancestry tracts cannot extend beyond chromosome ends, we expect this departure from an exponential distribution.

I have migrants from the last generation. "tracts" tells me that migrants in the last two generations are not allowed. Why is that?

Haploid genomes from the last two generations have no ancestry switches and should be easy to identify in well-phased data; they should be removed from the sample before running tracts. If this is impossible (e.g., because of inaccurate phasing across chromosomes), tracts will likely attempt to assign last-generation migrants to two generations ago. This should be observable as an excess of very long tracts in the data compared to the model.

Individuals in my population vary considerably in their ancestry proportion. Is that a problem?

It is not a problem as long as the population was close to random mating. If admixture is recent, random mating is not inconsistent with ancestry variance. If admixture is ancient, however, variation in ancestry proportion may indicate population structure, and the random mating assumption may fail.

I ran the optimization steps many times, and found different optimal likelihoods. Why is that?

Optimizing functions in many dimensions is hard, and optimizers sometimes get stuck in local maxima. If you haven't tried already, you can fix the ancestry proportions a priori (see the _fix examples in the documentation). In most cases, the optimization converges to the global maximum a substantial proportion of the time: running the optimization a few times from random starting positions and comparing the best values can help control for this.
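Comparing the best values can be as simple as counting how many runs end within a small tolerance of the best log-likelihood. The helper, the tolerance and the run values below are hypothetical.

```python
def convergence_summary(log_likelihoods, tol=0.01):
    """Best log-likelihood over repeated runs, and how many runs reached it.

    Runs within `tol` log-likelihood units of the best are counted as having
    found the same optimum; the tolerance is an arbitrary choice.
    """
    best = max(log_likelihoods)
    hits = sum(1 for ll in log_likelihoods if best - ll <= tol)
    return best, hits


# Hypothetical final log-likelihoods from five optimizations with random starts.
runs = [-252.147, -252.150, -260.462, -252.147, -255.175]
best, hits = convergence_summary(runs)
print(f"best {best}, reached by {hits} of {len(runs)} runs")
```

If only one run out of many reaches the best value, that is a hint to run more starts or simplify the model.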

If you fail to revisit the same optimum after running, say, 10 optimizations, then something else might be going on. If the model is not continuous as a function of a parameter, the optimization can be much harder. Defining a continuous model would help, or you could try the brute-force optimization method if the number of parameters is small.

tracts's People

Contributors

Stargazers

Watchers

Forkers

domnelson xtmgah riddhishb falcaraz oasisye guzhongru chenyangsu ivan-krukov apragsdale mauerjh general-solution gonzalez-delgado santiago1234

tracts's Issues

Problems of installing tracts and running example files

Dear Prof. Gravel,
I am a researcher studying the genetic ancestry of the indigenous Siraya people in Taiwan. I have greatly benefited from your papers working on inferring admixture history based on the information of local genetic ancestry. I want to thank you for your brilliant and invaluable work!
We recently attempted to download and install the tracts program but encountered several error messages while running the example files. We are very eager to use tracts to understand the admixture history of the Siraya people (Austronesian-speaking) with the Taiwanese Sino-Tibetan populations.
Could you kindly assist us with these issues? Thank you very much for your time and assistance.
I really appreciate it.

Best regards,
Wen-Ya

As I am not proficient in Python, I will provide a detailed description of my actions. Please forgive the lengthiness of this message.

Running environment:
MacBook Pro (Chip:Apple M3 Pro) ; macOS Sonoma Version 14.3.1
Shell:zsh
Python: Virtual environment (Python 3.12.4)

Issue description:

The installation succeeded with "pip install .". However, error messages appeared when I typed "import tracts" in the Python interactive console inside the virtual environment (Python 3.12.4):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/wenko/tracts/tracts/__init__.py", line 9, in <module>
    from tracts import logs
ImportError: cannot import name 'logs' from partially initialized module 'tracts' (most likely due to a circular import) (/Users/wenko/tracts/tracts/__init__.py)

I somehow fixed the problem and executed "import tracts" successfully by the following steps as below. However, I still encountered error messages while running the example files. So I am not sure if I really fixed the problem properly or not.

Here are the steps I did to fix the problem:

added the following lines to the "__init__.py" file:

def initialize():
    from tracts import logs
    logs.setup_logging()

created a "tracts/tracts/logs.py" file with the following lines in the file:

def setup_logging():
    from tracts import some_function
    some_function()

def show_INFO():
    # Your show_INFO implementation
    pass

created a "tracts/tracts/utlis.py" file with the following lines in the file:

def utility_function():
    pass

I tried to run the example files stored in the "3Pops" folder, without much success. Here is the error message I got (see the last few lines at the bottom):

(venv) wenko@Wen-Yas-MacBook-Pro tracts % python taino_ppx_xxp.py
['taino_ppx_xxp.py']
/Users/wenko/tracts/taino_ppx_xxp.py:95: UserWarning: some files in the bed directory were ignored, since they do not end with `.bed`.
  warn("some files in the bed directory were ignored, since they do not "
booted data sample [np.int64(1775), np.int64(1376), np.int64(1108), np.int64(888), np.int64(767), np.int64(630), np.int64(482), np.int64(393), np.int64(344)]
evaluating at params [0.07 0.01]
1       , -4009.37    , array([ 0.07       ,  0.01       ])
evaluating at params [0.07 0.02]
2       , -2865.36    , array([ 0.07       ,  0.02       ])
evaluating at params [0.07 0.03]
3       , -2197.52    , array([ 0.07       ,  0.03       ])
evaluating at params [0.07 0.04]
4       , -1744.82    , array([ 0.07       ,  0.04       ])
evaluating at params [0.07 0.05]
5       , -1417.27    , array([ 0.07       ,  0.05       ])
evaluating at params [0.07 0.06]
6       , -1172.37    , array([ 0.07       ,  0.06       ])
evaluating at params [0.07 0.07]
7       , -1.11762e+32, array([ 0.07       ,  0.07       ])
evaluating at params [0.07 0.08]
/Users/wenko/tracts/tracts/legacy_models/models_3pop.py:151: RuntimeWarning: The iteration is not making good progress, as measured by the 
  improvement from the last ten iterations.
  (prop3, prop1) = scipy.optimize.fsolve(fun, (.2, .2)) #(.2,.2) is just the starting point for the optimization function, it should not be sensitive to this, but it's better to start with reasonable parameter values.
8       , -2e+32      , array([ 0.07       ,  0.08       ])
evaluating at params [0.07 0.09]
9       , -2e+32      , array([ 0.07       ,  0.09       ])
evaluating at params [0.07 0.1 ]
10      , -2e+32      , array([ 0.07       ,  0.1        ])
evaluating at params [0.07 0.11]
11      , -2e+32      , array([ 0.07       ,  0.11       ])
evaluating at params [0.08 0.01]
12      , -3460.7     , array([ 0.08       ,  0.01       ])
evaluating at params [0.08 0.02]
13      , -2383.69    , array([ 0.08       ,  0.02       ])
evaluating at params [0.08 0.03]
14      , -1773.48    , array([ 0.08       ,  0.03       ])
evaluating at params [0.08 0.04]
15      , -1370.87    , array([ 0.08       ,  0.04       ])
evaluating at params [0.08 0.05]
16      , -1087.26    , array([ 0.08       ,  0.05       ])
evaluating at params [0.08 0.06]
17      , -881.197    , array([ 0.08       ,  0.06       ])
evaluating at params [0.08 0.07]
18      , -730.392    , array([ 0.08       ,  0.07       ])
evaluating at params [0.08 0.08]
Pulse less than 0
19      , -8.73918e+33, array([ 0.08       ,  0.08       ])
evaluating at params [0.08 0.09]
20      , -2e+32      , array([ 0.08       ,  0.09       ])
evaluating at params [0.08 0.1 ]
21      , -2e+32      , array([ 0.08       ,  0.1        ])
evaluating at params [0.08 0.11]
22      , -2e+32      , array([ 0.08       ,  0.11       ])
evaluating at params [0.09 0.01]
23      , -3031.3     , array([ 0.09       ,  0.01       ])
evaluating at params [0.09 0.02]
24      , -2007.92    , array([ 0.09       ,  0.02       ])
evaluating at params [0.09 0.03]
25      , -1444.52    , array([ 0.09       ,  0.03       ])
evaluating at params [0.09 0.04]
26      , -1083.12    , array([ 0.09       ,  0.04       ])
evaluating at params [0.09 0.05]
27      , -836.049    , array([ 0.09       ,  0.05       ])
evaluating at params [0.09 0.06]
28      , -662.592    , array([ 0.09       ,  0.06       ])
evaluating at params [0.09 0.07]
29      , -541.032    , array([ 0.09       ,  0.07       ])
evaluating at params [0.09 0.08]
30      , -458.784    , array([ 0.09       ,  0.08       ])
evaluating at params [0.09 0.09]
31      , -1.11762e+32, array([ 0.09       ,  0.09       ])
evaluating at params [0.09 0.1 ]
32      , -2e+32      , array([ 0.09       ,  0.1        ])
evaluating at params [0.09 0.11]
33      , -2e+32      , array([ 0.09       ,  0.11       ])
evaluating at params [0.1  0.01]
34      , -2697.2     , array([ 0.1        ,  0.01       ])
evaluating at params [0.1  0.02]
35      , -1717.7     , array([ 0.1        ,  0.02       ])
evaluating at params [0.1  0.03]
36      , -1193.05    , array([ 0.1        ,  0.03       ])
evaluating at params [0.1  0.04]
37      , -866.123    , array([ 0.1        ,  0.04       ])
evaluating at params [0.1  0.05]
38      , -649.87     , array([ 0.1        ,  0.05       ])
evaluating at params [0.1  0.06]
39      , -504.116    , array([ 0.1        ,  0.06       ])
evaluating at params [0.1  0.07]
40      , -407.563    , array([ 0.1        ,  0.07       ])
evaluating at params [0.1  0.08]
41      , -347.986    , array([ 0.1        ,  0.08       ])
evaluating at params [0.1  0.09]
42      , -317.966    , array([ 0.1        ,  0.09       ])
evaluating at params [0.1 0.1]
Pulse less than 0
43      , -8.73918e+33, array([ 0.1        ,  0.1        ])
evaluating at params [0.1  0.11]
44      , -2e+32      , array([ 0.1        ,  0.11       ])
evaluating at params [0.11 0.01]
45      , -2441.44    , array([ 0.11       ,  0.01       ])
evaluating at params [0.11 0.02]
46      , -1498.46    , array([ 0.11       ,  0.02       ])
evaluating at params [0.11 0.03]
47      , -1006.38    , array([ 0.11       ,  0.03       ])
evaluating at params [0.11 0.04]
48      , -708.642    , array([ 0.11       ,  0.04       ])
evaluating at params [0.11 0.05]
49      , -518.684    , array([ 0.11       ,  0.05       ])
evaluating at params [0.11 0.06]
50      , -396.711    , array([ 0.11       ,  0.06       ])
evaluating at params [0.11 0.07]
51      , -321.753    , array([ 0.11       ,  0.07       ])
evaluating at params [0.11 0.08]
52      , -281.85     , array([ 0.11       ,  0.08       ])
evaluating at params [0.11 0.09]
53      , -269.797    , array([ 0.11       ,  0.09       ])
evaluating at params [0.11 0.1 ]
54      , -281.118    , array([ 0.11       ,  0.1        ])
evaluating at params [0.11 0.11]
55      , -1.11762e+32, array([ 0.11       ,  0.11       ])
evaluating at params [0.12 0.01]
56      , -2251.55    , array([ 0.12       ,  0.01       ])
evaluating at params [0.12 0.02]
57      , -1339.4     , array([ 0.12       ,  0.02       ])
evaluating at params [0.12 0.03]
58      , -875.015    , array([ 0.12       ,  0.03       ])
evaluating at params [0.12 0.04]
59      , -602.275    , array([ 0.12       ,  0.04       ])
evaluating at params [0.12 0.05]
60      , -434.962    , array([ 0.12       ,  0.05       ])
evaluating at params [0.12 0.06]
61      , -333.577    , array([ 0.12       ,  0.06       ])
evaluating at params [0.12 0.07]
62      , -277.392    , array([ 0.12       ,  0.07       ])
evaluating at params [0.12 0.08]
63      , -254.657    , array([ 0.12       ,  0.08       ])
evaluating at params [0.12 0.09]
64      , -258.384    , array([ 0.12       ,  0.09       ])
evaluating at params [0.12 0.1 ]
65      , -284.183    , array([ 0.12       ,  0.1        ])
evaluating at params [0.12 0.11]
66      , -329.239    , array([ 0.12       ,  0.11       ])
evaluating at params [0.13 0.01]
67      , -2118.14    , array([ 0.13       ,  0.01       ])
evaluating at params [0.13 0.02]
68      , -1232.3     , array([ 0.13       ,  0.02       ])
evaluating at params [0.13 0.03]
69      , -791.708    , array([ 0.13       ,  0.03       ])
evaluating at params [0.13 0.04]
70      , -540.557    , array([ 0.13       ,  0.04       ])
evaluating at params [0.13 0.05]
71      , -392.901    , array([ 0.13       ,  0.05       ])
evaluating at params [0.13 0.06]
72      , -309.464    , array([ 0.13       ,  0.06       ])
evaluating at params [0.13 0.07]
73      , -269.707    , array([ 0.13       ,  0.07       ])
evaluating at params [0.13 0.08]
74      , -262.046    , array([ 0.13       ,  0.08       ])
evaluating at params [0.13 0.09]
75      , -279.596    , array([ 0.13       ,  0.09       ])
evaluating at params [0.13 0.1 ]
76      , -318.153    , array([ 0.13       ,  0.1        ])
evaluating at params [0.13 0.11]
77      , -375.055    , array([ 0.13       ,  0.11       ])
evaluating at params [0.14 0.01]
78      , -2033.92    , array([ 0.14       ,  0.01       ])
evaluating at params [0.14 0.02]
79      , -1170.75    , array([ 0.14       ,  0.02       ])
evaluating at params [0.14 0.03]
80      , -750.763    , array([ 0.14       ,  0.03       ])
evaluating at params [0.14 0.04]
81      , -518.398    , array([ 0.14       ,  0.04       ])
evaluating at params [0.14 0.05]
82      , -387.916    , array([ 0.14       ,  0.05       ])
evaluating at params [0.14 0.06]
83      , -320.219    , array([ 0.14       ,  0.06       ])
evaluating at params [0.14 0.07]
84      , -294.926    , array([ 0.14       ,  0.07       ])
evaluating at params [0.14 0.08]
85      , -300.604    , array([ 0.14       ,  0.08       ])
evaluating at params [0.14 0.09]
86      , -330.444    , array([ 0.14       ,  0.09       ])
evaluating at params [0.14 0.1 ]
87      , -380.444    , array([ 0.14       ,  0.1        ])
evaluating at params [0.14 0.11]
88      , -447.904    , array([ 0.14       ,  0.11       ])
evaluating at params [0.12 0.08]
89      , -254.657    , array([ 0.12       ,  0.08       ])
evaluating at params [0.126 0.08 ]
90      , -255.175    , array([ 0.126      ,  0.08       ])
evaluating at params [0.12  0.084]
91      , -252.318    , array([ 0.12       ,  0.084      ])
evaluating at params [0.114 0.084]
92      , -260.462    , array([ 0.114      ,  0.084      ])
evaluating at params [0.123 0.081]
93      , -252.852    , array([ 0.123      ,  0.081      ])
evaluating at params [0.123 0.085]
94      , -253.713    , array([ 0.123      ,  0.085      ])
evaluating at params [0.12225 0.08375]
95      , -252.478    , array([ 0.12225    ,  0.08375    ])
evaluating at params [0.11925 0.08675]
96      , -253.775    , array([ 0.11925    ,  0.08675    ])
evaluating at params [0.1220625 0.0824375]
97      , -252.265    , array([ 0.122063   ,  0.0824375  ])
evaluating at params [0.1198125 0.0826875]
98      , -252.57     , array([ 0.119812   ,  0.0826875  ])
evaluating at params [0.12164063 0.08348437]
99      , -252.22     , array([ 0.121641   ,  0.0834844  ])
evaluating at params [0.12370313 0.08192187]
100     , -252.937    , array([ 0.123703   ,  0.0819219  ])
evaluating at params [0.12092578 0.08348047]
101     , -252.159    , array([ 0.120926   ,  0.0834805  ])
evaluating at params [0.12050391 0.08452734]
102     , -252.367    , array([ 0.120504   ,  0.0845273  ])
evaluating at params [0.12167285 0.08295996]
103     , -252.18     , array([ 0.121673   ,  0.08296    ])
evaluating at params [0.12095801 0.08295605]
104     , -252.169    , array([ 0.120958   ,  0.0829561  ])
evaluating at params [0.12021094 0.08347656]
105     , -252.257    , array([ 0.120211   ,  0.0834766  ])
evaluating at params [0.12130737 0.08308911]
106     , -252.15     , array([ 0.121307   ,  0.0830891  ])
evaluating at params [0.12127515 0.08361353]
107     , -252.186    , array([ 0.121275   ,  0.0836135  ])
evaluating at params [0.12103729 0.08312042]
108     , -252.151    , array([ 0.121037   ,  0.0831204  ])
evaluating at params [0.12141888 0.08272906]
109     , -252.174    , array([ 0.121419   ,  0.0827291  ])
evaluating at params [0.12104906 0.08329262]
110     , -252.149    , array([ 0.121049   ,  0.0832926  ])
evaluating at params [0.12131914 0.08326131]
111     , -252.154    , array([ 0.121319   ,  0.0832613  ])
evaluating at params [0.12110775 0.08315564]
112     , -252.148    , array([ 0.121108   ,  0.0831556  ])
evaluating at params [0.12084944 0.08335915]
113     , -252.158    , array([ 0.120849   ,  0.0833591  ])
evaluating at params [0.12119289 0.08315662]
114     , -252.147    , array([ 0.121193   ,  0.0831566  ])
evaluating at params [0.12125159 0.08301965]
115     , -252.151    , array([ 0.121252   ,  0.0830196  ])
evaluating at params [0.12109969 0.08322438]
116     , -252.147    , array([ 0.1211     ,  0.0832244  ])
evaluating at params [0.12118482 0.08322535]
117     , -252.148    , array([ 0.121185   ,  0.0832254  ])
evaluating at params [0.12116556 0.08320793]
118     , -252.147    , array([ 0.121166   ,  0.0832079  ])
evaluating at params [0.12125876 0.08314017]
119     , -252.148    , array([ 0.121259   ,  0.0831402  ])
evaluating at params [0.12113946 0.08320332]
120     , -252.147    , array([ 0.121139   ,  0.0832033  ])
evaluating at params [0.12111212 0.08325463]
121     , -252.148    , array([ 0.121112   ,  0.0832546  ])
evaluating at params [0.1211727  0.08318112]
122     , -252.147    , array([ 0.121173   ,  0.0831811  ])
(array([0.12113946, 0.08320332]), (np.float64(252.14712890712022), array([[[0.07, 0.07, 0.07, 0.07, 0.07, 0.07, 0.07, 0.07, 0.07, 0.07,
         0.07],
        [0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08,
         0.08],
        [0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09,
         0.09],
        [0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 ,
         0.1 ],
        [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11,
         0.11],
        [0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12,
         0.12],
        [0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13,
         0.13],
        [0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14,
         0.14]],

       [[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
         0.11],
        [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
         0.11],
        [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
         0.11],
        [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
         0.11],
        [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
         0.11],
        [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
         0.11],
        [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
         0.11],
        [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
         0.11]]]), array([[4.00937164e+03, 2.86535944e+03, 2.19751656e+03, 1.74481974e+03,
        1.41726543e+03, 1.17236593e+03, 1.11762423e+32, 1.99999999e+32,
        1.99999999e+32, 1.99999999e+32, 1.99999999e+32],
       [3.46069971e+03, 2.38369493e+03, 1.77347622e+03, 1.37087103e+03,
        1.08725574e+03, 8.81197252e+02, 7.30391549e+02, 8.73918033e+33,
        1.99999999e+32, 1.99999999e+32, 1.99999999e+32],
       [3.03129581e+03, 2.00791790e+03, 1.44451546e+03, 1.08312296e+03,
        8.36048848e+02, 6.62592434e+02, 5.41032336e+02, 4.58783952e+02,
        1.11762423e+32, 1.99999999e+32, 1.99999999e+32],
       [2.69720497e+03, 1.71770145e+03, 1.19305416e+03, 8.66123377e+02,
        6.49870230e+02, 5.04115951e+02, 4.07563224e+02, 3.47986448e+02,
        3.17966296e+02, 8.73918033e+33, 1.99999999e+32],
       [2.44144033e+03, 1.49845942e+03, 1.00637807e+03, 7.08642175e+02,
        5.18683935e+02, 3.96711371e+02, 3.21752867e+02, 2.81850080e+02,
        2.69796979e+02, 2.81117640e+02, 1.11762423e+32],
       [2.25155263e+03, 1.33939821e+03, 8.75014534e+02, 6.02274586e+02,
        4.34962250e+02, 3.33576821e+02, 2.77392169e+02, 2.54657012e+02,
        2.58384106e+02, 2.84183226e+02, 3.29238627e+02],
       [2.11814134e+03, 1.23230011e+03, 7.91707774e+02, 5.40557363e+02,
        3.92900964e+02, 3.09464106e+02, 2.69706916e+02, 2.62045757e+02,
        2.79596065e+02, 3.18153383e+02, 3.75054737e+02],
       [2.03392325e+03, 1.17075133e+03, 7.50763045e+02, 5.18398481e+02,
        3.87916310e+02, 3.20218782e+02, 2.94925679e+02, 3.00603943e+02,
        3.30444291e+02, 3.80444099e+02, 4.47904490e+02]])))
Traceback (most recent call last):
  File "/Users/wenko/tracts/taino_ppx_xxp.py", line 183, in <module>
    for line in optmod.mig:
                ^^^^^^^^^^
AttributeError: 'demographic_model' object has no attribute 'mig'
(venv) wenko@Wen-Yas-MacBook-Pro tracts %

From rfmix2 to Tracts

Dear all,
I am sharing a tailored script for converting RFMix2 output into TRACTS input.

# rfmix2tracts.py
# Convert an RFMix2 .msp.tsv file into tracts input files (one per haplotype).

import os
import sys

import pandas as pd

def find_header_row(file_path):
    """
    Return the 0-based index of the column-header line, i.e. the line that
    starts with '#chm'. Comment lines above it (such as the
    '#Subpopulation order/codes' line) are skipped.
    """
    with open(file_path, 'r') as file:
        for i, line in enumerate(file):
            if line.startswith('#chm'):
                return i
    return None

def process_rfmix_to_tracts(input_file, output_folder):
    """
    Process an RFMix output file and generate TRACTS input files.
    """
    # Find the header row
    header_row = find_header_row(input_file)
    if header_row is None:
        print("Header row could not be found.")
        return

    # Skip the lines above the header, then use the header line as column names
    data = pd.read_csv(input_file, sep='\t', skiprows=header_row, header=0)

    # Haplotype columns follow the 6 fixed columns (#chm, spos, epos, sgpos, egpos, n snps)
    samples = data.columns[6:]

    os.makedirs(output_folder, exist_ok=True)

    # Process each haplotype column (a file may contain multiple samples)
    for sample in samples:
        # Columns are named <individual>.0 and <individual>.1; rsplit keeps dots in the name
        individual, haplotype = sample.rsplit('.', 1)
        file_suffix = '_B.viterbi.bed.cm' if haplotype == '1' else '_A.viterbi.bed.cm'
        output_file = os.path.join(output_folder, f"{individual}{file_suffix}")

        # Extract the relevant columns in tracts order: chrom, begin, end, assignment, cmBegin, cmEnd
        sample_data = data[['#chm', 'spos', 'epos', sample, 'sgpos', 'egpos']].copy()
        sample_data.columns = ['chrom', 'begin', 'end', 'assignment', 'cmBegin', 'cmEnd']

        # Save to file
        sample_data.to_csv(output_file, sep='\t', index=False)

    print("Process completed. The files have been generated.")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python rfmix2tracts.py <input_file> <output_folder>")
    else:
        input_file_path = sys.argv[1]
        output_folder_path = sys.argv[2]
        process_rfmix_to_tracts(input_file_path, output_folder_path)

# Contact: [email protected]
# GitHub: marsicoFL

# Example of how to run the script:
# python rfmix2tracts.py /path/to/rfmix2output.msp.tsv /path/to/output/folder/
# Or if you want to run it in the same folder:
# python rfmix2tracts.py toy.msp.tsv .

Best,
Franco

func_args is not used

func_args is a leftover argument from the original dadi code. It was meant as a way to pass additional arguments to the optimization function, but it was never implemented and should probably be removed.

Implement YAML representation for sex-biased admixture

We want to add sex-biased admixture capability to tracts, and we will need to generalize the YAML format to handle this.

Erlang should be a static function, not a method

Generalize parameterized demography to handle sex-biased demography

3 migration w/ 3 population Models

Simon

I'm trying to run tracts using a 3-migration pulses model.
Can you provide a driver file to help me with the parameters I should change? I'm having problems with "slices" and "bounds".

Thanks in advance.

Make python 3 compatible

Use phase-type distributions to speed things up and generalize

issue with input file

Dear all,
thank you for this tool, I would like to use it on my data but unfortunately I am getting the error

ValueError: chromosome pairs of different lengths!

The example runs well, so I think it might be a problem with my input file, but I am not able to figure out the issue.
Thank you in advance,
Alessandro
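The error indicates that the two haploid copies of a chromosome in one individual do not have the same length. A quick pre-flight check, assuming each individual has two tab-separated files (one per haploid copy) with the chrom, begin, end, assignment, cmBegin, cmEnd columns written by the rfmix2tracts script above, and assuming a chromosome's length is taken from the end of its last tract, is to compare the last cmEnd per chromosome between the two copies. The function names are hypothetical and not part of tracts:

```python
import csv
from typing import Dict, Iterable, List, Tuple


def chromosome_ends(lines: Iterable[str], end_column: str = "cmEnd") -> Dict[str, float]:
    """Largest end coordinate per chromosome in a tab-separated tracts file with a header row."""
    ends: Dict[str, float] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        chrom = row["chrom"]
        ends[chrom] = max(ends.get(chrom, float("-inf")), float(row[end_column]))
    return ends


def mismatched_chromosomes(
    lines_a: Iterable[str], lines_b: Iterable[str], tol: float = 1e-6
) -> List[Tuple[str, float, float]]:
    """List (chrom, end_A, end_B) for chromosomes whose two copies differ in length."""
    ends_a, ends_b = chromosome_ends(lines_a), chromosome_ends(lines_b)
    problems = []
    for chrom in sorted(set(ends_a) | set(ends_b)):
        end_a = ends_a.get(chrom, float("nan"))
        end_b = ends_b.get(chrom, float("nan"))
        # NaN never passes the tolerance test, so a chromosome missing from one copy is reported too
        if not abs(end_a - end_b) <= tol:
            problems.append((chrom, end_a, end_b))
    return problems
```

Usage, with hypothetical file names: `with open("ind_A.bed") as fa, open("ind_B.bed") as fb: print(mismatched_chromosomes(fa, fb))`. An empty list means every chromosome ends at the same position on both copies.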

Make Python 3 the main branch once it has been tested thoroughly

ll_scale is not used in many optimizers, and its description is unclear

How to cite your software?

Hello,

If I use your software, which article should I cite?

Best wishes

Make model specification clearer

Use a demes-like YAML file to specify demographic model

Add Travis CI for testing builds with PRs

We should have automatic testing of PRs by adding Travis CI with the proper dependencies. It makes code review simpler.

Add tests

Add more tests throughout

4 populations sample models required?

Dear Simon,

I have tried to run my admixed population with the 2- and 3-population models in your documentation. However, structure analyses indicated 4 ancestral populations in my cohort.

Could you suggest where I can find a sample script for a 4-population tracts analysis?

Thank you for your great software and help.

Cheers,
James

chrom argument in ancestry_at_pos function

The ancestry_at_pos function is not used in most tracts applications. It is meant to compute the proportion of ancestry at a given position. It uses bad naming conventions. In particular, it seems to redefine the argument chrom in the loop, and the argument chrom itself shadows a global definition.
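A sketch of the same computation with non-shadowing names, under a deliberately simplified, hypothetical data layout: each haploid copy of the chromosome of interest is a list of (start_cM, end_cM, label) tuples, rather than tracts' own classes. The loop variable is called copy_tracts, so neither a chrom argument nor any module-level name is redefined:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layout: (start_cM, end_cM, ancestry_label), half-open [start, end)
Tract = Tuple[float, float, str]


def ancestry_at_pos(haploid_copies: List[List[Tract]], pos: float) -> Dict[str, float]:
    """Fraction of haploid copies carrying each ancestry label at position pos (cM).

    Copies with no tract covering pos are left out of the denominator.
    """
    counts: Counter = Counter()
    for copy_tracts in haploid_copies:
        for start, end, label in copy_tracts:
            if start <= pos < end:
                counts[label] += 1
                break  # tracts on one copy do not overlap, so stop at the first hit
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```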

Variable naming convention

Class names do not follow the PEP 8 convention (CapWords, also known as CamelCase).
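One way to rename classes to CapWords without breaking existing scripts is to keep the old lowercase name as a deprecated subclass that emits a DeprecationWarning. Here chropair is assumed as an example of a lowercase class name, ChromosomePair is a hypothetical new name, and the constructor is reduced to a single argument for brevity:

```python
import warnings


class ChromosomePair:
    """PEP 8 (CapWords) name; the body is a placeholder for the real class."""

    def __init__(self, length: float = 1.0):
        self.length = length


class chropair(ChromosomePair):  # noqa: N801 - old name kept on purpose
    """Deprecated lowercase alias kept so that existing driver scripts still run."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "chropair is deprecated; use ChromosomePair instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

The subclass approach can warn on use, but isinstance(obj, chropair) is False for objects created as ChromosomePair; a plain assignment chropair = ChromosomePair avoids that at the cost of no warning.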

Add demes support

Specifying demographic models is always a bit of a pain in tracts. It would be easier if we could specify demographic models using the demes specification. For this, we will need to:
- Port tracts.py to Python 3.
- Add a function translating a YAML file into a migration matrix (this could be done in tracts or in demes).
- Find a way to parameterize models in the YAML file (i.e., build a function that takes in parameters and a YAML model and outputs a migration matrix). I think we can draw inspiration from the demes support in moments.
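The last two points can be sketched together: a function that takes a list of pulses (as they might be parsed from a YAML file) plus a dictionary of parameter values, and fills a migration matrix. The pulse format, the function names, and the convention that row g holds the migrant fractions entering g generations ago are all assumptions for illustration, not the demes or tracts format:

```python
from typing import Mapping, Sequence, Union

import numpy as np

Number = Union[int, float]


def _resolve(value: Union[Number, str], params: Mapping[str, Number]) -> float:
    """Return a literal number as-is, or look up a parameter name in params."""
    return float(params[value]) if isinstance(value, str) else float(value)


def pulses_to_migration_matrix(
    pulses: Sequence[Mapping[str, Union[Number, str]]],
    populations: Sequence[str],
    params: Mapping[str, Number],
) -> np.ndarray:
    """Build a (generations x populations) matrix from parameterized pulses.

    Each pulse has keys 'source', 'time' and 'rate'; 'time' and 'rate' may be
    numbers or names of entries in params (hypothetical format).
    """
    column = {name: i for i, name in enumerate(populations)}
    resolved = [
        (column[p["source"]], int(round(_resolve(p["time"], params))), _resolve(p["rate"], params))
        for p in pulses
    ]
    for _, time, rate in resolved:
        if time < 0 or not 0.0 <= rate <= 1.0:
            raise ValueError(f"invalid pulse: time={time}, rate={rate}")
    n_generations = max((time for _, time, _ in resolved), default=0)
    matrix = np.zeros((n_generations + 1, len(populations)))
    for col, time, rate in resolved:
        matrix[time, col] += rate
    if np.any(matrix.sum(axis=1) > 1.0 + 1e-12):
        raise ValueError("migrant fractions in one generation sum to more than 1")
    return matrix


# Founding pulse from EUR and AFR at t_found, then a later NAT pulse 5 generations ago
example = pulses_to_migration_matrix(
    [
        {"source": "EUR", "time": "t_found", "rate": "eur_found"},
        {"source": "AFR", "time": "t_found", "rate": "afr_found"},
        {"source": "NAT", "time": 5, "rate": "nat_late"},
    ],
    ["EUR", "AFR", "NAT"],
    {"t_found": 10, "eur_found": 0.6, "afr_found": 0.4, "nat_late": 0.1},
)
```

Keeping parameter names as strings in the model and resolving them at call time means the same parsed model can be evaluated repeatedly by an optimizer with different parameter values.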

len used as variable name in chopair

It should be renamed to length to avoid shadowing the builtin len.
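A small illustration of why the rename matters: once a parameter is called len, the builtin is unreachable inside that function. Both functions below are hypothetical stand-ins, not the actual tracts constructor:

```python
def describe_bad(len=1.0, segments=()):
    # `len` is now a float, so calling it raises TypeError
    return f"length {len}, {len(segments)} segments"


def describe_good(length=1.0, segments=()):
    # With the parameter renamed, the builtin len() works again
    return f"length {length}, {len(segments)} segments"
```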

Error when attempting to fix multiple parameters using ancestry proportions.

Tracts gives the following error when attempting to fix multiple parameters using ancestry proportions.

    Traceback (most recent call last):
      File "/share/hennlab/projects/tracts2_testing/at2.py", line 14, in <module>
        tracts.run_tracts('test_driver.yaml')
      File "/share/hennlab/projects/tracts2_testing/tracts/tracts/driver.py", line 59, in run_tracts
        params_found, likelihoods = run_model_multi_params(func, bound, pop, population_labels, parse_start_params(driver_spec['start_params'], driver_spec['repetitions'], driver_spec['seed'], model, time_scaling_factor), exclude_tracts_below_cM=exclude_tracts_below_cM)
      File "/share/hennlab/projects/tracts2_testing/tracts/tracts/driver.py", line 157, in run_model_multi_params
        params_found, likelihood_found = run_model(model_func, bound_func, population, population_labels, start_params, exclude_tracts_below_cM=exclude_tracts_below_cM)
      File "/share/hennlab/projects/tracts2_testing/tracts/tracts/driver.py", line 168, in run_model
        xopt = tracts.optimize_cob(startparams, bins, Ls, data, nind, model_func, outofbounds_fun=bound_func, cutoff=cutoff, epsilon=1e-2)
      File "/home/awreynolds/.local/lib/python3.10/site-packages/tracts/core.py", line 1846, in optimize_cob
        outputs = scipy.optimize.fmin_cobyla(
      File "/home/awreynolds/.local/lib/python3.10/site-packages/scipy/optimize/_cobyla_py.py", line 34, in wrapper
        return func(*args, **kwargs)
      File "/home/awreynolds/.local/lib/python3.10/site-packages/scipy/optimize/_cobyla_py.py", line 181, in fmin_cobyla
        sol = _minimize_cobyla(func, x0, args, constraints=con,
      File "/home/awreynolds/.local/lib/python3.10/site-packages/scipy/optimize/_cobyla_py.py", line 34, in wrapper
        return func(*args, **kwargs)
      File "/home/awreynolds/.local/lib/python3.10/site-packages/scipy/optimize/_cobyla_py.py", line 264, in _minimize_cobyla
        f = c['fun'](x0, *c['args'])
      File "/share/hennlab/projects/tracts2_testing/tracts/tracts/driver.py", line 46, in <lambda>
        bound = lambda params: model.out_of_bounds(scale_select_indices(params, model.is_time_param(), time_scaling_factor))
      File "/home/awreynolds/.local/lib/python3.10/site-packages/tracts/parametrized_demography.py", line 387, in out_of_bounds
        return super().out_of_bounds(self.get_full_params(params))
      File "/home/awreynolds/.local/lib/python3.10/site-packages/tracts/parametrized_demography.py", line 409, in get_full_params
        solved_params = scipy.optimize.fsolve(lambda params_to_solve: _param_objective_func(self, params_to_solve), (.2,))
      File "/home/awreynolds/.local/lib/python3.10/site-packages/scipy/optimize/_minpack_py.py", line 162, in fsolve
        res = _root_hybr(func, x0, args, jac=fprime, **options)
      File "/home/awreynolds/.local/lib/python3.10/site-packages/scipy/optimize/_minpack_py.py", line 228, in _root_hybr
        shape, dtype = _check_func('fsolve', 'func', func, x0, args, n, (n,))
      File "/home/awreynolds/.local/lib/python3.10/site-packages/scipy/optimize/_minpack_py.py", line 25, in _check_func
        res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
      File "/home/awreynolds/.local/lib/python3.10/site-packages/tracts/parametrized_demography.py", line 409, in <lambda>
        solved_params = scipy.optimize.fsolve(lambda params_to_solve: _param_objective_func(self, params_to_solve), (.2,))
      File "/home/awreynolds/.local/lib/python3.10/site-packages/tracts/parametrized_demography.py", line 403, in _param_objective_func
        full_params = self.insert_params(full_params, params_to_solve)
      File "/home/awreynolds/.local/lib/python3.10/site-packages/tracts/parametrized_demography.py", line 440, in insert_params
        raise ValueError('Incorrect number of parameters to be solved')
    ValueError: Incorrect number of parameters to be solved
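In the traceback, fsolve is given the one-element starting point (.2,), so only one value reaches insert_params even when several parameters are fixed from ancestry proportions. The general remedy is to size the initial guess to the number of unknowns. A self-contained sketch with a toy model, assumed for illustration rather than taken from tracts: a founding pulse splits ancestry a : 1 - a between two populations, and a later pulse of rate c from a third population dilutes both, so the final proportions are a(1 - c), (1 - a)(1 - c) and c:

```python
import numpy as np
import scipy.optimize


def final_ancestry(a: float, c: float) -> np.ndarray:
    """Toy model: founding split a : 1 - a, then a later pulse of rate c from a third source."""
    return np.array([a * (1 - c), (1 - a) * (1 - c), c])


def solve_fixed_params(target: np.ndarray) -> np.ndarray:
    """Solve for (a, c) so that the first and third ancestry proportions match target."""

    def residuals(x: np.ndarray) -> np.ndarray:
        a, c = x  # would fail to unpack if fsolve were started from (.2,)
        predicted = final_ancestry(a, c)
        # One equation per unknown; the second proportion follows because proportions sum to 1
        return np.array([predicted[0] - target[0], predicted[2] - target[2]])

    n_unknowns = 2
    # The initial guess needs one entry per unknown, not a fixed one-element tuple
    initial_guess = np.full(n_unknowns, 0.2)
    return scipy.optimize.fsolve(residuals, initial_guess)


solution = solve_fixed_params(np.array([0.5, 0.3, 0.2]))
```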

Refactor driver script

Driver scripts currently mix parameters and operations; the two should be separated.
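A sketch of one way to separate the two: parameters live in a plain mapping (for example, loaded from the driver YAML file) and are validated once into an immutable object, while the operations only receive that object. The keys start_params, repetitions and seed appear in driver.py in the traceback above; the DriverSpec class and the run_repetitions function are hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping

import numpy as np


@dataclass(frozen=True)
class DriverSpec:
    """Parameters only: no file I/O, no optimization calls."""

    start_params: Dict[str, Any]
    repetitions: int
    seed: int

    @classmethod
    def from_mapping(cls, spec: Mapping[str, Any]) -> "DriverSpec":
        missing = {"start_params", "repetitions", "seed"} - set(spec)
        if missing:
            raise KeyError(f"driver spec is missing keys: {sorted(missing)}")
        return cls(dict(spec["start_params"]), int(spec["repetitions"]), int(spec["seed"]))


def run_repetitions(
    spec: DriverSpec, fit: Callable[[Dict[str, Any], np.random.Generator], float]
) -> List[float]:
    """Operations only: call `fit` once per repetition with a reproducibly seeded RNG."""
    rng = np.random.default_rng(spec.seed)
    return [fit(spec.start_params, rng) for _ in range(spec.repetitions)]
```

With this split, a test can build a DriverSpec directly from a dictionary and pass a dummy fit function, without touching files or running an optimizer.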

defining model and run tracts

Hello,
This is a basic question about how to run tracts: I am a little confused about how to run it using the example files. Does tracts have a command-line interface or options (e.g., python tracts.py --input-file or something similar)? Or do I need to modify one of the examples to run my desired model?

I want to infer migration from the ancestry tracts that I have inferred with RFMix. It is a very basic model for a Latin American population (e.g., PEL, Peruvians from Lima, from 1000 Genomes), in which I want to know when European, African and Native American ancestries met (migration starting back in time). It is not clear to me how to run tracts in Python. Could you help by letting me know how to run the examples?
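The traceback earlier on this page shows tracts 2 being driven from Python, with a script calling tracts.run_tracts('test_driver.yaml'), where run_tracts is defined in tracts/driver.py. If a command-line entry point is preferred, a thin wrapper could look like the following. The wrapper and its driver_yaml argument are hypothetical, and the tracts import is deferred so that the argument parser can be used without tracts installed:

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run tracts from a YAML driver file.")
    parser.add_argument("driver_yaml", help="path to the tracts driver YAML file")
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    # Deferred import: tracts is only needed when actually running a model
    from tracts.driver import run_tracts

    run_tracts(args.driver_yaml)


if __name__ == "__main__":
    main()
```

Saved as, say, run_tracts_cli.py, it would be invoked as `python run_tracts_cli.py my_driver.yaml`.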

issues running tracts2

Hi all,
Thank you for developing this amazing software.
I am trying to run tracts2, but I am running into some issues.

After creating a conda env and installing tracts, I tried to run the example.
The first problem was:

AttributeError: module 'tracts' has no attribute 'driver'

This was solved by including from tracts import driver in the Python script.

I had the same problem with ParametrizedDemography (I needed to change the name parametrized_demography to ParametrizedDemography).

However, now I have this issue (on Mac) that I don't know how to solve:

ImportError: cannot import name 'PhaseTypeDistribution' from 'tracts'

and on my server (Linux), I have the following:

ImportError: cannot import name 'logs' from partially initialized module 'tracts' (most likely due to a circular import) (/local/chib/toconnor_grp/victor/miniconda3/envs/RunningJasmine/envs/tracts/lib/python3.12/site-packages/tracts/__init__.py)

Please can you help me with this?
Have a great day!

Method for model selection

After reading the paper and some of the source code, it is still not clear to me which method is used for model selection.