
betascan's People

Contributors

ksiewert, wharvey31

betascan's Issues

substitution data input format

Hey Katie,

Are you able to provide an example of the input file required for calculating B2? I'm generating the input files myself as I'm unable to use glactools (I'm using a non-model and the ancestral alleles have been called in a kinda homebrew way - i.e. are not in an epo file)

Thank you!

Josie

TypeError: type NoneType doesn't define __round__ method

Hello,

I am running Betascan for 20 chromosomes without outgroup information. While running it with all chromosomes, an error pops up (below here), but it computes the values when run with individual chromosomes separately and generates Beta* values for each chromosome. Is it normal to use individual chromosomes to calculate Beta*? Could you help me with this?

I would greatly appreciate it.

Here's the command I used for Betascan:

python BetaScan.py -i transformed_PL.beta.txt -fold -w 2000 -o betascan_output/pb_betascores_transformed.txt

Error

Traceback (most recent call last):
  File "/home/edelab/asim_work/vcf_analysis/files_for_vcf/gdgeno_snps_data/nucleotide_diversity/tajimaD/vcftools_new/vcftools/betascan/BetaScan.py", line 685, in <module>
    main()
  File "/home/edelab/asim_work/vcf_analysis/files_for_vcf/gdgeno_snps_data/nucleotide_diversity/tajimaD/vcftools_new/vcftools/betascan/BetaScan.py", line 673, in main
    output.write(str(loc)+"\t"+str(round(B, 6))+"\n") # Remove thetas
                                   ^^^^^^^^^^^
TypeError: type NoneType doesn't define __round__ method
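
The traceback shows `round(B, 6)` receiving `None`. One way to find which positions are affected is to guard the write so the run completes. This is a minimal sketch, not BetaScan's own code: the helper name `format_score` and the `NA` placeholder are assumptions for illustration, and it does not explain why the score is `None` at those positions.

```python
def format_score(loc, score, digits=6, missing="NA"):
    """Return one tab-separated output line, tolerating a missing score.

    Hypothetical helper, not part of BetaScan. `missing` is an assumed
    placeholder written when no score could be computed (score is None).
    """
    if score is None:
        return f"{loc}\t{missing}\n"
    return f"{loc}\t{round(score, digits)}\n"


if __name__ == "__main__":
    # A normal position followed by one whose score is None.
    print(format_score(463810, 0.4060941), end="")
    print(format_score(465000, None), end="")
```

Positions written as `NA` can then be inspected in the input file, for example to see whether they sit where one chromosome ends and the next begins.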

ZeroDivisionError: float division by zero

When I use vcftools --freq, then

awk -F "\t|:" '(NR>1) && ($8!='0') && ($8!='1') && ($3=='2') {OFS="\t"; print$2,$8*$4,$4}' chr5.southAmerica.frq > chr5.southAmerica.frq.txt

and

python betascan.py -i chr5.USAMS.frq.txt > chr5.USAMS.betascore

on one chromosome of ten individuals, the following problem occurs:
Traceback (most recent call last):
  File "betascan.py", line 613, in <module>
    main()
  File "betascan.py", line 529, in main
    freq = freqCount/sampleN
ZeroDivisionError: float division by zero
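
The failing line divides by `sampleN`, so at least one row apparently reaches it with a sample size of zero. A small pre-check of the input file can point to such rows. This is a sketch under the assumption that the file has the three columns the awk command above prints (position, allele count, sample size); the function name `find_bad_rows` is made up for this example.

```python
import io


def find_bad_rows(handle):
    """Yield (line_number, reason, text) for rows likely to break freqCount/sampleN.

    Assumes three whitespace-separated columns: position, allele count,
    sample size. Hypothetical helper, not part of BetaScan.
    """
    for line_number, line in enumerate(handle, start=1):
        text = line.rstrip("\n")
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < 3:
            yield line_number, "fewer than 3 columns", text
            continue
        try:
            count, sample_n = float(fields[1]), float(fields[2])
        except ValueError:
            yield line_number, "non-numeric count or sample size", text
            continue
        if sample_n == 0:
            yield line_number, "sample size is zero", text
        elif not 0 <= count <= sample_n:
            yield line_number, "count outside 0..sample size", text


if __name__ == "__main__":
    demo = io.StringIO("4827\t14\t24\n5001\t0\t0\n5100\t30\t24\n")
    for problem in find_bad_rows(demo):
        print(problem)
```

For a real file, pass an open file handle instead of the `io.StringIO` demo.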

Betascore within windows

Hi @ksiewert and all BetaScan users,

Is there a way to get a BetaScan value for a window, e.g., a 2000 bp genomic region? One way I can think of is to take the highest value among all the SNPs within the window.

Thanks!
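
The approach suggested in the question (take the highest per-SNP score in each window) can be sketched as follows. This is only an illustration of that idea, not a recommendation from the BetaScan authors; the function name `window_max`, the non-overlapping windows, and the 1-based positions are assumptions.

```python
def window_max(scores, window=2000):
    """Summarise per-SNP scores into fixed, non-overlapping windows.

    `scores` is an iterable of (position, score) pairs. The result maps
    each window start to (best_position, best_score) for that window.
    Positions are assumed to be 1-based.
    """
    best = {}
    for pos, score in scores:
        pos = int(pos)
        start = (pos - 1) // window * window + 1
        if start not in best or score > best[start][1]:
            best[start] = (pos, score)
    return dict(sorted(best.items()))


if __name__ == "__main__":
    per_snp = [(100, 0.5), (1500, 2.0), (2001, -1.0), (3999, 0.3)]
    for start, (pos, score) in window_max(per_snp).items():
        print(start, pos, score)
```

Scores from different chromosomes would need to be summarised separately, since the output contains positions only.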

TypeError: a float is required

Hi Katie,

I'm having a problem with running BetaScan. My command is:

python BetaScan.py -i snps.beta.txt.gz -fold -o out.betascores.txt

and the error I'm receiving:

Traceback (most recent call last):
  File "BetaScan.py", line 613, in <module>
    main()
  File "BetaScan.py", line 602, in main
    output.write(str(loc)+"\t"+str(round(B,6))+"\n") #Remove thetas
TypeError: a float is required

I'm using Python 2.7.17. I'd appreciate any suggestions you have please.

Cheers

How to calculate the theta and Watterson's theta?

Hi, I'm trying to calculate B2 using BetaScan. This needs the theta value as a parameter. How should I calculate it from a VCF file or the beta input file? If I set the DivTime parameter, it also needs 4Neu, as in BALLET. Thank you!
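
For reference, Watterson's estimator is the number of segregating sites S divided by a_n, the sum of 1/i for i from 1 to n-1, where n is the number of sampled chromosomes. A minimal sketch follows; the function names are made up for this example, and whether BetaScan's -theta expects a per-site or a per-region value is not stated in this thread and should be checked against the BetaScan documentation.

```python
def watterson_a(n_chromosomes):
    """a_n = sum of 1/i for i = 1 .. n-1, with n sampled chromosomes."""
    return sum(1.0 / i for i in range(1, n_chromosomes))


def watterson_theta(segregating_sites, n_chromosomes, sequence_length=None):
    """Watterson's theta, S / a_n.

    If `sequence_length` (number of callable sites) is given, the
    estimate is returned per site instead of for the whole region.
    """
    if n_chromosomes < 2:
        raise ValueError("need at least 2 sampled chromosomes")
    theta = segregating_sites / watterson_a(n_chromosomes)
    if sequence_length:
        theta /= sequence_length
    return theta


if __name__ == "__main__":
    # 120 segregating sites in 48 chromosomes over 100 kb of sequence.
    print(watterson_theta(120, 48, sequence_length=100_000))
```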

problems about sample size

Hi,
I'm having a problem with running BetaScan. My command is:

python BetaScan.py -i LRS_chr3.beta.txt.gz -fold -o LRS_chr3.betascores.txt

and the error I'm receiving:

Error: Sample size must be greater than 3 haploid individuals to make inference, or else theta_beta will always equal theta_watterson's. You may wish to increase the m paramter value to exclude this SNP from being a core SNP.

But my sample actually includes 24 individuals, which is more than 3, I think?
Could anybody help? Thanks!
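
The error message refers to a single SNP, so one possibility (an assumption, not confirmed in this thread) is that a few rows carry a much smaller sample size than the rest, for example because of missing genotypes. Tallying the third column of the input file would show this. The sketch below assumes the three-column layout (position, allele count, sample size); the function name is hypothetical.

```python
import io
from collections import Counter


def sample_size_counts(handle):
    """Count how often each sample size (third column) occurs in the input."""
    counts = Counter()
    for line in handle:
        fields = line.split()
        if len(fields) >= 3:
            counts[int(float(fields[2]))] += 1
    return counts


if __name__ == "__main__":
    demo = io.StringIO("101\t2\t48\n250\t1\t3\n377\t5\t48\n")
    for size, n_rows in sorted(sample_size_counts(demo).items()):
        print(size, n_rows)
```

Rows with a sample size of 3 or less would be the ones to look at first.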

Outgroup selection

Hi,

Thanks for designing this awesome software.

I have some questions regarding how to choose the outgroup.

  1. What is the principle of outgroup selection and what is the maximum divergence time?
  2. Should I take the mapping ratio (when I map the outgroup to my species) into account when I choose the outgroup?
  3. If my species has a sister species, the assumption should be the balancing selection is only acting on my species, not the sister species, is it correct?

Best,
Xiaomeng

Error, missing data in the EPO file

Hello.

I have an issue with calling ancestral alleles.

The command line was same as the tutorial:
glactools vcfm2acf --onlyGT --epo all.epo.gz --fai human_g1k_v37.fasta.fai test.vcf > test.acf.gz

The error message was
Error, missing data in the EPO file

Could you explain what this means?

Thanks.

chromosome information and beta score is none

Hi
I have two questions when I use the software.

  1. When calculating beta for a whole genome, does it need to be calculated per chromosome? The output contains only the position, not the chromosome, so positions on different chromosomes would not be distinguishable. Or does the statistic use only the position, without needing chromosome information?
  2. The following error occurred while running: some beta scores in the output are None. How can I solve it?
    image


Thanks

Negative value

Hi,
Thank you for the friendly script!
I found that a large proportion of the Beta1* values in the output are negative. Is this normal?
Thank you!

high beta score values

Dear Katie,

Thank you very much for the BetaScan program! It's super fast and really handy.

I've recently used the program to detect balancing selection at some SNPs of interest, which are known to carry non-synonymous substitutions.
Since I was struggling to use an outgroup, I used the beta1 score.

I have a VCF of 60 samples from a single population. I filtered the VCF to i) remove indels, ii) keep biallelic sites, iii) apply a MAF threshold of 0.05, and iv) allow missing data in up to 20% of the population.

Following your tutorial, I used glactools to generate the input file for each chromosome. After running the program I got some really high beta scores in intergenic regions, but not for my SNPs of interest. I tried several window sizes, but the extremely high scores are still there. The high scores are ~40 to 50 with -w 1000. The scores of my SNPs of interest are around 4 to 5 with -w 1000, values quite similar to those in your paper (and others), but when I take the suggested top 1% of all scores, my SNPs of interest end up far down the list because of the other high scores.
When I use a smaller window size (e.g. 200 bp), the high scores drop to ~7 to 8, but my SNPs' scores also drop (~1.8).

Also, I tried to calculate z-scores for significance, but these high scores are throwing my SNPs under the bus.

Any suggestions for these high values, please? Or is it normal to get such values?

Thanks!
best,
George
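
Because a z-score depends on the mean and standard deviation, a handful of extreme scores can shift it strongly. A rank-based summary is less sensitive to them. The sketch below shows the top 1% cutoff mentioned in the post and an empirical rank for a single score; it is an illustration only, the function names are made up, and it does not address why the extreme scores arise.

```python
def top_fraction_threshold(scores, fraction=0.01):
    """Smallest score that still belongs to the top `fraction` of all scores."""
    ordered = sorted(scores, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return ordered[k - 1]


def empirical_rank(score, scores):
    """Fraction of all scores that are greater than or equal to `score`."""
    scores = list(scores)
    return sum(s >= score for s in scores) / len(scores)


if __name__ == "__main__":
    genome_wide = [0.1 * i for i in range(1, 1001)]  # toy data
    print(top_fraction_threshold(genome_wide))
    print(empirical_rank(4.5, genome_wide))
```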

Unexpected results

Dear Katherine,

We recently used Betascan2 on the datasets generated by Singhal et al. 2015 (https://science.sciencemag.org/content/350/6263/928.abstract). This dataset is composed of 19 Zebra finch individuals sequenced at a coverage >10X.

When we applied Betascan2 (both the folded and unfolded versions), I was surprised to see that the SNPs with the highest Beta* statistics had, on average, a lower frequency than the rest of the genome (typically a frequency of 3% vs 5%).

Do you have any idea of what is going on? Do you have any advice on what we should check to see if the program is running correctly?

Thank-you for your help,
Benoit Nabholz

B2 error: TypeError: a float is required

Hi ksiewert,

I ran into 'TypeError: a float is required' when calculating B2:
Traceback (most recent call last):
  File "BetaScan.py", line 613, in <module>
    main()
  File "BetaScan.py", line 604, in main
    output.write(str(loc)+"\t"+str(round(B,6))+"\t"+str(round(T,6))+"\n")
TypeError: a float is required

and I tried adding the following right before line 604 in the code,
print loc
print B
print type(loc)
print type(B)

and got this output:
<type 'int'>
<type 'numpy.float64'>
463810
0
<type 'int'>
<type 'int'>
464483
0
<type 'int'>
<type 'int'>
464542
0
<type 'int'>
<type 'int'>
465000
None
<type 'int'>
<type 'NoneType'>
Traceback (most recent call last):
  File "BetaScan2.py", line 618, in <module>
    main()
  File "BetaScan2.py", line 607, in main
    output.write(str(loc)+"\t"+str(round(B,6))+"\n") #Remove thetas
TypeError: a float is required

Also, I found that some B2 values are very high in the later part of the output:
340886 0.406094
341043 -0.703357
341255 -1.737664
341366 -1.158691
341514 3997.273195
342070 9092.164337
342131 9775.129148

I would appreciate any suggestions.

Thanks for your help,

G.L

The value of Beta1*_std is missing

Hi
I ran the command: python2 BetaScan.py -i output.beta.txt.gz -onewin -std -theta 0.157 -fold -o betascores-std.txt, and the result looks like this:
Position Beta1* Beta1*_std
315639.0 -0.761294
315663.0 0.829734
315677.0 1.213546
315681.0 -0.746679
315685.0 1.253881
315686.0 1.159774
315689.0 0.97459
315690.0 1.25847
315705.0 1.256603
315715.0 1.249427
315725.0 -0.893911
315740.0 0.600665
315953.0 1.309721
316007.0 -0.764854
316008.0 1.27593

The Beta1*_std values are missing; only the Beta1* values are shown.

Best

Heng

divtime unit

Hi,
What is the unit of the divergence time (DivTime) between the outgroup and the target group? For example, does -DivTime 12.5 mean that the divergence was 125,000 years ago? Thanks

input file

Hello, I ran into some problems installing glactools. Could you provide a script to transform a VCF into the SNP frequency file? Thanks
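
As a rough starting point, the conversion can be sketched in plain Python. This is not the authors' script and makes several assumptions: the output is the three-column layout discussed elsewhere in these issues (position, allele count, sample size), only biallelic SNPs are kept, and the ALT allele is the one counted. ALT is not necessarily the derived allele, so for unfolded analyses the counts would still need to be polarised with ancestral-allele information.

```python
import io

BASES = {"A", "C", "G", "T"}


def vcf_to_beta_rows(handle):
    """Yield (position, alt_allele_count, called_chromosomes) per biallelic SNP.

    Hypothetical helper: counts ALT alleles from the GT field and skips
    missing genotypes, multi-allelic sites, indels and monomorphic sites.
    """
    for line in handle:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype columns
        pos, ref, alt, fmt = fields[1], fields[3], fields[4], fields[8]
        if ref not in BASES or alt not in BASES:
            continue
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        gt_index = keys.index("GT")
        alt_count = called = 0
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_index >= len(parts):
                continue
            for allele in parts[gt_index].replace("|", "/").split("/"):
                if allele in ("0", "1"):
                    called += 1
                    alt_count += allele == "1"
        if 0 < alt_count < called:
            yield int(pos), alt_count, called


if __name__ == "__main__":
    demo = io.StringIO(
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\n"
        "chr1\t4827\t.\tG\tA\t50\tPASS\t.\tGT\t0/1\t1/1\t./.\n"
    )
    for row in vcf_to_beta_rows(demo):
        print("\t".join(map(str, row)))
```

The output should be written per chromosome, since the rows carry positions only.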

compare two populations

Hello,
If I want to compare the level of balancing selection between two populations, can I compare the mean of beta 1 or beta 2?

Thanks

Beta score threshold

Hi,
This is a very impressive method for detecting balancing selection. I am trying to use it on Drosophila data.
I wonder if there is a threshold for the Beta score above which an allele can be considered to be under balancing selection?
Thanks!

Problems encountered in calculating B2

Hi, I'm trying to calculate B2 using Betascan. However, when I did the last step, I encountered the following error:

Traceback (most recent call last):
  File "/public/home/fan_lab/shali/BetaScan/BetaScan/BetaScan.py", line 613, in <module>
    main()
  File "/public/home/fan_lab/shali/BetaScan/BetaScan/BetaScan.py", line 548, in main
    freq = freqCount/sampleN
ZeroDivisionError: float division by zero

The commands I used were basically the same as the tutorials you provided, except that I replaced the file with my own file.

python BetaScan.py -B2 -DivTime 12.5 -i chr22.beta.gz -o chr22.betascores.txt

Any suggestions for this error? I'm looking forward to your reply. Thanks in advance!

question about input file

Hi, I know that glactools can translate a VCF into the input file, but I can't install glactools because of the version of my system. So I want to write a script to translate a VCF into the input file, but I don't understand the first column of the input file. Can you tell me what 'the coordinate of each variant' means? Is that the variant ID in the VCF file?
Thank you very much.

awk code for generating input files

Hey Katie!

Am using your awk code (you emailed me before) for generating the input files from the vcfs and think there's a typo:

awk -F "\t|:" '(NR>1) && ($6!='0') && ($6!='1') && ($3=='2') {OFS="\t"; print$2,$6*$4,$4}' chr1.AA.APHP.out.frq

should be:

awk -F "\t|:" '(NR>1) && ($6!='0') && ($6!='1') && ($3=='2') {OFS="\t"; print$2,$8*$4,$4}' chr1.AA.APHP.out.frq

just to demonstrate:
the vcf line looks like this:
chr1 4827 chr1_4827 G A 38420.2 PASS AA=G

the vcftools freqs line looks like this:
chr1 4827 2 24 G 0.416667 A 0.583333

so A is the derived allele, and therefore the code should be column 8* sample size?

4827 14 24

not
4827 10 24

thanks!
josie
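
The corrected awk command can be mirrored in Python. The sketch below follows the reasoning in this post, assuming a vcftools --freq line with tab-separated fields in which each allele is written as ALLELE:FREQ, and assuming that the second listed allele is the one to count (the derived allele in the example above). The function name is made up, and the count is rounded to an integer, which the awk version does not do.

```python
def frq_line_to_beta(line):
    """Convert one vcftools .frq line to (position, allele_count, sample_size).

    Returns None for the header, for sites that are not biallelic, and
    for sites where the counted allele is absent or fixed.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6 or not fields[1].isdigit():
        return None  # header or malformed line
    pos, n_alleles, n_chr = int(fields[1]), int(fields[2]), int(fields[3])
    if n_alleles != 2:
        return None
    freq = float(fields[5].split(":")[1])  # second ALLELE:FREQ entry
    if freq in (0.0, 1.0):
        return None
    return pos, round(freq * n_chr), n_chr


if __name__ == "__main__":
    example = "chr1\t4827\t2\t24\tG:0.416667\tA:0.583333\n"
    print(frq_line_to_beta(example))
```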
