Comments (11)
This is one of the weird things about bigWig files. Statistics often aren't computed on the actual values, but on values in a zoom level. If you use bigWigSummary from the command line, what sort of value do you get?
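To make the zoom-level effect concrete, here is a minimal sketch in plain Python. It is a simplified illustration, not libBigWig's actual code: the helper names, the single-sum bin layout, and the assumption that a bin's signal is spread evenly over its bases are all hypothetical.

```python
def build_zoom(values, bin_size):
    """Summarise per-base values into (start, end, sum) bins, roughly as a zoom level would."""
    return [
        (i, min(i + bin_size, len(values)), sum(values[i:i + bin_size]))
        for i in range(0, len(values), bin_size)
    ]


def zoom_mean(zoom, start, end):
    """Approximate the mean over [start, end) using only the bin summaries."""
    total = 0.0
    for b_start, b_end, b_sum in zoom:
        overlap = min(end, b_end) - max(start, b_start)
        if overlap > 0:
            # Assumption: the bin's signal is spread evenly across its bases.
            total += b_sum * overlap / (b_end - b_start)
    return total / (end - start)


def exact_mean(values, start, end):
    """Mean of the actual per-base values in [start, end)."""
    return sum(values[start:end]) / (end - start)


values = [0.0] * 8 + [1.0] * 8          # hypothetical per-base signal
zoom = build_zoom(values, bin_size=16)  # one 16-base summary bin

print(zoom_mean(zoom, 8, 16))    # 0.5, smeared by the coarse bin
print(exact_mean(values, 8, 16)) # 1.0, the true mean of those 8 bases
```

When the queried range lines up with whole bins the two agree; they diverge when the range covers only part of a bin.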
from pybigwig.
I guess I can just check myself:
$ bigWigSummary http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign75mer.bigWig chr1 89294 91629 1
0.201209
And in python:
>>> import pyBigWig
>>> bw = pyBigWig.open("http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign75mer.bigWig")
>>> bw.stats('chr1', 89294, 91629)
[0.20120902053804418]
So I get the same values for both.
Stumbled upon the same - I think the behavior a naive user expects is that if I ask for an array of 100 base pairs from a bigWig, the stats command should give me the mean of those 100 base pairs and not something dependent on zoom levels. It should at least be made very clear in the documentation.
For now I just ask for the array and wrap it in numpy.mean, no real loss of performance.
Real nice library aside from that, 10 times the performance I was getting with bx-python :)
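For reference, the workaround described above might look roughly like this. It is a sketch, not the commenter's actual code: `mean_from_values` is a hypothetical helper, it uses only the standard library instead of numpy.mean, and it assumes `bw` is an open pyBigWig handle whose `values()` call returns one float per base, with NaN for uncovered bases. Skipping NaNs and returning None for an uncovered range are choices made for this sketch.

```python
import math


def mean_from_values(bw, chrom, start, end):
    """Mean of the per-base values in [start, end), ignoring uncovered (NaN) bases."""
    values = bw.values(chrom, start, end)
    covered = [v for v in values if not math.isnan(v)]
    # Sketch choice: return None when no base in the range has a value.
    return sum(covered) / len(covered) if covered else None
```

With a real file this would be called as `mean_from_values(bw, "chr1", 89294, 91629)` after `bw = pyBigWig.open(...)`.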
I'll definitely update the documentation for that, since I too found this weird when I wrote libBigWig. I've actually been meaning to create an exact_stats() function, which would do what you want. Given how libBigWig is implemented, that should be fairly straightforward. I'll leave this open as a reminder to do that!
BTW, if you have a better name in mind then just let me know. I could alternatively just add an option to the current stats function to do this (whatever others find the most useful is fine by me).
I have an exact_stats branch that adds the exact option to the stats command. That needs some more testing (and documentation), but should hopefully handle this. The invocation would be:
bw.stats("chr1", 1, 1000000, exact=True)
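A quick way to see the difference on a given region is to call stats both ways. The sketch below assumes a pyBigWig build that includes the exact option (per this thread, release 0.2.6 or the exact_stats branch); `compare_means` is a hypothetical helper name, and `bw` is assumed to be an already opened file handle.

```python
def compare_means(bw, chrom, start, end):
    """Return (zoom-level mean, exact mean) for one region of an open bigWig."""
    # Default call: may be computed from zoom-level summaries.
    approx = bw.stats(chrom, start, end)[0]
    # exact=True: computed from the underlying per-base values.
    exact = bw.stats(chrom, start, end, exact=True)[0]
    return approx, exact
```

For small regions, or regions that happen to align with the zoom bins, the two numbers may be identical, as in the bigWigSummary comparison earlier in the thread.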
Thanks for the quick response and clarifications (I indeed expected the naive behavior) and the patch. I ran a few tests and it works fine for me. This runs significantly faster (3-4x) than wrapping with np.mean, a nice improvement when running over many regions.
New branch working for me too - 2-3 times faster than wrapping with np.mean, thanks a lot!
I've pushed a small change that should increase the precision of this a bit (I'd be surprised if there's any speed change). I've also added a section to the readme. I'll continue mulling this over for a day or two and then merge into the master branch and make a new release if there's nothing new that needs to be added.
I'm hoping to release version 0.2.6 today (assuming Travis CI starts working again), which will include this.
This is now in release 0.2.6. I'll try to get this release on PyPI and in conda tonight. Thanks for the suggestion.