Comments (9)
This would be pretty awesome. It would potentially save a lot of hard drive space.
from mne-python.
also +1 for this
I had the same idea a while ago. My hope was that using compression would also make reading and writing faster when using NFS volumes. I did an experiment; in my tests it was very slow, but maybe there is a way to make it faster :).
Looks like a cool Sunday hack, as I like to call it :)
How much slower? Do you know?
Looks like bzip2 is an order of magnitude slower than gzip, so gzip might be the way to go even though it results in slightly larger files, e.g.:
http://tukaani.org/lzma/benchmarks.html
Although it might also make sense just to implement both while we're at it, and suggest that users use gzip for speed. Thoughts?
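The speed/ratio trade-off discussed above is easy to check locally. Here is a quick stdlib sketch comparing gzip and bzip2 on the same buffer; the data is synthetic (partly random, partly zeros), so real FIF buffers would give different numbers, but the relative ordering tends to hold.

```python
import bz2
import gzip
import os
import time

# Synthetic buffer: 1 MB of incompressible random bytes plus 4 MB of
# zeros, standing in for real data (real FIF buffers behave differently).
data = os.urandom(1_000_000) + b"\x00" * 4_000_000

for name, mod in [("gzip", gzip), ("bz2", bz2)]:
    t0 = time.perf_counter()
    compressed = mod.compress(data)
    dt = time.perf_counter() - t0
    print(f"{name}: {len(data)} -> {len(compressed)} bytes in {dt:.3f} s")
```

Running this on representative raw files would give a more honest answer than published benchmark pages, since the compressibility of MEG/EEG data is what actually matters here.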
Since that site used a stronger compression option for bzip2 (which is unfair), consider this site, which didn't:
http://blog.terzza.com/linux-compression-comparison-gzip-vs-bzip2-vs-lzma-vs-zip-vs-compress/
Looks like compression speed is pretty similar, but decompression (where I'd imagine we'd use it most, for raw files and the like) is faster for gzip and zip...
you should read this:
http://gael-varoquaux.info/blog/?p=159
That is interesting. After reading it, I lean toward using gzip. It looks like zlib1 or zlib3 tended to do the best, and from what I understand those are abstractions of the same compression scheme used in the gzip format. The conclusions drawn from that site were:
...
4. Depending on the size of the data, it may be more efficient to store subsets in different files: it introduces ‘chunk’ that avoid filling in the memory too much (parameter cache_size in joblib’s code). In addition, data of a same nature tends to compress better.
5. The I/O stream or file object interfaces are abstractions that can hide the data movement and the creation of large temporaries. After experiments with GZipFile and StringIO/BytesIO I found it more efficient to fall back to passing around big buffer object, numpy arrays, or strings.
6. For reasons 4 and 5, I ended up avoiding the gzip module: raw access to the zlib with buffers gives more control. This explains a good part of the differences in read speed for pure arrays with numpy’s
...
However, for us, I imagine 4 won't be a big issue. When data are loaded with preload=True (which I imagine would satisfy most use cases), issue 5 shouldn't matter. We might have to see what happens without it...
In any case, the thing I like about gzip compared to some of the other options (using numpy's routines or compressing matrices individually and saving those) is that it maintains compatibility in the sense that, if I want, I can use a GUI to decompress the file and look at it. If we stored it in some other format, then a user wouldn't be able to do that. I would have no problem saving my data long-term in .gz format, but I'm not sure I'd be comfortable doing it with a format I'd have to go into python (or more likely, mne-python) to extract.
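Point 6 from the blog post (calling zlib directly on buffers rather than going through the gzip module's file-object interface) can be sketched in a few lines. This is only an illustration of the technique, not how mne-python would store things; the array here is an arbitrary stand-in for a data buffer.

```python
import zlib

import numpy as np

# Compress a numpy array's raw bytes with zlib directly, skipping the
# gzip module's stream/file abstraction and its temporaries.
arr = np.linspace(0.0, 1.0, 100_000)

# Level 1 corresponds roughly to the "zlib1" setting in the blog post.
packed = zlib.compress(arr.tobytes(), 1)

# Decompress straight back into an array of the same dtype and shape.
restored = np.frombuffer(zlib.decompress(packed), dtype=arr.dtype)
assert np.array_equal(arr, restored)
```

Note the trade-off raised in the comment above: a raw zlib stream like this is not a `.gz` file, so standard GUI tools could not expand it; wrapping the same bytes in the gzip container keeps that compatibility at a small speed cost.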
I would investigate two approaches:
1/ indeed .gz files that you can expand on any system
2/ or just gzip the data buffers and create a FIF tag for compressed float buffer matrices.
The issue with 2/ is that mne and Neuromag tools won't be able to read it.
I guess let's see if 1/ works (compresses well and is fast enough on real data).
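Approach 1/ amounts to passing the whole FIF byte stream through gzip, so the result is an ordinary `.gz` file that `gunzip` or any GUI archiver can expand back to the original. A minimal stdlib sketch (the payload and filename here are stand-ins, not real FIF contents):

```python
import gzip
import os
import tempfile

# Stand-in for the bytes of a real FIF file.
payload = b"FIF" + bytes(500_000)

# Write the stream through gzip; the result is a standard .gz file.
path = os.path.join(tempfile.gettempdir(), "example_raw.fif.gz")
with gzip.open(path, "wb") as f:
    f.write(payload)

# Reading it back recovers the original byte stream exactly.
with gzip.open(path, "rb") as f:
    restored = f.read()
assert restored == payload
```

Because the container is plain gzip, a reader could also decompress on the fly while the on-disk file stays openable with standard tools, which is the long-term-storage compatibility argument made above.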