juliaio / HDF5.jl
Save and load data in the HDF5 file format from Julia
Home Page: https://juliaio.github.io/HDF5.jl
License: MIT License
Hi,
Why is memory mapping not usable along with compression? Is it a limitation of HDF5?
It seems to me that this is the best use case for very large files: you want both compression and lazy loading with memory mapping.
Thank you,
Cássio
This is mostly just an enquiry. Is it possible to write to ranges of on disk hdf5 arrays? I need to create a very large array inside an hdf5 file, then write specific portions to it, something like:
file["big_dataset"][:,i,j] = Array
I can do this in python/c using hyperslabs. I'm currently trying to read your code to work out how to do it in julia, but I was wondering if you had thought about this?
The basic steps needed are to select the correct dataspace, and then use this to write from the correct memory space. It looks like your wrapper exposes the necessary interfaces, but I'm still getting my head around reading julia code.
Thanks for any help,
John
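For what it's worth, here is a sketch of the pattern being asked about, creating a large on-disk dataset up front and then writing slices into it, using the d_create/indexing interface that appears in other issues on this page (the dataset name and dimensions are made up for illustration):

```julia
using HDF5

h5open("big.h5", "w") do file
    # Create the full-size dataset on disk without materializing it in memory.
    dset = d_create(file, "big_dataset", datatype(Float64),
                    dataspace((100, 10, 10)))
    # Assigning into an index range selects a hyperslab and writes only
    # that region of the file.
    for j in 1:10, i in 1:10
        dset[:, i, j] = rand(100)
    end
end
```

Indexing the dataset handle (rather than a `read` result) is what keeps the write from touching the rest of the array.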
Trying to save data using the excellent new @save macro, I can't use syntax like @save $i/file.jld x within a for loop, as I get i is not defined. Creating a variable with the content beforehand doesn't help either: path = "$i/file.jld" yields the same problem when trying @save path x. To work around this I have to create path beforehand and then use @save :($path) x. It would be nice if this could be done automagically.
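One workaround that sidesteps the macro-hygiene problem entirely is to skip @save and use the functional jldopen/write interface shown elsewhere in these issues, where the dataset name and path are ordinary runtime strings (a sketch; the directory layout is made up):

```julia
using HDF5, JLD

x = rand(5)
for i in 1:3
    # Make sure the per-iteration directory exists.
    isdir(string(i)) || mkdir(string(i))
    # The path is built as a plain string, so no macro-time symbol
    # resolution is involved.
    jldopen("$i/file.jld", "w") do file
        write(file, "x", x)
    end
end
```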
Perhaps the DataFrames test can be conditionally run if DataFrames is actually installed? You can use Pkg.installed("DataFrames") to check.
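A minimal sketch of the suggested guard; it assumes Pkg.installed returns nothing for packages that aren't installed (the convention in Julia of this era), which is worth double-checking:

```julia
# Run the DataFrames-dependent tests only when the package is available.
if Pkg.installed("DataFrames") != nothing
    include("jld_dataframe.jl")
else
    println("DataFrames not installed; skipping DataFrame tests")
end
```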
I'm seeing something very similar to issue #29 when trying to read a DataFrame from a JLD file. The error I get upon read(jldfile, objname) is the following:
[thebe:skim@jp/skimming]$ julia -F load.jl tw.jld names=df
HDF5-DIAG: Error detected in HDF5 (1.8.12) thread 0:
#000: H5A.c line 557 in H5Aopen(): unable to load attribute info from object header for attribute: 'TypeParameters'
major: Attribute
minor: Unable to initialize object
#001: H5Oattribute.c line 537 in H5O_attr_open_by_name(): can't locate attribute: 'TypeParameters'
major: Attribute
minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.8.12) thread 0:
#000: H5A.c line 1400 in H5Aget_name(): not an attribute
major: Invalid arguments to routine
minor: Inappropriate type
ERROR: Error getting attribute name
in h5a_get_name at /home/joosep/.julia/HDF5/src/plain.jl:1825
while loading /home/joosep/singletop/stpol2/src/skim/load.jl, in expression starting on line 8
Mysteriously, this appears only when the file is written on SL6 with a specific environment which does not affect which libhdf5 gets loaded, and reading only fails on SL6. When transferred to an OSX machine, reading the file works.
I have attached a file exhibiting this behaviour here: http://hep.kbfi.ee/~joosep/hdf5_read_fail.jld.bz2
Any suggestions on where to start looking? It seems like the jld file is missing some expected structure in this case.
Here is the message:
julia> Pkg.build("HDF5")
INFO: Building Homebrew
From https://github.com/staticfloat/homebrew
* branch kegpkg -> FETCH_HEAD
HEAD is now at d92b125 Quash warning about Fink/Macports
From https://github.com/staticfloat/homebrew-juliadeps
* branch master -> FETCH_HEAD
HEAD is now at 5b22ea4 Update bottles for glpk 4.52
INFO: Building HDF5
================================[ ERROR: HDF5 ]=================================
None of the selected providers can install dependency hdf5
at /Users/dhlin/.julia/HDF5/deps/build.jl:33
================================================================================
================================[ BUILD ERRORS ]================================
WARNING: HDF5 had build errors.
- packages with build errors remain installed in /Users/dhlin/.julia
- build a package and all its dependencies with `Pkg.build(pkg)`
- build a single package by running its `deps/build.jl` script
================================================================================
For some reason, Homebrew doesn't install hdf5.
This might be another 32 bit issue that I have run into, but I simply cannot get this library to read in some of the serialized files I have. I would have stuck with the MAT file support in the affiliated library, but I appear to have one of the unsupported file versions. (assuming I'm using the lib right).
At this point I can manage to get some segfaults which usually appear in the GC stage.
Looking at some gdb asm output, it looked like it was trying to traverse a NULL pointer, so perhaps some important data structures are getting smashed.
The closest I have gotten to identifying the dangerous call (all generated via read(file["var"]["value"])) is the following valgrind trace.
I have still not been able to decipher the responsible julia level read() as I'm still quite a newbie in the julia environment.
==10261== Syscall param read(buf) points to unaddressable byte(s)
==10261== at 0x539030E: __read_nocancel (in /lib/libpthread-2.15.so)
==10261== by 0x7DAC00A: ??? (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7DA3D16: H5FD_read (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7D90395: H5F_accum_read (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7D94680: H5F_block_read (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7D65ECF: ??? (in /usr/lib/libhdf5.so.7.0.4)
==10261== by 0x7F2E94F: H5V_opvv (in /usr/lib/libhdf5.so.7.0.4)
==10261== Address 0x7646be0 is 0 bytes after a block of size 8,208 alloc'd
==10261== at 0x402928A: memalign (vg_replace_malloc.c:694)
==10261== by 0x40292D8: posix_memalign (vg_replace_malloc.c:835)
==10261== by 0x415C73A: allocobj (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x41515CE: jl_alloc_array_1d (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x5BEB9AB: ???
==10261== by 0x4115948: jl_apply_generic (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x5BE2D2C: ???
==10261== by 0x5BE0844: ???
==10261== by 0x4115948: jl_apply_generic (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x5BF2FA4: ???
==10261== by 0x4115948: jl_apply_generic (in /home/mark/mytmp/julia/usr/lib/libjulia-release.so)
==10261== by 0x5BEA2B3: ???
Other noted issue:
h5g_get_objname_by_idx uses C_int as the type of its second argument, which works on 64 bit, but fails on 32, as its real type is defined to be unsigned long long in the headers. I have not spotted any other functions with the same issue, but it is possible that some are the source of the problems.
Example file:
http://fundamental-code.com/tmp/octave.hdf5
(contains [1; 2; 3; 4] in what appears to be "/foobar/value")
reading this data results in a 0x1 array
Relevant versions:
julia - git updated about a day ago
HDF5 - git current
libhdf5 - 1.8.10-patch1
octave - 3.6.3
On case-insensitive file systems, it's a problem that hdf5.jl and HDF5.jl differ only by case. On OS X, I only got the former when I cloned the repository.
Would it be possible to make a package LittleJLDReader.jl (a faster, smaller version of JLD, for reading data only)? The idea is to use LittleJLDReader.jl to speed up loading of packages that ship with a lot of data: https://groups.google.com/forum/?hl=es&fromgroups=#!searchin/julia-dev/load/julia-dev/_FmHG9UAls8/YaZuW70AVU4J
Right now, using HDF5 takes almost 2 seconds just to load. That is the time I'm trying to avoid: generating a parser and running it on the file takes 0.5 seconds in total.
For an empty array, Matlab writes the array dimensions as the dataset, and then adds an attribute MATLAB_empty. Check for this and do the right thing.
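A sketch of the suggested check when reading a MATLAB-written dataset. The attribute name comes from the issue; the exists/attrs calls are assumptions about this wrapper's attribute API, and the reconstruction of the empty array is one possible "right thing":

```julia
using HDF5

# Hypothetical reader for MATLAB-produced datasets: if the MATLAB_empty
# attribute is present, the dataset itself holds the array's dimensions,
# so rebuild an empty array with those (zero) extents instead of
# returning the dimension vector as data.
function read_matlab_dataset(dset)
    if exists(attrs(dset), "MATLAB_empty")
        dims = read(dset)            # MATLAB wrote the dimensions as the data
        return zeros(int(dims)...)   # array with the stored (empty) shape
    end
    read(dset)
end
```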
julia> using HDF5, JLD
julia> f = jldopen("test.jld", "w");
julia> g = g_create(f, "group1");
julia> write(g, "x", {1})
julia> close(g)
julia> g = g_create(f, "group2");
julia> write(g, "x", {2})
HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 139760863500096:
#000: ../../../src/H5G.c line 303 in H5Gcreate2(): unable to create group
major: Symbol table
minor: Unable to initialize object
#001: ../../../src/H5Gint.c line 194 in H5G__create_named(): unable to create and link to group
major: Symbol table
minor: Unable to initialize object
#002: ../../../src/H5L.c line 1638 in H5L_link_object(): unable to create new link to object
major: Links
minor: Unable to initialize object
#003: ../../../src/H5L.c line 1882 in H5L_create_real(): can't insert link
major: Symbol table
minor: Unable to insert object
#004: ../../../src/H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#005: ../../../src/H5Gtraverse.c line 755 in H5G_traverse_real(): component not found
major: Symbol table
minor: Object not found
ERROR: Error creating group ///_refs/group2/x
in h5g_create at /home/simon/.julia/HDF5/src/plain.jl:1715
in write at /home/simon/.julia/HDF5/src/jld.jl:466
in write at /home/simon/.julia/HDF5/src/jld.jl:507
It appears that recent versions of julia fail to load plain.jl properly due to a type difference in a default constructor.
Minimal patch required to get HDF5 to load is below:
http://fundamental-code.com/tmp/0001-Fixes-no-method-Hvl_t-Uint64-Ptr-None-error.patch
Would be nice to have Matlab-like h5read and h5write commands (in module HDF5) to read/write a single numeric-array dataset in an HDF5 file, analogous to dlmread/dlmwrite, without having to go to the trouble of h5open etcetera.
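Such commands could be thin wrappers over h5open; here is a sketch of one possible shape (not the eventual API, and the truncate-vs-append mode choice is a design question flagged in the comment):

```julia
using HDF5

# Write a single dataset to a file in one call.
function h5write(filename, name, data)
    h5open(filename, "w") do file   # "w" truncates; "r+" would add to an existing file
        write(file, name, data)
    end
end

# Read a single dataset from a file in one call.
function h5read(filename, name)
    h5open(filename, "r") do file
        read(file, name)
    end
end

# usage
h5write("test.h5", "A", rand(3, 3))
A = h5read("test.h5", "A")
```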
When I (dumbly) tried to save an unbound array:
using HDF5
using JLD
fid = jldopen("/tmp/test.jld","w")
@write fid rand(0:1,50000)
close(fid)
It worked fine (no errors reported, and the file size indicates that the numbers are in there). But I could not manage to retrieve it again, and couldn't even see its existence:
fidr = jldopen("/tmp/test.jld","r+")
dump(fidr)
@read fidr rand(0:1,50000)
close(fidr)
results in:
julia> include("/home/bana/GSP/code/julia/h5.jl")
JldFile
id: Int32 16777216
filename: ASCIIString "/tmp/test.jld"
version: ASCIIString "0.0.0"
toclose: Bool true
writeheader: Bool true
ERROR: syntax: malformed function argument (: 0 1)
in include_from_node1 at loading.jl:76
at /home/bana/GSP/code/julia/h5.jl:10
I realize this is more user error, but perhaps the @write macro should refuse to write unless it is given a valid string? Or maybe print a warning?
Thanks again!
(Repost from julia-users)
Hi all. I'm running Julia 0.2.0 on Mac OS X 10.9, and I'm having this issue building HDF5:
julia> Pkg.build("HDF5")
INFO: Building Homebrew
HEAD is now at c588ffb Remove git rebasing code that slipped through
HEAD is now at b6300f3 Update nettle bottle
INFO: Building HDF5
============================================================[ ERROR: HDF5 ]=============================================================
Provider PackageManager failed to satisfy dependency libhdf5
at /Users/john/.julia/HDF5/deps/build.jl:30
============================================================[ BUILD ERRORS ]============================================================
WARNING: HDF5 had build errors.
- packages with build errors remain installed in /Users/john/.julia
- build a package and all its dependencies with `Pkg.build(pkg)`
- build a single package by running its `deps/build.jl` script
=========================================================================================
The offending dependency is nowhere to be found:
julia> dlopen("libhdf5")
ERROR: could not load module libhdf5: dlopen(libhdf5.dylib, 1): image not found
in dlopen at c.jl:29
Any help appreciated!
When trying to load a file with @load, I'm getting:
WARNING: contains(collection, item) is deprecated, use in(item, collection) instead
in contains at reduce.jl:238
with julia master/1d8228c and HDF5 5b468093.
If I download the file via browser and place it manually, then it works on the second try.
Joaquim
julia> Pkg.add("HDF5")
INFO: Cloning cache of HDF5 from git://github.com/timholy/HDF5.jl.git
INFO: Installing HDF5 v0.2.14
...
Connecting to ia601003.us.archive.org|207.241.227.33|:443... connected.
ERROR: Certificate verification error for ia601003.us.archive.org: self signed certificate in certificate chain
To connect to ia601003.us.archive.org insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.
Currently, attempting to write an empty string fails because it's illegal to call H5Tset_size with size == 0. AFAICT, we can store an empty string either as a null-terminated string consisting only of a null character, or using the null dataspace. If we use the null dataspace, then we serialize "" the same way we presently serialize ASCIIString[], although we could use the julia type attribute to distinguish the two in JLD. Thoughts?
I'm trying to use the HDF5 package to write an array of DataFrames, but I'm having some problems.
Trying to run the jld_dataframe.jl test gives me the following error on OS X 10.8.4 with julia master/9c392b7*, HDF5 f27612c88 and DataFrames 859f3272.
Cassios-iMac:test cassio$ julia jld_dataframe.jl
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
#000: H5A.c line 557 in H5Aopen(): unable to load attribute info from object header for attribute: 'TypeParameters'
major: Attribute
minor: Unable to initialize object
#001: H5Oattribute.c line 537 in H5O_attr_open_by_name(): can't locate attribute: 'TypeParameters'
major: Attribute
minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
#000: H5A.c line 1400 in H5Aget_name(): not an attribute
major: Invalid arguments to routine
minor: Inappropriate type
ERROR: Error getting attribute name
in h5a_get_name at /Users/cassio/.julia/HDF5/src/plain.jl:1743
in h5a_get_name at /Users/cassio/.julia/HDF5/src/plain.jl:1793
in h5a_open at /Users/cassio/.julia/HDF5/src/plain.jl:1743
in a_read at /Users/cassio/.julia/HDF5/src/plain.jl:948
in read at /Users/cassio/.julia/HDF5/src/jld.jl:320
in read at /Users/cassio/.julia/HDF5/src/jld.jl:173
in read at /Users/cassio/.julia/HDF5/src/plain.jl:960
in include_from_node1 at loading.jl:92
in process_options at client.jl:274
in _start at client.jl:352
at /Users/cassio/.julia/HDF5/test/jld_dataframe.jl:16
On Mac OS 10.7, when I run Pkg.build("HDF5") I get:
julia> Pkg.build("HDF5")
INFO: Building HDF5
==> Installing hdf5 dependency: szip
==> Downloading http://archive.org/download/julialang/bottles/szip-2.1.lion.bottle.tar.gz
==> Pouring szip-2.1.lion.bottle.tar.gz
? /Users/bjohnson/.julia/Homebrew/deps/usr/Cellar/szip/2.1: 9 files, 136K
==> Installing hdf5
==> Downloading http://archive.org/download/julialang/bottles/hdf5-1.8.11.lion.bottle.tar.gz
==> Pouring hdf5-1.8.11.lion.bottle.tar.gz
? /Users/bjohnson/.julia/Homebrew/deps/usr/Cellar/hdf5/1.8.11: 119 files, 9.9M
================[ ERROR: HDF5 ]===================
Provider PackageManager failed to satisfy dependency hdf5
at /Users/bjohnson/.julia/HDF5/deps/build.jl:33
==================================================
I suppose there are good reasons for restricting the variable names to ASCIIString (possibly the HDF5 format specs), but I figured I'd ask anyway:
julia> jldopen("tst.jld", "w") do f
write(f, "a", randn(10,10)); # works
end
julia> jldopen("tst.jld", "w") do f
write(f, "ä", randn(10,10)); # oops
end
ERROR: no method write(JldFile, UTF8String, Array{Float64,2})
Any chance this could work by simply relaxing the signature to ByteString?
using HDF5
function write_macro(max_dim)
x = rand(max_dim)
fid = h5open("test.h5","w")
@write fid x
close(fid)
end
function write_simple(max_dim)
fid = h5open("test.h5","w")
d = d_create(fid, "b", datatype(Float64), dataspace((max_dim,)))
d[1:max_dim]=rand(max_dim)
close(fid)
end
for j = 1:10
print("write macro $j: ")
@time(write_macro(int(10^6)))
end
for j = 1:10
print("write simple $j: ")
@time(write_simple(int(10^6)))
end
gives output
write macro 1: elapsed time: 0.32677965 seconds (15224664 bytes allocated)
write macro 2: elapsed time: 0.013648108 seconds (8044228 bytes allocated)
write macro 3: elapsed time: 0.034962227 seconds (8006536 bytes allocated)
write macro 4: elapsed time: 0.03506796 seconds (8006536 bytes allocated)
write macro 5: elapsed time: 0.048584303 seconds (8006536 bytes allocated)
write macro 6: elapsed time: 0.044131843 seconds (8006536 bytes allocated)
write macro 7: elapsed time: 0.03688556 seconds (8006536 bytes allocated)
write macro 8: elapsed time: 0.035140289 seconds (8006536 bytes allocated)
write macro 9: elapsed time: 0.06822681 seconds (8006536 bytes allocated)
write macro 10: elapsed time: 0.0788429 seconds (8006536 bytes allocated)
write simple 1: elapsed time: 0.092585827 seconds (9083076 bytes allocated)
write simple 2: elapsed time: 0.041490742 seconds (8012336 bytes allocated)
write simple 3: elapsed time: 0.038722462 seconds (8012336 bytes allocated)
write simple 4: elapsed time: 0.037058108 seconds (8012336 bytes allocated)
write simple 5: HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
#000: H5D.c line 437 in H5Dget_space(): not a dataset
major: Invalid arguments to routine
minor: Inappropriate type
ERROR: Error getting dataspace
in h5d_get_space at /Users/oneilg/.julia/HDF5/src/plain.jl:1758
in hyperslab at /Users/oneilg/.julia/HDF5/src/plain.jl:1388
in setindex! at /Users/oneilg/.julia/HDF5/src/plain.jl:1373
in write_simple at /Users/oneilg/github/mass_prep/hdf5_crash.jl:12
in anonymous at no file:44
in include_from_node1 at loading.jl:92
at /Users/oneilg/github/mass_prep/hdf5_crash.jl:24
Why does write_simple fail after working 4 times? Also, if I remove write_macro and its loop from the file, it often fails after working only two times.
As long as there is even a single local function defined, @save fails:
julia> using HDF5, JLD
julia> x = 10
10
julia> @save "test.jld"
julia> y(m) = m+1
y (generic function with 1 method)
julia> @save "test.jld"
ERROR: This is the write function for CompositeKind, but the input is of type Ptr{None}
in write_composite at /home/marcusps/.julia/HDF5/src/jld.jl:613
in write at /home/marcusps/.julia/HDF5/src/jld.jl:608
in write at /home/marcusps/.julia/HDF5/src/jld.jl:546
in write_composite at /home/marcusps/.julia/HDF5/src/jld.jl:651
in anonymous at no file
Not really sure what's going on here, but using recent Julia (as of today) and, after Pkg.update(), recent HDF5:
julia> include("/home/bana/.julia/HDF5/test/jld.jl")
WARNING: randi(n,...) is deprecated, use rand(1:n,...) instead.
WARNING: strcat is deprecated, use string instead.
ERROR: Error reading x
in include_from_node1 at loading.jl:76
at /home/bana/.julia/HDF5/test/jld.jl:38
And trying a minimal testcase:
using HDF5
using JLD
fid = jldopen("/tmp/test.jld","w")
A = rand(0:1, 50000)
@write fid A
close(fid)
fidr = jldopen("/tmp/test.jld","r")
dump(fidr)
dump(fidr["A"])
@read fidr A
close(fidr)
results in:
julia> include("/home/bana/GSP/code/julia/h5.jl")
JldFile
id: Int32 16777216
filename: ASCIIString "/tmp/test.jld"
version: ASCIIString "0.0.0"
toclose: Bool true
writeheader: Bool true
HDF5Dataset{JldFile}
id: Int32 83886080
file: JldFile
id: Int32 16777216
filename: ASCIIString "/tmp/test.jld"
version: ASCIIString "0.0.0"
toclose: Bool true
writeheader: Bool true
toclose: Bool true
ERROR: no method ref(Expr,Int64)
in julia_type at /home/bana/.julia/HDF5/src/jld.jl:673
in read at /home/bana/.julia/HDF5/src/jld.jl:165
in read at /home/bana/.julia/HDF5/src/plain.jl:858
in include_from_node1 at loading.jl:76
at /home/bana/GSP/code/julia/h5.jl:12
using HDF5
fid = h5open("test.h5", "w")
d = d_create(fid, "foo", datatype(Float64), ((10,20),(100,200)), "chunk", (1,1))
d[1,1]=4 # Segmentation fault: 11
This should fail more gracefully.
I get the following output when when running test/jld_dataframe.jl
.
I am on the latest HDF5.jl (0d1d50d) and 02e71d9-Linux-x86_64 (2013-06-21 12:47:45)
julia> write(file, "df2", df2)
ERROR: access to undefined reference
in write_composite at /home/chris/.julia/HDF5/src/jld.jl:584
in write at /home/chris/.julia/HDF5/src/jld.jl:545
in write at /home/chris/.julia/HDF5/src/jld.jl:487
in write_composite at /home/chris/.julia/HDF5/src/jld.jl:586
in write at /home/chris/.julia/HDF5/src/jld.jl:545
in write at /home/chris/.julia/HDF5/src/jld.jl:487
in write at /home/chris/.julia/HDF5/src/jld.jl:506
in write at /home/chris/.julia/HDF5/src/jld.jl:487
in write_composite at /home/chris/.julia/HDF5/src/jld.jl:586
in write at /home/chris/.julia/HDF5/src/jld.jl:514
I believe this will reach all who have "starred" this repository...
In preparation for packaging, the name of this repository is changing to HDF5.jl. I believe you can edit your .git/config file (the url line), and you will start tracking it by its new name.
Trying to write nothing to an HDF5 dataset that does not exist yet fails with dset not defined:
while loading In[5], in expression starting on line 2
in write at /Users/rene/.julia/HDF5/src/jld.jl:478
in write at /Users/rene/.julia/HDF5/src/jld.jl:481
Minimal test case:
using HDF5, JLD
jldopen("/tmp/test","w") do a
write(a, "/a/b/c", nothing)
end
The problem is that write(parent::Union(JldFile, JldGroup), name::ASCIIString, n::Nothing) in jld.plain tries to call HDF5Dataset directly, and not through d_create as is done for other data types. d_create would ensure that the entire path /a/b/c gets created first. I tried to come up with a fix for this but got lost...
Just to illustrate, the following code works:
using HDF5, JLD
jldopen("/tmp/test","w") do a
write(a, "/a/b/dummy", 1)
write(a, "/a/b/c", nothing)
end
Hi,
would it be ok to make write implicitly delete/overwrite the dataset when it is already present? The following snippet yields a name already exists error when the o_delete is commented out:
h5open("test.jld", "w") do file
write(file, "a", 1)
o_delete(file, "a")
write(file, "a", 2)
dump(file)
end
This would make the behavior more closely mimic the file-system-like character of HDF5, as well as make file["a"] = 2 behave more like a Dict, which would be great!
I can prepare a PR for this, just wanted to check first whether that would be ok.
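The proposed overwrite semantics amount to a delete-then-write wrapper; here is a sketch built from the exists/o_delete calls that appear in this issue (whether exists is the right predicate for this wrapper is an assumption):

```julia
using HDF5

# Dict-like assignment: replace the dataset if the name is already taken.
function overwrite(file, name, data)
    exists(file, name) && o_delete(file, name)
    write(file, name, data)
end

h5open("test.h5", "w") do file
    overwrite(file, "a", 1)
    overwrite(file, "a", 2)   # no "name already exists" error
end
```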
Just putting this out here while I attempt a fix.
The following code
using DataFrames, HDF5, JLD
fi = jldopen("bad_hdf5.jld")
df = read(fi, "df")
println(size(df))
fails with
ERROR: no method start(Index)
in read at /Users/joosep/.julia/HDF5/src/jld.jl:347
in read at /Users/joosep/.julia/HDF5/src/jld.jl:207
in read at /Users/joosep/.julia/HDF5/src/jld.jl:196
in include_from_node1 at loading.jl:120
while loading /Users/joosep/Dropbox/kbfi/top/stpol/src/analysis/dftest.jl, in expression starting on line 3
on DataFrames.jl 3b269760093542d972436f008cd07e742f9556f2, HDF5.jl b83cea4.
It does not seem consistent, i.e. I can open some older files. Most likely related to the efforts at pruning DataFrames.
Test files (sorry, ~150 MB each)
http://hep.kbfi.ee/~joosep/good_hdf5.jld => succeeds
http://hep.kbfi.ee/~joosep/bad_hdf5.jld => fails
At the moment, we can write, but not read, immutables from JLD. While it would be pretty trivial to copy the code for creating new immutables from serialize.jl, I wonder if we can use compound types instead. It seems like there would be massive performance and disk space advantages to storing arrays of immutables contiguously on disk, as opposed to using HDF5 references for each field, even if the on-disk representation isn't necessarily the same as the in-memory representation because of padding.
The fid in the @write should be file, I think.
Hi,
I have text files of generally 100k that have data such as:
NA NA NA
NA NA NA
-4.11554869953487 NA NA
-4.49517142619306 NA NA
-4.62434879575859 NA NA
-4.85365577849306 NA NA
-4.83319566688069 NA NA
-4.62021998272287 NA NA
-4.38650861894108 NA NA
-4.33796653562191 NA NA
...
using the code
using HDF5
using JLD
x = readdlm("1.dat", ' ')
file = jldopen("teste.jld", "w")
@write file x
close(file)
to write a 117 KB file of that form to teste.jld generates a 13 MB file... Even if compression is not being used, I don't understand the size difference. I have to process 270 files of this kind, which ended up generating a 3.5 GB file.
Am I doing something wrong? If it helps, I can e-mail a sample file for testing.
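One likely culprit, offered as a guess: readdlm on a file containing NA tokens produces an Array{Any} mixing Float64s and "NA" strings, and JLD stores each element of a heterogeneous array by reference, which is far more expensive than one contiguous Float64 dataset. A sketch of a denser encoding that maps NA to NaN first (the column semantics are an assumption):

```julia
using HDF5, JLD

raw = readdlm("1.dat", ' ')     # Array{Any}: Float64 values and "NA" strings
# Convert to a homogeneous Float64 matrix, encoding "NA" as NaN.
vals = [isa(v, Number) ? float64(v) : NaN for v in raw]
x = reshape(vals, size(raw))

jldopen("teste.jld", "w") do file
    write(file, "x", x)         # stored as one contiguous numeric dataset
end
```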
I'm on OS X using julia master/9c392b7*, HDF5 f27612 and hdf5 installed from homebrew:
Cassios-iMac:~ cassio$ brew info hdf5
hdf5: stable 1.8.11
http://www.hdfgroup.org/HDF5
/usr/local/Cellar/hdf5/1.8.11 (119 files, 9.8M) *
Built from source
From: https://github.com/mxcl/homebrew/commits/master/Library/Formula/hdf5.rb
==> Dependencies
Required: szip
==> Options
--enable-cxx
Compile C++ bindings
--enable-fortran
Compile Fortran bindings
--enable-fortran2003
Compile Fortran 2003 bindings. Requires enable-fortran.
--enable-parallel
Compile parallel bindings
--enable-threadsafe
Trade performance and C++ or Fortran support for thread safety
--universal
Build a universal binary
Thanks,
Cássio
Just starting to play with HDF5.jl, looks fantastic!
I wanted to get a feel for how the .jld files look internally, and quickly ran into this:
a = {1=>"a", 2=>"b"}
@save "/tmp/a.jld" a
h5open("/tmp/a.jld", "r") do fid
dump(fid)
end
resulting in
HDF5File len 3
_refs: HDF5Group len 1
a: HDF5Group len 4
1: HDF5Dataset (2,) :
Dataset indexing (hyperslab) is available only for bits types
while loading In[38], in expression starting on line 3
in getindex at /Users/rene/.julia/HDF5/src/plain.jl:1387
in dump at /Users/rene/.julia/HDF5/src/plain.jl:880
in dump at /Users/rene/.julia/HDF5/src/plain.jl:893 (repeats 3 times)
in dump at show.jl:536
in anonymous at show.jl:542
in dump at show.jl:542
in anonymous at no file:4
in h5open at /Users/rene/.julia/HDF5/src/plain.jl:504
hdf5 is now in brew/science,
so the installation instructions should be
brew tap homebrew/science
brew install hdf5
using HDF5
fid = h5open("test.h5", "w")
b = d_create(fid, "b", Int, ((1000,),(-1,)), "chunk", (100,)) #-1 is equivalent to typemax(Hsize) as far as I can tell
b[:] # ERROR: no method endof(HDF5Dataset{PlainHDF5File},)
Recently (maybe because of the merge of the static compile branch in julia), many functions are outputting this warning:
"warning: literal address used in ccall for (null); code cannot be statically compiled"
Calling @save outputs this warning ~30 times. Any way we can fix and/or suppress this warning to avoid the warning spam?
It seems like the search routine is not permissive enough...
In [1]:
using HDF5
WARNING:
backtraces on your platform are often misleading or partially incorrect
Library not found. See the README for installation instructions.
at C:\Users\inorton\AppData\Roaming\Julia\packages\HDF5\src\plain.jl:39
at C:\Users\inorton\AppData\Roaming\Julia\packages\HDF5\src\HDF5.jl:1
at In[1]:1
in findlibhdf5 at C:\Users\inorton\AppData\Roaming\Julia\packages\HDF5\src\plain.jl:37
In [2]:
dlopen("hdf5.dll")
Out[2]:
Ptr{Void} @0x000000001e04b010
In [ ]:
julia> A = "uniçº∂e"
"uniçº∂e"
julia> file = jldopen("mydata.jld", "w")
Julia data file version 0.0.1: mydata.jld
julia> write(file, "A", A)
julia> close(file)
julia> file = jldopen("mydata.jld", "r")
Julia data file version 0.0.1: mydata.jld
julia> c = read(file, "A")
ERROR: invalid UTF-8 sequence
in convert at utf8.jl:110
in read at /Users/westley/.julia/HDF5/src/plain.jl:1007
in read at /Users/westley/.julia/HDF5/src/jld.jl:255
in read at /Users/westley/.julia/HDF5/src/jld.jl:176
in read at /Users/westley/.julia/HDF5/src/plain.jl:952
I have a dataset with a compound type and inside this compound type an array. Is there a way to read this compound type into Julia? It seems that Array types are not supported yet, but perhaps it's possible to patch this scenario with the low-level routines?
x=[utf8("Jon"), utf8("Tim")]
Pkg.update()
using HDF5
using JLD
@save "test" x
@load "test"
Error reading dataset /x
at In[4]:1
in h5d_read at /Users/malmaud/.julia/HDF5/src/plain.jl:1751
in read at /Users/malmaud/.julia/HDF5/src/plain.jl:1031
in read at /Users/malmaud/.julia/HDF5/src/jld.jl:290
in read at /Users/malmaud/.julia/HDF5/src/jld.jl:291
in read at /Users/malmaud/.julia/HDF5/src/jld.jl:205
in read at /Users/malmaud/.julia/HDF5/src/jld.jl:194
in anonymous at no file
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
#000: H5Dio.c line 182 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#001: H5Dio.c line 438 in H5D__read(): unable to set up type info
major: Dataset
minor: Unable to initialize object
#002: H5Dio.c line 939 in H5D__typeinfo_init(): unable to convert between src and dest datatype
major: Dataset
minor: Feature is unsupported
#003: H5T.c line 4525 in H5T_path_find(): no appropriate function for conversion path
major: Datatype
minor: Unable to initialize object
versioninfo()
Julia Version 0.2.0+22
Commit 30fb816* (2013-11-18 10:18 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin13.0.0)
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
LAPACK: libopenblas
LIBM: libopenlibm
malmaud@malbook ~/tmp> brew install hdf5
Warning: hdf5-1.8.11 already installed
I noticed the following bug (?). Essentially, variables loaded with the @load macro cannot be used by worker processes. By chance, I noticed that this could be fixed by copying to a variable of a different name.
$ cat bug.jl
using HDF5,JLD
function setup()
x = 10
@save "bug.jld" x
end
function doesnt_work()
@load "bug.jld" x
@spawn println(x)
end
function works()
@load "bug.jld" x
y = copy(x)
@spawn println(y)
end
julia> include("bug.jl");
julia> setup();
julia> doesnt_work();
exception on 2: ERROR: x not defined
in anonymous at multi.jl:1278
in anonymous at multi.jl:827
in run_work_thunk at multi.jl:575
in run_work_thunk at multi.jl:584
in anonymous at task.jl:88
julia> works();
From worker 2: 10
The following code leads to a segfault when trying to slice a string array
using HDF5
# create file
f = h5open("testjl.hdf5", "w")
f["MyStrings"] = ["1231","asdasdad","ffsfd"]
f["MyNumbers"] = [1,2,3]
close(f)
# try to read back
f = h5open("testjl.hdf5", "r")
# seems we can read whole data set
println(read(f["MyStrings"]))
println(read(f["MyNumbers"]))
# but slicing
# works for numbers
println(f["MyNumbers"][1:2])
# but not for strings - segfault
println(f["MyStrings"][1:2])
close(f)
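Until the slicing path is fixed, one workaround consistent with the report above (full reads succeed) is to read the whole string dataset and slice the in-memory array instead:

```julia
using HDF5

h5open("testjl.hdf5", "r") do f
    strings = read(f["MyStrings"])   # full read of the string dataset works
    println(strings[1:2])            # slice the Julia Array, not the dataset
end
```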
For compatibility with https://github.com/simonster/MAT.jl
Currently, the following yields a no setindex!(JldFile, Int64, ASCIIString) error for jldopen, but it works for h5open:
using HDF5, JLD
jldopen("test.jld", "w") do file
file["a"] = 1
dump(file)
end
Is there a design consideration I am overlooking or can I work on a PR to make this work for JldFile as well? Thanks!
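If no design consideration stands in the way, the missing method is probably a one-liner delegating to the existing write path; a sketch (the exact argument order and Union signature are assumptions based on the error message above):

```julia
import Base: setindex!

# Allow file["a"] = 1 for JLD files and groups, mirroring h5open behavior.
setindex!(parent::Union(JldFile, JldGroup), val, name::ASCIIString) =
    write(parent, name, val)
```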
Context at julia-user, including files to reproduce.
To reproduce, run:
include("segf.jl")
tt=DDCM.segf();
tt.tpi[1][1,1]
With include("old_types.jl") in segf.jl, this returns 0.0.
With include("new_types.jl") in segf.jl, it sometimes throws a segfault, and sometimes:
ERROR: no method getindex(SYSTEM: show(lasterr) caused an error
ERROR: no method Enumerate{I}(
in showerror at repl.jl:111
in showerror at repl.jl:66
in anonymous at client.jl:93
in with_output_color at util.jl:444
in display_error at client.jl:91SYSTEM: show(lasterr) caused an error
WARNING: it is likely that something important is broken, and Julia will not be able to continue normally
Building HDF5.jl on Windows machines currently fails since the binaries have disappeared from archive.org. Is there an alternative location for the binaries? Is there a backup? I believe @ihnorton has been moving the downloads onto S3, if we can find a backup.
Just a quick idea...
We can currently organize things into groups within a .jld file, but only using the lower-level HDF-like interface.
I think it would be great to extend the @load and @save macros to allow for .jld files to be organized into groups. I would think that the interface could look something like this:
@load "file_name" "group_name" # loads all items in that group
@load "file_name" "group_name" a b c # loads a b and c from the group
@save "file_name" "group_name" # save all items from current module in group
@save "file_name" "group_name" a b c # saves a, b and c to the group
Sorry, but the comments below are a bit stream of consciousness
I just thought maybe it would make more sense to have @load_group and @save_group macros. However, multiple dispatch could make the above ideas work, because we could dispatch on two strings followed by optional symbols.
Another thought: when we do this automatic creation of groups, we would presumably have /group_name/_refs and /group_name/_types as well.
The reason I want this feature is that I really like how easy it is to @load and @save many variables into/out of modules with a quick one-liner. I have projects where I have different values of underlying parameters that are used to generate different datasets. I am currently getting this easy functionality by creating a different jld file for each parameterization, but I would prefer to do this in a single file with groups. Maybe I'm crazy... feedback on the idea as well as the implementation would be great.
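In the meantime, the grouping these macros would automate can be done by hand with the lower-level interface the post mentions. A minimal sketch, assuming the old `g_create` group API (names like "run1" are illustrative):

```julia
using HDF5, JLD

a = 1
b = [2.0, 3.0]

# One group per parameterization, written via the lower-level interface.
jldopen("params.jld", "w") do file
    g = g_create(file, "run1")
    g["a"] = a
    g["b"] = b
end

# Read a single variable back out of the group by its path.
a2 = jldopen("params.jld", "r") do file
    read(file, "run1/a")
end
```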
With the latest Julia HEAD, doing show(x) and show(y) in test/jld_dataframe.jl, the rows and columns of the DataFrames read from the file are transposed relative to the ones written:
2x2 DataFrame:
x1 x2
[1,] "x1" [2,3,4,5,6]
[2,] "x2" [3.141592653589793,6.283185307179586,9.42477796076938,12.566370614359172,15.707963267948966]
2x2 DataFrame:
x1 x2
[1,] "a" [1,2,3,4,5]
[2,] "b" [3.141592653589793,6.283185307179586,9.42477796076938,12.566370614359172,15.707963267948966]
Thus, one does not read back with read what is written with write.
I'm not sure if I've stumbled on an HDF5 bug or a Julia one. This is similar to JuliaLang/julia#3884.
I'm on OS X 10.8.4.
Consider this write code:
using HDF5
using JLD
x = Dict{Int64,Array{Float64}}()
for i in 1:10
x[i] = rand(1000000)
end
file = jldopen("x.jld", "w")
@write file x
close(file)
and the read code:
module Test
using HDF5
using JLD
for i in 1:10
file = jldopen("x.jld", "r") ; x = read(file, "x") ; close(file)
end
end
Evaluating the read code on the REPL with include("read.jl"), I can consistently get:
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
#000: H5Dio.c line 140 in H5Dread(): not a dataset
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
#000: H5I.c line 2271 in H5Iget_name(): can't retrieve object location
major: Object atom
minor: Can't get value
#001: H5Gloc.c line 224 in H5G_loc(): invalid data ID
major: Invalid arguments to routine
minor: Bad value
ERROR: Error getting object name
in h5i_get_name at /Users/cassio/.julia/HDF5/src/plain.jl:1743
in h5i_get_name at /Users/cassio/.julia/HDF5/src/plain.jl:1805
in h5d_read at /Users/cassio/.julia/HDF5/src/plain.jl:1743
in h5d_read at /Users/cassio/.julia/HDF5/src/plain.jl:1526
in read at /Users/cassio/.julia/HDF5/src/plain.jl:994
in read at /Users/cassio/.julia/HDF5/src/jld.jl:251
in read at /Users/cassio/.julia/HDF5/src/jld.jl:254
in read at /Users/cassio/.julia/HDF5/src/jld.jl:173
in getrefs at /Users/cassio/.julia/HDF5/src/jld.jl:355
in read at /Users/cassio/.julia/HDF5/src/jld.jl:292
in read at /Users/cassio/.julia/HDF5/src/jld.jl:173
in getrefs at /Users/cassio/.julia/HDF5/src/jld.jl:355
in read at /Users/cassio/.julia/HDF5/src/jld.jl:306
in read at /Users/cassio/.julia/HDF5/src/jld.jl:173
in read at /Users/cassio/.julia/HDF5/src/plain.jl:960
in anonymous at no file:7
in include_from_node1 at loading.jl:92
at /Users/cassio/Desktop/test/read.jl:6
If I add a call to gc() right before close(), the problem disappears...
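That gc() sensitivity suggests a finalization-ordering issue. A likely-safer pattern in the meantime is the do-block form of jldopen, which keeps the file handle alive until all reads complete and closes it deterministically (a sketch, not a confirmed fix; shown self-contained with a small dict):

```julia
using HDF5, JLD

# Recreate a small version of the file from the write code above.
x = Dict{Int64,Array{Float64}}()
for i in 1:3
    x[i] = rand(10)
end
jldopen("x.jld", "w") do file
    write(file, "x", x)
end

# do-block read: the handle cannot be closed before read returns.
x2 = jldopen("x.jld", "r") do file
    read(file, "x")
end
```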