unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.

License: BSD 3-Clause "New" or "Revised" License

CMake 1.46% C 89.30% Shell 1.91% C++ 1.66% HTML 0.51% Awk 0.01% Yacc 0.48% Lex 0.20% Makefile 0.56% M4 3.37% Roff 0.53% Batchfile 0.01%
Topics: netcdf, unidata, unidata-netcdf, c, hacktoberfest

netcdf-c's Introduction

Unidata NetCDF

About

The Unidata network Common Data Form (netCDF) is an interface for scientific data access and a freely-distributed software library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data. The current netCDF software provides C interfaces for applications and data. Separate software distributions available from Unidata provide Java, Fortran, Python, and C++ interfaces. They have been tested on various common platforms.

Properties

NetCDF files are self-describing, network-transparent, directly accessible, and extendible. Self-describing means that a netCDF file includes information about the data it contains. Network-transparent means that a netCDF file is represented in a form that can be accessed by computers with different ways of storing integers, characters, and floating-point numbers. Direct-access means that a small subset of a large dataset may be accessed efficiently, without first reading through all the preceding data. Extendible means that data can be appended to a netCDF dataset without copying it or redefining its structure.
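As a rough illustration of the direct-access property, here is a minimal sketch (not taken from the distribution; the file name, variable name, and indices are invented) that reads a small hyperslab of a large variable without touching the rest of the file:

#include <stdio.h>
#include <netcdf.h>

int main(void)
{
    int ncid, varid;
    float block[2][2];
    size_t start[2] = {100, 200};   /* offset into the variable */
    size_t count[2] = {2, 2};       /* read only a 2x2 block */

    /* Open an existing file and look up a variable by name. */
    if (nc_open("example.nc", NC_NOWRITE, &ncid)) return 1;
    if (nc_inq_varid(ncid, "temperature", &varid)) return 1;

    /* Read just the requested subset; preceding data is not scanned. */
    if (nc_get_vara_float(ncid, varid, start, count, &block[0][0])) return 1;

    printf("block[0][0] = %f\n", block[0][0]);
    return nc_close(ncid);
}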

Use

NetCDF is useful for supporting access to diverse kinds of scientific data in heterogeneous networking environments and for writing application software that does not depend on application-specific formats. For information about a variety of analysis and display packages that have been developed to analyze and display data in netCDF form, see

More information

For more information about netCDF, see

Latest releases

You can obtain a copy of the latest released version of netCDF software for various languages:

Copyright

Copyright and licensing information can be found here, as well as in the COPYRIGHT file accompanying the software

Installation

To install the netCDF-C software, please see the file INSTALL in the netCDF-C distribution, or the (usually more up-to-date) document:

Documentation

A language-independent User's Guide for netCDF, and some other language-specific user-level documents are available from:

A mailing list, [email protected], exists for discussion of the netCDF interface and announcements about netCDF bugs, fixes, and enhancements. For information about how to subscribe, see the URL

Feedback

We appreciate feedback from users of this package. Please send comments, suggestions, and bug reports to [email protected].

netcdf-c's People

Contributors

brtnfld, ckhroulev, czender, d70-t, dave-allured, dennisheimbigner, dwesl, e4t, edhartnett, edwardhartnett, gdsjaar, gsjaardema, k20shores, lesserwhirls, lprox2020, magnusumet, mathstuf, mjwoods, nehaljwani, nschloe, opoplawski, oxelson, qkoziol, rouault, seanm, sebastic, thehesiod, wardf, wkliao, zedthree


netcdf-c's Issues

nc_test4/tst_large2 fails

The tst_large2 test fails on OSX and Linux (Ubuntu 13.10 64-bit) when testing netcdf-4 files. I've confirmed this test worked in NetCDF v4.3.1.1. I'm currently working through git bisect to narrow down where the bug was introduced. This may take a bit, since this test takes a little while to run. So, the 4.3.2-rc1 release will be delayed.

nc_test4_tst_parallel3 failing

If configured with

-DENABLE_PARALLEL:BOOL=ON

the parallel test nc_test4_tst_parallel3 will fail with

ctest -R nc_test4_tst_parallel3 -VV
UpdateCTestConfiguration  from :/home/nschloe/software/netcdf/build/DartConfiguration.tcl
Parse Config file:/home/nschloe/software/netcdf/build/DartConfiguration.tcl
 Add coverage exclude regular expressions.
UpdateCTestConfiguration  from :/home/nschloe/software/netcdf/build/DartConfiguration.tcl
Parse Config file:/home/nschloe/software/netcdf/build/DartConfiguration.tcl
Test project /home/nschloe/software/netcdf/build
Constructing a list of tests
Done constructing a list of tests
Checking test dependency graph...
Checking test dependency graph end
test 106
    Start 106: nc_test4_tst_parallel3

106: Test command: /home/nschloe/software/netcdf/build/nc_test4/nc_test4_tst_parallel3
106: Test timeout computed to be: 1500
106: 
106: *** Testing more advanced parallel access.
106: *** Testing parallel IO for raw-data with MPI-IO (driver)...ok.
106: *** Testing parallel IO for meta-data with MPI-IO (driver)...ok.
106: *** Testing parallel IO for different hyperslab selections with MPI-IO (driver)...ok.
106: Sorry! Unexpected result, /home/nschloe/software/netcdf/dev/nc_test4/tst_parallel3.c, line: 740
106: Sorry! Unexpected result, /home/nschloe/software/netcdf/dev/nc_test4/tst_parallel3.c, line: 128
106: *** Testing parallel IO for extending variables with MPI-IO (driver)...
1/1 Test #106: nc_test4_tst_parallel3 ...........***Failed    0.27 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.27 sec

The following tests FAILED:
    106 - nc_test4_tst_parallel3 (Failed)
Errors while running CTest

This is on Ubuntu 13.10, with HDF5-openmpi.

gcc 4.6.3-1ubuntu5 dies when compiling tst_camrun.c

Showing up on Ubuntu 12.04 LTS, both 32 & 64 bit. Not sure what to do about it but will be downgrading gcc to see if this might be the result of a recent update.

Not going to hold the 4.3.2-rc1 release on this, since it would appear the issue is with the compiler, not our code.

Failure using unlimited dimension

The code at line 704 of the file nc4hdf.c does not work properly.

The extend-size array uses hsize_t as its base datatype, which is unsigned. The Allreduce call, however, uses MPI_INT as its datatype, which is signed. This breaks when a large unsigned value is reinterpreted as a negative number.

I believe using MPI_UNSIGNED instead of MPI_INT should fix it.
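As a hedged illustration only (the function name, arguments, and the MPI_MAX reduction below are hypothetical, not the actual nc4hdf.c code), the corrected reduction might look like this; MPI_UNSIGNED_LONG_LONG is shown because hsize_t is usually 64 bits wide, whereas the MPI_UNSIGNED suggested above would only match a 32-bit hsize_t:

#include <mpi.h>
#include <hdf5.h>

/* Sketch: reduce per-rank extend sizes with an unsigned MPI datatype,
 * so large hsize_t values are not reinterpreted as negative ints. */
static int reduce_extend_sizes(hsize_t *xtend_size, hsize_t *max_size,
                               int ndims, MPI_Comm comm)
{
    return MPI_Allreduce(xtend_size, max_size, ndims,
                         MPI_UNSIGNED_LONG_LONG, MPI_MAX, comm);
}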

BUILD_DOCS, ENABLE_DOXYGEN

There are two options (maybe more) that handle the generation of documentation, namely BUILD_DOCS and ENABLE_DOXYGEN.

Which parts of the documentation are built when these are set to ON and OFF, respectively? The option BUILD_DOCS should probably be renamed to something more descriptive, or the option ENABLE_DOXYGEN should be removed and Doxygen made a requirement whenever BUILD_DOCS is true.

CHECK_LIBRARY_EXISTS Macro invoked with incorrect arguments

With

cmake \                                                                         
  -D CMAKE_C_COMPILER=mpicc \                                                   
  -D ENABLE_PARALLEL:BOOL=ON \                                                  
  ../dev/

I'm getting the configuration error

CMake Error at CMakeLists.txt:734 (CHECK_LIBRARY_EXISTS):
  CHECK_LIBRARY_EXISTS Macro invoked with incorrect arguments for macro
  named: CHECK_LIBRARY_EXISTS

nc_inq_nvars returns incorrect result for groups in 4.3.1rc5

Noticed this when running the test suite for netcdf4-python after updating from github master today.

The following test program illustrates the problem:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netcdf.h>
#define FILE_NAME "test.nc"
#define GRP_NAME "phony_group"

void
check_err(const int stat, const int line, const char *file) {
    if (stat != NC_NOERR) {
        (void)fprintf(stderr,"line %d of %s: %s\n", line, file, nc_strerror(stat));
        fflush(stderr);
        exit(1);
    }
}

int
main()
{

   int ncid, varid, grpid, numvars, retval;

   if ((retval = nc_open(FILE_NAME, NC_NOWRITE, &ncid)))
   check_err(retval,__LINE__,__FILE__);

   if ((retval = nc_inq_grp_ncid(ncid, GRP_NAME, &grpid)))
   check_err(retval,__LINE__,__FILE__);

   if ((retval = nc_inq_nvars(grpid, &numvars)))
   check_err(retval,__LINE__,__FILE__);

   (void)fprintf(stdout,"number of vars %d\n", numvars);

   return 0;
}

The test file (test.nc) is created by one of the python tests. I don't see any way to upload files to github, so here's a link to the file on my Google Drive.

https://drive.google.com/file/d/0B8_SExLApAk1UWJORmRJRVc4SWs/edit?usp=sharing

Running the above test program with rc4 yields:

number of vars 1

but rc5 yields

number of vars 0

USE_PARALLEL vs. ENABLE_PARALLEL vs. STATUS_PARALLEL vs. HDF5_IS_PARALLEL vs. ...

The number of booleans indicating the parallel status in netCDF is large, probably larger than necessary. This leads to confusion and wrong usage. One example is in source/libsrc4/nc4file.c where the function H5Pset_fapl_mpiposix is used inside an #ifdef USE_PARALLEL fence. It should however be HDF5_IS_PARALLEL or one of its relatives.

A cleanup of the parallel booleans would certainly be worthwhile.

target_link_libraries with public interface

netCDF uses a plain TARGET_LINK_LIBRARIES(netcdf ${TLL_LIBS}) call to register the dependent libraries with netCDF. While this is valid, it causes the dependent libraries to appear in the exported CMake files. That is because CMake treats all dependent libraries as interface libraries by default, i.e., it acts as if netCDF provides an interface to, say, HDF5. While this is the case if another library's headers are included in netCDF's headers, it is probably not universally true.
To avoid superfluous entries in the export files, use the PRIVATE keyword in target_link_libraries; cf. http://www.cmake.org/cmake/help/v2.8.12/cmake.html#command:target_link_libraries.

netCDF still contains `-W1` ("one") statements

Please replace them with the correct "-Wl" (ell).

$ grep "W1" * 2> /dev/null 
CMakeLists.txt:  # Check to see if -W1,--no-undefined is supported.
CMakeLists.txt:  CHECK_C_LINKER_FLAG("-W1,--no-undefined" LIBTOOL_HAS_NO_UNDEFINED)
CMakeLists.txt:    SET(CMAKE_SHARED_LINKER_FLAGS_DEBUG "${CMAKE_SHARED_LINKER_FLAGS_DEBUG} -W1,--no-undefined")
Binary file libncdap.tar.gz matches

List of remote tests wrong

The file https://github.com/Unidata/netcdf-c/blob/master/ncdap_test/CMakeLists.txt lists a number of DAP remote tests, but that list is at least incomplete. test_partvar, for example, fails for me; a grep shows which tests contact a remote test server:

$ grep findtestser * -r
[...]
ncdap_test/test_partvar2.c:    svc = NC_findtestserver("dts");
ncdap_test/test_partvar.c:    svc = NC_findtestserver("dts");
ncdap_test/test_varm3.c:        const char* svc = NC_findtestserver("thredds");
ncdap_test/nctestserver.c:    url = NC_findtestserver(argv[1]);

In fact, the only test that does fail if I turn off network connectivity is ncdap_test_partvar.

authenticated opendap over proxy

Hello unidata,
I hope that this is the right place to post this (for sure it's the easier way for me hehe):

I'm having problems trying to access an OPeNDAP server that requires authentication. The problem is that I'm behind a proxy server, and when I request the URL, ncdump returns an error. On another machine without the proxy everything works as expected.

Example:

export http_proxy="http://username1:[email protected]:8080/"
ncdump "http://username2:[email protected]/thredds/dodsC/test.nc"

returns this:

syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <!DOCTYPE^ html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><title>Access Denied</title><link rel="stylesheet" type="text/css"
...
(proxy and login information omitted).

Since I don't have control over the proxy, I would like to ask if anyone has a solution for this, or whether you can access OPeNDAP servers through proxy servers when both require authentication. I think this is related to the curl library and the plain-text password, but I think you guys may have something to say about it.

so-version unnecessarily has three numbers

This is connected with issue #42.

NetCDF diverges from standard GNU conventions (http://www.faqs.org/docs/Linux-HOWTO/Program-Library-HOWTO.html#AEN46) in that the shared-object name contains three numbers (7.2.0). I know of no other package that does that; the typical thing would be for the SOVERSION to consist of one integer only (mostly corresponding with the package's major version, as discussed in issue #42).

Having a three-numbered SOVERSION doesn't really make sense. The SOVERSION is there to indicate ABI breakage, so it doesn't make a difference if the major so-version number is incremented (8.0.0) or the last one (7.2.1): The piece of information that the OS gets from this ("The ABI has changed.") is the same.

On top of that, a three-numbered so-version makes packaging really weird. In fact, the current Debian package works around this and cuts off the so-version to "7".

travis-ci hook

The latest commit baade3e breaks building/tests, so probably it should be amended or reverted.
Considering the excellent test coverage of netcdf-c, it'd be a potent candidate for GitHub's integration with travis-ci, enabling automatic testing per commit such that these things are less likely to occur in the future.

remove default CMAKE_C_FLAGS settings in CMakeLists.txt

netCDF sets some nonstandard compiler flags if a certain CMake variable is not provided:

  IF(NOT ENABLE_UNUSED_VAR_PAR_WARNINGS)
    SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-variable -Wno-unused-parameter")
  ENDIF()

Since this is not default behavior, it creates problems for other distributors who do want to turn these warnings on.

The usual way to turn the warnings off would be to remove the above lines from CMakeLists.txt and simply pass the flags in your build scripts, i.e., call cmake as

cmake \
   -DCMAKE_C_FLAGS:STRING="-Wno-unused-variable -Wno-unused-parameter" \
   ../source/

or

CFLAGS="-Wno-unused-variable -Wno-unused-parameter" \
cmake \
   ../source/

This removes the spurious variable ENABLE_UNUSED_VAR_PAR_WARNINGS, removes code from CMakeLists.txt, and ships netCDF with default options so everyone gets what they expect.

libnetcdf.settings

What is this libnetcdf.settings file there for? It's not meant for machine reading, I suppose, so installing it into lib/ seems excessive. If anything, it should go into CMAKE_INSTALL_DOCDIR.

Using nccopy, setting deflate level to 0 with shuffling results in an uncompressed, shuffled file

We noticed this recently because a co-worker of mine created compressed files with shuffling, but then decompressed them due to the I/O and CPU overhead of compression (as compression forces chunk-sized reads). The problem is, chunk-sized reads were still happening. The reason seems to be that while compression was disabled, shuffling was not; and this has the same effect of forcing chunk-sized reads.

I think it's fairly clear that setting compression to none should disable shuffling as well. I've written a patch that implements this behaviour and also allows enabling/disabling shuffling independently of compression.
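For illustration, the sketch below shows the relevant public API: nc_def_var_deflate takes independent shuffle and deflate flags, which is why clearing only the deflate level leaves the shuffle filter active. The helper name and the policy of tying shuffle to the compression level are assumptions for this sketch, not the actual nccopy code or the submitted patch.

#include <netcdf.h>

/* Sketch: when the requested deflate level is 0, turn off both the
 * deflate and shuffle filters so reads no longer need whole chunks. */
int set_var_compression(int ncid, int varid, int deflate_level)
{
    int enable = (deflate_level > 0) ? 1 : 0;
    return nc_def_var_deflate(ncid, varid,
                              /* shuffle */ enable,
                              /* deflate */ enable,
                              deflate_level);
}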

man4 vs. docs

The netCDF source directory contains the folder man4, which holds the HTML documentation. The folder name man4 is rather misleading, since it resembles man4 as in /usr/share/man/man4/, where the documentation about "Devices and Special Files" (such as the things in /dev/) is found.

A better folder name would be doc.

This is related to issue #57.

warnings galore: conversion to '...' from '...' may alter its value

The number of build warnings in standard netcdf-c builds is overwhelming. On amd64, we count 2911 (see https://launchpadlibrarian.net/168739307/buildlog_ubuntu-trusty-amd64.netcdf_4.3.1.2~20140316-trusty1_UPLOADING.txt.gz), most of them of the type

conversion to '...' from '...' may alter its value

While all of them seem potentially dangerous, netcdf is the type of software that might actually want to allow such conversions. Consequently, we should either

  • fix the warnings, or
  • adapt the build script to explicitly turn off the conversion warnings.
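For reference, a minimal sketch of the warning class in question (purely illustrative, not code from netcdf-c): with -Wconversion, assigning a wider type to a narrower one triggers a warning of this kind, and an explicit cast (or a wider target type) documents that the narrowing is intentional and silences it.

#include <stddef.h>

/* Without the cast, "return n;" warns:
 * conversion to 'int' from 'size_t' may alter its value. */
static int checked_len(size_t n)
{
    return (int)n;   /* deliberate narrowing, made explicit */
}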

Is my performance issue in netcdf4-python related to the netCDF C library?

Hi!

Jeff Whitaker from https://code.google.com/p/netcdf4-python/ asked me to post my original mail to him on this forum.

Greetings Werner.

Jeff's response.

Werner: This is probably an issue in the underlying C library. Would you mind posting this as an issue on the github site for the netcdf library? (http://github.com/Unidata/netcdf-c).
-Jeff

This was my original mail to Jeff, with code.

Hi Jeffrey,
First of all: thanks for your library!
We are working on a satellite observation project and work with netCDF. We work with big data files >10GB per file.

I now have the problem that when I have a file with a lot of groups, the netCDF library performance is not good.

I built 2 test programs, one of which is based on the HDF5 Python library; its performance is much better:

[dierssen@machine test]$ ./groups_h5.py
File opened in write mode (0s)
Groups created (36s)
File closed (36s)
File opened in append mode (37s)
Groups created (37s)
[dierssen@machine test]$ ./groups_netcdf.py
File opened in write mode (0s)
Groups created (138s)
File closed (138s)
File opened in append mode (188s)
Groups created (203s)

I am not sure whether you already know about this issue, but is there any way I could solve it? I prefer to use the netCDF library instead of the HDF5 library.

I hope to hear from you! Below I added the source code for both implementations.

Werner.
I made more test programs and in most cases the HDF5 library was faster, but in the above case the difference was really big.

The netCDF code:

#!/usr/bin/env python
""" Create a netCDF file with 6000 groups and print some timing information."""
from __future__ import print_function
from __future__ import division

from netCDF4 import Dataset
import numpy as np
import numpy.random as rnd
import datetime as dt

def main():
    """ Main function. Called when module is run from the command line."""
    t0=dt.datetime.now()
    nc4_name = 'test_groups.nc'
    fid = Dataset(nc4_name, 'w')
    grp = fid.createGroup('BAND1')

    t1=dt.datetime.now()
    print('File opened in write mode ({}s)'.format((t1-t0).seconds))
    data = rnd.rand(1024, 512)
    for block in range(5000):
        block_name = ('ICID_99999_GROUP_%05d' % block)
        sgrp = grp.createGroup( block_name )
        sgrp.createGroup( 'GEODATA' )
        sgrp.createGroup( 'INSTRUMENT' )
        ssgrp = sgrp.createGroup( 'OBSERVATIONS' )
        sgrp.createDimension('row', size=1024)
        sgrp.createDimension('column', size=512)
        var = ssgrp.createVariable('signal', np.float32, dimensions=('row','column'))
        var[:] = data

    t1=dt.datetime.now()
    print('Groups created ({}s)'.format((t1-t0).seconds))
    fid.close()
    t1=dt.datetime.now()
    print('File closed ({}s)'.format((t1-t0).seconds))

    fid = Dataset( nc4_name, 'a' )
    t1=dt.datetime.now()
    print('File opened in append mode ({}s)'.format((t1-t0).seconds))
    grp = fid.groups['BAND1']
    for block in range(5000, 6000):
        block_name = ('ICID_99999_GROUP_%05d' % block)
        sgrp = grp.createGroup( block_name )
        sgrp.createGroup( 'GEODATA' )
        sgrp.createGroup( 'INSTRUMENT' )
        sgrp.createGroup( 'OBSERVATIONS' )

    t1=dt.datetime.now()
    print('Groups created ({}s)'.format((t1-t0).seconds))
    fid.close()

if __name__ == '__main__':
    main()

The HDF5 code:

#!/usr/bin/env python
""" Create a HDF5 file with 6000 groups and print some timing information."""
from __future__ import print_function
from __future__ import division

import h5py
#import numpy as np
import numpy.random as rnd
import datetime as dt

def main():
    """ Main function. Called when module is run from the command line."""
    t0=dt.datetime.now()

    h5_name = 'test_groups.h5'
    fid = h5py.File( h5_name, 'w' )
    grp = fid.create_group( 'BAND1' )

    t1=dt.datetime.now()
    print('File opened in write mode ({}s)'.format((t1-t0).seconds))
    data = rnd.rand(1024, 512)

    for block in range(5000):
        block_name = ('ICID_99999_GROUP_%05d' % block)
        sgrp = grp.create_group( block_name )
        sgrp.create_group( 'GEODATA' )
        sgrp.create_group( 'INSTRUMENT' )
        ssgrp = sgrp.create_group( 'OBSERVATIONS' )
        #sgrp.create_dimension('row', size=1024)
        #sgrp.create_dimension('column', size=512)
        var = ssgrp.create_dataset('signal', (1024, 512), dtype='f')
        var[:] = data

    t1=dt.datetime.now()
    print('Groups created ({}s)'.format((t1-t0).seconds))
    fid.close()
    t1=dt.datetime.now()
    print('File closed ({}s)'.format((t1-t0).seconds))

    fid = h5py.File( h5_name, 'a' )
    t1=dt.datetime.now()
    print('File opened in append mode ({}s)'.format((t1-t0).seconds))

    grp = fid['BAND1']
    for block in range(5000, 6000):
        block_name = ('ICID_99999_GROUP_%05d' % block)
        sgrp = grp.create_group( block_name )
        sgrp.create_group( 'GEODATA' )
        sgrp.create_group( 'INSTRUMENT' )
        sgrp.create_group( 'OBSERVATIONS' )

    t1=dt.datetime.now()
    print('Groups created ({}s)'.format((t1-t0).seconds))
    fid.close()

if __name__ == '__main__':
    main()

feature request: fixed length string data type

The old way of storing fixed-length strings in netcdf3 (using a multidimensional character array, with an extra dimension for string length) is clunky. The new way in netcdf4 (vlen strings) does not map very well onto Python and Fortran data types. The easiest way to deal with strings in numpy and Fortran is with arrays of fixed-length strings (e.g., for 10-character strings you would use the numpy dtype 'S10', and in Fortran you would use character(len=10)). HDF5 does support a fixed-length string data type (H5T_STRING).

I know this has been discussed many times before, but I keep getting requests from Python users for this, so I thought I would bring it up again (now that we have this nice GitHub interface to keep track of the discussion).
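For context, here is a minimal sketch of the classic-model workaround described above as clunky: fixed-length strings stored as a char variable with an extra string-length dimension. The dimension and variable names are purely illustrative.

#include <netcdf.h>

/* Sketch: define "station_name" as a (station, name_strlen) char array,
 * so each row holds one fixed-length, null/blank-padded name. */
int define_station_names(int ncid, size_t nstations, int *varidp)
{
    int dimids[2], status;

    if ((status = nc_def_dim(ncid, "station", nstations, &dimids[0])))
        return status;
    if ((status = nc_def_dim(ncid, "name_strlen", 10, &dimids[1])))
        return status;

    return nc_def_var(ncid, "station_name", NC_CHAR, 2, dimids, varidp);
}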

INTERFACE_LINK_LIBRARIES -> LINK_LIBRARIES

I just discovered another small issue in the target files:
The dependent libraries are listed as INTERFACE_LINK_LIBRARIES although the interfaces (of HDF5, for example) are not exposed to libraries that depend on netCDF. They should probably be listed as LINK_LIBRARIES. I'm just figuring out how to achieve this and will make another pull request about it.

installation error

Hello,

I tried installing netcdf (39a5bf5) (on Mac 10.9) but I get an error when running 'make build':

path/netcdf-git/libsrc/nc3internal.c:1547:32: error: use of
      undeclared identifier 'NC_FORMAT_NC3'
        if(formatp) *formatp = NC_FORMAT_NC3;

I have installed hdf5 and zlib. I think curl was already installed:

% type -a curl
curl is /usr/bin/curl
curl is /opt/local/bin/curl

Is this a code problem or a problem on my side?

VERSION vs. SOVERSION

This is more of a request for clarification.

In a typical unix project, versioning works such that the major revision number indicates the A{B,P}I level of the library, and consequently works as SOVERSION. When looking into /usr/lib/, you will find almost exclusively the scheme

$ ls -l /usr/lib/libpathplan.*
/usr/lib/libpathplan.so.4 -> libpathplan.so.4.0.0
/usr/lib/libpathplan.so.4.0.0

(I have no idea what "pathplan" is, it merely serves as an example here.)

For netCDF, things appear to work a little differently. The software version will soon be 4.3.2, and the SOVERSION is set to 7.2.0. Is that a mistake or for historical reasons, and are there plans to go with the mainstream flow?

CURL_STATICLIB redundant?

I wondered what the preprocessor setting -DCURL_STATICLIB=1 was about, but

$ grep -r CURL_STATICLIB *

only turns up the setting in CMakeLists.txt. It seems redundant.

Low performance with some files caused by reading of entire field

Hi,

I'm working with some data that's chunked; the dimensions of the chunks are (1, 510, 1068). It was originally shuffled and compressed; it was decompressed, and then, after observing miserable performance, I modified the nccopy utility to deshuffle it in order to test the hypothesis that shuffling was the source.

Unfortunately, even without shuffling, I observed some pretty dismal performance when reading in small sections of this data; upon further digging, it seems that the NetCDF library is reading in the entire field to extract just a few values. This can be observed when using ncks to extract a small subset of the data (it's observable with other software too) by stracing the program and observing the size of the reads:

$ strace -f ncks -d lat,0,1 -d lon,0,1 -d time,0,4 ~/public_html/pr+tasmax+tasmin_day_BCCAQ+ANUSPLIN300+MRI-CGCM3_historical+rcp85_r1i1p1_19500101-21001231.nc.sub
--- CUT ---
lseek(3, 17462119, SEEK_SET)            = 17462119
read(3, "TREE\1\0\5\0\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\240>!\0\0\0\0\0"..., 3136) = 3136
brk(0x1e8d000)                          = 0x1e8d000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2178720) = 2178720
brk(0x20a1000)                          = 0x20a1000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2178720) = 2178720
brk(0x22b5000)                          = 0x22b5000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2178720) = 2178720
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2178720) = 2178720
lseek(3, 30539671, SEEK_SET)            = 30539671
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2178720) = 2178720
write(1, "time[0]=0 lat[0]=41.041666665 lo"..., 74time[0]=0 lat[0]=41.041666665 lon[0]=-140.958333335 tasmin[0]=-32768 degC
) = 74
write(1, "time[0]=0 lat[0]=41.041666665 lo"..., 74time[0]=0 lat[0]=41.041666665 lon[1]=-140.875000005 tasmin[1]=-32768 degC
) = 74
write(1, "time[0]=0 lat[1]=41.124999995 lo"..., 77time[0]=0 lat[1]=41.124999995 lon[0]=-140.958333335 tasmin[1068]=-32768 degC
) = 77
write(1, "time[0]=0 lat[1]=41.124999995 lo"..., 77time[0]=0 lat[1]=41.124999995 lon[1]=-140.875000005 tasmin[1069]=-32768 degC
) = 77
write(1, "time[1]=1 lat[0]=41.041666665 lo"..., 79time[1]=1 lat[0]=41.041666665 lon[0]=-140.958333335 tasmin[544680]=-32768 degC
) = 79
write(1, "time[1]=1 lat[0]=41.041666665 lo"..., 79time[1]=1 lat[0]=41.041666665 lon[1]=-140.875000005 tasmin[544681]=-32768 degC
) = 79
write(1, "time[1]=1 lat[1]=41.124999995 lo"..., 79time[1]=1 lat[1]=41.124999995 lon[0]=-140.958333335 tasmin[545748]=-32768 degC
) = 79
write(1, "time[1]=1 lat[1]=41.124999995 lo"..., 79time[1]=1 lat[1]=41.124999995 lon[1]=-140.875000005 tasmin[545749]=-32768 degC
) = 79
write(1, "time[2]=2 lat[0]=41.041666665 lo"..., 80time[2]=2 lat[0]=41.041666665 lon[0]=-140.958333335 tasmin[1089360]=-32768 degC
) = 80
write(1, "time[2]=2 lat[0]=41.041666665 lo"..., 80time[2]=2 lat[0]=41.041666665 lon[1]=-140.875000005 tasmin[1089361]=-32768 degC
) = 80
write(1, "time[2]=2 lat[1]=41.124999995 lo"..., 80time[2]=2 lat[1]=41.124999995 lon[0]=-140.958333335 tasmin[1090428]=-32768 degC
) = 80
write(1, "time[2]=2 lat[1]=41.124999995 lo"..., 80time[2]=2 lat[1]=41.124999995 lon[1]=-140.875000005 tasmin[1090429]=-32768 degC
) = 80
write(1, "time[3]=3 lat[0]=41.041666665 lo"..., 80time[3]=3 lat[0]=41.041666665 lon[0]=-140.958333335 tasmin[1634040]=-32768 degC
) = 80
write(1, "time[3]=3 lat[0]=41.041666665 lo"..., 80time[3]=3 lat[0]=41.041666665 lon[1]=-140.875000005 tasmin[1634041]=-32768 degC
) = 80
write(1, "time[3]=3 lat[1]=41.124999995 lo"..., 80time[3]=3 lat[1]=41.124999995 lon[0]=-140.958333335 tasmin[1635108]=-32768 degC
) = 80
write(1, "time[3]=3 lat[1]=41.124999995 lo"..., 80time[3]=3 lat[1]=41.124999995 lon[1]=-140.875000005 tasmin[1635109]=-32768 degC
) = 80
write(1, "time[4]=4 lat[0]=41.041666665 lo"..., 80time[4]=4 lat[0]=41.041666665 lon[0]=-140.958333335 tasmin[2178720]=-32768 degC
) = 80
write(1, "time[4]=4 lat[0]=41.041666665 lo"..., 80time[4]=4 lat[0]=41.041666665 lon[1]=-140.875000005 tasmin[2178721]=-32768 degC
) = 80
write(1, "time[4]=4 lat[1]=41.124999995 lo"..., 80time[4]=4 lat[1]=41.124999995 lon[0]=-140.958333335 tasmin[2179788]=-32768 degC
) = 80
write(1, "time[4]=4 lat[1]=41.124999995 lo"..., 80time[4]=4 lat[1]=41.124999995 lon[1]=-140.875000005 tasmin[2179789]=-32768 degC

Note that each of these read calls returns a block of 2178720 bytes, which equals the product of the sizes of the X and Y dimensions and the size of the storage type (float). It should be 8 bytes, which means about 250000 times as much I/O bandwidth is used as should be required.

I have also used strace on other data with different chunk sizes (and possibly different internal metadata); these full-field reads are not present with other data I have examined.

I have also experimented with reading in data using native HDF5 applications, specifically the rhdf5 library; the read pattern when using rhdf5 is what one would expect: reading in only the values that are required (that is, 8 bytes at a time).

I'm at a loss for where to go from here. I've dug into the NetCDF library source code, but I don't have experience with either the NetCDF library source code or the HDF5 library, so it's very slow going. I think this might have to do with non-ideal caching behaviour in the NetCDF library, but that's just a guess. However, given the apparent ubiquity of this problem, and its apparent absence when using the HDF5 library without the NetCDF library in the middle, it suggests a bug in the NetCDF library, whatever its precise source.

A representative subset of the file in question is available here: http://www.pacificclimate.org/~bronaugh/pr+tasmax+tasmin_day_BCCAQ+ANUSPLIN300+MRI-CGCM3_historical+rcp85_r1i1p1_19500101-21001231.nc.sub
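Not a fix from this report, just a diagnostic idea under the caching hypothesis above: one thing worth trying is enlarging the per-variable chunk cache before reading, so that repeatedly touched chunks are at least served from memory. The helper name and the sizes below are illustrative, not tuned for this file.

#include <netcdf.h>

/* Sketch: open a file and give one variable a larger chunk cache. */
int open_with_bigger_cache(const char *path, const char *varname,
                           int *ncidp, int *varidp)
{
    int status;

    if ((status = nc_open(path, NC_NOWRITE, ncidp)))
        return status;
    if ((status = nc_inq_varid(*ncidp, varname, varidp)))
        return status;

    /* 64 MB cache, room for 1009 chunk slots, 0.75 preemption. */
    return nc_set_var_chunk_cache(*ncidp, *varidp,
                                  64 * 1024 * 1024, 1009, 0.75f);
}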

error: conflicting types for ‘MPI_Comm’

When configuring with -DCMAKE_C_COMPILER=mpicc, I'm getting the compilation error

include/ncdispatch.h:90:13: error: conflicting types for ‘MPI_Comm’
 typedef int MPI_Comm;
             ^
/usr/lib/openmpi/include/mpi.h:221:37: note: previous declaration of ‘MPI_Comm’ was here
 typedef struct ompi_communicator_t *MPI_Comm;
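The usual way out of this kind of clash is to guard the dummy typedef so it is only seen when a real <mpi.h> is not in play. The sketch below is hypothetical (the guard macro actually used by ncdispatch.h may differ); it only illustrates the idea.

/* Sketch of a guarded stub: only define a placeholder MPI_Comm when
 * not building against a real MPI implementation. */
#ifdef USE_PARALLEL
#include <mpi.h>
#else
typedef int MPI_Comm;        /* placeholder for serial builds only */
#endif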

hdf5 1.8.13 support

Building with hdf5 1.8.13 I get:

../liblib/.libs/libnetcdf.so: undefined reference to `H5Pset_fapl_mpiposix'

The hdf5 change notes indicate:

The MPI-POSIX driver has been removed. The following C functions and the corresponding Fortran subroutines and C++ wrappers therefore are no longer included in the HDF5 distribution:

    H5Pset_fapl_mpiposix
    H5Pget_fapl_mpiposix 

Applications performing parallel I/O should use the MPI-IO driver, H5Pset_fapl_mpio.

http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFaplMpio
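For reference, a minimal hedged sketch of the replacement the HDF5 release notes point to (assuming a parallel HDF5 build; the helper name is invented):

#include <mpi.h>
#include <hdf5.h>

/* Sketch: build a file-access property list that uses the MPI-IO
 * driver instead of the removed MPI-POSIX driver. */
hid_t make_parallel_fapl(MPI_Comm comm, MPI_Info info)
{
    hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    if (fapl_id < 0)
        return fapl_id;

    /* Formerly H5Pset_fapl_mpiposix(fapl_id, comm, 0); now: */
    if (H5Pset_fapl_mpio(fapl_id, comm, info) < 0) {
        H5Pclose(fapl_id);
        return -1;
    }
    return fapl_id;
}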

failing tests on Ubuntu 12.04

A number of tests are failing on Ubuntu 12.04,

The following tests FAILED:
     27 - ncdump_tst_h_scalar (Failed)
     33 - ncdump_shell_tst_h_scalar (Failed)
     48 - nc_test (Failed)
     51 - nc_test4_tst_dims (Failed)
     73 - nc_test4_tst_coords (Failed)
    100 - nc_test4_tst_h_scalar (Failed)

man page syntax

The manpages contain a number of syntax errors. Test it yourself with

$ man --warnings -E UTF-8 -l -Tutf8 -Z <file> >/dev/null

e.g.,

$ man --warnings -E UTF-8 -l -Tutf8 -Z ./ncdump/nccopy.1 >/dev/null

Most of them are due to the fact that single quotation marks ' at the beginning of a line are interpreted as control characters rather than as elements of the text.

make check fails with --enable-hdf4

netcdf 4.3.1.1 when configured with --enable-hdf4, make check fails with:

make[3]: *** No rule to make target `tst_formatx_hdf4.sh', needed by `tst_formatx_hdf4.sh.log'.  Stop.
make[3]: Leaving directory `/builddir/build/BUILD/netcdf-4.3.1.1/build/nc_test4'

Missing files in 4.3.1.1 release tarball cause build failures

Trying to build the 4.3.1.1 for Fedora with cmake I get:

-- Detecting C compiler ABI info - done
CMake Error at CMakeLists.txt:91 (FILE):
  file COPY cannot find
  "/export/home/orion/fedora/netcdf/netcdf-4.3.1.1/CTestCustom.cmake".

Looks like neither:
./netcdf-c/CTestCustom.cmake
./netcdf-c/CTestConfig.cmake

are making it into the release tarball.

Actually, looks like lots of files are not making it into the release tarball, and the release tarball also has lots of Mac funny files like: ._README.

missing symbols, -Wl,--no-undefined

For your test builds, I would like to suggest to add

-Wl,--no-undefined

to the link line. On CMake, this is achieved by configuring the project with

-DCMAKE_SHARED_LINKER_FLAGS:STRING="-Wl,--no-undefined" \

The idea of this option is to check if all libraries are resolved at link time. If not, the library will still link, but once an executable is linked with netCDF, the missing dependencies must be pulled in manually. This just happened to me way downstream with missing symbols from H5.

Right now, such a build will result in the error

Linking C shared library libnetcdf.so
../libsrc4/CMakeFiles/netcdf4.dir/nc4file.c.o: In function `nc4_create_file':
/home/nschloe/software/netcdf-c/dev/source/libsrc4/nc4file.c:314: undefined reference to `H5Pset_fapl_mpiposix'
/home/nschloe/software/netcdf-c/dev/source/libsrc4/nc4file.c:308: undefined reference to `H5Pset_fapl_mpio'
../libsrc4/CMakeFiles/netcdf4.dir/nc4file.c.o: In function `nc4_open_file':
/home/nschloe/software/netcdf-c/dev/source/libsrc4/nc4file.c:2130: undefined reference to `H5Pset_fapl_mpio'
/home/nschloe/software/netcdf-c/dev/source/libsrc4/nc4file.c:2136: undefined reference to `H5Pset_fapl_mpiposix'
../libsrc4/CMakeFiles/netcdf4.dir/nc4hdf.c.o: In function `set_par_access':
/home/nschloe/software/netcdf-c/dev/source/libsrc4/nc4hdf.c:499: undefined reference to `H5Pset_dxpl_mpio'
/home/nschloe/software/netcdf-c/dev/source/libsrc4/nc4hdf.c:499: undefined reference to `H5Pset_dxpl_mpio'

indicating that HDF5 is missing in the dependency list of libnetcdf.so.

Releases miss Doxygen files

When compiling a netCDF release with ENABLE_DOXYGEN set to ON, the build will fail. The reason is that several Doxygen configuration files are missing. (They are present in the master tree though.)

Fix HDF4 support, broken since release 4.3.1.1

HDF4 support no longer works, failing to compile libsrc4/nc4file.c, although it worked fine in release 4.3.1.1.

To reproduce with autoconf build:

CPPFLAGS="-I${H5DIR}/include -I${H4DIR}/include" LDFLAGS="-L${H5DIR}/lib -L${H4DIR}/lib" ../configure --prefix=${NCDIR} --enable-hdf4 --enable-hdf4-file-tests && make check

which results in compile errors:

../../libsrc4/nc4file.c: In function 'nc4_open_hdf4_file':
../../libsrc4/nc4file.c:2346:16: error: 'NC_ATT_INFO_T' has no member named 'xtype'
../../libsrc4/nc4file.c:2351:48: error: 'NC_ATT_INFO_T' has no member named 'xtype'
...

Error during 'make check' [ line 2: -std=gnu11: command not found]

Error produced using:

commit 16cb63a2238bdee6c918fd43a1d6ffe12234c038
Author: Ward Fisher <[email protected]>
Date:   Tue Jul 15 12:34:43 2014 -0600

Using the official netcdf-4.3.2 sources, the compilation succeeds and make check completes with 2 FAILED lines:

FAIL: tst_h_scalar
FAIL: tst_h_scalar.sh

The error in the git version occurs before that point.
Thank you

Those are the last lines during 'make check'

make[4]: `tst_h_scalar' is up to date.
make[4]: Leaving directory `/auto/home/gufranco/local/src/netcdf-c/ncdump'
make  check-TESTS
make[4]: Entering directory `/auto/home/gufranco/local/src/netcdf-c/ncdump'
make[5]: Entering directory `/auto/home/gufranco/local/src/netcdf-c/ncdump'
/bin/sh: line 2: -std=gnu11: command not found
make[5]: *** [run_tests.sh.log] Error 127
make[5]: Leaving directory `/auto/home/gufranco/local/src/netcdf-c/ncdump'
make[4]: *** [check-TESTS] Error 2
make[4]: Leaving directory `/auto/home/gufranco/local/src/netcdf-c/ncdump'
make[3]: *** [check-am] Error 2
make[3]: Leaving directory `/auto/home/gufranco/local/src/netcdf-c/ncdump'
make[2]: *** [check-recursive] Error 1
make[2]: Leaving directory `/auto/home/gufranco/local/src/netcdf-c/ncdump'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/auto/home/gufranco/local/src/netcdf-c/ncdump'
make: *** [check-recursive] Error 1

NetCDF Configuration Summary

General

NetCDF Version:     4.3.3-rc1
Configured On:      Tue Jul 15 18:14:27 EDT 2014
Host System:        x86_64-unknown-linux-gnu
Build Directory:    /auto/home/gufranco/local/src/netcdf-c
Install Prefix:         /users/gufranco/local

Compiling Options

C Compiler:         /shared/gnu/gcc/latest/bin/gcc -std=gnu11
CFLAGS:             -L/users/gufranco/local/lib
CPPFLAGS:           
LDFLAGS:            
AM_CFLAGS           
AM_CPPFLAGS:            
AM_LDFLAGS:         
Shared Library:         yes
Static Library:         yes
Extra libraries:        -lhdf5_hl -lhdf5 -ldl -lm -lcurl 

Features:

NetCDF-2 API:           yes
NetCDF-4 API:           yes
HDF4 Support:           no
HDF5 Support:           yes
PNetCDF Support:        no
Parallel Support:       no
DAP Support:            yes
Diskless Support:       yes
MMap Support:           no
JNA Support:            no

Compiler and Autotools

gcc version 4.9.0 (GCC) 
automake (GNU automake) 1.14.1
autoconf (GNU Autoconf) 2.69.120-5dcda 
m4 (GNU M4) 1.4.17
libtool (GNU libtool) 2.4.2

make Doxygen mandatory if ENABLE_DOXYGEN is enabled

Right now, when ENABLE_DOXYGEN is true, the CMake build errors out during the build if Doxygen is not actually installed on the system. It should fail at configure time, though.
The easy fix for this would be to move the FIND_PACKAGE(Doxygen) call inside the IF(ENABLE_DOXYGEN) block, and make it FIND_PACKAGE(Doxygen REQUIRED).

ENABLE_PARALLEL redundant?

It seems that the option ENABLE_PARALLEL depends entirely on whether the HDF5 installation is parallel or not. The only thing it can effectively do at this moment is turn off parallel support even if HDF5 is parallel. I'm not even sure if this is intended.

I would hence suggest removing the option ENABLE_PARALLEL and setting STATUS_PARALLEL to the value of HDF5_IS_PARALLEL.

Using cmake results in broken nc-config script

Using cmake:

bronaugh@devel2:~/netcdf$ nc-config --libs
-L/usr/local/lib -l/usr/lib/x86_64-linux-gnu/libhdf5.so;/usr/lib/x86_64-linux-gnu/libhdf5_hl.so;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libcurl.so

Using ./configure:

bronaugh@devel2:~/netcdf$ /usr/bin/nc-config --libs
-L/usr/lib -lnetcdf

This pretty much breaks anything that tries to link using flags from nc-config if netcdf-c has been built using cmake.

Local version of cmake is version 2.8.12.1 -- not sure if that's relevant.

Please let me know if you need further details.

performance problems when opening and closing a lot of netCDF files which contain compound data

Hi.

Jeff Whitaker asked me to make an issue here regarding some performance issues I encountered. I was reading a few thousand netCDF files (opening and closing sequentially) when my Python program suddenly slowed down. Opening a file suddenly took more than 1 minute. At the same moment I noticed memory leakage. However, it only occurred when I was reading files which contain compound data structures.

I made a similar program in C to check if I got the same problem. I did not see the memory leakage anymore, but the significant slowdown was still present after opening and closing the file around 3000 or 4000 times. It is pretty reproducible. I do not experience any problems when using the HDF library.

My C test program is:

#include <stdlib.h>
#include <stdio.h>
#include <netcdf.h>

#define FILE_NAME "test.nc"

int main()
{
    int retval;
    int ncid;
    int grp_ncid;
    int index;
    int rh_id;

    struct s1 {
      double i1;
      double i2;
    };
    struct s1 compound_data[1024][512];

    for ( index = 0; index < 8000; index++ ) {

        retval = nc_open( FILE_NAME, NC_NOWRITE, &ncid );

        (void) printf( "attempt %d\n", index);

        retval = nc_inq_ncid(ncid, "group0", &grp_ncid);
        retval = nc_inq_varid (grp_ncid, "data0", &rh_id);
        retval = nc_get_var(grp_ncid, rh_id, &compound_data[0][0]);

        printf("i1=%f\n", compound_data[0][0].i1);

        retval = nc_close( ncid );
    }

    return retval;
}

My Python code (which creates the file in the first place, but also shows the slowdown):

import netCDF4 as nc
import numpy as np
import time

#create test file
filename = "test.nc"
nc_file = nc.Dataset(filename, 'w')
nc_file.createDimension('nRows', 1024)
nc_file.createDimension('nColumns', 512)
for group_nr in range(10):
    nc_group = nc_file.createGroup("group{}".format(group_nr))
    complex128 = np.dtype([('real',np.float64),('imag',np.float64)])
    complex128_t = nc_group.createCompoundType(complex128,'complex128')
    for nr in range(20):
        cheb_data = np.ndarray(shape=(1024,512), dtype=complex128_t)
        cheb_data['real'] = np.ones((1024,512)) * 8.0
        cheb_data['imag'] =np.ones((1024,512)) * 5.0
        var = nc_group.createVariable("data{}".format(nr), complex128_t, dimensions=('nRows', 'nColumns',))
        var[:] = cheb_data
nc_file.close()

#read test file number of times
index = 0

for i in range(12000):
    start_time = int(round(time.time() * 1000))
    ds_root = nc.Dataset(filename, 'r')
    ds_root.close()
    stop_time = int(round(time.time() * 1000))
    index+=1
    print("{}: {}".format(index, stop_time-start_time))

The libraries we use are Python 2.7.3, numpy 1.7.0 and netCDF4 1.0.8 (ncdf4lib: 4.3.1.1, hdf5lib: 1.8.10).

So, I do have a workaround (using the HDF library), but I do think the issue is important enough to mention. Reading a lot of files which contain compound data should not cause problems. And the problem is pretty easy to reproduce.

Well, I just wanted to let you guys know!

Greetings Werner.

make check fails with pnetcdf

As reported in Unidata esupport, netcdf 4.3.2 fails 'make check' when run with pnetcdf. I've confirmed this using pnetcdf 1.4.1 (as reported by the user). The test works in 4.3.0. I will be using git bisect to track down where this failure was introduced.

I am using --disable-dap, because DAP tests are known to fail in older releases due to hard-coded server URLs.

HAVE_CONFIG_H redundant?

Similarly to bug #66,

$ grep -r HAVE_CONFIG_H *

only spits out places in configuration files. HAVE_CONFIG_H can hence be removed.
