cmor crashing (issue in cmor, closed)

pcmdi commented on July 20, 2024
cmor crashing

from cmor.

Comments (13)

doutriaux1 commented on July 20, 2024

@pfuhe1 Could you please attach or send me a sample script that reproduces this?

doutriaux1 commented on July 20, 2024

@pfuhe1 can you compile with debugging enabled so that the trace tells us where the core dump happens? Or run it via gdb.

pfuhe1 commented on July 20, 2024

@doutriaux1 I have been having trouble reproducing this crash reliably, so will do a bit more testing myself before sending you a script.

I am also unsure whether I have compiled in debug mode correctly. I set the environment variable CFLAGS = '-g' when I compiled, but this doesn't seem to change the trace that is output when it crashes. Do I have to specify some other debug options, or set them another way?

pfuhe1 commented on July 20, 2024

I have come back to this issue again, and have produced a simple script that uses cmor to write random data to a file. It then loops, writing over the file many times.

It seems to have a memory leak and crashes after a while from running out of memory. I'm wondering if this is due to the same issue as above.

I don't think I can attach the files here, so I'm sending you the script and some example output by email.

doutriaux1 commented on July 20, 2024

Thank you so much for doing this! Can you please post the script? It will help with debugging.

pfuhe1 commented on July 20, 2024

# This is a dummy version of the ACCESS Post Processor.
# Peter Uhe
# 24 July 2014
#
import numpy as np
import datetime
import cmor

#
#main function to post-process files
#
def app(opts):

    #
    # Set up the CMOR stuff.
    #
    print 'cmor setup'
    cmor.setup(inpath=opts['table_path'], 
            netcdf_file_action=cmor.CMOR_REPLACE_3, 
            set_verbosity=cmor.CMOR_NORMAL, 
            exit_control=cmor.CMOR_NORMAL,
            logfile=opts['logfile'], create_subdirectories=1)

    #
    # Define the dataset.
    #
    cmor.dataset(outpath=opts['outpath'],
            experiment_id=opts['experiment_id'], 
            institution=opts['institution'], 
            source=opts['source'],
            calendar=opts['calendar'], 
            realization=opts['realization'],
            contact=opts['contact'],
            history=opts['history'],
            comment=opts['comment'],
            references=opts['references'],
            model_id=opts['model_id'],
            forcing=opts['forcing'],
            initialization_method=opts['initialization_method'],
            physics_version=opts['physics_version'],
            institute_id=opts['institution_id'],
            parent_experiment_id=opts['parent_experiment_id'],
            branch_time=opts['branch_time'],
            parent_experiment_rip=opts['parent_experiment_rip'])

    #
    # Load the CMIP tables into memory.
    #
    tables=[]
    tables.append(cmor.load_table('CMIP5_grids'))
    tables.append(cmor.load_table(opts['cmip_table']))

    #manually create time axis for monthly data
    min_tvals=[]
    max_tvals=[]
    cmor_tName='time'
    tvals=[]
    axis_ids=[]
    for year in range(opts['tstart'],opts['tend']+1):
        for mon in range(1,13):
            tvals.append(datetime.date(year,mon,15).toordinal()-1)
    # set up time values and bounds
    for i,ordinaldate in enumerate(tvals):
        model_date  = datetime.date.fromordinal(int(ordinaldate)+1)
        #min bound is first day of month
        model_date=model_date.replace(day=1)
        min_tvals.append(model_date.toordinal()-1)
        #max_bound is first day of next month
        tyr=model_date.year+model_date.month/12
        tmon=model_date.month%12+1                              
        model_date=model_date.replace(year=tyr,month=tmon)
        max_tvals.append(model_date.toordinal()-1)
        #correct date to middle of month
        mid=(max_tvals[i]-min_tvals[i])/2.
        tvals[i]=min_tvals[i]+mid
    tval_bounds = np.column_stack((min_tvals, max_tvals))
    #set up cmor time axis:
    cmor.set_table(tables[1])
    time_axis_id = cmor.axis(table_entry=cmor_tName,
        units='days since 0001-01-01', length=len(tvals),
        coord_vals=tvals[:], cell_bounds=tval_bounds[:],
        interval=None)
    axis_ids.append(time_axis_id)

    #
    # Define the CMOR variable.
    #
    cmor.set_table(tables[1])
    in_missing = float(1.e20)
    print 'defining cmor variable'
    variable_id = cmor.variable(table_entry=opts['vcmip'], units=opts['in_units'],
            axis_ids=axis_ids, type='f', missing_value=in_missing)

    #
    # Write the data 
    #
    data_vals=np.array(np.random.rand(len(tvals)),dtype=np.float32)
    try:
        print 'writing...'
        cmor.write(variable_id, data_vals[:], ntimes_passed=np.shape(data_vals)[0]) #assuming time is the first dimension
    except Exception, e:
        raise Exception("ERROR writing data!")
    #
    # Close the CMOR file.
    #
    try:
        path = cmor.close(variable_id, file_name=True)
    except:
        raise Exception("ERROR closing cmor file!")

    return path



if __name__ == "__main__":
#   from pympler import tracker
    import resource
#   from guppy import hpy

# Example dictionary containing metadata used by the post-processor
    opts={'initialization_method': 1, 'calculation': '', 'vin': ['temp_global_ave'], 'branch_time': 109207.0, 'vcmip': 'thetaoga', 'positive': '', 'tend': 1852, 'tstart': 1850, 'realization': 1, 'forcing': 'GHG, Oz, SA, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFC11, CFC12, CFC113, HCFC22, HFC125, HFC134a)', 'infile': '/g/data1/p66/ACCESSDIR/har599/ACCESS/output/hg2-r11Mhd/history///ocn/ocean_scalar.nc-*', 'model_id': 'ACCESS-test', 'parent_experiment_id': 'piControl', 'cmip_table': 'CMIP5_Omon', 'in_units': 'K', 'version_number': 'v20130710', 'notes': 'branch date is 300-01-01', 'physics_version': 1, 'axes_modifier': 'dropX', 'experiment_id': 'historical', 'parent_experiment_rip': 'r1i1p1'}

    opts['source']='ACCESS-test 2011. \
        Atmosphere: AGCM v1.0 (N96 grid-point, 1.875 degrees EW x approx 1.25 degree NS, 38 levels); '+\
        'ocean: NOAA/GFDL MOM4p1 (nominal 1.0 degree EW x 1.0 degrees NS, tripolar north of 65N, '+\
         'equatorial refinement to 1/3 degree from 10S to 10 N, cosine dependent NS south of 25S, 50 levels); '+\
        'sea ice: CICE4.1 (nominal 1.0 degree EW x 1.0 degrees NS, tripolar north of 65N, '+\
        'equatorial refinement to 1/3 degree from 10S to 10 N, cosine dependent NS south of 25S); '+\
        'land: MOSES2 (1.875 degree EW x approx. 1.25 degree NS, 4 levels)'
    opts['logfile']=None
    opts['institution']='CSIRO-BOM'
    opts['institution_id']='CSIRO-BOM'
    opts['calendar']='proleptic_gregorian'
    opts['contact']='dummy'
    opts['history']='dummy'
    opts['references']='dummy'
    opts['comment']='dummy'
    opts['outpath']='/short/p66/pfu599'
    opts['table_path']='/g/data1/p66/pfu599/post_processor/branches/APP1-0/cmip5-cmor-tables/Tables'

# Memory profiler setup
#   tr = tracker.SummaryTracker()
#   tr.print_diff()              
#   hp=hpy()
#   new=hp.heap()

# Loop over many times rewriting the same file. 
    for i in range(10):
        print i
        print app(opts)

# Memory profiling 
#       tr.print_diff()
#       old=new
#       new=hp.heap()
#       diff=new-old
#       print diff
#       print diff.byrcs[0].byid
        print 'Memory usage: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

pfuhe1 commented on July 20, 2024

That's it. Sorry about the length of the script.

doutriaux1 commented on July 20, 2024

perfect! Thx!

pfuhe1 commented on July 20, 2024

You need to change the lines setting opts['outpath'] and opts['table_path'] for your machine.

Note I am running cmor 2.9.1 with python 2.7.6 and numpy 1.8.0.

I have also just run the script on an old machine I still have access to, which has cmor 2.8.3 installed along with python 2.6 and numpy 1.3.0, and the increasing memory usage does not occur there.

MartinDix commented on July 20, 2024

I use the same system for which Peter reported the memory leak.

More or less by accident I found that it's due to building with a particular copy of the uuid library that was on the machine. Using a new version built from source fixes the leak.

Now testing whether this fixes the intermittent crashes from the full processing.

doutriaux1 commented on July 20, 2024

@MartinDix this is great news! Please let us know; I will tweak configure to make sure we use the correct uuid version.

MartinDix commented on July 20, 2024

This wasn't the real problem.

A lucky observation showed that the crashes occurred when writing a 4D file after a 3D file. This allowed creating an example small enough to run in TotalView, which then pointed to the line

  free(cmor_axes[cmor_naxes].wrapping);

at the end of cmor_axis in cmor_axes.c, https://github.com/PCMDI/cmor/blob/master/Src/cmor_axes.c#L1343

The wrapping pointer is only allocated for longitude axes. cmor_axes is an external (global) variable, so it keeps the value of the freed pointer between calls. If the first file has dimensions (T,Y,X), cmor_axes[2].wrapping gets allocated and freed. If the next file then has dimensions (T,Z,Y,X), the cmor_axis call for Y still sees a non-null value in cmor_axes[2].wrapping, which it tries to free again.

Sometimes this gives the double free error that Peter originally reported. Other times it gives other more or less obscure crashes.

I've created an example script https://gist.github.com/MartinDix/6b2624d620da79c4e9f9

Adding a debugging print

  printf("Wrapping %d %p\n", cmor_naxes,   cmor_axes[cmor_naxes].wrapping);

before the free then gives

% python cmor_testscript.py 
Wrapping 0 (nil)
Wrapping 1 (nil)
Wrapping 2 0x3270b50
writing...
/short/p66/mrd599/CMIP5/output/CMOR-test/CMOR-test/historical/mon/atmos/ts/r1i1p1//ts_Amon_CMOR-test_historical_r1i1p1_185001-185012.nc
Wrapping 0 (nil)
Wrapping 1 (nil)
Wrapping 2 0x3270b50
Wrapping 3 0x32b1a70
writing...
*** glibc detected *** python: corrupted double-linked list: 0x0000000003270b40 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75e76)[0x2b2719f81e76]
/lib64/libc.so.6(+0x79caa)[0x2b2719f85caa]
/lib64/libc.so.6(__libc_malloc+0x71)[0x2b2719f866b1]
/apps/netcdf/4.3.2/lib/libnetcdf.so.7(+0x6a7ce)[0x2b274af017ce]

In this case it's crashed at some point after the actual free, but just where it crashes seems to depend on array sizes, netcdf library versions etc.

I think the fix is to add

  cmor_axes[cmor_naxes].wrapping = NULL;

after the free. This seems to have fixed things here.

doutriaux1 commented on July 20, 2024

wow! Nice catch! Will fix and add your script to the test suite! Thanks!
