
Comments (27)

sdvillal avatar sdvillal commented on June 15, 2024

The UUIDs I cannot smooth with my non-patched pytables plus good version of libhdf5:

  • Katja's 66a392f402f811e5acb4d850e6c4a608 (20150525_180912.mainbrain.h5)
  • Matthew's edd9d0dc23f811e5a038bcee7bdac3c6 (20150706_180558.mainbrain.h5)

Right now I think these might be genuinely broken mainbrain files and show a different problem than the one happening after pytables update.

I run this command:

flydra_analysis_export_flydra_hdf5 --dest-file ~/katja.h5 /mnt/strawscience/data/auto_pipeline/raw_archive/by_uuid/66a392f402f811e5acb4d850e6c4a608/*.mainbrain.h5

And I get this log (same for Matthew's file)

STAGE 1: finding timestamps
opening file /mnt/strawscience/data/auto_pipeline/raw_archive/by_uuid/66a392f402f811e5acb4d850e6c4a608/20150525_180912.mainbrain.h5...
caching raw 2D data... done
(cached index of 32154178 frame values of dtype int64)
hostname time_gain time_offset
-------- --------- -----------
   'localhost' 1.0 -0.000182314803206

caching Kalman obj_ids...
finding unique obj_ids...
(found 16928)
(will export 16928)
finding 2d data for each obj_id...
/home/santi/Proyectos/imp/software/flydra/flydra/a2/data2smoothed.py:165: UserWarning: no host flycube6 in timestamp data. making up data.
  'data.'%remote_hostname)
STAGE 2: running Kalman smoothing operation
detected file loaded with dynamic model "EKF mamarama, units: mm"
  for smoothing, will use dynamic model "mamarama, units: mm"
/home/santi/Proyectos/imp/software/flydra/flydra/a2/core_analysis.py:1554: UserWarning: passing data_file as string to core_analysis.CachingAnalyzer.load_data()
  warnings.warn('passing data_file as string to '
/home/santi/Utils/Science/anaconda/lib/python2.7/site-packages/adskalman-0.3.4-py2.7.egg/adskalman/adskalman.py:453: RuntimeWarning: invalid value encountered in isnan
Traceback (most recent call last):
  File "/home/santi/Utils/Science/anaconda/bin/flydra_analysis_export_flydra_hdf5", line 9, in <module>
    load_entry_point('flydra==0.6.6', 'console_scripts', 'flydra_analysis_export_flydra_hdf5')()
  File "/home/santi/Proyectos/imp/software/flydra/flydra/a2/data2smoothed.py", line 311, in export_flydra_hdf5
    main(hdf5_only=True)
  File "/home/santi/Proyectos/imp/software/flydra/flydra/a2/data2smoothed.py", line 410, in main
    **kwargs)
  File "/home/santi/Proyectos/imp/software/flydra/flydra/a2/data2smoothed.py", line 259, in convert
    **kwargs)
  File "/home/santi/Proyectos/imp/software/flydra/flydra/a2/core_analysis.py", line 1663, in load_data
    elevation_up_bias_degrees=elevation_up_bias_degrees,
  File "/home/santi/Proyectos/imp/software/flydra/flydra/a2/core_analysis.py", line 943, in query_results
    allocate_space_for_direction=have_body_axis_information,
  File "/home/santi/Proyectos/imp/software/flydra/flydra/a2/core_analysis.py", line 372, in observations2smoothed
    dynamic_model_name=dynamic_model_name)
  File "/home/santi/Proyectos/imp/software/flydra/flydra/a2/core_analysis.py", line 347, in kalman_smooth
    valid_data_idx=idx)
  File "build/bdist.linux-x86_64/egg/adskalman/adskalman.py", line 565, in kalman_smoother
  File "build/bdist.linux-x86_64/egg/adskalman/adskalman.py", line 455, in kalman_filter
ValueError: cannot do Kalman filtering with nan values in parameters
Closing remaining open files:/mnt/strawscience/data/auto_pipeline/raw_archive/by_uuid/66a392f402f811e5acb4d850e6c4a608/20150525_180912.mainbrain.kh5-smoothcache...done/mnt/strawscience/data/auto_pipeline/raw_archive/by_uuid/66a392f402f811e5acb4d850e6c4a608/20150525_180912.mainbrain.h5...done

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

With the files in the previous comment I seem to have 0% success rate.

With the other files, for which smoothing crashes randomly in strawcore, I seem to have a 100% success rate.

I guess I could systematically explore the experiments from the last 3 months. So far, when crashes happen on my side, they happen at the very beginning.

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

There is a stochastic part to this bug. I have run the following script overnight on my machine and on strawcore:

#!/bin/bash

FILES="
20150525_180912.mainbrain.h5
20150624_175622.mainbrain.h5
20150703_173108.mainbrain.h5
20150706_180558.mainbrain.h5
20150703_173318.mainbrain.h5
20150703_174525.mainbrain.h5
20150702_174748.mainbrain.h5
20150702_175828.mainbrain.h5
20150702_175709.mainbrain.h5
20150630_175211.mainbrain.h5
20150702_175121.mainbrain.h5
"

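# Loop forever so that intermittent failures show up across repeated passes
while true
do
        for f in $FILES
        do
                y=${f:0:4}
                m=${f:4:2}
                path="/mnt/strawscience/data/auto_pipeline/raw_archive/by_date/$y/$m/$f"
                # Refresh the local copy only if the archived file is newer
                cp --update "$path" "$f"
                # Drop the .h5 extension and remove any stale smoothing cache
                fn=${f:0:25}
                cachefn="$fn.kh5-smoothcache"
                rm -f "$cachefn"
                echo -n "$f "

                # Report OK/FAIL from the exit status; the full output goes to a per-file log
                if flydra_analysis_export_flydra_hdf5 "$f" --dest-file /dev/null >"${f}.log" 2>&1; then
                        echo OK
                else
                        echo FAIL
                fi

        done
done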

On my machine

20150525_180912.mainbrain.h5 OK
20150624_175622.mainbrain.h5 OK
20150703_173108.mainbrain.h5 FAIL
20150706_180558.mainbrain.h5 FAIL
20150703_173318.mainbrain.h5 OK
20150703_174525.mainbrain.h5 FAIL
20150702_174748.mainbrain.h5 OK
20150702_175828.mainbrain.h5 OK
20150702_175709.mainbrain.h5 OK
20150630_175211.mainbrain.h5 FAIL
20150702_175121.mainbrain.h5 OK
20150525_180912.mainbrain.h5 OK
20150624_175622.mainbrain.h5 OK
20150703_173108.mainbrain.h5 FAIL
20150706_180558.mainbrain.h5 FAIL
20150703_173318.mainbrain.h5 FAIL
20150703_174525.mainbrain.h5 FAIL
20150702_174748.mainbrain.h5 OK
20150702_175828.mainbrain.h5 OK
20150702_175709.mainbrain.h5 OK
20150630_175211.mainbrain.h5 FAIL
20150702_175121.mainbrain.h5 OK
20150525_180912.mainbrain.h5 OK
20150624_175622.mainbrain.h5 OK
20150703_173108.mainbrain.h5 FAIL
20150706_180558.mainbrain.h5 FAIL

On my machine, 20150703_173318.mainbrain.h5 passes once and fails once.

On strawcore:

20150525_180912.mainbrain.h5 OK
20150624_175622.mainbrain.h5 OK
20150703_173108.mainbrain.h5 FAIL
20150706_180558.mainbrain.h5 FAIL
20150703_173318.mainbrain.h5 OK
20150703_174525.mainbrain.h5 FAIL
20150702_174748.mainbrain.h5 OK
20150702_175828.mainbrain.h5 OK
20150702_175709.mainbrain.h5 OK
20150630_175211.mainbrain.h5 FAIL
20150702_175121.mainbrain.h5 OK
20150525_180912.mainbrain.h5 OK
20150624_175622.mainbrain.h5 OK
20150703_173108.mainbrain.h5 FAIL
20150706_180558.mainbrain.h5 FAIL
20150703_173318.mainbrain.h5 OK
20150703_174525.mainbrain.h5 FAIL
20150702_174748.mainbrain.h5 OK
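
To spot files like 20150703_173318 that both pass and fail, the `<file> OK|FAIL` lines printed by the loop can be tallied per file. This is only a minimal sketch: the function names are made up for illustration, and the sample is a short excerpt in the same format as the listings above, not the full overnight output.

```python
from collections import defaultdict


def tally(lines):
    """Count OK and FAIL outcomes per file from '<file> <status>' lines."""
    counts = defaultdict(lambda: {"OK": 0, "FAIL": 0})
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or parts[1] not in ("OK", "FAIL"):
            continue  # skip blank or malformed lines
        counts[parts[0]][parts[1]] += 1
    return dict(counts)


def stochastic_files(counts):
    """Files that passed at least once and failed at least once."""
    return sorted(f for f, c in counts.items() if c["OK"] and c["FAIL"])


# Short excerpt in the format printed by the loop (illustrative only)
sample = """\
20150703_173318.mainbrain.h5 OK
20150703_174525.mainbrain.h5 FAIL
20150703_173318.mainbrain.h5 FAIL
20150702_174748.mainbrain.h5 OK
""".splitlines()

counts = tally(sample)
print(stochastic_files(counts))
```

Files that only ever fail or only ever pass are left out of the result, so whatever remains is the intermittent set.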

from flydra.

astraw avatar astraw commented on June 15, 2024

@nzjrs but your results about stochasticity are without 9179b57, right? I predict with that commit you won't have stochasticity anymore.

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

Those files which @sdvillal and I have never been able to process: are those corrupt and unrecoverable?

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

@astraw So far this made it worse (testing with master)

20150525_180912.mainbrain.h5 FAIL
20150624_175622.mainbrain.h5 FAIL
20150703_173108.mainbrain.h5 FAIL
20150706_180558.mainbrain.h5 FAIL

sample log

cat 20150525_180912.mainbrain.h5.log 
STAGE 1: finding timestamps
opening file 20150525_180912.mainbrain.h5...
caching raw 2D data.../home/stowers/Straw/flydra.git/flydra/a2/data2smoothed.py:165: UserWarning: no host flycube6 in timestamp data. making up data.
  'data.'%remote_hostname)
/home/stowers/Straw/flydra.git/flydra/a2/core_analysis.py:1554: UserWarning: passing data_file as string to core_analysis.CachingAnalyzer.load_data()
  warnings.warn('passing data_file as string to '
 done
(cached index of 32154178 frame values of dtype int64)
hostname time_gain time_offset
-------- --------- -----------
   'localhost' 0.999999999999 0.000729657720369

caching Kalman obj_ids...
finding unique obj_ids...
(found 16928)
(will export 16928)
finding 2d data for each obj_id...
STAGE 2: running Kalman smoothing operation
detected file loaded with dynamic model "EKF mamarama, units: mm"
  for smoothing, will use dynamic model "mamarama, units: mm"
Traceback (most recent call last):
  File "/home/stowers/.virtualenvs/flydranew/bin/flydra_analysis_export_flydra_hdf5", line 9, in <module>
    load_entry_point('flydra==0.6.6', 'console_scripts', 'flydra_analysis_export_flydra_hdf5')()
  File "/home/stowers/Straw/flydra.git/flydra/a2/data2smoothed.py", line 311, in export_flydra_hdf5
    main(hdf5_only=True)
  File "/home/stowers/Straw/flydra.git/flydra/a2/data2smoothed.py", line 410, in main
    **kwargs)
  File "/home/stowers/Straw/flydra.git/flydra/a2/data2smoothed.py", line 259, in convert
    **kwargs)
  File "/home/stowers/Straw/flydra.git/flydra/a2/core_analysis.py", line 1663, in load_data
    elevation_up_bias_degrees=elevation_up_bias_degrees,
  File "/home/stowers/Straw/flydra.git/flydra/a2/core_analysis.py", line 943, in query_results
    allocate_space_for_direction=have_body_axis_information,
  File "/home/stowers/Straw/flydra.git/flydra/a2/core_analysis.py", line 372, in observations2smoothed
    dynamic_model_name=dynamic_model_name)
  File "/home/stowers/Straw/flydra.git/flydra/a2/core_analysis.py", line 347, in kalman_smooth
    valid_data_idx=idx)
  File "/home/stowers/Straw/adskalman.git/adskalman/adskalman.py", line 566, in kalman_smoother
    full_output=full_output)
  File "/home/stowers/Straw/adskalman.git/adskalman/adskalman.py", line 454, in kalman_filter
    raise ValueError("cannot do Kalman filtering with nan values in %s (shape %r)" % (name,arr.shape))
ValueError: cannot do Kalman filtering with nan values in R (shape (21642, 3, 3))
Closing remaining open files:20150525_180912.mainbrain.h5...done/mnt/ssd/CORRUPT/20150525_180912.mainbrain.kh5-smoothcache...done
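
The error above reports NaNs somewhere in a stack `R` of shape (21642, 3, 3). To find out which frames are affected, one could check each 3x3 slice with NumPy. This is a rough sketch, not code from flydra or adskalman: the helper name and the small stand-in array are hypothetical.

```python
import numpy as np


def frames_with_nan(stack):
    """Return the indices along axis 0 whose sub-array contains any NaN."""
    stack = np.asarray(stack, dtype=float)
    # Reduce over every axis except the first (the per-frame axis)
    other_axes = tuple(range(1, stack.ndim))
    return np.flatnonzero(np.isnan(stack).any(axis=other_axes))


# Hypothetical stand-in for R: 5 frames of 3x3 matrices
R = np.tile(np.eye(3), (5, 1, 1))
R[2, 0, 1] = np.nan   # a single bad entry in frame 2
R[4] = np.nan         # frame 4 entirely missing

bad = frames_with_nan(R)
print(bad.tolist())
```

Knowing whether the bad frames cluster at the start of a trajectory or are scattered through it would help separate genuinely missing observations from uninitialised memory.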

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

So far @sdvillal gets a prize for fixing the crash with fce3436

20150525_180912.mainbrain.h5 OK
20150624_175622.mainbrain.h5 OK
20150703_173108.mainbrain.h5 OK
20150706_180558.mainbrain.h5 OK
20150703_173318.mainbrain.h5 OK
20150703_174525.mainbrain.h5 OK

The question now is: are the final h5 files the same?

from flydra.

astraw avatar astraw commented on June 15, 2024

In my opinion, fce3436 is not a fix but a dangerous workaround that lets a bug propagate bad data into our system. I would delete the resulting processed files, as they will definitely contain problematic results. I am working on a real fix, but my internet access here is really terrible, so I'm slow.

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

Some observations:

I'm concerned that this bug started getting hit frequently in the last few days (07/02 and 07/03). This coincided with the upgrading of flydra / pytables on strawcore. Maybe that is coincidence. Maybe not.

Are you suggesting that your fix, which made all those files impossible to process, was the correct fix or a partial fix? My reading of "propagate bad data into our system" is that the original files contained invalid data and therefore should fail. Thus your fix was correct. Yet this is a terribly large loss of data.

I understand that you don't have good internet for a fix, but a little communication on what you think the outcome is, or what you are currently thinking, would be welcome. As you might recall, regardless of this particular problem, we have gone down a one-way street (that we can't back out of) by moving to new pytables (as a rollback to old version leaves recent h5 files unreadable).

I am also very nervous about this whole pytables monkey patching business.

Between isilon and this, it has been a complete write-off week for us here.

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

These are the parallel smoothing + simple flydra comparison scripts I'm using

I'm right now running these in strz (Non sunt multiplicanda entia sine necessitate, I know...), using the same conda environment but for 3 different flydra versions:

  • 066: release 0.6.6 (with empty array initialisation)
  • master: 73bd9df (current master, with nan initialisation + the workaround for the race condition)
  • nonans: fce3436 (the reverted commit that initialises to non-NaN values)

I will maybe also run the same setup in str22 (my machine). In any case, I will report here the results once they are there (sneak peek, both 066 and master fail to smooth most of these files).

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

I have added a new variant to the mix:

  • nonans0: 614d469 like nonans, but initialising to 0; in my mind it should reproduce the errors I had with initialisation to empty, maybe showing that we are using data outside valid observations ranges somewhere other than in nanchecks.

I'm running in both strz and str22, to check if the distro / processor make a difference.

@astraw Let me know if at any time you would like to run the tests with more proper fixes, it takes me no time now.

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

Will test it very soon. In case it does not fix it, I will call this variant:

  • masterskip: a91f618 like master but with context manager for pytables + non-contiguous skipping fix

Anything that fixes it needs to be proven also in production

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

Santi, let me know a reduced subset you want me to test today (which HEADs
etc). I need my computer to not be painfully slow for the next few hours.

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

mmmm let me write a bit of analysis code (yes, I cannot help it, will put stuff in a pandas dataframe) and I will tell you better but...

Yesterday I was very puzzled by you being able to smooth successfully these two in strawcore with 0.6.6:

  • Katja's 66a392f402f811e5acb4d850e6c4a608 (20150525_180912.mainbrain.h5)
  • Matthew's edd9d0dc23f811e5a038bcee7bdac3c6 (20150706_180558.mainbrain.h5)

Could you try again (0.6.6, these two) and tell me how often do you succeed?

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

Sure no problem, with master? with conda? with ubuntu?

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

Actually, I have changed my mind. Better to try first with Andrew's last commit (master) and ubuntu. If that fixes it, we will be in a much better position, with less work to do.

If not, I meant the released 0.6.6 (so choose conda or ubuntu; I would start with conda). I have a hypothesis about why these files might have been failing until now, which I will try to falsify later on (by checking whether nans make it into these arrays when initialised to empty).

On my side, I will do the following, in this order:

  • try first Andrew's last commit
  • write that little results analysis code + report results
  • if we still have the problem, maybe debug a bit flydra (why should I not have the fun too?)

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

conda with accelerate or without (as I get different results)?

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

It is interesting. If we get different results with and without accelerate, I believe it means we are looking at blas/lapack dependent errors. I would just use accelerate to limit the options (that's what I'm using everywhere, so we could cross-compare).

The first few minutes of tests with masterskip in strz look promising.

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

Maybe I'll wait. Your tests are much faster than mine.

Can you also test on strz outside of conda? i.e. it's just stock ubuntu, right?

(oh, and as I said, the results differ stochastically, so they are not really deterministically different)

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

Yes, just run master with ubuntu, that is the interesting bit, and otherwise let your machine and yourself take a rest.

I'm afraid that even if using stock ubuntu in strz is not a big deal, it cannot count as a proper test of production (it is a 14.04 and I do not install from Andrew's repos).

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

OK, it's running now. I put the interesting files (for you) first in the list.

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

So far no crashes with master, as expected given we just drop the invalid data.

20150525_180912.mainbrain.h5 OK
20150706_180558.mainbrain.h5 OK
20150624_175622.mainbrain.h5 OK
20150703_173108.mainbrain.h5 OK

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

Same good news in strz: all uuids have been smoothed at least once (yohooo!). Without counting our chickens before they hatch, these fixes make a lot of sense and could even explain why we were seeing so many varying and confusing failures (just by understanding the differences in np.empty between machines + smoothing dynamics with senseless data).

Let's compare these files when all is finished. We should also assess how many trajectories we are removing and think about why these trajectories with holes are happening in mainbrain files. All things being good, we should just resmooth all mainbrains that had trajectories with holes, and probably we will be able to rescue the data for experiments like these examples from Katja and Matthew.

strz 20150703_173318.mainbrain.h5 0 OK
strz 20150706_180558.mainbrain.h5 0 OK
strz 20150703_174525.mainbrain.h5 0 OK
strz 20150706_180558.mainbrain.h5 1 OK
strz 20150703_173318.mainbrain.h5 1 OK
strz 20150703_173108.mainbrain.h5 0 OK
strz 20150702_175709.mainbrain.h5 0 OK
strz 20150624_175622.mainbrain.h5 1 OK
strz 20150624_175622.mainbrain.h5 0 OK
strz 20150525_180912.mainbrain.h5 1 OK
strz 20150525_180912.mainbrain.h5 0 OK
strz 20150703_174525.mainbrain.h5 1 OK
strz 20150703_173108.mainbrain.h5 1 OK
strz 20150702_175828.mainbrain.h5 0 OK
strz 20150702_175709.mainbrain.h5 1 OK
strz 20150702_174748.mainbrain.h5 0 OK
strz 20150703_173318.mainbrain.h5 2 OK
strz 20150525_180912.mainbrain.h5 2 OK
strz 20150702_175121.mainbrain.h5 0 OK
strz 20150624_175622.mainbrain.h5 2 OK
strz 20150703_173108.mainbrain.h5 2 OK
strz 20150706_180558.mainbrain.h5 2 OK
strz 20150703_174525.mainbrain.h5 2 OK
strz 20150630_175211.mainbrain.h5 0 OK
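
The np.empty point can be illustrated with a small NumPy sketch. np.empty returns uninitialised memory, so frames that never receive an observation hold whatever bytes happened to be there, which can differ between machines and runs. Filling with NaN instead makes those frames explicit and detectable. The array names, shapes and the `valid` index below are hypothetical and are not taken from flydra.

```python
import numpy as np

n_frames = 6
# Frames that actually have observations (hypothetical example)
valid = np.array([0, 1, 4, 5])

# Uninitialised buffer: rows 2 and 3 will contain arbitrary values
obs_empty = np.empty((n_frames, 3))
# Explicitly NaN-filled buffer: rows 2 and 3 are unambiguously "missing"
obs_nan = np.full((n_frames, 3), np.nan)

obs_empty[valid] = 1.0
obs_nan[valid] = 1.0

# With NaN initialisation, the frames that were never written can be found
missing = np.flatnonzero(np.isnan(obs_nan).any(axis=1))
print(missing.tolist())
```

With `obs_empty` there is no reliable way to tell rows 2 and 3 apart from real data, which fits the observation that failures varied between machines.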

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

Yes, although I'm more concerned about writing garbage than inducing it during read because of np.empty

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

Agreed, that is the scariest part.

from flydra.

sdvillal avatar sdvillal commented on June 15, 2024

Quick summary of smoothing in strz with masterskip:

  • No errors, all runs produce same files (therefore no indeterminism)
  • There were very few obj_ids with holes in each file, and in all but two cases these happened at the beginning (obj_id < 200, and usually much earlier)

The special cases with respect to holes are:

  • 20150702_175709 did not have holes; actually smoothing never failed for me with that file, which is reassuring
  • 20150624_175622 has a hole in obj_id 500019
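
For assessing how many trajectories have holes, a check along these lines might do. It assumes that a "hole" means a gap in the frame numbers recorded for one obj_id, which is my reading of the discussion rather than flydra's definition. The function names and the sample data are made up for illustration.

```python
def find_holes(frames):
    """Return (frame_before_gap, frame_after_gap) pairs for one trajectory."""
    frames = sorted(frames)
    return [(a, b) for a, b in zip(frames, frames[1:]) if b - a > 1]


def obj_ids_with_holes(frames_by_obj_id):
    """Map each obj_id that has at least one gap to its list of gaps."""
    result = {}
    for obj_id, frames in frames_by_obj_id.items():
        holes = find_holes(frames)
        if holes:
            result[obj_id] = holes
    return result


# Hypothetical frame numbers per obj_id
frames_by_obj_id = {
    3: [10, 11, 12, 15, 16],   # frames 13 and 14 are missing
    17: [40, 41, 42, 43],      # contiguous
}

print(obj_ids_with_holes(frames_by_obj_id))
```

Counting the keys of the result per mainbrain file would give the number of trajectories that the skipping fix drops.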

from flydra.

nzjrs avatar nzjrs commented on June 15, 2024

Cool.

20150624_175622 is not an exception; rather, there is a bug in 0.6.6
where obj_ids start from 500000 rather than 1 (31118a5).

from flydra.
