byungsook / deep-fluids
Deep Fluids: A Generative Network for Parameterized Fluid Simulations
Home Page: http://www.byungsoo.me/project/deep-fluids
Hi @byungsook. In the build_test_model function, curl is used:
self.G_ = curl(self.G_s)
while in the build_model method, jacobian3 is called:
_, self.G_ = jacobian3(self.G_s)
Why is that? I am looking to advect the model predictions in order to obtain files in .vdb format. Thus, I plan to use the jacobian3 call just as in training, and then use the advect method from the smoke3_vel_buo.py script.
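For readers unfamiliar with the operator in question, the following is a minimal NumPy sketch of a 3D curl on a regular grid. It is not the repository's curl or jacobian3 implementation: the function names, the (X, Y, Z, 3) axis layout, and the use of np.gradient are all assumptions made for illustration. It shows that a velocity field built as the curl of a vector potential has zero discrete divergence.

```python
import numpy as np

def curl3d(psi, spacing=1.0):
    """Curl of a vector potential psi with assumed shape (X, Y, Z, 3)."""
    px, py, pz = psi[..., 0], psi[..., 1], psi[..., 2]
    # Finite-difference partial derivatives along each grid axis.
    dpx_dy = np.gradient(px, spacing, axis=1)
    dpx_dz = np.gradient(px, spacing, axis=2)
    dpy_dx = np.gradient(py, spacing, axis=0)
    dpy_dz = np.gradient(py, spacing, axis=2)
    dpz_dx = np.gradient(pz, spacing, axis=0)
    dpz_dy = np.gradient(pz, spacing, axis=1)
    u = dpz_dy - dpy_dz
    v = dpx_dz - dpz_dx
    w = dpy_dx - dpx_dy
    return np.stack([u, v, w], axis=-1)

def divergence3d(vel, spacing=1.0):
    """Divergence of a velocity field with assumed shape (X, Y, Z, 3)."""
    return (np.gradient(vel[..., 0], spacing, axis=0)
            + np.gradient(vel[..., 1], spacing, axis=1)
            + np.gradient(vel[..., 2], spacing, axis=2))

# Random potential as a stand-in for a network output (hypothetical data).
rng = np.random.default_rng(0)
psi = rng.standard_normal((8, 10, 12, 3))
vel = curl3d(psi)
max_div = np.abs(divergence3d(vel)).max()
print("velocity shape:", vel.shape, "max |div|:", max_div)
```

The divergence vanishes up to floating-point error because the difference operators along different axes commute.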
Hello,
As far as I can tell, when reading input data from file and enqueuing it in the FIFO queue (from which batches are dequeued), the code samples from a uniform distribution with replacement (data.py, line 128). Is there any reason why sampling without replacement was not used? I think sampling without replacement is the common practice: this way every sample is guaranteed to be seen once per epoch, leading to better training in the short run.
Thank you!
edited: replaced 'with' with 'without', sorry for the confusion
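The difference between the two sampling schemes can be sketched as follows. This is an illustrative NumPy example, not the code in data.py; the helper name epoch_batches and the sample counts are hypothetical.

```python
import numpy as np

def epoch_batches(num_samples, batch_size, rng):
    """Yield index batches that cover every sample exactly once per epoch."""
    order = rng.permutation(num_samples)  # shuffle without replacement
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)

# Without replacement: one epoch visits each of the 10 indices exactly once.
seen = np.concatenate(list(epoch_batches(10, 4, rng)))

# With replacement: some indices may repeat and others may be missed.
with_replacement = rng.integers(0, 10, size=10)

print("without replacement:", sorted(seen.tolist()))
print("with replacement:   ", sorted(with_replacement.tolist()))
```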
Why is the output shape of EncoderBE3 z_num? According to the paper, I think it should be c_num. Also, there are no supervised parameters P in the AE and AE3 networks.
Dear Kim,
I ran the dataset generation and it worked. But when I run python main.py for training, I get an error message:
KeyError: "Registering two gradient with name 'BlockLSTM'! (Previous registration was in register /home/symphony/.conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/framework/registry.py:66)
I am connecting remotely to a Red Hat Linux server with TensorFlow 1.14.0. Could you please help me? Many thanks.
@byungsook, could you please explain why the training and test data are normalized differently?
In data.py, line 332, the "y" variable is normalized by:
for i, ri in enumerate(y_range):
y[i] = (y[i]-ri[0]) / (ri[1]-ri[0]) * 2 - 1
whereas in trainer.py in line 327 we have:
c1 = p1/float(y1-1)*2-1
c2 = p2/float(y2-1)*2-1
In plain language, in the first case we have: (p_current - p_min)/(p_max - p_min)*2 - 1
and in the second: p_current/(p_number_of_values - 1)*2 - 1
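A quick numerical check shows when the two formulas agree. The sketch below assumes that p1 in trainer.py is an integer index into evenly spaced parameter values covering [p_min, p_max]; that reading is an assumption and is not confirmed by the snippets quoted above. The values of p_min, p_max and n_values are hypothetical.

```python
import numpy as np

p_min, p_max, n_values = 0.2, 0.8, 5  # hypothetical parameter range and count

# Evenly spaced physical values and their integer indices.
values = np.linspace(p_min, p_max, n_values)
indices = np.arange(n_values)

# First form: normalize the physical value by its range.
by_range = (values - p_min) / (p_max - p_min) * 2 - 1

# Second form: normalize the index by the number of values.
by_index = indices / float(n_values - 1) * 2 - 1

print(by_range)
print(by_index)
```

Under that assumption both forms map the parameter onto [-1, 1] identically. They would differ if the values were not evenly spaced or if p1 were a physical value rather than an index.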
Hi @byungsook. I reimplemented 3d smoke experiments in PyTorch in my fork of your project https://github.com/vivanov879/deep-fluids. I also described the experiments and my analysis of the results in the paper https://github.com/vivanov879/deep-fluids/blob/master/PyTorch%20implementation%20of%20Deep%20Fluids.pdf. Feel free to share your thoughts on that and thanks for the awesome project.
I retrained the network and can produce new output. However, I don't know how to visualize the result the way it is pictured in your paper. Is there an application I should use, such as Blender?
Many thanks for your great work.
TL;DR: Could you please provide the commands for training a moving 3D smoke source?
Setup Info:
Tensorflow version: tensorflow-gpu, 1.12.0, Channel: pypi
Issue:
In accordance with run.bat, I generated the smoke3_mov200_f400 dataset using the following command:
python main.py --arch=ae --z_num=16 --max_epoch=10 --filter=64 --is_3d=True --lr_max=0.00005 --dataset=smoke3_mov200_f400 --res_x=48 --res_y=72 --res_z=48 --batch_size=4 --num_worker=1
Then trained the AE model using,
python main.py --arch=ae --z_num=16 --max_epoch=10 --filter=64 --is_3d=True --lr_max=0.00005 --dataset=smoke3_mov200_f400 --res_x=48 --res_y=72 --res_z=48 --batch_size=4 --num_worker=1
However, when I try generating latent code set with (trained model stored in log/smoke3_mov200_f400/1213_021319_ae_tag/ )
python main.py --is_train=False --load_path=log/smoke3_mov200_f400/1213_021319_ae_tag/ --arch=ae --z_num=16 --max_epoch=20 --is_3d=True --dataset=smoke_mov200_f400 --res_x=48 --res_y=72 --res_z=48 --test_batch_size=4
I run into the following error,
Traceback (most recent call last):
File "main.py", line 35, in <module>
main(config)
File "main.py", line 21, in main
trainer = Trainer3(config, batch_manager)
File "/code/deep-fluids/trainer.py", line 123, in __init__
self.sess = sv.prepare_or_wait_for_session(config=sess_config)
File "/root/miniconda3/envs/sandbox/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 730, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/root/miniconda3/envs/sandbox/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 288, in prepare_session
config=config)
File "/root/miniconda3/envs/sandbox/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 218, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/root/miniconda3/envs/sandbox/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1582, in restore
err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
I modified the latent code command with --filter=64, but then I ran out of memory,
Traceback (most recent call last):
File "main.py", line 35, in <module>
main(config)
File "main.py", line 31, in main
trainer.test()
File "/code/deep-fluids/trainer.py", line 308, in test
self.test_ae()
File "/code/deep-fluids/trainer.py", line 502, in test_ae
c = self.sess.run(self.z, {self.x: x})
File "/root/miniconda3/envs/sandbox/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/root/miniconda3/envs/sandbox/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/root/miniconda3/envs/sandbox/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/root/miniconda3/envs/sandbox/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[100,64,48,72,48] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node AE_1/enc/0_conv/Conv3D (defined at /root/miniconda3/envs/sandbox/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1057) = Conv3D[T=DT_FLOAT, data_format="NDHWC", dilations=[1, 1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_3_0_0/_1283, AE/enc/0_conv/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Even after reducing the batch size via --test_batch_size=1, I still get OOM.
Since the paper suggests that all the training was done on an NVIDIA Titan X (12 GB) GPU and I have the Titan X Pascal (12 GB), I'm assuming that memory shouldn't be the problem.
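As a sanity check on the numbers, the size of the single tensor named in the OOM message can be computed directly from its shape. The shape and float type come from the log above; the rest is plain arithmetic.

```python
import math

# Shape reported in the ResourceExhaustedError message.
shape = (100, 64, 48, 72, 48)
bytes_per_float32 = 4

num_bytes = math.prod(shape) * bytes_per_float32
gib = num_bytes / 2**30
print(f"{num_bytes} bytes = {gib:.2f} GiB")
```

That one activation alone needs about 3.96 GiB. The leading dimension of 100 suggests that 100 samples were fed to the encoder in a single call, which might explain why changing --test_batch_size had no effect; this is an inference from the log, not something verified against the code.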
Could you please provide the commands for training a moving 3D smoke source? run.bat only provided it for smoke3_mov, 2D. Or alternatively please help me identify what I'm doing wrong?
Thanks
I was wondering if there are any plans to release pre-trained weights or checkpoints for a partially trained network?
I'm currently training the model with data generated by ./scene/smoke_pos_size.py, but it'd be nice if access to pre-trained weights was provided.
Sorry to ask a question that seems unrelated to this project. Something seems to be wrong with my mantaflow installation, but I don't know how to fix it. When I run smoke_pos_size.py, I get an error:
NameError: name 'copyGridToArrayMAC' is not defined
This affects not only copyGridToArrayMAC: none of the NumPy conversion functions work. Have you ever encountered this problem?
PS: I have reinstalled mantaflow, and the same error still occurs.
I have some fire spread simulation data that's not generated by mantaflow and is in a NumPy array. I looked at the data generated by the scripts in this repo, but I don't know what the format is.
If I run manta scene/smoke_pos_size.py, it generates a bunch of .npz files with fields x and y. x is of shape (128,96,2) and y has shape (3,). What do those shapes represent? I want to massage my data to fit so I can use it.
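For external data, a file with the same two fields can be written with NumPy alone. The sketch below only reproduces the field names and shapes mentioned above. Reading x as a 2D velocity field laid out as (height, width, 2 components) and y as a length-3 parameter vector for that frame is an assumption, and the file name and values are hypothetical.

```python
import os
import tempfile
import numpy as np

# Assumed layout: (res_y, res_x, 2 velocity components).
x = np.zeros((128, 96, 2), dtype=np.float32)
# Assumed meaning: three scalar parameters describing this frame.
y = np.array([0.5, 0.1, 3.0], dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "0_0.npz")  # hypothetical file name
np.savez_compressed(path, x=x, y=y)

# Read the file back and inspect the stored fields.
with np.load(path) as data:
    loaded_x, loaded_y = data["x"], data["y"]

print(loaded_x.shape, loaded_y.shape)
```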
Hi @byungsook. Can you please clarify one point for me: is it possible to move from velocity field to density? While reading the Deep Fluids paper, I noticed that the main focus is the velocity field. However, I can't grasp how to move from a velocity field to density. For example, the "Algorithm 1, Simulation with the Latent Space Integration Network" is about velocity field generation. However, the rendered images shown in the paper Figures 8, 9, 10, 11 are all obtained via rendering. Is it possible to make Maya/Blender render with only velocity fields, or is there some explicit way of getting density out of velocity fields for rendering?
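One common approach in fluid solvers is to carry a density field passively along the velocity field and render that density. The following is a minimal 2D NumPy sketch of semi-Lagrangian advection under that idea. It is not the repository's or mantaflow's advection routine, and the function name, array layout, and boundary clamping are assumptions chosen for illustration.

```python
import numpy as np

def advect_density(density, vel, dt=1.0):
    """Semi-Lagrangian advection of a 2D scalar field.

    density: (H, W) array.
    vel: (H, W, 2) array; vel[..., 0] is the x (column) velocity and
         vel[..., 1] the y (row) velocity, in cells per unit time.
    """
    h, w = density.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Trace each cell backwards along the velocity, clamped to the grid.
    x_src = np.clip(xs - dt * vel[..., 0], 0, w - 1)
    y_src = np.clip(ys - dt * vel[..., 1], 0, h - 1)
    x0 = np.floor(x_src).astype(int)
    y0 = np.floor(y_src).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = x_src - x0
    fy = y_src - y0
    # Bilinear interpolation of the density at the traced-back position.
    top = (1 - fx) * density[y0, x0] + fx * density[y0, x1]
    bottom = (1 - fx) * density[y1, x0] + fx * density[y1, x1]
    return (1 - fy) * top + fy * bottom

# A single blob of density moved by a uniform velocity of one cell per step.
density = np.zeros((4, 6))
density[2, 1] = 1.0
vel = np.zeros((4, 6, 2))
vel[..., 0] = 1.0
moved = advect_density(density, vel)
unchanged = advect_density(density, np.zeros((4, 6, 2)))
```

In this toy case the blob at column 1 ends up at column 2 after one step, and a zero velocity field leaves the density untouched.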
def jacobian(x, data_format='NHCW'):
    if data_format == 'NCHW':
        x = nchw_to_nhwc(x)
The default data_format should be 'NHWC', not 'NHCW'.
Thanks for your nice code. I have a problem running manta on Linux. I followed the instructions and installed manta. When I run the test case ./manta ../scenes/simpleplume.py, I get this message:
Version: mantaflow 0.12 64bit fp1 omp commit 15eaf4aa72da62e174df6c01f85ccd66fde20acc from Sep 25 2019, 15:17:48
QXcbConnection: Could not connect to display
Aborted (core dumped)
I changed the CMake command to: cmake .. -DGUI=OFF -DOPENMP=ON -DNUMPY=ON
but nothing changed. I am connecting remotely to a Red Hat 4.4.7-23 machine. Do you have any idea?
Hi @byungsook. Can you please provide some intuition for why Algorithm 1, "Simulation with the Latent Space Integration Network", would be faster than integrating the ODEs that obey the inviscid momentum equation Du/Dt = −∇p + g and the mass conservation equation ∇ · u = 0? To me, it seems you have all the formulas and can integrate them directly, while a neural network introduces multiple layers, which would take far more time.
Hi. How do you obtain the pretty video visualizations shown in the YouTube video accompanying the paper? I tried plotting the velocity field images via matplotlib, but I believe there is a better way to do this. Do you perhaps export the mantaflow format to Blender somehow?
Hello,
Could you please tell me where to find the output after running the test command? Which folder under log is it written to? Also, training the model takes too much time. Could we get a pre-trained model to see the results?
Hi, for the 3D buoyant smoke plume simulation, I ran manta with the command:
..\manta\build\Release\manta.exe .\scene\smoke3_vel_buo.py
which generated vector field arrays of size (32, 64, 112, 3). Then I've trained the model (only for 50 epochs so far) with the command
python main.py \
--is_3d=True \
--dataset=smoke3_vel5_buo3_f250 \
--res_x=112 \
--res_y=64 \
--res_z=32 \
--batch_size=5 \
--max_epoch=50 \
--num_worker=1 \
--log_step=100 \
--test_step=20 \
--load_path=log/smoke3_vel5_buo3_f250/0324_110822_de_tag/
and I generate predictions with this command:
python main.py \
--is_train=False \
--load_path=log/smoke3_vel5_buo3_f250/0324_110822_de_tag/ \
--test_batch_size=5 \
--is_3d=True \
--dataset=smoke3_vel5_buo3_f250 \
--res_x=112 \
--res_y=64 \
--res_z=32 \
--batch_size=4 \
--num_worker=1
But the output arrays, which I assume are the model's prediction, are all of size (32, 64, 3, 2). Are these outputs 3D vector fields? Do they correspond to some subset of the volume over which the manta simulation was run?
After successfully installing and running mantaflow, I try to run the data generation script by entering the command:
$ ..\manta\build\Release\manta.exe .\scene\smoke_pos_size.py
However, I receive the following error:
Error in copyGridToArrayMAC
sim: 0%| | 0/200 [00:00<?, ?it/s]
scenes: 0%| | 0/105 [00:00<?, ?it/s]
Traceback (most recent call last):
File "scene/smoke_pos_size.py", line 259, in <module>
main()
File "scene/smoke_pos_size.py", line 199, in main
copyGridToArrayMAC(vel, v_)
RuntimeError: The dimensions of source grid (96, 128, 1) and target numpy array (3, 96, 128) do not match!
Error raised in /workdir/manta/source/plugin/numpyconvert.cpp:122
Script finished.
While installing, I deviated from original guidelines in readme.md at two points:
I would be grateful if someone could advise on how to resolve this issue.
Hi @byungsook ,
Sorry, I am struggling a bit to reproduce the exact evaluation time you reported in the paper (0.958 ms for a batch of size 5 on the "smoke inflow" dataset). I get a time about two orders of magnitude higher, roughly 200 ms.
Could you please share how you measured the time? Did you call time.time() inside the test_() function in trainer.py? Did you measure a single batch, or did you run multiple batches and average them? Thank you very much!
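One way such a measurement could be set up is sketched below: run a few untimed warm-up calls, then average over many timed calls. This is only an illustration of the question being asked, not the authors' measurement procedure. The helper name mean_runtime_ms and the stand-in function fake_eval are hypothetical; in practice the callable would wrap the actual network evaluation.

```python
import time

def mean_runtime_ms(fn, n_warmup=3, n_runs=20):
    """Average wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(n_warmup):
        fn()  # untimed: the first calls often include one-off setup costs
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / n_runs * 1000.0

calls = []

def fake_eval():
    # Stand-in for a single batch evaluation (hypothetical workload).
    calls.append(1)
    return sum(range(1000))

avg_ms = mean_runtime_ms(fake_eval)
print(f"average over 20 runs: {avg_ms:.4f} ms")
```

Timing a single first call versus an average over warmed-up calls can give very different numbers, so it helps to state which of the two a reported figure refers to.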