Comments (9)
Thanks for the info! I'm a tad sick so I'm taking the rest of the day (sorry!), but I can get back to this on Monday.
from ascent.
Thanks for confirming this behavior, we will work to resolve these limits for binning.
Hey @BenWibking sorry for the delay.
A reproducer would be great and I can do my best to try to help you figure out this issue!
Thanks. I've put a reproducer here: parthenon-hpc-lab/athenapk#49. Let me know what you find.
@nicolemarsaglia I've rebuilt Ascent + TPLs with debugging info and I get a more informative backtrace. The segmentation fault happens here:
(cuda-gdb) bt
#0 ascent::runtime::expressions::binning (dataset=..., bin_axes=..., reduction_var="Density", reduction_op="avg", empty_bin_val=0,
component="") at /projects/cvz/bwibking/ascent_debug/ascent/src/libs/ascent/runtimes/expressions/ascent_blueprint_architect.cpp:1649
#1 0x00007fb75d836285 in ascent::runtime::expressions::binning_interface (reduction_var="Density", reduction_op="avg",
n_empty_bin_val=..., n_component=..., n_axis_list=..., dataset=..., n_binning=..., n_output_axes=...)
at /projects/cvz/bwibking/ascent_debug/ascent/src/libs/ascent/runtimes/expressions/ascent_expression_filters.cpp:3158
#2 0x00007fb75d836c1a in ascent::runtime::expressions::Binning::execute (this=0xc40cdc0)
at /projects/cvz/bwibking/ascent_debug/ascent/src/libs/ascent/runtimes/expressions/ascent_expression_filters.cpp:3197
#3 0x00007fb75d0b9ba9 in flow::Workspace::execute (this=0x7ffd1f630098)
at /projects/cvz/bwibking/ascent_debug/ascent/src/libs/flow/flow_workspace.cpp:303
#4 0x00007fb75d7423a7 in ascent::runtime::expressions::ExpressionEval::evaluate (this=0x7ffd1f630030,
expr="binning('Density','avg', [axis('x',[-0.5,0.5]), axis('y', [-0.5,0.5]), axis('z', num_bins=64)])",
expr_name="avg_density_profile")
at /projects/cvz/bwibking/ascent_debug/ascent/src/libs/ascent/runtimes/ascent_expression_eval.cpp:1534
#5 0x00007fb75d927c35 in ascent::runtime::filters::BasicQuery::execute (this=0xa08c460)
at /projects/cvz/bwibking/ascent_debug/ascent/src/libs/ascent/runtimes/flow_filters/ascent_runtime_query_filters.cpp:127
#6 0x00007fb75d0b9ba9 in flow::Workspace::execute (this=0x7df7460)
at /projects/cvz/bwibking/ascent_debug/ascent/src/libs/flow/flow_workspace.cpp:303
#7 0x00007fb75d6fba4f in ascent::AscentRuntime::Execute (this=0x7df6ec0, actions=...)
at /projects/cvz/bwibking/ascent_debug/ascent/src/libs/ascent/runtimes/ascent_main_runtime.cpp:1831
#8 0x00007fb75d6e1915 in ascent::Ascent::execute (this=0x7ffd1f6318d0, actions=...)
at /projects/cvz/bwibking/ascent_debug/ascent/src/libs/ascent/ascent.cpp:410
#9 0x000000000085ba78 in parthenon::AscentOutput::WriteOutputFile(parthenon::Mesh*, parthenon::ParameterInput*, parthenon::SimTime*, parthenon::SignalHandler::OutputSignal) ()
#10 0x0000000000777e7e in parthenon::Outputs::MakeOutputs(parthenon::Mesh*, parthenon::ParameterInput*, parthenon::SimTime*, parthenon::SignalHandler::OutputSignal) ()
#11 0x00000000006cfcb1 in parthenon::EvolutionDriver::Execute() ()
#12 0x00000000004419df in main ()
(cuda-gdb) list
1644 //#endif
1645 for(int i = 0; i < homes_size; ++i)
1646 {
1647 if(homes[i] != -1)
1648 {
1649 update_bin(bins, homes[i], values[i], reduction_op);
1650 }
1651 }
1652 }
1653 }
Here's `info args`:
(cuda-gdb) info args
dataset = @0xc2bec10: {m_parent = 0x0, m_schema = 0xc2bfb20, m_owns_schema = true,
m_children = std::vector of length 44, capacity 64 = {0xc309760, 0xc2bd670, 0xc2bd710, 0xc30a490, 0xc30a270, 0xc2bf2b0, 0xc2bf430,
0xc2c1530, 0xc30feb0, 0xc2c4f60, 0xc2c5f20, 0xc2c2130, 0xc2c5ca0, 0xc2c91c0, 0xc2c9d10, 0xc2c8f90, 0xc2c25e0, 0xc2cbba0, 0xc2cbd10,
0xc2c4bb0, 0xc2ccc90, 0xc2c2c10, 0xc2c7590, 0xc2cabe0, 0xc2c7380, 0xc2d51b0, 0xc2d4000, 0xc2d7020, 0xc355040, 0xc2d8e00, 0xa10ea60,
0xc2dab80, 0xc2d4e00, 0xc2da950, 0xb841190, 0xc2db6e0, 0xc2de550, 0xc2d8930, 0xc2d5ea0, 0xc2e17a0, 0xc2e25d0, 0xc3d98b0, 0xc2e42f0,
0xc2e5290}, m_data = 0x0, m_data_size = 0, m_alloced = false, m_mmaped = false, m_mmap = 0x0, m_allocator_id = 0}
bin_axes = @0x7ffd1f62ea80: {m_parent = 0x0, m_schema = 0xc413880, m_owns_schema = true,
m_children = std::vector of length 3, capacity 4 = {0xc40fb70, 0xc40f570, 0xc40fa70}, m_data = 0x0, m_data_size = 0,
m_alloced = false, m_mmaped = false, m_mmap = 0x0, m_allocator_id = 0}
reduction_var = "Density"
reduction_op = "avg"
empty_bin_val = 0
component = ""
And `info locals`:
(cuda-gdb) info locals
i = 26
values = {m_data = 0x7fb6ee6f9280, m_dtype = {m_id = 12, m_num_ele = 1728, m_offset = 0, m_stride = 8, m_ele_bytes = 8,
m_endianness = 0}}
comp_path = ""
values_path = "fields/Density/values"
dom = @0xc309760: {m_parent = 0xc2bec10, m_schema = 0xc3096f0, m_owns_schema = false,
m_children = std::vector of length 4, capacity 4 = {0xc309910, 0xc30a720, 0xc30ada0, 0xc30c280}, m_data = 0x0, m_data_size = 0,
m_alloced = false, m_mmaped = false, m_mmap = 0x0, m_allocator_id = 0}
n_homes = {m_parent = 0x0, m_schema = 0xc413bc0, m_owns_schema = true, m_children = std::vector of length 0, capacity 0,
m_data = 0xc4141f0, m_data_size = 6912, m_alloced = true, m_mmaped = false, m_mmap = 0x0, m_allocator_id = 0}
homes = 0xc4141f0
homes_size = 1728
dom_index = 0
var_names = std::vector of length 4, capacity 6 = {"x", "y", "z", "Density"}
topo_and_assoc = @0x7ffd1f62d800: {m_parent = 0x0, m_schema = 0xc411040, m_owns_schema = true,
m_children = std::vector of length 2, capacity 2 = {0xc2cc2e0, 0xc410de0}, m_data = 0x0, m_data_size = 0, m_alloced = false,
m_mmaped = false, m_mmap = 0x0, m_allocator_id = 0}
topo_name = "topo"
assoc_str = "element"
bounds = @0x7ffd1f62d760: {m_parent = 0x0, m_schema = 0xc4112a0, m_owns_schema = true,
m_children = std::vector of length 2, capacity 2 = {0xc410fe0, 0xc413820}, m_data = 0x0, m_data_size = 0, m_alloced = false,
m_mmaped = false, m_mmap = 0x0, m_allocator_id = 0}
min_coords = 0xc2c4370
max_coords = 0xc2c0b20
axes = {{"x", "i", "dx"}, {"y", "j", "dy"}, {"z", "k", "dz"}}
num_axes = 3
num_bins = 64
num_bin_vars = 2
bins_size = 128
bins = 0x6e99a90
mpi_comm = 0x7ffd1f62e240
global_bins = 0x10000000c2bec10
res = {m_parent = 0x0, m_schema = 0xc413880, m_owns_schema = false,
m_children = std::vector of length -17553172076876, capacity -17553146376764 = {0x458b48c389481aeb, 0xf528ede8c78948e8,
0xe8c78948d88948ff, 0xf85d8b48fff59272, 0x4853e5894855c3c9, 0x48e87d894818ec83, 0x48e8458b48e07589, 0x48fff4b0ffe8c789,
0x53e8c78948e8458b, 0x48e0558b48fff4a6, 0x8948d68948e8458b, 0x1aebfff58e20e8c7, 0x48e8458b48c38948, 0x48fff5288fe8c789,
0x9214e8c78948d889, 0xc3c9f85d8b48fff5, 0xec834853e5894855, 0x758948e87d894818, 0xc78948e8458b48e0, 0x458b48fff4b0a1e8,
0xf4a5f5e8c78948e8, 0x458b48e0558b48ff, 0xe8c78948d68948e8, 0x89481aebfff49e32, 0xc78948e8458b48c3, 0xd88948fff52831e8,
0xfff591b6e8c78948, 0x4855c3c9f85d8b48, 0x4848ec834853e589, 0x48b0758948b87d89, 0x43e8c78948b8458b, 0x48b8458b48fff4b0,
0x48fff4a597e8c789, 0x1be8c78948ef458d, 0x48ef558d48fff592, 0x48c0458d48b04d8b, 0x4d94e8c78948ce89, 0x8b48c0558d48fff5,
0xc78948d68948b845, 0x458d48fff49db1e8, 0xf4d5f5e8c78948c0, 0xc78948ef458d48ff, 0x483cebfff51dd9e8, 0x8948c0458d48c389,
0x3ebfff4d5d8e8c7, 0x48ef458d48c38948, 0xebfff51db7e8c789, 0xb8458b48c3894803, 0xfff52776e8c78948, 0xfbe8c78948d88948,
0xc9f85d8b48fff590, 0x8348e589485590c3, 0x8b48f87d894810ec, 0x5a9ce8c78948f845, 0x8948f8458b48fff5, 0xc990fff52740e8c7,
0x8348e589485590c3, 0x8b48f87d894810ec, 0x5a74e8c78948f845, 0x485590c3c990fff5, 0xec8348535441e589, 0x758948b87d894840,
0xc78948b8458b48b0, 0xef45c6fff483d1e8, 0xc78948b0458b4800, 0x458948fff4fe81e8, 0x5e7501d87d8348d8, 0xe8c78948b8458b48, 0x1ef45c6fff4ab0a, 0xe8c78948b0458b48, 0x48c38948fff4fb4a, 0xdbe8c78948b8458b, 0x8948de8948fff4bf, 0x8b48fff561a0e8c7, 0x94e4e8c78948b045, 0x458b48c38948fff5, 0xf48ae5e8c78948b8, 0xe8c78948de8948ff, 0x83482cebfff5196a, 0x458b48127502d87d, 0xf4a845e8c78948b8, 0x4813eb01ef45c6ff, 0x48b8458b48b0558b, 0x68bce8c78948d689, 0x840f00ef7d80fff5, 0xb8458b48000000af, 0xfff4d756e8c78948, 0xb0458b48d0458948, 0xfff4f0a6e8c78948, 0xe045c748c8458948, 0x8b4856eb00000000, 0x8948c8458b48e055, 0xf4cf75e8c78948d6, 0x40bf208b4cff, 0x8948fff51148e800, 0xe8df8948e6894cc3, 0xc05d8948fff5186a, 0xb8558b48c0458b48, 0xc0558d4838508948, 0x48d68948d0458b48, 0x48fff556b7e8c789, 0xc8458b4801e04583, 0xfff571e6e8c78948, 0x84c0920fe0453948, 0xc4894916eb9375c0, 0xfff50a0ee8df8948, 0x33e8c78948e0894c, 0x40c4834890fff58f, 0x485590c35d5c415b, 0x894810ec8348e589, 0x8b48f0758948f87d, 0x824ce8c78948f845, 0x8948f8458b48fff4, 0x8b48fff48530e8c7, 0x8948f0558b48f845, 0xf5518de8c78948d6, 0xe5894855c3c990ff, 0xf87d894810ec8348, 0xf8458b48f0758948, 0xfff4820ee8c78948, 0xe8c78948f0458b48, 0x1f88348fff4fcc2, 0x480e74c084c0940f, 0x4be8c78948f8458b, 0x458b4823ebfff4a9, 0xf4fc9de8c78948f0, 0xc0940f02f88348ff, 0xf8458b480c74c084, 0xfff4a6c6e8c78948, 0xf0558b48f8458b48, 0x43e8c78948d68948, 0x4855c3c990fff567, 0x894810ec8348e589, 0x8b48f0758948f87d, 0x8194e8c78948f845, 0x8b48f0558b48fff4, 0xc78948d68948f845, 0xc3c990fff4fe71e8, 0x10ec8348e5894855, 0xf0758948f87d8948, 0xf0453b48f8458b48, 0x8b48f0558b481374, 0xc78948d68948f845, 0x458b48fff4d771e8, 0xe589485590c3c9f8, 0xf87d894810ec8348, 0xf0558b48f0758948, 0x48d68948f8458b48, 0x48fff589d7e8c789, 0x485590c3c9f8458b, 0x894810ec8348e589, 0x8b48f0758948f87d, 0x8948f8458b48f055, 0xf49a1de8c78948d6, 0x90c3c9f8458b48ff, 0x40ec8348e5894855, 0xf845c748c87d8948, 0xc8458b4800000000, 0xfff4fb96e8c78948, 0xf07d8348f0458948, 0x2f07d8348077401, 0x8948c8458b487275, 
0x8948fff4ee58e8c7, 0x8948e8458b48e845, 0x8948fff486d8e8c7, 0xd8458d4827ebd845, 0xfff54356e8c78948, 0x95e8c78948008b48, 0x48f8450148ffffff, 0x2be8c78948d8458d, 0x48e8458b48fff519, 0x48fff5754fe8c789, 0x48e0558d48e04589, 0x8948d68948d8458d, 0xc084fff55598e8c7, 0xf07d834817ebb275, 0x48c8458b48107400, 0x48fff573dfe8c789, 0xc9f8458b48f84589, 0x8348e589485590c3, 0xc748c87d894840ec, 0x8b4800000000f845, 0xfad4e8c78948c845, 0x8348f0458948fff4, 0x7d8348077401f07d, 0xc8458b48727502f0, 0xfff4ed96e8c78948, 0xe8458b48e8458948, 0xfff48616e8c78948, 0x8d4827ebd8458948, 0x4294e8c78948d845, 0xc78948008b48fff5, 0x450148ffffff95e8, 0xc78948d8458d48f8, 0x458b48fff51869e8, 0xf5748de8c78948e8, 0x558d48e0458948ff, 0xd68948d8458d48e0, 0xfff554d6e8c78948, 0x834817ebb275c084...}, m_data = 0x7ffd1f630130, m_data_size = 205600896, m_alloced = 48, m_mmaped = 234, m_mmap = 0x7fb754b039c7 <conduit::Node::init_defaults()+93>, m_allocator_id = 140725130029616}
res_bins = 0x26bbb40 <ompi_mpi_comm_world>
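A note on reading the locals: `bins_size = 128` equals `num_bins = 64` times `num_bin_vars = 2`, which is consistent with an `avg` reduction that keeps a running sum and a count per bin, and `n_homes.m_data_size = 6912` matches 1728 four-byte entries. The sketch below is a simplified, host-only stand-in for the loop at line 1649, not Ascent's actual `update_bin`; the interleaved sum/count layout and the names `update_bin_avg` and `bin_average` are assumptions made for illustration. It shows that `values[i]` is only read when `homes[i] != -1`, which may be why the fault appeared at `i = 26` rather than at the first element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an "avg" reduction: each bin stores two
// variables (running sum, count), laid out as [sum0, cnt0, sum1, cnt1, ...].
// This layout is an assumption; Ascent's real update_bin may differ.
void update_bin_avg(std::vector<double> &bins, int home, double value)
{
    bins[2 * home] += value;    // running sum for this bin
    bins[2 * home + 1] += 1.0;  // number of samples in this bin
}

// Host-side loop in the shape of the one in the backtrace: homes[i] is the
// bin index of element i, or -1 if the element lies outside the axis ranges.
std::vector<double> bin_average(const std::vector<int> &homes,
                                const double *values,
                                int num_bins,
                                double empty_bin_val)
{
    std::vector<double> bins(2 * static_cast<std::size_t>(num_bins), 0.0);
    for (std::size_t i = 0; i < homes.size(); ++i)
    {
        if (homes[i] != -1)
        {
            // values[i] is read on the CPU here, so `values` must be a
            // host-accessible pointer.
            update_bin_avg(bins, homes[i], values[i]);
        }
    }

    // Turn (sum, count) pairs into averages; empty bins get empty_bin_val.
    std::vector<double> result(static_cast<std::size_t>(num_bins), empty_bin_val);
    for (int b = 0; b < num_bins; ++b)
    {
        if (bins[2 * b + 1] > 0.0)
        {
            result[b] = bins[2 * b] / bins[2 * b + 1];
        }
    }
    return result;
}
```

In this sketch, an element whose home is `-1` never causes a read of `values`, so a bad `values` pointer goes unnoticed until the first element that lands in a bin.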
I've uploaded the core files here: https://cloudstor.aarnet.edu.au/plus/s/hTgYZQWYDYTPZn9
This is a very strange bug that I cannot reproduce on either Frontier or Summit. Somehow it appears to only happen on A100s.
OK, I've traced the issue: the binning operation runs on the CPU and attempts to dereference a device pointer, because our code passes device-resident data to Ascent via zero-copy. This works on systems with unified memory, such as Summit and Frontier, but fails on systems without it.