Hi Mark (@depristo),
Sorry, I meant to put this together a while ago, regarding #27 (comment), but got a bit swamped with a research deadline. In any case, this is purely for intellectual curiosity and discussion. Regarding the first point, that differences in the allocated CPUs might be the cause of the timing differences: that could be remedied by specifying a minimum CPU platform, as noted here:
https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform
So to control for the variability in the test, the two options are either: a) to set the --min-cpu-platform setting to the maximum available ("Intel Sandy Bridge"), or b) to keep requesting and canceling instances until the desired one is allocated, and then run all tests on that instance for consistency.
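For option a), a minimal sketch of how the instance request could be assembled. The instance name and zone are hypothetical placeholders; the only pieces taken from the discussion are the --min-cpu-platform flag and the "Intel Sandy Bridge" value. The command is only built and printed here, not executed.

```python
import shlex


def build_create_command(instance_name, zone, min_cpu_platform):
    """Return the argv list for a gcloud instance request with a minimum CPU platform."""
    return [
        "gcloud", "compute", "instances", "create", instance_name,
        "--zone", zone,
        "--min-cpu-platform", min_cpu_platform,
    ]


# Hypothetical instance name and zone, for illustration only.
cmd = build_create_command("dv-benchmark", "us-central1-b", "Intel Sandy Bridge")

# Quote each argument so the printed line can be pasted into a shell.
printable = " ".join(shlex.quote(part) for part in cmd)
print(printable)
```

Using the same command for every benchmark instance should at least rule out older platforms than the one requested, though it does not by itself guarantee identical hardware across runs.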
Just as a quick inspection of CPU-cycle utilization, I ran a performance analysis of 0.4 and 0.5.1 on make_examples, since it displayed the initial discrepancy, and there seem to be some slight increases in 0.5.1, which might cumulatively affect things. In any case, below is the top of the call graph of percent utilization by method (per version):
DV 0.4
# Samples: 186K of event 'cpu-clock'
# Event count (approx.): 46604750000
#
# Children, Self,Command ,Shared Object ,Symbol
50.33% , 8.80% ,python ,python2.7 ,[.] PyEval_EvalFrameEx
|
|--42.49%--PyEval_EvalFrameEx
| |
| |--30.79%--deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
| | |
| | --30.34%--StripedSmithWaterman::Aligner::Align
| | |
| | |--27.87%--ssw_align
| | | |
| | | |--14.65%--sw_sse2_word
| | | |
| | | |--8.32%--sw_sse2_byte
| | | |
| | | |--2.91%--banded_sw
| | | |
| | | --1.19%--__memcpy_sse2_unaligned
| | |
| | --1.36%--ssw_init
| | |
| | --0.89%--qP_byte
| |
| |--3.30%--deepvariant_realigner_python_debruijn__graph_clifwrap::wrapBuild_as_build
| | |
| | --3.04%--learning::genomics::deepvariant::DeBruijnGraph::Build
| | |
| | --2.73%--learning::genomics::deepvariant::DeBruijnGraph::DeBruijnGraph
| | |
| | --2.41%--learning::genomics::deepvariant::DeBruijnGraph::AddEdgesForRead
| | |
| | --1.75%--learning::genomics::deepvariant::DeBruijnGraph::AddEdge
| | |
| | --1.46%--learning::genomics::deepvariant::DeBruijnGraph::EnsureVertex
| | |
| | --0.50%--std::_Hashtable<tensorflow::StringPiece, std::pair<tensorflow::StringPiece const, void*>, std::allocator<std::pair<tensorflow::StringPiece const, void*> >, std::__detail::_Select1st, std::equal_to<tensorflow::StringPiece>, tensorflow::StringPieceHasher, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node
| |
| |--3.05%--google::protobuf::python::cmessage::GetAttr
| | |
| | |--1.07%--google::protobuf::python::cmessage::InternalGetScalar
| | |
| | --0.63%--google::protobuf::Descriptor::FindFieldByName
| |
| |--0.70%--deepvariant_python_allelecounter_clifwrap::pyAlleleCounter::wrapAdd_as_add
| |
| |--0.59%--google::protobuf::python::cmessage::DeepCopy
| | |
| | --0.58%--google::protobuf::python::cmessage::MergeFrom
| | |
| | --0.57%--google::protobuf::Message::MergeFrom
| | |
| | --0.54%--google::protobuf::internal::ReflectionOps::Merge
| |
| --0.59%--deepvariant_core_python_sam__reader_clifwrap::pySamIterable::wrapNext
|
|--1.84%--0x903b40
| |
| --1.71%--PyEval_EvalFrameEx
|
|--1.06%--0x905d60
| |
| --1.06%--PyEval_EvalFrameEx
|
|--0.76%--0x8fecc0
| PyEval_EvalFrameEx
|
|--0.59%--0x905200
| |
| --0.59%--PyEval_EvalFrameEx
|
--0.51%--0x9060a0
|
--0.51%--PyEval_EvalFrameEx
32.95% , 0.00% ,python ,[unknown] ,[.] 0x00000000009060a0
|
---0x9060a0
|
--32.10%--PyEval_EvalFrameEx
|
--30.79%--deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
|
--30.34%--StripedSmithWaterman::Aligner::Align
|
|--27.87%--ssw_align
| |
| |--14.65%--sw_sse2_word
| |
| |--8.32%--sw_sse2_byte
| |
| |--2.91%--banded_sw
| |
| --1.19%--__memcpy_sse2_unaligned
|
--1.36%--ssw_init
|
--0.89%--qP_byte
30.81% , 0.07% ,python ,libssw_cclib.so ,[.] deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
|
--30.74%--deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
|
--30.34%--StripedSmithWaterman::Aligner::Align
|
|--27.87%--ssw_align
| |
| |--14.65%--sw_sse2_word
| |
| |--8.32%--sw_sse2_byte
| |
| |--2.91%--banded_sw
| |
| --1.19%--__memcpy_sse2_unaligned
|
--1.36%--ssw_init
|
--0.89%--qP_byte
30.36% , 0.04% ,python ,libssw_cpp.so ,[.] StripedSmithWaterman::Aligner::Align
|
--30.32%--StripedSmithWaterman::Aligner::Align
|
|--27.87%--ssw_align
| |
| |--14.65%--sw_sse2_word
| |
| |--8.32%--sw_sse2_byte
| |
| |--2.91%--banded_sw
| |
| --1.19%--__memcpy_sse2_unaligned
|
--1.36%--ssw_init
|
--0.89%--qP_byte
27.87% , 0.05% ,python ,libssw.so ,[.] ssw_align
|
--27.82%--ssw_align
|
|--14.65%--sw_sse2_word
|
|--8.32%--sw_sse2_byte
|
|--2.91%--banded_sw
|
--1.19%--__memcpy_sse2_unaligned
14.65% , 14.62% ,python ,libssw.so ,[.] sw_sse2_word
|
--14.62%--0x9060a0
PyEval_EvalFrameEx
deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
StripedSmithWaterman::Aligner::Align
|
--14.62%--ssw_align
sw_sse2_word
8.32% , 8.31% ,python ,libssw.so ,[.] sw_sse2_byte
|
--8.31%--0x9060a0
PyEval_EvalFrameEx
deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
StripedSmithWaterman::Aligner::Align
|
--8.30%--ssw_align
sw_sse2_byte
4.51% , 0.00% ,python ,[unknown] ,[.] 0x00000000009063e0
|
---0x9063e0
|
--3.66%--PyEval_EvalFrameEx
|
--3.30%--deepvariant_realigner_python_debruijn__graph_clifwrap::wrapBuild_as_build
|
--3.04%--learning::genomics::deepvariant::DeBruijnGraph::Build
|
--2.73%--learning::genomics::deepvariant::DeBruijnGraph::DeBruijnGraph
|
--2.41%--learning::genomics::deepvariant::DeBruijnGraph::AddEdgesForRead
|
--1.75%--learning::genomics::deepvariant::DeBruijnGraph::AddEdge
|
--1.46%--learning::genomics::deepvariant::DeBruijnGraph::EnsureVertex
|
--0.50%--std::_Hashtable<tensorflow::StringPiece, std::pair<tensorflow::StringPiece const, void*>, std::allocator<std::pair<tensorflow::StringPiece const, void*> >, std::__detail::_Select1st, std::equal_to<tensorflow::StringPiece>, tensorflow::StringPieceHasher, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node
DV 0.5.1
# Samples: 152K of event 'cpu-clock'
# Event count (approx.): 38010500000
#
# Children, Self,Command ,Shared Object ,Symbol
51.45% , 9.13% ,python ,python2.7 ,[.] PyEval_EvalFrameEx
|
|--43.33%--PyEval_EvalFrameEx
| |
| |--31.12%--deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
| | |
| | --30.63%--StripedSmithWaterman::Aligner::Align
| | |
| | |--28.27%--ssw_align
| | | |
| | | |--14.88%--sw_sse2_word
| | | |
| | | |--8.45%--sw_sse2_byte
| | | |
| | | |--2.89%--banded_sw
| | | |
| | | --1.19%--__memcpy_sse2_unaligned
| | |
| | --1.38%--ssw_init
| | |
| | --0.92%--qP_byte
| |
| |--3.57%--deepvariant_realigner_python_debruijn__graph_clifwrap::wrapBuild_as_build
| | |
| | --3.32%--learning::genomics::deepvariant::DeBruijnGraph::Build
| | |
| | --3.02%--learning::genomics::deepvariant::DeBruijnGraph::DeBruijnGraph
| | |
| | --2.63%--learning::genomics::deepvariant::DeBruijnGraph::AddEdgesForRead
| | |
| | --1.89%--learning::genomics::deepvariant::DeBruijnGraph::AddEdge
| | |
| | --1.60%--learning::genomics::deepvariant::DeBruijnGraph::EnsureVertex
| | |
| | --0.56%--std::_Hashtable<tensorflow::StringPiece, std::pair<tensorflow::StringPiece const, void*>, std::allocator<std::pair<tensorflow::StringPiece const, void*> >, std::__detail::_Select1st, std::equal_to<tensorflow::StringPiece>, tensorflow::StringPieceHasher, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node
| |
| |--3.16%--google::protobuf::python::cmessage::GetAttr
| | |
| | |--1.12%--google::protobuf::python::cmessage::InternalGetScalar
| | |
| | --0.58%--google::protobuf::Descriptor::FindFieldByName
| |
| |--0.70%--deepvariant_python_allelecounter_clifwrap::pyAlleleCounter::wrapAdd_as_add
| |
| |--0.62%--deepvariant_core_python_sam__reader_clifwrap::pySamIterable::wrapNext
| |
| --0.57%--google::protobuf::python::cmessage::DeepCopy
| |
| --0.56%--google::protobuf::python::cmessage::MergeFrom
| |
| --0.56%--google::protobuf::Message::MergeFrom
| |
| --0.52%--google::protobuf::internal::ReflectionOps::Merge
|
|--1.92%--0x903b40
| |
| --1.78%--PyEval_EvalFrameEx
|
|--1.09%--0x905d60
| |
| --1.09%--PyEval_EvalFrameEx
|
|--0.78%--0x8fecc0
| PyEval_EvalFrameEx
|
|--0.62%--0x905200
| |
| --0.62%--PyEval_EvalFrameEx
|
--0.54%--0x9060a0
|
--0.54%--PyEval_EvalFrameEx
33.23% , 0.00% ,python ,[unknown] ,[.] 0x00000000009060a0
|
---0x9060a0
|
--32.46%--PyEval_EvalFrameEx
|
--31.12%--deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
|
--30.63%--StripedSmithWaterman::Aligner::Align
|
|--28.27%--ssw_align
| |
| |--14.88%--sw_sse2_word
| |
| |--8.45%--sw_sse2_byte
| |
| |--2.89%--banded_sw
| |
| --1.19%--__memcpy_sse2_unaligned
|
--1.38%--ssw_init
|
--0.92%--qP_byte
31.13% , 0.08% ,python ,libssw_cclib.so ,[.] deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
|
--31.05%--deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
|
--30.63%--StripedSmithWaterman::Aligner::Align
|
|--28.27%--ssw_align
| |
| |--14.88%--sw_sse2_word
| |
| |--8.45%--sw_sse2_byte
| |
| |--2.89%--banded_sw
| |
| --1.19%--__memcpy_sse2_unaligned
|
--1.38%--ssw_init
|
--0.92%--qP_byte
30.64% , 0.03% ,python ,libssw_cpp.so ,[.] StripedSmithWaterman::Aligner::Align
|
--30.61%--StripedSmithWaterman::Aligner::Align
|
|--28.27%--ssw_align
| |
| |--14.88%--sw_sse2_word
| |
| |--8.45%--sw_sse2_byte
| |
| |--2.89%--banded_sw
| |
| --1.19%--__memcpy_sse2_unaligned
|
--1.38%--ssw_init
|
--0.92%--qP_byte
28.27% , 0.04% ,python ,libssw.so ,[.] ssw_align
|
--28.23%--ssw_align
|
|--14.88%--sw_sse2_word
|
|--8.45%--sw_sse2_byte
|
|--2.89%--banded_sw
|
--1.19%--__memcpy_sse2_unaligned
14.88% , 14.86% ,python ,libssw.so ,[.] sw_sse2_word
|
--14.86%--0x9060a0
PyEval_EvalFrameEx
deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
StripedSmithWaterman::Aligner::Align
|
--14.86%--ssw_align
sw_sse2_word
8.45% , 8.43% ,python ,libssw.so ,[.] sw_sse2_byte
|
--8.43%--0x9060a0
PyEval_EvalFrameEx
deepvariant_realigner_python_ssw_clifwrap::pyAligner::wrapAlign_as_align
StripedSmithWaterman::Aligner::Align
|
--8.43%--ssw_align
sw_sse2_byte
4.72% , 0.00% ,python ,[unknown] ,[.] 0x00000000009063e0
|
---0x9063e0
|
--3.94%--PyEval_EvalFrameEx
|
--3.57%--deepvariant_realigner_python_debruijn__graph_clifwrap::wrapBuild_as_build
|
--3.32%--learning::genomics::deepvariant::DeBruijnGraph::Build
|
--3.02%--learning::genomics::deepvariant::DeBruijnGraph::DeBruijnGraph
|
--2.63%--learning::genomics::deepvariant::DeBruijnGraph::AddEdgesForRead
|
--1.89%--learning::genomics::deepvariant::DeBruijnGraph::AddEdge
|
--1.60%--learning::genomics::deepvariant::DeBruijnGraph::EnsureVertex
|
--0.56%--std::_Hashtable<tensorflow::StringPiece, std::pair<tensorflow::StringPiece const, void*>, std::allocator<std::pair<tensorflow::StringPiece const, void*> >, std::__detail::_Select1st, std::equal_to<tensorflow::StringPiece>, tensorflow::StringPieceHasher, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node
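The percentages in the two reports are relative to each run's own total, and the totals differ (roughly 46.6e9 versus 38.0e9 cpu-clock events). A minimal sketch, assuming the Children percentages and event counts are copied by hand from the reports above, puts the two versions side by side and converts each share into an approximate absolute event count. The dictionary layout and the function name are illustrative only and are not part of DeepVariant or perf.

```python
# Children percentages and approximate event counts, copied by hand
# from the two perf reports above (event 'cpu-clock').
REPORTS = {
    "0.4": {
        "event_count": 46604750000,
        "children_pct": {
            "PyEval_EvalFrameEx": 50.33,
            "pyAligner::wrapAlign_as_align": 30.81,
            "StripedSmithWaterman::Aligner::Align": 30.36,
            "ssw_align": 27.87,
            "sw_sse2_word": 14.65,
            "sw_sse2_byte": 8.32,
        },
    },
    "0.5.1": {
        "event_count": 38010500000,
        "children_pct": {
            "PyEval_EvalFrameEx": 51.45,
            "pyAligner::wrapAlign_as_align": 31.13,
            "StripedSmithWaterman::Aligner::Align": 30.64,
            "ssw_align": 28.27,
            "sw_sse2_word": 14.88,
            "sw_sse2_byte": 8.45,
        },
    },
}


def compare(reports, old, new):
    """Yield (symbol, share delta in points, old events, new events) per symbol."""
    old_run, new_run = reports[old], reports[new]
    for symbol, old_pct in old_run["children_pct"].items():
        new_pct = new_run["children_pct"][symbol]
        # Convert each relative share into an approximate absolute event count.
        old_events = old_pct / 100 * old_run["event_count"]
        new_events = new_pct / 100 * new_run["event_count"]
        yield symbol, new_pct - old_pct, old_events, new_events


rows = list(compare(REPORTS, "0.4", "0.5.1"))
for symbol, delta, old_events, new_events in rows:
    print(f"{symbol:40s} {delta:+.2f} pts  {old_events:.3e} -> {new_events:.3e}")
```

With these numbers, every listed share rises by a fraction of a percentage point while the approximate absolute counts fall, so the shares alone may not explain a slowdown. Both runs would need to cover the same workload before reading more into the difference.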
Doing this properly would require running the tests on different datasets and on different CPUs within the same Cloud environment, under different distributed scenarios, which would be cost-prohibitive for me.
Hope it helps and have a great weekend!
Paul