cambridge-iccs / ftorch
A library for directly calling PyTorch ML models from Fortran.
Home Page: https://cambridge-iccs.github.io/FTorch/
License: MIT License
PyTorch installations can sometimes lead to a situation where CUDA is needed by libtorch but is not available on the host system (perhaps because it doesn't have a GPU). A solution is to install PyTorch from a CPU-only repository:
pip install torch --index-url https://download.pytorch.org/whl/cpu
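As a quick sanity check that a CPU-only build is in use (a sketch; torch.version.cuda is None for CPU-only wheels):

```python
import torch

print(torch.version.cuda)         # None for a CPU-only build
print(torch.cuda.is_available())  # False without a usable CUDA runtime
```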
It might be argued that this isn't our problem?
We should look into creating longer-form documentation using ReadTheDocs/Sphinx (rst) or FORD.
This should be hosted elsewhere and cover most of what is already there, perhaps more detail on the examples, and full API documentation.
Do we want to host on rtd?
Recently I have wanted to use an AI model from Python in a General Circulation Model (GCM) that depends on the 2017 version of the Intel compiler, but this FTorch module would be compiled with the 2021 version of the Intel compiler. I wonder how to solve this problem, and whether FTorch can be compiled with an older Intel compiler such as 2017?
The description of pt2ts.py in the utils README could be interpreted that the model will be saved in the same directory as the pt2ts.py file, but it is instead saved in the directory that pt2ts.py is called from.
Two potential solutions are:
- update the description so it is clear the model is saved to the directory pt2ts.py is called from, or
- modify pt2ts.py so the model is always saved to a fixed location (e.g. the directory containing pt2ts.py).
I would lean towards the latter, so the location is more consistent.
For example, pt2ts.py might be called from the model directory, a build directory, or the directory above (such as from a script like run_benchmarks.sh), each of which would currently save the model in a different location.
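A minimal sketch of the second option, resolving the save path relative to the script itself rather than the caller's working directory (names here are illustrative, not the current pt2ts.py code):

```python
import os

import torch

model = torch.nn.Linear(4, 2)  # stand-in for the real model
scripted_model = torch.jit.script(model)

# Save next to this script, regardless of where it is invoked from.
script_dir = os.path.dirname(os.path.abspath(__file__))
scripted_model.save(os.path.join(script_dir, "saved_model.pt"))
```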
Currently the outputs from the python ResNet inference program are not the same as the outputs from the Fortran ResNet inference program in the example.
Really we should make sure they, at minimum, produce the same output, and ideally make it something meaningful (e.g. the maximum value and its location, perhaps using a real image).
This is not an issue I've encountered, but for someone following the FTorch build instructions, the version of libtorch/pytorch installed may mean that FTorch is incompatible with the model saved in the examples, since the instructions pip install torch into a (new) virtual environment.
This would only lead to errors if breaking changes were made to the TorchScript format between the versions, and in many cases the same pip-installed torch would be used anyway.
At present the wrapper codes can only accept a single output tensor from a TorchScript model.
However, it is perfectly possible to return multiple output tensors from a PyTorch Model.
In this instance the python code returns them as a Tuple.
Is it possible to return multiple output tensors when using the C++ API (and by extension our wrapper scripts)?
The 'simple' answer is to add a concatenation layer onto the end of the pytorch model to return a single tensor and then unpack later. However, this is hacky, requires model alteration to use our code, and restricts us to returning only a single type(?).
The forward method in the API returns an IValue, which is a union over various types: https://pytorch.org/cppdocs/api/structc10_1_1_i_value.html
There has been some discussion on this topic:
It looks like this is probably possible, but may need a bit of rummaging around with C/C++ types and moving pointers.
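For reference, a minimal sketch of the Python side: a toy model whose forward pass returns two tensors as a tuple, which TorchScript preserves when scripted.

```python
from typing import Tuple

import torch


class TwoOutputNet(torch.nn.Module):
    """Toy model returning two output tensors as a tuple."""

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        return x * 2, x + 1


scripted = torch.jit.script(TwoOutputNet())
out_a, out_b = scripted(torch.ones(3))  # the tuple survives scripting
```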
We should separate the examples from the key source files in the library.
Perhaps into an /examples directory?
At some point it would be nice to make an example in the examples repo that is as simple as possible (i.e. a 1-D input tensor etc.) so that the user can focus on the coupling procedure rather than any of the deeper challenges that may arise in later exercises (transposes/memory layouts, precision, etc.)
Candidates might be a 'Net' that simply multiplies by two etc.
I will try and do this at some point.
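A sketch of what such a minimal example could look like (names and filename are placeholders):

```python
import torch


class MultiplyByTwoNet(torch.nn.Module):
    """The simplest possible 'model': doubles a 1-D input tensor."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 2 * x


scripted = torch.jit.script(MultiplyByTwoNet())
scripted.save("simplenet.pt")  # illustrative filename
```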
We should add a hook to generate the Fortran source for FTorch from fypp.
This will mean that the full source is in the repo and can be used directly and in documentation.
For developers this would mean they edit the fypp file and then push that, with F90 being generated by the hook.
Black has updated and there are some changes to formatting rules that need incorporating to satisfy CI.
It would be good to provide a citation.cff file in an agreed format.
This might need amending as we present and write about it.
At present the library only allows us to pass a single tensor to a model and receive a single output tensor.
This needs updating so that we can pass multiple input tensors.
In the Fortran:
In the C++: .push_back() on to the inputs to the C function.

We need to decide if it is best to:
torch_tensor_to_array()
If the latter, then the interface needs expanding beyond the current 'float' and 'double'.
I can see an argument for the former, however, as it
Discussion appreciated, then we need to reach a decision and triage:
As discussed in #78, there are (at least) two forms of optimisation that would be relatively straightforward to facilitate in some capacity, but require more consideration/are unlikely to be the default options (which is why they are not included in the referenced PR):
- model freezing (and optimize_for_inference, which is currently broken, unfortunately)
- inference mode

Model freezing: as shown in the "FTorch with/without gradients and/or frozen models" sections, freezing the model can make more modest, but not insignificant, improvements (in most cases). This can be achieved in pt2ts.py by replacing scripted_model.save(filename) with frozen_model = torch.jit.freeze(scripted_model), and then frozen_model.save(filename); see the sketch below. Note that optimize_for_inference can give errors, e.g. AttributeError: 'RecursiveScriptModule' object has no attribute 'training', when the model saved by pt2ts.py is used as part of a workflow involving FTorch. trace_to_torchscript currently uses model freezing; it would be preferable to have a shared setting and/or behaviour, unless there is a clear reason to use freezing in only one of the functions. Benchmarking trace_to_torchscript compared with script_to_torchscript may also be useful, as currently there is no clear motivation not to use the "default" script_to_torchscript.
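A sketch of that change to pt2ts.py (the model and filename are stand-ins; note that torch.jit.freeze requires the module to be in eval mode):

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the real model
filename = "saved_model.pt"    # illustrative filename

scripted_model = torch.jit.script(model)
scripted_model.eval()  # freezing requires a module in eval mode

# Instead of scripted_model.save(filename):
frozen_model = torch.jit.freeze(scripted_model)
frozen_model.save(filename)
```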
Inference mode: as shown in the "FTorch with InferenceMode and NoGradMode" sections, benefits were less clear, but in general it is expected to be at least as fast. This can be achieved by replacing torch::AutoGradMode enable_grad(requires_grad); with c10::InferenceMode guard(requires_grad); in all ctorch.cpp functions, but ideally both options would be presented to users. InferenceMode is more restrictive than NoGradMode, so cannot be used in all cases.

There are some suggestions that libtorch doesn't work with Intel C compilers.
I thought we'd tested this, but perhaps we need to review and remove as an option from the README if it doesn't work!
It is not intuitive that when running for GPU the input should be put on CUDA, but the output should NOT (thanks @ElliottKasoar for pointing this out).
We should clearly document this somewhere, and provide an example of running on GPU.
Copied from an email chain with a user:
I hope it's alright that I'm reaching out. I've recently set up a framework to incorporate physics-informed, neural net-based, user material subroutines in Abaqus. The framework is quite simple and doesn't take full advantage of the neural net setup. I would be very interested in coupling the PyTorch models directly to Fortran, and so, I'd be very interested in exploring FTorch.
I had a quick question - I've been trying to install the library, but the CMake configuration fails to identify the Fortran compiler. I've tried:
set(CMAKE_Fortran_COMPILER "/MinGW/bin/gfortran.exe") added to the CMakeLists.txt
cmake .. -DCMAKE_Fortran_COMPILER="/MinGW/bin/gfortran.exe" in cmd.
With both, I get the following error:
It fails with the following output:
Change Dir: C:/Users/USER/FTorch/src/build/CMakeFiles/CMakeScratch/TryCompile-mb6xhj
Run Build Command(s):devenv.com CMAKE_TRY_COMPILE.sln /build Debug /project cmTC_03248 && The system cannot find the file specified
Generator: execution of make failed. Make command was: devenv.com CMAKE_TRY_COMPILE.sln /build Debug /project cmTC_03248 &&
Would you be able to help me work this out? What am I missing here?
Thank you for taking the time, I truly appreciate it.
Comments in ftorch.f90 describing the device to specify when running on GPUs suggest torch_kGPU, as opposed to the correct torch_kCUDA.
Based on #55 it is evidently not clear how to modularise the code for repeated calls in a sensible fashion.
We should adapt one of the benchmark examples to illustrate breaking code into init, main, and finalise functions.
Related to #56: the example code in the README is now incorrect, as it has not incorporated the updates to use the layout argument in torch_tensor_from_blob().
We should either:
- update the example to use t_t_from_array(), or
- add the layout argument to the existing call.
My preference is for the first (see discussion on #56).
However, this information should also be taken out of the README and placed in the longer form API docs now, perhaps as part of #53
We have discussed that fortran-pytorch-lib might not be the nicest or most catchy name for the project.
We should settle on a name before first release.
Currently we build a file called ftorch; could we perhaps call the project FTorch?
Disadvantages:
We should provide an example of using multiple input tensors to a model.
Is there a pre-existing trained net we can deploy?
Davenet?
Do this after restructuring examples for #12
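Until a pre-trained net is chosen, a placeholder along these lines might do (purely illustrative):

```python
import torch


class MultiInputNet(torch.nn.Module):
    """Toy model taking two input tensors."""

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x * y + y


scripted = torch.jit.script(MultiInputNet())
out = scripted(torch.ones(4), 2 * torch.ones(4))
```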
As described in #72 we should perhaps include a note warning about memory pitfalls when calling a net twice with different inputs.
Also a test?
Some of the package names are presented differently across the code, e.g.,
It would be better to make these consistent.
This may be an issue with the implementation separate to FTorch, but from very large tests on GPUs (~100,000 iterations), I sometimes start to run into CUDA memory issues for the cgdrag benchmark example.
This example calls torch_tensor_delete after every iteration, but perhaps this is not cleaning up data on the GPU?
Full error (after running ./benchmarker_cgdrag_torch ../cgdrag_model saved_cgdrag_model_gpu.pt 100000 10 --use_cuda for ~32000 iterations):
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 79.15 GiB of which 18.00 MiB is free. Including non-PyTorch memory, this process has 79.12 GiB memory in use. Of the allocated memory 78.63 GiB is allocated by PyTorch, and 5.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Unit testing is absolutely required for this code. Ideally this could be done using Fortran by invoking the C API.
It is possible to use styling with FORD to customise appearance.
Whilst the current online docs are functional, a low-priority task might be to make them more visually appealing and to add better formatting of tables, notes, etc.
We should add some CMake flags to improve the build experience. This should include:
Once #53 is merged it would be good to prune the readme to make it friendlier to those who are just discovering the project.
Much of the more detailed information (e.g. Windows build instructions) can just point users to the online docs.
Although it is just a wrapper for the Fortran interface, some Doxygen-style comments on the C API header would be useful.
It would be nice if we could wrap the Fortran tensor generation to transform torch_tensor_from_blob into something like torch_tensor_from_array, with 'F' or 'C' passed through as a string -> c_char.
Should we use something like sonarqube to do static analysis on the fortran source code?
Run the testing suite on a GPU system. Requires saving the torch script a bit differently? @jatkinson1000
After make install, the installed library does not automatically pass the Fortran module directory to CMake projects that use the library in the approved manner. What should work is:
find_package(FTorch)
target_link_libraries(foo PRIVATE FTorch::ftorch)
This doesn't work: the compile fails, unable to use ftorch, and the Fortran module directory does not appear on the compiler command line.
When running make following the installation instructions in README.md, an error is raised:
libtorch/include/ATen/ATen.h:4:2: error: #error C++17 or later compatible compiler is required to use ATen.
This error was reproduced using both the CPU-only nightly (accessed on 19/09/23) and CPU-only (stable) 2.0.1 libtorch binaries.
This appears to be due to pytorch now requiring support for C++17:
A compiler that fully supports C++17, such as clang or gcc (especially for aarch64, gcc 9.4.0 or newer is required)
Versions used:
When using jit.script, it appears that .eval and the no_grad context are not saved.
This is likely to be partially responsible for worse performance than expected during inference.
My current solution has been to add model->eval(); and torch::NoGradGuard no_grad; in torch_jit_module_forward, as adding these in torch_jit_load did not appear to change the behaviour.
As it is also possible that these functions may be used in training, adding these settings conditionally would also be preferable.
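For comparison, the Python-side equivalent of re-establishing these settings at inference time (a sketch with an illustrative filename; the fix described above is on the C++ side):

```python
import torch

# Settings such as eval mode and no_grad do not appear to be preserved by
# jit.script, so re-apply them when the saved model is used.
model = torch.jit.load("saved_model.pt")  # illustrative filename
model.eval()

with torch.no_grad():
    output = model(torch.ones(1, 4))
```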
We should look at creating, as a first step, integration tests.
These should be adapted from the benchmarks repo.
Possibly as a set of Fortran programs run via a shell script?
This is not a high priority at the moment, but as discussed in #24, it would be good to improve the user friendliness of the stride functionality, through an F or C layout style and shape/size inference.
This should also be documented in (and potentially help clarify) the discussion about when to transpose arrays, in both the ResNet example and (soon) the long-form documentation.
We could use overloading in Fortran 90 to provide a simpler interface.
Here's an example of overloading based on arity: https://gist.github.com/dorchard/3cc13fe75d6d109cb75ec11d41ddc104
(Note something similar can work for overloading on input type too)
This is detritus for now, we can add it back in at a later date if we decide to support conda.
This should include general user documentation rendered to readthedocs via Sphinx. There is also an option to include the API docs within this site (doxygen).
When I compile with a conda installation of pytorch, I don't get errors during compiling, but when I test the library I get MKL errors and the numbers produced seem to be incorrect. I have tested against compiling the library with libtorch downloaded from source and that appears to work fine.
Here are the modules I'm using:
Currently Loaded Modules:
1) math (S) 3) cmake/3.24.2 5) intel-oneapi-compilers-cees-beta/2021.4.0-2xfl6 7) intel-oneapi-mkl-cees-beta/2021.4.0-k4r3o
2) devel (S) 4) gcc/10.3.0 6) intel-cees-beta/2021.4.0
and I've exported the conda environment here: mima-torch-env.txt. I'm using pytorch v2.
The path to TorchConfig.cmake is /home/groups/aditis2/lauraman/miniconda3/envs/mima-torch/lib/python3.9/site-packages/torch/share/cmake/Torch, so to build the library I use:
$ cmake .. -DTorch_DIR=/home/groups/aditis2/lauraman/miniconda3/envs/mima-torch/lib/python3.9/site-packages/torch/share/cmake/Torch -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$GROUP_HOME/lauraman/test-fortran-pytorch-lib
-- The C compiler identification is Intel 2021.4.0.20210910
-- The CXX compiler identification is Intel 2021.4.0.20210910
-- The Fortran compiler identification is Intel 2021.4.0.20210910
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/groups/s-ees/share/cees/spack_cees/spack/opt/spack/linux-centos7-x86_64_v3/gcc-4.8.5/intel-oneapi-compilers-2021.4.0-2xfl6e7kdhxegq5msukpqoxdmftsu2w6/compiler/2021.4.0/linux/bin/intel64/icc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/groups/s-ees/share/cees/spack_cees/spack/opt/spack/linux-centos7-x86_64_v3/gcc-4.8.5/intel-oneapi-compilers-2021.4.0-2xfl6e7kdhxegq5msukpqoxdmftsu2w6/compiler/2021.4.0/linux/bin/intel64/icpc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /home/groups/s-ees/share/cees/spack_cees/spack/opt/spack/linux-centos7-x86_64_v3/gcc-4.8.5/intel-oneapi-compilers-2021.4.0-2xfl6e7kdhxegq5msukpqoxdmftsu2w6/compiler/2021.4.0/linux/bin/intel64/ifort - skipped
-- Detecting Fortran/C Interface
-- Detecting Fortran/C Interface - Found GLOBAL and MODULE mangling
-- Verifying Fortran/CXX Compiler Compatibility
-- Verifying Fortran/CXX Compiler Compatibility - Success
-- MKL_ARCH: None, set to ` intel64` by default
-- MKL_LINK: None, set to ` dynamic` by default
-- MKL_INTERFACE_FULL: None, set to ` intel_ilp64` by default
-- MKL_THREADING: None, set to ` intel_thread` by default
-- MKL_MPI: None, set to ` intelmpi` by default
CMake Warning at /home/groups/aditis2/lauraman/miniconda3/envs/mima-torch/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/home/groups/aditis2/lauraman/miniconda3/envs/mima-torch/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
CMakeLists.txt:35 (find_package)
-- Found Torch: /home/groups/aditis2/lauraman/miniconda3/envs/mima-torch/lib/python3.9/site-packages/torch/lib/libtorch.so
-- Configuring done
-- Generating done
-- Build files have been written to: /scratch/users/lauraman/MiMA_pytorch/new-fortran-pytorch-lib/fortran-pytorch-lib/fortran-pytorch-lib/build_conda
$ make
Scanning dependencies of target ftorch
[ 33%] Building Fortran object CMakeFiles/ftorch.dir/ftorch.f90.o
[ 66%] Building CXX object CMakeFiles/ftorch.dir/ctorch.cpp.o
[100%] Linking CXX shared library libftorch.so
[100%] Built target ftorch
$ make install
Consolidate compiler generated dependencies of target ftorch
[100%] Built target ftorch
Install the project...
-- Install configuration: "Release"
-- Installing: /home/groups/aditis2/lauraman/test-fortran-pytorch-lib/lib64/libftorch.so
-- Set runtime path of "/home/groups/aditis2/lauraman/test-fortran-pytorch-lib/lib64/libftorch.so" to "$ORIGIN/../lib64:/home/groups/s-ees/share/cees/spack_cees/spack/opt/spack/linux-centos7-zen2/intel-2021.4.0/intel-oneapi-mkl-2021.4.0-k4r3on5jujinjf5tjqs6u3jguuecptj4/mkl/2021.4.0/lib:/home/groups/aditis2/lauraman/miniconda3/envs/mima-torch/lib/python3.9/site-packages/torch/lib:/home/groups/s-ees/share/cees/spack_cees/spack/opt/spack/linux-centos7-zen2/intel-2021.4.0/intel-oneapi-mkl-2021.4.0-k4r3on5jujinjf5tjqs6u3jguuecptj4/compiler/latest/linux/compiler/lib/intel64_lin"
-- Installing: /home/groups/aditis2/lauraman/test-fortran-pytorch-lib/include/ctorch.h
-- Installing: /home/groups/aditis2/lauraman/test-fortran-pytorch-lib/lib64/cmake/FTorchConfig.cmake
-- Installing: /home/groups/aditis2/lauraman/test-fortran-pytorch-lib/lib64/cmake/FTorchConfig-release.cmake
-- Installing: /home/groups/aditis2/lauraman/test-fortran-pytorch-lib/include/ftorch/ftorch.mod
All looks like it compiles okay. I then go into the ResNet example and follow the steps to build this. (Also, I noticed a subtle typo in the ResNet build instructions there: -DFTorchDIR should be -DFTorch_DIR.)
$ cmake .. -DFTorch_DIR=$GROUP_HOME/lauraman/test-fortran-pytorch-lib/lib64/cmake/ -DCMAKE_BUILD_TYPE=Release
-- Building with Fortran PyTorch coupling
-- Configuring done
-- Generating done
-- Build files have been written to: /scratch/users/lauraman/MiMA_pytorch/new-fortran-pytorch-lib/fortran-pytorch-lib/examples/1_ResNet18/build
$ make
Scanning dependencies of target resnet_infer_fortran
[ 50%] Building Fortran object CMakeFiles/resnet_infer_fortran.dir/resnet_infer_fortran.f90.o
[100%] Linking Fortran executable resnet_infer_fortran
[100%] Built target resnet_infer_fortran
No errors there but when I run:
$ ./resnet_infer_fortran ../saved_resnet18_model_cpu.pt
Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM .
-6.3447808E-03
Expected behaviour (?):
I tested this but instead of using the path to torch in the conda environment, I downloaded libtorch from source and compiled the library using that. Following the same steps I got the following result; could you check if this is correct please?
$ ./resnet_infer_fortran ../saved_resnet18_model_cpu.pt
0.1825228
In the ResNet example we use pretrained=True, but this is deprecated and should now be weights='IMAGENET1K_V1'.
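The corresponding change in the example's Python code would look something like this:

```python
from torchvision.models import resnet18

# Deprecated:
# model = resnet18(pretrained=True)

# Current equivalent:
model = resnet18(weights="IMAGENET1K_V1")
```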
Integration of FTorch with a distributed CPU-based solver can lead to a scenario where there are N MPI ranks (--ntasks-per-node) and M GPUs (torch::cuda::device_count()) per node, with M <= N. The current implementation of FTorch appears to leverage only GPU:0 for all N MPI ranks. Providing the user with the ability to decide which GPU to leverage would ensure that all available GPUs are used.
An initial discussion regarding this potential feature: #84.
Furthermore, there might still be multiple MPI ranks per GPU even after uniformly distributing the MPI ranks among the available GPUs. The GPU probably runs these ML model copies serially. CUDA MPS could be utilised to run the ML model copies concurrently. An alternative might be to gather to a single task, deploy the ML model from that task, and finally scatter back to the respective tasks, all inside the Fortran code. A sketch of the rank-to-GPU mapping is given below.
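A minimal sketch of distributing ranks among devices (round-robin; rank would come from the MPI layer, and this is illustrative rather than a proposed FTorch API):

```python
import torch


def device_for_rank(rank: int) -> torch.device:
    """Map an MPI rank to a GPU in round-robin fashion (illustrative)."""
    n_gpus = torch.cuda.device_count()
    if n_gpus == 0:
        return torch.device("cpu")
    return torch.device(f"cuda:{rank % n_gpus}")
```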
Removed from being part of #20: we should add some information for advanced users about manipulating data beyond simple transposition using the 'stride' options, as was done by @jatkinson1000 and @SimonClifford in the MiMA-ML project.
## Advanced use
Those experienced with C will perhaps have noticed that there are further freedoms available beyond those presented above.
Always stride, or is there a transpose tradeoff?
Currently, data is (probably) explicitly moved to the target device, rather than created directly on the desired device, as is preferable.
Creating the tensor directly on the device would be closer to the previous form of the code (see changes for GPUs), although that exact implementation did not seem to work.
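In PyTorch terms the distinction is as follows (a Python illustration of the general point, not the ctorch.cpp code):

```python
import torch

# Move after creation: allocates on the host, then copies to the device.
x = torch.ones(1024, 1024).to("cuda")

# Create directly on the device: avoids the host allocation and copy.
y = torch.ones(1024, 1024, device="cuda")
```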
We should add some basic linting and QC on the repository to help guide PRs to meet some basic standards in advance.
For now we should do this for python.
Apply:
Consider:
Possible memory leak, need to investigate further.
When following the README of the ResNet18 example I found some typos in the instructions.
The following code is commented out and should be removed. Similarly for the torch_from_blob_f header, which is unused.
Lines 112 to 138 in 2653bf8