
GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

License: MIT License

CMake 5.24% C++ 32.24% C 0.18% MATLAB 14.91% Cuda 31.65% Python 7.87% Batchfile 1.82% Java 6.09%

gpu-computing non-linear-regression curve-fitting super-resolution gpu-acceleration gpu-programming levenberg-marquardt

gpufit's Introduction

Gpufit

Levenberg-Marquardt curve fitting in CUDA.

Homepage: github.com/gpufit/Gpufit

The manuscript describing Gpufit is now published in Scientific Reports.

Quick start instructions

To verify that Gpufit is working correctly on the host computer, go to the folder gpufit_performance_test of the binary package and run Gpufit_Cpufit_Performance_Comparison.exe. Further details of the test executable can be found in the documentation package.

Binary distribution

The latest Gpufit binary release, supporting Windows 32-bit and 64-bit machines, can be found on the release page.

Documentation

Documentation for the Gpufit library may be found online (latest documentation), and also as a PDF file in the binary distribution of Gpufit.

Building Gpufit from source code

Instructions for building Gpufit are found in the documentation: Building from source code.

Using the Gpufit binary distribution

Instructions for using the binary distribution may be found in the documentation. The binary package contains:

The Gpufit SDK, which consists of the 32-bit and 64-bit DLL files, and the Gpufit header file which contains the function definitions. The Gpufit SDK is intended to be used when calling Gpufit from an external application written in e.g. C code.
Gpufit Performance test: A simple console application comparing the execution speed of curve fitting on the GPU and CPU. This program also serves as a test to ensure the correct functioning of Gpufit.
Matlab 32-bit and 64-bit bindings, with Matlab examples.
Python version 2.x and version 3.x bindings (compiled as wheel files) and Python examples.
Java binding, with Java examples.
The Gpufit manual in PDF format

Examples

There are various examples that demonstrate the capabilities and usage of Gpufit. They can be found at the following locations:

/examples/c++ - C++ examples for Gpufit
/examples/c++/gpufit_cpufit - C++ examples that use Gpufit and Cpufit
/examples/matlab - Matlab examples for Gpufit including spline fit examples (also requires Gpuspline)
/examples/python - Python examples for Gpufit including spline fit examples (also requires Gpuspline)
/Cpufit/matlab/examples - Matlab examples that only use Cpufit
/Gpufit/java/gpufit/src/test/java/com/github/gpufit/examples - Java examples for Gpufit

Authors

Gpufit was created by Mark Bates, Adrian Przybylski, Björn Thiel, and Jan Keller-Findeisen at the Max Planck Institute for Biophysical Chemistry, in Göttingen, Germany.

How to cite Gpufit

If you use Gpufit in your research, please cite our publication describing the software, which was published in Scientific Reports. The open-access manuscript is available from the Scientific Reports website.

Gpufit: An open-source toolkit for GPU-accelerated curve fitting
Adrian Przybylski, Björn Thiel, Jan Keller-Findeisen, Bernd Stock, and Mark Bates
Scientific Reports, vol. 7, 15722 (2017); doi: https://doi.org/10.1038/s41598-017-15313-9

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

gpufit's Issues

Installation and raw data used in the paper?

Dear all,
1- I would like to try Gpufit, which seems really useful, but I could not find the PDF documentation and I don't know where to start.
2- Could you please add the raw data used for your tests to the repository?
Best regards.

How does our linear solver compare to cuBLAS or cuSOLVER and should we maybe replace it?

I would like to know how it compares performance-wise. The question is how far behind (or ahead of) other implementations we are.

In a potential replacement we would need to take into account that cuBLAS uses dynamic parallelism (capability 3.5)??

Concentrate information about a model

Consider a two-part organization, consisting of a C++ object which stores information about the model, and a CUDA code block which handles the calculations.

Can Gpufit handle functions that return vectors instead of a single value?

Hi,

I have to fit a function that returns more than one value, like f : R^3 -> R^3. I was wondering whether Gpufit can handle such cases as well, but it doesn't seem to.

Regards

Lower/Upper bound

I am just wondering how we can add lower and upper bounds for parameters. Is there any way to do so?

Build on macOS

And add build instructions to the documentation.

Gpufit applications sometimes very slow on Linux

I experience very long delays with the execution of examples/tests of Gpufit on Linux. The executables have been built on Ubuntu 16.04 LTS, CUDA 9, gcc 5.4 with the current source of the master branch. Instead of less than a second each, the tests often need around 25 seconds to finish, but sometimes not. Results seem unaffected otherwise. Could have to do with CUDA?

Constrained nonlinear fits

Is it possible to incorporate constraints on the parameters into Gpufit? If so, how do other packages do it? At least lower and upper bounds on parameters seem reasonable.
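One simple approach to box constraints, used in various forms by fitting packages, is to project each parameter back into its allowed interval after every update step. The sketch below shows that projection in plain host-side C++. The function name and the bound vectors are hypothetical and not part of the Gpufit API; a real implementation would have to run inside the CUDA kernels that update the parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the Gpufit API): project every parameter
// back into its [lower, upper] interval after an update step.
inline void clamp_parameters(
    std::vector<float> & parameters,
    std::vector<float> const & lower_bounds,
    std::vector<float> const & upper_bounds)
{
    assert(parameters.size() == lower_bounds.size());
    assert(parameters.size() == upper_bounds.size());

    for (std::size_t i = 0; i < parameters.size(); ++i)
    {
        assert(lower_bounds[i] <= upper_bounds[i]);
        parameters[i] = std::min(std::max(parameters[i], lower_bounds[i]), upper_bounds[i]);
    }
}
```

Clamping is easy to add but can slow convergence when the optimum lies on a bound; parameter transformations or a proper constrained solver are the usual alternatives.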

Include independent variables and data dimension sizes in the gpufit interface

Which other open source PALM/STORM application could benefit from Gpufit?

Fitting applications that could benefit from Gpufit (PALM, STORM, and others) could be helped to adopt it, for example by forking them and integrating Gpufit. This may require some redesign, since Gpufit needs the fits to be collected first, then performed in one go, and then the results need to be distributed.

How many fits (n_fits, or bytes) should be passed at once, given the GPU's performance?

To make full use of the local GPU, is there an easy way to choose a suitable n_fits*n_points (or size in bytes) for those who are unfamiliar with CUDA?
Thanks a lot.

If a fit goes wrong, the convergence test can report success although the fit didn't converge

If for some reason the first step is too large (lambda too large?), the parameters are way off after the first iteration and never recover. Even so, chi_square - previous_chi_square will sometimes be zero, so the convergence test is positive although the model and the parameters are far from anything good.

It is not yet clear what exactly goes wrong there or how to fix it, but it surely deserves to be looked at.

This may also happen in other fit frameworks, but maybe not to the same extent. Changing the value of the initial lambda parameter, as well as a more thorough analysis of the converged state, may improve the situation.
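The failure mode described above can be reproduced with a plain relative chi-square test. The sketch below assumes a criterion of the form |chi2 - previous chi2| < tolerance * max(1, chi2); whether this matches the Gpufit source exactly is an assumption. The second function is a hypothetical extra check that flags a "converged" fit whose chi-square never dropped below its starting value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed form of a relative chi-square convergence test.
inline bool chi_square_converged(float chi_square, float previous_chi_square, float tolerance)
{
    assert(tolerance > 0.0f);
    return std::fabs(chi_square - previous_chi_square)
        < tolerance * std::max(1.0f, chi_square);
}

// Hypothetical sanity check: a stalled fit that never improved on the
// initial chi-square should not be reported as a success.
inline bool improved_over_start(float chi_square, float initial_chi_square)
{
    return std::isfinite(chi_square) && chi_square < initial_chi_square;
}
```

A fit stuck at chi_square = 1e6 passes the first test because the difference between iterations is zero, while the second test rejects it when the initial chi-square was, say, 10.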

Automatically discover and add estimator and model functions to the cmake configuration

[Debugging new models] Any idea how to expose __device__ void calculate_model() in order to check a model implementation?

I just want to test a new model, but despite not having any compilation issues, it doesn't fit my data.
I thought of testing my code by giving my kernel a set of parameters and checking the shape of the simulated curve, but I am having more trouble than expected.

Below is what I tried to do, but it is not working at all ...

created a new method in lm_fit_cuda.cpp as:

void LMFitCUDA::simul()
{
    // initialize the chi-square values
    calc_curve_values();
}

and then in lm_fit.cpp a new method of LMfit class:

void LMFit::simul(float const tolerance)
{
    set_parameters_to_fit_indices();

    GPUData gpu_data(info_);
    gpu_data.init_user_info(user_info_);

    // loop over data chunks
    while (n_fits_left_ > 0)
    {
        chunk_size_ = int((std::min)(n_fits_left_, info_.max_chunk_size_));
        info_.set_fits_per_block(chunk_size_);

        gpu_data.init(
            chunk_size_,
            ichunk_,
            data_,
            weights_,
            initial_parameters_,
            parameters_to_fit_indices_);

        LMFitCUDA lmfit_cuda(
            tolerance,
            info_,
            gpu_data,
            chunk_size_);
        lmfit_cuda.simul();
        get_results(gpu_data, chunk_size_);
        n_fits_left_ -= chunk_size_;
        ichunk_++;
    }
}

A new method for class FitInterface:

void FitInterface::simulate(ModelID const model_id)
{
    int n_dimensions = 0;
    configure_model(model_id, n_parameters_, n_dimensions);

    check_sizes();

    Info info;
    configure_info(info, model_id);

    LMFit modelsim
    (
        data_,
        weights_,
        info,
        initial_parameters_,
        parameters_to_fit_,
        user_info_,
        output_parameters_,
        output_states_,
        output_chi_squares_,
        output_n_iterations_
    ) ;
    modelsim.simul(tolerance_);
}

And eventually a new function in gpufit.cpp

int gpusimul
(
    size_t n_fits,
    size_t n_points,
    float * data,
    float * weights,
    ModelID model_id,
    float * initial_parameters,
    float tolerance,
    int max_n_iterations,
    int * parameters_to_fit,
    EstimatorID estimator_id,
    size_t user_info_size,
    char * user_info,
    float * output_parameters,
    int * output_states,
    float * output_chi_squares,
    int * output_n_iterations
)
try
{
    __int32 n_points_32 = 0;
    if (n_points <= (unsigned int)(std::numeric_limits<__int32>::max()))
    {
        n_points_32 = __int32(n_points);
    }
    else
    {
        throw std::runtime_error("maximum number of data points per fit exceeded");
    }

    FitInterface fi(
        data,
        weights,
        n_fits,
        n_points_32,
        tolerance,
        max_n_iterations,
        estimator_id,
        initial_parameters,
        parameters_to_fit,
        user_info,
        user_info_size,
        output_parameters,
        output_states,
        output_chi_squares,
        output_n_iterations);

    fi.simulate(model_id);

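    return ReturnState::OK  ;
}
catch (std::exception const &)
{
    // a function-try-block needs at least one handler; report the failure to the caller
    return ReturnState::ERROR;
}
catch (...)
{
    return ReturnState::ERROR;
}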

Additional stopping criteria

Currently supported:

Function value finite difference
Maximal number of iteration
Function value NaN, Inf (is this really the case?)

Also used typically:

Parameter value finite difference
First-order optimality measure
Maximal number of function evaluations

Maybe we want that too.
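As an illustration of the "parameter value finite difference" criterion listed above, the following host-side sketch declares a fit converged when no parameter changed by more than a relative tolerance between two iterations. The function name and the exact form of the test are assumptions for illustration, not existing Gpufit code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stopping criterion: every parameter changed by at most
// tolerance * max(1, |parameter|) since the previous iteration.
inline bool parameters_converged(
    std::vector<float> const & parameters,
    std::vector<float> const & previous_parameters,
    float tolerance)
{
    assert(parameters.size() == previous_parameters.size());

    for (std::size_t i = 0; i < parameters.size(); ++i)
    {
        float const change = std::fabs(parameters[i] - previous_parameters[i]);
        if (change > tolerance * std::max(1.0f, std::fabs(parameters[i])))
        {
            return false;
        }
    }
    return true;
}
```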

Comment:
The parallelization algorithm internally waits for all fits in the current batch to either converge or exceed the maximum number of iterations. This synchronization effect means that the runtime is susceptible to the longest running fit in the current batch. An asynchronous loading and fitting might be an alternative some day.

Why is the function "configure_model" declared in cuda_kernel.cuh and defined in models.cuh?

I have built a standalone Visual Studio project based on Gpufit; in my project I have not changed any of the code dependencies.
In the source, the function "configure_model" is declared in cuda_kernel.cuh and defined in models.cuh.
In my VS project it worked fine to ignore this and, to save time, leave the header dependencies unchanged.
I guess it has something to do with how the code is compiled?

Early check for valid estimator and model ID

Whether the estimator and model IDs are given internally as const, define or enum, there would currently be no way to fail early in case the user specifies an invalid (non-existing) value for the estimator or the model. A solution may require listing these values a second time (or at least checking for being within the valid range of int values).
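One way to fail early without listing the values twice is to end the enumeration with a count sentinel and range-check against it. The sketch below uses made-up IDs (the names and values are hypothetical and do not reproduce the real Gpufit model list) and assumes the IDs are contiguous and start at zero.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical IDs for illustration only.
enum ExampleModelID
{
    EXAMPLE_GAUSS_1D = 0,
    EXAMPLE_GAUSS_2D = 1,
    EXAMPLE_LINEAR_1D = 2,
    EXAMPLE_MODEL_COUNT // sentinel: always one past the last valid ID
};

inline bool is_valid_model_id(int id)
{
    return id >= 0 && id < EXAMPLE_MODEL_COUNT;
}

// Throws before any GPU work is started if the ID is out of range.
inline void check_model_id(int id)
{
    if (!is_valid_model_id(id))
    {
        throw std::invalid_argument("invalid model ID");
    }
}
```

Adding a new model before the sentinel extends the valid range automatically, so the check does not need to be touched.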

Include runtime compilation of CUDA code for custom estimators

See PyCUDA (https://mathema.tician.de/software/pycuda/) for a library claiming to do this as well as the documentation for runtime compilation (http://docs.nvidia.com/cuda/nvrtc/index.html) for more information.

Point spread function

Dear all,
is there any PSF (https://en.wikipedia.org/wiki/Point_spread_function) support here?

Build on Linux

For example with Ubuntu 16.10 (https://packages.ubuntu.com/yakkety/nvidia-cuda-toolkit)

Include runtime compilation of CUDA code for custom function models

Calculation of the hessian in GPUFit

Hi,

Just a quick question, with respect to the following function

__device__ void calculate_hessian_lse(
    double * hessian,
    int const point_index,
    int const parameter_index_i,
    int const parameter_index_j,
    float const * data,
    float const * value,
    float const * derivative,
    float const * weight,
    char * user_info,
    std::size_t const user_info_size)
{
    if (weight)
    {
        *hessian
            += derivative[parameter_index_i] * derivative[parameter_index_j]
            * weight[point_index];
    }
    else
    {
        *hessian
            += derivative[parameter_index_i] * derivative[parameter_index_j];
    }
}

How exactly is the Hessian computed?
Is it approximated using the derivatives somehow?

Thank you
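The quoted function accumulates products of first derivatives, which is consistent with the Gauss-Newton approximation commonly used in Levenberg-Marquardt: H_ij ≈ Σ_k w_k (∂f_k/∂p_i)(∂f_k/∂p_j), with the second-derivative terms dropped. The host-side sketch below applies the same accumulation to a hypothetical linear model f(x) = a + b*x, whose derivatives are 1 and x. It illustrates the approximation and is not Gpufit code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Gauss-Newton Hessian (row-major 2x2) for the example model f(x) = a + b*x.
// An empty weights vector means "unweighted", like a null weight pointer.
inline std::array<double, 4> gauss_newton_hessian(
    std::vector<float> const & x,
    std::vector<float> const & weights)
{
    assert(weights.empty() || weights.size() == x.size());

    std::array<double, 4> hessian = {{0.0, 0.0, 0.0, 0.0}};
    for (std::size_t k = 0; k < x.size(); ++k)
    {
        double const derivative[2] = {1.0, x[k]}; // df/da, df/db at point k
        double const weight = weights.empty() ? 1.0 : weights[k];
        for (int i = 0; i < 2; ++i)
        {
            for (int j = 0; j < 2; ++j)
            {
                hessian[i * 2 + j] += derivative[i] * derivative[j] * weight;
            }
        }
    }
    return hessian;
}
```

For x = {0, 1, 2} without weights this gives the matrix [[3, 3], [3, 5]], that is, the sums of 1, x and x*x over the data points.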

Python and Matlab bindings for gpufit_cuda_available() function

Bindings for utility functions such as this are needed.

Add a new fit model+ MATLAB

I did add a new fit model according to the instruction, but it did not show up in the list of ModelID.m. I noticed I have to add my model to ModelID.m in C:\Sources\Gpufit-master\Gpufit\matlab folder.

External binding to Java

With a structure similar to the existing bindings to Matlab and Python.

Framework to numerically check supplied derivatives

That is standard in many other packages and is also quite useful for checking models. It would also allow models without an analytical derivative to be used, at the cost of performance and possibly numerical accuracy.
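A minimal sketch of such a check compares each analytical derivative with a central finite difference. The Gaussian model, the function names, and the step size below are chosen for illustration and are not taken from Gpufit; the check runs on the host in double precision.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Example model: f(x; a, s) = a * exp(-x^2 / (2 s^2))
inline double model(double x, double a, double s)
{
    return a * std::exp(-x * x / (2.0 * s * s));
}

// Analytical derivatives of the example model.
inline double analytic_d_amplitude(double x, double /*a*/, double s)
{
    return std::exp(-x * x / (2.0 * s * s));
}

inline double analytic_d_width(double x, double a, double s)
{
    return model(x, a, s) * x * x / (s * s * s);
}

// Central difference of a one-argument function around p with step h.
template <typename F>
double central_difference(F f, double p, double h)
{
    return (f(p + h) - f(p - h)) / (2.0 * h);
}

inline bool derivative_matches(double analytic, double numeric, double relative_tolerance)
{
    return std::fabs(analytic - numeric)
        <= relative_tolerance * std::max(1.0, std::fabs(analytic));
}
```

A model author would call central_difference with a lambda that varies one parameter at a time and compare the result against the supplied derivative for a few representative points.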

Compartmental model fitting for medical imaging using Gpufit

Hi everybody,
I tried to do a similar thing to what you did here using just PyCuda to make python able to talk with the GPU, avoiding C++ coding. I just made the repo public after submitting a paper using it, so if you want you can have a look.

What I did there was to implement Compartmental Model fitting for dynamic PET imaging using CUDA parallelization and adding a spatial regularization term to the LevMar update (I wanted to fit time series of 3D images, so I add voxel neighborhood to exploit after synchronizing the fitting across voxels).

I would like to test the CUDA kernel to compute model and derivatives I developed for my tool in Gpufit. I've already read the documentation page about extending the library with new models. I think I could do it.

I have a question, though.
Compartment models are, basically, a convolution of two biexponential functions.
The current version of my kernel uses an analytic form to compute the model update and, above all, its derivatives. Do you have any reference on how I can implement a numeric convolution in a CUDA kernel, and how I can differentiate it to update the gradient?

What is the performance overhead of more complex models with fixed parameters vs. simpler models?

If it is not very big, we could use the rotated, elliptic 2D Gaussian to also include the non-rotated elliptic 2D Gaussian.

On the other hand, there is an elliptic 2D Cauchy peak but not a symmetric or elliptic, rotated 2D Cauchy.

2D Gaussian MLE state 3 problem

Hello, thank you for this awesome tool.

We are running into a problem fitting to a 2D Gaussian with the MLE method in Matlab. The test data you provided runs correctly for 2D Gaussian MLE, and we have been successfully running the code on our own data doing 2D Gaussian fitting with the least-squares method, but whenever we run our own data with 2D Gaussian MLE a problem occurs. What happens is that all of the states outputted are state 3, which as far as I can tell means that the function is reaching a negative value. This is perplexing as we have carefully verified that we are not sending any negative values in the data or the parameters.

If you load the attached .mat file (in the .zip bc of GitHub) and run the following function call it should reproduce this problem. And as suggested earlier, it works correctly for least-squares.

[parameters, states, chi_squares,~,~] = gpufit(dataset, [],model_id, initial_parameters, tolerance, max_n_iterations, params_to_fit, estimator_id, []);

This seems like a bug in the MLE code to me, but it could be that we're doing something wrong. Regardless, I'd greatly appreciate any help you can provide.

Thank you very much,

github_question.zip

CUDA C interface

We have a C interface, but maybe we could also use a "CUDA interface", so that if you do some work on the GPU before and afterwards you do not need to copy the data to and from the GPU, which sometimes takes a lot of time. So basically, it would be the whole algorithm minus the data copying, assuming that the data is already on the GPU at the beginning and doesn't need to be copied back at the end.

Is this useful?

Running 'make' on Linux

Hi, I'm having issues installing this library. The cmake step passed successfully but I still get errors on the make:

In file included from /scratch/bjb56/download/Gpufit/Gpufit/gpufit.cpp:2:0:
/scratch/bjb56/download/Gpufit/Gpufit/interface.h:6:14: error: expected constructor, destructor, or type conversion before ‘(’ token
static_assert( sizeof( int ) == 4, "32 bit 'int' type required" ) ;
^
/scratch/bjb56/download/Gpufit/Gpufit/gpufit.cpp: In function ‘int gpufit(size_t, size_t, float*, float*, int, float*, float, int, int*, int, size_t, char*, float*, int*, float*, int*)’:
/scratch/bjb56/download/Gpufit/Gpufit/gpufit.cpp:48:12: error: ‘ReturnState’ is not a class or namespace
return ReturnState::OK ;
^
/scratch/bjb56/download/Gpufit/Gpufit/gpufit.cpp:54:12: error: ‘ReturnState’ is not a class or namespace
return ReturnState::ERROR ;
^
/scratch/bjb56/download/Gpufit/Gpufit/gpufit.cpp:60:12: error: ‘ReturnState’ is not a class or namespace
return ReturnState::ERROR;
^
/scratch/bjb56/download/Gpufit/Gpufit/gpufit.cpp: In function ‘int gpufit_get_cuda_version(int*, int*)’:
/scratch/bjb56/download/Gpufit/Gpufit/gpufit.cpp:90:16: error: ‘ReturnState’ is not a class or namespace
return ReturnState::OK;
^
/scratch/bjb56/download/Gpufit/Gpufit/gpufit.cpp:96:16: error: ‘ReturnState’ is not a class or namespace
return ReturnState::ERROR;
^
CMakeFiles/Gpufit.dir/build.make:90: recipe for target 'CMakeFiles/Gpufit.dir/gpufit.o' failed
make[2]: *** [CMakeFiles/Gpufit.dir/gpufit.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/Gpufit.dir/all' failed
make[1]: *** [CMakeFiles/Gpufit.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Any tips would come much appreciated.

Thanks,

Ben

Interoperability with Matlab gpuArray processes

If you try to use gpuArray (the Matlab gpu computation) after gpufit has been used (for example to speed up other parts of the script), an error is thrown.

Error using gpuArray
An unexpected error occurred during CUDA execution. The CUDA error was:
cannot set while device is active in this process

The other way around (first using gpuArray, then gpufit, then gpuArray again, ..) is fine though.

We could unload Gpufit using the command clear GpufitMex; after a call to gpufit and before using a gpuArray but this may include time overheads (if done repeatedly) and is not very elegant.

We think that Matlab checks for active contexts and if there is one throws this error. In the future this might be different.

We cannot make a CUDA reset because that also destroys all gpuArray variables.

So, the recommended way for now is to first start gpuDevice/gpuArray once and then continue with gpufit and/or gpuArray...

It's unclear if we create another context if Matlab already has a context, or what the difference is to running gpufit after gpuArray vs. running it before.

"invalid configuration argument"

Hello,

I've implemented the following example for fitting a rather complicated vector function. I can't post the model, mostly because it's quite involved, but for the issue I'm pointing out I don't think it matters.

The following is my example

void my_model_fit_example() {

	//Generating sample test for fitting a vector function


	const size_t M = 12;
	const size_t N = M;
	
	const size_t X = 12; 
	const size_t Y = 12;
	const size_t n_channels = 3;
	const size_t n_dir_samples = X * Y;
	const size_t n_components = n_channels*M*N;
	const size_t n_fits = 1;
	const size_t n_points_per_fit = n_dir_samples * n_components; 
	const size_t n_model_parameters = 4 + 9*M*N;

	size_t temp, offset;

	// true parameters
	std::vector< float > real_params(n_fits*n_model_parameters);
	std::vector<float> init_params(n_fits*n_model_parameters);
	init_my_model_params(real_params, M, N);

	// initialize random number generator
	std::mt19937 rng;
	rng.seed(0);
	std::uniform_real_distribution< float> uniform_dist(0, 1);

	// perturb the true parameters a bit to obtain the initial parameters
	for (size_t i = 0; i < n_fits*n_model_parameters; ++i) {
		init_params[i] = real_params[i] + uniform_dist(rng);
	}

	//centroids generator
	std::vector< float > x_coords(X);
	std::vector< float > y_coords(Y);
	for (auto i = 0; i < X; ++i) { //this for is silly given what I'm doing with it
		x_coords[i] = i;
		y_coords[i] = i;
	}

	//
	//
	//f = {
	//	g0(x[0],y[0]),
	//	...
	//	g3*MN-1(x[0],y[0]),
	//	...
	//	g0(x[i],y[j]),
	//	...
	//	g3*MN-1(x[i],y[j]),
	//	...
	//	};
	//
	std::vector< float > data(n_fits * n_points_per_fit);
	std::vector< float > curr_chunk_of_samples(n_components);
	size_t I, J;
	//iterating through directions
	for (size_t l = 0; l < X*Y; ++l) //if f(x,y,z) = (u(x,y,z),v(x,y,z)), the layout is {u(x[0],y[0],z[0]),v(x[0],y[0],z[0]),u(x[0],y[0],z[1]),v(x[0],y[0],z[1]),...,u(x[i],y[j],z[k]),v(x[i],y[j],z[k]),...}
	{

		J = l % X;
		I = l / X;
		//iterating through the pixels of this tile..
		my_model(x_coords[I], y_coords[J], real_params, curr_chunk_of_samples);
		for (auto k = 0; k < n_components; ++k) {
			data[l*n_components + k] = curr_chunk_of_samples[k];
		}
	}
	
	// tolerance
	float const tolerance = 0.001f;

	// maximum number of iterations
	int const max_number_iterations = 20;

	// estimator ID
	int const estimator_id = LSE;

	// model ID
	int const model_id = MY_MODEL;

	// parameters to fit (all of them)
	std::vector< int > parameters_to_fit(n_model_parameters, 1);

	// output parameters
	std::vector< float > output_parameters(n_fits * (4 + 9*M*N));
	std::vector< int > output_states(n_fits);
	std::vector< float > output_chi_square(n_fits);
	std::vector< int > output_number_iterations(n_fits);

	// call to gpufit (C interface)
	std::chrono::high_resolution_clock::time_point time_0 = std::chrono::high_resolution_clock::now();
	int const status = gpufit
	(
		n_fits,
		n_points_per_fit,
		data.data(),
		0,
		model_id,
		init_params.data(),
		tolerance,
		max_number_iterations,
		parameters_to_fit.data(),
		estimator_id,
		0,
		0,
		output_parameters.data(),
		output_states.data(),
		output_chi_square.data(),
		output_number_iterations.data()
	);

	std::chrono::high_resolution_clock::time_point time_1 = std::chrono::high_resolution_clock::now();
	
	// check status
	if (status != ReturnState::OK)
	{
		throw std::runtime_error(gpufit_get_last_error());
	}

	// print execution time
	std::cout << "execution time "
		<< std::chrono::duration_cast<std::chrono::milliseconds>(time_1 - time_0).count() << " ms" << std::endl;

 	std::cout << "Finished!" << std::endl;
}

I've run my example in debug mode and what I see is that an exception is thrown when the following is reached:


void LMFitCUDA::run()
{
    // initialize the chi-square values
	calc_curve_values();
    calc_chi_squares();
    calc_gradients();
    calc_hessians(); //THIS LINE!!!

    gpu_data_.copy(
        gpu_data_.prev_chi_squares_,
        gpu_data_.chi_squares_,
        n_fits_);

    // loop over the fit iterations
    for (int iteration = 0; !all_finished_; iteration++)
    {
        // modify step width
        // Gauss Jordan
        // update fitting parameters
        solve_equation_system();

        // calculate fitting curve values and its derivatives
        // calculate chi-squares, gradients and hessians
		calc_curve_values();
        calc_chi_squares();
        calc_gradients();
        calc_hessians();

        // check which fits have converged
        // flag finished fits
        // check whether all fits finished
        // save the number of needed iterations by each fitting process
        // check whether chi-squares are increasing or decreasing
        // update chi-squares, curve parameters and lambdas
        evaluate_iteration(iteration);
    }
}

The thrown exception is "invalid configuration argument"; I'm not sure what this exception means. My model implements both the function values and the derivatives (the Jacobian, to be more specific, since the function I have to fit is a vector function), and the estimator I'm using is the LSE provided by the tool.

Anyway I do wonder what can be the problem. I cannot post the whole function for several reasons, but I can post the for loops eventually (in order to understand from where to where I'm iterating). I'm saying this because I can see the hessian calculation is based on the derivatives, which are implemented in my model.

Can anyone help?
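"invalid configuration argument" is the message the CUDA runtime reports when a kernel is launched with grid or block dimensions outside the device limits. In the example above, n_model_parameters is 4 + 9*12*12 = 1300. Assuming the kernels use at least one thread per fitted parameter within a block (as the launch configuration quoted in the solve_equation_system() issue on this page does), that exceeds the 1024 threads per block that current CUDA devices allow. The host-side sketch below only reproduces the arithmetic; the limit is hard-coded as an assumption and should be read from cudaDeviceProp::maxThreadsPerBlock on a real device.

```cpp
#include <cassert>
#include <cstddef>

// Assumed device limit; query cudaDeviceProp::maxThreadsPerBlock in real code.
constexpr std::size_t assumed_max_threads_per_block = 1024;

// Parameter count used in the example: 4 + 9*M*N.
inline std::size_t n_model_parameters(std::size_t m, std::size_t n)
{
    return 4 + 9 * m * n;
}

// True if a block with one thread per (parameter, fit) pair can be launched.
inline bool block_size_is_valid(std::size_t n_parameters_to_fit, std::size_t n_fits_per_block)
{
    assert(n_fits_per_block > 0);
    return n_parameters_to_fit * n_fits_per_block <= assumed_max_threads_per_block;
}
```

With M = N = 12 the check fails even for a single fit per block, so a model of this size would need kernels that loop over parameters instead of mapping each one to a thread.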

Test also parts of the algorithm

We mostly test the whole procedure in the C++ tests and the Matlab/Python examples. We should also test parts of the algorithms both for consistency and correctness.

Side issue: Continuous integration like Travis CI might help there.

Unique X coordinate values for each fit

(For a 1D Gaussian function, with n_fits and n_points_per_fit given.) To supply unique X coordinate values for each fit in user_info, should the length of user_info (as floats) be equal to n_fits * n_points_per_fit, the same as the length of data (the third parameter of gpufit())?
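The layout asked about can be sketched as follows, assuming the model reads user_info as an array of floats holding n_fits * n_points_per_fit coordinates stored fit after fit, and that user_info_size is given in bytes. Both points are assumptions to be checked against the model's source; the helper names and the coordinate values are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: x coordinates for every fit, laid out fit after fit,
// so that point k of fit i is stored at index i * n_points + k.
inline std::vector<float> make_per_fit_x_coordinates(std::size_t n_fits, std::size_t n_points)
{
    assert(n_fits > 0 && n_points > 0);

    std::vector<float> x(n_fits * n_points);
    for (std::size_t i = 0; i < n_fits; ++i)
    {
        for (std::size_t k = 0; k < n_points; ++k)
        {
            // arbitrary illustration: fit i is shifted by 0.5 * i
            x[i * n_points + k] = 0.5f * static_cast<float>(i) + static_cast<float>(k);
        }
    }
    return x;
}

// Size of the buffer in bytes, as it would be passed for user_info_size.
inline std::size_t user_info_size_in_bytes(std::vector<float> const & x)
{
    return x.size() * sizeof(float);
}
```

The buffer would then be passed as reinterpret_cast<char *>(x.data()) together with user_info_size_in_bytes(x) in the user_info and user_info_size arguments.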

How to conveniently build install targets for a specific configuration from outside?

This would make packaging easier. Maybe with http://buildbot.net/

Count or ignore failed cycles of the algorithm in the iteration count?

Currently we count all iterations as iterations even if they do not change the parameter sets (only change the lambda for example). Others like CMinpack do not count these iterations (according to Björn). Which way should we do it?

In any case, this should be described in the API documentation of the output parameter.

External binding to LabVIEW

With a structure similar to the existing bindings to Matlab and Python, if possible.

Build errors on Ubuntu16.04

FLT_EPSILON was not declared in Brown_Dennis_Fit.cpp
there are errors:
/Gpufit/Gpufit/tests/Brown_Dennis_Fit.cpp: In member function ‘void Fletcher_Powell_Helix::test_method()’:
/Gpufit/Gpufit/tests/Brown_Dennis_Fit.cpp:69:71: error: ‘FLT_EPSILON’ was not declared in this scope
BOOST_CHECK(std::abs(output_parameters[0] - true_parameters[0]) < FLT_EPSILON);
^
/Gpufit/Gpufit/tests/Brown_Dennis_Fit.cpp:70:71: error: ‘FLT_EPSILON’ was not declared in this scope
BOOST_CHECK(std::abs(output_parameters[1] - true_parameters[1]) < FLT_EPSILON);
^
Gpufit/Gpufit/tests/Brown_Dennis_Fit.cpp:71:71: error: ‘FLT_EPSILON’ was not declared in this scope
BOOST_CHECK(std::abs(output_parameters[2] - true_parameters[2]) < FLT_EPSILON);
^
/Gpufit/Gpufit/tests/Brown_Dennis_Fit.cpp:72:71: error: ‘FLT_EPSILON’ was not declared in this scope
BOOST_CHECK(std::abs(output_parameters[3] - true_parameters[3]) < FLT_EPSILON);
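FLT_EPSILON is defined in the standard header <cfloat> (<float.h> in C). Some standard library implementations pull it in indirectly through other headers, which would explain why the same file compiles on one platform and not on another. A minimal sketch of the likely fix, with a hypothetical helper standing in for the comparisons in the test:

```cpp
#include <cassert>
#include <cfloat>  // defines FLT_EPSILON
#include <cmath>
#include <limits>

// FLT_EPSILON and numeric_limits<float>::epsilon() are the same value.
static_assert(FLT_EPSILON == std::numeric_limits<float>::epsilon(),
    "FLT_EPSILON should match numeric_limits<float>::epsilon()");

// Hypothetical helper mirroring the comparisons in the failing test.
inline bool nearly_equal(float a, float b)
{
    return std::abs(a - b) < FLT_EPSILON;
}
```

Alternatively, std::numeric_limits<float>::epsilon() from <limits> avoids the macro altogether.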

Gpufit and Cpufit libraries export too many symbols on Linux

This happens because the symbol export definition in the *.def files is not recognized on Linux. As a result, all applications using both libraries (Gpufit and Cpufit), like the performance comparison, crash.

We should use "__attribute__((visibility ..." on Linux and "__declspec(..." on Windows instead. The old definition files should go. Or are they needed for anything?
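A common pattern for this is a single export macro that expands to the right attribute per platform. The macro and function names below are placeholders, not existing Gpufit identifiers. On GCC and Clang the macro is normally combined with the -fvisibility=hidden compiler option so that only annotated symbols are exported.

```cpp
#include <cassert>

// Placeholder macro name; expands to the platform's export annotation.
#if defined(_WIN32)
    #define EXAMPLE_API __declspec(dllexport)
#else
    #define EXAMPLE_API __attribute__((visibility("default")))
#endif

// Only functions marked with the macro are meant to be visible to users
// of the shared library.
extern "C" EXAMPLE_API int example_exported_function(int value)
{
    return value + 1;
}
```

A full solution would also switch the Windows branch to __declspec(dllimport) when the header is included by a client instead of the library itself.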

Move estimators and model functions to their own source directory

Output_parameters inexplicably turn out to be all zero when n_fits is large enough

To deal with a stack overflow problem, all my parameters for gpufit are allocated like this:

float *initial_parameters = new float [n_fits * n_model_parameters] ();

Then I found that n_fits is limited: when n_fits was large enough (like 10,000,000 in my case), the fitted output_parameters turned out to be all zero, but it worked as expected when n_fits was set to 1,000,000. It is hard to figure out why. By the way, the PC has enough memory.

Additional parameters for gpufit() via function overloading not possible if we use "extern C"

With reference to issue #30, I have lately been trying to add an optional parameter to gpufit that returns the fitted curve, so that you can compare it with your original data and also assess the quality of the result visually.

After some trying, I managed to find a (imho) nice way to do it through overloading of some class init function, so that, depending on the number of parameters you use when calling gpufit(), the library can understand if you want to have the output of the fitting, as well, or just the standard output argument (params, number of iterations, and so on ...)

I am having just one problem in doing it: I can overload every function I need to, and this approach is working well, apart from the actual gpufit() function!

This is due to gpufit being exposed as a C interface:

#ifdef __cplusplus
extern "C" {
#endif

Given that function overloading is specific to C++, you cannot use it if you want to treat gpufit() as a C function. So far I have used a workaround based on a different alias for the gpufit() function, in case I want to call it with the additional output_data parameter. But I don't like it ...

Two questions:
1- why do you need to do the extern "C" around gpufit()? Is it required by external bindings like python or matlab?
2- if this "extern" is actually needed, do you have any idea of alternative ways to use function overloading inside the extern "C" environment?

Thanks!
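C linkage disables C++ name mangling, so two functions with the same name cannot coexist inside extern "C"; the unmangled name is what lets other languages look up the symbol in the shared library. One alternative to a second alias is a single C function with an optional pointer argument, where a null pointer means "not requested", forwarding to overloaded C++ functions internally. All names in the sketch below are hypothetical and the fit itself is replaced by a trivial copy.

```cpp
#include <cassert>
#include <cstddef>

namespace example
{
    // C++ side: overloading is allowed here.
    inline int fit(std::size_t /*n_points*/, float const * /*data*/)
    {
        return 0; // placeholder for the real work
    }

    inline int fit(std::size_t n_points, float const * data, float * output_data)
    {
        // placeholder: pretend the fitted curve equals the input data
        for (std::size_t i = 0; i < n_points; ++i)
        {
            output_data[i] = data[i];
        }
        return 0;
    }
}

// C side: one unmangled symbol; a null output_data selects the short overload.
extern "C" int example_fit(std::size_t n_points, float const * data, float * output_data)
{
    if (output_data == nullptr)
    {
        return example::fit(n_points, data);
    }
    return example::fit(n_points, data, output_data);
}
```

Callers that do not want the extra output pass a null pointer, in the same way that the example on this page passes 0 for the weights.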

Ubuntu compilation problem

Hi,

I'm having trouble running make after cmake on an Ubuntu machine with CUDA 9. Could you help, please?

[ 56%] Building NVCC (Device) object Gpufit/examples/CMakeFiles/CUDA_Interface_Example.dir/CUDA_Interface_Example_generated_CUDA_Interface_Example.cu.o
In file included from /usr/include/c++/4.8/random:35:0,
from /home/andrey/Downloads/Gpufit/Gpufit/examples/CUDA_Interface_Example.cu:4:
/usr/include/c++/4.8/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
#error This file requires compiler and library support for the
^
CMake Error at CUDA_Interface_Example_generated_CUDA_Interface_Example.cu.o.RELEASE.cmake:221 (message):
Error generating
/home/andrey/Downloads/Gpufit/Gpufit/examples/CMakeFiles/CUDA_Interface_Example.dir//./CUDA_Interface_Example_generated_CUDA_Interface_Example.cu.o

make[2]: *** [Gpufit/examples/CMakeFiles/CUDA_Interface_Example.dir/CUDA_Interface_Example_generated_CUDA_Interface_Example.cu.o] Error 1
make[1]: *** [Gpufit/examples/CMakeFiles/CUDA_Interface_Example.dir/all] Error 2
make: *** [all] Error 2

Some details on LMFitCUDA::solve_equation_system()

Hi All,
With reference to #46, I have implemented a model with many parameters (more than 32), and I am trying to fit a "vector function". I am therefore trying to modify the code so that it supports my case. Consider the function below:

void LMFitCUDA::solve_equation_system()
{
	dim3  threads(1, 1, 1);
	dim3  blocks(1, 1, 1);

	threads.x = info_.n_parameters_to_fit_*info_.n_fits_per_block_;
	blocks.x = n_fits_ / info_.n_fits_per_block_;

	cuda_modify_step_widths << < blocks, threads >> >(
		gpu_data_.hessians_,
		gpu_data_.lambdas_,
		gpu_data_.scaling_vectors_,
		info_.n_parameters_to_fit_,
		gpu_data_.iteration_failed_,
		gpu_data_.finished_,
		info_.n_fits_per_block_);
	CUDA_CHECK_STATUS(cudaGetLastError());

	int n_parameters_pow2 = 1;

	while (n_parameters_pow2 < info_.n_parameters_to_fit_)
	{
		n_parameters_pow2 *= 2;
	}

	//set up to run the Gauss Jordan elimination
	int const n_equations = info_.n_parameters_to_fit_;
	int const n_solutions = n_fits_;

	threads.x = n_equations + 1;
	threads.y = n_equations;
	blocks.x = n_solutions;

	//set the size of the shared memory area for each block
	int const shared_size
		= sizeof(float) * ((threads.x * threads.y)
			+ n_parameters_pow2 + n_parameters_pow2);

	//set up the singular_test vector
	int * singular_tests;
	CUDA_CHECK_STATUS(cudaMalloc((void**)&singular_tests, n_fits_ * sizeof(int)));

	//run the Gauss Jordan elimination
	cuda_gaussjordan << < blocks, threads, shared_size >> >(
		gpu_data_.deltas_,
		gpu_data_.gradients_,
		gpu_data_.hessians_,
		gpu_data_.finished_,
		singular_tests,
		info_.n_parameters_to_fit_,
		n_parameters_pow2);
	CUDA_CHECK_STATUS(cudaGetLastError());

	//set up to update the lm_state_gpu_ variable with the Gauss Jordan results
	threads.x = std::min(n_fits_, 256);
	threads.y = 1;
	blocks.x = int(std::ceil(float(n_fits_) / float(threads.x)));

	//update the lm_state_gpu_ variable
	cuda_update_state_after_gaussjordan << < blocks, threads >> >(
		n_fits_,
		singular_tests,
		gpu_data_.states_);
	CUDA_CHECK_STATUS(cudaGetLastError());

	CUDA_CHECK_STATUS(cudaFree(singular_tests));

	threads.x = info_.n_parameters_*info_.n_fits_per_block_;
	threads.y = 1;
	blocks.x = n_fits_ / info_.n_fits_per_block_;

	cuda_update_parameters << < blocks, threads >> >(
		gpu_data_.parameters_,
		gpu_data_.prev_parameters_,
		gpu_data_.deltas_,
		info_.n_parameters_to_fit_,
		gpu_data_.parameters_to_fit_indices_,
		gpu_data_.finished_,
		info_.n_fits_per_block_);
	CUDA_CHECK_STATUS(cudaGetLastError());
}

What is the meaning of the following:

	threads.x = info_.n_parameters_to_fit_*info_.n_fits_per_block_;
	blocks.x = n_fits_ / info_.n_fits_per_block_;

	//set up to run the Gauss Jordan elimination
	int const n_equations = info_.n_parameters_to_fit_;
	int const n_solutions = n_fits_;

	threads.x = n_equations + 1;
	threads.y = n_equations;
	blocks.x = n_solutions;

	threads.x = info_.n_parameters_*info_.n_fits_per_block_;
	threads.y = 1;
	blocks.x = n_fits_ / info_.n_fits_per_block_;

I mean, the issue linked above gives a workaround for implementing a vector function; however, because of the large number of parameters I have to modify the threads and blocks variables, and I don't want to break how they are actually used. I have just one function to fit.
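
Reading the listing, the first and last launches use one thread per (fit, parameter) pair, with n_fits_per_block fits sharing a block. The Gauss-Jordan launch uses one block per fit with (n + 1) x n threads, presumably one per element of the augmented n x (n + 1) system. The host-side sketch below recomputes that launch shape to show where a limit of about 31 fitted parameters would come from. LaunchShape, gauss_jordan_shape and fits_in_one_block are hypothetical helper names, and the 1024 threads-per-block limit is an assumption about the target GPU.

```cpp
#include <cstddef>

// Launch shape of the cuda_gaussjordan call, as computed in the listing.
struct LaunchShape
{
    int threads_x;
    int threads_y;
    int blocks_x;
    std::size_t shared_bytes;
};

// Smallest power of two >= n, mirroring the while loop in the listing.
int next_pow2(int n)
{
    int p = 1;
    while (p < n)
    {
        p *= 2;
    }
    return p;
}

LaunchShape gauss_jordan_shape(int n_parameters_to_fit, int n_fits)
{
    LaunchShape s;
    s.threads_x = n_parameters_to_fit + 1; // n_equations + 1
    s.threads_y = n_parameters_to_fit;     // n_equations
    s.blocks_x = n_fits;                   // one block per fit
    int const pow2 = next_pow2(n_parameters_to_fit);
    s.shared_bytes = sizeof(float)
        * (static_cast<std::size_t>(s.threads_x) * static_cast<std::size_t>(s.threads_y)
            + 2 * static_cast<std::size_t>(pow2));
    return s;
}

// Assumption: the GPU allows at most max_threads_per_block threads in a block.
bool fits_in_one_block(LaunchShape const & s, int max_threads_per_block = 1024)
{
    return s.threads_x * s.threads_y <= max_threads_per_block;
}
```

With 31 fitted parameters the block has 32 x 31 = 992 threads, which fits. With 32 it has 33 x 32 = 1056, which exceeds 1024, so supporting more parameters would need a different mapping of threads to matrix elements rather than only larger values in threads and blocks.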

Criteria for pull requests of new models?

Example: in #35, Michele Scipioni suggests merging a model for PET pharmacokinetics into this repository (gpufit/Gpufit). It might be useful for some users, although it is already quite specific and would require maintenance in the long run.

What should our general guidance/policies be in that regard?

GPUFit VS project issues, debugging problems

Hello,

I'm trying to debug some of the CUDA kernels I've modified in Gpufit, but for some reason the CUDA debugger skips all the breakpoints. Something else is also wrong: no CUDA C/C++ tab appears in the project properties.

I'm trying to follow this to sort out the problem:
https://stackoverflow.com/questions/10813260/cuda-c-tab-in-project-properties

Can anyone suggest where those rules should be copied?

Parallelization direction for Hessian matrix calculation

So far, the parallelization has been over the number of model parameters. This is efficient for a small number of data points per fit. Now that we allow a larger number of data points per fit, this becomes inefficient. Parallelizing over the number of data points would be efficient for a large number of data points but inefficient for a small number.

The best of both worlds might be to parallelize over the parameters and a chunk of the data points (possibly all). Parameters could be the second block dimension and data points the first, so that the product of both does not exceed 1024.

We should test this and, if beneficial, include it.
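
A minimal host-side sketch of how such block dimensions could be chosen is given below. It is not taken from the Gpufit sources: BlockDims, choose_block_dims and chunks_per_fit are hypothetical names, and the default limit of 1024 threads per block is the value mentioned above.

```cpp
#include <algorithm>

// x: data points handled in parallel, y: model parameters.
struct BlockDims
{
    int x;
    int y;
};

// Use all parameters in y, then fill x with as many data points
// as fit while keeping x * y <= max_threads.
BlockDims choose_block_dims(int n_points, int n_parameters, int max_threads = 1024)
{
    BlockDims d;
    d.y = std::max(1, std::min(n_parameters, max_threads));
    d.x = std::max(1, std::min(n_points, max_threads / d.y));
    return d;
}

// Number of data point chunks each thread has to loop over (rounded up).
int chunks_per_fit(int n_points, BlockDims const & d)
{
    return (n_points + d.x - 1) / d.x;
}
```

For 25 data points and 5 parameters this gives a 25 x 5 block that covers all points in one pass. For 10000 data points and 5 parameters it gives a 204 x 5 block, and each thread loops over 50 chunks.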

"not enough free GPU memory available" exception caught

Hello,

For some reason I don't quite understand, an exception is thrown by the call to:

int gpufit
(
    size_t n_fits,
    size_t n_points,
    float * data,
    float * weights,
    int model_id,
    float * initial_parameters,
    float tolerance,
    int max_n_iterations,
    int * parameters_to_fit,
    int estimator_id,
    size_t user_info_size,
    char * user_info,
    float * output_parameters,
    int * output_states,
    float * output_chi_squares,
    int * output_n_iterations
)
try
{
    FitInterface fi(
        data,
        weights,
        n_fits,
        static_cast<int>(n_points),
        tolerance,
        max_n_iterations,
        static_cast<EstimatorID>(estimator_id),
        initial_parameters,
        parameters_to_fit,
        user_info, //0 in Gauss_Fit_2D example
        user_info_size, //0 in Gauss_Fit_2D example
        output_parameters,
        output_states,
        output_chi_squares,
        output_n_iterations);

    fi.fit(static_cast<ModelID>(model_id));

    return ReturnState::OK ;
}

catch( std::exception & exception )
{
    last_error = exception.what() ;

    return ReturnState::ERROR ;
}

I'm not entirely sure what the issue can be. I've implemented my own model, to be minimized using the LSE estimator. What information can I provide to get some help with debugging?

The content of last_error is "not enough free GPU memory available".
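
A useful first piece of information is how much memory the fit needs at a minimum. The sketch below adds up only the arrays visible in the gpufit() signature above, so it is a lower bound. The library's internal work buffers are not included, and min_buffer_bytes is a hypothetical helper, not part of Gpufit.

```cpp
#include <cstddef>

// Lower bound, in bytes, for the arrays that appear in the gpufit() signature.
// Internal buffers (gradients, Hessians, and so on) are deliberately not counted.
std::size_t min_buffer_bytes(
    std::size_t n_fits,
    std::size_t n_points,
    std::size_t n_parameters,
    bool use_weights)
{
    std::size_t const data = n_fits * n_points * sizeof(float);
    std::size_t const weights = use_weights ? data : 0;
    // initial_parameters and output_parameters
    std::size_t const parameters = 2 * n_fits * n_parameters * sizeof(float);
    // output_states, output_n_iterations (int) and output_chi_squares (float)
    std::size_t const per_fit = n_fits * (2 * sizeof(int) + sizeof(float));
    return data + weights + parameters + per_fit;
}
```

Comparing this number with the free memory reported by the CUDA runtime function cudaMemGetInfo shows whether the input sizes alone already approach the limit, in which case fewer fits per call would be the first thing to try.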

Centralize version string and spread with CMake

We currently have version strings duplicated in various places:

bindings (Python, Matlab)
documentation
packaging
source code (or at least it should be there)
examples (or at least it should be there)

Set it only once, then use CMake to propagate the version.
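
On the source code side, this could look like the sketch below. It assumes that the three macros come from a header generated by CMake's configure_file() from a single version definition. The macro names, the function name and the values 1, 0, 0 are placeholders, not the actual Gpufit version.

```cpp
#include <string>

// In a real setup these macros would live in a header generated by
// CMake's configure_file(); the values here are placeholders.
#define DEMO_VERSION_MAJOR 1
#define DEMO_VERSION_MINOR 0
#define DEMO_VERSION_PATCH 0

// Builds the "major.minor.patch" string from the single definition above.
std::string demo_version_string()
{
    return std::to_string(DEMO_VERSION_MAJOR) + "."
        + std::to_string(DEMO_VERSION_MINOR) + "."
        + std::to_string(DEMO_VERSION_PATCH);
}
```

The bindings, documentation and packaging scripts could be generated from the same CMake variables, so that the version only has to be changed in one place.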
