Hi Keurfon, How do you output all models + misfit by iteration? I'm looking to cre

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Output models ,about keurfonluu/evodcinv

Comments (44)

keurfonluu commented on June 12, 2024 1

The seed is set before the "inversion" loop, so each inversion is using a different set of random numbers. It's easy to check that since each run is returning a different optimal solution.

from evodcinv.

keurfonluu commented on June 12, 2024 1

Hi @79seismo,

Please update evodcinv to v2.2.1, that should prevent this issue from happening hopefully.

from evodcinv.

keurfonluu commented on June 12, 2024 1

The first reason is API limitation: most optimization libraries (e.g., scipy), if not all, simply only allow to calculate and store the misfit value for a model. Although I am the developer of stochopy, I would need to make some substantial changes in the API to allow to store more than that.
The second reason is that evodcinv only calculates the dispersion velocities at discrete input periods. When plotting results, we usually want to plot the "full" dispersion curves (i.e., at periods different that of the data points), so storing the dispersion curves calculated during the inversion doesn't look as useful given that constraint.
Anyway, after some benchmarks, the performance issues of the plotting routines are not due to the recalculation of the dispersion curves (it should only take a few seconds, especially after thresholding), there seem to be some issues with sequential plotting of thousands of lines with matplotlib.

from evodcinv.

keurfonluu commented on June 12, 2024 1

Hi @79seismo,

I am not sure to correctly understand your question. By default, when maxrun is greater than 1, the result output contains all models visited by all runs. Thresholding the results returns an ensemble of models from all runs that satisfy the thresholding criterion.

from evodcinv.

keurfonluu commented on June 12, 2024

Hi @79seismo,

All models and misfits are output by default (attributes models and misfits of evodcinv.InversionResult).
Please check the script .github/sample.py to see how the figures in the README have been generated.

Without any prior information, it would be good to set rather unconstraining velocity bounds (yet still realistic, i.e., we do not expect 5 km/s at the surface) and constant thicknesses for each layer. Also, set a large population size for the optimizer (e.g., 10 times the number of layers).

Vp is calculated from Vs and Poisson's ratio. By default, density is calculated using Nafe-Drake's equation, but you can set your own model (density option in method configure).

Not sure what you mean by "benchmark", but dispersion curves calculated by disba are consistent with the ones from CPS for the velocity models I have tested. stochopy (the optimization library used here) has been thoroughly tested during my PhD. Since evodcinv is mainly relying on these two libraries, the forward modeling and the optimization algorithms are assumed to be working, thus the inversion code here should work properly. If it means anything, the example in the README is actually a sample problem from Geopsy's Wiki.

from evodcinv.

79seismo commented on June 12, 2024

Thanks very much Keurfon. This is useful information. By benchmark I meant comparing results with an established code, which is what you may have done with CPS.

I tried inverting for the model with 10 layers in your example in disba, and although the misfit gets minimised well, the model fit isn't that great for CPSO and is a bit better with NA from what I can tell. Any tips on how I can improve the fit? I set the search ranges as follows for all the layers model.add(Layer([1, 20], [1, 5.0]))

Can I get a copy of your PhD thesis as well please? Thanks!

from evodcinv.

keurfonluu commented on June 12, 2024

Please note that thickness is defined in km and velocity in km/s.

For the sample problem, the model layers can be configured as follows (assuming 10 layers of constant thickness h=5m down to 50 m depth):

for _ in range(10):
    model.add(Layer([0.005, 0.005], [0.1, 3.0]))

Set the population size to 100 and increase the number of iterations to 1000-2000 to be sure to reach convergence. You may also try different optimizers as you did with NA. CMA-ES seems to work quite well here:

model.configure(
    optimizer="cmaes",
    misfit="rmse",
    density=lambda vp: 2.0,
    optimizer_args={
        "popsize": 100,
        "maxiter": 2000,
        "workers": -1,
        "seed": 0,
    },
)

You can also increase the number of runs (e.g., to 3) to start from different initial populations.

res = model.invert(curves, maxrun=3, split_results=True)
res = min(res, key=lambda x: x.misfit)  # Get the results for the run with the lowest misfit

Note that you likely will never get the exact true model due to the non-uniqueness of the solution.

--------------------------------------------------------------------------------
Best model out of 200000 models (1 run)

Velocity model                                    Model parameters
----------------------------------------          ------------------------------
         d        vp        vs       rho                   d        vs        nu
      [km]    [km/s]    [km/s]   [g/cm3]                [km]    [km/s]       [-]
----------------------------------------          ------------------------------
    0.0050    0.3372    0.2065    2.0000              0.0050    0.2065    0.2000
    0.0050    0.3506    0.2147    2.0000              0.0050    0.2147    0.2000
    0.0050    0.5330    0.2263    2.0000              0.0050    0.2263    0.3901
    0.0050    0.5984    0.2443    2.0000              0.0050    0.2443    0.4000
    0.0050    0.2664    0.1632    2.0000              0.0050    0.1632    0.2000
    0.0050    1.3206    0.5391    2.0000              0.0050    0.5391    0.4000
    0.0050    7.3476    3.0000    2.0000              0.0050    3.0000    0.4000
    0.0050    5.8546    2.3907    2.0000              0.0050    2.3907    0.3999
    0.0050    0.3920    0.1600    2.0000              0.0050    0.1600    0.4000
    1.0000    2.0985    0.8567    2.0000                   -    0.8567    0.4000
----------------------------------------          ------------------------------

Number of layers: 10
Number of parameters: 29
Best model misfit: 0.0006
--------------------------------------------------------------------------------

Note that CMA-ES may not always work well. CMA-ES can be compared to a gradient based optimizer as it tries to generate more samples along the direction of the steepest descent.

from evodcinv.

79seismo commented on June 12, 2024

Thanks! much appreciated. Encountered the following for a VDCMA run, possible bug?

I was using population size=100 and iternations = 1000

Also, is there a way to isolate an ensemble of models within a certain threshold of the minimum misfit, say within 20% of the minimum misfit?

What does this do?
res = res.threshold(0.1)

Also, rather than showing all models with 'show=all' as given below, is there a way to isolate and show the models that were extracted within the earlier-mentioned threshold; i.e. say within 20% of the minimum misfit?
res.plot_model( "vs", zmax=zmax, show="all", ax=ax[0], plot_args={"cmap": cmap},

Thank you!

from evodcinv.

keurfonluu commented on June 12, 2024

I guess VDCMA does not like when all models have infinite misfit (happens when constraints are not satisfied). I will need to look into that.

The method threshold does exactly what you want: it extracts models that have misfits lower than the input value. If you want the threshold to be 20% of the minimum misfit:

value = 1.2 * res.misfit

Note that:

res = res.threshold(value)

overwrites the result variable res with the result filtered by the method threshold. You may want to save the filtered results in another variable, e.g.:

res2 = res.threshold(value)

Then, invoking the method plot_model from the filtered result will only show the models that satisfied your threshold criterion. Alternatively, you can pipe the methods as follows:

res.threshold(value).plot_models("vs", zmax=zmax, show="all", ax=ax[0], plot_args={"cmap": cmap)

from evodcinv.

79seismo commented on June 12, 2024

Neat! Thanks very much for your help!

from evodcinv.

79seismo commented on June 12, 2024

I still haven't been able to get a good inversion result for a synthetic model I'm trying and it looks like the following - a simple model with a velocity gradient. I tried all algorithms with population size = 100 to 150 and iterations 500 to 2500. Target computational costs don't let me go above these parameters. Any suggestions to improve the inversion? Also, the plotting routines are quite computationally intensive. I wonder if it can be made more efficient. Thanks!

velocity_model = np.array([ [ 0.001 , 0.346 , 0.2 , 1.4099 ], [ 0.001 , 0.4786 , 0.2767 , 1.4116 ], [ 0.001 , 0.6113 , 0.3533 , 1.4132 ], [ 0.001 , 0.7439 , 0.43 , 1.4145 ], [ 0.001 , 0.8765 , 0.5067 , 1.4158 ], [ 0.001 , 1.0092 , 0.5833 , 1.4169 ], [ 0.001 , 1.1418 , 0.66 , 1.418 ], [ 0.001 , 1.2744 , 0.7367 , 1.419 ], [ 0.001 , 1.4071 , 0.8133 , 1.42 ], [ 0.001 , 1.5397 , 0.89 , 1.4209 ], [ 0.001 , 1.6723 , 0.9667 , 1.4218 ], [ 0.001 , 1.805 , 1.0433 , 1.4226 ], [ 0.001 , 1.9376 , 1.12 , 1.4234 ], [ 0.001 , 2.0702 , 1.1967 , 1.4242 ], [ 0.001 , 2.2029 , 1.2733 , 1.425 ], [ 0.001 , 2.3355 , 1.35 , 1.4257 ], [ 0.001 , 2.4681 , 1.4267 , 1.4264 ], [ 0.001 , 2.6008 , 1.5033 , 1.4271 ], [ 0.001 , 2.7334 , 1.58 , 1.4278 ], [ 0.001 , 2.866 , 1.6567 , 1.4285 ], [ 0.001 , 2.9987 , 1.7333 , 1.4291 ], [ 0.001 , 3.1313 , 1.81 , 1.4298 ], [ 0.001 , 3.2639 , 1.8867 , 1.4304 ], [ 0.001 , 3.3966 , 1.9633 , 1.431 ], [ 0.001 , 3.5292 , 2.04 , 1.4316 ], [ 0.001 , 3.6618 , 2.1167 , 1.4322 ], [ 0.001 , 3.7945 , 2.1933 , 1.4328 ], [ 0.001 , 3.9271 , 2.27 , 1.4334 ], [ 0.001 , 4.0597 , 2.3467 , 1.4339 ], [ 0.001 , 4.1924 , 2.4233 , 1.4345 ], [ 0.001 , 4.325 , 2.5 , 1.435 ], ])

Fundamental mode Rayleigh Group dispersion curve from disba for this model:

Period:
array([ 0.1 , 0.10476158, 0.10974988, 0.1149757 , 0.12045035, 0.12618569, 0.13219411, 0.13848864, 0.14508288, 0.15199111, 0.15922828, 0.16681005, 0.17475284, 0.18307383, 0.19179103, 0.2009233 , 0.21049041, 0.22051307, 0.23101297, 0.24201283, 0.25353645, 0.26560878, 0.27825594, 0.29150531, 0.30538555, 0.31992671, 0.33516027, 0.35111917, 0.36783798, 0.38535286, 0.40370173, 0.42292429, 0.44306215, 0.46415888, 0.48626016, 0.5094138 , 0.53366992, 0.55908102, 0.58570208, 0.61359073, 0.64280731, 0.67341507, 0.70548023, 0.7390722 , 0.77426368, 0.81113083, 0.84975344, 0.89021509, 0.93260335, 0.97700996, 1.02353102, 1.07226722, 1.12332403, 1.17681195, 1.23284674, 1.29154967, 1.35304777, 1.41747416, 1.48496826, 1.55567614, 1.62975083, 1.70735265, 1.78864953, 1.87381742, 1.96304065, 2.05651231, 2.15443469, 2.25701972, 2.36448941, 2.47707636, 2.59502421, 2.71858824, 2.84803587, 2.98364724, 3.12571585, 3.27454916, 3.43046929, 3.59381366, 3.76493581, 3.94420606, 4.1320124 , 4.32876128, 4.53487851, 4.75081016, 4.97702356, 5.21400829, 5.46227722, 5.72236766, 5.9948425 , 6.28029144, 6.57933225, 6.8926121 , 7.22080902, 7.56463328, 7.92482898, 8.30217568, 8.69749003, 9.11162756, 9.54548457, 10. ])

Velocity:
array([1.09786535, 1.20770009, 1.30751688, 1.39670049, 1.47560087, 1.54520622, 1.60664691, 1.66103997, 1.70934501, 1.7524971 , 1.79118683, 1.82605584, 1.85757589, 1.88622391, 1.91242616, 1.93634936, 1.95832355, 1.97861568, 1.99732756, 2.01468714, 2.03079722, 2.04576357, 2.05973416, 2.07277005, 2.08493231, 2.09632235, 2.10704698, 2.11707513, 2.12650911, 2.13536061, 2.14373248, 2.15163577, 2.15908177, 2.1660324 , 2.17268075, 2.1789452 , 2.18487845, 2.19044092, 2.19573366, 2.20076534, 2.20553736, 2.21001281, 2.21428938, 2.21832549, 2.2221736 , 2.22583808, 2.22927476, 2.23253889, 2.23567875, 2.23865024, 2.24150667, 2.24420385, 2.24669491, 2.24912634, 2.25145642, 2.2536359 , 2.25576521, 2.25775087, 2.25964183, 2.26144286, 2.26315383, 2.26477708, 2.26631494, 2.26776975, 2.26923784, 2.27057825, 2.27183783, 2.27302139, 2.2742255 , 2.27530642, 2.27636073, 2.27734372, 2.2783025 , 2.27923952, 2.28015721, 2.28090897, 2.28173825, 2.28250089, 2.28324658, 2.28387826, 2.28454512, 2.285195 , 2.28573319, 2.28635639, 2.28686785, 2.28741698, 2.28785676, 2.2883816 , 2.28879949, 2.28920765, 2.2896535 , 2.2900422 , 2.29042116, 2.29074295, 2.29110732, 2.2914145 , 2.29171193, 2.2920045 , 2.2922922 , 2.29257016])

from evodcinv.

keurfonluu commented on June 12, 2024

I inverted your group velocities with 20 layers:

import numpy as np
from evodcinv import EarthModel, Layer, Curve
import matplotlib.pyplot as plt

period = np.array([ 0.1 , 0.10476158, 0.10974988, 0.1149757 , 0.12045035, 0.12618569, 0.13219411, 0.13848864, 0.14508288, 0.15199111, 0.15922828, 0.16681005, 0.17475284, 0.18307383, 0.19179103, 0.2009233 , 0.21049041, 0.22051307, 0.23101297, 0.24201283, 0.25353645, 0.26560878, 0.27825594, 0.29150531, 0.30538555, 0.31992671, 0.33516027, 0.35111917, 0.36783798, 0.38535286, 0.40370173, 0.42292429, 0.44306215, 0.46415888, 0.48626016, 0.5094138 , 0.53366992, 0.55908102, 0.58570208, 0.61359073, 0.64280731, 0.67341507, 0.70548023, 0.7390722 , 0.77426368, 0.81113083, 0.84975344, 0.89021509, 0.93260335, 0.97700996, 1.02353102, 1.07226722, 1.12332403, 1.17681195, 1.23284674, 1.29154967, 1.35304777, 1.41747416, 1.48496826, 1.55567614, 1.62975083, 1.70735265, 1.78864953, 1.87381742, 1.96304065, 2.05651231, 2.15443469, 2.25701972, 2.36448941, 2.47707636, 2.59502421, 2.71858824, 2.84803587, 2.98364724, 3.12571585, 3.27454916, 3.43046929, 3.59381366, 3.76493581, 3.94420606, 4.1320124 , 4.32876128, 4.53487851, 4.75081016, 4.97702356, 5.21400829, 5.46227722, 5.72236766, 5.9948425 , 6.28029144, 6.57933225, 6.8926121 , 7.22080902, 7.56463328, 7.92482898, 8.30217568, 8.69749003, 9.11162756, 9.54548457, 10. ])
group = np.array([1.09786535, 1.20770009, 1.30751688, 1.39670049, 1.47560087, 1.54520622, 1.60664691, 1.66103997, 1.70934501, 1.7524971 , 1.79118683, 1.82605584, 1.85757589, 1.88622391, 1.91242616, 1.93634936, 1.95832355, 1.97861568, 1.99732756, 2.01468714, 2.03079722, 2.04576357, 2.05973416, 2.07277005, 2.08493231, 2.09632235, 2.10704698, 2.11707513, 2.12650911, 2.13536061, 2.14373248, 2.15163577, 2.15908177, 2.1660324 , 2.17268075, 2.1789452 , 2.18487845, 2.19044092, 2.19573366, 2.20076534, 2.20553736, 2.21001281, 2.21428938, 2.21832549, 2.2221736 , 2.22583808, 2.22927476, 2.23253889, 2.23567875, 2.23865024, 2.24150667, 2.24420385, 2.24669491, 2.24912634, 2.25145642, 2.2536359 , 2.25576521, 2.25775087, 2.25964183, 2.26144286, 2.26315383, 2.26477708, 2.26631494, 2.26776975, 2.26923784, 2.27057825, 2.27183783, 2.27302139, 2.2742255 , 2.27530642, 2.27636073, 2.27734372, 2.2783025 , 2.27923952, 2.28015721, 2.28090897, 2.28173825, 2.28250089, 2.28324658, 2.28387826, 2.28454512, 2.285195 , 2.28573319, 2.28635639, 2.28686785, 2.28741698, 2.28785676, 2.2883816 , 2.28879949, 2.28920765, 2.2896535 , 2.2900422 , 2.29042116, 2.29074295, 2.29110732, 2.2914145 , 2.29171193, 2.2920045 , 2.2922922 , 2.29257016])


model = EarthModel()

for _ in range(20):
    model.add(Layer(0.0015, [0.1, 3.0]))

model.configure(
    optimizer="cpso",
    misfit="rmse",
    density=lambda vp: 2.0,
    optimizer_args={
        "popsize": 20,
        "maxiter": 200,
        "workers": -1,
        "seed": 0,
    },
)
curves = [Curve(period, group, 0, "rayleigh", "group")]
res = model.invert(curves, maxrun=3)

res.plot_curve(period, 0, "rayleigh", "group", show="best")
plt.semilogx(period, group, ls="--")

The results look good to me:

from evodcinv.

79seismo commented on June 12, 2024

Yes, the dispersion curve gets recovered well but not the velocity profile? I was doing this to kind of validate the inversions.

from evodcinv.

keurfonluu commented on June 12, 2024

The solutions of inversion problems are, in general, non-unique (there is an infinite number of solutions that can match your data). We can't expect the code to find the exact solution, even for synthetic cases. To get "better" results, we need to provide prior information (via better boundary constraints for instance, or additional regularization terms). Without any prior information, the code, as "dumb" as it is, returns one model that fits your data.

from evodcinv.

79seismo commented on June 12, 2024

Indeed, and is there a way to modify the misfit function? Alternatively, it's easier to quantify uncertainty if I can retrieve all the models within a certain threshold of the misfit. Is it possible to retrieve models after res.threshold()?

from evodcinv.

keurfonluu commented on June 12, 2024

You can look at the option "extra_terms" in "configure", this option allows to add additional misfit terms (e.g., to penalize models).

res.threshold simply filters out all the models that do not satisfy your threshold criterion. All the acceptable models are then accessible in res.models.

from evodcinv.

79seismo commented on June 12, 2024

Brilliant, thank you!

from evodcinv.

keurfonluu commented on June 12, 2024

I've updated the code and improved the option increasing_velocity (as usual, pip install evodcinv -U). The models are now initialized following this paper. Interestingly, this options seems to work quite well with Differential Evolution for the three sample problems I have tested (again, that may not be true for all problems).

model = EarthModel()

for _ in range(20):
    model.add(Layer(0.0015, [0.1, 3.0]))

model.configure(
    optimizer="de",
    misfit="rmse",
    density=lambda vp: 2.0,
    optimizer_args={
        "popsize": 50,
        "maxiter": 500,
        "workers": -1,
        "seed": 0,
        "strategy": "rand1bin",
    },
    increasing_velocity=True,
)
curves = [Curve(period, group, 0, "rayleigh", "group")]
res = model.invert(curves, maxrun=3)

--------------------------------------------------------------------------------
Best model out of 75000 models (3 runs)

Velocity model                                    Model parameters
----------------------------------------          ------------------------------
         d        vp        vs       rho                   d        vs        nu
      [km]    [km/s]    [km/s]   [g/cm3]                [km]    [km/s]       [-]
----------------------------------------          ------------------------------
    0.0015    0.4773    0.2389    2.0000              0.0015    0.2389    0.3329
    0.0015    0.6175    0.3148    2.0000              0.0015    0.3148    0.3243
    0.0015    0.7540    0.4099    2.0000              0.0015    0.4099    0.2903
    0.0015    0.9715    0.5724    2.0000              0.0015    0.5724    0.2341
    0.0015    1.2553    0.7678    2.0000              0.0015    0.7678    0.2011
    0.0015    1.4520    0.7960    2.0000              0.0015    0.7960    0.2851
    0.0015    1.6085    0.8779    2.0000              0.0015    0.8779    0.2879
    0.0015    1.8379    1.0229    2.0000              0.0015    1.0229    0.2756
    0.0015    2.2715    1.0593    2.0000              0.0015    1.0593    0.3611
    0.0015    2.0950    1.1280    2.0000              0.0015    1.1280    0.2959
    0.0015    2.5391    1.1726    2.0000              0.0015    1.1726    0.3644
    0.0015    2.4278    1.3305    2.0000              0.0015    1.3305    0.2854
    0.0015    2.6118    1.4435    2.0000              0.0015    1.4435    0.2801
    0.0015    2.6806    1.5059    2.0000              0.0015    1.5059    0.2694
    0.0015    3.2776    1.5632    2.0000              0.0015    1.5632    0.3528
    0.0015    3.0416    1.6856    2.0000              0.0015    1.6856    0.2784
    0.0015    3.3495    1.8566    2.0000              0.0015    1.8566    0.2783
    0.0015    3.7398    2.1915    2.0000              0.0015    2.1915    0.2385
    0.0015    3.9894    2.3113    2.0000              0.0015    2.3113    0.2474
    1.0000    4.4713    2.4880    2.0000                   -    2.4880    0.2758
----------------------------------------          ------------------------------

Number of layers: 20
Number of parameters: 59
Best model misfit: 0.0002
--------------------------------------------------------------------------------

from evodcinv.

79seismo commented on June 12, 2024

Thanks. I guess I have to reinstall the codes if you've updated them? increasing_velocity=True ensures that the inversion produces a best-fitting positive velocity gradient? So to ensure that I am not omitting structure with low velocity layers, I have to run a separate inversion with increasing_velocity=False and compare the misfits as a strategy?

Yes, I read your paper and that's what got me interested in your codes.

from evodcinv.

keurfonluu commented on June 12, 2024

Yes, please run the command pip install evodcinv -U to update the package.
I guess that can be a strategy as well. I would also recommend not to invert for that many layers due to the non-unicity of the inversion problem. In general, simpler models are preferable (e.g., 10 would be enough).

from evodcinv.

keurfonluu commented on June 12, 2024

You can look at the option "extra_terms" in "configure", this option allows to add additional misfit terms (e.g., to penalize models).

Here an example where we assume we have a linear prior model (Vs(z=0m)=0.1 km/s, Vs(z=40m)=3 km/s). with a regularization factor of 0.001 (can be tuned):

model = EarthModel()

for _ in range(10):
    model.add(Layer(0.003, [0.1, 3.0]))

def prior(x):
    vel = model.transform(x)
    z = np.insert(vel[:-1, 0].cumsum(), 0, 0.0)
    vprior = np.interp(z, [0.0, 0.04], [0.1, 3.0])

    return 1.0e-3 * np.square(vel[:, 2] - vprior).sum()

model.configure(
    optimizer="cpso",
    misfit="norm2",
    density=lambda vp: 2.0,
    optimizer_args={
        "popsize": 40,
        "maxiter": 500,
        "workers": -1,
        "seed": 0,
    },
    extra_terms=[prior],
)
curves = [Curve(period, group, 0, "rayleigh", "group")]
res = model.invert(curves, maxrun=3)

res.plot_curve(period, 0, "rayleigh", "group", show="best")
plt.semilogx(period, group, ls="--")

Note that I am here using a "norm2" misfit function.

--------------------------------------------------------------------------------
Best model out of 60000 models (3 runs)

Velocity model                                    Model parameters
----------------------------------------          ------------------------------
         d        vp        vs       rho                   d        vs        nu
      [km]    [km/s]    [km/s]   [g/cm3]                [km]    [km/s]       [-]
----------------------------------------          ------------------------------
    0.0030    0.4892    0.2925    2.0000              0.0030    0.2925    0.2217
    0.0030    0.9116    0.5113    2.0000              0.0030    0.5113    0.2705
    0.0030    1.2563    0.7523    2.0000              0.0030    0.7523    0.2205
    0.0030    1.7016    0.9394    2.0000              0.0030    0.9394    0.2808
    0.0030    2.1432    1.0716    2.0000              0.0030    1.0716    0.3333
    0.0030    2.3475    1.2724    2.0000              0.0030    1.2724    0.2920
    0.0030    2.9727    1.5176    2.0000              0.0030    1.5176    0.3238
    0.0030    3.1543    1.6221    2.0000              0.0030    1.6221    0.3203
    0.0030    3.5203    1.9024    2.0000              0.0030    1.9024    0.2938
    1.0000    4.4075    2.4936    2.0000                   -    2.4936    0.2646
----------------------------------------          ------------------------------

Number of layers: 10
Number of parameters: 29
Best model misfit: 0.0004
--------------------------------------------------------------------------------

from evodcinv.

79seismo commented on June 12, 2024

I am not sure if I am following what's happening within def prior(x). If you can add some comments, that'd be great. Ultimately, I'm guessing you are creating a misfit term that is norm2 + extra_terms, correct? In this case, rather than fixing the velocity profile to a prior, I think it's useful to add a penalty term such that the velocity jumps across layers are restricted. For example see the second term in eq(2) here: https://www.sciencedirect.com/science/article/pii/S0040195117302482. The philosophy behind this is that drastic velocity changes don't occur across layers particularly if you have a model parameterisation with fine layers.

from evodcinv.

keurfonluu commented on June 12, 2024

extra_terms is a list of functions that take a single input (the model parameter vector x) and returning a misfit value added to the main one (the one that measures the misfit with data points).

evodcinv gives you full control of the misfit function which can be written for a model parameter $\boldsymbol{\mathrm{m}}$:

$$ E(\boldsymbol{\mathrm{m}}) = \sum_{i=1}^{N_c} \omega_i f \left( \frac{ \boldsymbol{\mathrm{d}}_{obs, i} - g_i (\boldsymbol{\mathrm{m}})} {\boldsymbol{\mathrm{\sigma_i}}} \right) + \sum f_j (\boldsymbol{\mathrm{m}}) $$

with $N_c$ the number of data curves to invert, $\omega_i$ the weight associated to data curve $i$ (default 1), $f$ the misfit function (default RMSE), $\boldsymbol{\mathrm{d}}_{obs, i}$ the vector of data points for curve $i$, $g_i (\boldsymbol{\mathrm{m}})$ the data calculated by forward modeling operator $g_i$, $\boldsymbol{\mathrm{\sigma_i}}$ the vector of uncertainties associated to data point $i$ (default 1), and $f_j$ the extra misfit functions in extra_terms.

I just updated evodcinv (please update to version 2.2.0: pip install evodcinv -U). This version adds a small factory of extra misfit functions that can be used, including your roughness criterion which can be used as follows:

from evodcinv import factory

model = EarthModel()

for _ in range(20):
    model.add(Layer(0.0015, [0.1, 3.0]))

model.configure(
    optimizer="de",
    misfit="norm2",
    density=lambda vp: 2.0,
    optimizer_args={
        "popsize": 50,
        "maxiter": 500,
        "workers": -1,
        "seed": 0,
    },
    increasing_velocity=True,
    extra_terms=[
        lambda x: factory.smooth(x, alpha=1.0e-3),
        lambda x: factory.prior(x, [0.0, 0.04], [0.1, 3.0], alpha=1.0e-3)
    ],
)
curves = [Curve(period, group, 0, "rayleigh", "group")]
res = model.invert(curves, maxrun=3)

--------------------------------------------------------------------------------
Best model out of 75000 models (3 runs)

Velocity model                                    Model parameters
----------------------------------------          ------------------------------
         d        vp        vs       rho                   d        vs        nu
      [km]    [km/s]    [km/s]   [g/cm3]                [km]    [km/s]       [-]
----------------------------------------          ------------------------------
    0.0015    0.6590    0.2996    2.0000              0.0015    0.2996    0.3697
    0.0015    0.6590    0.3043    2.0000              0.0015    0.3043    0.3646
    0.0015    0.7904    0.4359    2.0000              0.0015    0.4359    0.2815
    0.0015    1.0767    0.5475    2.0000              0.0015    0.5475    0.3256
    0.0015    1.2365    0.6761    2.0000              0.0015    0.6761    0.2868
    0.0015    1.5246    0.7643    2.0000              0.0015    0.7643    0.3322
    0.0015    1.5583    0.8654    2.0000              0.0015    0.8654    0.2770
    0.0015    1.6613    0.9901    2.0000              0.0015    0.9901    0.2245
    0.0015    2.2096    1.1111    2.0000              0.0015    1.1111    0.3308
    0.0015    2.0777    1.2241    2.0000              0.0015    1.2241    0.2342
    0.0015    2.1615    1.3197    2.0000              0.0015    1.3197    0.2029
    0.0015    2.3499    1.4282    2.0000              0.0015    1.4282    0.2071
    0.0015    2.8160    1.5777    2.0000              0.0015    1.5777    0.2712
    0.0015    2.8547    1.7478    2.0000              0.0015    1.7478    0.2002
    0.0015    3.4989    1.8622    2.0000              0.0015    1.8622    0.3024
    0.0015    3.5972    1.9435    2.0000              0.0015    1.9435    0.2939
    0.0015    3.3907    2.0421    2.0000              0.0015    2.0421    0.2154
    0.0015    3.6506    2.1789    2.0000              0.0015    2.1789    0.2233
    0.0015    3.8436    2.3452    2.0000              0.0015    2.3452    0.2034
    1.0000    4.4124    2.4914    2.0000                   -    2.4914    0.2660
----------------------------------------          ------------------------------

Number of layers: 20
Number of parameters: 59
Best model misfit: 0.0014
--------------------------------------------------------------------------------

from evodcinv.

79seismo commented on June 12, 2024

That's brilliant! Thank you! We should write a paper together to demonstrate the Utility of evodcinv in different applications, not just geophysics!

from evodcinv.

79seismo commented on June 12, 2024

I think there should be an option to run "maxrun=" with different seeds. Does it make sense to run multiple inversions with the same seed? Shouldn't the starting point be different for different runs? Thanks!

from evodcinv.

79seismo commented on June 12, 2024

Thanks, can you please explain the functionality of "seed":0 in model.configure(). Is it for the first run?

from evodcinv.

keurfonluu commented on June 12, 2024

The purpose of this option is reproducibility, so that every time you run the same script, you get the same output. It's calling np.random.seed in the backend before the inversions.

from evodcinv.

79seismo commented on June 12, 2024

Hi, there are instances when an inversion run terminates before reaching 100% (i.e. population size x iterations) if the misfit approaches a small value, e.g. 10^-9, early on. In those instances, not all the model arrays in res (i.e. res.models[i]) are populated. Instead, beyond the model that approached the minimum misfit, all model arrays in res will be populated with zeros. It would be good to trim the res.models and everything else in res by deleting unused array locations that are filled with zeros, for example with np.where method (actually the nonzero value is len(res.xs)), so that res.threshold can be used with statistical measures without having to include intermediate code by hand. Thanks!

from evodcinv.

keurfonluu commented on June 12, 2024

I think the default minimum misfit value in stochopy is 1.0e-8, but I don't know how it's possible to get such a low misfit when inverting dispersion curves, unless you are inverting only few data points. Anyway, I will see what I can do, in the meantime, I would recommend to avoid such situation by setting ftol to 0.0 in optimizer_args (didn't try, but it should work).

from evodcinv.

79seismo commented on June 12, 2024

Here's an example dispersion curve I've tried producing 10^-9 misfit. As you can see, this spans lower than usual periods. I just realized that the inversion is fitting only the first couple of points of the dispersion curve and hence the very low misfit. Why would it do that?

length of period array: 100
array([0.00716448, 0.00727753, 0.00739235, 0.00750899, 0.00762746,
0.00774781, 0.00787006, 0.00799423, 0.00812036, 0.00824849,
0.00837863, 0.00851083, 0.00864511, 0.00878152, 0.00892007,
0.00906081, 0.00920377, 0.00934899, 0.0094965 , 0.00964633,
0.00979853, 0.00995314, 0.01011018, 0.0102697 , 0.01043173,
0.01059632, 0.01076351, 0.01093334, 0.01110585, 0.01128107,
0.01145907, 0.01163987, 0.01182352, 0.01201007, 0.01219957,
0.01239205, 0.01258758, 0.01278618, 0.01298792, 0.01319285,
0.013401 , 0.01361245, 0.01382722, 0.01404539, 0.014267 ,
0.0144921 , 0.01472076, 0.01495302, 0.01518895, 0.0154286 ,
0.01567204, 0.01591931, 0.01617049, 0.01642562, 0.01668479,
0.01694804, 0.01721545, 0.01748707, 0.01776298, 0.01804325,
0.01832794, 0.01861711, 0.01891086, 0.01920923, 0.01951231,
0.01982018, 0.0201329 , 0.02045056, 0.02077323, 0.02110099,
0.02143392, 0.02177211, 0.02211563, 0.02246457, 0.02281902,
0.02317905, 0.02354477, 0.02391626, 0.02429361, 0.02467692,
0.02506627, 0.02546177, 0.02586351, 0.02627158, 0.02668609,
0.02710715, 0.02753484, 0.02796929, 0.02841059, 0.02885885,
0.02931419, 0.02977671, 0.03024652, 0.03072376, 0.03120852,
0.03170092, 0.0322011 , 0.03270917, 0.03322526, 0.03374949])

length of group velocity array: 100
array([0.04187945, 0.0522467 , 0.06277752, 0.0734745 , 0.08434025,
0.09537744, 0.11581332, 0.13734638, 0.15272252, 0.15786487,
0.16308835, 0.16839425, 0.17378387, 0.17925852, 0.18481956,
0.19046833, 0.19620623, 0.19589644, 0.19506623, 0.19422292,
0.1933663 , 0.19249617, 0.1916123 , 0.19071449, 0.18980252,
0.18887616, 0.18793518, 0.18697935, 0.18600845, 0.18502222,
0.18402043, 0.18300284, 0.18196919, 0.18091923, 0.17856765,
0.17617895, 0.17375257, 0.17128791, 0.16878436, 0.16855932,
0.17066695, 0.17280784, 0.17498251, 0.17719149, 0.17943532,
0.18171456, 0.18402976, 0.18638148, 0.18877032, 0.19119684,
0.19366165, 0.19616535, 0.20207335, 0.20888199, 0.21579806,
0.22282325, 0.22995929, 0.23720792, 0.24110632, 0.24461078,
0.24817054, 0.25178646, 0.25545943, 0.25919035, 0.26298014,
0.26682973, 0.27074005, 0.27471208, 0.27874677, 0.28262544,
0.28103002, 0.27940942, 0.27776326, 0.27609112, 0.2743926 ,
0.27266729, 0.27091474, 0.26913455, 0.26732627, 0.2660577 ,
0.2660577 , 0.2660577 , 0.2660577 , 0.2660577 , 0.2660577 ,
0.2660577 , 0.2660577 , 0.26676526, 0.26941959, 0.27211581,
0.27485457, 0.27763654, 0.2804624 , 0.28333285, 0.28624859,
0.28921034, 0.29221882, 0.29527476, 0.29837892, 0.30153206])

from evodcinv.

79seismo commented on June 12, 2024

Thanks, that appeared to have fixed the issue. Also, I realized that dispersion curve predictions are not stored during the inversion and are recalculated within the plotting routines. This is a bit inefficient as evodcinv is then computing the same dispersion curves twice. Why not store dispersion curves within the inversion itself and use them in the plotting routines?

from evodcinv.

79seismo commented on June 12, 2024

The first reason is API limitation: most optimization libraries (e.g., scipy), if not all, simply only allow to calculate and store the misfit value for a model. Although I am the developer of stochopy, I would need to make some substantial changes in the API to allow to store more than that. The second reason is that evodcinv only calculates the dispersion velocities at discrete input periods. When plotting results, we usually want to plot the "full" dispersion curves (i.e., at periods different that of the data points), so storing the dispersion curves calculated during the inversion doesn't look as useful given that constraint. Anyway, after some benchmarks, the performance issues of the plotting routines are not due to the recalculation of the dispersion curves (it should only take a few seconds, especially after thresholding), there seem to be some issues with sequential plotting of thousands of lines with matplotlib.

Thanks for that explanation.

from evodcinv.

79seismo commented on June 12, 2024

Are there reasons why a run would quit half way through other than approaching a machine precision minimum as discussed before?

from evodcinv.

keurfonluu commented on June 12, 2024

It depends on the algorithm I guess. For instance, for Differential Evolution, it may be caused by a lack of diversity in the population at a given iteration, i.e., all the models are too similar so recombining models would be useless.

from evodcinv.

79seismo commented on June 12, 2024

@keurfonluu is it possible to assemble the minimum misfit models across from multiple runs instead of assembling the lowest misfit models from the run that has the minimum misfit? The former should give you more precise models than the latter especially if the minimum misfit is not that different across multiple runs.

from evodcinv.

79seismo commented on June 12, 2024

Thanks, exactly what I was looking for.

from evodcinv.

Output models about evodcinv HOT 44 OPEN

Comments (44)

Related Issues (11)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent