ds2010 / pystoned

A Python Package for Convex Regression and Frontier Estimation

Home Page: https://pystoned.readthedocs.io

License: GNU General Public License v3.0

Python 6.78% Jupyter Notebook 93.22%

python stoned cnls cer cqr dea fdh icnls z-variables

pystoned's Introduction

pyStoNED

pyStoNED is a Python package that provides functions for estimating multivariate convex regression, convex quantile regression, convex expectile regression, isotonic regression, stochastic nonparametric envelopment of data, and related methods. It also facilitates efficiency measurement using the conventional data envelopment analysis (DEA) and free disposal hull (FDH) approaches. The pyStoNED package allows practitioners to estimate these models in an open-access environment under a GPL-3.0 License.

Installation

The pyStoNED package is now available on PyPI, and the latest development version can be installed from the GitHub repository pyStoNED. Please feel free to download and test it. We welcome any bug reports and feedback.

PyPI

pip install pystoned

GitHub

pip install -U git+https://github.com/ds2010/pyStoNED

Authors

Sheng Dai, PhD, Turku School of Economics, University of Turku.
Yu-Hsueh Fang, Computer Engineer, Institute of Manufacturing Information and Systems, National Cheng Kung University.
Chia-Yen Lee, Professor, College of Management, National Taiwan University.
Timo Kuosmanen, Professor, Turku School of Economics, University of Turku.

Citation

If you use pyStoNED in published work, please cite the following paper and other related works. We appreciate it.

Dai S, Fang YH, Lee CY, Kuosmanen T. (2021). pyStoNED: A Python Package for Convex Regression and Frontier Estimation. arXiv preprint arXiv:2109.12962.

pystoned's Issues

StoNED and Plot2d/3d: cannot plot the StoNED frontier

Hi @JulianaTa, I found that we cannot plot the StoNED frontier with plot2d, although this should work. Please see the following error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-cfd06442dd17> in <module>
      2 rd = StoNED.StoNED(model)
      3 model_new = rd.get_frontier(RED_MOM)
----> 4 plot2d(model_new, x_select=0, label_name="StoNED frontier", fig_name="stoned_2d")

C:\Anaconda3\lib\site-packages\pystoned\plot.py in plot2d(model, x_select, label_name, fig_name)
     15         fig_name (String, optional): The name of figure to save. Defaults to None.
     16     """
---> 17     x = np.array(model.x).T[x_select]
     18     y = np.array(model.y).T
     19     if y.ndim != 1:

AttributeError: 'numpy.ndarray' object has no attribute 'x'

I have tried to add the following line to StoNED.

pyStoNED/pystoned/StoNED.py

Line 17 in b673006

self.x = model.x

But it still does not work. Could you please help to fix it? Many thanks in advance!
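
The traceback shows that plot2d reads `model.x` and `model.y`, while the object passed in is a plain NumPy array. A minimal sketch of that mismatch follows. `FakeModel` and `read_inputs` are hypothetical stand-ins rather than pyStoNED code, and a Python list stands in for the array.

```python
class FakeModel:
    """Hypothetical stand-in for a fitted model that stores its data."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


def read_inputs(model, x_select=0):
    """Mimic the first lines of plot2d: pick one input column and the output."""
    column = [row[x_select] for row in model.x]
    return column, list(model.y)


model = FakeModel(x=[[1.0, 2.0], [3.0, 4.0]], y=[10.0, 20.0])
print(read_inputs(model))          # works: the object carries .x and .y

frontier_values = [9.5, 19.0]      # a bare sequence of frontier values
try:
    read_inputs(frontier_values)
except AttributeError as err:
    print("AttributeError:", err)  # same failure mode as in the traceback
```

Under this reading, the plotting function needs an object that exposes both the inputs and the outputs, not only the frontier values.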

Sheng

CNLSG: returns an error when using the local solver

Hi @JulianaTa, it seems that there is another bug at line 122 of CNLSG. I used CNLSG to estimate the multiplicative cost function with the local solver MINOS, but it returns the following error:

File "/home/dais2/anaconda3/lib/python3.8/site-packages/pystoned/CNLSG.py", line 122, in __convergence_test
    self.Active2[i, j] = - alpha[i] - np.sum(beta[i, :] * x[i, :]) + \
TypeError: bad operand type for unary -: 'NoneType'

Interestingly, when I use NEOS to solve the same model, there is no error and I receive the final estimation results.
Further, there is no problem when we estimate the additive production function using the local solver MOSEK.

Could you please help to check and fix it? Many thanks! For your convenience, please see the following example:

Example

import numpy as np
import pandas as pd
from pystoned import CNLSG
from pystoned.constant import CET_MULT, FUN_COST, OPT_LOCAL, RTS_VRS


url='https://raw.githubusercontent.com/ds2010/pyStoNED/master/pystoned/data/electricityFirms.csv'
df = pd.read_csv(url, on_bad_lines='skip')  # error_bad_lines was removed in pandas 2.0

# output
y = df['TOTEX']

# inputs
x1 = df['Energy']
x1 = np.asmatrix(x1).T
x2 = df['Length']
x2 = np.asmatrix(x2).T
x3 = df['Customers']
x3 = np.asmatrix(x3).T
x = np.concatenate((x1, x2, x3), axis=1)

model = CNLSG.CNLSG(y, x, z=None, cet=CET_MULT, fun=FUN_COST, rts=RTS_VRS)
model.optimize(OPT_LOCAL)

model.display_beta()
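
The `TypeError` means that `alpha[i]` was `None` when the unary minus was applied. One possible cause, stated here only as an assumption, is that the local solver did not return values for the variables. A minimal sketch of a guard that reports this more clearly (`require_values` is a hypothetical helper, not part of pyStoNED):

```python
def require_values(name, values):
    """Raise a readable error if any estimated coefficient is missing (None)."""
    missing = [i for i, v in enumerate(values) if v is None]
    if missing:
        raise ValueError(
            f"{name} has no value at positions {missing}; "
            "the solver may not have returned a solution."
        )
    return list(values)


alpha = [0.5, None, 1.2]   # illustrative values; None marks a variable left unset
try:
    require_values("alpha", alpha)
except ValueError as err:
    print(err)
```

A check like this, run before the convergence test, would point at the solver output rather than at the arithmetic.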

Evaluate the effect of class method

Hi guys, here is a sample of the class-based approach.
This evaluation aims to compare the running time of the Python class approach with that of the pure function approach.

from CNLS import cnls
from StoNED import stoned
import pandas as pd
import numpy as np
import time

class StoNED:
    def __init__(self, x, y, cet, fun, rts):
        self.model = cnls(y, x, cet, fun, rts)
        # using remote solver (NEOS)
        from pyomo.environ import SolverManagerFactory
        solver_manager = SolverManagerFactory('neos')
        self.results = solver_manager.solve(self.model, opt='knitro', tee=True)
        self.val = list(self.model.e[:].value)
        self.eps = np.asarray(self.val)
        self.fun = fun
        self.cet = cet
        self.y = y
        self.x = x
    def technical_efficiency(self, method="MoM"):
        self.TE = stoned(self.y, self.eps, self.fun, method, self.cet)
        return self.TE


def test_class(x, y, cet, fun, rts, method):
    model = StoNED(x,y, cet, fun, rts)
    model.technical_efficiency(method)
    
def test_non_class(x, y, cet, fun, rts, method):
    model = cnls(y, x, cet, fun, rts)
    from pyomo.environ import SolverManagerFactory
    solver_manager = SolverManagerFactory('neos')
    results = solver_manager.solve(model, opt='knitro', tee=True)
    val = list(model.e[:].value)
    eps = np.asarray(val)
    TE = stoned(y, eps, fun, method, cet)


if __name__ == "__main__":
    url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'
    df = pd.read_csv(url, on_bad_lines='skip')  # error_bad_lines was removed in pandas 2.0
    df.head(5)
    y  = df['TOTEX']
    
    x1  = df['Energy']
    x1  = np.asmatrix(x1).T
    x2  = df['Length']
    x2  = np.asmatrix(x2).T
    x3  = df['Customers']
    x3  = np.asmatrix(x3).T
    x   = np.concatenate((x1, x2, x3), axis=1)

    cet = "mult"
    fun = "cost"
    rts = "crs"
    method = "MoM"

    n_runs = 10

    start_time = time.monotonic()
    for i in range(n_runs):
        test_class(x,y,cet,fun,rts,method)
    end_time = time.monotonic()
    class_total_time = (end_time - start_time)/n_runs  # average per run

    start_time = time.monotonic()
    for i in range(n_runs):
        test_non_class(x,y,cet,fun,rts,method)
    end_time = time.monotonic()
    non_class_total_time = (end_time - start_time)/n_runs  # average per run

    print("class method average time: "+str(class_total_time))
    print("non class method average time: "+str(non_class_total_time))

Here are some experimental results:

class method average time: 17.53017269
non class method average time: 19.347612973

class method average time: 10.874256837999999
non class method average time: 24.106178928

class method average time: 16.953747197
non class method average time: 14.307711206

As expected, neither approach is consistently faster, since the running time is dominated by the calculations themselves rather than by the class or function wrapper.
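
A minimal sketch of such a timing comparison, with the average taken over the actual number of runs. The workloads `run_class_style` and `run_function_style` are placeholders chosen for illustration; they are not the CNLS/StoNED calls used in the experiment.

```python
import time


def average_runtime(func, n_runs=10):
    """Return the mean wall-clock time of func() over n_runs calls."""
    start = time.perf_counter()
    for _ in range(n_runs):
        func()
    return (time.perf_counter() - start) / n_runs


class Summer:
    """Placeholder class-based workload."""
    def __init__(self, n):
        self.n = n

    def total(self):
        return sum(range(self.n))


def run_class_style():
    return Summer(100_000).total()


def run_function_style():
    return sum(range(100_000))


print("class method average time:", average_runtime(run_class_style))
print("non class method average time:", average_runtime(run_function_style))
```

Keeping the loop count and the divisor in one parameter avoids a mismatch between the two.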

Solver Binding Error

Hello. Great work. I have been looking for something like this for a while.

I am trying to run some examples, but I am facing some issues with the bindings to the solver. Error message:

"No Python bindings available for <class 'pyomo.solvers.plugins.solvers.mosek_direct.MOSEKDirect'> solver plugin"

Any hints on how to solve this?
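
This message typically indicates that Pyomo could not import the solver's Python package. A minimal check, assuming the package is importable under the name `mosek` (the helper `has_module` is illustrative):

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if has_module("mosek"):
    print("mosek Python package found")
else:
    print("mosek Python package not found; try installing it, e.g. with pip")
```

Note that MOSEK also requires a valid license in addition to the Python package.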

doc(API): The list of the outdated API doc

The API doc is a vital reference for users.
Here are some pages that are outdated or need to be fixed:

Built-in constant: this page should refer to the constants rather than the inner modules.
Inner modules: these are needed for contributors.
Formulations Classes
- StoNED: Maybe StoNED does not belong to this category anymore.
- Plot: Same as StoNED

I'll take responsibility for rearranging this part.
Once the documents are updated, I'll close this issue.

API documentation

The new PR #23 (Autodoc) works well locally but not on ReadTheDocs. You can check the CNLS API on the website generated by ReadTheDocs. It is empty.
However, if we build the Sphinx docs locally using make html, the docstrings show up in the HTML file. See the following screenshot.

I failed to fix it. Since the website is automatically generated by ReadTheDocs, @JulianaTa, could you please help me to fix it? Thanks in advance!

The enhancement of the Plot method

Since StoNED is now initialized only with formulation classes, we may revise the plot methods for consistency.

typos

Hello, your open-source package has helped me a lot. However, I found a problem: I think the picture above should show lambda instead of lambda**2.

doc(Example): The list of the missing examples

The documentation is a vital reference for users.
Here are some pages that are missing:

Monotonic Models
- ICNLS
- ICQER
Free Disposal Hull
Optimization with genetic algorithm
- CNLSG
- CQERG
Plot methods

Once the documents are updated, we can close this issue.

StoNED: can not get unconditional expected inefficiency

Hi @JulianaTa, it seems that there is a bug in StoNED.py when calculating the unconditional expected inefficiency. Please check the following error and fix it. Thanks in advance!

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-8d44572d25fb> in <module>
      1 # retrive the unconditional expected inefficiency \mu
      2 rd = StoNED.StoNED(model)
----> 3 print(model.get_unconditional_expected_inefficiency('KDE'))

AttributeError: 'CNLS' object has no attribute 'get_unconditional_expected_inefficiency'
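
The traceback shows the method being called on `model` (the CNLS object) rather than on `rd` (the StoNED object built from it). A minimal sketch of that distinction with hypothetical stand-in classes follows. It is not the pyStoNED implementation, and the computation inside the method is a placeholder.

```python
class FittedModel:
    """Stand-in for an estimated model; it only stores residuals."""
    def __init__(self, residuals):
        self.residuals = residuals


class Decomposer:
    """Stand-in for a wrapper that post-processes a fitted model."""
    def __init__(self, model):
        self.model = model

    def get_unconditional_expected_inefficiency(self):
        # Placeholder computation: mean absolute residual, for illustration only.
        res = self.model.residuals
        return sum(abs(r) for r in res) / len(res)


model = FittedModel([0.2, -0.4, 0.6])
rd = Decomposer(model)

print(hasattr(model, "get_unconditional_expected_inefficiency"))  # False
print(rd.get_unconditional_expected_inefficiency())               # method lives on rd
```

If the real classes follow the same pattern, calling the method on `rd` may be what was intended.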

feat(CNLS/CNLSDDF): Implementation of get_frontier.

The get_frontier function returns the values of the estimated frontier (the fitted y values) from CNLS/CNLSDDF.
Here are some thoughts on a better implementation of get_frontier.
Please let me know if there is a logical error in my reasoning.

Since true y value = estimated y value + residual for additive models, we may implement the frontier as follows:

CNLS

In the following, y refers to the true y value and frontier refers to the estimated y value.

Additive

frontier = y - residual

Multiplicative

frontier = y/(exp(residual)) -1

CNLSDDF

In the following, y refers to the true y value and frontier refers to the estimated y value.

frontier list = y list - residual list
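
The formulas proposed above can be sketched as follows. The function names and the sample numbers are assumptions for illustration, and plain Python lists are used instead of the package's own data structures.

```python
import math


def additive_frontier(y, residual):
    """frontier = y - residual, element by element."""
    return [yi - ei for yi, ei in zip(y, residual)]


def multiplicative_frontier(y, residual):
    """frontier = y / exp(residual) - 1, element by element (as proposed above)."""
    return [yi / math.exp(ei) - 1 for yi, ei in zip(y, residual)]


y = [10.0, 12.0]           # illustrative observed outputs
residual = [0.5, -0.25]    # illustrative residuals

print(additive_frontier(y, residual))        # [9.5, 12.25]
print(multiplicative_frontier(y, residual))
```

The CNLSDDF case uses the same element-wise subtraction as the additive case, applied to each output list.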

Fail to "return False" in CNLS class

Hi @JulianaTa,

It seems that the CNLS class fails to return early. When we use the local solver to estimate the multiplicative model, the class should print "Estimating the multiplicative model will be available in near future." and return False. In practice, the class first prints the warning and then continues to calculate the model (perhaps using the remote solver), so we still obtain the model estimates. Please see the following example:

from pystoned import StoNED
import pandas as pd
import numpy as np
# import Finnish electricity distribution firms data 
url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'  
df = pd.read_csv(url, on_bad_lines='skip')  # error_bad_lines was removed in pandas 2.0

# output (total cost)
y  = df['TOTEX']

# inputs 
x1  = df['Energy']
x1  = np.asmatrix(x1).T
x2  = df['Length']
x2  = np.asmatrix(x2).T
x3  = df['Customers']
x3  = np.asmatrix(x3).T
x   = np.concatenate((x1, x2, x3), axis=1)

using remote solver

# build and optimize
instance = StoNED.StoNED(y, x, z=None, cet = "mult", fun = "cost", rts = "crs")
instance.optimize(remote=True)
print(instance.get_technical_inefficiency(method='QLE'))

using local solver (suppose we have one)

# build and optimize
instance = StoNED.StoNED(y, x, z=None, cet = "mult", fun = "cost", rts = "crs")
instance.optimize(remote=False)
print(instance.get_technical_inefficiency(method='QLE'))

If I understand correctly, when remote=False is chosen, the class should print the warning and return False. So I think we have to fix this.
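
A minimal sketch of the expected behaviour, where the method stops as soon as it has printed the warning. `SketchModel` is a hypothetical class written for illustration and does not reproduce the pyStoNED code.

```python
class SketchModel:
    """Hypothetical model with an optimize() that refuses unsupported cases."""
    def __init__(self, cet):
        self.cet = cet
        self.solved = False

    def optimize(self, remote=True):
        if not remote and self.cet == "mult":
            print("Estimating the multiplicative model will be available in near future.")
            return False          # stop here; nothing below runs
        self.solved = True        # placeholder for the actual solve step
        return True


instance = SketchModel(cet="mult")
print(instance.optimize(remote=False))  # False
print(instance.solved)                  # False
```

The key point is that the `return False` sits inside the same branch as the warning, so the solve step cannot be reached afterwards.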

Error: an error occurred when optimizing with remote solver

Describe the error

My code:

from pystoned import CNLS
from pystoned.constant import CET_ADDI, FUN_PROD, RTS_VRS

# define the CNLS model
model = CNLS.CNLS(y_tr, x_tr, z=None, cet = CET_ADDI, fun = FUN_PROD, rts = RTS_VRS)
# solve the model with remote solver
model.optimize('[email protected]')

The result:

Estimating the additive model remotely with mosek solver.
ERROR: Error parsing NEOS solution file  NEOS log: Job 13110247 dispatched
password: IcCbDRGl
    ---------- Begin Solver Output -----------
    Condor submit: 'neos.submit' Condor submit: 'watchdog.submit' Job
    submitted to NEOS HTCondor pool.
Traceback (most recent call last):
  File "/home/zhiqiang/.local/lib/python3.10/site-packages/pyomo/opt/plugins/sol.py", line 41, in __call__
    return self._load(f, res, soln, suffixes)
  File "/home/zhiqiang/.local/lib/python3.10/site-packages/pyomo/opt/plugins/sol.py", line 83, in _load
    raise ValueError("no Options line found")
ValueError: no Options line found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhiqiang/.local/lib/python3.10/site-packages/pyomo/neos/plugins/kestrel_plugin.py", line 219, in _perform_wait_any
    solver_results = opt.process_output(rc)
  File "/home/zhiqiang/.local/lib/python3.10/site-packages/pyomo/opt/solver/shellcmd.py", line 396, in process_output
    results = self._results_reader(
  File "/home/zhiqiang/.local/lib/python3.10/site-packages/pyomo/opt/plugins/sol.py", line 45, in __call__
    raise ValueError(
ValueError: Error reading '/tmp/tmphe6jan6i.neos.sol': no Options line found.
SOL File Output:
ERROR: An error occured with your submission.

ERROR: ERROR: An error occured with your submission.

doc(Example): The list of the outdated examples

The documentation is a vital reference for users.
Here are some of the updated pages:

Stochastic Nonparametric Envelopment of Data
- Method of Moments
- Quasi-likelihood estimation
- Kernel deconvolution estimation
Multiple Outputs (DDF Formulation)
- CNLS with multiple outputs
- CQR (CER) with multiple outputs
Data Envelopment Analysis
- Radial model: Input orientation
- Radial model: Output orientation