Project, etc are currently member functions of histogram. This is breaking the interfa

project, shrink, rebin interface as in C++,about scikit-hep/boost-histogram

Comments (21)

henryiii commented on May 25, 2024

They are free functions in Boost::Histogram because that allows someone to either not use them or use different versions. In Python, you can't opt out or add other items. Also, you can't have templated functions, so the only way to do this would be to register N overloads, where N is the number of histograms, and have Python do a lookup each time. This method version is faster, and more Pythonic as well; you expect to find the tools you need in the class rather than as free-standing functions.

There are other differences in the API as well, for example, we can't match "indexed"s dereference behavior, that's not valid in Python.

Also, reduce can't be implemented property because it needs to support an unlimited number of arguments, which can't be done in Python except by chaining (so might as well chain in Python instead).

from boost-histogram.

henryiii commented on May 25, 2024

PS: we could fake it, however, by adding freestanding functions that are pure Python that call the methods on the classes. Just a thought.

from boost-histogram.

HDembinski commented on May 25, 2024

I see how it was more convenient to implement in this way, but we should keep the interfaces consistent. We can make functions with infinite numbers of keyword arguments in pybind11. The Python reduce function can analyse the type of the histogram argument and the call the correct templated form of C++ reduce, which can be registered together with the C++ histogram type.

from boost-histogram.

HDembinski commented on May 25, 2024

Also, you can't have templated functions, so the only way to do this would be to register N overloads, where N is the number of histograms, and have Python do a lookup each time. This method version is faster, and more Pythonic as well; you expect to find the tools you need in the class rather than as free-standing functions.

When you register N histograms, you also register N versions of project, rebin etc. So I don't quite see the advantage. The method version is not measurably faster. If you claim that, you should prove it. Even if you can prove it, any difference in performance doesn't matter, since these functions/methods are not called in hot loops.

Interface consistency is a very high value and may only be broken for very strong reasons. The reasons you mention are not strong enough. Before a release of Boost-Histogram, these methods need to be removed or made private, by adding a _.

from boost-histogram.

henryiii commented on May 25, 2024

Why do you want exact interface matching for project and the other free functions, but not for make_histogram? There are several good reasons to avoid free functions that can only take a histogram as their first argument a member function in Python that are not (as) valid in C++. I think having dual interfaces would be in line with what both numpy and pandas do, and would be reasonable. For example, you can call np.sum(x) or x.sum().

In Python, most users will probably type hist.<tab> to see what they can do with the histogram.
In Python, there is no benefit to having these be free functions; unlike C++ (argument mentioned above)
In Python, you don't have strong typing, so it's not clear that this only works on histograms, unlike C++
We can't support variadic reduce; so pretending it is the same by making it look the same is probably a bad idea.

I was going to propose we prove a small python module, boost.histogram.algorithm, that would have all the free-standing functions from the boost histogram library. An example of one:

def project(hist, *args, **kargs):
    hist.project(*args, **kargs)

from boost-histogram.

henryiii commented on May 25, 2024

Another reason for the method version: If we provide users a way to add histograms through a separate extension module, the method version would still work, but an external function would not - they would not be the same function, and you can't add to the lookup list from a different compiled module.

from boost-histogram.

HDembinski commented on May 25, 2024

Python has strong typing, this is a common misconception. Strong typing means that each variable has a well-defined type at any given time. In Python, "2" * "3" doesn't work because of strong typing, while it would produce "6" in other languages.

from boost-histogram.

HDembinski commented on May 25, 2024

I see the argument regarding hist.<tab>, but we agreed at some point that we want boost.histogram to be a one-to-one mapping of the C++ interface, while hist was supposed to provide a different interface. In my view, you can do all you want in hist, but boost-histogram - if it mirrors a C++ interface at all - it should mirror it according to some simple rules that fit on a napkin. I want to be able to read some C++ example of Boost::Histogram and then be able to rewrite this in Python, following these rules.

You are breaking the C++ / Python interface symmetry without a need and it is in contradiction to what we agreed upon for this project. Furthermore, it is bad design to write redundant interfaces, that's why I don't like to have things as free functions and methods.

from boost-histogram.

HDembinski commented on May 25, 2024

The interface transfer rules from C++ and Python are as follows:

If an interface is provided by C++ and Python, it should have a simple rule-based mapping.
If an interface is only provided by C++ or only provided by Python, then this rule obviously doesn't apply. We do not want to map all of the C++ interface Python. Similarly, the Python layer may contain interface that is Python-only.

Regarding the mapping rules, the shorter the list the better, and there should be no exceptions.
I am open to discuss the actual mapping rules. For example, hist(1, 2, weight(2)) should obviously map to hist(1, 2, weight=2). Similarly, reduce(hist, shrink(0, 2.3, 4.5) should map to reduce(hist, shrink=(0, 2.3, 4.5)).

from boost-histogram.

HDembinski commented on May 25, 2024

numpy and pandas do, and would be reasonable. For example, you can call np.sum(x) or x.sum().

They do that, but it is just bloating the interface. It makes the interface harder to learn. To me this looks like a historic mistake and not by design.

from boost-histogram.

HDembinski commented on May 25, 2024

Why do you want exact interface matching for project and the other free functions, but not for make_histogram?

I think I told you the reason in another issue. make_histogram is only a workaround, because C++14 doesn't have deduction guides. It makes no sense to map that to Python. make_histogram will become a relic very soon.

from boost-histogram.

henryiii commented on May 25, 2024

So what would bh::reduce(hist, bh::shrink(0, 2.3, 4.5), bh::shrink(1, 3.2, 4.3)) become? Pretending it is the same by making it look the same is probably a bad idea, when they are not really the same. boost-histogram needs to be a Python library and should behave like a Python library, not a perfect 1:1 translation of Boost::Histogram. When possible, the syntax should be similar, but you can't treat Python exactly like C++. Filling won't match. hist.shrink(0, 1, 2) reads like bh::shrink(hist, 0, 1, 2), and is more natural in Python. Python does have strong typing (I mispoke), but it uses duck typing, so there's no indication in Python and no way for Juptyer or an IDE to help a user if the function has a limited selection or a specific type for the first argument - this is why methods are used. This would not be true for a Julia binding of Boost::Histogram - that does expect to have free functions.

The napkin rule is simple: Free functions on single histograms are methods. This is true for fill, for reduce, etc.

from boost-histogram.

henryiii commented on May 25, 2024

Remember: Hist will add lots of other requirements, like matplotlib and things like that. This should become the "numpy" of histogramming, and hist should become the "pandas" of histogramming.

And, I'm targeting this to become the backend for other packages, like Physt. If it's hard to use and unpythonic, it will not be used by anyone but us (in Hist). It will try not to name things differently from Boost::Histogram, but it will translate things that are "C++-like" to Pythonic methods. I believe transitioning between them will still be quite easy.

I rather expect most users who use the Python version will not know how to use the C++ version - otherwise, why not use that? I would love to have the C++ and Python examples side-by-side in the docs, though, will look into that.

from boost-histogram.

HDembinski commented on May 25, 2024

Functions that don't modify the state of the object argument should be free functions and not member functions.

https://stackoverflow.com/questions/5989734/effective-c-item-23-prefer-non-member-non-friend-functions-to-member-functions

from boost-histogram.

henryiii commented on May 25, 2024

That is exactly what I want to do, if you read the argument and then convert it to Python, which you have to do.

That's arguing that non-friend, non-member functions are best, and the reason is they depend only on the public API. I fully agree this is good C++. But in PyBind11, we have to have "an internal dependency" on the object, because PyBind11 only compiles one version of the Python method or function call. Making it a free function makes it look like you can put arbitrary objects in as the first argument, which you can't in Python. If we could, like in C++, implement this using only public, non-hidden methods (the closest thing Python has to the above, since it does not even have the concept of friend, protection, etc. that C++ does), then I would agree that it could be a free function. That's, in fact, exactly the reasoning behind making these non-hidden methods while also providing a free function form.

Just apply the logic to Python: You should be able to use any histogram libraries any free function we provide, as long as the histogram library provides the same public API.

from boost-histogram.

HDembinski commented on May 25, 2024

You argue about technical points, but we are discussing design.

The decision to make reduce in C++ a free function was not a technical decision, but a design decision. reduce touches the internal state of the histogram. It would have been easier to implemented it as a member function, but I made it a free function because it is better design. It doesn't modify the internal state of the input.

The same design arguments also apply to Python. You claim in Python it is all different, but I don't agree. len(...) is a function, not a method. You cannot use len(...) on any object, only on sequences. Making something a free function doesn't imply that you can use it on any object. I am sure we can technically implement reduce as a free function if we really want to. It may not be as easy for us to write, but we can achieve it if we really want.

Here is a compromise: all project, shrink, rebin are removed, and only the short-hand is implemented that we developed together, which is anyway nicer. We only offer this interface in Python are done with it.

from boost-histogram.

henryiii commented on May 25, 2024

What about sum? Every user in python will expect hist.sum() to work. Just like they will type hist.<tab> and look to see what they can do. We might be able to make np.sum work without supporting np.asarray (at least for a very recent version of numpy?), but we absolutely cannot make a bh.sum. from boost.histogram import * is not recommended, but it should not break the built-in sum, or from numpy import * or from pylab import *. (Note: The numpy sum carefully does mimic the built-in sum for this reason). This is the point; PyBind11 functions are tied to the classes, so they should be methods. You should not provide them free, because they only work on a list of predefined classes. Otherwise, why have any methods at all?

And, hist.sum(flow=True) should also work. All methods that optionally can access flow bins have a flow keyword parameter. This is a design decision in Python.

If we design a Python library, even one based on a C++ library, we need to follow standard design patterns for Python. Methods are one of them. We can match in names, but not in free functions vs. methods. Just like we have keyword arguments when C++ doesn't. This is one of the advantages of explicitly designing and binding everything instead of handing the code to SWIG or CPPYY.

from boost-histogram.

henryiii commented on May 25, 2024

Not providing these, and just providing index access is possible. We can work on that for now. The one thing you lose in the shortcut method is it's hard to reduce or project a programmatically provided axis; If I ask you to project axis x, you would have to build a tuple of slices to do it. Even if you know it's axis 3, for example, you would still need to write [:,:,::bh::project,...] instead of giving an integer.

from boost-histogram.

HDembinski commented on May 25, 2024

After private discussion the compromise is to have both reduce(hist, <options>) and hist.reduce(<options>). Similar for all other free functions in C++ boost::histogram.

from boost-histogram.

HDembinski commented on May 25, 2024

The free functions in Python just call the member functions and are provided for seamless transition between Python and C++ code for those people who work a lot in both languages. In interactive Python and modern IDEs, member functions provide the hist.<tab> completion feature that allow one to quickly discover everything that can be done with a histogram.

This dual approach is a bit redundant, but it mimics the design decision for NumPy and is therefore familiar for Python users.

from boost-histogram.

henryiii commented on May 25, 2024

I think this has been fully implemented. Reopen if there are still any problems.

h = histogram(regular(20, 1, 5))
hs = bh.algorithm.reduce(h, bh.algorithm.shrink(0, 1, 2))
# OR
hs = h.reduce(bh.algorithm.shrink(0, 1, 2))

(Note: there was bug in this that I just fixed. Fix will be merged later and the version bumped to 0.40)

from boost-histogram.

project, shrink, rebin interface as in C++ about boost-histogram HOT 21 CLOSED

Comments (21)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent