Comments (7)
There was no intended change of behavior in JPype, but we had to change mechanisms. If the new mechanism is triggering a referencing problem, please try to replicate it.
from ixmp.
Totally understood, and thank you for the comment.
As the description says, we can try to condense an MWE from our somewhat complicated code base, but we have no bandwidth to do this right now. This issue is just to put a pin in the matter until we manage to find time.
Can you give me a rough description of your test?
For example, something like this:
- Create a Java object instance
- Check the ref count is 1
- Set up a reference loop
- Delete the original reference
- Call garbage collector
- Check if hash of item appears in GC list
(which is incidentally one of the patterns that I check).
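For what it's worth, that pattern can be sketched in pure Python, substituting a plain Python object for the Java instance and a weak reference for the ref-count/GC-list check (`Node` is a made-up class, not JPype API):

```python
import gc
import weakref

class Node:
    """Stand-in for a Java object instance (hypothetical, not JPype API)."""
    def __init__(self):
        self.ref = None

# Create an object instance and observe its lifetime via a weak reference
obj = Node()
alive = weakref.ref(obj)

# Set up a reference loop
obj.ref = obj

# Delete the original reference; the cycle keeps the object alive
del obj
assert alive() is not None

# Call the garbage collector; the cycle is detected and the object freed
gc.collect()
assert alive() is None
```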
Most important details:
- What is the type of object that was created (class, object, primitive, proxy, string, exception)?
- How was it structurally related to other objects? (e.g. we created a class, then an object, then a proxy, then placed the proxy in a container, etc.)
- How was it determined that the object was leaking?
- Does the leak checker fail on one version or multiple? (We were forced to change patterns for Py 3.11, but the older ones could use a different pattern.)
- Last, was an exception part of the process? (We changed both the value creation and the exception frame generator. Both are suspects, as the gc could be triggering on Frame creation rather than JPype object creation.)
We have built-in leak checkers for objects, strings, and primitives (brute-force create a lot and make sure they go away), but those can miss structural defects in which the order, timing, or connections of an object cause an issue.
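A minimal sketch of that brute-force style of check, using plain Python objects and a `WeakSet` in place of JPype's internals (all names here are illustrative):

```python
import gc
import weakref

def brute_force_leak_check(factory, n=1000):
    """Create many objects, drop all strong references, collect,
    and report how many survived (0 means no leak)."""
    survivors = weakref.WeakSet()
    for _ in range(n):
        survivors.add(factory())
    gc.collect()
    return len(survivors)

class Widget:
    """Stand-in for a JPype-wrapped object (hypothetical)."""

leaked = brute_force_leak_check(Widget)
assert leaked == 0
```

As noted above, this catches gross leaks but can miss structural defects where the ordering or connections between objects matter.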
With the release of JPype 1.5.0, this has become relevant again, since all our workflows on Python < 3.11 are breaking. This is because the temporary workaround specifies `pip install "JPype1 != 1.4.1"`, which is also satisfied by installing 1.5.0. Interestingly, 1.5.0 also works for Python 3.11. For the sake of completeness, there are also other minor version changes from the last working runs to the first failed ones: ipykernel went from 6.27.1 to 6.28.0, jupyter-core from 5.5.1 to 5.6.0, and jupyter-server-terminals from 0.5.0 to 0.5.1. However, they all seem unrelated, especially since the error messages are exactly the same as described above.
So a path to mitigation would be clear: instead of temporarily enforcing `JPype1 != 1.4.1`, enforce `JPype1 < 1.4.1` or even `JPype1 == 1.4.0`. However, neither of those would automatically test whether versions newer than 1.5.0 might resolve this issue, so a proper fix would be desirable. Since you've taken an interest before, @Thrameos, maybe you could take another look? I'm not familiar with Java, unfortunately, but I'll do my best to break down our test to answer your questions. Any help is much appreciated :)
TL;DR: this has gotten longer and taken more time than anticipated. For my money, the issue can likely be explained by understanding the changes you made for Python 3.11. JPype1 1.5.0 (and 1.4.1, for that matter) work as expected with Python 3.11; it's only failing on older versions on our side.
The first failing test is located in ixmp/tests/backend/test_base.py:
```python
def test_del_ts(self, test_mp):
    """Test CachingBackend.del_ts()."""
    # Since CachingBackend is an abstract class, test it via JDBCBackend
    backend = test_mp._backend
    cache_size_pre = len(backend._cache)

    # Load data, thereby adding to the cache
    s = make_dantzig(test_mp)
    s.par("d")

    # Cache size has increased
    assert cache_size_pre + 1 == len(backend._cache)

    # Delete the object; associated cache is freed
    del s

    # Objects were invalidated/removed from cache
    assert cache_size_pre == len(backend._cache)
```
`self` is an instance of the test class and should not matter, but `test_mp` is an `ixmp.core.platform.Platform` object. This is a Python class in our modelling framework that connects models to a data-storing backend. The backend, in turn, is defined in the `test_mp` fixture as an instance of the JDBC backend. This is another Python class (inheriting from `CachingBackend`) that uses JPype/JDBC to connect to an Oracle or HyperSQL database. In our test case, if that's relevant, we are using a local HyperSQL database. `backend._cache`, then, is just a dictionary.
Next, the function `make_dantzig` creates a scenario containing the data of the Dantzig transportation problem. The line `s.par("d")` accesses the data of the distance parameter `d`, which looks (in Python) something like this:
```
        i         j  value unit
  seattle  new-york    2.5   km
  seattle   chicago    1.7   km
  seattle    topeka    1.8   km
san-diego  new-york    2.5   km
san-diego   chicago    1.8   km
san-diego    topeka    1.4   km
```
(Just noticing: these distances are supposed to represent thousands of miles rather than km, so our units are wrong here, but that hardly matters for the error, I think.)
On the Python side, this is stored in the `backend._cache` dict such that one value is added, which is confirmed by the first successful `assert` line. Then, we delete this whole scenario `s` again and expect this action to free the `backend._cache`, too, which is not happening for JPype1 >= 1.4.1.
The `__del__` method has been overridden for the `TimeSeries` class, which the `Scenario` class inherits from, so `del s` will call `JDBCBackend.del_ts()`. That first calls the general `CachingBackend.del_ts()`, which just calls its own `cache_invalidate()`. This, finally, should remove all values from the `backend._cache` dict (but might return `None` if some key could not be found). Back in `JDBCBackend.del_ts()`, the last step is to call `JDBCBackend.gc()`, which shows some explicit mention of JPype:
```python
def gc(cls):
    if _GC_AGGRESSIVE:
        # log.debug('Collect garbage')
        try:
            java.System.gc()
        except jpype.JVMNotRunning:
            pass

        gc.collect()
```
Lastly, the overridden `__del__` method only calls this whole thing inside a try clause, excepting `(AttributeError, ReferenceError)` and moving on, since we assume the object has already been garbage-collected. So if one of these is raised without `backend._cache` being emptied, this would probably result in exactly the error we are seeing.
(Not sure this can interfere here, but `platform.__getattr__` has been overridden and could raise an `AttributeError`.)
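The failure mode described above can be sketched with stand-in classes (`FakeCache`/`FakeTS` are made up; only the `__del__`-plus-lingering-reference mechanics matter):

```python
import gc

class FakeCache:
    def __init__(self):
        self.data = {("par", "d"): "cached value"}
    def invalidate(self):
        self.data.clear()

class FakeTS:
    """Stand-in for TimeSeries: __del__ invalidates the backend cache."""
    def __init__(self, backend):
        self.backend = backend
    def __del__(self):
        try:
            self.backend.invalidate()
        except (AttributeError, ReferenceError):
            pass  # assume the object was already garbage-collected

# Normal case: deleting the object empties the cache
cache = FakeCache()
ts = FakeTS(cache)
del ts
gc.collect()
assert cache.data == {}

# Failure case: a lingering reference (e.g. held by a traceback frame)
# means __del__ never runs, so the cache is never emptied
cache2 = FakeCache()
ts2 = FakeTS(cache2)
lingering = ts2
del ts2
gc.collect()
assert cache2.data != {}  # still populated
```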
On the Java side, I can't tell you much about what's happening, I'm afraid. When the `Scenario` object is initialized, it also initializes the `JDBCBackend`, which does things like `start_jvm()`, and:
```python
try:
    self.jobj = java.Platform("Python", properties)
except java.NoClassDefFoundError as e:  # pragma: no cover
    raise NameError(
        f"{e}\nCheck that dependencies of ixmp.jar are "
        f"included in {Path(__file__).parents[2] / 'lib'}"
    )
except jpype.JException as e:  # pragma: no cover
    # Handle Java exceptions
    jclass = e.__class__.__name__
    if jclass.endswith("HikariPool.PoolInitializationException"):
        redacted = copy(kwargs)
        redacted.update(user="(HIDDEN)", password="(HIDDEN)")
        msg = f"unable to connect to database:\n{repr(redacted)}"
    elif jclass.endswith("FlywayException"):
        msg = "when initializing database:"
        if "applied migration" in e.args[0]:
            msg += (
                "\n\nThe schema of the database does not match the schema of "
                "this version of ixmp. To resolve, either install the version "
                "of ixmp used to create the database, or delete it and retry."
            )
    else:
        _raise_jexception(e)
    raise RuntimeError(f"{msg}\n(Java: {jclass})")
```
So it is checking for some exceptions. The next step, then, finally gets deep into Java territory:
`s.par("d")` calls `s._backend("item_get_elements", "par", name, filters)` (with `name="d"` and `filters=None`), which calls `s.platform._backend(s, method, *args, **kwargs)` (with `method="item_get_elements"` and the rest passed along as `*args`). `s.platform._backend` is an instance of (through inheritance) the abstract `Backend` class, which specifies a `__call__(self, obj, method, *args, **kwargs)` function that returns `getattr(self, method)(obj, *args, **kwargs)`. With `self=JDBCBackend`, this means the function in question is this:
```python
# type="par", name="d"
def item_get_elements(self, s, type, name, filters=None):  # noqa: C901
    if filters:
        # Convert filter elements to strings
        filters = {dim: as_str_list(ele) for dim, ele in filters.items()}

    try:
        # Retrieve the cached value with this exact set of filters
        return self.cache_get(s, type, name, filters)
    except KeyError:
        pass  # Cache miss

    try:
        # Retrieve a cached, unfiltered value of the same item
        unfiltered = self.cache_get(s, type, name, None)
    except KeyError:
        pass  # Cache miss
    else:
        # Success; filter and return
        return filtered(unfiltered, filters)

    # Failed to load item from cache

    # Retrieve the item
    item = self._get_item(s, type, name, load=True)
    idx_names = list(item.getIdxNames())
    idx_sets = list(item.getIdxSets())

    # Get list of elements, using filters if provided
    if filters is not None:
        jFilter = java.HashMap()

        for idx_name, values in filters.items():
            # Retrieve the elements of the index set as a list
            idx_set = idx_sets[idx_names.index(idx_name)]
            elements = self.item_get_elements(s, "set", idx_set).tolist()

            # Filter for only included values and store
            filtered_elements = filter(lambda e: e in values, elements)
            jFilter.put(idx_name, to_jlist(filtered_elements))

        jList = item.getElements(jFilter)
    else:
        jList = item.getElements()

    if item.getDim() > 0:
        # Mapping set or multi-dimensional equation, parameter, or variable
        columns = copy(idx_names)

        # Prepare dtypes for index columns
        dtypes = {}
        for idx_name, idx_set in zip(columns, idx_sets):
            # NB using categoricals could be more memory-efficient, but requires
            #    adjustment of tests/documentation. See
            #    https://github.com/iiasa/ixmp/issues/228
            # dtypes[idx_name] = CategoricalDtype(
            #     self.item_get_elements(s, 'set', idx_set))
            dtypes[idx_name] = str

        # Prepare dtypes for additional columns
        if type == "par":
            columns.extend(["value", "unit"])
            dtypes.update(value=float, unit=str)
            # Same as above
            # dtypes['unit'] = CategoricalDtype(self.jobj.getUnitList())
        elif type in ("equ", "var"):
            columns.extend(["lvl", "mrg"])
            dtypes.update(lvl=float, mrg=float)

        # Copy vectors from Java into pd.Series to form DataFrame columns
        columns = []

        def _get(method, name, *args):
            columns.append(
                pd.Series(
                    # NB [:] causes JPype to use a faster code path
                    getattr(item, f"get{method}")(*args, jList)[:],
                    dtype=dtypes[name],
                    name=name,
                )
            )

        # Index columns
        for i, idx_name in enumerate(idx_names):
            _get("Col", idx_name, i)

        # Data columns
        if type == "par":
            _get("Values", "value")
            _get("Units", "unit")
        elif type in ("equ", "var"):
            _get("Levels", "lvl")
            _get("Marginals", "mrg")

        result = pd.concat(columns, axis=1, copy=False)
    elif type == "set":
        # Index sets
        # dtype=object is to silence a warning in pandas 1.0
        result = pd.Series(item.getCol(0, jList)[:], dtype=object)
    elif type == "par":
        # Scalar parameter
        result = dict(
            value=float(item.getScalarValue().floatValue()),
            unit=str(item.getScalarUnit()),
        )
    elif type in ("equ", "var"):
        # Scalar equation or variable
        result = dict(
            lvl=float(item.getScalarLevel().floatValue()),
            mrg=float(item.getScalarMarginal().floatValue()),
        )

    # Store cache
    self.cache(s, type, name, filters, result)

    return result
```
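As an aside, the `__call__` dispatch mentioned above can be reproduced in isolation (`DemoBackend` is a made-up subclass, not ixmp code):

```python
class Backend:
    """Minimal sketch of the abstract class's __call__ dispatch."""
    def __call__(self, obj, method, *args, **kwargs):
        # Forward the call to one of this backend's own methods by name
        return getattr(self, method)(obj, *args, **kwargs)

class DemoBackend(Backend):
    def item_get_elements(self, obj, type, name, filters=None):
        return (obj, type, name, filters)

backend = DemoBackend()
# s.platform._backend(s, "item_get_elements", "par", "d") resolves to:
result = backend("scenario", "item_get_elements", "par", "d")
assert result == ("scenario", "par", "d", None)
```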
From the ixmp Java source code (which is private, for some reason, I believe), I would say that `Parameter` is a class: `public class Parameter extends Item {`, but I can't really tell you without reading up much more on Java whether we set up a reference loop or some such, sorry.
In the test, we then call the `del` construction as described above, which does not mention Java or JPype much.
@glatterf42 thanks for digging in—I hope that was instructive!
The fault probably lies in code that I originally added in #213, several years ago. The basic purpose of this code is to aggressively free up references to Python objects (like Scenario, TimeSeries, etc.) so that they and the associated ixmp_source Java objects (via JPype) can be garbage-collected, freeing memory. This was deemed necessary for larger applications, like running many instances of the MESSAGEix-GLOBIOM global model in older workflows (using the scenario_runner/runscript_main pattern).
I will open a PR to remove the workaround for the current issue and adjust/update the tests and/or caching/memory-management code. Then we can discuss whatever the solution turns out to be.
I spent some further time investigating this, including using refcycle. After `del s` here (ixmp/ixmp/tests/backend/test_base.py, lines 148 to 149 in d92cfcd), I inserted code like:
```python
import gc

gc.collect()
gc.collect()
gc.collect()

import sys
import traceback

import refcycle

import ixmp

snapshot = refcycle.snapshot()
scenarios = snapshot.find_by(lambda obj: isinstance(obj, ixmp.Scenario))
s2 = scenarios[0]
print(f"{s2 = }")
snapshot.ancestors(s2).export_image("refcycle-main.svg")

tbs = list(
    snapshot.ancestors(s2).find_by(
        lambda obj: type(obj).__name__ == "traceback"
    )
)
print(f"{tbs = }")

del snapshot
gc.collect()

for i, obj in enumerate(tbs):
    print(f"{i} {obj}")
    print(f"{obj.tb_frame = }")
    traceback.print_tb(obj, file=sys.stdout)
```
This produced output like the following:
In short:
- At that point in the test, there are six builtins.traceback objects that exist.
- These appear to originate in the Java code, rather than in the Python code (judging by the first frame object referenced by each traceback object).
- Each of those frame stacks ultimately contains one or more references to the `s` Scenario instance.
- These are what make the reference count non-zero, and prevent `s` from being garbage-collected.
- Again, this occurs only with JPype ≥ 1.4.1 (including 1.5.0) and Python ≤ 3.10. It does not occur with Python 3.11 or 3.12 (see the CI runs on #504). I don't know why this is.
- I also spent several hours fiddling with the exception handling. For instance, because tracebacks are associated with exceptions, I tried aggressively deleting and GC'ing `jpype.JException` instances and returning only Python exceptions in jdbc.py. Nothing seemed to make those traceback objects disappear.
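The mechanism in the first few bullets can be reproduced in pure Python: a stored traceback references frames whose locals include the object, keeping it alive until the traceback itself is collected (`Scenario` here is a stand-in class, not ixmp's):

```python
import gc
import weakref

class Scenario:
    """Stand-in object whose lifetime we track."""

def fail(s):
    raise RuntimeError("boom")

def capture():
    s = Scenario()
    ref = weakref.ref(s)
    try:
        fail(s)
    except RuntimeError as e:
        tb = e.__traceback__  # references frames whose locals include s
    return ref, tb

ref, tb = capture()
assert ref() is not None  # the traceback's frames keep s alive

del tb
gc.collect()              # the frame <-> traceback cycle needs the collector
assert ref() is None      # now s can finally be collected
```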