Coder Social home page Coder Social logo

Comments (9)

yrodiere avatar yrodiere commented on June 19, 2024

Hey @beikov , could you please share your opinion on the right solution for this using Hibernate ORM? I'd definitely be interested in hearing it.

from quarkus.

beikov avatar beikov commented on June 19, 2024

I think the proper solution is to always think about the what data you really need and model this with DTOs to avoid any sort of unpredictable lazy loading.
In this particular case, you seem to need some sort of recursive fetching, so I would suggest you to make use of recursive CTEs to model the data loading e.g.

with statements as (
  select cast(:id) as id
  union all
  select s.previousStatement.id as id from Statement s join statements r on s.id = r.id
)
select s from Statement s where s.id in (select r.id from statements r)

This way, you will select all statements that are reachable via previousStatement associations, starting at the statement with the id as given through the :id parameter.

Ideally, we'd offer a recursive fetch join to model this, but we don't have that yet.

from quarkus.

ymajoros avatar ymajoros commented on June 19, 2024

@beikov thanks for your feedback. In my use case:

  • most of the time we don't need the previousStatement
  • ... but some flows do
  • ... and sometimes we need to dig further

As getting a statement is a O(1) operation, but fetching the whole chain is O(n), I do not want to fetch them eagerly or with any kind of projection or recursive fetch. Fetching lazily means that I have a best case of O(1) and a worst case of O(f) (f meaning a few here). The proposed alternative is O(n) best case, making it perform much worse (n can be big here).

There are a lot of other valid use cases. Like the 'debate' about open-session-in-view for spring, there will always be PROs and CONs, and I suggest to not go too deep into this debate here but let users make conscious decisions about their use cases.

Therefore, such a functionality is useful for some users.

from quarkus.

beikov avatar beikov commented on June 19, 2024

Then IMO you should model your different flows explicitly by creating separate data access methods that initializes exactly the state that the caller needs. Any sort of unpredictable lazy initialization will lead to pain, so it's best to avoid it in the first place.
Open-session-in-view or enabling the hibernate.enable_lazy_load_no_trans setting are both recipes for disaster.

but fetching the whole chain is O(n)

Talking about O(n) here is a bit confusing, because the expensive part really is the separate select statement execution, not the lookup in the database itself. Using a recursive fetch like I showed you only does one database request.

There are a lot of other valid use cases. Like the 'debate' about open-session-in-view for spring, there will always be PROs and CONs, and I suggest to not go too deep into this debate here but let users make conscious decisions about their use cases.

The only PRO I see is that you can turn off your brain and not think about the data access design, but that will bite you very fast when you run into performance issues, because changing the design later is painful. So I don't really view that as a PRO, but rather as a CON in disguise. Modelling your data fetching requirements is IMO not something that should be an after-thought just like performance testing shouldn't be, because at some point there will be that one user with hundreds of chained Statement associations who will bring down your whole application.

Therefore, such a functionality is useful for some users.

If you really want this broken behavior, then let the transaction span across the whole request by annotating your HTTP endpoint method with @Transactional. Nobody is preventing you from doing that. Having a global flag is IMO a very bad idea as that will make every HTTP request claim a database connection and also span a transaction including locks for much longer than needed.

from quarkus.

ymajoros avatar ymajoros commented on June 19, 2024

This is quite opinionated, and I really have a different view and experience.

Even if the recursive fetch is a single request, it is definitely a O(n) one that I want to avoid. As said, I most often do not need this data.

Of course, we could replace the relation with an ad-hoc call. But that also means refactoring the database to replace the many-to-one by a relation table, which I don't consider better design for this use case.

'Turn off your brain' is rather though wording: I'd rather rephrase it as decoupling the service returning a Statement from whatever the next transaction wants to do with it.

There is no use case where a user will bring down the whole application, for multiple reasons.

This is also a very simple use case, but navigating complex object graphs not known in advance in a different transaction isn't bad design per se. Performance isn't an afterthought here, on the contrary.

I'm new to this project, but having bigger transactions is one of the steps that we are taking further. Still, it doesn't cover all use cases and we sometimes do want a separate one.

Of course, https://hibernate.atlassian.net/browse/HHH-17750 makes our problem worse: if fetching a lazy transient attribute of a detached entity would work in a new transaction, the need for this would be much lower (though still there).

from quarkus.

beikov avatar beikov commented on June 19, 2024

This is quite opinionated, and I really have a different view and experience.

I was asked for an opinion and here you have it. I simply prefer designing the data access rather than burying requirements in e.g. the UI code through implicit lazy initialization. If you ask me, no entity should ever be serialized to JSON, but only DTOs. I understand that this might seem extreme to some folks, but designing the data access this way makes it way simpler to reason about performance and optimize, simply because there are no unknowns.

Even if the recursive fetch is a single request, it is definitely a O(n) one that I want to avoid. As said, I most often do not need this data.

That's totally understandable, hence my suggestion.

Then IMO you should model your different flows explicitly by creating separate data access methods that initializes exactly the state that the caller needs.

Of course, we could replace the relation with an ad-hoc call. But that also means refactoring the database to replace the many-to-one by a relation table, which I don't consider better design for this use case.

I didn't suggest that and I agree that this would not improve the design.

'Turn off your brain' is rather though wording: I'd rather rephrase it as decoupling the service returning a Statement from whatever the next transaction wants to do with it.

I don't know your application design, but it seems to me that your app could know beforehand what the next transaction requires and hence load the appropriate data without the need for lazy loading. It's a matter of creating separate data access methods for the respective flows.

This is also a very simple use case, but navigating complex object graphs not known in advance in a different transaction isn't bad design per se. Performance isn't an afterthought here, on the contrary.
I'm new to this project, but having bigger transactions is one of the steps that we are taking further. Still, it doesn't cover all use cases and we sometimes do want a separate one.

No idea how you are "navigating" in "a different transaction", but maybe you you can solve your issues by simply loading the subsequent object graphs explicitly instead of relying on lazy loading?

Of course, https://hibernate.atlassian.net/browse/HHH-17750 makes our problem worse: if fetching a lazy transient attribute of a detached entity would work in a new transaction, the need for this would be much lower (though still there).

It seems @mbladel is already looking into this issue, so let's maybe keep the discussion about the Jira issue in the Jira comments.

from quarkus.

ymajoros avatar ymajoros commented on June 19, 2024

I simply prefer designing the data access rather than burying requirements in e.g. the UI code through implicit lazy initialization.

This isn't UI code. The first transaction is simply committed and we end up with an object with lazy relations.

If you ask me, no entity should ever be serialized to JSON, but only DTOs.

I do strictly agree with that. Not what is happening in my example.

I understand that this might seem extreme to some folks, but designing the data access this way makes it way simpler to reason about performance and optimize, simply because there are no unknowns.

There are unknowns, the first service doesn't know how much data it has to send back. Even the caller doesn't know at call time, it might need to fetch relations. It's the essence of lazy relationships, the fact they are in a different transaction doesn't make this different.

Even if the recursive fetch is a single request, it is definitely a O(n) one that I want to avoid. As said, I most often do not need this data.

That's totally understandable, hence my suggestion.

Then IMO you should model your different flows explicitly by creating separate data access methods that initializes exactly the state that the caller needs.

No, at the time of the first call, I can't decide how many previous statements I need. It also depends on expensive calculations on the statement itself (calling third party services, ...). I really can't know beforehand.

I don't know your application design, but it seems to me that your app could know beforehand what the next transaction requires and hence load the appropriate data without the need for lazy loading. It's a matter of creating separate data access methods for the respective flows.

So, basically, this is not the case.

I also don't see different kind of structures for slightly different use cases as a better design. Your proposal does have its merits to optimize some use cases, but isn't the holy grail either.

JPA is a leaky abstraction, and that's both a problem and some of its strengths. Lazy loading, even in a different transaction, is quite useful in some scenarios. I do agree it can be misused.

No idea how you are "navigating" in "a different transaction", but maybe you you can solve your issues by simply loading the subsequent object graphs explicitly instead of relying on lazy loading?

That means you need to know all the paths that you'll have to follow beforehand. If there is any complex logic or dependencies, this isn't necessarily the case.

It seems @mbladel is already looking into this issue, so let's maybe keep the discussion about the Jira issue in the Jira comments.

Yes. I just meant this is the first reason why we wanted this feature, even it were only for migration.

from quarkus.

beikov avatar beikov commented on June 19, 2024

No, at the time of the first call, I can't decide how many previous statements I need. It also depends on expensive calculations on the statement itself (calling third party services, ...). I really can't know beforehand.
That means you need to know all the paths that you'll have to follow beforehand. If there is any complex logic or dependencies, this isn't necessarily the case.

Maybe I'm not totally understanding, but in your second transaction, you can execute your logic to determine what data needs to be fetched and then load the Statement with the id of the previousStatement along with how many previousStatement you may need, but in one go. Put all of that into a DTO so there is no lazy entity anymore and you're done :)

from quarkus.

ymajoros avatar ymajoros commented on June 19, 2024

Maybe I'm not totally understanding, but in your second transaction, you can execute your logic to determine what data needs to be fetched and then load the Statement with the id of the previousStatement along with how many previousStatement you may need, but in one go. Put all of that into a DTO so there is no lazy entity anymore and you're done :)

In the second transaction, I can't know if I need a 3rd or 4th object before checking the second one. How would an ad-hoc call to fetch a new statement be different from a lazy fetch regarding performance? I'm still in the service layer btw.

from quarkus.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.