ORMs vs SQL: The JPA Story

I previously wrote about ORMs vs SQL and received a lot of reaction to it--most of it positive. Some of it was predictable ("you don't know what you're talking about (because you don't agree with me)") but one reaction from a couple of people surprised me: they took my post to mean that I was against persistence abstractions. I will now expand on those points with a specific example: the Java Persistence API ("JPA").

Some History

Hibernate was the first really successful project to try to create an object model on top of a relational one. It was, and still is, clearly the most popular Java ORM. Through the better part of the last decade, Spring and Hibernate were the de facto Java enterprise standard.

Other projects have come along to do much the same thing (TopLink, OpenJPA, EclipseLink and so on). Of course Sun intervened and did what they always do: tried to standardize things by creating JPA 1.0 as the persistence layer of the EJB 3.0 specification.

Standardization

To quote Joel Spolsky:

When you try to unify two opposing forces by creating a third alternative, you just end up with three opposing forces. You haven't unified anything and you haven't really fixed anything.

Sun's track record here is terrible. They made a dog's breakfast of logging (JDK 1.4), have been pushing the (still) unsuccessful JavaServer Faces ("JSF") Web application framework (sorry, but the defiant cries of "next release/year, it will take off" have a certain "boy who cried wolf" quality after 7-8 years), ignored the already-adopted and popular OSGi standard in favor of their own Java module system, made a stillborn foray into rich client territory with JavaFX and have shown an inability to lead the community on Java 7 and on advancing both the language and the platform.

JPA 1.0

Nevertheless, we did get the JPA 1.0 spec as the persistence layer for EJB 3.0, which boldly embraced the POJO philosophy trailblazed by Spring years earlier. JPA represents the lowest common denominator between the various ORMs that support it. It hasn't really unified anything and it hasn't really fixed anything. In fact, a good case can be made that standardization wasn't even necessary.

That being said, JPA isn't bad. It just has enough limitations that the chance of not using any provider-specific extensions on a real project is almost zero.

Anyway, Oracle donated (part of) their TopLink product, which became TopLink Essentials, the reference implementation of JPA 1.0. Just as Sun ignored Log4j in the logging debate (yes, yes, I know the JDK logging can wrap Log4j), one has to wonder why they bypassed Hibernate, but I guess we should expect that by now.

Oracle donated TopLink to the Eclipse Foundation in 2006 and that became EclipseLink, a product I've used a lot and respect a lot in this space. It has some nice features that I've found no equivalent for in Hibernate (but I digress). EclipseLink 2.0 will be the reference implementation of the imminent JPA 2.0 specification, which is part of Java EE 6 alongside EJB 3.1 (did you get all that?).

Complexity

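While all these libraries do basically the same thing, they are fundamentally different in their implementation, and that's the first problem. What happens when you try to use them outside of a J(2)EE container?

Lazy loading is the obvious case. Hibernate hands you runtime-generated proxies (built with CGLIB or Javassist), EclipseLink relies on weaving its entity classes, either statically as a build step or dynamically through a Java agent, and OpenJPA has its own bytecode enhancer. Inside a container much of this is hidden from you; outside one it typically means an extra build step or a -javaagent argument on the JVM.

My point is that there is an awful lot of complexity just for one feature: lazy fetching of associated entities.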
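To make the lazy-fetching point concrete, here is a minimal sketch of the idea behind a lazy-loading proxy, using the JDK's java.lang.reflect.Proxy. It is a simplification and not how any provider actually does it: JDK proxies only work for interfaces, whereas Hibernate generates subclasses of entity classes and EclipseLink weaves bytecode. The Account interface, the AccountRow class and the lazy() helper are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyProxyDemo {
    // Hypothetical association type. JDK proxies need an interface; real providers
    // work on entity classes via generated subclasses or woven bytecode.
    public interface Account {
        String number();
    }

    // Hypothetical "loaded" object, standing in for a row read from the database.
    static final class AccountRow implements Account {
        private final String number;

        AccountRow(String number) {
            this.number = number;
        }

        @Override
        public String number() {
            return number;
        }
    }

    // Returns a proxy that calls the loader only on first use, then delegates to the result.
    static <T> T lazy(Class<T> type, Supplier<? extends T> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private T target; // stays null until the first method call

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loader.get(); // the simulated SELECT runs here
                }
                return method.invoke(target, args);
            }
        };
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger(); // counts simulated database hits
        Account account = lazy(Account.class, () -> {
            loads.incrementAndGet();
            return new AccountRow("ACC-1");
        });
        System.out.println("loads before access: " + loads.get()); // 0
        System.out.println(account.number());                      // ACC-1
        System.out.println(account.number());                      // ACC-1, no second load
        System.out.println("loads after access: " + loads.get());  // 1
    }
}
```

Even this toy version shows where the surprises come from: the load happens wherever the first method call happens, so in a real provider it only works while something can still run the loader (in JPA terms, while the persistence context is open).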

Differences

This is a list of some of the differences and extensions for some of the JPA providers. This list is by no means exhaustive but it illustrates my point:

  • EclipseLink has the @PrivateOwned annotation for automatically deleting child records that are removed from the collection. Programmers often mistakenly think that's what CascadeType.REMOVE does. Not so: that only cascades removal of the parent to its children;
  • EclipseLink has the BATCH query hint, which is incredibly useful for mass loading of a large number of entities with discriminated type. This is something for which I have found no Hibernate equivalent. I'll happily be proven wrong on this one;
  • Performance of different JPA providers can be hugely different (although I think the benchmark I've seen doesn't do EclipseLink justice); and
  • The properties and setup are different.
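To illustrate the last point: under JPA 1.0 even basic settings such as the JDBC URL or SQL logging have no standard property name, so each provider uses its own keys. The sketch below collects the keys I believe each provider used for these two settings (check your provider's documentation before relying on them); in real code such a map would be passed to Persistence.createEntityManagerFactory(unitName, properties) or written into persistence.xml. The database URL is hypothetical. JPA 2.0 standardizes some of these, such as javax.persistence.jdbc.url, but not SQL logging.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProviderProperties {
    // Each method builds provider-specific properties for the same two settings:
    // the JDBC URL and logging of generated SQL (JPA 1.0-era provider keys).
    static Map<String, String> hibernate(String jdbcUrl) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hibernate.connection.url", jdbcUrl);
        props.put("hibernate.show_sql", "true");
        return props;
    }

    static Map<String, String> eclipseLink(String jdbcUrl) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("eclipselink.jdbc.url", jdbcUrl);
        props.put("eclipselink.logging.level", "FINE"); // SQL statements are logged at FINE
        return props;
    }

    static Map<String, String> openJpa(String jdbcUrl) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("openjpa.ConnectionURL", jdbcUrl);
        props.put("openjpa.Log", "SQL=TRACE");
        return props;
    }

    public static void main(String[] args) {
        String url = "jdbc:hsqldb:mem:demo"; // hypothetical database URL
        System.out.println(hibernate(url));
        System.out.println(eclipseLink(url));
        System.out.println(openJpa(url));
    }
}
```

Switching providers therefore means rewriting configuration, not just swapping a jar.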

Whenever there are standards there will be differences between providers. But when the common functionality is so insufficient that (typically extensive) use of extensions is a given, an arguably unnecessary "standard" becomes pointless or even counterproductive.

Problems

JPA is certainly not without problems. What comes to mind is:

  • Native queries are really awkward to use, returning Object arrays when you select multiple columns. This is, in part, Java's fault compared to, say, C#, which neatly gets around this with anonymous types and the "var" keyword (which is compile-time type inference, not reflection);
  • JPA can be a real black box for generating SQL;
  • Composite keys are really awkward to use. So much so that composite primary keys are often described as "legacy" in JPA texts, blogs and articles;
  • Entities, despite the claims of being POJOs, really aren't. They're often unsuitable for transmission over a network, conversion to JSON and so on, which typically requires a translation layer;
  • No standard support for filtering collections. For example, a Customer entity may have several child Accounts, only one or two of which are active (marked with a flag). JPA doesn't really support joining across just the "active" children in this scenario; and
  • JPA QL is another language you have to learn with little to no tooling support. It's not as capable as SQL is either (hence the need for native queries).
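To illustrate the first item: a native query that selects more than one column comes back from getResultList() as a List of Object[], one array per row, so every column is pulled out by position and cast. Below is a minimal sketch with a hypothetical AccountSummary class and hard-coded rows standing in for the query result. The actual Java type of each column depends on the JDBC driver and column type; BigDecimal for a NUMERIC balance is an assumption here.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NativeQueryRows {
    // Hypothetical hand-written class to give the columns names and types.
    static final class AccountSummary {
        final String customerName;
        final BigDecimal balance;

        AccountSummary(String customerName, BigDecimal balance) {
            this.customerName = customerName;
            this.balance = balance;
        }
    }

    // One Object[] per row, columns by position. Reorder the SELECT list and
    // these casts fail at runtime, not at compile time.
    static List<AccountSummary> toSummaries(List<Object[]> rows) {
        List<AccountSummary> out = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            out.add(new AccountSummary((String) row[0], (BigDecimal) row[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for em.createNativeQuery("SELECT c.name, a.balance FROM ...").getResultList()
        List<Object[]> rows = Arrays.asList(
                new Object[] {"Alice", new BigDecimal("100.00")},
                new Object[] {"Bob", new BigDecimal("25.50")});
        for (AccountSummary s : toSummaries(rows)) {
            System.out.println(s.customerName + " " + s.balance);
        }
    }
}
```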

Again, this list is illustrative, not exhaustive.
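On the composite-key item: the JPA specification requires a primary key class to be public and serializable, to have a public no-arg constructor, and to define equals() and hashCode(). Here is a minimal sketch of that boilerplate for a hypothetical OrderLineId key (order ID plus line number). In a real mapping the class would also be annotated @Embeddable and used via @EmbeddedId, or referenced from @IdClass; the annotations are left out here only so the example compiles without a JPA jar.

```java
import java.io.Serializable;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical composite key: an order line identified by (orderId, lineNo).
// A real mapping would add @Embeddable (or name this class in @IdClass).
public class OrderLineId implements Serializable {
    private static final long serialVersionUID = 1L;

    private long orderId;
    private int lineNo;

    // Required by the spec so the provider can instantiate the key reflectively.
    public OrderLineId() {
    }

    public OrderLineId(long orderId, int lineNo) {
        this.orderId = orderId;
        this.lineNo = lineNo;
    }

    public long getOrderId() {
        return orderId;
    }

    public int getLineNo() {
        return lineNo;
    }

    // Value equality: two keys are equal when every key column is equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderLineId)) return false;
        OrderLineId other = (OrderLineId) o;
        return orderId == other.orderId && lineNo == other.lineNo;
    }

    @Override
    public int hashCode() {
        return Objects.hash(orderId, lineNo);
    }

    public static void main(String[] args) {
        Set<OrderLineId> ids = new HashSet<>();
        ids.add(new OrderLineId(42L, 1));
        ids.add(new OrderLineId(42L, 1)); // same key, not added twice
        ids.add(new OrderLineId(42L, 2));
        System.out.println(ids.size()); // 2
    }
}
```

That is all before writing the entity itself, and code that looks up a single row, such as a call to find(), then has to construct one of these key objects.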

All of these things constitute part of the complexity cost for the "completeness" of the abstraction I talked about in my previous post.
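Returning to the collection-filtering item in the list above: a common fallback is to map the full collection and filter it in memory. The alternative is a separate query such as "SELECT a FROM Customer c JOIN c.accounts a WHERE c.id = :id AND a.active = true", which gives you the active accounts but not a Customer whose accounts collection holds only those. Below is a minimal sketch of the in-memory version, with hypothetical Customer and Account classes standing in for the entities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ActiveAccounts {
    // Hypothetical stand-ins for the Customer and Account entities in the text.
    static final class Account {
        final String number;
        final boolean active;

        Account(String number, boolean active) {
            this.number = number;
            this.active = active;
        }
    }

    static final class Customer {
        // In a JPA mapping this would be a @OneToMany collection holding ALL accounts.
        final List<Account> accounts = new ArrayList<>();
    }

    // The fallback: load the whole collection, then keep only the active accounts.
    static List<Account> activeAccounts(Customer customer) {
        return customer.accounts.stream()
                .filter(a -> a.active)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Customer c = new Customer();
        c.accounts.addAll(Arrays.asList(
                new Account("A-1", true),
                new Account("A-2", false),
                new Account("A-3", false)));
        System.out.println(activeAccounts(c).size()); // 1
    }
}
```

The cost is that every account is loaded (with lazy fetching, the whole collection is initialized) just to keep one or two of them.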

Leakiness

To return once more to the concept of leaky abstractions: a high price has been paid in complexity (e.g. dynamic weaving), there are plenty of provider differences from the "standard", and the abstraction is still leaky. The best example of this is:

How many of you have spent half a day trying to figure out which arcane combination of XML, properties, VM parameters and annotations will produce performant SQL?

Conclusion

My goal here isn't to deride or diminish JPA or any particular provider. Like I said, I like EclipseLink (in the right applications). Even so, after using it for a year, I'm still scratching my head trying to figure out how some of it works (e.g. the session management).

This quest for "simplicity" (having an object model in your persistence layer) is so incredibly complex, both in use and in implementation, that I believe it has reached the point of (often) being counterproductive.

This is exacerbated by a certain kind of programmer who believes that the point of an abstraction is to avoid learning or understanding the underlying technology, a philosophy I vehemently oppose. If you're doing JPA, you still need to know databases and SQL. If you're using a Web application framework, you still need to know the servlets API and how HTTP works, at least at a high level.

So what's the alternative?

This leads me into something I'll discuss at length next week: Ibatis. I firmly believe that Ibatis is the premier Java ORM framework. It is capable of doing 90-95% of what JPA can do with significantly lower complexity and a significantly lower learning curve. But more on that next week.

18 comments:

Roger said...

Great post. Honestly I'm fed up with the overcomplexity that comes along with Hibernate and JPA, especially when there are better solutions out there. I've used Ibatis for a while and I have to say it's the closest to an optimal solution to the ORM problem; in fact I find it so great that I wonder why on earth it isn't as popular... I guess people complain that you *must* know SQL in order to use Ibatis... Quite frankly, this is one of the reasons why I believe people have to stop using Hibernate and JPA. How can you not know databases and SQL in this day and age? And why wouldn't you want to learn SQL and databases? Do people really find it that hard?

Donny said...

I thought TopLink was the first to come up with the ORM idea.

overtheline said...

This article is a bit revisionist, and I don't have much patience for high-level blab about ORM anymore, considering the fossilized carcass of a horse that is lying down over there -->.

But since you brought it up, JPA is probably not much more interesting than JDO at this point. Or EJB for that matter.

Tired, slow, uninteresting, expensive in terms of development cost, supposedly enterprise software.

If by enterprise they mean a whole hog waste of time, then I agree. Enterprises are the only places that have the money to waste time like this.

If you want to learn about ORM vs relational instead of just saying you "like" something, read "ORM is the Vietnam of computer science".

It's true, or was until Hadoop and distributed key-value stores started coming on.

If JPA matters to anyone at this point, it would be a soon-to-be-out-of-work architect.

Sorry, no empathy here. We all knew this.

Brian Silberbauer said...

I stand to be corrected and I am just passing through, so no search to verify, but:

Sun made a big switch from JDO focus to JPA focus because of JBoss pressure to make EJB3 entities Hibernate-like. JPA is basically Hibernate; TopLink came from the Oracle implementation that they donated to Sun.

JSF is Struts++; it was spec'ed out by the creator of Struts. And it is terrible, like its mother.

Just saying.

Anonymous said...

Good article! The problem with Hibernate is that it tries to cover a lot more ground than it really should. My problem is that it is practically impossible to find a Java job these days without having it shoved down my throat.

Andrew Lim said...

I like iBatis and I'm looking forward to your article next week.

However, recently I've come to prefer Spring JDBC over anything else. It doesn't require XML configuration (yes, you can use it standalone from the rest of Spring), supports named parameters, and easily maps SQL results to objects and collections. The main drawback (advantage?) is that it doesn't generate or hide SQL from you. You have to code the SQL yourself.

Dimitris Menounos said...

A small correction: entities CAN be transmitted over the network, despite being bytecode enhanced, if you design them carefully. That is mostly by keeping logic in services and by pre-loading needed relations before closing the session.

Personally, I use DTOs in the cases where I want to display a projection or for performance reasons (cutting down / simplifying information).

Casper Bang said...

"C#, which neatly gets around this with the "var" type (which is really just syntactic sugar for reflection on properties)"

That is incorrect. The var keyword in C# is for local variable type inference; the compiler fully expands and validates it at compile time. It's one of the cornerstones in making it possible to new up objects (projections) without having to write tedious and verbose DTOs/beans constantly as we do in Java. However, C# 4 introduces easier reflection with the "dynamic" keyword.

Anonymous said...

>> Of course Sun intervened and did what they always do: tried to standardize things by creating JPA 1.0 as the persistence layer of the EJB 3.0 specification.

Good joke. JPA is the result of JBoss lobbying. The Hibernate guys basically wrote most of the specification.

Oziel said...

I'm waiting for your next post on Ibatis, and the framework's take on such things as object graphs and lazy loading. If it is simple enough, you've got yourself a believer....

Anonymous said...

- Hibernate uses CGLIB to perform interception.
- Use the @Where annotation to filter collections in Hibernate.
- You can always use a custom query in Hibernate.
- A batch processing feature is present too.
- Use cascades in the DB, not in Java, to delete orphans, or use the equivalent annotation in Hibernate.
Conclusion: the author doesn't RTFM at all.

William Shields said...

- CGLIB: what point is that relevant to?
- @Where (and @WhereJoinTable) are Hibernate-specific annotations. You'll note the subject is JPA;
- The need for native/custom queries is kinda the point of the post...
- Batch query hint (despite its name) isn't necessarily about batch processing. At least read what it does if you're going to comment on it;
- Cascade database features aren't universal and don't necessarily solve the problem.

Anonymous said...

- You mentioned weaving in EclipseLink as part of your complexity point. I said that Hibernate uses CGLIB without any problems.
- You are comparing all ORMs vs SQL, but you chose EclipseLink. Change your topic to EclipseLink vs SQL or give the other JPA implementors a chance.
- About delete cascade: show me an example. I see only deleting orphans as a problem.
Of course JPA doesn't solve all problems, but you can use it as a base and use plain SQL only in very specific places.
In my project we have several places where JPA doesn't fit, but in those places we use custom queries with Hibernate mappings. That means we still use some parts of Hibernate.
Hibernate and other JPA implementations are powerful tools, but that means you should be careful using them, because wrong usage can lead to big problems.

Anonymous said...

Just a note to say that I fought with Hibernate for months before resorting to a much simpler system, namely SimpleORM. If "ORM is the Vietnam of computer science" then the first thing to do is to keep it as simple as possible. SimpleORM lives up to its name.

It doesn't have POJOs, but then it turns out the Hibernate POJOs aren't POJOs either: Hibernate's so-called POJOs are post-compile byte-code enhanced POJO-mimicking bits of complexity that can't be understood with the JDK debugger.

(PS I forget why I didn't like Ibatis.)

Anonymous said...

Hi,

I need sample code to benchmark iBatis and JPA.
Does anybody have something like it?

Regards,
amir_sedighi@yahoo.com

edburns said...

Dear Mr. Shields, I humbly submit that JavaServer Faces (JSF) be called the "most successful unsuccessful web framework", based on the number and diversity of deployments. Please visit http://bit.ly/RealWorldJsfLinks for a sample. Also, for a solid refutation of your argument that the "next release/year, it will take off" claims are off base, please see bit.ly/jsf-response .

Sincerely,

Ed Burns
Author and JSF Spec co-lead.
