ORMs vs SQL: The JPA Story

I previously wrote about ORMs vs SQL and received a lot of reaction to it--most of it positive. Some of it was predictable ("you don't know what you're talking about (because you don't agree with me)") but one reaction from a couple of people surprised me: they took my post to mean that I was against persistence abstractions. I will now expand on those points with a specific example: the Java Persistence API ("JPA").

Some History

Hibernate was the first really successful project to try and create an object model on top of a relational one. It was--and still is--quite popular. It is clearly the most popular Java ORM. Through the better part of the last decade Spring and Hibernate were the de facto Java enterprise standard.

Other projects have come along to do much the same thing (TopLink, OpenJPA, EclipseLink and so on). Of course Sun intervened and did what they always do: tried to standardize things by creating JPA 1.0 as the persistence layer of the EJB 3.0 specification.

Standardization

To quote Joel Spolsky:

When you try to unify two opposing forces by creating a third alternative, you just end up with three opposing forces. You haven't unified anything and you haven't really fixed anything.

Sun's track record here is terrible. They made a dog's breakfast of logging (JDK 1.4), have been pushing the (still) unsuccessful JavaServer Faces ("JSF") Web application framework (sorry but the defiant cries of "next release/year, it will take off" have a certain "boy who cried wolf" quality after 7-8 years), ignored the already-adopted and popular OSGi standard for their own Java module system, have made a stillborn foray into rich client territory with JavaFX and have shown an inability to lead the community on Java 7 and advancement of both the language and the platform.

JPA 1.0

Nevertheless, we did get the JPA 1.0 spec as the persistence layer for EJB 3.0, which boldly embraced the POJO philosophy trail blazed by Spring years earlier. JPA represents the lowest common denominator between the various ORMs that support it. It hasn't really unified anything and it hasn't really fixed anything. In fact, a good case can be made that standardization wasn't even necessary.

That being said, JPA isn't bad. It just has so many limitations that your chances of avoiding provider-specific extensions on any real project are almost zero.

Anyway, Oracle donated (part of) their TopLink product that became TopLink Essentials, which is the reference implementation of JPA 1.0. Just like Sun ignored Log4J in the logging debate (yes, yes, I know the JDK logging can wrap Log4j), one has to wonder why they bypassed Hibernate but I guess we should expect that by now.

Oracle donated TopLink to the Eclipse Foundation in 2006 and that became EclipseLink, a product I’ve used a lot and respect a lot in this space. It has some nice features that I've found no equivalent for in Hibernate (but I digress). EclipseLink 2.0 will be the reference implementation of the imminent JPA 2.0 specification as part of EJB 3.1 in JEE 6 (did you get all that?).

Complexity

While all these libraries do basically the same thing, they are fundamentally different in their implementation and that's the first problem. What happens when you try and use these outside of a J(2)EE container?

My point here is that there is an awful lot of complexity here just for one feature: lazy fetching of associated entities.
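
To make "lazy fetching" concrete, here is a toy sketch of the idea using nothing but a JDK dynamic proxy. It is not how any JPA provider actually implements it, and every name in it (LazyFetchSketch, lazyList, loadAccounts, queriesRun) is made up for illustration. The "database" is a counter and a hard-coded list.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from JPA or any provider.
class LazyFetchSketch {

    // Counts how often the simulated database is hit.
    static int queriesRun = 0;

    // Stand-in for a query that loads a customer's accounts.
    static List<String> loadAccounts() {
        queriesRun++;
        return Arrays.asList("savings", "cheque");
    }

    // Returns a List proxy that calls the loader on first use only.
    @SuppressWarnings("unchecked")
    static <T> List<T> lazyList(final Supplier<List<T>> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private List<T> target; // stays null until a method is called

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loader.get(); // the deferred "fetch"
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the real list threw
                }
            }
        };
        return (List<T>) Proxy.newProxyInstance(
                LazyFetchSketch.class.getClassLoader(),
                new Class<?>[] {List.class},
                handler);
    }

    public static void main(String[] args) {
        List<String> accounts = lazyList(LazyFetchSketch::loadAccounts);
        System.out.println("queries before use: " + queriesRun);
        System.out.println("accounts: " + accounts.size());
        System.out.println("queries after use: " + queriesRun);
    }
}
```

Creating the proxy costs nothing; the loader only runs when something touches the list. A real provider has to do this for arbitrary entity classes and fields, not just for an interface like List, which is where much of the machinery comes from.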

Differences

This is a list of some of the differences and extensions for some of the JPA providers. This list is by no means exhaustive but it illustrates my point:

  • EclipseLink has the @PrivateOwned annotation, for automatically deleting child records that are removed from the collection. Programmers often mistakenly think that's what CascadeType.REMOVE does. Not so;
  • EclipseLink has the BATCH query hint, which is incredibly useful for mass loading of a large number of entities with discriminated type. This is something for which I have found no Hibernate equivalent. I'll happily be proven wrong on this one;
  • Performance of different JPA providers can be hugely different (although I think that test doesn't do EclipseLink justice); and
  • The properties and setup are different.

Whenever there are standards there will be differences between providers. But when the common functionality is so insufficient that (typically extensive) use of extensions is a given, an arguably unnecessary "standard" becomes pointless or even counterproductive.

Problems

JPA is certainly not without problems. What comes to mind is:

  • Native queries are really awkward to use, returning Object arrays when you select more than one column. This is, in part, Java's fault compared to, say, C#, which neatly gets around this with anonymous types and the "var" keyword (which is compile-time type inference, not reflection);
  • JPA can be a real black box for generating SQL;
  • Composite keys are really awkward to use. So much so that composite primary keys are often described as "legacy" in JPA texts, blogs and articles;
  • Entities, despite the claims of being POJOs, really aren't. They're typically unsuitable for transmission over a network, conversion to JSON and so on, typically requiring a translation layer;
  • No standard support for filtering collections. For example, a Customer entity may have several child Accounts, only 1-2 of which are active (marked with a flag). JPA doesn't really support just joining across the "active" children in this scenario; and
  • JPA QL is another language you have to learn with little to no tooling support. It's not as capable as SQL is either (hence the need for native queries).
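
The native query point in the list above is easiest to see in code. A minimal sketch in plain Java, with hard-coded rows standing in for what a multi-column native query hands back (a list of Object arrays). The class, field and column names here are all hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class, field and column names are made up.
class NativeRowSketch {

    // Typed view of one result row.
    static final class CustomerTotal {
        final long customerId;
        final String name;
        final BigDecimal total;

        CustomerTotal(long customerId, String name, BigDecimal total) {
            this.customerId = customerId;
            this.name = name;
            this.total = total;
        }
    }

    // Stand-in for the untyped rows of a native query such as
    // "SELECT c.id, c.name, SUM(a.balance) ...".
    static List<Object[]> fakeNativeResult() {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1L, "Alice", new BigDecimal("120.50")});
        rows.add(new Object[] {2L, "Bob", new BigDecimal("15.00")});
        return rows;
    }

    // Positions and casts must match the SELECT list by hand;
    // the compiler cannot check any of it.
    static List<CustomerTotal> toTyped(List<Object[]> rows) {
        List<CustomerTotal> result = new ArrayList<>();
        for (Object[] row : rows) {
            long id = ((Number) row[0]).longValue();
            String name = (String) row[1];
            BigDecimal total = (BigDecimal) row[2];
            result.add(new CustomerTotal(id, name, total));
        }
        return result;
    }

    public static void main(String[] args) {
        for (CustomerTotal ct : toTyped(fakeNativeResult())) {
            System.out.println(ct.customerId + " " + ct.name + " " + ct.total);
        }
    }
}
```

Reorder the columns in the SQL and this still compiles; it just fails with a ClassCastException at runtime. That hand-written translation step is the awkwardness I mean.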

Again, this list is illustrative not exhaustive.

All of these things constitute part of the complexity cost for the "completeness" of the abstraction I talked about in my previous post.

Leakiness

Returning once more to the concept of leaky abstractions, a high price has been paid in complexity (eg dynamic weaving) with a lot of provider differences from the "standard" and the abstraction is still leaky. The best example of this is:

How many of you have spent half a day trying to figure out which arcane combination of XML, properties, VM parameters and annotations will produce performant SQL?

Conclusion

My goal here isn't to deride or diminish JPA or any particular provider. Like I said, I like EclipseLink (in the right applications). Even so and even after using it for a year, I'm still scratching my head trying to figure out how some of it works (eg the session management).

This quest for "simplicity" (being an object model in your persistence layer) is so incredibly complex both in use and in implementation that I believe it has reached the point of (often) being counterproductive.

This is exacerbated by a certain kind of programmer who believes that the point of an abstraction is to avoid learning or understanding the underlying technology, a philosophy I vehemently oppose. If you're doing JPA, you still need to know databases and SQL. If you're using a Web application framework, you still need to know the servlets API and how HTTP works at least at a high level.

So what's the alternative?

This leads me into something I'll discuss at length next week: iBATIS. I firmly believe that iBATIS is the premier Java ORM framework. It is capable of doing 90-95% of what JPA can do with significantly lower complexity and a much gentler learning curve. But more on that next week.
