Entries in Hibernate (4)

Sunday
Nov 10, 2013

Performance Testing Hibernate Query Approaches

I've long known that entities are expensive, but I wanted to see it for myself, so I built this test project to run some benchmarks. The tests aren't exhaustive: they cover just one simple query with no joins, but with a lot of data. The point was to see how much overhead is introduced after we have the query results.

About the Project

This is a simple Maven Spring project that creates an in-memory HSQLDB database, populates it with 500,000 records, then uses several Hibernate query strategies to fetch every record and reports the average execution time of each strategy.

Approaches Tested

  1. Using a JpaRepository interface's findAll method to return a list of attached Hibernate entities.

  2. Using Hibernate's StatelessSession interface to return a list of detached Hibernate entities.

  3. Selecting the specific fields of the entity, using Hibernate to return a simple List, and then manually converting that list to a list of detached entities (as DTOs, basically).

  4. Selecting the specific fields of the entity, then using Hibernate's AliasToBeanResultTransformer to build a list of detached entities (as DTOs, basically).
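Approaches 3 and 4 differ only in who turns each row into an object. Here is a minimal plain-Java sketch of that difference, with no Hibernate on the classpath. The rows list stands in for the result of a field-level select, and PersonDto with its id and name fields is a hypothetical stand-in, since the post doesn't show the project's entity. The reflective method only imitates the idea behind alias-to-bean mapping; it is not Hibernate's implementation.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical DTO; the sample project's real fields are not shown in the post.
class PersonDto {
    Long id;
    String name;
}

class RowMapping {

    // Approach 3 in spirit: copy each column into the DTO by position.
    static List<PersonDto> toDtosManually(List<Object[]> rows) {
        List<PersonDto> result = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            PersonDto dto = new PersonDto();
            dto.id = (Long) row[0];
            dto.name = (String) row[1];
            result.add(dto);
        }
        return result;
    }

    // Approach 4 in spirit: match column aliases to field names via reflection.
    static List<PersonDto> toDtosByAlias(String[] aliases, List<Object[]> rows) {
        try {
            // Resolve the fields once, not once per row.
            Field[] fields = new Field[aliases.length];
            for (int i = 0; i < aliases.length; i++) {
                fields[i] = PersonDto.class.getDeclaredField(aliases[i]);
                fields[i].setAccessible(true);
            }
            List<PersonDto> result = new ArrayList<>(rows.size());
            for (Object[] row : rows) {
                PersonDto dto = new PersonDto();
                for (int i = 0; i < fields.length; i++) {
                    fields[i].set(dto, row[i]);
                }
                result.add(dto);
            }
            return result;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Alias does not match a DTO field", e);
        }
    }
}
```

Both methods produce the same list. The reflective version pays for one field lookup per alias up front and one reflective write per column afterwards.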

Changing Execution Parameters

By default, the database is loaded with 500,000 records, and each test is repeated in its own transaction 10 times. You can change both of these values in src/main/resources/application.properties.

Running Tests

A database this large takes up over 256MB of memory, so you might have to increase your heap space. If you run the tests from Maven, you should be fine, since I increase it in the plugin's settings.

Download the sample project and run the following command from inside its directory:

mvn clean test

The test might take several minutes to run. At the end, the test will output the results.

My Results

The results are listed slowest to fastest:

  ---------------------------
  Testing JpaRepository query
  Total # of runs: 10
  JpaRepository avg time: 1073.5ms

  ---------------------------
  Testing stateless session query
  Total # of runs: 10
  Stateless Session avg time: 818.5ms

  ---------------------------
  Testing RowData query
  Total # of runs: 10
  RowData avg time: 317.7ms

  ---------------------------
  Testing ResultTransformer query
  Total # of runs: 10
  ResultTransformer avg time: 311.9ms

The individual times will vary on different systems - the relative performance is what's important.

The StatelessSession query was about 24% faster than returning attached entities (818.5ms vs. 1073.5ms), but still pretty slow for a query this big. The AliasToBeanResultTransformer and my custom List -> DTO approaches effectively tied as the best performers, each running more than three times faster than the JpaRepository query. I was hoping for this result, but worried that the reflective nature of AliasToBeanResultTransformer might introduce some overhead. It did not, at least not measurably.

Problems? Let me know!

I tried being as careful as I could with these tests:

  • took the average of several test runs
  • turned off Hibernate's second-level cache
  • turned off Hibernate's query cache
  • cleared the entity manager before each run
  • ran each test in its own transaction
  • ignored the first query in the transaction, so as to avoid any initial performance hit from opening it
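The averaging and warm-up steps in the list above can be sketched roughly like this. It is a simplified plain-Java reading of the list, not the project's actual test code: QueryTimer and averageMillis are hypothetical names, and the task stands in for one query run (the real tests also handle transactions and clear the entity manager, which is left out here).

```java
class QueryTimer {

    // Runs the task once untimed as a warm-up, then times `runs` further
    // executions and returns their mean duration in milliseconds.
    static double averageMillis(int runs, Runnable task) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        task.run(); // warm-up run; its time is ignored

        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }
}
```

Calling QueryTimer.averageMillis(10, someQuery) would execute someQuery eleven times and report the average of the last ten.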

I encourage you to download the project, take a look at the code, and try it out for yourself. If you see any issues with my methodology, please let me know, and I'll correct for it.

Friday
Nov 8, 2013

Exercising Caution With Hibernate Entities

Take a look at my hibernate-perf-test sample project to see the relative performance of different Hibernate query strategies.

Background

I rely heavily on the Hibernate framework in all of my database-driven Java projects. It’s a full-featured cross-database ORM framework capable of managing your database schema, persisting your entities to the database, populating entities from the database, caching queries, caching entities, registering entities for lifecycle events, indexing them in a search index, and just about everything else you could ask of an ORM.

When you fetch an entity from Hibernate, it remains attached to your EntityManager for the duration of that transaction. The EntityManager is responsible for all of the magic - it watches the entity for changes, persisting it and calling lifecycle methods when appropriate, makes sure to return the same instance of an entity for multiple queries to the same record, builds and executes additional queries if you access properties that point to associations in the database, and of course much more.

Entities are Expensive!

Entities are convenient, but if you don’t understand what they’re doing, you can get yourself into trouble. It all boils down to the fact that an entity’s getter method may run several more queries, so long as the entity is still attached to the EntityManager. A developer can be excused at first, because we’re used to getters and setters containing little to no logic besides accessing a private member variable. Hibernate is an amazing framework, but its greatest strength, being easy to use, becomes a liability: it’s too easy to use it to thrash your database.

The most common and expensive errors that I see are due to:

Accidental “eager” wiring of associated entities

When you load a Customer that has a list of Orders that are wired “eagerly”, Hibernate will build and execute a query to fetch all of those Orders, even if the code never accesses that list. The Orders can have their own eager associations, which compounds this problem. This becomes performance-crippling when your User table is eagerly self-referencing, and just about every query loads up your entire database. This is the sort of issue you might not notice during development, with 10 records in your database, but believe me, it shows itself in production.

Too many queries by accessing an association while looping over entities

When you’re displaying a table of 20 User records, and each one needs the name of its Organization, the easiest path for a developer is to loop over the list of entities and access the “organization” property on each User. This is intuitive, and makes sense with how we think about objects - you have a User, the User has an Organization, and the Organization has a name. However, with Hibernate, you need to understand that a lazily-loaded Organization isn’t fetched until you access it. By fetching each User’s Organization this way, you’re producing up to 20 more queries, one for each Organization that hasn’t been loaded yet. Now, imagine if the user chooses a table size of 50 rows, or if Organization had some eagerly-loading associations to surprise you with (like another User record!). You’ve just killed your action.

This comes up a lot when a developer tries to do the right thing and adds a lot of logging. Even if we don’t need the Organization for the data table we’re building, the developer might think that someone who reads the logs might want to see each User’s organization. Those logs just became very expensive, and nobody will notice until your users are complaining about page load times in production.
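To make the query count concrete, here is a small plain-Java simulation of the loop described above. Nothing in it is Hibernate: FakeDatabase and LazyUser are hypothetical classes that only count how many "queries" are issued. It assumes every User belongs to a different Organization; in a real persistence context, an Organization that is already loaded would be reused rather than queried again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database that counts the queries it receives.
class FakeDatabase {
    int queryCount = 0;

    // One query for the list of users; organizations are NOT loaded here.
    List<LazyUser> findUsers(int count) {
        queryCount++;
        List<LazyUser> users = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            users.add(new LazyUser(this, "user" + i, i));
        }
        return users;
    }

    // One query per call, as a lazy association load would issue.
    String findOrganizationName(long organizationId) {
        queryCount++;
        return "org" + organizationId;
    }

    // One joined query returning {userName, organizationName} rows.
    List<String[]> findUserAndOrganizationNames(int count) {
        queryCount++;
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            rows.add(new String[] {"user" + i, "org" + i});
        }
        return rows;
    }
}

// Hypothetical user whose organization name is fetched on first access.
class LazyUser {
    private final FakeDatabase database;
    private final long organizationId;
    private String organizationName; // null until first access
    final String name;

    LazyUser(FakeDatabase database, String name, long organizationId) {
        this.database = database;
        this.name = name;
        this.organizationId = organizationId;
    }

    // Looks like a plain getter, but the first call hits the database.
    String getOrganizationName() {
        if (organizationName == null) {
            organizationName = database.findOrganizationName(organizationId);
        }
        return organizationName;
    }
}
```

Looping over findUsers(20) and calling getOrganizationName() on each element leaves queryCount at 21, while findUserAndOrganizationNames(20) returns the same information with queryCount at 1.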

Yes, there are better ways to use entities

If you’re familiar with the framework, you’re probably shaking your head, mumbling something about “fetch joins”, closing the transaction before building your view, and other strategies for more efficiently fetching entities. My point is that as much as I love entities, they’re performance time bombs. It takes one careless moment, or one developer who doesn’t know Hibernate as well as you do, to touch the wrong property and cause a big issue to ripple through your system.

Querying Carefully

Rather than maintain constant vigilance and make sure that every member of your team has read all of the documentation, I prefer to tread lightly with Hibernate. I fully embrace it for helping me generate my schema, for updating the database in an infrequent-write system, and for the query language that bridges different databases.

There are several ways you can use Hibernate to query your database, from full-on magic entities to getting your hands dirty for better efficiency. In a system where reads are common and writes rare, my approach is typically to use as little ‘magic’ as I can for my read-only queries. I don’t fetch attached entities from the database, but rather, select the specific fields that I need, and use the results to build detached data transfer objects (DTOs).

Aside from avoiding the big issues above, this also forces a developer to think more in terms of SQL, and where their data is coming from - it’s more explicit, and thus, more understandable. If you have a “dumb” User DTO, and want the name of that User’s Organization, you’re going to have to either run a query on each User, query them in bulk and then join the two sets of data, or add the Organization name to the original User query. It’s immediately obvious that the first option is ridiculous, and that the second one is annoying to write. The last one takes the least effort, and makes sense when you’re thinking in terms of SQL and database access.

Here’s another point to consider. Since it’s typically considered poor form to return your entities to the client or view, why not skip the entity-to-DTO conversion and generate the DTOs in the first place?

Coming Up: Performance Testing Different Query Approaches

There are several ways to use Hibernate to fetch data from your database, and no shortage of conversations online about the performance of these different approaches, as well as official documentation on the subject.

In my next post, I’ll walk you through a sample project that I wrote to test out different query strategies. If you’re interested, take a look for yourself. If you see an error in my methodology, please contact me.

Tuesday
Sep 3, 2013

Don't Rely on EntityManager.persist() for Immediate Insert

I had always counted on EntityManager's persist() method to immediately insert entities. I would rely on this when writing database integration tests - I'd persist some records, then test my DAO methods to find them.

On my current project, I decided to add a configuration option to allow me to run my database integration tests on my development Oracle database rather than my embedded HSQLDB test database - just for an extra sanity check. The tests that tried to persist() and then retrieve those new entities failed. Adding an entityManager.flush() call after the persist() invocations solved the issue.

...But why?

From en.wikibooks.org:

The EntityManager.persist() operation is used to insert a new object into the database. persist does not directly insert the object into the database, it just registers it as new in the persistence context (transaction). When the transaction is committed, or if the persistence context is flushed, then the object will be inserted into the database. If the object uses a generated Id, the Id will normally be assigned to the object when persist is called, so persist can also be used to have an object's Id assigned. The one exception is if IDENTITY sequencing is used, in this case the Id is only assigned on commit or flush because the database will only assign the Id on INSERT. If the object does not use a generated Id, you should normally assign its Id before calling persist.

Here's how I wire up my entities' primary key:

@Id
@GeneratedValue(strategy = GenerationType.AUTO)
private Long id;

For my embedded HSQLDB database, GenerationType.AUTO resolves to GenerationType.IDENTITY, which relies on the database to generate an auto-incrementing primary key for that row. This requires an insert, so persist() inserts immediately in HSQLDB.

Oracle, on the other hand, uses a cross-table GenerationType.SEQUENCE @Id generator, which doesn't require an insert, only the following SELECT:

select
    hibernate_sequence.nextval
from
    dual

This select is executed immediately on persist() so that the EntityManager has an ID to assign to the entity. The entity itself will only be inserted on a flush(), which is called automatically on transaction commit.

Long story short: If you're relying on your entity existing in the database after your call to persist(), but before the transaction commits, then call flush() first. Leave a comment justifying it, as manually calling flush is largely considered an anti-pattern akin to invoking the garbage collector. Deferring the flush gives Hibernate the chance to write its changes in more efficient batches.
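The difference between the two strategies can be modeled with a toy class. This is a plain-Java sketch of the behavior described above, not JPA or Hibernate code: ToyPersistenceContext, IdStrategy and rowExists are hypothetical names, and the two lists merely stand in for the pending inserts and the database table.

```java
import java.util.ArrayList;
import java.util.List;

enum IdStrategy { IDENTITY, SEQUENCE }

// Toy model of when a row reaches the table; not a real persistence context.
class ToyPersistenceContext {
    private final IdStrategy strategy;
    private long nextId = 1;
    private final List<Long> pendingInserts = new ArrayList<>();
    private final List<Long> tableRows = new ArrayList<>();

    ToyPersistenceContext(IdStrategy strategy) {
        this.strategy = strategy;
    }

    // Assigns an id right away; only IDENTITY writes the row right away.
    long persist() {
        long id = nextId++;
        if (strategy == IdStrategy.IDENTITY) {
            tableRows.add(id); // the INSERT itself is what produces the id
        } else {
            pendingInserts.add(id); // id came from the sequence; INSERT waits
        }
        return id;
    }

    // Writes every pending row, as a flush or commit would.
    void flush() {
        tableRows.addAll(pendingInserts);
        pendingInserts.clear();
    }

    // Stands in for a query that goes to the database.
    boolean rowExists(long id) {
        return tableRows.contains(id);
    }
}
```

With IdStrategy.SEQUENCE, rowExists(id) is false directly after persist() and true after flush(). With IdStrategy.IDENTITY it is true straight away, which is why the tests passed on HSQLDB and failed on Oracle.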

Tuesday
Sep 3, 2013

Object/Relational Mapping: Know Your Frameworks

I've been working with Hibernate for several years now, yet I learn something new about it all the time. The more time I spend with the framework, the more concerned I am about how it will be used by developers new to it.

Mirko Novakovic and Alois Reitbauer nail it in a post about O/R Mapping Anti-Patterns:

The simplicity of the entrance into the world of O/R mapping however gives a wrong impression of the complexity of these frameworks. Working with more complex applications you soon realize that you should know the details of framework implementation to be able to use them in the best possible way. In this article, we describe some common anti-patterns which may easily lead to performance problems.

This is an echo of Joel Spolsky's warning about the Law of Leaky Abstractions:

The law of leaky abstractions means that whenever somebody comes up with a wizzy new code-generation tool that is supposed to make us all ever-so-efficient, you hear a lot of people saying "learn how to do it manually first, then use the wizzy tool to save time." Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don't save us time learning.

Don't stop learning about a framework once you figure out how to use it - that's only the beginning.