Monday, February 23, 2015

Performance Pitfalls: Hibernate Query Plan Cache Thrashing

Hibernate internally uses a cache that maps HQL statements (as strings) to query plans. The cache is a bounded map limited by default to 2048 entries (configurable). All HQL queries are loaded through this cache; in case of a miss, the entry is automatically added to the cache. This makes the cache very susceptible to thrashing - a scenario in which new entries are constantly put into the cache without ever being reused, so the cache brings no performance gain (it even adds some cache management overhead). To make things worse, this situation is hard to detect by chance - you have to explicitly profile the cache to notice that you have a problem there. I will say a few words on how this can be done later on.
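
For completeness, here is a minimal sketch of how the limit could be raised while bootstrapping JPA. The property name hibernate.query.plan_cache_max_size, the value and the persistence unit name are assumptions - check org.hibernate.cfg.AvailableSettings (or Environment) for your Hibernate version:

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

// Hypothetical bootstrap snippet: enlarge the query plan cache (default: 2048 entries).
Map<String, Object> props = new HashMap<String, Object>();
props.put("hibernate.query.plan_cache_max_size", "4096");
EntityManagerFactory emf = Persistence.createEntityManagerFactory("my-persistence-unit", props);

Keep in mind that a bigger cache only helps if the queries are eventually reused - it does not fix thrashing caused by queries that are unique on every execution.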

So the cache thrashing results from new queries being generated at high rates. This can be caused by a multitude of issues. The two most common that I have seen are bugs in Hibernate that cause parameters to be rendered into the JPQL statement instead of being passed as bind parameters, and the use of an "in" clause.

Due to some obscure bugs in Hibernate, there are situations in which parameters are not handled correctly and are rendered into the JPQL query (as an example, check out HHH-6280). If a query affected by such a defect is executed at high rates, it will thrash your query plan cache because each generated JPQL query is almost unique (containing IDs of your entities, for example).
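
To make the effect concrete, here is a hypothetical illustration (not the actual query from HHH-6280) of what ends up as keys in the query plan cache:

// With a properly bound parameter, every execution maps to the same cache entry:
String bound = "select p from Person p where p.company.id = :companyId";
// If a bug renders the value into the query text, every distinct ID creates a new entry:
String rendered1 = "select p from Person p where p.company.id = 17";
String rendered2 = "select p from Person p where p.company.id = 18"; // ...and so on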

The second issue lies in the way Hibernate processes queries with an "in" clause (e.g. give me all person entities whose company id field is one of 1, 2, 10, 18). For each distinct number of parameters in the "in" clause, Hibernate produces a different query - e.g. select x from Person x where x.company.id in (:id0_) for 1 parameter, select x from Person x where x.company.id in (:id0_, :id1_) for 2 parameters and so on. As far as the query plan cache is concerned, all these queries are different, resulting again in cache thrashing. You could work around this issue by writing a utility class that pads the parameter list to a fixed set of sizes - e.g. 1, 10, 100, 200, 500, 1000. If you pass 22 parameters, for example, it returns a list of 100 elements containing the 22 actual parameters, with the remaining 78 set to an impossible value (e.g. -1 for IDs used as foreign keys). I agree that this is an ugly hack, but it gets the job done: you end up with at most 6 unique variants of the query in your cache, which greatly reduces thrashing.
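
A minimal sketch of such a padding utility, assuming Long IDs, the bucket sizes mentioned above and -1 as the value that never matches a real row:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Pads an "in"-clause parameter list to the next fixed bucket size so that only a
// handful of distinct query shapes ever reach the query plan cache.
public final class InClausePadder {

    private static final int[] BUCKETS = {1, 10, 100, 200, 500, 1000};

    private InClausePadder() {
    }

    public static List<Long> pad(Collection<Long> ids) {
        for (int bucket : BUCKETS) {
            if (ids.size() <= bucket) {
                List<Long> padded = new ArrayList<Long>(bucket);
                padded.addAll(ids);
                while (padded.size() < bucket) {
                    padded.add(-1L); // assumed to never match an existing foreign key
                }
                return padded;
            }
        }
        throw new IllegalArgumentException("Too many ids: " + ids.size());
    }
}

The query is then executed with the padded list, e.g. query.setParameter("companyIds", InClausePadder.pad(companyIds)) - the duplicated -1 values are harmless in an "in" clause.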

So how do you find out that you have this issue? You could write some additional code and expose a metric with the number of entries in the cache (e.g. over JMX), tune the logging and analyze the logs, etc. If you do not want to (or cannot) modify the application, you can simply dump the heap and run this OQL query against it (e.g. using Eclipse MAT):
SELECT l.query.toString() FROM INSTANCEOF org.hibernate.engine.query.spi.QueryPlanCache$HQLQueryPlanKey l

It will output all queries currently located in any query plan cache on your heap. It should be pretty easy to spot whether you are affected by any of the aforementioned problems.

As far as the performance impact goes, it is hard to quantify as it depends on too many factors. I have seen a very trivial query cause 10-20 ms of overhead just for creating a new HQL query plan. In general, if there is a cache somewhere, there is a good reason for it - a miss is probably expensive, so you should try to avoid misses as much as possible. Last but not least, your database will have to handle a large number of unique SQL statements too, forcing it to parse each one and possibly create a different execution plan for every one of them.

Sunday, February 22, 2015

Performance Pitfalls - Hibernate Criteria API

The criteria API is a way to programmatically create dynamic JPA queries in a type-safe manner. It is meant to be used when the JPA query cannot be statically defined up front because the actual query depends on input provided by the user - e.g. a search form with many optional filter conditions that can be combined in any possible way.
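
As a reminder of what this looks like in code, here is a generic JPA example (Person, its attributes and the filters are illustrative, not taken from a real application):

import java.util.ArrayList;
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Predicate;
import javax.persistence.criteria.Root;

// Builds a Person query from optional filters - the classic use case for the criteria API.
public List<Person> findPersons(EntityManager em, String nameFilter, Long companyIdFilter) {
    CriteriaBuilder cb = em.getCriteriaBuilder();
    CriteriaQuery<Person> query = cb.createQuery(Person.class);
    Root<Person> person = query.from(Person.class);

    List<Predicate> predicates = new ArrayList<Predicate>();
    if (nameFilter != null) {
        predicates.add(cb.like(person.<String>get("name"), nameFilter + "%"));
    }
    if (companyIdFilter != null) {
        predicates.add(cb.equal(person.get("company").get("id"), companyIdFilter));
    }
    query.select(person).where(predicates.toArray(new Predicate[predicates.size()]));

    return em.createQuery(query).getResultList();
}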

In Hibernate, the criteria API is used to create a graph of Hibernate objects (implementations of the org.hibernate.Criteria interface and related objects) representing the required query. This object graph is then internally transformed into a JPQL query (basically a string), which is in turn parsed and mapped to a SQL query. It is interesting to note that Hibernate does not map the criteria object graph directly to SQL but goes through an intermediate JPQL query string. This process is repeated every time a criteria API query is executed.

Obviously, this process costs a lot of CPU cycles each time a query is executed. When such queries are run very often, this alone can already be a prohibitive factor. Unfortunately, it gets even worse.

The even more concerning issue lies in the process of transforming JPQL queries to SQL. Hibernate tries to load some of the tokens in the JPQL query as classes in order to determine whether they actually are classes. This results in Hibernate calling ClassLoader.loadClass() with arguments like "a0", "b1", etc.

Class-loaders are organized in hierarchies. In an enterprise application, a WAR class-loader typically has a parent EAR class-loader, whose parent is the application server class-loader, and so on. The default behaviour of a class-loader, when asked for a class, is to first try to load the class through its parent; only if that fails does it try to load the class itself. As a result of this strategy, trying to load a non-existent class is pretty much the worst-case scenario for a class-loader - it has to look for the class in the root class-loader and in every class-loader down the hierarchy, only to finally determine that the class is unknown and throw a ClassNotFoundException. This has two issues. First, class-loaders normally do not cache such negative results. When a class is found, some class-loaders might remember (cache) that, so that subsequent calls are much faster; missing classes are usually not cached. This means that looking for a non-existent class (even the same one) over and over again always hits this worst-case scenario, causing each class-loader to go through all of its classes. The second issue lies in the way some class-loaders (e.g. the WebLogic application server class-loader) handle this: they assume that class loading does not happen that often and use very coarse-grained locking - loading a class basically locks the class-loader for the whole search process.
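
The worst-case behaviour is easy to reproduce outside of Hibernate; a purely illustrative snippet:

// Asking a class-loader for a name that is not a class forces the full parent-first
// search before a ClassNotFoundException is thrown - and the negative result is
// usually not cached, so the next call is just as expensive.
ClassLoader loader = Thread.currentThread().getContextClassLoader();
try {
    loader.loadClass("a0"); // one of the alias-like tokens mentioned above
} catch (ClassNotFoundException e) {
    // reached only after the whole class-loader hierarchy has been searched
}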

Let's get back to our criteria API query. An enterprise application obviously has a single WAR class-loader instance (and the same class-loaders up the hierarchy). Each criteria API query tries to acquire the lock on that same class-loader while checking the JPQL tokens. This results in all criteria API queries contending for the same locks - high lock contention in the class-loader and long synchronization times follow.

In this blog post we saw that one has to be willing to pay a very high price to use the criteria API with Hibernate. Depending on the performance requirements, this might be acceptable for some applications. For others, however, it is absolutely prohibitive, and any high-load code path must not use the criteria API. My advice as a performance engineer is to avoid the criteria API as much as possible. In many cases it is possible to use a few named queries (which, by the way, go through the same JPQL-to-SQL process, but only once at start-up) or even to fall back to native SQL or other technologies.
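
For the common case where the possible filter combinations are limited, a couple of named queries can replace a criteria query; a minimal sketch (entity, association and query name are illustrative):

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.NamedQuery;

// The JPQL below is parsed and translated only once, when the persistence unit starts up.
@Entity
@NamedQuery(name = "Person.byCompany",
            query = "select p from Person p where p.company.id = :companyId")
public class Person {

    @Id
    private Long id;

    @ManyToOne
    private Company company; // Company is another illustrative entity
}

// Usage (em is an open EntityManager):
List<Person> persons = em.createNamedQuery("Person.byCompany", Person.class)
                         .setParameter("companyId", companyId)
                         .getResultList();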