NOTE: This is a work in progress and is meant to be updated continually with new scenarios, so that we have a single place for best practices and strategies regarding ORM persistence problems and their solutions in Seam/JPA/Hibernate apps. Much of the material in this article is based on chapter 13 of the JPA/Hibernate book, "Optimizing fetching and caching", and on Chapter 19, "Improving performance", of the Hibernate Community Documentation.
In any database-driven application, the goal is always to minimize database roundtrips. Ted Neward stated this over and over again in my Intro to J2EE DevelopMentor course several years ago. It's generally accepted for n-tier web architectures that the database is the least scalable part of your system.
In Seam apps, we need to minimize db roundtrips, avoid Hibernate LazyInitializationExceptions (LIEs), and avoid both the n+1 selects problem and the Cartesian product problem.
How best can we achieve these goals?
The purpose of this article is to summarize the problems and some general guidelines for optimizing performance in your Seam apps while using the Seam-managed Persistence Context (SMPC) to solve the LIEs problem.
Your goal is to find the best retrieval method and fetching strategy for every use case in your application; at the same time, you also want to minimize the number of SQL queries for best performance.
pg. 561 JPA/Hibernate
default fetch plan: the fetch plan, declared in the mapping metadata, that applies to a particular entity association or collection unless a query overrides it.
fetching strategy: a strategy used by Hibernate to retrieve associated objects if the application needs to navigate the association. Fetch strategies can be declared in the O/R mapping metadata, or overridden by a particular HQL or Criteria query.
Join fetching: Hibernate retrieves the associated instance or collection in the same SELECT, using an OUTER JOIN.
Select fetching: a second SELECT is used to retrieve the associated entity or collection. Unless you explicitly disable lazy fetching by specifying lazy="false", this second select will only be executed when you access the association.
Subselect fetching: a second SELECT is used to retrieve the associated collections for all entities retrieved in a previous query or fetch. Unless you explicitly disable lazy fetching by specifying lazy="false", this second select will only be executed when you access the association.
Batch fetching: an optimization strategy for select fetching. Hibernate retrieves a batch of entity instances or collections in a single SELECT by specifying a list of primary or foreign keys.
LazyInitializationException (LIE): A LazyInitializationException will be thrown by Hibernate if an uninitialized collection or proxy is accessed outside of the scope of the Session, i.e., when the entity owning the collection or having the reference to the proxy is in the detached state.
Cartesian product: produces a new table consisting of all possible combinations of rows of two existing tables. In SQL, you express a Cartesian product by listing tables in the from clause: select * from ITEM i, BID b
Seam-managed persistence context (SMPC): A Seam-managed persistence context is just a built-in Seam component that manages an instance of EntityManager or Session in the conversation context. You can inject it with @In. It prevents LIEs from occurring because the EntityManager or Session is not closed until the end of the conversation (entities do not become detached prematurely).
Second Level Cache: a Hibernate feature that allows entities, collections and queries to be cached in memory beyond the scope of a transaction, reducing the number of trips to the database.
LazyInitializationExceptions occur when Hibernate attempts to lazily load an uninitialized proxy or collection after the Session (or persistence context) has been closed. The two classical solutions to this problem are the Open Session in View (OSIV) pattern, implemented with a servlet filter, and simply loading your entities eagerly.
Because Seam is a stateful framework supporting conversations that span multiple screens or HTTP request/response cycles, it's important to keep the persistence context (PC) open until the long-running conversation (LRC) has ended. If the PC were closed after each request/response cycle, then when object navigation occurs in the JSF EL, for example, Hibernate may throw a LIE because it is trying to access an uninitialized proxy or collection after the PC has already been closed.
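The failure mode described above can be imitated in plain Java, without Hibernate or Seam on the classpath. The following is only a sketch under stated assumptions: `FakeSession`, `LazyRef` and the `IllegalStateException` are hypothetical stand-ins for a Hibernate Session, an uninitialized proxy and a LazyInitializationException. It shows why navigation fails when the context is closed after the request, and succeeds when the context stays open for the whole conversation.

```java
import java.util.function.Supplier;

// Simulation only: none of these types are Hibernate or Seam classes.
class LazyInitSketch {

    // Stand-in for a Session / persistence context that can be closed.
    static final class FakeSession {
        private boolean open = true;
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    // Stand-in for an uninitialized proxy: loads its target on first access.
    static final class LazyRef<T> {
        private final FakeSession session;
        private final Supplier<T> loader;
        private T value;
        private boolean initialized;

        LazyRef(FakeSession session, Supplier<T> loader) {
            this.session = session;
            this.loader = loader;
        }

        T get() {
            if (!initialized) {
                if (!session.isOpen()) {
                    // Hibernate would throw LazyInitializationException at this point.
                    throw new IllegalStateException("could not initialize proxy - no Session");
                }
                value = loader.get();
                initialized = true;
            }
            return value;
        }
    }

    // Returns true if navigating the reference after the "request" fails.
    static boolean failsAfterRequest(boolean closeSessionAfterRequest) {
        FakeSession session = new FakeSession();
        LazyRef<String> orders = new LazyRef<>(session, () -> "orders of customer 1");
        if (closeSessionAfterRequest) {
            session.close(); // context scoped to a single request
        }
        try {
            orders.get();    // navigation during a later request or render phase
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("closed after request, fails: " + failsAfterRequest(true));
        System.out.println("open for the conversation, fails: " + failsAfterRequest(false));
    }
}
```

The second call succeeds only because the simulated context is still open when the reference is first navigated, which is the property a conversation-scoped persistence context provides.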
In Seam, the recommended solution to the LIE problem is to use an SMPC. The SMPC is conversation-scoped and thus remains open until your LRC ends. In contrast, the EJB container-managed persistence context is either transaction-scoped (the default) or component-scoped (extended). Instead of injecting your EntityManager instance using @PersistenceContext, you use @In, and the Seam container injects the EntityManager instance into your Seam component via the manager component pattern (i.e. @Unwrap) from the org.jboss.seam.persistence.ManagedPersistenceContext core class. This means that every component that takes part in the LRC uses the same SMPC. Note that it is possible to inject more than one SMPC into the same component, with each SMPC associated with a different EntityManagerFactory, but that is outside the scope of this discussion; we are assuming one local RDBMS per Seam app.
To describe the n+1 selects problem, let's consider an example. We have a Customer entity. Each Customer has one or more Order entities (i.e. a collection). We also have a Product entity, and each Order may have a collection of Products.
Via dot-notation in a JSF EL expression in a JSF page, we may see something like this:
<h:outputText value="#{myBean.customer.orders.products}"/>
If we are using Hibernate as our persistence provider, and assuming all relationships are LAZY by default and we do not override the global fetch plan using fetch join syntax in our HQL or JPQL query, there will be n+1 selects to get our final value of products.
The following three methods in our entity classes will be executed:
getCustomer(), getOrders(), getProducts()
The n+1 selects problem is a problem because of the excessive number of db roundtrips: after the first select, one extra select is executed for each of the n entities whose lazy association is navigated.
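To make the roundtrip count concrete, here is a small plain-Java simulation. It is a sketch, not Hibernate code: `FakeDb` and its methods are hypothetical and merely count how many SELECT-like calls are made when the products of each order are loaded one order at a time, compared with a single joined query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulation only: FakeDb counts "SELECT" calls; it is not a Hibernate API.
class NPlusOneSketch {

    static final class FakeDb {
        int selects;
        private final Map<Integer, List<String>> productsByOrder = new LinkedHashMap<>();

        FakeDb(int orderCount) {
            for (int id = 1; id <= orderCount; id++) {
                productsByOrder.put(id, Arrays.asList("product-" + id + "a", "product-" + id + "b"));
            }
        }

        // One SELECT returning the ids of all orders of a customer.
        List<Integer> selectOrderIds() {
            selects++;
            return new ArrayList<>(productsByOrder.keySet());
        }

        // One SELECT per order: what a lazy collection does on first access.
        List<String> selectProducts(int orderId) {
            selects++;
            return productsByOrder.get(orderId);
        }

        // One SELECT with a join: orders and their products together.
        Map<Integer, List<String>> selectOrdersJoinProducts() {
            selects++;
            return productsByOrder;
        }
    }

    // Lazy navigation: 1 select for the orders, then 1 select per order.
    static int lazySelects(int orderCount) {
        FakeDb db = new FakeDb(orderCount);
        for (int orderId : db.selectOrderIds()) {
            db.selectProducts(orderId);
        }
        return db.selects;
    }

    // Join fetching: everything arrives in a single select.
    static int joinFetchSelects(int orderCount) {
        FakeDb db = new FakeDb(orderCount);
        db.selectOrdersJoinProducts();
        return db.selects;
    }

    public static void main(String[] args) {
        System.out.println("lazy selects for 10 orders: " + lazySelects(10));
        System.out.println("join fetch selects for 10 orders: " + joinFetchSelects(10));
    }
}
```

With 10 orders the lazy path issues 11 simulated selects and the joined path issues 1, which is the n+1 pattern in miniature.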
There are a few approaches you can take to remedy this problem, one of which is to use an EAGER fetch plan via fetch join syntax in your queries. The opposite problem with this approach is that you may end up loading too much data into your SMPC, thus taking up more memory on the server and also taking a performance hit to load extra, possibly unneeded data.
So there needs to be a balance.
The opposite of the n+1 selects problem is a SELECT statement that fetches too much data. This Cartesian product problem always appears if you try to fetch several parallel collections. More than one eagerly fetched collection per persistent class creates a product.
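The growth of the result set can be illustrated with a few lines of plain Java. This is a sketch with assumed data: an Item with a collection of bids and a second, hypothetical collection of images, both fetched in the same joined SELECT. The nested loop mimics what the join returns when both collections are non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulation only: builds the rows a SQL join would return; no database is involved.
class CartesianProductSketch {

    // Rows returned when one ITEM row is joined with two independent collections.
    // Assumes both collections are non-empty.
    static List<String> joinRows(String item, List<String> bids, List<String> images) {
        List<String> rows = new ArrayList<>();
        for (String bid : bids) {
            for (String image : images) {
                rows.add(item + " | " + bid + " | " + image);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> bids = Arrays.asList("bid-1", "bid-2", "bid-3");
        List<String> images = Arrays.asList("img-1", "img-2", "img-3", "img-4");
        List<String> rows = joinRows("item-1", bids, images);
        // 3 bids and 4 images are only 7 collection elements, yet the join returns 3 * 4 rows.
        System.out.println("rows returned: " + rows.size());
    }
}
```

Every bid is repeated once per image, so most of the transferred data is redundant and has to be de-duplicated in memory.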
begin Strategies...
Batch fetching
If you have a Foo proxy that must be initialized, initialize several in the same SELECT. Batch fetching is called a blind-guess optimization because you don't know how many uninitialized Foo proxies may be in a particular persistence context. You make a guess and apply a batch-size fetching strategy to your Foo class mapping.
A better optimization is subselect fetching for a collection mapping. Hibernate now initializes all bars collections for all loaded Foo objects, as soon as you force the initialization of one bars collection.
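The trade-off between these strategies comes down to simple arithmetic. The sketch below is an idealized model, not a measurement: it assumes that n Foo instances are loaded by one initial select, that every one of their lazy collections is eventually accessed, and that each batch select initializes exactly `batchSize` collections (the actual batching algorithm may split batches differently). The method names are hypothetical.

```java
// Idealized model of how many SELECTs each fetching strategy needs.
class FetchCountSketch {

    // Plain select fetching: 1 initial select plus 1 per accessed collection.
    static int selectFetching(int n) {
        return 1 + n;
    }

    // Batch fetching: 1 initial select plus ceil(n / batchSize) batch selects.
    static int batchFetching(int n, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        return 1 + (n + batchSize - 1) / batchSize;
    }

    // Subselect fetching: 1 initial select plus 1 select for all collections at once.
    static int subselectFetching(int n) {
        return n == 0 ? 1 : 2;
    }

    public static void main(String[] args) {
        int n = 25;
        System.out.println("select fetching:    " + selectFetching(n));
        System.out.println("batch fetching(10): " + batchFetching(n, 10));
        System.out.println("subselect fetching: " + subselectFetching(n));
    }
}
```

For 25 loaded instances the model gives 26 selects for plain select fetching, 4 with a batch size of 10, and 2 with subselect fetching.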
"Every time I need an Item, I also need the seller of that Item."
If you can make that statement, you should go into your mapping metadata, enable eager fetching for the seller association, and utilize SQL joins. Hibernate will now load both an Item and its seller in a single SQL statement.
If you switch from the default strategy to queries that eagerly fetch data with joins, you may run into another problem, the Cartesian product issue. Instead of executing too many SQL statements, you may now (often as a side effect) create statements that retrieve too much data. You need to find the middle ground between the two extremes: the correct fetching strategy for each procedure and use case in your application.
In most cases, an effective caching strategy is critical to successfully tuning the performance of a Seam application that uses Hibernate as its persistence provider. For example, if you have read-mostly or read-only data (e.g. the 50 US states, zip codes, area codes, etc.), then these are good candidates for the Hibernate 2nd level cache, with the goal being to minimize db roundtrips when possible.
Hibernate uses different types of caches. Each type of cache is used for different purposes.
1) The first cache type is the session cache. The session cache caches objects within the current session. This is analogous to the first-level cache provided by JPA's persistence context.
2) The second cache type is the query cache. The query cache is responsible for caching queries and their results.
3) The third cache type is the second level cache. The second level cache is responsible for caching objects across sessions.
Here is a good article describing how to set up a 2nd level cache using EHCache (you must select a cache provider):
http://solutionsfit.com/blog/2009/04/06/second-level-caching-still-an-effective-performance-tuning-technique/
In addition, you may also want to look into using query caches for your queries. Query caches require a 2nd level cache and cache provider to be configured and set up.
This is a good article on these topics as well: http://www.javalobby.org/java/forums/t48846.html
The persistence manager's persistence context serves as a first-level cache. When a JPQL/HQL query loads entities into the persistence context, they are in managed state. The persistence manager makes a copy of the loaded data to enable dirty checking (for transparent updates at flush time). Because the entity has been loaded into memory, it will be available for the duration of the life of the persistence context (i.e. until it is closed).
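The two mechanisms in this paragraph, the identity map and the snapshot used for dirty checking, can be sketched in plain Java. This is a simplified simulation with hypothetical names (`FakeContext`, `Item`); it models lookup by identifier only and is not how Hibernate is implemented internally.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation only: a toy persistence context with an identity map and snapshots.
class FirstLevelCacheSketch {

    static final class Item {
        final long id;
        String name;
        Item(long id, String name) { this.id = id; this.name = name; }
    }

    static final class FakeContext {
        int dbReads;
        private final Map<Long, Item> managed = new HashMap<>();
        private final Map<Long, String> snapshots = new HashMap<>();

        // Lookup by identifier: hits the "database" only on the first call per id.
        Item find(long id) {
            Item cached = managed.get(id);
            if (cached != null) {
                return cached;
            }
            dbReads++;
            Item loaded = new Item(id, "item-" + id); // pretend this row came from the db
            managed.put(id, loaded);
            snapshots.put(id, loaded.name);           // copy of the loaded state
            return loaded;
        }

        // Dirty check: counts managed entities whose state differs from the snapshot.
        int dirtyCount() {
            int dirty = 0;
            for (Item item : managed.values()) {
                if (!item.name.equals(snapshots.get(item.id))) {
                    dirty++;
                }
            }
            return dirty;
        }
    }

    public static void main(String[] args) {
        FakeContext ctx = new FakeContext();
        Item first = ctx.find(1L);
        Item second = ctx.find(1L);
        System.out.println("same instance: " + (first == second));
        System.out.println("db reads: " + ctx.dbReads);
        first.name = "renamed";
        System.out.println("dirty entities at flush time: " + ctx.dirtyCount());
    }
}
```

The second lookup returns the same managed instance without another read, and the changed name is detected by comparing against the snapshot rather than by an explicit update call.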
Using an SMPC and Hibernate's manual flush mode, you can queue up your CUD (create, update, delete) operations while the user is executing their use case in the various JSF pages, and then flush the persistence context manually when the conversation ends. This avoids unnecessary round trips to the db during the conversation and thus serves as a cache.
A completely different approach to problems with n+1 selects is to use the second-level cache. A second-level cache is useful for read-only or read-mostly data. Examples include states, zip codes, credit card types, etc. You must choose a cache provider. Examples include Terracotta, Ehcache and JBoss Cache. Then you must configure your persistence-unit in persistence.xml as well as add appropriate metadata for your entities that need to be cached (typically using @Cache at the entity class level). Here is an example:
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
The advantages of L2 caching are:
- avoids database access for already loaded entities
- faster for reading frequently accessed unmodified entities
The disadvantages of L2 caching are:
- memory consumption for a large number of objects
- stale data for updated objects
- concurrency for writes (optimistic lock exception, or pessimistic lock)
- bad scalability for frequently or concurrently updated entities
You should configure L2 caching for entities that are:
- read often
- modified infrequently
- not critical if stale
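Both the benefit and the stale-data risk listed above can be seen in a small read-through cache written in plain Java. This is a sketch under stated assumptions: `FakeDb` and `SharedCache` are hypothetical stand-ins for the database and for a second-level cache shared by all sessions, and no expiration or invalidation is modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simulation only: a read-through cache shared across "sessions".
class SecondLevelCacheSketch {

    static final class FakeDb {
        int reads;
        private final Map<String, String> stateNames = new HashMap<>();

        FakeDb() {
            stateNames.put("CA", "California");
        }

        String read(String code) {
            reads++;
            return stateNames.get(code);
        }

        // A write that bypasses the cache, e.g. made by another application.
        void write(String code, String name) {
            stateNames.put(code, name);
        }
    }

    static final class SharedCache {
        private final Map<String, String> entries = new ConcurrentHashMap<>();

        // Returns the cached value, reading from the db only on a cache miss.
        String get(String code, FakeDb db) {
            return entries.computeIfAbsent(code, db::read);
        }
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb();
        SharedCache cache = new SharedCache();

        cache.get("CA", db); // first session: cache miss, one db read
        cache.get("CA", db); // second session: cache hit, no db read
        System.out.println("db reads: " + db.reads);

        db.write("CA", "Calif.");
        // Without expiration or invalidation the cache still serves the old value.
        System.out.println("cached value after update: " + cache.get("CA", db));
    }
}
```

The second lookup costs no database read, but after the out-of-band update the cache keeps returning the old name, which is why this kind of caching suits data that is read often, modified infrequently and not critical if stale.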
You should protect any data that can be concurrently modified with a locking strategy:
- handle optimistic lock failures on flush/commit
- configure expiration and refresh policies to minimize lock failures
The query cache is useful for queries that are run frequently with the same parameters, against tables that are not modified.
<s:cache>
Use this Seam tag to cache JSF page fragments.
The following is a list of references used while researching and writing this article.
Java Persistence with Hibernate by C. Bauer, G. King
Seam in Action by D. Allen
Seam Framework: Experience the Evolution of Java EE (2nd Edition) by M. Yuan, J. Orshalick, T. Heute
Hibernate: Understanding Lazy Fetching http://www.javalobby.org/java/forums/t20533.html
Second-level caching: Still an effective performance tuning technique http://solutionsfit.com/blog/2009/04/06/second-level-caching-still-an-effective-performance-tuning-technique/
Hibernate: Truly Understanding the Second-Level and Query Caches http://www.javalobby.org/java/forums/t48846.html
Understanding Caching in Hibernate – Part One : The Session Cache http://blog.dynatrace.com/2009/02/16/understanding-caching-in-hibernate-part-one-the-session-cache/
Understanding Caching in Hibernate – Part Two : The Query Cache http://blog.dynatrace.com/2009/02/16/understanding-caching-in-hibernate-part-two-the-query-cache/
Understanding Caching in Hibernate – Part Three : The Second Level Cache http://blog.dynatrace.com/2009/03/24/understanding-caching-in-hibernate-part-three-the-second-level-cache/
With Read-Write hibernate 2nd level Cache strategy, why use transactional one at all http://clustermania.blogspot.com/2009/07/with-read-write-hibernate-2nd-level.html
MySQL for Developers http://blogs.sun.com/carolmcdonald/entry/mysql_for_developers
JPA Performance, Don't Ignore the Database http://weblogs.java.net/blog/caroljmcdonald/archive/2009/08/28/jpa-performance-dont-ignore-database-0
JPA Caching http://weblogs.java.net/blog/caroljmcdonald/archive/2009/08/jpa_caching.html
Hibernate reference: 19.1 fetching strategies http://docs.jboss.org/hibernate/stable/core/reference/en/html/performance.html#performance-fetching
Hibernate Community Documentation: Chapter 19. Improving performance http://docs.jboss.org/hibernate/stable/core/reference/en/html/performance.html
Seam reference documentation http://docs.jboss.org/seam/2.2.0.GA/reference/en-US/html_single/
Ehcache User Guide http://ehcache.org/EhcacheUserGuide.html
Using JBoss Cache 2 as a Hibernate Second Level Cache http://galder.zamarreno.com/wp-content/uploads/2008/09/hibernate-jbosscache-guide.pdf
Open Session in View Pattern https://www.hibernate.org/43.html
SMPC and ORM problems discussion http://seamframework.org/Community/SeamAProductInInfancyOrJustAnotherFramework#comment92557