Get unique results from joined queries
Problem:
A recurring issue that new NHibernate users keep asking about, is the fact that they sometimes get duplicate results from their queries
For example, take a look at the next model:
where Comment.Name refer to the name of the person who wrote the comment.
If you'd want to issue a query on Post that joins to Comment, say "all of the posts that were commented by 'Ken Egozi'", the next hql query should do the trick:
select pfrom Post pjoin p.Comments cwhere c.Name = 'Ken Egozi'";
var hql = @"
select p
from Post p
join p.Comments c
where c.Name = 'Ken Egozi'";
var posts = session.CreateQuery(hql)
.List<post>();
Assuming the next data is in the DB:
Post#1
Comment by 'Ken Egozi' Comment by 'Fabio Maulo' Comment by 'Ken Egozi' Post#2 Comment by 'Fabio Maulo' Post#3 Comment by 'Fabio Maulo' Comment by 'Ken Egozi'
you'd expect the list 'posts' to include two posts: Post#1 and Post#3.
However, if you'd look into posts, you'll see that it has three entries, two entries pointing to Post#1 and one entry pointing to Post#3 BuyTermPaperANDBuyResearchPaper
Note that NHibernate is smart enough not to create two separate instances for Post#1, but rather a single instance will be created, and the two entries in the list will point to the same instance in memory. BuyThesisANDBuyDissertationANDBuyEssay
Explanation:
The said hql query (or an equivalent Criteria) is translated to the joined SELECT clause
SELECT {p.*}FROM Posts pJOIN Comments c ON c.PostId = p.IdWHERE c.[Name] = 'Ken Egozi'
where {p.*} is substituted by the list of fields needed to satisfy a Post object. Running this SQL query directly on the data detailed above, will result with three lines in the resultset.
Solutions
There are three solutions to that in SQL world:
- The DISTINCT keyword, when added on the fields of Posts, will ensure that there will be no duplicate lines. there are a few problems with that. Some RDBMS systems will not allow DISTINCT over certain data types, like the blob types in SQL Server. And if you also need to retrieve the joined data, then the DISTINCT keyword will be useless.
- Using a subselect
SELECT *FROM Posts pWHEREEXISTS (SELECT 1FROM Comments cWHERE c.PostId = p.IdAND c.[Name] = 'Ken Egozi' )
- Remove duplicates in the client code.
The solutions in NHibernate's world, in accordance to the SQL solutions:
- Use the 'distinct' keyword.
The problems that exists in the SQL world still apply, and it can only be used in HQL, not it Criteria API. - Use subselects.
The syntax in HQL:and in Criteria API:from Post pwhereexists (from Comment cwhere c.Post = pand c.Name = 'Ken Egozi' )
var commentsByKenEgozi = DetachedCriteria.For<comment>() .Add(Restrictions.Eq("Name", "Ken Egozi")) .Add(Restrictions.EqProperty("Post.Id", "p.Id")) .SetProjection(NHibernate.Criterion.Projections.Id()); var posts = session.CreateCriteria(typeof (Post), "p") .Add(Subqueries.Exists(commentsByKenEgozi)) .List<post>();</comment>
- Remove duplicates in client code.
That's actually pretty easy with NHibernate, thanks to the result transformers. all you need to do is add .SetResultTransformer(new DistinctRootEntityResultTransformer()) to either your query, or criteria, and NHibernate will remove duplicate entries from the resulted list.
so, using the same hql as in the first example, you can writevar hql = @" select p from Post p join p.Comments c where c.Name = 'Ken Egozi'"; var posts = session.CreateQuery(hql) .SetResultTransformer(new DistinctRootEntityResultTransformer()) .List<post>();