Distinct vs. Group By in MySQL (and CakePHP)

Posted on 05/21/2009 at 11:27 am by Kevin Wentworth
Viewed 14,852 times | 2 comments

An interesting problem- I am returning a list of files that have already been uploaded, so the user can link to an existing file.  This creates a new record (duplicate record).  The next time the user goes to select a file, there will now be two entries for the same file.  DISTINCT to the rescue, or so I thought.

DISTINCT applies to the whole select statement

This is confusing in its own right.  The CakePHP book misleadingly demonstrates that you can setup the fields parameter like so:

  1. $this->find('all', array('fields' => array('Model.field1', 'DISTINCT Model.field2'));

I tried this and a MySQL error kept getting returned.  I switched over to mysql and ran the query until it worked.  Turns out that the DISTINCT keyword has to come first in the SELECT statement, and it applies to the whole select statement (e.g. same category, but different name value makes it distinct).  I'm not sure really how to get CakePHP to output the correct query using the DISTINCT modifier- I think I would start the fields array() with DISTINCT... let me know if you figure it out.

GROUP BY applies only to the field(s) specified

The GROUP BY command takes a different approach and essentially "groups" all results by the fields you specify and returns a single instance of each unique value (what I wanted).  The code is simple:

  1. $this->find('all', array('fields' => array('Model.field1', 'Model.field2'), 'group' => 'Model.field2');

This returned exactly what I needed.  Group By is the way to go, I think, in most cases. The only time DISTINCT is really good, is for returing all the distinct values of a single column.

To Recap

  • DISTINCT applies to the entire SELECT statement (DISTINCT across all fields listed, not just 1 field if more than 1 are listed)
  • GROUP BY applies to only the specified fields (a distinct value for the group by field only, doesn't factor in other fields being returned)
  • For most cases use GROUP BY if you are returning more than 1 field and want distinct values for a particular column in the field list

Thanks to this simple example. Cheers,

-Kevin Wentworth

Bookmark and Share

Tags for Distinct vs. Group By in MySQL (and CakePHP)

Cakephp | Web Programming | Mysql | Usage

Comments for this Posting

Posted by vitucho

on 16/3/10

Thanks a lot for this information. I was driving crazy with this 'distinct' issue. The 'group' strategy was very cool.

Thanks again.

Posted by Marco

on 12/1/11

I came across this searching for Distinct solutions in CakePHP and thought it's worth mentioning ....

Distinct, Group By, or any other operation that forces sorting of a result set is very expensive. Why not prevent the duplicate record from being inserted in the first place so that you don't need to rely on either?

You're right that Distinct or Group By hide the problem this time, but you'll wind up chasing the same problem later on unless you fix the problem at the data level.

Just sharing.

Sorry, comments are closed for this posting.

Please Email Kevin if you have any questions. Thanks!

Meet Site Avenger - Hosted Content Management System

Powered By: Site Avenger | Site Production: Saco Design