MySQL - Optimizing a SQL Query with Complex Filtering
Please note that the question below is specific to MySQL.
Imagine a table called cars with the following structure (we can ignore the lack of proper key constraints, etc., as they are not relevant to the question):

    CREATE TABLE cars (
        id        INTEGER,
        maker_id  INTEGER,
        status_id INTEGER,
        notes     VARCHAR(100)
    );

Now imagine loading it with test data like this:

    INSERT INTO cars (id, maker_id, status_id, notes) VALUES
        (1, 1001, 0, 'test1'),
        (2, 1001, 0, 'test2'),
        (3, 1001, 0, 'test3'),
        (4, 1002, 0, 'test4'),
        (5, 1002, 0, 'test5'),
        (6, 1002, 1, 'test6'),
        (7, 1002, 1, 'test7'),
        (8, 1002, 2, 'test8'),
        (9, 1003, 3, 'test9'),
        (10, 1003, 3, 'test10'),
        (11, 1003, 4, 'test11'),
        (12, 1003, 4, 'test12'),
        (13, 1003, 5, 'test13'),
        (14, 1003, 5, 'test14');

There are 14 records, with 3 distinct values in maker_id (1001, 1002, 1003) and 6 distinct values in status_id (0, 1, 2, 3, 4, 5).
Now, imagine taking the distinct pairs of (maker_id, status_id):

    SELECT DISTINCT maker_id, status_id FROM cars;

Here is a link to the example in SQL Fiddle: http://sqlfiddle.com/#!9/cb1c7/2
This results in the following records (maker_id, status_id):
- (1001, 0)
- (1002, 0)
- (1002, 1)
- (1002, 2)
- (1003, 3)
- (1003, 4)
- (1003, 5)
The logic I need should return the following:

- If a given maker_id value (e.g., 1001) has only 1 distinct record among the distinct (maker_id, status_id) pairs, return it. In the example: (1001, 0).
- If a given maker_id value has more than 1 distinct record among the distinct (maker_id, status_id) pairs, return all of them except the one with a status_id value of 0. In the example: (1002, 1), (1002, 2), (1003, 3), (1003, 4), and (1003, 5).

Notice that this leaves out (1002, 0).
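To pin the rule down before writing any SQL, here is a minimal plain-Python sketch of it, applied to the distinct pairs listed above. The function name expected_result is hypothetical; it only restates the two conditions so that the output of any candidate query can be compared against it.

```python
from collections import defaultdict

# Distinct (maker_id, status_id) pairs, as listed in the question.
PAIRS = [(1001, 0), (1002, 0), (1002, 1), (1002, 2),
         (1003, 3), (1003, 4), (1003, 5)]


def expected_result(pairs):
    """Hypothetical reference version of the required rule."""
    by_maker = defaultdict(set)
    for maker_id, status_id in pairs:
        by_maker[maker_id].add(status_id)
    result = []
    for maker_id, statuses in sorted(by_maker.items()):
        if len(statuses) == 1:
            # Only one distinct pair for this maker: keep it, whatever its status.
            result.extend((maker_id, s) for s in statuses)
        else:
            # More than one pair: keep all of them except the status_id = 0 pair.
            result.extend((maker_id, s) for s in sorted(statuses) if s != 0)
    return result


print(expected_result(PAIRS))
# [(1001, 0), (1002, 1), (1002, 2), (1003, 3), (1003, 4), (1003, 5)]
```

Note that the rule is equivalent to "keep every pair with a non-zero status_id, and keep a status_id = 0 pair only when its maker has no non-zero status", which is the form the SQL approaches below work with.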
Can anyone think of a more concise / more efficient (in terms of runtime) way of writing this query? In the real world, the table has millions of records.
I have come up with the following:

    SELECT subq.maker_id, subq.status_id
    FROM (
        SELECT DISTINCT maker_id, status_id,
               (SELECT COUNT(*)
                FROM cars
                WHERE maker_id = c.maker_id
                  AND status_id != 0
                GROUP BY maker_id) AS counter
        FROM cars c
    ) AS subq
    WHERE subq.counter IS NULL
       OR (subq.counter IS NOT NULL AND subq.status_id != 0);

Here is the example in SQL Fiddle: http://sqlfiddle.com/#!9/cb1c7/3
There are several query patterns that can return the specified result. Some are going to look more complicated than others. And there can be big differences in performance.
Performing a GROUP BY operation on a huge set can be costly (in terms of resources and elapsed time) if MySQL can't make use of an index to optimize the operation. (Using a GROUP BY operation is one way to get a count of status_id for each maker_id.)
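As an illustration of the GROUP BY route, here is one hypothetical formulation (not from the original post): a derived table counts the non-zero statuses per maker_id once, and the outer query keeps the non-zero pairs plus the status 0 pair of any maker with no non-zero rows. It runs through Python's built-in sqlite3 module purely so the result can be checked; SQLite stands in for MySQL here and says nothing about MySQL performance.

```python
import sqlite3

# Sample rows from the question: (id, maker_id, status_id); notes = "test<id>".
ROWS = [(1, 1001, 0), (2, 1001, 0), (3, 1001, 0), (4, 1002, 0), (5, 1002, 0),
        (6, 1002, 1), (7, 1002, 1), (8, 1002, 2), (9, 1003, 3), (10, 1003, 3),
        (11, 1003, 4), (12, 1003, 4), (13, 1003, 5), (14, 1003, 5)]


def make_db():
    """In-memory SQLite copy of the cars table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (id INTEGER, maker_id INTEGER, "
                 "status_id INTEGER, notes VARCHAR(100))")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)",
                     [(i, m, s, f"test{i}") for i, m, s in ROWS])
    return conn


# Hypothetical GROUP BY formulation. (status_id <> 0) evaluates to 0 or 1,
# so SUM() counts the non-zero rows for each maker.
GROUP_BY_SQL = """
SELECT DISTINCT c.maker_id, c.status_id
FROM cars c
JOIN (SELECT maker_id, SUM(status_id <> 0) AS nonzero_rows
      FROM cars
      GROUP BY maker_id) g
  ON g.maker_id = c.maker_id
WHERE c.status_id <> 0
   OR g.nonzero_rows = 0
ORDER BY c.maker_id, c.status_id
"""

conn = make_db()
result = conn.execute(GROUP_BY_SQL).fetchall()
print(result)
```

On a large MySQL table, the derived-table aggregation is the part whose cost depends on whether an index can be used.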
And correlated subqueries can be expensive when they are executed repeatedly. We typically see better performance out of correlated subqueries when the number of times they need to be executed is limited.
I think the best shot at getting good performance is something like this (not tested):
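
    -- DISTINCT in each branch: the question asks for distinct (maker_id, status_id) pairs
    SELECT DISTINCT c.maker_id, c.status_id
      FROM cars c
     WHERE c.status_id > 0
    UNION ALL
    SELECT DISTINCT d.maker_id, d.status_id
      FROM cars d
      LEFT JOIN cars e
        ON e.maker_id = d.maker_id
       AND e.status_id > 0
     WHERE e.maker_id IS NULL
       AND d.status_id = 0;
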
As to whether that's more efficient or more concise than the other query approaches, we'd need to test.
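A first step in that testing is confirming that the candidates return the same rows. This sketch, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL, runs the question's query and the UNION ALL / anti-join query against the sample data. DISTINCT is included in each UNION ALL branch because the question asks for distinct pairs; without it the first branch returns (1002, 1) twice, for example. Timing on millions of rows would still have to be done on MySQL itself.

```python
import sqlite3

# Sample rows from the question: (id, maker_id, status_id); notes = "test<id>".
ROWS = [(1, 1001, 0), (2, 1001, 0), (3, 1001, 0), (4, 1002, 0), (5, 1002, 0),
        (6, 1002, 1), (7, 1002, 1), (8, 1002, 2), (9, 1003, 3), (10, 1003, 3),
        (11, 1003, 4), (12, 1003, 4), (13, 1003, 5), (14, 1003, 5)]


def make_db():
    """In-memory SQLite copy of the cars table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (id INTEGER, maker_id INTEGER, "
                 "status_id INTEGER, notes VARCHAR(100))")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)",
                     [(i, m, s, f"test{i}") for i, m, s in ROWS])
    return conn


# The query from the question (correlated subquery per distinct pair).
QUESTION_SQL = """
SELECT subq.maker_id, subq.status_id
FROM (SELECT DISTINCT maker_id, status_id,
             (SELECT COUNT(*) FROM cars
              WHERE maker_id = c.maker_id AND status_id != 0
              GROUP BY maker_id) AS counter
      FROM cars c) AS subq
WHERE subq.counter IS NULL
   OR (subq.counter IS NOT NULL AND subq.status_id != 0)
"""

# The UNION ALL / anti-join query, with DISTINCT in each branch.
UNION_ALL_SQL = """
SELECT DISTINCT c.maker_id, c.status_id
FROM cars c
WHERE c.status_id > 0
UNION ALL
SELECT DISTINCT d.maker_id, d.status_id
FROM cars d
LEFT JOIN cars e ON e.maker_id = d.maker_id AND e.status_id > 0
WHERE e.maker_id IS NULL AND d.status_id = 0
"""


def run(conn, sql):
    # Sort in Python so that row order does not affect the comparison.
    return sorted(conn.execute(sql).fetchall())


conn = make_db()
question_rows = run(conn, QUESTION_SQL)
union_all_rows = run(conn, UNION_ALL_SQL)
print(question_rows == union_all_rows)
print(union_all_rows)
```

Agreement on 14 rows is only a sanity check; edge cases such as a maker with a single non-zero status are worth adding to the sample data before comparing timings.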
But for any shot at performance with this query, we're going to need an index:

    ... ON cars (maker_id, status_id)

We'd expect the EXPLAIN output to show "Using index" in the Extra column. And we're not expecting "Using filesort".
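SQLite's EXPLAIN QUERY PLAN is only a loose analogue of MySQL's EXPLAIN: a covering index shows up as "USING COVERING INDEX" in SQLite's detail text rather than "Using index" in an Extra column, and the exact wording varies between SQLite versions. With that caveat, this sketch creates the suggested (maker_id, status_id) index under a hypothetical name, cars_maker_status (the post elides the name), and prints the plan for the anti-join branch.

```python
import sqlite3

# Sample rows from the question: (id, maker_id, status_id); notes = "test<id>".
ROWS = [(1, 1001, 0), (2, 1001, 0), (3, 1001, 0), (4, 1002, 0), (5, 1002, 0),
        (6, 1002, 1), (7, 1002, 1), (8, 1002, 2), (9, 1003, 3), (10, 1003, 3),
        (11, 1003, 4), (12, 1003, 4), (13, 1003, 5), (14, 1003, 5)]


def make_db():
    """In-memory SQLite copy of the cars table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (id INTEGER, maker_id INTEGER, "
                 "status_id INTEGER, notes VARCHAR(100))")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)",
                     [(i, m, s, f"test{i}") for i, m, s in ROWS])
    return conn


ANTI_JOIN_SQL = """
SELECT DISTINCT d.maker_id, d.status_id
FROM cars d
LEFT JOIN cars e ON e.maker_id = d.maker_id AND e.status_id > 0
WHERE e.maker_id IS NULL AND d.status_id = 0
"""

conn = make_db()
# Hypothetical index name; the post only gives the column list.
conn.execute("CREATE INDEX cars_maker_status ON cars (maker_id, status_id)")
# Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail text.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + ANTI_JOIN_SQL)]
for detail in plan:
    print(detail)
rows = conn.execute(ANTI_JOIN_SQL).fetchall()
print(rows)
```

On MySQL itself, the equivalent check is running EXPLAIN on the full UNION ALL statement and reading the Extra column for each part.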
One big downside of this approach is that it's going to make 2 passes through the table (or index).
The first SELECT is pretty straightforward: give me all the rows where status_id is not zero. We need all of those rows. It's possible that an index, e.g.

    ... ON cars (status_id, maker_id)

might be of benefit to that query. But if we're returning a significant portion of the table, I'd bet dollars to donuts that a full scan of the other index will be just as fast, or faster.
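A comparable sketch for the alternative index (again SQLite via Python's sqlite3 standing in for MySQL, with a hypothetical index name cars_status_maker) shows the first SELECT being served by a range on status_id. Whether that actually beats a full scan of the other index on MySQL depends on how much of the table matches, as noted above; this check cannot tell.

```python
import sqlite3

# Sample rows from the question: (id, maker_id, status_id); notes = "test<id>".
ROWS = [(1, 1001, 0), (2, 1001, 0), (3, 1001, 0), (4, 1002, 0), (5, 1002, 0),
        (6, 1002, 1), (7, 1002, 1), (8, 1002, 2), (9, 1003, 3), (10, 1003, 3),
        (11, 1003, 4), (12, 1003, 4), (13, 1003, 5), (14, 1003, 5)]


def make_db():
    """In-memory SQLite copy of the cars table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (id INTEGER, maker_id INTEGER, "
                 "status_id INTEGER, notes VARCHAR(100))")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)",
                     [(i, m, s, f"test{i}") for i, m, s in ROWS])
    return conn


FIRST_BRANCH_SQL = ("SELECT DISTINCT maker_id, status_id FROM cars "
                    "WHERE status_id > 0")

conn = make_db()
# Hypothetical index name, leading with status_id so "status_id > 0" is a range.
conn.execute("CREATE INDEX cars_status_maker ON cars (status_id, maker_id)")
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + FIRST_BRANCH_SQL)]
for detail in plan:
    print(detail)
rows = sorted(conn.execute(FIRST_BRANCH_SQL).fetchall())
print(rows)
```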
The second SELECT uses an anti-join pattern. What it's doing is getting all the rows that have status_id equal to zero and, from that set, "filtering out" any row for which there's another row, with the same maker_id, that has a status_id other than zero.
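We do that filtering with an outer join operation (LEFT JOIN) that returns all the rows with status_id=0, along with any matching rows. The trick is the predicate in the WHERE clause (e.maker_id IS NULL) that filters out the rows that had a match. What we're left with are the rows that didn't find a match; that is, the values of maker_id that have only status_id=0 rows.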
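To make the anti-join trick visible, this sketch (SQLite via Python's sqlite3 as a stand-in for MySQL) runs the LEFT JOIN without the e.maker_id IS NULL filter and shows, for each status_id=0 row, which row it matched, if any. Only the maker 1001 rows come back without a match, and those are exactly the rows the IS NULL predicate keeps.

```python
import sqlite3

# Sample rows from the question: (id, maker_id, status_id); notes = "test<id>".
ROWS = [(1, 1001, 0), (2, 1001, 0), (3, 1001, 0), (4, 1002, 0), (5, 1002, 0),
        (6, 1002, 1), (7, 1002, 1), (8, 1002, 2), (9, 1003, 3), (10, 1003, 3),
        (11, 1003, 4), (12, 1003, 4), (13, 1003, 5), (14, 1003, 5)]


def make_db():
    """In-memory SQLite copy of the cars table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (id INTEGER, maker_id INTEGER, "
                 "status_id INTEGER, notes VARCHAR(100))")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)",
                     [(i, m, s, f"test{i}") for i, m, s in ROWS])
    return conn


# The outer join alone, before the "e.maker_id IS NULL" filter is applied.
JOIN_ONLY_SQL = """
SELECT d.id, d.maker_id, e.id AS matched_id
FROM cars d
LEFT JOIN cars e ON e.maker_id = d.maker_id AND e.status_id > 0
WHERE d.status_id = 0
ORDER BY d.id, e.id
"""

conn = make_db()
rows = conn.execute(JOIN_ONLY_SQL).fetchall()
for row in rows:
    print(row)
# In this data every id is non-NULL, so matched_id is NULL (None) exactly
# when no matching row was found; those are the rows the anti-join keeps.
unmatched = [row for row in rows if row[2] is None]
print(unmatched)
```

Each maker 1002 row with status_id=0 matches three rows (ids 6, 7, 8), so the join output is larger than its input before the filter discards those matches.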
We could get an equivalent result using a NOT EXISTS predicate rather than the anti-join. In my experience, the performance isn't as good. We could re-write the second SELECT (following the UNION ALL operation) like this:

    SELECT DISTINCT d.maker_id, d.status_id
      FROM cars d
     WHERE d.status_id = 0
       AND NOT EXISTS (SELECT 1
                         FROM cars e
                        WHERE e.maker_id = d.maker_id
                          AND e.status_id > 0);

And the performance of that query is going to be dependent on a suitable index, just as the anti-join is.
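A quick equivalence check of the two forms of the second SELECT on the sample data, again with SQLite through Python's sqlite3 standing in for MySQL, so it says nothing about their relative performance. DISTINCT is included because the question asks for distinct pairs.

```python
import sqlite3

# Sample rows from the question: (id, maker_id, status_id); notes = "test<id>".
ROWS = [(1, 1001, 0), (2, 1001, 0), (3, 1001, 0), (4, 1002, 0), (5, 1002, 0),
        (6, 1002, 1), (7, 1002, 1), (8, 1002, 2), (9, 1003, 3), (10, 1003, 3),
        (11, 1003, 4), (12, 1003, 4), (13, 1003, 5), (14, 1003, 5)]


def make_db():
    """In-memory SQLite copy of the cars table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (id INTEGER, maker_id INTEGER, "
                 "status_id INTEGER, notes VARCHAR(100))")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)",
                     [(i, m, s, f"test{i}") for i, m, s in ROWS])
    return conn


# Anti-join form: outer join, then keep only the rows that found no match.
ANTI_JOIN_SQL = """
SELECT DISTINCT d.maker_id, d.status_id
FROM cars d
LEFT JOIN cars e ON e.maker_id = d.maker_id AND e.status_id > 0
WHERE e.maker_id IS NULL AND d.status_id = 0
"""

# NOT EXISTS form: a correlated subquery probes for a non-zero status.
NOT_EXISTS_SQL = """
SELECT DISTINCT d.maker_id, d.status_id
FROM cars d
WHERE d.status_id = 0
  AND NOT EXISTS (SELECT 1 FROM cars e
                  WHERE e.maker_id = d.maker_id AND e.status_id > 0)
"""

conn = make_db()
anti = sorted(conn.execute(ANTI_JOIN_SQL).fetchall())
not_exists = sorted(conn.execute(NOT_EXISTS_SQL).fetchall())
print(anti, not_exists)
```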
Important: do not omit the ALL keyword. A UNION ALL operation simply concatenates the results of the 2 queries. If we omit the ALL keyword, we're requiring MySQL to perform a "sort unique" operation to eliminate duplicate rows.
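The effect of ALL on the result can be checked on the sample data. In this sketch (SQLite via Python's sqlite3 standing in for MySQL), branches without DISTINCT produce duplicate pairs under UNION ALL, while UNION removes them at the cost of de-duplicating the combined result. Because the two branches are disjoint (status_id > 0 versus status_id = 0), putting DISTINCT inside each branch and keeping UNION ALL yields the same distinct pairs with no de-duplication across branches. The helper name combine is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, maker_id, status_id); notes = "test<id>".
ROWS = [(1, 1001, 0), (2, 1001, 0), (3, 1001, 0), (4, 1002, 0), (5, 1002, 0),
        (6, 1002, 1), (7, 1002, 1), (8, 1002, 2), (9, 1003, 3), (10, 1003, 3),
        (11, 1003, 4), (12, 1003, 4), (13, 1003, 5), (14, 1003, 5)]


def make_db():
    """In-memory SQLite copy of the cars table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (id INTEGER, maker_id INTEGER, "
                 "status_id INTEGER, notes VARCHAR(100))")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)",
                     [(i, m, s, f"test{i}") for i, m, s in ROWS])
    return conn


# The two branches; {kw} is either empty or DISTINCT.
FIRST = "SELECT {kw} c.maker_id, c.status_id FROM cars c WHERE c.status_id > 0"
SECOND = ("SELECT {kw} d.maker_id, d.status_id FROM cars d "
          "LEFT JOIN cars e ON e.maker_id = d.maker_id AND e.status_id > 0 "
          "WHERE e.maker_id IS NULL AND d.status_id = 0")


def combine(op, kw=""):
    """Hypothetical helper: join the two branches with UNION or UNION ALL."""
    return f"{FIRST.format(kw=kw)} {op} {SECOND.format(kw=kw)}"


conn = make_db()
plain_union_all = conn.execute(combine("UNION ALL")).fetchall()
plain_union = conn.execute(combine("UNION")).fetchall()
distinct_union_all = conn.execute(combine("UNION ALL", kw="DISTINCT")).fetchall()
print(len(plain_union_all), len(plain_union), len(distinct_union_all))  # 12 6 6
```

The 12 rows are the 9 rows with a non-zero status plus 3 copies of (1001, 0).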
Note: the reason for UNION ALL rather than an OR condition is that I've gotten better query plans with UNION ALL. The MySQL optimizer doesn't seem to do well with OR when the predicates are on different columns and conditions, and either predicate could be used to "drive" the execution plan. With UNION ALL, breaking it into 2 queries, we can get a good plan for both parts.
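For comparison, here is a hypothetical single-statement OR form of the same logic (not from the original post). On the sample data it returns the same rows as the UNION ALL query. This sketch (SQLite via Python's sqlite3 standing in for MySQL) only checks results; the plan quality the note describes would have to be examined with EXPLAIN on MySQL.

```python
import sqlite3

# Sample rows from the question: (id, maker_id, status_id); notes = "test<id>".
ROWS = [(1, 1001, 0), (2, 1001, 0), (3, 1001, 0), (4, 1002, 0), (5, 1002, 0),
        (6, 1002, 1), (7, 1002, 1), (8, 1002, 2), (9, 1003, 3), (10, 1003, 3),
        (11, 1003, 4), (12, 1003, 4), (13, 1003, 5), (14, 1003, 5)]


def make_db():
    """In-memory SQLite copy of the cars table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (id INTEGER, maker_id INTEGER, "
                 "status_id INTEGER, notes VARCHAR(100))")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)",
                     [(i, m, s, f"test{i}") for i, m, s in ROWS])
    return conn


# Hypothetical OR form: one pass, but the two predicates involve different
# columns, which is the situation the note says the optimizer handles poorly.
OR_SQL = """
SELECT DISTINCT d.maker_id, d.status_id
FROM cars d
LEFT JOIN cars e ON e.maker_id = d.maker_id AND e.status_id > 0
WHERE d.status_id > 0
   OR (d.status_id = 0 AND e.maker_id IS NULL)
"""

UNION_ALL_SQL = """
SELECT DISTINCT c.maker_id, c.status_id FROM cars c WHERE c.status_id > 0
UNION ALL
SELECT DISTINCT d.maker_id, d.status_id
FROM cars d
LEFT JOIN cars e ON e.maker_id = d.maker_id AND e.status_id > 0
WHERE e.maker_id IS NULL AND d.status_id = 0
"""

conn = make_db()
or_rows = sorted(conn.execute(OR_SQL).fetchall())
union_all_rows = sorted(conn.execute(UNION_ALL_SQL).fetchall())
print(or_rows == union_all_rows)  # True
```

One visible cost of the OR form is that rows with a non-zero status also pass through the outer join, where each one matches every non-zero row of its maker before DISTINCT collapses them.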