MySQL - Optimizing a SQL Query with Complex Filtering


Please note that the question below is for MySQL.

Imagine a table called cars with the following structure (we can ignore the lack of proper key constraints, etc., as they are not relevant to the question):

create table cars (
  id integer,
  maker_id integer,
  status_id integer,
  notes varchar(100)
);

Now imagine loading test data like this:

insert into cars (id, maker_id, status_id, notes) values
  (1, 1001, 0, 'test1'),
  (2, 1001, 0, 'test2'),
  (3, 1001, 0, 'test3'),
  (4, 1002, 0, 'test4'),
  (5, 1002, 0, 'test5'),
  (6, 1002, 1, 'test6'),
  (7, 1002, 1, 'test7'),
  (8, 1002, 2, 'test8'),
  (9, 1003, 3, 'test9'),
  (10, 1003, 3, 'test10'),
  (11, 1003, 4, 'test11'),
  (12, 1003, 4, 'test12'),
  (13, 1003, 5, 'test13'),
  (14, 1003, 5, 'test14');

There are 14 records, with 3 distinct values in maker_id (1001, 1002, 1003) and 6 distinct values in status_id (0, 1, 2, 3, 4, 5).

Now, imagine taking the distinct pairs of (maker_id, status_id).

select distinct maker_id, status_id from cars;

Here is a link to an example in SQL Fiddle: http://sqlfiddle.com/#!9/cb1c7/2

This results in the following records (maker_id, status_id):

  • (1001, 0)
  • (1002, 0)
  • (1002, 1)
  • (1002, 2)
  • (1003, 3)
  • (1003, 4)
  • (1003, 5)

The logic for what I need returned is as follows:

If a given maker_id value (e.g., 1001) has only 1 distinct record among the distinct (maker_id, status_id) pairs, return it. In this example: (1001, 0).

If a given maker_id value has more than 1 distinct record among the distinct (maker_id, status_id) pairs, return all of them except the one with a status_id value of 0. In this example: (1002, 1), (1002, 2), (1003, 3), (1003, 4), and (1003, 5).

Notice that we left out (1002, 0).

Can anyone think of a more concise / more efficient (in terms of runtime) way of writing this query? In the real world, my table has millions of records.

I have come up with the following:

select
  subq.maker_id,
  subq.status_id
from (
  select distinct
    maker_id,
    status_id,
    (select count(*) from cars where maker_id = c.maker_id and status_id != 0 group by maker_id) as counter
  from cars c
) subq
where
  subq.counter is null
  or (subq.counter is not null and subq.status_id != 0)
;

Here is an example in SQL Fiddle: http://sqlfiddle.com/#!9/cb1c7/3
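One way to sanity-check the query in the question against the expected pairs is to run it on the test data. The following is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL; the SQL text is the same, but nothing here says anything about MySQL performance.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL for a logic check only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cars (id integer, maker_id integer, status_id integer, notes varchar(100))"
)
# (maker_id, status_id) for ids 1..14, in the same order as the question's test data
pairs = (
    [(1001, 0)] * 3 + [(1002, 0)] * 2 + [(1002, 1)] * 2 + [(1002, 2)]
    + [(1003, 3)] * 2 + [(1003, 4)] * 2 + [(1003, 5)] * 2
)
conn.executemany(
    "insert into cars (id, maker_id, status_id, notes) values (?, ?, ?, ?)",
    [(i, m, s, f"test{i}") for i, (m, s) in enumerate(pairs, start=1)],
)

question_query = """
select subq.maker_id, subq.status_id
  from ( select distinct
                maker_id,
                status_id,
                ( select count(*)
                    from cars
                   where maker_id = c.maker_id
                     and status_id != 0
                   group by maker_id ) as counter
           from cars c ) subq
 where subq.counter is null
    or (subq.counter is not null and subq.status_id != 0)
"""
# Sort in Python because the query itself does not specify an order.
result = sorted(conn.execute(question_query).fetchall())
print(result)
```

On the test data, the printed list should contain exactly the six pairs the question asks for: (1001, 0), (1002, 1), (1002, 2), (1003, 3), (1003, 4), and (1003, 5).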

There are several query patterns that can return the specified result. Some are going to be more complicated than others, and there can be big differences in performance.

Performing a GROUP BY operation on a huge set can be costly (in terms of resources and elapsed time) if MySQL can't make use of an index to optimize the operation. (Using a GROUP BY operation is one way to get a count of status_id for each maker_id.)

And correlated subqueries can be expensive when they are executed repeatedly. I usually see better performance out of correlated subqueries when the number of times they need to be executed is limited.

I think the best shot at getting good performance would be something like this:

not tested

select c.maker_id
     , c.status_id
  from cars c
 where c.status_id > 0
 union all
select d.maker_id
     , d.status_id
  from cars d
  left
  join cars e
    on e.maker_id = d.maker_id
   and e.status_id > 0
 where e.maker_id is null
   and d.status_id = 0

As to whether that's more efficient or more concise than other query approaches, we'd need to test.
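Since the query above is marked as not tested, here is a small sketch of how its logic could be checked on the question's test data. It assumes SQLite (via Python's sqlite3 module) in place of MySQL, so it only checks which rows come back, not how fast. As written, the query returns one row per matching car rather than one row per distinct pair, so the sketch collapses repeated pairs with a Python set before comparing.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL for a logic check only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cars (id integer, maker_id integer, status_id integer, notes varchar(100))"
)
pairs = (
    [(1001, 0)] * 3 + [(1002, 0)] * 2 + [(1002, 1)] * 2 + [(1002, 2)]
    + [(1003, 3)] * 2 + [(1003, 4)] * 2 + [(1003, 5)] * 2
)
conn.executemany(
    "insert into cars (id, maker_id, status_id, notes) values (?, ?, ?, ?)",
    [(i, m, s, f"test{i}") for i, (m, s) in enumerate(pairs, start=1)],
)

union_all_query = """
select c.maker_id, c.status_id
  from cars c
 where c.status_id > 0
 union all
select d.maker_id, d.status_id
  from cars d
  left join cars e
    on e.maker_id = d.maker_id
   and e.status_id > 0
 where e.maker_id is null
   and d.status_id = 0
"""
rows = conn.execute(union_all_query).fetchall()
distinct_pairs = sorted(set(rows))  # collapse repeated (maker_id, status_id) pairs
print(distinct_pairs)
```

If one row per pair is required, the duplicates would have to be removed somewhere (for example with DISTINCT in each part), which may affect the plan and would need its own testing.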

But for any shot at good performance with this query, we are going to need an index.

... on cars (maker_id, status_id)

We expect the EXPLAIN output to show "Using index" in the Extra column. And we're not expecting a "Using filesort".

One big downside of this approach is that it's going to make two passes through the table (or index).
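MySQL's EXPLAIN output cannot be reproduced outside MySQL, but the same idea (confirming that the plan reads from the index) can be sketched with SQLite's EXPLAIN QUERY PLAN as a rough analog. This is an illustration under assumptions: the index name cars_maker_status is hypothetical, SQLite stands in for MySQL, and SQLite's plan text is not the same thing as "Using index" in MySQL's Extra column.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; plan text differs between the two.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cars (id integer, maker_id integer, status_id integer, notes varchar(100))"
)
# Hypothetical index name; the column list matches the index suggested in the answer.
conn.execute("create index cars_maker_status on cars (maker_id, status_id)")

anti_join = """
select d.maker_id, d.status_id
  from cars d
  left join cars e
    on e.maker_id = d.maker_id
   and e.status_id > 0
 where e.maker_id is null
   and d.status_id = 0
"""
# Each plan row ends with a human-readable detail string.
plan_details = [row[-1] for row in conn.execute("explain query plan " + anti_join)]
for detail in plan_details:
    print(detail)
```

In MySQL itself, the equivalent check is to prefix the query with EXPLAIN and read the key and Extra columns.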

The first SELECT is pretty straightforward... get me all the rows where status_id is not zero. We need all of those rows. It's possible that an index, e.g.

... on cars (status_id, maker_id)

might be of benefit for that query. But if we're returning a significant portion of the table, I'd bet dollars to donuts that a full scan of the other index will be just as fast, or faster.

The second SELECT uses an anti-join pattern. What that is doing is getting all the rows that have status_id equal to zero, and from that set, "filtering out" any of the rows where there's another row with the same maker_id and a status_id other than zero.

We do the filtering with an outer join operation (LEFT JOIN) to return all rows with status_id=0, along with any matching rows. The trick is a predicate in the WHERE clause that filters out all the rows that had a match. So we're left with the rows that didn't find a match. That is, values of maker_id that have only a status_id=0 row.

We can get an equivalent result using a NOT EXISTS predicate rather than an anti-join. But in my experience, performance is sometimes not as good. We could re-write that second SELECT (following the UNION ALL operation) as:

select d.maker_id
     , d.status_id
  from cars d
 where d.status_id = 0
   and not exists
       ( select 1
           from cars e
          where e.maker_id = d.maker_id
            and e.status_id > 0
       )

And the performance of that query is going to be dependent on a suitable index, just like the anti-join is.
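The claim that the anti-join and the NOT EXISTS form return the same rows can be checked on the test data. This is a minimal sketch assuming SQLite (via Python's sqlite3 module) as a stand-in for MySQL; it compares results only and says nothing about which form MySQL executes faster.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL for a logic check only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cars (id integer, maker_id integer, status_id integer, notes varchar(100))"
)
pairs = (
    [(1001, 0)] * 3 + [(1002, 0)] * 2 + [(1002, 1)] * 2 + [(1002, 2)]
    + [(1003, 3)] * 2 + [(1003, 4)] * 2 + [(1003, 5)] * 2
)
conn.executemany(
    "insert into cars (id, maker_id, status_id, notes) values (?, ?, ?, ?)",
    [(i, m, s, f"test{i}") for i, (m, s) in enumerate(pairs, start=1)],
)

anti_join = """
select d.maker_id, d.status_id
  from cars d
  left join cars e
    on e.maker_id = d.maker_id
   and e.status_id > 0
 where e.maker_id is null
   and d.status_id = 0
"""
not_exists = """
select d.maker_id, d.status_id
  from cars d
 where d.status_id = 0
   and not exists ( select 1
                      from cars e
                     where e.maker_id = d.maker_id
                       and e.status_id > 0 )
"""
anti_join_rows = sorted(conn.execute(anti_join).fetchall())
not_exists_rows = sorted(conn.execute(not_exists).fetchall())
print(anti_join_rows == not_exists_rows, anti_join_rows)
```

Both forms return a row for each of the three cars of maker 1001, the only maker whose rows all have status_id = 0.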

Important: do not omit the ALL keyword. A UNION ALL operation just concatenates the results of the two queries. If we omit the ALL keyword, then we are requiring MySQL to perform a "sort unique" operation to eliminate duplicate rows.

Note: the reason for the UNION ALL rather than an OR condition is that I've usually gotten better query plans with UNION ALL. The MySQL optimizer doesn't seem to do too well with OR when the predicates are on different columns and conditions, and either predicate can be used to "drive" the execution plan. With UNION ALL, breaking it into two queries, we can usually get a good plan for both parts.
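The difference between the two set operators is visible on the test data. The sketch below assumes SQLite (via Python's sqlite3 module) in place of MySQL and only counts returned rows; the cost of the duplicate elimination on a large MySQL table is not something it can show.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; only row counts are compared.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cars (id integer, maker_id integer, status_id integer, notes varchar(100))"
)
pairs = (
    [(1001, 0)] * 3 + [(1002, 0)] * 2 + [(1002, 1)] * 2 + [(1002, 2)]
    + [(1003, 3)] * 2 + [(1003, 4)] * 2 + [(1003, 5)] * 2
)
conn.executemany(
    "insert into cars (id, maker_id, status_id, notes) values (?, ?, ?, ?)",
    [(i, m, s, f"test{i}") for i, (m, s) in enumerate(pairs, start=1)],
)

# {op} is filled in with either "union all" or "union".
template = """
select c.maker_id, c.status_id
  from cars c
 where c.status_id > 0
 {op}
select d.maker_id, d.status_id
  from cars d
  left join cars e
    on e.maker_id = d.maker_id
   and e.status_id > 0
 where e.maker_id is null
   and d.status_id = 0
"""
union_all_rows = conn.execute(template.format(op="union all")).fetchall()
union_rows = conn.execute(template.format(op="union")).fetchall()
print(len(union_all_rows), len(union_rows))
```

UNION ALL keeps every row from both parts, while UNION collapses repeated (maker_id, status_id) pairs, which is the extra work the paragraph above warns about.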

