Three things:
- You are recomputing the same thing about a zillion and a half times (in fact, everything depends on only a few parameters, which are identical for many rows).
- Aggregates are more efficient in large chunks (JOIN) than in small bits (subqueries).
- MySQL is extremely slow with subqueries.
So, when you compute "the count of votes by option_id" (which requires a scan of the big table) and then need "the count of votes by poll_id", do not scan the big table again; just reuse the previous results!
You could do that with a ROLLUP.
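As a rough illustration of the ROLLUP idea, here is a minimal sketch in MySQL syntax. It assumes the same two tables used in the query further down, options(option_id, poll_id) and responses(option_id, user_id); it only shows the per-option and per-poll counts and is not the full answer.

```sql
-- One pass over responses yields both levels of aggregation.
-- Rows with option_id IS NULL carry the per-poll subtotal;
-- the row with poll_id IS NULL as well carries the grand total.
SELECT o.poll_id,
       r.option_id,
       COUNT(*) AS all_votes
FROM responses r
JOIN options o ON o.option_id = r.option_id
GROUP BY o.poll_id, r.option_id WITH ROLLUP;
```

In Postgres versions that support it, the equivalent grouping clause would be GROUP BY ROLLUP (poll_id, option_id).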
Here is a query that does what you need, running on Postgres.
To get MySQL to do this, you will need to replace all the "WITH foo AS (SELECT...)" clauses with temporary tables. This is easy. MySQL in-memory temporary tables are fast, so don't be afraid to use them: they let you reuse the results of previous steps and save a lot of computation.
I generated random test data and it seems to work. It executes in 0.3 s...
WITH
-- users of interest : target group
uids AS (
    SELECT DISTINCT user_id
    FROM options
    JOIN responses USING (option_id)
    WHERE poll_id=22
),
-- votes of everyone and target group
votes AS (
    SELECT poll_id, option_id, sum(all_votes) AS all_votes, sum(target_votes) AS target_votes
    FROM (
        SELECT option_id, count(*) AS all_votes, count(uids.user_id) AS target_votes
        FROM responses
        LEFT JOIN uids USING (user_id)
        GROUP BY option_id
    ) v
    JOIN options USING (option_id)
    GROUP BY poll_id, option_id
),
-- totals for all polls (reuse previous result)
totals AS (
    SELECT poll_id, sum(all_votes) AS all_votes, sum(target_votes) AS target_votes
    FROM votes
    GROUP BY poll_id
),
-- number of options in each poll
poll_options AS (
    SELECT poll_id, count(*) AS poll_option_count
    FROM options
    GROUP BY poll_id
)
-- reuse previous tables to get some stats
SELECT *, ABS(total_percent - subgroup_percent) AS deviation
FROM (
    SELECT
        poll_id,
        option_id,
        v.target_votes / v.all_votes AS subgroup_percent,
        t.target_votes / t.all_votes AS total_percent,
        poll_option_count
    FROM votes v
    JOIN totals t USING (poll_id)
    JOIN poll_options po USING (poll_id)
) AS foo
ORDER BY deviation DESC, poll_option_count DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=14910.46..14910.56 rows=40 width=144) (actual time=299.844..299.862 rows=200 loops=1)
Sort Key: (abs(((t.target_votes / t.all_votes) - (v.target_votes / v.all_votes)))), po.poll_option_count
Sort Method: quicksort Memory: 52kB
CTE uids
-> HashAggregate (cost=1801.43..1850.52 rows=4909 width=4) (actual time=3.935..4.793 rows=4860 loops=1)
-> Nested Loop (cost=0.00..1789.16 rows=4909 width=4) (actual time=0.029..2.555 rows=4860 loops=1)
-> Seq Scan on options (cost=0.00..3.50 rows=5 width=4) (actual time=0.008..0.032 rows=5 loops=1)
Filter: (poll_id = 22)
-> Index Scan using responses_option_id_key on responses (cost=0.00..344.86 rows=982 width=8) (actual time=0.012..0.298 rows=972 loops=5)
Index Cond: (public.responses.option_id = public.options.option_id)
CTE votes
-> HashAggregate (cost=13029.43..13032.43 rows=200 width=24) (actual time=298.255..298.317 rows=200 loops=1)
-> Hash Join (cost=13019.68..13027.43 rows=200 width=24) (actual time=297.953..298.103 rows=200 loops=1)
Hash Cond: (public.responses.option_id = public.options.option_id)
-> HashAggregate (cost=13014.18..13017.18 rows=200 width=8) (actual time=297.839..297.879 rows=200 loops=1)
-> Merge Left Join (cost=399.13..11541.43 rows=196366 width=8) (actual time=9.301..230.467 rows=196366 loops=1)
Merge Cond: (public.responses.user_id = uids.user_id)
-> Index Scan using responses_pkey on responses (cost=0.00..8585.75 rows=196366 width=8) (actual time=0.015..121.971 rows=196366 loops=1)
-> Sort (cost=399.13..411.40 rows=4909 width=4) (actual time=9.281..22.044 rows=137645 loops=1)
Sort Key: uids.user_id
Sort Method: quicksort Memory: 420kB
-> CTE Scan on uids (cost=0.00..98.18 rows=4909 width=4) (actual time=3.937..6.549 rows=4860 loops=1)
-> Hash (cost=3.00..3.00 rows=200 width=8) (actual time=0.095..0.095 rows=200 loops=1)
-> Seq Scan on options (cost=0.00..3.00 rows=200 width=8) (actual time=0.007..0.043 rows=200 loops=1)
CTE totals
-> HashAggregate (cost=5.50..8.50 rows=200 width=68) (actual time=298.629..298.640 rows=40 loops=1)
-> CTE Scan on votes (cost=0.00..4.00 rows=200 width=68) (actual time=298.257..298.425 rows=200 loops=1)
CTE poll_options
-> HashAggregate (cost=4.00..4.50 rows=40 width=4) (actual time=0.091..0.101 rows=40 loops=1)
-> Seq Scan on options (cost=0.00..3.00 rows=200 width=4) (actual time=0.005..0.020 rows=200 loops=1)
-> Hash Join (cost=6.95..13.45 rows=40 width=144) (actual time=298.994..299.554 rows=200 loops=1)
Hash Cond: (t.poll_id = v.poll_id)
-> CTE Scan on totals t (cost=0.00..4.00 rows=200 width=68) (actual time=298.632..298.669 rows=40 loops=1)
-> Hash (cost=6.45..6.45 rows=40 width=84) (actual time=0.335..0.335 rows=200 loops=1)
-> Hash Join (cost=1.30..6.45 rows=40 width=84) (actual time=0.140..0.263 rows=200 loops=1)
Hash Cond: (v.poll_id = po.poll_id)
-> CTE Scan on votes v (cost=0.00..4.00 rows=200 width=72) (actual time=0.001..0.030 rows=200 loops=1)
-> Hash (cost=0.80..0.80 rows=40 width=12) (actual time=0.130..0.130 rows=40 loops=1)
-> CTE Scan on poll_options po (cost=0.00..0.80 rows=40 width=12) (actual time=0.093..0.119 rows=40 loops=1)
Total runtime: 300.132 ms
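For MySQL, the temporary-table rewrite mentioned above might look roughly like the sketch below. It is untested against the original data; the ENGINE=MEMORY choice and the index on uids are assumptions added for illustration, not part of the Postgres query. Note that MySQL does not allow a TEMPORARY table to be referenced more than once in the same query, which is why each step is a separate statement and the final SELECT uses each table only once. (MySQL 8.0 and later also accept WITH directly.)

```sql
-- Step 1: users of interest (target group), same as the "uids" CTE.
-- The index is an assumption, added to help the join in step 2.
CREATE TEMPORARY TABLE uids (INDEX (user_id)) ENGINE=MEMORY AS
SELECT DISTINCT user_id
FROM options
JOIN responses USING (option_id)
WHERE poll_id = 22;

-- Step 2: a single scan of the big table gives votes per option,
-- for everyone and for the target group.
CREATE TEMPORARY TABLE votes ENGINE=MEMORY AS
SELECT poll_id, option_id,
       SUM(all_votes) AS all_votes,
       SUM(target_votes) AS target_votes
FROM (
    SELECT option_id,
           COUNT(*) AS all_votes,
           COUNT(uids.user_id) AS target_votes
    FROM responses
    LEFT JOIN uids USING (user_id)
    GROUP BY option_id
) v
JOIN options USING (option_id)
GROUP BY poll_id, option_id;

-- Step 3: per-poll totals, computed from the small votes table
-- instead of rescanning responses.
CREATE TEMPORARY TABLE totals ENGINE=MEMORY AS
SELECT poll_id,
       SUM(all_votes) AS all_votes,
       SUM(target_votes) AS target_votes
FROM votes
GROUP BY poll_id;

-- Step 4: number of options in each poll.
CREATE TEMPORARY TABLE poll_options ENGINE=MEMORY AS
SELECT poll_id, COUNT(*) AS poll_option_count
FROM options
GROUP BY poll_id;

-- Final step: combine the small tables; each temporary table appears once.
SELECT *, ABS(total_percent - subgroup_percent) AS deviation
FROM (
    SELECT poll_id,
           option_id,
           v.target_votes / v.all_votes AS subgroup_percent,
           t.target_votes / t.all_votes AS total_percent,
           poll_option_count
    FROM votes v
    JOIN totals t USING (poll_id)
    JOIN poll_options po USING (poll_id)
) AS foo
ORDER BY deviation DESC, poll_option_count DESC;
```

Temporary tables are dropped automatically when the session ends, so no cleanup is needed unless the connection is reused for another run.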