MySql下limit与row_number()分页

MySql下limit与row_number()分页

Opynicus Lv1

因为某个无形的大手导致的x创改造,项目数据库需要进行更换,从IBM DB2切换到OceanBase(MySql)。相比起来性能或是引擎优化不如原来,部分原本就是slow query直接变成sql timeout。在进过一番各种子查询和加索引优化后,得到的优化后的sql内容如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
SELECT
p.id,
p.name,
p.description,
w.wine_id,
w.wine_name,
w.vintage,
s.storage_id,
s.location,
sa.account_id,
sa.account_name,
wi.info_id,
wi.details,
wi.date_info
FROM
(
SELECT
DISTINCT p.id,
p.name,
p.description,
w.wine_id,
w.wine_name,
w.vintage,
s.storage_id,
s.location,
sa.account_id,
sa.account_name,
wi.info_id,
wi.details,
wi.date_info
FROM
portfolio
) p
LEFT JOIN wines w ON w.wine_id = p.wine_id
AND w.vintage = p.wine_vintage
LEFT JOIN storage s ON s.storage_id = p.storage_id
LEFT JOIN storageaccounts sa ON sa.warehouse_id = s.warehouse_id
AND sa.user_id = p.user_id
LEFT JOIN wineinfo wi ON wi.wine_id = w.wine_id
AND wi.vintage = w.vintage
LEFT JOIN orders o ON o.portfolio_id = p.id
LEFT JOIN customers c ON c.customer_id = o.customer_id
LEFT JOIN payments pay ON pay.order_id = o.order_id
LEFT JOIN shipments sh ON sh.order_id = o.order_id
LEFT JOIN reviews r ON r.wine_id = w.wine_id
AND r.user_id = c.customer_id
WHERE
p.userID = 'some_user'
AND w.available = 1
AND s.active = 1
AND sa.status = 'active'
AND (
sa.remark like concat('%', 'tst', '%')
or p.remark like concat('%', 'tst', '%')
ORDER BY
) wi.date_info DESC limit 0, 20;

嗯,常规情况下确实没啥问题,但是不知为什么在上述sql中存在问题(区别就是两个like + or)。

order by + limit的配合,嗯,常见的组合。当时查了下执行计划,发现走的是top-n,也正常走了该走的索引,但仍然超时。

然后被给了个建议,改用row_number()处理分页逻辑,修改后结果如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
select
*
from
(
select p.*,
row_number() over() as row
from
(
SELECT
p.id,
p.name,
p.description,
w.wine_id,
w.wine_name,
w.vintage,
s.storage_id,
s.location,
sa.account_id,
sa.account_name,
wi.info_id,
wi.details,
wi.date_info
FROM
(
SELECT
DISTINCT p.id,
p.name,
p.description,
w.wine_id,
w.wine_name,
w.vintage,
s.storage_id,
s.location,
sa.account_id,
sa.account_name,
wi.info_id,
wi.details,
wi.date_info
FROM
portfolio
) p
LEFT JOIN wines w ON w.wine_id = p.wine_id
AND w.vintage = p.wine_vintage
LEFT JOIN storage s ON s.storage_id = p.storage_id
LEFT JOIN storageaccounts sa ON sa.warehouse_id = s.warehouse_id
AND sa.user_id = p.user_id
LEFT JOIN wineinfo wi ON wi.wine_id = w.wine_id
AND wi.vintage = w.vintage
LEFT JOIN orders o ON o.portfolio_id = p.id
LEFT JOIN customers c ON c.customer_id = o.customer_id
LEFT JOIN payments pay ON pay.order_id = o.order_id
LEFT JOIN shipments sh ON sh.order_id = o.order_id
LEFT JOIN reviews r ON r.wine_id = w.wine_id
AND r.user_id = c.customer_id
WHERE
p.userID = 'some_user'
AND w.available = 1
AND s.active = 1
AND sa.status = 'active'
AND (
sa.remark like concat('%', 'tst', '%')
or p.remark like concat('%', 'tst', '%')
ORDER BY
) wi.date_info DESC
) as p
) as q
where
q.row >= 0 ad q.row < 20
)

不过令人迷惑的是只要加上这个特殊的条件判断字段(like + or + like),使用limit处理分页就会超时,而row_number()不会超时。但查询中不带这个模糊匹配字段,则使用row_number()反而会超时, limit不会。

这两种分页方式的区别在于,limit分页是通过逐行计数丢弃的方式处理分页,并非直接定位。row_number()可以通过窗口函数为结果集生成行号,然后通过子查询筛选行号范围。若row_number() over(order by indexed_column)中的列有索引,MySQL可能通过索引顺序生成行号,避免全表排序与临时排序。并且在offset过大的情况下,row_number()处理分页会有更明显的优势。

但为什么仍然会有情况导致row_number()处理分页超时,limit反而不会呢?这里我问了下R1,它给出的可能情况是这样的:

但这种回答感觉也只是针对特别情况下的特别场景罢了,核心问题还是row_number()分页超时的本质是全结果集排序与临时表开销导致的,在这些特殊情况下使用limit分页反而没有问题?这种问题甚至在执行计划上也是无法看出具体差异的,或许这整段sql逻辑都应该被重写吧(。所以还是得具体情况具体分析,做好测试什么的。

Comments
On this page
MySql下limit与row_number()分页