MySQL实战之索引

索引篇

普通索引和唯一索引，应该怎么选

查询过程

更新过程

MySQL为什么有时候会选错索引

先建一个简单的表，表里有a、b两个字段，并分别建上索引：

1
2
3
4
5
6
7
8


CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `b` (`b`)
) ENGINE=InnoDB;

我们往表t中插入10万行记录，取值按整数递增，即：(1,1,1)，(2,2,2)，(3,3,3) 直到(100000,100000,100000)。

小技巧每10000行才执行一次事务提交, 大大提升插入速度

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  start transaction;
  while(i<=100000)do
    insert into t values(i, i, i);
    set i=i+1;
    if mod(i,10000)=0 then
      commit work and chain;
    end if;
  end while;
  COMMIT;
end;;
delimiter ;
call idata();

实验版本5.7 一直不能复现专栏上的现象

1
2
3
4
5
6


mysql> select @@global.tx_isolation,@@tx_isolation,version(),"session A";
+-----------------------+-----------------+-----------------------------+-----------+
| @@global.tx_isolation | @@tx_isolation  | version()                   | session A |
+-----------------------+-----------------+-----------------------------+-----------+
| REPEATABLE-READ       | REPEATABLE-READ | 5.7.19-0ubuntu0.16.04.1-log | session A |
+-----------------------+-----------------+-----------------------------+-----------+

执行步骤

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29


-- sessionA 
start transaction with consistent snapshot;

-- sessionB
delete from t;
call idata();
explain select * from t where a between 10000 and 20000;

-- sessionA
commit;

-- 结果还是选择索引a
-- 继续
set global slow_query_log='ON'; -- 开启慢日志
set long_query_time=0; -- 这样就会记录所有日志了
select * from t where a between 10000 and 20000; /*Q1*/
select * from t force index(a) where a between 10000 and 20000;/*Q2*/

-- 查看慢日志 Rows_examined 没啥区别 专栏是 没强制使用a索引 Rows_examined是10w行
# Time: 2020-04-03T02:45:31.181614Z
# User@Host: homestead[homestead] @ localhost []  Id:    17
# Query_time: 0.030649  Lock_time: 0.000075 Rows_sent: 10001  Rows_examined: 10001
SET timestamp=1585881931;
select * from t where a between 10000 and 20000;
# Time: 2020-04-03T02:45:32.545506Z
# User@Host: homestead[homestead] @ localhost []  Id:    17
# Query_time: 0.032415  Lock_time: 0.000126 Rows_sent: 10001  Rows_examined: 10001
SET timestamp=1585881932;
select * from t force index(a) where a between 10000 and 20000;

优化器逻辑

选择索引是优化器的工作
优化器选择索引的目的，是找到一个最优的执行方案，并用最小的代价去执行语句
- 扫描行数越少，访问磁盘的次数越少，消耗的CPU资源越少。
- 扫描行数不是唯一标准，还结合是否使用临时表、是否排序等因素来综合判断。
- 扫描行数无法精确判断，只能估算。
- 区分度：统计信息就是索引的“区分度”，一个索引上，不同的值越多，区分度就越好。
- 基数（cardinality）：一个索引上不同的值的个数，越大，区分度越好。
- 基数采用“采样统计”，参数innodb_stats_persistent的值，不管N是20还是8，这个基数都是很容易不准的。
  - 设置为on的时候，表示统计信息会持久化存储。这时，默认的N是20，M是10。
  - 设置为off的时候，表示统计信息只存储在内存中。这时，默认的N是8，M是16。
  - analyze table 命令修整统计信息

索引选择异常和处理

案例分析 mysql> select * from t where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1;

方案一：强制索引force index
- 强制选索引a mysql> select * from t force index(a) where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1;
方案二：引导MySQL使用我们期望的索引 “order by b limit 1” 改成 “order by b,a limit 1”
- 优化器选择使用索引b，是因为它认为使用索引b可以避免排序
- 修改后两个索引都需要排序，扫描行数成了影响决策的主要条件，于是此时优化器选了只需要扫描1000行的索引a。
方案三：在有些场景下，我们可以新建一个更合适的索引，来提供给优化器做选择，或删掉误用的索引。

怎么给字符串字段加索引

直接创建完整索引，这样可能比较占用空间；
创建前缀索引，节省空间，但会增加查询扫描次数，并且不能使用覆盖索引；
使用函数reverse()倒序存储，再创建前缀索引，用于绕过字符串本身前缀的区分度不够的问题；
使用函数crc32()创建hash字段索引，查询性能稳定，有额外的存储和计算消耗，跟第三种方式一样，都不支持范围扫描。

文章目录