1. 程式人生 > >MySql 100萬級別資料中隨機獲取一條或多條記錄之RAND()優化

MySql 100萬級別資料中隨機獲取一條或多條記錄之RAND()優化

處理業務中,有這樣的需求,例如:有100W甚至更多的使用者,此時我們要隨機一條男性或者女性使用者出來做資料操作。基於這個需求,我們做一下實驗。

基礎準備

  1. 準備一張使用者表,結構如下
CREATE TABLE `user` (
  `uid` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT '使用者ID',
  `name` varchar(255) DEFAULT NULL COMMENT '使用者姓名',
  `age` tinyint(3) unsigned DEFAULT 0 COMMENT '年齡',
  `gender`
tinyint(3) unsigned DEFAULT 2 COMMENT '性別 2 人妖 1 男 0 女', `create_time` int(10) unsigned DEFAULT 0 COMMENT '建立時間', PRIMARY KEY (`uid`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
  1. 簡單寫個MySQL儲存
DELIMITER ;;
USE `test`;;

DROP PROCEDURE IF EXISTS `autoFull`;;

CREATE DEFINER=`root`@`localhost` PROCEDURE`autoFull`(num INT
) BEGIN #Routine body goes here... DECLARE count INT DEFAULT 0; DECLARE i INT DEFAULT 0; set @exesql = concat("insert into user(name,age,gender,create_time) values "); set @exedata = ""; while count<num do set @exedata = concat(@exedata, ",('",MD5(i), "','", floor(rand()*37
+18), "','", ROUND(RAND() * 1), "','",current_timestamp(), "')"); set count=count+1; set i=i+1; if i%1000=0 then set @exedata = SUBSTRING(@exedata, 2); set @exesql = concat("insert into user(name,age,gender,create_time) values ", @exedata); prepare stmt from @exesql; execute stmt; DEALLOCATE prepare stmt; set @exedata = ""; end if; end while; if length(@exedata)>0 then set @exedata = SUBSTRING(@exedata, 2); set @exesql = concat("insert into user(name,age,gender,create_time) values ", @exedata); prepare stmt from @exesql; execute stmt; DEALLOCATE prepare stmt; end if; END;; DELIMITER ;

具體步驟:

mysql> use test
Database changed
mysql> 

mysql> CREATE TABLE `user` (
    ->   `uid` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT '使用者ID',
    ->   `name` varchar(255) DEFAULT NULL COMMENT '使用者姓名',
    ->   `age` tinyint(3) unsigned DEFAULT 0 COMMENT '年齡',
    ->   `gender` tinyint(3) unsigned DEFAULT 2 COMMENT '性別 2 人妖 1 男 0 女',
    ->   `create_time` int(10) unsigned  DEFAULT 0 COMMENT '建立時間',
    ->   PRIMARY KEY (`uid`)
    -> ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

mysql> DELIMITER ;;
mysql> 
mysql> DROP PROCEDURE IF EXISTS `autoFull`;;
Query OK, 0 rows affected (0.00 sec)

mysql>  
mysql> CREATE DEFINER=`root`@`localhost` PROCEDURE`autoFull`(num INT)
    -> BEGIN
    -> #Routine body goes here...
    -> DECLARE count INT DEFAULT 0;
    -> DECLARE i INT DEFAULT 0;
    -> set @exesql = concat("insert into user(name,age,gender,create_time) values ");
    -> set @exedata = "";
    -> while count<num do 
    -> set @exedata = concat(@exedata, ",('",MD5(i), "','", floor(rand()*37+18), "','", ROUND(RAND() * 1), "','",current_timestamp(), "')");
    -> set count=count+1;
    -> set i=i+1;
    -> if i%1000=0
    -> then 
    -> set @exedata = SUBSTRING(@exedata, 2);
    -> set @exesql = concat("insert into user(name,age,gender,create_time) values ", @exedata);
    -> prepare stmt from @exesql;
    -> execute stmt;
    -> DEALLOCATE prepare stmt;
    -> set @exedata = "";
    -> end if;
    -> end while;
    -> 
    -> if length(@exedata)>0 
    -> then 
    -> set @exedata = SUBSTRING(@exedata, 2);
    -> set @exesql = concat("insert into user(name,age,gender,create_time) values ", @exedata);
    -> prepare stmt from @exesql;
    -> execute stmt;
    -> DEALLOCATE prepare stmt;
    -> end if;
    -> END;;
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER ;//還原界定符

mysql> call autoFull(1000000);
Query OK, 0 rows affected, 64 warnings (1 min 3.81 sec)
//呼叫下儲存

檢視下自己的儲存

mysql> select count(uid) from user;
+------------+
| count(uid) |
+------------+
|    1000001 |
+------------+
1 row in set (0.20 sec)

mysql> 

原始簡單粗暴的 SQL 語句 select * from user order by RAND() LIMIT 1; (切勿使用)

mysql> select * from user order by RAND() LIMIT 1;
+--------+----------------------------------+------+--------+-------------+
| uid    | name                             | age  | gender | create_time |
+--------+----------------------------------+------+--------+-------------+
| 318393 | 48f8b305de34c87af8143fe1f24732ad |   24 |      0 |        2017 |
+--------+----------------------------------+------+--------+-------------+
1 row in set (17.19 sec)

mysql> 

簡單分析下:

mysql> explain select * from user order by RAND() LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
|  1 | SIMPLE      | user  | ALL  | NULL          | NULL | NULL    | NULL | 1000340 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
1 row in set (0.00 sec)

type => all 呵呵,全表掃描 1000340 條資料
key => null 且沒有索引,我們是隨機查詢
MySql 手冊專門有提醒在 Order by 後面不能使用 RAND() 函式,會導致全表掃描

簡單優化 SQL 語句, 使用join

 SELECT * FROM user  AS u1  JOIN (SELECT ROUND(RAND() * ((SELECT MAX(uid) FROM `user`)-(SELECT MIN(uid) FROM user))+(SELECT MIN(uid) FROM user)) AS uid) AS u2 WHERE u1.uid >= u2.uid ORDER BY u1.uid LIMIT 1
mysql> SELECT * FROM user  AS u1  JOIN (SELECT ROUND(RAND() * ((SELECT MAX(uid) FROM `user`)-(SELECT MIN(uid) FROM user))+(SELECT MIN(uid) FROM user)) AS uid) AS u2 WHERE u1.uid >= u2.uid ORDER BY u1.uid LIMIT 1
    -> ;
+--------+----------------------------------+------+--------+-------------+--------+
| uid    | name                             | age  | gender | create_time | uid    |
+--------+----------------------------------+------+--------+-------------+--------+
| 798024 | c1a781e16b45f2ddc18b42365b8b0903 |   48 |      0 |        2017 | 798024 |
+--------+----------------------------------+------+--------+-------------+--------+
1 row in set (0.14 sec)

以上測試發現快了不少,來分析下這個SQL語句,該SQL 語句的核心 Join和隨機,隨機的基本公式: RAND()*(max-min)+mix,隨機出一個 uid as u2 然後 條件查詢,uid 自建索引,效率蠻高的。

mysql> explain SELECT * FROM user  AS u1  JOIN (SELECT ROUND(RAND() * ((SELECT MAX(uid) FROM `user`)-(SELECT MIN(uid) FROM user))+(SELECT MIN(uid) FROM user)) AS uid) AS u2 WHERE u1.uid >= u2.uid ORDER BY u1.uid LIMIT 1;
+----+-------------+------------+--------+---------------+---------+---------+------+--------+------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref  | rows   | Extra                        |
+----+-------------+------------+--------+---------------+---------+---------+------+--------+------------------------------+
|  1 | PRIMARY     | <derived2> | system | NULL          | NULL    | NULL    | NULL |      1 |                              |
|  1 | PRIMARY     | u1         | range  | PRIMARY       | PRIMARY | 4       | NULL | 500170 | Using where                  |
|  2 | DERIVED     | NULL       | NULL   | NULL          | NULL    | NULL    | NULL |   NULL | No tables used               |
|  5 | SUBQUERY    | NULL       | NULL   | NULL          | NULL    | NULL    | NULL |   NULL | Select tables optimized away |
|  4 | SUBQUERY    | NULL       | NULL   | NULL          | NULL    | NULL    | NULL |   NULL | Select tables optimized away |
|  3 | SUBQUERY    | NULL       | NULL   | NULL          | NULL    | NULL    | NULL |   NULL | Select tables optimized away |
+----+-------------+------------+--------+---------------+---------+---------+------+--------+------------------------------+
6 rows in set (0.01 sec)

如果想隨機多條呢?修改LIMIT?來看看

mysql> SELECT * FROM user  AS u1  JOIN (SELECT ROUND(RAND() * ((SELECT MAX(uid) FROM `user`)-(SELECT MIN(uid) FROM user))+(SELECT MIN(uid) FROM user)) AS uid) AS u2 WHERE u1.uid >= u2.uid ORDER BY u1.uid LIMIT 5;

mysql> SELECT * FROM user  AS u1  JOIN (SELECT ROUND(RAND() * ((SELECT MAX(uid) FROM `user`)-(SELECT MIN(uid) FROM user))+(SELECT MIN(uid) FROM user)) AS uid) AS u2 WHERE u1.uid >= u2.uid ORDER BY u1.uid LIMIT 5;
+--------+----------------------------------+------+--------+-------------+--------+
| uid    | name                             | age  | gender | create_time | uid    |
+--------+----------------------------------+------+--------+-------------+--------+
| 948535 | da2c5dbe42945a0cc5b46a1e6acf746b |   46 |      0 |        2017 | 948535 |
| 948536 | 8b6f00e7098d215d4a85b160fcbbce4f |   22 |      1 |        2017 | 948535 |
| 948537 | d3caa715182997a67fdf3ab245ea53f3 |   24 |      1 |        2017 | 948535 |
| 948538 | 727659b109bfe2a21f8be7a5c1d1b301 |   27 |      0 |        2017 | 948535 |
| 948539 | 4b5038af305fb4629d38d067f806c7ab |   27 |      0 |        2017 | 948535 |
+--------+----------------------------------+------+--------+-------------+--------+
5 rows in set (0.00 sec)

如果使用以上的SQL語句,發現查詢到的資料是連續的,我們要的是隨機的,不難理解 LIMIT 5 得到當前查詢條件的前五條,所以是相對連續的,uid 是自增的,因為用的是儲存插入的,實際專案也是相對連續的。這條SQL 一次性查詢無法達到我們的需求,則可分別一條條查詢,如果要求的隨機條數較多,那就不建議使用該條SQL語句了。

再來一條SQL語句

mysql>  SELECT * FROM user WHERE uid >= ((SELECT MAX(uid) FROM user)-(SELECT MIN(uid) FROM user)) * RAND() + (SELECT MIN(uid) FROM user) LIMIT 1
    -> ;
+------+----------------------------------+------+--------+-------------+
| uid  | name                             | age  | gender | create_time |
+------+----------------------------------+------+--------+-------------+
| 2343 | c8dfece5cc68249206e4690fc4737a8d |   34 |      0 |        2017 |
+------+----------------------------------+------+--------+-------------+
1 row in set (0.01 sec)

explain 下

mysql> explain SELECT * FROM user WHERE uid >= ((SELECT MAX(uid) FROM user)-(SELECT MIN(uid) FROM user)) * RAND() + (SELECT MIN(uid) FROM user) LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+---------+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+---------+------------------------------+
|  1 | PRIMARY     | user  | ALL  | NULL          | NULL | NULL    | NULL | 1000340 | Using where                  |
|  4 | SUBQUERY    | NULL  | NULL | NULL          | NULL | NULL    | NULL |    NULL | Select tables optimized away |
|  3 | SUBQUERY    | NULL  | NULL | NULL          | NULL | NULL    | NULL |    NULL | Select tables optimized away |
|  2 | SUBQUERY    | NULL  | NULL | NULL          | NULL | NULL    | NULL |    NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+---------+------------------------------+
4 rows in set (0.00 sec)

看看隨機多條

mysql> SELECT * FROM user WHERE uid >= ((SELECT MAX(uid) FROM user)-(SELECT MIN(uid) FROM user)) * RAND() + (SELECT MIN(uid) FROM user) limit 20;
+------+----------------------------------+------+--------+-------------+
| uid  | name                             | age  | gender | create_time |
+------+----------------------------------+------+--------+-------------+
|  786 | fc8001f834f6a5f0561080d134d53d29 |   24 |      0 |        2017 |
| 1961 | e4dd5528f7596dcdf871aa55cfccc53c |   54 |      1 |        2017 |
| 2958 | db5cea26ca37aa09e5365f3e7f5dd9eb |   27 |      0 |        2017 |
| 3122 | f231f2107df69eab0a3862d50018a9b2 |   43 |      1 |        2017 |
| 3445 | 12092a75caa75e4644fd2869f0b6c45a |   31 |      0 |        2017 |
| 4121 | 1b69ebedb522700034547abc5652ffac |   48 |      0 |        2017 |
| 4682 | 4f5c422f4d49a5a807eda27434231040 |   38 |      1 |        2017 |
| 4815 | 187acf7982f3c169b3075132380986e4 |   26 |      1 |        2017 |
| 5028 | f02208a057804ee16ac72ff4d3cec53b |   19 |      0 |        2017 |
| 5182 | fa3dade3a49305f27f64203452ac954c |   32 |      1 |        2017 |
| 5245 | b49d4455d64520060ac01fb5a3c757e4 |   34 |      1 |        2017 |
| 5405 | bb1443cc31d7396bf73e7858cea114e1 |   40 |      1 |        2017 |
| 5486 | 03b2ceb73723f8b53cd533e4fba898ee |   21 |      0 |        2017 |
| 5700 | 7f848746fe2599dc199a75f0d02fc3d6 |   36 |      0 |        2017 |
| 5835 | f5f3b8d720f34ebebceb7765e447268b |   36 |      0 |        2017 |
| 5991 | 1ae6464c6b5d51b363d7d96f97132c75 |   49 |      1 |        2017 |
| 6064 | 09ccf3183d9e90e5ae1f425d5f9b2c00 |   36 |      0 |        2017 |
| 6160 | 08ad21c6f9da6bdf51ae0b971f43d96d |   31 |      1 |        2017 |
| 6306 | ccf0304d099baecfbe7ff6844e1f6d91 |   48 |      1 |        2017 |
| 6810 | 6194a1ee187acd6606989f03769e8f7f |   41 |      0 |        2017 |
+------+----------------------------------+------+--------+-------------+
20 rows in set (0.01 sec)

隨機多條也是完美的,隨機核心就是用 RAND() 隨機出一個使用者uid,或則隨機區間,然後再進行limit 即可,此處基本闡述完了,歡迎大家批評指正。