随机取出n条记录:
Sql server:select top n * from 表 order by newid()
Access:Select top n * FROM 表 orDER BY Rnd(id)
mysql:Select * From 表 order By rand() Limit n
select * from youtab where mod ( rowid , 3 ) = 0
建议把MSSQL的联机丛书中的 Transact-SQL 参考大致看一遍,很多东西就心中有数了
mysql中随机提取数据库记录
-------------------------------------------------------------------------------
select * from tablename order by rand() limit 10
select * from tablename order by rand() limit 10
sqlserver中随机提取数据库记录
--------------------------------------------------------------------------------
select top 10 * from tablename order by NEWID()
select top 10 * from tablename order by NEWID()
Access中随机提取数据库记录
-------------------------------------------------------------------------------
SELECT top 10 * FROM tablename ORDER BY Rnd(FId)
SELECT top 10 * FROM tablename ORDER BY Rnd(FId)
FId:为你当前表的ID字段名
==========================================================================================
▲随机查看前N条记录(随机读取表内容)
select * from (select * from tb_phone_no order by sys_guid())
where rownum < 10;
SELECT * FROM (SELECT * FROM chifan ORDER BY dbms_random.random) WHERE ROWNUM<=5
SQL> SELECT * FROM (SELECT * FROM A SAMPLE(0.01)) WHERE ROWNUM<=1;
DT
-----------------
20050105 14:59:52
SQL> SELECT * FROM (SELECT * FROM A SAMPLE(0.01)) WHERE ROWNUM<=1;
DT
-----------------
20050306 00:43:05
SQL> SELECT * FROM (SELECT * FROM A SAMPLE(0.01)) WHERE ROWNUM<=1;
DT
-----------------
20050406 02:59:50
注意每次取得的值都不同。
SAMPLE 是随机抽样,后面的数值是采样百分比。
我的测试表A是10万条,所以取0.01% 也就是 万分之一,这样返回记录数大约10条。
对于你的情况,你可以根据数据量来控制采样百分比。
大数据集表随机取数据
select *
from (select *
from table_name sample(10)
order by trunc(dbms_random.value(0, 1000)))
where rownum = 1;
sample(10):含义为检索表中的10%数据
从Oracle8i开始Oracle提供采样表扫描特性
Oracle访问数据的基本方法有:
1.全表扫描
2.采样表扫描
全表扫描(Full table Scan)
全表扫描返回表中所有的记录。
执行全表扫描,Oracle读表中的所有记录,考查每一行是否满足WHERE条件。Oracle顺序的读分配给该表的每一个数据块,这样全表扫描能够受益于多块读.
每个数据块Oracle只读一次.
采样表扫描(sample table scan)
采样表扫描返回表中随机采样数据。
这种访问方式需要在FROM语句中包含SAMPLE选项或者SAMPLE BLOCK选项.
SAMPLE选项:
当按行采样来执行一个采样表扫描时,Oracle从表中读取特定百分比的记录,并判断是否满足WHERE子句以返回结果。
SAMPLE BLOCK选项:
使用此选项时,Oracle读取特定百分比的BLOCK,考查结果集是否满足WHERE条件以返回满足条件的纪录.
Sample_Percent:
Sample_Percent是一个数字,定义结果集中包含记录占总记录数量的百分比。
Sample值应该在[0.000001,99.999999]之间。
1.使用SAMPLE选项
SQL> select * from employee SAMPLE(30);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=25 Bytes=2175)
1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=25 Bytes=2175)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
880 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
SQL> select * from employee SAMPLE(20);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=16 Bytes=1392)
1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=16 Bytes=1392)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
839 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
|
2.使用SAMPLE BLOCK选项
SQL> SELECT * FROM employee SAMPLE BLOCK (50);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
10 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=41 Bytes=3567)
1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=41 Bytes=3567)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1162 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SQL>
|
3.采样前n条记录的查询
也可以使用dbms_random包实现
SQL> select * from (
2 select * from employee
3 order by dbms_random.value )
4 where rownum <= 4;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7839 KING PRESIDENT 17-NOV-81 5000 10
7369 SMITH CLERK 7902 17-DEC-80 800 20
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 COUNT (STOPKEY)
2 1 VIEW
3 2 SORT (ORDER BY STOPKEY)
4 3 TABLE ACCESS (FULL) OF 'EMPLOYEE'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
927 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
4 rows processed
|
对比一下SAMPLE选项
SQL> SELECT * FROM employee SAMPLE (40);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=33 Bytes=2871)
1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=33 Bytes=2871)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
961 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5 rows processed
SQL>
|
主要注意以下几点:
1.sample只对单表生效,不能用于表连接和远程表
2.sample会使SQL自动使用CBO
当我们获取数据时,可能会有这样的需求,即每次从表中获取数据时,是随机获取一定的记录,而不是每次都获取一样的数据,这时我们可以采取Oracle内部一些函数,来达到这样的目的.
1) select * from (select * from tablename order by sys_guid()) where rownum < N;
2) select * from (select * from tablename order by dbms_random.value) where rownum< N;
3) select * from (select * from table_name sample(10) order by trunc(dbms_random.value(0, 1000))) where rownum < N;
说明: sample(10)含义为检索表中的10%数据,sample值应该在[0.000001,99.999999]之间.
其中 sys_guid() 和 dbms_random.value都是内部函数,通过这样的方法,就可以实现我们的需求了.
注:
在使1)方法时,即使用sys_guid() 这种方法时,有时会获取到相同的记录,即和前一次查询的结果集是一样的,我查找了相关资料,有些说是和操作系统有关,在windows平台下正常,获取到的数据是随机的,而在linux等平台下始终是相同不变的数据集,有些说是因为sys_guid()函数本身的问题,即sys_guid()会在查询上生成一个16字节的全局唯一标识符,这个标识符在绝大部分平台上由一个宿主标识符和进程或进程的线程标识符组成,这就是说,它很可能是随机的,但是并不表示一定是百分之百的这样.
所以,为确保在不同的平台每次读取的数据都是随机的,我们大多采用2)和3)两种方案,其中2)方案更常用.3)方案缩小了查询的范围,在查询大表,且要提取数据不是很不多的情况下,会对查询速度上有一定的提高,
另:在Oracle中一般获取随机数的方法是:
select trunc(dbms_random.value(0, 1000)) from dual; (0-1000的整数)
select dbms_random.value(0, 1000) from dual; (0-1000的浮点数)