SQL Server 2005 和自增长主键identity说再见——NEWSEQUENTIALID()(转载)
在SQL Server 2005环境下,表的主键应该怎样设计.
目前主要用到的主键方案共三种:
- 自动增长主键
- 手动增长主键
- UNIQUEIDENTIFIER主键
1、先说自动增长主键,它的优点是简单,类型支持bigint.但是它有致命的弱点:
当我们需要在多个数据库间进行数据的复制时(SQL Server的数据分发、订阅机制允许我们进行库间的数据复制操作),自动增长型字段可能造成数据合并时的主键冲突。设想一个数据库中的Order表向另一个库中的Order表复制数据库时,OrderID到底该不该自动增长呢?
2、再说手动增长主键,它的优点是自行定制主键列,主键列的数据类型乃至数据样本都可以控制,能够稳定的获得目标键值,不会重复.但是它维护成本比较搞,首先生成键值需要自行编写存储过程来产生,网络开销大,运行时还要考虑到并发冲突等等.
3、最后就是UNIQUEIDENTIFIER主键,它利用GUID作为键值,可以直接调用newid()来获得全局唯一标识,即便合并数据表也不会有重复现象.但是GUID有两个弱点:其一,和int类型比较,GUID长度是前者4倍.其二,用newid()获得的GUID毫无规律,因为该列作为主键,必然有聚集索引,那么在插入新数据时,将是一个非常耗时的操作.这样的话UNIQUEIDENTIFIER作为主键将大大有损效率.
所以SQL Server 2000环境下DBA们往往写一个存储过程来生成与时间有关的GUID,即在GUID前面加上生成时间.这样确保生成出来的主键全局唯一并且按时间递增.不过这又回到了第二种主键方案,不便维护.
4、SQL Server 2005已经解决了这个问题,使用的是NEWSEQUENTIALID()
这个函数产生的GUID是递增的,下面看下它的用法
--创建实验表 --1创建id列的类型为UNIQUEIDENTIFIER --2ROWGUIDCOL只是这个列的别名,一个表中只能有一个 --3PRIMARY KEY确定id为主键 --4使用DEFAULT约束来自动为该列添加GUID create table jobs ( id UNIQUEIDENTIFIER ROWGUIDCOL PRIMARY KEY NOT NULL CONSTRAINT [DF_jobs_id] DEFAULT (NEWSEQUENTIALID()), account varchar(64) not null, password varchar(64) not null ) go select * from jobs --添加实验数据 insert jobs (account,password) values ('tudou','123') insert jobs (account,password) values ('ntudou','123') insert jobs (account,password) values ('atudou','123') insert jobs (account,password) values ('btudou','123') insert jobs (account,password) values ('ctudou','123') select * from jobs
结果:
--使用identity的是我们可以通过Select @@IDENTITY取到新添加的id --使用UNIQUEIDENTIFIER怎么办呢? --采取手动增长的方法select NEWSEQUENTIALID()先取出id再添加 --不行,语法不支持 --可以通过下面的方法取到新添加数据的id --在ADO.NET中的用法和Select @@IDENTITY一样 DECLARE @outputTable TABLE(ID uniqueidentifier) INSERT INTO jobs(account, password) OUTPUT INSERTED.ID INTO @outputTable VALUES('dtudou', '123') SELECT ID FROM @outputTable --对比下数据 select * from jobs
结果:
--ROWGUIDCOL是主键列的别名,可以直接当做列名来使用 --这样可以忽略主键列的名称 insert jobs (account,password) values ('etudou','123') select ROWGUIDCOL from jobs
结果:
sqlserver如何自增字符串类型ID
sqlserver自增字符串类型ID,只需要使用navicat在字段默认值框中输入(replace(newid(),’-’,’’))搞定!!!!
I know that If I run this query
select top 100 * from mytable order by newid()
it will get 100 random records from my table.
However, I'm a bit confused as to how it works, since I don't see newid()
in the select
list. Can someone explain? Is there something special about newid()
here?
5 Answers
as MSDN says:
NewID() Creates a unique value of type uniqueidentifier.
and your table will be sorted by this random values.
-
1Thanks - I know what NewID() does, I'm just trying to understand how it would help in the random selection. Is it that [1] the select statement will select EVERYTHING from mytable, [2] for each row selected, tack on a uniqueidentifier generated by NewID(), [3] sort the rows by this uniqueidentifier and [4] pick off the top 100 from the sorted list? Feb 12, 2011 at 19:50
In general it works like this:
- All rows from mytable is "looped"
- NEWID() is executed for each row
- The rows are sorted according to random number from NEWID()
- 100 first row are selected
I know what NewID() does, I'm just trying to understand how it would help in the random selection. Is it that (1) the select statement will select EVERYTHING from mytable, (2) for each row selected, tack on a uniqueidentifier generated by NewID(), (3) sort the rows by this uniqueidentifier and (4) pick off the top 100 from the sorted list?
Yes. this is pretty much exactly correct (except it doesn't necessarily need to sort all the rows). You can verify this by looking at the actual execution plan.
SELECT TOP 100 *
FROM master..spt_values
ORDER BY NEWID()
The compute scalar operator adds the NEWID()
column on for each row (2506 in the table in my example query) then the rows in the table are sorted by this column with the top 100 selected.
SQL Server doesn't actually need to sort the entire set from positions 100 down so it uses a TOP N
sort operator which attempts to perform the entire sort operation in memory (for small values of N
)
-
Got it! And yes, you're right - once I've determined the top 100 rows from the entire set, there's no need to sort the rest. Feb 12, 2011 at 22:08
-
So, is it safe to ensure that no data is written? Since this is a SELECT query, the NEWID() will calculate a randomized identifier just for the query, it won't be updating anything in the database with this new id, right?– K09PMar 25, 2019 at 11:40
-
1Yes it won't affect the tables you are selecting from. At least some of the data will be temporarily written to a worktable in tempdb to hold at least the TOP N results but nothing written to the user database Mar 25, 2019 at 12:36
-
I don't see it mentioned anywhere else, but because of how this works, this is probably a terrible way of selecting random rows from a table. If you have a table with many rows (i.e. half a billion), do not run a query like that. If I ran this on a table with 500GB of data in it, I'd be in trouble. Better off using the built-in feature "TABLESAMPLE" that's meant for selecting random rows from data pages:
SELECT * FROM Person.Person TABLESAMPLE (10 PERCENT);
– TriynkoApr 28 at 23:35
use select top 100 randid = newid(), * from mytable order by randid
you will be clarified then..
I have an unimportant query which uses newId() and joins many tables. It returns about 10k rows in about 3 seconds. So, newId() might be ok in such cases where performance is not too bad & does not have a huge impact. But, newId() is bad for large tables.
Here is the explanation from Brent Ozar's blog - https://www.brentozar.com/archive/2018/03/get-random-row-large-table/.
From the above link, I have summarized the methods which you can use to generate a random id. You can read the blog for more details.
4 ways to get a random row from a large table:
- Method 1, Bad: ORDER BY NEWID() > Bad performance!
- Method 2, Better but Strange: TABLESAMPLE > Many gotchas & is not really random!
- Method 3, Best but Requires Code: Random Primary Key > Fastest, but won't work for negative numbers.
- Method 4, OFFSET-FETCH (2012+) > Only performs properly with a clustered index.
More on method 3: Get the top ID field in the table, generate a random number, and look for that ID. For top N rows, call the code below N times or generate N random numbers and use in an IN clause.
/* Get a random number smaller than the table's top ID */
DECLARE @rand BIGINT;
DECLARE @maxid INT = (SELECT MAX(Id) FROM dbo.Users);
SELECT @rand = ABS((CHECKSUM(NEWID()))) % @maxid;
/* Get the first row around that ID */
SELECT TOP 1 *
FROM dbo.Users AS u
WHERE u.Id >= @rand;
-
I find method 3 to be the fastest. But I have a problem when I get the top N. I ran it N times, but there will be several times where the data will overlap so it's not really random N values. Do you have any ideas?- Apr 27 at 9:37
南来地,北往的,上班的,下岗的,走过路过不要错过!
======================个性签名=====================
之前认为Apple 的iOS 设计的要比 Android 稳定,我错了吗?
下载的许多客户端程序/游戏程序,经常会Crash,是程序写的不好(内存泄漏?刚启动也会吗?)还是iOS本身的不稳定!!!
如果在Android手机中可以简单联接到ddms,就可以查看系统log,很容易看到程序为什么出错,在iPhone中如何得知呢?试试Organizer吧,分析一下Device logs,也许有用.
ORDER BY
clause do not need to appear in yourSELECT
clause in SQL Server.