django queryset的一些技术
相关url:
重点讲解filter,有相关例子: https://www.cnblogs.com/ywhyme/p/11720696.html
官网关于model自带的函数:
https://docs.djangoproject.com/en/2.2/ref/models/querysets/
https://docs.djangoproject.com/en/2.2/ref/models/expressions/
https://docs.djangoproject.com/en/2.2/ref/models/database-functions
综合内容: https://www.cnblogs.com/gaoya666/p/9005753.html
一、queryset.count() 和 len(queryset) 如何选择
Note: Don't use len()
on QuerySets if all you want to do is determine the number of records in the set. It's much more efficient to handle a count at the database level, using SQL's SELECT COUNT(*)
, and Django provides a count()
method for precisely this reason.
(most crucial) When you only want to know the number of elements and you do not plan to process them in any way it's crucial to use count()
:
DO: queryset.count()
- this will perform single SELECT COUNT(*) some_table
query, all computation is carried on RDBMS side, Python just needs to retrieve the result number with fixed cost of O(1)
DON'T: len(queryset)
- this will perform SELECT * FROM some_table
query, fetching whole table O(N) and requiring additional O(N) memory for storing it. This is the worst that can be done
简单说,如果只看长度,用count,如果循环取得数据,用len
二、官方接口只有bulk_create, 如何实现bulk_update呢?
Let's assume that we want to remove 2 upvotes from all the articles of the author with id 51
.
Doing this only with Python would execute N
queries (N
being the number of articles in the queryset):
for article in Article.objects.filter(author_id=51):
article.upvotes -= 2
article.save()
# Note that there is a race condition here but this is not the focus
# of this example.
What if instead of pulling all the articles into Python, looping over them, decreasing the upvotes, and saving each updated one back to the database, there was another way?
Using an F()
expression, can do it in one query:
Article.objects.filter(author_id=51).update(upvotes=F('upvotes') - 2)
Which can be translated in the following SQL query:
UPDATE app_article SET upvotes = upvotes - 2 WHERE author_id = 51
Why is this better?
- Instead of Python doing the work, we pass the load into the database which is fine tuned to make such queries.
- Effectively cuts down on the number of database queries needed to achieve the wanted result.
对某表进行全表更新或部分更新
Student.objects.update(noanzhi_type='10')
Student.objects.filter(name='ass').update(noanzhi_type='10')
三、 对查询列的substr进行匹配
部分文字匹配时候,我们经常用 startwith, endwith, in 等
User.objects.filter(name__startwith='li') # name like "%li"
User.objects.filter(name__in=['li','wang','liu'])
实际中,我们有时候用 select name from user where substr(name,1,3) = 'li' ,这时候需要如下:
from django.db.models.functions import Substr ,Left,Right
queryset = Model.objects.all()
queryset = queryset.annotate(name_first2=Substr('name', 1, 3))
name_first2 可以看成是新增加的一个column
然后进一步filter(mame_first2='li'
)
======实际例子====================这一句会报错: var = Model.objects.values('field1', 'field2'[:5], 'field3')
改写:
queryset = Model.objects.all() queryset = queryset.annotate(field2_5=Substr('field2', 1, 5))
values = queryset.values('field1', 'field2_5', 'field3')
======================
报错:User.objects.filter(charfield_2__startswith=Substr('charfield_1', 1, 3))
改写:
User.objects.annotate(f1=Substr('charfield_1', 1, 3), f2=Substr('charfield_2', 1, 3)).filter(f1=F('f2'))
==================
以上的left,substr等是用于queryset查询操作的函数,普通的字符串等用python方式处理,例如
截取身份证字段的部分值,在匹配查询,,需要先annoate生成新的field,用新field匹配。
Student.objects.annotate(f1=Substr('sfz',7,8)).filter(f1__lt='20050901').delete()
Student.objects.annotate(f1=Substr('sfz',7,8)).filter(f1__lt='20130901').count())
一部分区县没有学生,统计这部分区县的数量
#取得有学生的区县数量
Jg.objects.filter(level='6').filter(p_code__startswith='1301'[0:4]).filter(student__isnull=False).distinct().count()
#取得没学生的区县数量
Jg.objects.filter(level='6').filter(p_code__startswith='1301'[0:4]).filter(student__isnull=True).count()
users_without_reports = User.objects.filter(report__isnull=True) users_with_reports = User.objects.filter(report__isnull=False).distinct() When you use isnull=False, the distinct() is required to prevent duplicate results.
str = ’0123456789′ print str[0:3] #截取第一位到第三位的字符 print str[:] #截取字符串的全部字符 print str[6:] #截取第七个字符到结尾 print str[:-3] #截取从头开始到倒数第三个字符之前 print str[2] #截取第三个字符 print str[-1] #截取倒数第一个字符 print str[::-1] #创造一个与原字符串顺序相反的字符串 print str[-3:-1] #截取倒数第三位与倒数第一位之前的字符 print str[-3:] #截取倒数第三位到结尾 print str[:-5:-3] #逆序截取,具体啥意思没搞明白
输出结果:
012
0123456789
6789
0123456
2
9
9876543210
78
789
96
四、select xx where id !='xxxxxx'
部分文字匹配时候,我们经常用 startwith, endwith, in 等
https://stackoverflow.com/questions/687295/how-do-i-do-a-not-equal-in-django-queryset-filtering
from myapp.models import Entry from django.db.models import Q Entry.objects.filter(~Q(id = 3))
results = Model.objects.filter(x=5).exclude(a=true)
Q可以做很复杂的查询
Complex lookups with Q
objects¶
Keyword argument queries – in filter()
, etc. – are “AND”ed together. If you need to execute more complex queries (for example, queries with OR
statements), you can use Q objects
.
A Q object
(django.db.models.Q
) is an object used to encapsulate a collection of keyword arguments. These keyword arguments are specified as in “Field lookups” above.
For example, this Q
object encapsulates a single LIKE
query:
from django.db.models import Q
Q(question__startswith='What')
Q
objects can be combined using the &
and |
operators. When an operator is used on two Q
objects, it yields a new Q
object.
For example, this statement yields a single Q
object that represents the “OR” of two "question__startswith"
queries:
Q(question__startswith='Who') | Q(question__startswith='What')
This is equivalent to the following SQL WHERE
clause:
WHERE question LIKE 'Who%' OR question LIKE 'What%'
You can compose statements of arbitrary complexity by combining Q
objects with the &
and |
operators and use parenthetical grouping. Also, Q
objects can be negated using the ~
operator, allowing for combined lookups that combine both a normal query and a negated (NOT
) query:
Q(question__startswith='Who') | ~Q(pub_date__year=2005)
Each lookup function that takes keyword-arguments (e.g. filter()
, exclude()
, get()
) can also be passed one or more Q
objects as positional (not-named) arguments. If you provide multiple Q
object arguments to a lookup function, the arguments will be “AND”ed together. For example:
Poll.objects.get(
Q(question__startswith='Who'),
Q(pub_date=date(2005, 5, 2)) | Q(pub_date=date(2005, 5, 6))
)
… roughly translates into the SQL:
SELECT * from polls WHERE question LIKE 'Who%'
AND (pub_date = '2005-05-02' OR pub_date = '2005-05-06')
Lookup functions can mix the use of Q
objects and keyword arguments. All arguments provided to a lookup function (be they keyword arguments or Q
objects) are “AND”ed together. However, if a Q
object is provided, it must precede the definition of any keyword arguments. For example:
Poll.objects.get(
Q(pub_date=date(2005, 5, 2)) | Q(pub_date=date(2005, 5, 6)),
question__startswith='Who',
)
… would be a valid query, equivalent to the previous example; but:
# INVALID QUERY Poll.objects.get( question__startswith='Who', Q(pub_date=date(2005, 5, 2)) | Q(pub_date=date(2005, 5, 6)) )
四、query的优化
1、索引
Django ORM creates indices(索引) for primary and foreign keys out of the box. However, if you perform a lot of queries on the rest of the columns (the columns that are used in a filter()
method) it makes sense to add indexes to them. This can be accomplished by adding db_index=True
or unique=True
argument to a model field.
2、N+1 problem
一对多,多对一
If you do a lot of cross-table queries (loading data from multiple tables) you might end up with N+1 problem. N+1 problem is when ORM produces an additional query for each associated row you’re fetching. This can be solved by using select_related()
and prefetch_related()
methods when making queries.
Your understanding is mostly correct. You use select_related when the object that you're going to be selecting is a single object, so OneToOneField or a ForeignKey. You use prefetch_related when you're going to get a "set" of things, so ManyToManyFields as you stated or reverse ForeignKeys. Just to clarify what I mean by "reverse ForeignKeys" here's an example:
class ModelA(models.Model):#组织机构 pass class ModelB(models.Model):#人员
a=name jg = ForeignKey(ModelA)
ModelB.objects.select_related('a').all() # Forward ForeignKey relationship,取得某人员所在机构 ModelA.objects.prefetch_related('modelb_set').all() # Reverse ForeignKey relationship,取得某机构下所有人员
假定一个个人信息系统,需要记录系统中各个人的故乡、居住地、以及到过的城市。数据库设计如下:
from django.db import models class Province(models.Model): name = models.CharField(max_length=10) def __unicode__(self): return self.name class City(models.Model): name = models.CharField(max_length=5) province = models.ForeignKey(Province) def __unicode__(self): return self.name class Person(models.Model): firstname = models.CharField(max_length=10) lastname = models.CharField(max_length=10) visitation = models.ManyToManyField(City, related_name = "visitor") hometown = models.ForeignKey(City, related_name = "birth") living = models.ForeignKey(City, related_name = "citizen") def __unicode__(self): return self.firstname + self.lastname
获取家乡是湖北的人 参考:https://www.jb51.net/article/63284.htm
#无脑的差的做法
hb = Province.objects.get(name__iexact=u"湖北省") people = [] for city in hb.city_set.all(): people.extend(city.birth.all())
#好的做法:
hb = Province.objects.prefetch_related("city_set__birth").objects.get(name__iexact=u"湖北省") people = [] for city in hb.city_set.all(): people.extend(city.birth.all())
3、Extra columns
When a table has a lot of columns, loading every column on each query has a potential to ruin your performance. Leave out the columns you don’t need and fetch the ones that you’re going to use: using values()
or only()
ORM methods will solve this problem.
参考:https://docs.djangoproject.com/en/3.0/ref/models/querysets/#only
4 、foreignkey one to one key
========================
只是更新db,而不是为了更新model变量的话,用下面方法麻烦。
e = Entry.objects.get(id=10)
e.comments_on = False
e.save()
…do this: 用下面方法更好:
Entry.objects.filter(pub_date__year=2010).update(comments_on=False)
Entry.objects.filter(id=10).update(comments_on=False)
用update()方法,就不用save()了,而且最值了资源竞争,防止等待时间数据被改。
Using update()
also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save()
.
Finally, realize that update()
does an update at the SQL level and, thus, does not call any save()
methods on your models, nor does it emit the pre_save
or post_save
signals (which are a consequence of calling Model.save()
). If you want to update a bunch of records for a model that has a custom save()
method, loop over them and call save()
, like this:
因为update直接保存,没有调用save,pre_save等, 如果你定制了save方法方法,为了使用save,就不能用update了,而是这样写:
for e in Entry.objects.filter(pub_date__year=2010):
e.comments_on = False
e.save()
>>> Entry.objects.filter(pub_date__year=2010).update(comments_on=False)
132
If you’re just updating a record and don’t need to do anything with the model object, the most efficient approach is to call update()
, rather than loading the model object into memory. For example, instead of doing this:
e = Entry.objects.get(id=10)
e.comments_on = False
e.save()
…do this:
Entry.objects.filter(id=10).update(comments_on=False)
Using update()
also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save()
.
Finally, realize that update()
does an update at the SQL level and, thus, does not call any save()
methods on your models, nor does it emit the pre_save
or post_save
signals (which are a consequence of calling Model.save()
). If you want to update a bunch of records for a model that has a custom save()
method, loop over them and call save()
, like this:
for e in Entry.objects.filter(pub_date__year=2010):
e.comments_on = False
e.save()