Celery Distributed Task Queue: Quick Start

Introduction to Celery and basic usage

Using Celery in a project

Running multiple workers

Celery periodic tasks

Integrating with Django

Configuring Celery periodic tasks through Django

 

 

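1. Introduction to Celery and basic usage

Celery is a distributed asynchronous task queue written in Python that makes asynchronous task processing straightforward. If your application needs asynchronous tasks, Celery is worth considering. A few example scenarios:

  1. You want to run a batch command on 100 machines, which may take a long time. You do not want your program to block waiting for the result; instead you get a task ID back, and some time later you use that ID to fetch the result. While the task is running you can carry on with other work.
  2. You want a scheduled task, for example checking all customer records every day and texting a birthday greeting to any customer whose birthday is today.

Celery needs a message broker to receive and send task messages and to store task results; RabbitMQ or Redis is the usual choice.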

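1.1 Advantages of Celery:

  1. Simple: once you are familiar with Celery's workflow, configuring and using it is fairly simple.
  2. Highly available: workers and clients automatically retry when a connection is lost or fails.
  3. Fast: according to the Celery documentation, a single Celery process can process millions of tasks a minute.
  4. Flexible: almost every part of Celery can be extended or customized.

[Figure: basic Celery workflow]

1.2 Installing and using Celery

Celery's default broker is RabbitMQ; a single configuration line is enough: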

broker_url = 'amqp://guest:guest@localhost:5672//'

Redis can be used as the broker as well.

Install the Redis extras:

$ pip3 install -U "celery[redis]"

Configuration

Configuration is easy, just configure the location of your Redis database:

app.conf.broker_url = 'redis://localhost:6379/0'

Where the URL is in the format of:

redis://:password@hostname:port/db_number

All fields after the scheme are optional, and will default to localhost on port 6379, using database 0.
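To make those defaults concrete, here is a small sketch that splits a Redis URL with the standard library and fills in the defaults described above. It is only an illustration: parse_redis_url is a hypothetical helper, not the parser Celery itself uses, and the host, port, and password in the example are made up.

```python
from urllib.parse import urlparse


def parse_redis_url(url):
    """Split a redis:// URL into its parts, filling in the defaults named above."""
    parts = urlparse(url)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URL, got %r" % url)
    db_text = parts.path.lstrip("/")
    return {
        "host": parts.hostname or "localhost",
        "port": parts.port or 6379,
        "password": parts.password,  # None when the URL carries no password
        "db": int(db_text) if db_text else 0,
    }


# A fully specified URL (made-up values) and one that relies on every default.
full = parse_redis_url("redis://:s3cret@10.0.0.5:6380/2")
bare = parse_redis_url("redis://")
print(full)
print(bare)
```

The second call shows why 'redis://localhost' in the examples that follow is enough for a local Redis on the default port.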

 

 

If you also want to store the state and return values of tasks in Redis, so that each task's result can be fetched later, configure the result backend as well:

app.conf.result_backend = 'redis://localhost:6379/0'

 

1.3 Getting started with Celery

Install the celery module

$ pip install celery

Create a Celery application to define your task list.

Create a task file and call it tasks.py:

from celery import Celery

app = Celery(
    'tasks',
    broker='redis://localhost',
    backend='redis://localhost'
)


@app.task
def add(x, y):
    print('running add', x, y)
    return x + y


@app.task
def test(x, y):
    print('running test', x, y)
    return (x, y)

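Start a Celery worker to listen for and execute tasks (the -A argument is the module name, tasks):

dandy@ubuntu01:~$ celery -A tasks worker -l debug

Calling a task

Open another terminal, start an interactive Python session, and call the task: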

dandy@ubuntu01:~$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tasks
>>> tasks.add.delay(1,4)
<AsyncResult: 6d533f00-9a33-4683-b00d-c06bb85a5a3f>
>>> t = tasks.add.delay(1,4)
>>> t.get()  # fetch the result synchronously
5
>>> 

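Note that tasks.py sits in the home directory (~) of the virtual machine, and python3 was started from that same location so that import tasks works.

The worker terminal shows that it received a task. To read the task's result, assign the return value of the call to a variable, as with t above: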

>>> type(t)
<class 'celery.result.AsyncResult'>

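The ready() method reports whether the task has finished. In the snippets below, result is the AsyncResult returned by delay(), like t above.

To try it, add a time.sleep() call to the task in tasks.py so that it runs longer: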

>>> result.ready()  # check whether the task has finished
False

You can wait for the result with a timeout, but this is rarely used because it turns the asynchronous call into a synchronous one:

>>> result.get(timeout=1)
8

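If the task raised an exception, get() re-raises it by default; you can override this with the propagate argument:

>>> result.get(propagate=False)  # on failure, return the exception instead of raising it

If the task raised an exception, you can also access the original traceback (the detailed error behind what result.get(propagate=False) returns):

>>> result.traceback  # the full traceback of the failure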
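If no broker is at hand, the same "call now, collect later" pattern can be tried with the standard library alone. The sketch below is an analogy, not Celery: a concurrent.futures Future stands in for AsyncResult, done() plays the role of ready(), result(timeout=...) the role of get(timeout=...), and exception() is roughly what get(propagate=False) gives you for a failed task. The functions add and broken are made up for the demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def add(x, y):
    time.sleep(0.2)  # pretend the task is slow
    return x + y


def broken():
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=2) as pool:
    handle = pool.submit(add, 4, 4)    # like add.delay(4, 4): returns at once
    done_immediately = handle.done()   # like result.ready()
    try:
        handle.result(timeout=0.01)    # like result.get(timeout=...)
        timed_out = False
    except TimeoutError:
        timed_out = True
    value = handle.result()            # blocks until the work has finished

    failed = pool.submit(broken)
    error = failed.exception()         # returns the exception instead of raising it

print(done_immediately, timed_out, value, repr(error))
```

The point carried over to Celery is the same: the call hands back a handle immediately, and blocking on that handle is what turns the asynchronous call into a synchronous one.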

 

Using Celery in a project

Celery can be set up as an application (a package).

The directory layout is:

celery_pro-----
               |---- celery.py
               |---- tasks.py
               |---- tasks2.py

celery.py:

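from __future__ import absolute_import, unicode_literals  # this file is itself named celery, so absolute imports make the next line load the installed package rather than this module
from celery import Celery
# from .celery import Celery  # would import from the current package instead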
 
app = Celery('proj',
             broker='redis://localhost',
             backend='redis://localhost',
             include=['celery_pro.tasks', 'celery_pro.tasks2'])  # several task modules can be listed
 
# Optional configuration, see the application user guide.
app.conf.update(
    result_expires=3600,
)
 
if __name__ == '__main__':
    app.start()

tasks.py

from __future__ import absolute_import, unicode_literals
from .celery import app


@app.task
def add(x, y):
    return x + y


@app.task
def mul(x, y):
    return x * y


@app.task
def xsum(numbers):
    return sum(numbers)

tasks2.py

from __future__ import absolute_import, unicode_literals
from .celery import app
import time, random

@app.task
def randnum(start, end):
    time.sleep(5)
    return random.randint(start, end)

Start the worker

dandy@ubuntu01:~$ celery -A celery_pro worker -l info

Usage:

dandy@ubuntu01:~$ python3 
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from celery_pro import tasks, tasks2
>>> t = tasks.add.delay(3,4)
>>> tt = tasks2.randnum.delay(1, 1000)
[2018-07-09 22:20:29,454: INFO/ForkPoolWorker-1] Task celery_pro.tasks.add[2412ac1f-351f-4af5-80ed-2bef879aff1b] succeeded in 0.004707950998636079s: 7
[2018-07-09 22:21:26,361: INFO/MainProcess] Received task: celery_pro.tasks2.randnum[334f2d69-d3e3-4fbd-b6f6-a96463d90456]  
[2018-07-09 22:21:31,368: INFO/ForkPoolWorker-1] Task celery_pro.tasks2.randnum[334f2d69-d3e3-4fbd-b6f6-a96463d90456] succeeded in 5.006172604000312s: 585

About distribution:

Open two terminals and start a Celery worker in each with:

dandy@ubuntu01:~$ celery -A celery_pro worker -l info

Then, in a third terminal, start Python and call the task repeatedly:

tt = tasks2.randnum.delay(1, 1000)
tt = tasks2.randnum.delay(1, 1000)
tt = tasks2.randnum.delay(1, 1000)
tt = tasks2.randnum.delay(1, 1000)
tt = tasks2.randnum.delay(1, 1000)
tt = tasks2.randnum.delay(1, 1000)
tt = tasks2.randnum.delay(1, 1000)

You will see that both worker terminals pick up and execute tasks.
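What happens between the two workers can be imitated in a few lines of plain Python. This is a toy model under stated assumptions, not Celery: a queue.Queue stands in for the broker, two threads named w1 and w2 stand in for the worker processes, and a short sleep stands in for the task body.

```python
import queue
import threading
import time


def worker(name, jobs, results):
    """Pull jobs until a None sentinel arrives, like a worker consuming from the broker."""
    while True:
        job = jobs.get()
        if job is None:
            break
        time.sleep(0.01)             # stand-in for the task body
        results.append((name, job))  # list.append is safe across threads in CPython


jobs = queue.Queue()
results = []
workers = [
    threading.Thread(target=worker, args=(name, jobs, results))
    for name in ("w1", "w2")
]
for thread in workers:
    thread.start()
for i in range(7):      # seven calls, as in the session above
    jobs.put(i)
for _ in workers:       # one sentinel per worker so both of them stop
    jobs.put(None)
for thread in workers:
    thread.join()

handled_jobs = sorted(job for _, job in results)
print(results)
```

Every job is handled exactly once, by whichever worker happened to be free; which worker gets which job changes from run to run.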

Now close both terminals and run the following command; the Celery workers have stopped along with their terminals:

dandy@ubuntu01:~$ ps -ef | grep celery
dandy     12193  12030  0 22:28 pts/1    00:00:00 grep --color=auto celery

 

How do you keep Celery running in the background?

dandy@ubuntu01:~$ celery multi start w1 -A celery_pro -l info
celery multi v4.2.0 (windowlicker)
> Starting nodes...
    > w1@ubuntu01: OK

dandy@ubuntu01:~$ celery multi start w2 -A celery_pro -l info
celery multi v4.2.0 (windowlicker)
> Starting nodes...
    > w2@ubuntu01: OK

dandy@ubuntu01:~$ celery multi start w3 -A celery_pro -l info
celery multi v4.2.0 (windowlicker)
> Starting nodes...
    > w3@ubuntu01: OK
dandy@ubuntu01:~$ ps -ef | grep celery
dandy     12859      1  0 10:13 ?        00:00:00 /usr/bin/python3 -m celery worker -A celery_pro -l info --logfile=w1%I.log --pidfile=w1.pid --hostname=w1@ubuntu01
dandy     12863  12859  0 10:13 ?        00:00:00 /usr/bin/python3 -m celery worker -A celery_pro -l info --logfile=w1%I.log --pidfile=w1.pid --hostname=w1@ubuntu01
dandy     12864  12859  0 10:13 ?        00:00:00 /usr/bin/python3 -m celery worker -A celery_pro -l info --logfile=w1%I.log --pidfile=w1.pid --hostname=w1@ubuntu01
dandy     12890      1  0 10:14 ?        00:00:00 /usr/bin/python3 -m celery worker -l info -A celery_pro --logfile=w2%I.log --pidfile=w2.pid --hostname=w2@ubuntu01
dandy     12894  12890  0 10:14 ?        00:00:00 /usr/bin/python3 -m celery worker -l info -A celery_pro --logfile=w2%I.log --pidfile=w2.pid --hostname=w2@ubuntu01
dandy     12895  12890  0 10:14 ?        00:00:00 /usr/bin/python3 -m celery worker -l info -A celery_pro --logfile=w2%I.log --pidfile=w2.pid --hostname=w2@ubuntu01
dandy     12909      1  0 10:14 ?        00:00:00 /usr/bin/python3 -m celery worker -A celery_pro -l info --logfile=w3%I.log --pidfile=w3.pid --hostname=w3@ubuntu01
dandy     12913  12909  0 10:14 ?        00:00:00 /usr/bin/python3 -m celery worker -A celery_pro -l info --logfile=w3%I.log --pidfile=w3.pid --hostname=w3@ubuntu01
dandy     12914  12909  0 10:14 ?        00:00:00 /usr/bin/python3 -m celery worker -A celery_pro -l info --logfile=w3%I.log --pidfile=w3.pid --hostname=w3@ubuntu01
dandy     13002  12964  0 10:17 pts/2    00:00:00 grep --color=auto celery

Restart Celery:

dandy@ubuntu01:~$ celery multi restart w1 w2 -A celery_pro
celery multi v4.2.0 (windowlicker)
> Stopping nodes...
    > w2@ubuntu01: TERM -> 13111
    > w1@ubuntu01: TERM -> 13102
> Waiting for 2 nodes -> 13111, 13102......
    > w2@ubuntu01: OK
> Restarting node w2@ubuntu01: OK
> Waiting for 2 nodes -> None, None....
    > w1@ubuntu01: OK
> Restarting node w1@ubuntu01: OK
> Waiting for 1 node -> None...

Stop Celery:

dandy@ubuntu01:~$ celery multi stop w1 w2 -A celery_pro
celery multi v4.2.0 (windowlicker)
> Stopping nodes...
    > w1@ubuntu01: TERM -> 13141
    > w2@ubuntu01: TERM -> 13130

The stop command is asynchronous, so it won't wait for the worker to shut down. You'll probably want to use the stopwait command instead, which ensures that all currently executing tasks are completed before exiting:

$ celery multi stopwait w1 -A proj -l info

Viewing the Celery logs

dandy@ubuntu01:~$ ls
celery_pro   Starting  w1.log    w2.log        w3-2.log  w3@ubuntu01:
dump.rdb     w1-1.log  w2-1.log  w2@ubuntu01:  w3.log
__pycache__  w1-2.log  w2-2.log  w3-1.log      w3.pid
dandy@ubuntu01:~$ tail -f w1.log  # follow the file; shows the last 10 lines by default
[2018-07-10 10:13:18,551: INFO/MainProcess] Connected to redis://localhost:6379//
[2018-07-10 10:13:18,558: INFO/MainProcess] mingle: searching for neighbors
[2018-07-10 10:13:19,571: INFO/MainProcess] mingle: all alone
[2018-07-10 10:13:19,578: INFO/MainProcess] w1@ubuntu01 ready.
[2018-07-10 10:13:19,767: INFO/MainProcess] Received task: celery_pro.tasks2.randnum[a13b0bb8-4d71-448f-9ca1-253094518376]  
[2018-07-10 10:14:26,281: INFO/MainProcess] sync with w2@ubuntu01
[2018-07-10 10:14:46,337: INFO/MainProcess] sync with w3@ubuntu01

 

 

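Celery periodic tasks

Celery supports periodic tasks: define when a task should run and Celery runs it on schedule. The scheduler component is called celery beat.

Write a script named periodic_task.py and put it in the celery_pro folder: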
 
from __future__ import absolute_import, unicode_literals
from .celery import app
from celery.schedules import crontab


@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls test('hello') every 10 seconds.
    sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')  # add_periodic_task registers a periodic task

    # Calls test('world') every 30 seconds
    sender.add_periodic_task(30.0, test.s('world'), expires=10)

    # Executes every Monday morning at 7:30 a.m.
    sender.add_periodic_task(
        crontab(hour=7, minute=30, day_of_week=1),
        test.s('Happy Mondays!'),
    )

@app.task
def test(arg):
    print(arg)

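The code above registers periodic tasks by calling a function. They can also be declared as configuration; the entry below runs a task every 30 seconds: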

app.conf.beat_schedule = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': 30.0,
        'args': (16, 16)
    },
}
app.conf.timezone = 'UTC'

In celery.py the same kind of entry sits next to the app definition (here firing every 5 seconds):

app = Celery('celery_pro',
             broker='redis://localhost',
             backend='redis://localhost',
             include=['celery_pro.tasks', 'celery_pro.tasks2', 'celery_pro.periodic_task'])

# Optional configuration, see the application user guide.
app.conf.update(
    result_expires=3600,
)

app.conf.beat_schedule = {
    'add-every-5-seconds': {
        'task': 'celery_pro.tasks.add',
        'schedule': 5.0,
        'args': (16, 16)
    },
}
app.conf.timezone = 'UTC'
 
if __name__ == '__main__':
    app.start()

 

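With the tasks added, Celery needs a separate process to trigger them on schedule. Note that this process triggers tasks; it does not execute them. It keeps checking the schedule and, whenever a task is due, sends a task message for a Celery worker to execute.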
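The split between "triggering" and "executing" can be sketched in plain Python. This is a toy model, not Celery's scheduler: due_entries and run_beat are hypothetical names, the schedule maps made-up entry names to intervals in seconds, and "sending" a message is reduced to appending to a list.

```python
def due_entries(schedule, last_run, now):
    """Return the names of entries whose interval has elapsed since their last run."""
    due = []
    for name, interval in schedule.items():
        if now - last_run.get(name, 0.0) >= interval:
            due.append(name)
    return due


def run_beat(schedule, ticks):
    """Simulate beat: at each tick, 'send' a message for every entry that is due."""
    last_run = {name: 0.0 for name in schedule}
    sent = []
    for now in ticks:
        for name in due_entries(schedule, last_run, now):
            sent.append((now, name))  # beat only publishes; a worker would execute
            last_run[name] = now
    return sent


schedule = {"hello-every-10s": 10.0, "world-every-30s": 30.0}
sent = run_beat(schedule, ticks=[10.0, 20.0, 30.0])
for when, name in sent:
    print(when, name)
```

Nothing in this loop runs a task body. That is why a beat process on its own does nothing visible until at least one worker is consuming the messages it sends.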

The app was defined earlier with an include list, so the new file has to be added to it:

from __future__ import absolute_import, unicode_literals
from celery import Celery

app = Celery('celery_pro',
             broker='redis://localhost',
             backend='redis://localhost',
             include=['celery_pro.tasks', 'celery_pro.tasks2', 'celery_pro.periodic_task'])  # the new file is added here
 
# Optional configuration, see the application user guide.
app.conf.update(
    result_expires=3600,
)
 
if __name__ == '__main__':
    app.start()

 

 

Start Celery:

dandy@ubuntu01:~$ celery -A celery_pro worker -l debug  # mind the working directory

Worker output:

[tasks]
  . celery.accumulate
  . celery.backend_cleanup
  . celery.chain
  . celery.chord
  . celery.chord_unlock
  . celery.chunks
  . celery.group
  . celery.map
  . celery.starmap
  . celery_pro.periodic_task.test
  . celery_pro.tasks.add
  . celery_pro.tasks.mul
  . celery_pro.tasks.xsum
  . celery_pro.tasks2.randnum

[2018-07-10 11:33:58,066: DEBUG/MainProcess] | Worker: Starting Hub
[2018-07-10 11:33:58,066: DEBUG/MainProcess] ^-- substep ok
[2018-07-10 11:33:58,066: DEBUG/MainProcess] | Worker: Starting Pool
[2018-07-10 11:33:58,100: DEBUG/MainProcess] ^-- substep ok
[2018-07-10 11:33:58,100: DEBUG/MainProcess] | Worker: Starting Consumer
[2018-07-10 11:33:58,101: DEBUG/MainProcess] | Consumer: Starting Connection
[2018-07-10 11:33:58,111: INFO/MainProcess] Connected to redis://localhost:6379//
[2018-07-10 11:33:58,112: DEBUG/MainProcess] ^-- substep ok
[2018-07-10 11:33:58,112: DEBUG/MainProcess] | Consumer: Starting Events
[2018-07-10 11:33:58,120: DEBUG/MainProcess] ^-- substep ok
[2018-07-10 11:33:58,120: DEBUG/MainProcess] | Consumer: Starting Mingle
[2018-07-10 11:33:58,120: INFO/MainProcess] mingle: searching for neighbors
[2018-07-10 11:33:59,138: INFO/MainProcess] mingle: all alone
[2018-07-10 11:33:59,139: DEBUG/MainProcess] ^-- substep ok
[2018-07-10 11:33:59,140: DEBUG/MainProcess] | Consumer: Starting Gossip
[2018-07-10 11:33:59,144: DEBUG/MainProcess] ^-- substep ok
[2018-07-10 11:33:59,145: DEBUG/MainProcess] | Consumer: Starting Tasks
[2018-07-10 11:33:59,148: DEBUG/MainProcess] ^-- substep ok
[2018-07-10 11:33:59,148: DEBUG/MainProcess] | Consumer: Starting Control
[2018-07-10 11:33:59,150: DEBUG/MainProcess] ^-- substep ok
[2018-07-10 11:33:59,151: DEBUG/MainProcess] | Consumer: Starting Heart
[2018-07-10 11:33:59,152: DEBUG/MainProcess] ^-- substep ok
[2018-07-10 11:33:59,153: DEBUG/MainProcess] | Consumer: Starting event loop
[2018-07-10 11:33:59,153: DEBUG/MainProcess] | Worker: Hub.register Pool...
[2018-07-10 11:33:59,154: INFO/MainProcess] celery@ubuntu01 ready.
[2018-07-10 11:33:59,155: DEBUG/MainProcess] basic.qos: prefetch_count->8

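At this point the worker has started and is ready to execute tasks. As the workflow diagram showed, the workers are what do the distributed processing.

Earlier we triggered the tasks ourselves by calling them; here beat takes over the scheduling.

 

Start the scheduler, celery beat:

dandy@ubuntu01:~$ celery -A celery_pro.periodic_task beat -l debug

Task execution looks like this: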

# beat
celery beat v4.2.0 (windowlicker) is starting.
__    -    ... __   -        _
LocalTime -> 2018-07-10 11:46:48
Configuration ->
    . broker -> redis://localhost:6379//
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> celerybeat-schedule
    . logfile -> [stderr]@%DEBUG
    . maxinterval -> 5.00 minutes (300s)
[2018-07-10 11:46:48,257: DEBUG/MainProcess] Setting default socket timeout to 30
[2018-07-10 11:46:48,258: INFO/MainProcess] beat: Starting...
[2018-07-10 11:46:48,274: DEBUG/MainProcess] Current schedule:
<ScheduleEntry: add every 10 celery_pro.periodic_task.test('hello') <freq: 10.00 seconds>
<ScheduleEntry: celery_pro.periodic_task.test('world') celery_pro.periodic_task.test('world') <freq: 30.00 seconds>
<ScheduleEntry: celery_pro.periodic_task.test('Happy Mondays!') celery_pro.periodic_task.test('Happy Mondays!') <crontab: 30 7 1 * * (m/h/d/dM/MY)>
[2018-07-10 11:46:48,274: DEBUG/MainProcess] beat: Ticking with max interval->5.00 minutes
[2018-07-10 11:46:48,276: DEBUG/MainProcess] beat: Waking up in 9.98 seconds.
[2018-07-10 11:46:58,268: DEBUG/MainProcess] beat: Synchronizing schedule...
[2018-07-10 11:46:58,279: INFO/MainProcess] Scheduler: Sending due task add every 10 (celery_pro.periodic_task.test)
[2018-07-10 11:46:58,289: DEBUG/MainProcess] celery_pro.periodic_task.test sent. id->6d482eb0-5407-4e23-9307-bef9ca773ff7
[2018-07-10 11:46:58,289: DEBUG/MainProcess] beat: Waking up in 9.97 seconds.
[2018-07-10 11:47:08,272: INFO/MainProcess] Scheduler: Sending due task add every 10 (celery_pro.periodic_task.test)
[2018-07-10 11:47:08,274: DEBUG/MainProcess] celery_pro.periodic_task.test sent. id->9ae91cb8-b931-4a5c-b0fb-2c635495ae1a
[2018-07-10 11:47:08,275: DEBUG/MainProcess] beat: Waking up in 9.98 seconds.
# worker
[2018-07-10 11:47:48,287: WARNING/ForkPoolWorker-2] hello
[2018-07-10 11:47:48,287: DEBUG/MainProcess] Task accepted: celery_pro.periodic_task.test[f7cae8d2-e9cb-44a8-a4fb-e0cfdb04ee6b] pid:14113
[2018-07-10 11:47:48,288: WARNING/ForkPoolWorker-1] world
[2018-07-10 11:47:48,292: INFO/ForkPoolWorker-2] Task celery_pro.periodic_task.test[f7cae8d2-e9cb-44a8-a4fb-e0cfdb04ee6b] succeeded in 0.004459112999029458s: None
[2018-07-10 11:47:48,294: DEBUG/MainProcess] Task accepted: celery_pro.periodic_task.test[d7f6aff8-bd3f-4054-b2f2-70e45798d01e] pid:14112
[2018-07-10 11:47:48,296: INFO/ForkPoolWorker-1] Task celery_pro.periodic_task.test[d7f6aff8-bd3f-4054-b2f2-70e45798d01e] succeeded in 0.007812611998815555s: None
[2018-07-10 11:47:58,285: INFO/MainProcess] Received task: celery_pro.periodic_task.test[dd3365c0-33d8-429f-a7a5-825b1a70de64]  
[2018-07-10 11:47:58,285: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7f0689418b70> (args:('celery_pro.periodic_task.test', 'dd3365c0-33d8-429f-a7a5-825b1a70de64', {'origin': 'gen14371@ubuntu01', 'lang': 'py', 'correlation_id': 'dd3365c0-33d8-429f-a7a5-825b1a70de64', 'group': None, 'kwargsrepr': '{}', 'expires': None, 'parent_id': None, 'id': 'dd3365c0-33d8-429f-a7a5-825b1a70de64', 'eta': None, 'shadow': None, 'delivery_info': {'exchange': '', 'priority': 0, 'redelivered': None, 'routing_key': 'celery'}, 'reply_to': '1ba9e463-48e0-38ed-bbd3-3bf5498de62d', 'argsrepr': "('hello',)", 'retries': 0, 'root_id': 'dd3365c0-33d8-429f-a7a5-825b1a70de64', 'task': 'celery_pro.periodic_task.test', 'timelimit': [None, None]}, b'[["hello"], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8') kwargs:{})
[2018-07-10 11:47:58,287: WARNING/ForkPoolWorker-2] hello
[2018-07-10 11:47:58,287: INFO/ForkPoolWorker-2] Task celery_pro.periodic_task.test[dd3365c0-33d8-429f-a7a5-825b1a70de64] succeeded in 0.0007550369991804473s: None
[2018-07-10 11:47:58,288: DEBUG/MainProcess] Task accepted: celery_pro.periodic_task.test[dd3365c0-33d8-429f-a7a5-825b1a70de64] pid:14113
[2018-07-10 11:48:08,287: INFO/MainProcess] Received task: celery_pro.periodic_task.test[a84ecf0f-89cc-40d0-bacb-e9d359602220]  
[2018-07-10 11:48:08,287: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7f0689418b70> (args:('celery_pro.periodic_task.test', 'a84ecf0f-89cc-40d0-bacb-e9d359602220', {'origin': 'gen14371@ubuntu01', 'lang': 'py', 'correlation_id': 'a84ecf0f-89cc-40d0-bacb-e9d359602220', 'group': None, 'kwargsrepr': '{}', 'expires': None, 'parent_id': None, 'id': 'a84ecf0f-89cc-40d0-bacb-e9d359602220', 'eta': None, 'shadow': None, 'delivery_info': {'exchange': '', 'priority': 0, 'redelivered': None, 'routing_key': 'celery'}, 'reply_to': '1ba9e463-48e0-38ed-bbd3-3bf5498de62d', 'argsrepr': "('hello',)", 'retries': 0, 'root_id': 'a84ecf0f-89cc-40d0-bacb-e9d359602220', 'task': 'celery_pro.periodic_task.test', 'timelimit': [None, None]}, b'[["hello"], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8') kwargs:{})
[2018-07-10 11:48:08,288: DEBUG/MainProcess] Task accepted: celery_pro.periodic_task.test[a84ecf0f-89cc-40d0-bacb-e9d359602220] pid:14113

 

Watch the worker output: the periodic task runs every few seconds.

Note: Beat needs to store the last run times of the tasks in a local database file (named celerybeat-schedule by default), so it needs access to write in the current directory, or alternatively you can specify a custom location for this file:

$ celery -A periodic_task beat -s /home/celery/var/run/celerybeat-schedule

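More complex schedules

The periodic tasks above are simple: run a task every N seconds. What if you want an email sent at 8 a.m. every Monday, Wednesday, and Friday? That is easy too with crontab schedules, which work like the crontab that ships with Linux and let you customize when a task runs.

Linux crontab: http://www.cnblogs.com/peida/archive/2013/01/08/2850483.html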

from celery.schedules import crontab
 
app.conf.beat_schedule = {
    # Executes every Monday morning at 7:30 a.m.
    'add-every-monday-morning': {
        'task': 'tasks.add',
        'schedule': crontab(hour=7, minute=30, day_of_week=1),
        'args': (16, 16),
    },
}

The entry above runs tasks.add every Monday at 7:30 a.m.

crontab schedule examples:

Example | Meaning
crontab() | Execute every minute.
crontab(minute=0, hour=0) | Execute daily at midnight.
crontab(minute=0, hour='*/3') | Execute every three hours: midnight, 3am, 6am, 9am, noon, 3pm, 6pm, 9pm.
crontab(minute=0, hour='0,3,6,9,12,15,18,21') | Same as previous.
crontab(minute='*/15') | Execute every 15 minutes.
crontab(day_of_week='sunday') | Execute every minute (!) on Sundays.
crontab(minute='*', hour='*', day_of_week='sun') | Same as previous.
crontab(minute='*/10', hour='3,17,22', day_of_week='thu,fri') | Execute every ten minutes, but only between 3-4 am, 5-6 pm, and 10-11 pm on Thursdays or Fridays.
crontab(minute=0, hour='*/2,*/3') | Execute every even hour, and every hour divisible by three. This means: at every hour except 1am, 5am, 7am, 11am, 1pm, 5pm, 7pm, 11pm.
crontab(minute=0, hour='*/5') | Execute at every hour divisible by 5. This means that it is triggered at 3pm, not 5pm (since 3pm equals the 24-hour clock value of "15", which is divisible by 5).
crontab(minute=0, hour='*/3,8-17') | Execute every hour divisible by 3, and every hour during office hours (8am-5pm).
crontab(0, 0, day_of_month='2') | Execute on the second day of every month.
crontab(0, 0, day_of_month='2-30/2') | Execute on every even numbered day.
crontab(0, 0, day_of_month='1-7,15-21') | Execute on the first and third weeks of the month.
crontab(0, 0, day_of_month='11', month_of_year='5') | Execute on the eleventh of May every year.
crontab(0, 0, month_of_year='*/3') | Execute on the first month of every quarter.
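The step and list syntax is easier to trust once you see which values a field expands to. The sketch below is a simplified expander written for this article, not Celery's own parser: expand_field is a hypothetical helper that handles only numeric fields ('*', 'a-b', comma lists, and '/step'), not day names such as 'sun'.

```python
def expand_field(spec, low, high):
    """Expand one crontab field such as '*/3' or '3,17,22' into a sorted list of ints."""
    values = set()
    for part in spec.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, stop = low, high
        elif "-" in rng:
            first, last = rng.split("-")
            start, stop = int(first), int(last)
        else:
            start = stop = int(rng)
        values.update(range(start, stop + 1, step))
    return sorted(values)


# Hours run from 0 to 23, days of the month from 1 to 31.
every_fifth_hour = expand_field("*/5", 0, 23)
even_days = expand_field("2-30/2", 1, 31)
skipped_hours = sorted(set(range(24)) - set(expand_field("*/2,*/3", 0, 23)))
print(every_fifth_hour)
print(even_days)
print(skipped_hours)
```

The first result shows why hour='*/5' fires at 15:00 (3pm) and not at 17:00 (5pm), and the last one lists the hours that hour='*/2,*/3' skips.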

 

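These cover the vast majority of scheduling needs. You can even schedule tasks by sunrise and sunset (solar schedules); see http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#solar-schedules

 

Best practice: integrating with Django

Django integrates easily with Celery for asynchronous tasks; only a little configuration is needed.

First, create a celery.py file in the directory that has the same name as the Django project, that is, next to settings.py: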

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
 
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')  # point this at your own project's settings module

app = Celery('proj')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')  # all Celery settings can then live in Django's settings, prefixed with CELERY_

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()  # looks for a tasks module in each installed app
 
 
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

Then edit the __init__.py file in the same directory. This ensures the app is loaded when Django starts, so that the @shared_task decorator will use it:

from __future__ import absolute_import, unicode_literals
import pymysql

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ['celery_app']

pymysql.install_as_MySQLdb()

 

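Note that this example project layout is suitable for larger projects; for simple projects you may use a single contained module that defines both the app and tasks, as in the First Steps with Celery tutorial.

Let's break down what happens in the first module. First we import absolute_import from __future__, so that our celery.py module will not clash with the installed celery library:

from __future__ import absolute_import

Then we set the default DJANGO_SETTINGS_MODULE environment variable for the celery command-line program: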

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

You don't need this line, but it saves you from always passing in the settings module to the celery program. It must always come before creating the app instances, as is what we do next:

app = Celery('proj')

This is our instance of the library.

We also add the Django settings module as a configuration source for Celery. This means that you don’t have to use multiple configuration files, and instead configure Celery directly from the Django settings; but you can also separate them if wanted.

The uppercase name-space means that all Celery configuration options must be specified in uppercase instead of lowercase, and start with CELERY_, so for example the task_always_eager setting becomes CELERY_TASK_ALWAYS_EAGER, and the broker_url setting becomes CELERY_BROKER_URL.

You can pass the object directly here, but using a string is better since then the worker doesn’t have to serialize the object.

app.config_from_object('django.conf:settings', namespace='CELERY')
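The naming rule is mechanical, which a few lines of plain Python can show. This is a sketch of the convention only, not Celery's implementation: settings_name and collect_namespaced are hypothetical helpers, and the dictionary stands in for a Django settings module.

```python
def settings_name(option, namespace="CELERY"):
    """Map a lowercase Celery option to the name expected in Django settings."""
    return "{}_{}".format(namespace, option.upper())


def collect_namespaced(settings, namespace="CELERY"):
    """Pick the namespaced keys out of a settings mapping and strip the prefix."""
    prefix = namespace + "_"
    return {
        key[len(prefix):].lower(): value
        for key, value in settings.items()
        if key.startswith(prefix)
    }


# Stand-in for a Django settings module; only the CELERY_ keys are picked up.
django_settings = {
    "DEBUG": True,
    "CELERY_BROKER_URL": "redis://localhost",
    "CELERY_RESULT_BACKEND": "redis://localhost",
}
celery_options = collect_namespaced(django_settings)
print(settings_name("task_always_eager"))
print(celery_options)
```

Settings without the prefix, such as DEBUG here, are simply ignored by the namespace lookup.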

Next, a common practice for reusable apps is to define all tasks in a separate tasks.py module, and Celery does have a way to auto-discover these modules:

app.autodiscover_tasks()

With the line above Celery will automatically discover tasks from all of your installed apps, following the tasks.py convention:

- app1/
    - tasks.py
    - models.py
- app2/
    - tasks.py
    - models.py

Finally, the debug_task example is a task that dumps its own request information. This is using the new bind=True task option introduced in Celery 3.1 to easily refer to the current task instance.

Then write your tasks in tasks.py inside each app:

# Create your tasks here
from __future__ import absolute_import, unicode_literals
from celery import shared_task
 
 
@shared_task
def add(x, y):
    return x + y
 
 
@shared_task
def mul(x, y):
    return x * y
 
 
@shared_task
def xsum(numbers):
    return sum(numbers)

Add another one in a second app:

dandy@ubuntu01:~/PerfectCRM$ vim xadmin/tasks.py

# Create your tasks here
from __future__ import absolute_import, unicode_literals
from celery import shared_task


@shared_task
def sayhi(name):
    return "hello %s" % name

settings.py:

CELERY_BROKER_URL = 'redis://localhost'
CELERY_RESULT_BACKEND = 'redis://localhost'

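views.py

import random

from celery.result import AsyncResult
from django.http import HttpResponse

from crm import tasks  # assumption: the add task lives in the crm app, as the worker log below shows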


# Create your views here.


def celery_call(request):
    ran_num = random.randint(1, 1000)
    print(ran_num)
    t = tasks.add.delay(ran_num, 6)
    return HttpResponse(t.id)


def celery_result(request):
    task_id = request.GET.get('id')
    res = AsyncResult(id=task_id)
    if res.ready():
        return HttpResponse(res.get())
    else:
        return HttpResponse(res.ready())
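The two views still need routes. The post does not show its urls.py, so the fragment below is only a sketch: the paths celery_call/ and celery_result/ are hypothetical, and it assumes the views live in the crm app.

```python
# urls.py (sketch): the route paths are hypothetical, pick your own
from django.urls import path

from crm import views  # assumes the two views above live in the crm app

urlpatterns = [
    path('celery_call/', views.celery_call),      # starts a task, responds with its id
    path('celery_result/', views.celery_result),  # expects ?id=<task id>
]
```

With routes like these, opening /celery_call/ returns a task id, and /celery_result/?id=<that id> returns the result once the task has finished.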

Now start the Django site:

dandy@ubuntu01:~/PerfectCRM$ python3 manage.py runserver 0.0.0.0:9000
Performing system checks...

System check identified some issues:

WARNINGS:
crm.Customer.tags: (fields.W340) null has no effect on ManyToManyField.

System check identified 1 issue (0 silenced).
July 10, 2018 - 19:39:03
Django version 2.0.7, using settings 'PerfectCRM.settings'
Starting development server at http://0.0.0.0:9000/
Quit the server with CONTROL-C.

Start the worker:

dandy@ubuntu01:~/PerfectCRM$ celery -A PerfectCRM worker -l info
 
 -------------- celery@ubuntu01 v4.2.0 (windowlicker)
---- **** ----- 
--- * ***  * -- Linux-4.4.0-116-generic-x86_64-with-Ubuntu-16.04-xenial 2018-07-10 19:56:33
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         proj:0x7f0255d93f28
- ** ---------- .> transport:   redis://localhost:6379//
- ** ---------- .> results:     redis://localhost/
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery
                

[tasks]
  . PerfectCRM.celery.debug_task
  . crm.tasks.add
  . crm.tasks.mul
  . crm.tasks.xsum
  . xadmin.tasks.sayhi

[2018-07-10 19:56:33,206: INFO/MainProcess] Connected to redis://localhost:6379//
[2018-07-10 19:56:33,216: INFO/MainProcess] mingle: searching for neighbors
[2018-07-10 19:56:34,235: INFO/MainProcess] mingle: all alone
[2018-07-10 19:56:34,245: WARNING/MainProcess] /home/dandy/.local/lib/python3.5/site-packages/celery/fixups/django.py:200: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
  warnings.warn('Using settings.DEBUG leads to a memory leak, never '
[2018-07-10 19:56:34,246: INFO/MainProcess] celery@ubuntu01 ready.
[2018-07-10 19:56:34,416: INFO/MainProcess] Received task: crm.tasks.add[4eeb9530-7e5c-4bcb-a54d-f583ae38e171]  
[2018-07-10 19:56:34,423: INFO/ForkPoolWorker-2] Task crm.tasks.add[4eeb9530-7e5c-4bcb-a54d-f583ae38e171] succeeded in 0.0037738879982498474s: 743

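Visit the route defined earlier:

This returns the task id; now use it to fetch the result:

Summary of the Django walkthrough:

  1. celery.py  ==>  a customized celery.py is needed in the directory named after the project (next to settings.py). It points the Django settings module at settings, sets the common prefix for Celery settings, and loads tasks from all registered apps.

  2. settings.py  ==>  add the broker and result backend locations (Redis or RabbitMQ)

  3. __init__.py  ==>  the __init__.py in that same directory makes sure the app is loaded when Django starts, so that the @shared_task decorator will use it

  4. tasks.py  ==>  add a tasks file to each app and write your tasks there

  5. urls.py  ==>  routes for dispatching a task and for fetching its result

  6. views.py  ==>  import the app's tasks and call them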

 

 

Using periodic tasks in Django

1. Install the dependency

dandy@ubuntu01:~/PerfectCRM/crm$ pip3 install django-celery-beat

2. Register the installed package in INSTALLED_APPS in settings:

INSTALLED_APPS = (
        ...,
        'django_celery_beat',
    )

3. Create the tables; makemigrations is not needed

python3 manage.py migrate

4. Start the celery beat service with the django scheduler

celery -A PerfectCRM beat -l info -S django

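5. Configure the schedule in the Django admin

After logging in to the admin you will find three tables

Configuration done:

Now start your celery beat and worker; every 2 minutes beat sends a task message that has a worker execute the scp_task task.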
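The same entry can also be created without the admin pages, for example from python3 manage.py shell, using the models that django-celery-beat provides. This is a sketch under assumptions: the dotted path crm.tasks.scp_task is hypothetical (the post does not show where scp_task is defined), and the name is just a label.

```python
from django_celery_beat.models import IntervalSchedule, PeriodicTask

# An interval of "every 2 minutes"; reused if it already exists.
schedule, _created = IntervalSchedule.objects.get_or_create(
    every=2,
    period=IntervalSchedule.MINUTES,
)

PeriodicTask.objects.create(
    interval=schedule,
    name='scp_task every 2 minutes',  # any unique, human-readable name
    task='crm.tasks.scp_task',        # hypothetical dotted path of the task
)
```

These rows end up in the same tables that the admin pages edit.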

Note: in testing, celery beat had to be restarted after every task was added or modified; otherwise the beat process did not pick up the new configuration.

 

Posted 2018-07-07 17:03 by dandyzhang