Better performance with tf.function (TensorFlow Core guide)

Original page: https://tensorflow.google.cn/guide/function

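In TensorFlow 2.x, eager execution is turned on by default. The user interface is more intuitive and flexible, but this comes at a cost in performance and deployability.
To get models that are both performant and portable, use tf.function to build graphs from your program. There are, however, some pitfalls to be aware of. This article helps you understand what tf.function actually does so that you can use it effectively.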


** Tracing: the process by which TensorFlow builds a graph (tf.Graph) from a function's inputs. **
** Retracing: the process of rebuilding the graph when the arguments change. **

The main takeaways and recommendations are:

  • Debug in eager mode, then decorate with @tf.function.
  • Don't rely on Python side effects like object mutation or list appends.
  • tf.function works best with TensorFlow ops; NumPy and Python calls are converted to constants.

Setup

import tensorflow as tf

Define a helper function to demonstrate the kinds of errors you might encounter:

import traceback
import contextlib

# Some helper code to demonstrate the kinds of errors you might encounter.
@contextlib.contextmanager
def assert_raises(error_class):
  try:
    yield
  except error_class as e:
    print('Caught expected exception \n  {}:'.format(error_class))
    traceback.print_exc(limit=2)
  except Exception as e:
    raise e
  else:
    raise Exception('Expected {} to be raised but no error was raised!'.format(
        error_class))

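Basics

A function you define and decorate with tf.function behaves like a core TensorFlow operation: you can execute it eagerly, compute gradients through it, and so on.

For example, define an add function and call it eagerly: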

@tf.function
def add(a, b):
  return a + b

add(tf.ones([2, 2]), tf.ones([2, 2]))  #  [[2., 2.], [2., 2.]]

The output is:

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 2.],
       [2., 2.]], dtype=float32)>

Compute the gradient of add:

v = tf.Variable(1.0)
with tf.GradientTape() as tape:
  result = add(v, 1.0)
tape.gradient(result, v)

Its output is:

<tf.Tensor: shape=(), dtype=float32, numpy=1.0>

You can also call a tf.function from inside another tf.function:

@tf.function
def dense_layer(x, w, b):
  return add(tf.matmul(x, w), b)

dense_layer(tf.ones([3, 2]), tf.ones([2, 2]), tf.ones([2]))
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[3., 3.],
       [3., 3.],
       [3., 3.]], dtype=float32)>

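Functions defined this way can be faster than eager code, especially for graphs with many small ops. For graphs with a few expensive ops (such as convolutions), you may not see much of a speedup: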

import timeit
conv_layer = tf.keras.layers.Conv2D(100, 3)

@tf.function
def conv_fn(image):
  return conv_layer(image)

image = tf.zeros([1, 200, 200, 100])
# warm up
conv_layer(image); conv_fn(image)
print("Eager conv:", timeit.timeit(lambda: conv_layer(image), number=10))
print("Function conv:", timeit.timeit(lambda: conv_fn(image), number=10))
print("Note how there's not much difference in performance for convolutions")
Eager conv: 0.004070537999723456
Function conv: 0.0023154040000008536
Note how there's not much difference in performance for convolutions

Debugging

In general, debugging code is easier in eager mode than inside a tf.function. Make sure your code runs without errors in eager mode before decorating it with tf.function. To help with debugging, you can call tf.config.run_functions_eagerly(True) to globally disable tf.function, and tf.config.run_functions_eagerly(False) to re-enable it.

When tracking down issues that only appear inside a tf.function, here are some tips:

  • Plain Python print calls execute only during tracing, which helps you track down when your function gets (re)traced.
  • tf.print calls execute every time, and can help you track down intermediate values during execution.
  • tf.debugging.enable_check_numerics is an easy way to track down where NaNs and Infs are created.
  • pdb can help you understand what is going on during tracing. (Caveat: pdb will drop you into AutoGraph-transformed source code.)
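
As a rough illustration of the first tip and of the global switch, the sketch below counts how many times the Python body of a function actually runs. The names python_runs and square are made up for this example, and it assumes a TensorFlow release that provides tf.config.run_functions_eagerly (some older 2.x releases expose it as tf.config.experimental_run_functions_eagerly instead).

```python
import tensorflow as tf

python_runs = []  # hypothetical log: one entry each time the Python body runs


@tf.function
def square(x):
    # Python side effect: happens only when Python executes the body.
    python_runs.append("ran")
    return x * x


x = tf.constant(3.0)

# Default (graph) mode: the body runs once to trace, then the graph is reused.
square(x)
square(x)
graph_mode_runs = len(python_runs)

# Debug mode: tf.function is disabled globally, so the body runs on every call.
tf.config.run_functions_eagerly(True)
try:
    square(x)
    square(x)
finally:
    tf.config.run_functions_eagerly(False)  # restore graph execution
eager_mode_runs = len(python_runs) - graph_mode_runs

print(graph_mode_runs, eager_mode_runs)
```

Restoring the flag in a finally block keeps the debug setting from leaking into the rest of the program.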

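Tracing and polymorphism

Python's dynamic typing means that you can call a function with a variety of argument types, and Python will do something different in each case.

In contrast, TensorFlow graphs require static dtypes and shape dimensions. tf.function bridges this gap by retracing the function when necessary to generate the correct graph. Most of the subtlety of using tf.function stems from this retracing behavior.

Call the function with arguments of different types and observe what happens: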

# Functions are polymorphic

@tf.function
def double(a):
  print("Tracing with", a)
  return a + a

print(double(tf.constant(1)))
print()
print(double(tf.constant(1.1)))
print()
print(double(tf.constant("a")))
print()
Tracing with Tensor("a:0", shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)

Tracing with Tensor("a:0", shape=(), dtype=float32)
tf.Tensor(2.2, shape=(), dtype=float32)

Tracing with Tensor("a:0", shape=(), dtype=string)
tf.Tensor(b'aa', shape=(), dtype=string)

To control the tracing behavior, use the following techniques:

Create a new tf.function. Separate tf.function objects are guaranteed not to share traces.

def f():
  print('Tracing!')
  tf.print('Executing')

tf.function(f)()
tf.function(f)()
Tracing!
Executing
Tracing!
Executing

Use get_concrete_function to obtain a specific trace:

print("Obtaining concrete trace")
double_strings = double.get_concrete_function(tf.TensorSpec(shape=None, dtype=tf.string))
print("Executing traced function")
print(double_strings(tf.constant("a")))
print(double_strings(a=tf.constant("b")))
print("Using a concrete trace with incompatible types will throw an error")
with assert_raises(tf.errors.InvalidArgumentError):
  double_strings(tf.constant(1))
Obtaining concrete trace
Tracing with Tensor("a:0", dtype=string)
Executing traced function
tf.Tensor(b'aa', shape=(), dtype=string)
tf.Tensor(b'bb', shape=(), dtype=string)
Using a concrete trace with incompatible types will throw an error
Caught expected exception 
  <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
    yield
  File "<ipython-input-10-5351d0a2eda2>", line 8, in <module>
    double_strings(tf.constant(1))
tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute __inference_double_183 as 
      input #0(zero-based) was expected to be a string tensor but is a int32 tensor [Op:__inference_double_183]

Specify input_signature in tf.function to limit tracing:

@tf.function(input_signature=(tf.TensorSpec(shape=[None], dtype=tf.int32),))
def next_collatz(x):
  print("Tracing with", x)
  return tf.where(x % 2 == 0, x // 2, 3 * x + 1)

print(next_collatz(tf.constant([1, 2])))
# We specified a 1-D tensor in the input signature, so this should fail.
with assert_raises(ValueError):
  next_collatz(tf.constant([[1, 2], [3, 4]]))

Tracing with Tensor("x:0", shape=(None,), dtype=int32)
tf.Tensor([4 1], shape=(2,), dtype=int32)
Caught expected exception 
  <class 'ValueError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
    yield
  File "<ipython-input-11-9939c82c1507>", line 9, in <module>
    next_collatz(tf.constant([[1, 2], [3, 4]]))
ValueError: Python inputs incompatible with input_signature:
  inputs: (
    tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32))
  input_signature: (
    TensorSpec(shape=(None,), dtype=tf.int32, name=None))

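When to retrace

A polymorphic tf.function keeps a cache of the concrete functions generated by tracing. The cache keys are effectively tuples of keys generated from the function's args and kwargs. The key generated for a tf.Tensor argument is its shape and dtype. The key generated for a Python primitive is its value. For all other Python types, the keys are based on the object's id(), so that methods are traced independently for each instance of a class. In the future, TensorFlow may add more sophisticated caching for Python objects that can be safely converted to tensors.

See also: Concrete functions

Python or Tensor args?

Often, Python arguments are used to control hyperparameters and graph construction, for example num_layers=10, training=True, or nonlinearity=relu. If the Python argument changes, the graph has to be retraced.

However, a Python argument is not always used to control graph construction. In those cases, a change in the Python value triggers needless retracing. Take this training loop, which AutoGraph dynamically unrolls: despite the multiple traces, the generated graphs are actually identical, so the retracing is wasted work.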
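
The cache-key rules can be checked with a small sketch. The function scale and the list trace_log are hypothetical names introduced for this illustration, and the behavior shown assumes tf.function's default settings (no input_signature and no shape relaxation).

```python
import tensorflow as tf

trace_log = []  # hypothetical log of the arguments seen at trace time


@tf.function
def scale(x, factor):
    # This append is a Python side effect, so it records one entry per trace.
    trace_log.append((x.shape.as_list(), x.dtype.name, factor))
    return x * factor


scale(tf.constant([1, 2]), 2)      # first call: traces
scale(tf.constant([3, 4]), 2)      # same shape, dtype and Python value: reuses the trace
scale(tf.constant([1, 2, 3]), 2)   # new shape: retraces
scale(tf.constant([1.0, 2.0]), 2)  # new dtype: retraces
scale(tf.constant([1, 2]), 3)      # new Python value: retraces

for entry in trace_log:
    print(entry)
```

Five calls produce four traces here: only the second call finds a matching key in the cache.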

def train_one_step():
  pass

@tf.function
def train(num_steps):
  print("Tracing with num_steps = {}".format(num_steps))
  for _ in tf.range(num_steps):
    train_one_step()

train(num_steps=10)
train(num_steps=20)

Tracing with num_steps = 10
Tracing with num_steps = 20

A simple workaround is to cast your arguments to tensors when they do not affect the shape of the generated graph:

train(num_steps=tf.constant(10))
train(num_steps=tf.constant(20))
Tracing with num_steps = Tensor("num_steps:0", shape=(), dtype=int32)

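Side effects in tf.function

In general, Python side effects (such as printing or mutating objects) only happen during tracing. So how can you reliably trigger side effects from a tf.function?

The rule of thumb is to use Python side effects only to debug your traces. Otherwise, TensorFlow ops such as tf.Variable.assign, tf.print, and tf.summary are the best way to ensure your code is traced and executed by the TensorFlow runtime on every call. In general, a functional style yields the best results.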

@tf.function
def f(x):
  print("Traced with", x)
  tf.print("Executed with", x)

f(1)
f(1)
f(2)

Traced with 1 // Python print runs only during tracing
Executed with 1
Executed with 1
Traced with 2 // Python print runs only during tracing
Executed with 2

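If you want to execute Python code on every invocation of a tf.function, tf.py_function is an exit hatch. Its drawbacks are that it is neither portable nor particularly performant, and it does not work well in distributed (multi-GPU, TPU) setups. Also, because tf.py_function has to be wired into the graph for differentiability, it casts all inputs and outputs to tensors.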

external_list = []

def side_effect(x):
  print('Python side effect')
  external_list.append(x)

@tf.function
def f(x):
  tf.py_function(side_effect, inp=[x], Tout=[])

f(1)
f(1)
f(1)
assert len(external_list) == 3
# .numpy() call required because py_function casts 1 to tf.constant(1)
assert external_list[0].numpy() == 1
Python side effect
Python side effect
Python side effect

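Beware of Python state

Many Python features, such as iterators and generators, rely on the Python runtime to keep track of state. In general, these constructs work as expected in eager mode, but inside a tf.function they can produce many unexpected results because of tracing behavior.

For example, advancing the state of an iterator is a Python side effect and therefore happens only during tracing: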

external_var = tf.Variable(0)
@tf.function
def buggy_consume_next(iterator):
  external_var.assign_add(next(iterator))
  tf.print("Value of external_var:", external_var)

iterator = iter([0, 1, 2, 3])
buggy_consume_next(iterator)
# This reuses the first value from the iterator, rather than consuming the next value.
buggy_consume_next(iterator)
buggy_consume_next(iterator)

Value of external_var: 0
Value of external_var: 0
Value of external_var: 0
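
One possible way to advance on every call is to hand the function a tf.data iterator rather than a plain Python iterator. The text above does not cover this, so treat it as a sketch: the name consume_next is illustrative, and the example assumes a TensorFlow 2.x release in which tf.data iterators can be passed as tf.function arguments.

```python
import tensorflow as tf


@tf.function
def consume_next(iterator):
    # On a tf.data iterator, next() is turned into a graph op,
    # so the iterator advances every time the function executes.
    return next(iterator)


dataset = tf.data.Dataset.from_tensor_slices([0, 1, 2, 3])
iterator = iter(dataset)

values = [int(consume_next(iterator).numpy()) for _ in range(3)]
print(values)
```

The iterator state lives in the TensorFlow runtime here, not in the Python interpreter, which is why tracing no longer freezes it.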

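Variables

The same idea of relying on the intended execution order of the code makes creating and using variables in a tf.function very easy.

There is one very important caveat: with variables, it is possible to write code that behaves differently in eager mode and in graph mode.
Specifically, this happens when you create a new variable on each call. Because of tracing semantics, tf.function would reuse the same variable on every call, whereas eager mode creates a new variable on every call. To guard against this mistake, tf.function raises an error when it detects dangerous variable creation behavior.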

@tf.function
def f(x):
  v = tf.Variable(1.0)
  v.assign_add(x)
  return v

with assert_raises(ValueError):
  f(1.0)

WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:1817: 
                              calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint
                              is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Caught expected exception 
  <class 'ValueError'>:

Traceback (most recent call last):
  File "<ipython-input-3-73d0ca52e838>", line 8, in assert_raises
    yield
  File "<ipython-input-17-73e410646579>", line 8, in <module>
    f(1.0)
ValueError: in user code:

    <ipython-input-17-73e410646579>:3 f  *
        v = tf.Variable(1.0)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:261 __call__  **
        return cls._variable_v2_call(*args, **kwargs)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:255 _variable_v2_call
        shape=shape)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:66 getter
        return captured_getter(captured_previous, **kwargs)
    /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py:511 invalid_creator_scope
        "tf.function-decorated function tried to create "

    ValueError: tf.function-decorated function tried to create variables on non-first call.

Unambiguous code is fine, though:

v = tf.Variable(1.0)

@tf.function
def f(x):
  return v.assign_add(x)

print(f(1.0))  # 2.0
print(f(2.0))  # 4.0

tf.Tensor(2.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)

You can also create variables inside a tf.function, as long as you can guarantee that they are created only the first time the function is executed:

class C:
  pass

obj = C()
obj.v = None

@tf.function
def g(x):
  if obj.v is None:
    obj.v = tf.Variable(1.0)
  return obj.v.assign_add(x)

print(g(1.0))  # 2.0
print(g(2.0))  # 4.0

tf.Tensor(2.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)

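Variable initializers can depend on function arguments and on the values of other variables. The correct initialization order is determined with the same method used to generate control dependencies: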

state = []
@tf.function
def fn(x):
  if not state:
    state.append(tf.Variable(2.0 * x))
    state.append(tf.Variable(state[0] * 3.0))
  return state[0] * x * state[1]

print(fn(tf.constant(1.0)))
print(fn(tf.constant(3.0)))

tf.Tensor(12.0, shape=(), dtype=float32)
tf.Tensor(36.0, shape=(), dtype=float32)
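
A common way to keep variable creation out of the traced body is to create the variable in an object's constructor and decorate a method. The sketch below is one such arrangement, not something the text above prescribes; the class name Counter and its attributes are made up for illustration.

```python
import tensorflow as tf


class Counter(tf.Module):  # hypothetical example class
    def __init__(self):
        super().__init__()
        # Created once, in plain Python, before any tracing happens.
        self.count = tf.Variable(0)

    @tf.function
    def increment(self):
        # Only uses the existing variable, so tracing never creates a new one.
        return self.count.assign_add(1)


counter = Counter()
first = int(counter.increment().numpy())
second = int(counter.increment().numpy())
print(first, second)
```

Because the variable belongs to the instance, each Counter object keeps its own state while the method body stays free of variable creation.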

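AutoGraph Transformations

AutoGraph is a library that is enabled by default in tf.function. It transforms a subset of Python code into graph-compatible TensorFlow ops, including control flow such as if, for, and while.

TensorFlow ops like tf.cond and tf.while_loop continue to work, but control flow written in Python is often easier to write and understand.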

# Simple loop

@tf.function
def f(x):
  while tf.reduce_sum(x) > 1:
    tf.print(x)
    x = tf.tanh(x)
  return x

f(tf.random.uniform([5]))
[0.42992723 0.425026417 0.735794306 0.224515557 0.623353]
[0.405260503 0.401156455 0.626597464 0.2208177 0.553458273]
[0.384441048 0.380938053 0.555704892 0.217297286 0.503107667]
...
...
[0.203162178 0.202645645 0.219479173 0.161014691 0.215892553]

<tf.Tensor: shape=(5,), dtype=float32, numpy=
array([0.20041241, 0.19991657, 0.2160216 , 0.1596375 , 0.21259971],
      dtype=float32)>

To inspect the code that AutoGraph generates, run:

print(tf.autograph.to_code(f.python_function))

def tf__f(x):
    do_return = False
    retval_ = ag__.UndefinedReturnValue()
    with ag__.FunctionScope('f', 'fscope', ag__.ConversionOptions(
      recursive=True, user_requested=True,optional_features=(), internal_convert_user_code=True)) as fscope:

        def get_state():
            return (x,)

        def set_state(loop_vars):
            nonlocal x
            (x,) = loop_vars

        def loop_body():
            nonlocal x
            ag__.converted_call(tf.print, (x,), None, fscope)
            x = ag__.converted_call(tf.tanh, (x,), None, fscope)

        def loop_test():
            return (ag__.converted_call(tf.reduce_sum, (x,), None, fscope) > 1)
        ag__.while_stmt(loop_test, loop_body, get_state, set_state, ('x',), {})
        try:
            do_return = True
            retval_ = fscope.mark_return_value(x)
        except:
            do_return = False
            raise
    (do_return,)
    return ag__.retval(retval_)

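· Conditionals

AutoGraph converts some if <condition> statements into the equivalent tf.cond calls. The substitution is made if <condition> is a Tensor. Otherwise, the if statement is executed as a Python conditional.

A Python conditional executes during tracing, so exactly one branch of the conditional is added to the graph. Without AutoGraph, this traced graph would be unable to take the alternate branch when there is data-dependent control flow.

tf.cond traces and adds all branches of the conditional to the graph, and dynamically selects a branch at execution time. Tracing can have unintended side effects; see AutoGraph tracing effects for more information.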

@tf.function
def fizzbuzz(n):
  for i in tf.range(1, n + 1):
    print('Tracing for loop')
    if i % 15 == 0:
      print('Tracing fizzbuzz branch')
      tf.print('fizzbuzz')
    elif i % 3 == 0:
      print('Tracing fizz branch')
      tf.print('fizz')
    elif i % 5 == 0:
      print('Tracing buzz branch')
      tf.print('buzz')
    else:
      print('Tracing default branch')
      tf.print(i)

fizzbuzz(tf.constant(5))
fizzbuzz(tf.constant(20))

Tracing for loop
Tracing fizzbuzz branch
Tracing fizz branch
Tracing buzz branch
Tracing default branch
1
2
fizz
4
buzz
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16
17
fizz
19
buzz

See the reference documentation for additional restrictions on AutoGraph-converted if statements.
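
The difference between a Tensor condition and a Python condition can be seen by logging which branches Python visits while tracing. This is a minimal sketch; sign_label, maybe_negate and traced_branches are names invented for the example.

```python
import tensorflow as tf

traced_branches = []  # hypothetical log of branches visited during tracing


@tf.function
def sign_label(x):
    if x > 0:  # Tensor condition: AutoGraph converts this to tf.cond
        traced_branches.append("positive")
        label = tf.constant(1)
    else:
        traced_branches.append("non_positive")
        label = tf.constant(-1)
    return label


@tf.function
def maybe_negate(x, negate):
    if negate:  # Python bool: decided once, at trace time
        traced_branches.append("negate")
        x = -x
    return x


positive = int(sign_label(tf.constant(5)).numpy())
negative = int(sign_label(tf.constant(-5)).numpy())
tensor_condition_branches = set(traced_branches)

traced_branches.clear()
kept = int(maybe_negate(tf.constant(5), False).numpy())
python_condition_branches = set(traced_branches)

print(positive, negative, tensor_condition_branches)
print(kept, python_condition_branches)
```

With the Tensor condition both branches are traced, and the graph picks one at run time. With the Python bool, the branch that was not taken never enters the graph at all.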

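· Loops

AutoGraph converts some for and while statements into the equivalent TensorFlow looping ops, such as tf.while_loop. If not converted, the for or while loop is executed as a Python loop.

The substitution is made in the following situations:

  • for x in y: if y is a Tensor, it is converted to tf.while_loop. In the special case where y is a tf.data.Dataset, a combination of tf.data.Dataset ops is generated.
  • while <condition>: if <condition> is a Tensor, it is converted to tf.while_loop.

A Python loop executes during tracing, adding extra ops to the tf.Graph for every iteration of the loop.

A TensorFlow loop traces the body of the loop and dynamically selects how many iterations to run at execution time. The loop body appears only once in the generated tf.Graph.

See the reference documentation for additional restrictions on AutoGraph-converted for and while statements.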
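
To make the contrast between the two kinds of loop concrete, the following sketch compares graph sizes for a Python range loop and a tf.range loop. The helper graph_node_count and the two loop functions are hypothetical names; exact node counts vary between TensorFlow versions, so only their relationship is compared.

```python
import tensorflow as tf


def graph_node_count(fn, *args):
    # Number of top-level nodes in the graph traced for these arguments.
    return len(fn.get_concrete_function(*args).graph.as_graph_def().node)


@tf.function
def python_loop(x, n):
    for _ in range(n):  # Python loop: unrolled while tracing
        x = x + 1.0
    return x


@tf.function
def tensor_loop(x, n):
    for _ in tf.range(n):  # Tensor loop: becomes a single tf.while_loop
        x = x + 1.0
    return x


x = tf.constant(0.0)
unrolled_3 = graph_node_count(python_loop, x, 3)
unrolled_10 = graph_node_count(python_loop, x, 10)
dynamic_3 = graph_node_count(tensor_loop, x, tf.constant(3))
dynamic_10 = graph_node_count(tensor_loop, x, tf.constant(10))

print(unrolled_3, unrolled_10, dynamic_3, dynamic_10)
```

The unrolled graph grows with the iteration count, while the tf.range version reuses one graph for any number of iterations.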

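· Looping over Python data

A common pitfall is to loop over Python/NumPy data inside a tf.function. Such a loop executes during tracing, adding a copy of your model to the tf.Graph for each iteration.

If you want to wrap the entire training loop in a tf.function, the safest approach is to wrap your data as a tf.data.Dataset so that AutoGraph dynamically unrolls the training loop.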

def measure_graph_size(f, *args):
  g = f.get_concrete_function(*args).graph
  print("{}({}) contains {} nodes in its graph".format(
      f.__name__, ', '.join(map(str, args)), len(g.as_graph_def().node)))

@tf.function
def train(dataset):
  loss = tf.constant(0)
  for x, y in dataset:
    loss += tf.abs(y - x) # Some dummy computation.
  return loss

small_data = [(1, 1)] * 3
big_data = [(1, 1)] * 10
measure_graph_size(train, small_data)
measure_graph_size(train, big_data)

measure_graph_size(train, tf.data.Dataset.from_generator(
    lambda: small_data, (tf.int32, tf.int32)))
measure_graph_size(train, tf.data.Dataset.from_generator(
    lambda: big_data, (tf.int32, tf.int32)))

train([(1, 1), (1, 1), (1, 1)]) contains 11 nodes in its graph
train([(1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1)]) contains 32 nodes in its graph
train(<FlatMapDataset shapes: (<unknown>, <unknown>), types: (tf.int32, tf.int32)>) contains 8 nodes in its graph
train(<FlatMapDataset shapes: (<unknown>, <unknown>), types: (tf.int32, tf.int32)>) contains 8 nodes in its graph

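When wrapping Python/NumPy data in a Dataset, be mindful of the difference between tf.data.Dataset.from_generator and tf.data.Dataset.from_tensors. The former keeps the data in Python and fetches it via tf.py_function, which can hurt performance. The latter bundles a copy of the data into the graph as a tf.constant node, which can increase memory use.

Reading data from files via TFRecordDataset, CsvDataset, and similar is the most efficient way to consume data: TensorFlow manages the asynchronous loading and prefetching itself, without involving Python. See the tf.data guide for more information.

· Accumulating values in a loop

A common pattern is to accumulate intermediate values from a loop. Normally, this is done by appending to a Python list or adding entries to a Python dictionary. Because these are Python side effects, they do not work as expected in a dynamically unrolled loop. Use tf.TensorArray to accumulate results from a dynamically unrolled loop.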

batch_size = 2
seq_len = 3
feature_size = 4

def rnn_step(inp, state):
  return inp + state

@tf.function
def dynamic_rnn(rnn_step, input_data, initial_state):
  # [batch, time, features] -> [time, batch, features]
  input_data = tf.transpose(input_data, [1, 0, 2])
  max_seq_len = input_data.shape[0]

  states = tf.TensorArray(tf.float32, size=max_seq_len)
  state = initial_state
  for i in tf.range(max_seq_len):
    state = rnn_step(input_data[i], state)
    states = states.write(i, state)
  return tf.transpose(states.stack(), [1, 0, 2])
  
dynamic_rnn(rnn_step,
            tf.random.uniform([batch_size, seq_len, feature_size]),
            tf.zeros([batch_size, feature_size]))

<tf.Tensor: shape=(2, 3, 4), dtype=float32, numpy=
array([[[0.96471524, 0.233114  , 0.1417228 , 0.14083493],
        [1.6257136 , 0.9389272 , 0.73989546, 0.8011714 ],
        [2.233508  , 1.827873  , 1.1567426 , 1.5585394 ]],

       [[0.67377114, 0.42712367, 0.5697857 , 0.71173656],
        [1.5520021 , 0.806401  , 0.9260858 , 1.5265073 ],
        [1.8115815 , 1.6316041 , 1.2245122 , 1.9724467 ]]], dtype=float32)>
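
The same accumulation pattern can be shown on a smaller problem with a predictable result. The function running_sum below is an illustrative name, not part of TensorFlow; it writes a cumulative sum into a tf.TensorArray inside a tf.range loop.

```python
import tensorflow as tf


@tf.function
def running_sum(values):
    n = tf.shape(values)[0]
    sums = tf.TensorArray(values.dtype, size=n)
    total = tf.zeros([], dtype=values.dtype)
    for i in tf.range(n):  # converted to tf.while_loop by AutoGraph
        total += values[i]
        # write() returns a new TensorArray handle that must be reassigned.
        sums = sums.write(i, total)
    return sums.stack()


result = running_sum(tf.constant([1.0, 2.0, 3.0])).numpy().tolist()
print(result)
```

Note that the value returned by write is assigned back to sums. Dropping that assignment is an easy mistake, because a Python list would be updated in place.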

Further reading

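To learn more about the graph optimizations performed after a tf.function has been traced, see the Grappler guide.
To learn how to optimize your data pipeline and profile your model, see the Profiler guide.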

posted @ 2020-07-10 16:49  banluxinshou