How to get started with functional programming

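Someone asked me this weekend how to get started with functional programming. My answer: write pure functions in the language you are already using.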

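A pure function's only input is its argument list, and its only output is its return value. If you haven't met this idea before, you might think every function is pure, since every function takes inputs and produces an output.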

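But conventional programming usually lets information flow into and out of a function by other routes as well. For example, an impure function may depend on a global variable or on an object's member data, so its behavior is not determined by its arguments alone. It may also set a global variable or write to a database; in that case the function has a side effect in addition to its return value.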

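You can write pure functions in any language, though it is easier in some languages than in others. For example, no one would call Fortran a functional language, yet some programmers (M. J. D. Powell comes to mind) discipline themselves to write pure functions in Fortran.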

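Why write pure, side-effect-free functions? A pure function is referentially transparent: given the same inputs, it always returns the same output. Its result does not depend on the system time, the state of a database, which functions were called before it, or anything else that is not passed in as an argument. This makes pure functions easier to understand, and therefore easier to debug and test.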
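As a small illustration of the point about system time, here is a hedged Python sketch (the greeting functions are hypothetical): the impure version reads the clock itself, while the pure version receives the time as an argument and can therefore be tested with fixed inputs and no clock mocking.

```python
from datetime import datetime

def greeting_impure():
    # Hidden input: the system clock. Two calls with no arguments can disagree.
    return "Good morning" if datetime.now().hour < 12 else "Good afternoon"

def greeting_pure(now):
    # The time is an explicit argument, so equal inputs always give equal outputs.
    return "Good morning" if now.hour < 12 else "Good afternoon"

# Testing the pure version only needs fixed inputs.
assert greeting_pure(datetime(2011, 7, 24, 9, 0)) == "Good morning"
assert greeting_pure(datetime(2011, 7, 24, 15, 0)) == "Good afternoon"
```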

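Still, you cannot always write pure functions. If you need to store a value in a database, a pure function cannot do it. And if you are calling a random number generator, you certainly don't want referential transparency that makes it return the same output every time! The goal is to use pure functions where it is practical, eliminating out-of-band communication when it is convenient to do so. Total purity is not practical; some argue that the sweet spot is about 85% purity.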
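One way to act on this, sketched here as an assumption rather than as the author's recipe, is to keep the computation in a pure function and confine the unavoidable database write to a small wrapper. The example uses Python's built-in `sqlite3` module with an in-memory database; the table and function names are hypothetical.

```python
import sqlite3

def summarize_values(values):
    # Pure: the summary depends only on the argument.
    return (len(values), sum(values), max(values))

def save_summary(conn, values):
    # Impure wrapper: the database write is the side effect, isolated in one place.
    n_values, total, largest = summarize_values(values)
    conn.execute("INSERT INTO summary VALUES (?, ?, ?)", (n_values, total, largest))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summary (n_values INTEGER, total INTEGER, largest INTEGER)")
save_summary(conn, [3, 1, 4, 1, 5])
print(conn.execute("SELECT * FROM summary").fetchall())  # [(5, 14, 5)]
```

Only `save_summary` needs a database to test; `summarize_values` can be checked with plain assertions.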

So why don't programmers use pure functions more often?

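One reason is that pure functions require longer argument lists. In an object-oriented language, a method can keep its argument list short by implicitly depending on the object's state. The price of the shorter signature is that you cannot understand the method by itself: you have to know the state of the object when the method is called. Is it worth giving up referential transparency for shorter signatures? That depends on your context and your taste, but in my opinion it is often worthwhile to accept longer function signatures in exchange for more pure functions.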
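The trade-off above can be seen in a short Python sketch (the `Account` class and `compute_interest` function are hypothetical): the method has a shorter call, but its answer depends on object state; the free function needs more arguments, but can be read and tested on its own.

```python
class Account:
    def __init__(self, balance, rate_percent):
        self.balance = balance
        self.rate_percent = rate_percent

    def interest(self):
        # Short signature, but the result depends on hidden state:
        # you must know self.balance and self.rate_percent at call time.
        return self.balance * self.rate_percent // 100

def compute_interest(balance, rate_percent):
    # Longer signature, but everything the result depends on is visible.
    return balance * rate_percent // 100

acct = Account(1000, 5)
print(acct.interest())            # 50
acct.balance = 2000               # same call, different answer after a state change
print(acct.interest())            # 100
print(compute_interest(1000, 5))  # 50, always
```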

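Another reason people give for not writing pure functions is that it is too expensive to copy large data structures in order to pass them into a function. But that is what pointers are for: you can conceptually pass an object into a function without actually copying the object's bits.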
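Python has no explicit pointers, but arguments are passed as object references, which plays the same role here. In this hedged sketch (the `describe` function is hypothetical), a million-element tuple is passed without being copied, and because the function only reads it, the function stays pure. Using a tuple is a design choice: it cannot be modified in place, so the caller's data is protected.

```python
def describe(values):
    # `values` refers to the caller's object; no elements are copied.
    # The function only reads it, so it has no side effects.
    return len(values), values[0], values[-1], id(values)

data = tuple(range(1_000_000))   # immutable, so describe() could not change it anyway
n, first, last, seen_id = describe(data)
print(n, first, last)            # 1000000 0 999999
print(seen_id == id(data))       # True: the function received the same object
```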

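You can also fake purity for the sake of efficiency. For example, Mike Swaim recently described how memoization sped up a program by several orders of magnitude. (Memoization is a technique for caching computations: when a function is asked to compute something, it first checks whether it has already done that calculation. If so, it returns the cached value; if not, it does the calculation and adds the result to its cache.) A memoized function is not strictly pure, because its calculations have a persistent effect on the state of its cache, but it can still be referentially transparent, always returning the same output for the same input. You could say it is cheating to call such functions pure, and it is; but if you are a real stickler about it, all pure functions have side effects.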
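A minimal memoization sketch using Python's standard `functools.lru_cache` decorator, with Fibonacci numbers as a stand-in example (not the program mentioned above). The global `call_count` is added only to make the caching visible, and it is itself a side effect; without the cache, the naive recursion for `fib(30)` would run the function body millions of times.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs

@lru_cache(maxsize=None)
def fib(n):
    global call_count
    call_count += 1  # illustration only: this counter makes fib impure
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30), call_count)  # 832040 31: each n from 0 to 30 is computed once
print(fib(30), call_count)  # 832040 31: the second call is answered from the cache
```

Apart from the counter, the only state that changes is the cache, and callers still see the same output for the same input.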

 

 

[Original]

 

How to get started with functional programming

by John on July 24, 2011

Someone asked me this weekend how to get started with functional programming. My answer: Start by writing pure functions in the programming language you’re currently using.

The only input to a pure function is its argument list and the only output is its return value. If you haven’t seen this before, you might think all functions are pure. After all, any function takes in values and returns a value. But in conventional programming there are typically out-of-band ways for information to flow in or out of a function. For example, an impure function may depend on a global variable or class member data. In that case, its behavior is not entirely determined by its arguments. Similarly, an impure function might set a global variable or write to a database. In that case the function has a side effect in addition to its return value.

You can write pure functions in any language, though it’s easier in some languages than others. For example, no one would call Fortran a functional language, but there are people (M. J. D. Powell comes to mind) who discipline themselves to write pure functions in Fortran.

Why write pure functions? A pure function has referential transparency, meaning it will always return the same value when given the same inputs. The output does not depend on the system time, the state of a database, which functions were called previously, or anything else that is not explicitly passed as an argument to the function. This means pure functions are easier to understand (and hence easier to debug and test).

You can’t always write pure functions. If you need to stick a value in a database, this cannot be accomplished with a pure function. Or if you’re calling a random number generator, you don’t want it to have referential transparency, always returning the same output! But the goal is to use pure functions when practical. You want to eliminate out-of-band communication when it’s convenient to do so. Total purity is not practical; some argue that the sweet spot is about 85% purity.

So why don’t programmers use pure functions more often? One reason is that pure functions require longer argument lists. In an object oriented language, object methods can have shorter argument lists by implicitly depending on object state. The price to pay for shorter method signatures is that you can’t understand a method by itself. You have to know the state of the object when the method is called. Is it worthwhile to give up referential transparency in order to have shorter method signatures? It depends on your context and your taste, though in my opinion it’s often worthwhile to use longer function signatures in exchange for more pure functions.

Another reason people give for not writing pure functions is that it’s too expensive to copy large data structures to pass them into a function. But that’s what pointers are for. You can conceptually pass an object into a function without having to actually make a copy of the object’s bits.

You can also fake purity for the sake of efficiency. For example, Mike Swaim left a comment recently giving an example of how memoization sped up a program by several orders of magnitude. (Memoization is a technique of caching computations. When a function is asked to compute something, it first looks to see whether it has already done the calculation. If so, it returns the cached value. If not, it does the calculation and adds its output to the cache.) A function that uses memoization is not strictly pure — its calculations have a persistent impact on the state of its cache — but such a function can still have referential transparency, always returning the same output given the same input. You could say it’s cheating to call such functions pure, and it is, but if you’re really a stickler about it, all pure functions have side effects.

 

posted on 2011-08-26 16:43  Jiang, X.