A Performance Problem That Surprised Me: Delegate

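I have recently been organizing a project called .GAME FRAMEWORK. Because it is game-related, performance has to be taken into account. Events, for example, reminded me of a fairly old article that discussed the efficiency of delegates. It gave these figures: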

Average	Minimum	Call type
0.2	0.2	inlined static call
6.1	6.1	static call
1.1	1.0	inlined instance call
6.8	6.8	instance call
0.2	0.2	inlined this inst call
6.2	6.2	this instance call
6.2	6.2	this instance call
5.4	5.4	this virtual call
6.6	6.5	interface call
1.1	1.0	inst interface instance call
0.2	0.2	this interface instance call
5.4	5.4	inst interface virtual call
5.4	5.4	this interface virtual call
41.1	40.9	delegate invoke

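As you can see, delegates are remarkably slow: a delegate invoke takes 41.1, about six times the slowest of the other call types (instance call, at 6.8). To reduce this burden, dudu and I looked into the problem together. dudu found quite a few articles, which you can read here: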
http://blogs.msdn.com/brada/archive/2004/02/05/68415.aspx
http://blog.monstuff.com/archives/000037.html
http://blog.monstuff.com/archives/000040.html

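After reading these articles you will see that MS has given Delegate special treatment, which leaves us no way to improve delegate performance ourselves. In practice, though, the performance picture is not as simple as those articles describe. Here is some code you can use to test it: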
    using System;
    using System.Windows.Forms;

    public class Test
    {
        /// <summary>
        /// Main entry point of the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            Tester t = new Tester();
            t.Start();
        }
    }


    public interface IFacecall
    {
        void Facecall(object sender, EventArgs e);
    }
    public class TestEvent
    {
        public event EventHandler Event;
//        public ArrayList face = new ArrayList();
        public IFacecall[] face;
        public int TestTime = 10000000;
        public void TestTheEvent()
        {
            for (int i = 0; i < TestTime; i++)
            {
                if (Event != null)
                {
                    Event(this, EventArgs.Empty);
                }
            }
        }
        public void TestTheFace()
        {
//            IFacecall item;
            int c;
            for (int i = 0; i < TestTime; i++)
            {
//                c = face.Count;
                c = face.Length;
                for (int j = 0; j < c; j++)
                {
//                    item = face[j] as IFacecall;
//                    if (item != null)
//                    {
                        face[j].Facecall(this, EventArgs.Empty);
//                    }
                }
                
            }
        }
    }
    public class Tester: IFacecall
    {
        public TestEvent te;
        public Tester()
        {
            te = new TestEvent();
            te.Event += new EventHandler(te_Event);
//            te.Event += new EventHandler(te2_Event);
//            te.Event += new EventHandler(te3_Event);
//            te.Event += new EventHandler(te4_Event);
//            te.face = new IFacecall[4] { this, this, this, this};
            te.face = new IFacecall[1] { this};
//            te.face.Add(this);
//            te.face.Add(this);
//            te.face.Add(this);
//            te.face.Add(this);
        }
        public void Start()
        {
            long l1 = DateTime.Now.Ticks;
            te.TestTheEvent();
            l1 = DateTime.Now.Ticks - l1;
            long l2 = DateTime.Now.Ticks;
            te.TestTheFace();
            l2 = DateTime.Now.Ticks - l2;
            MessageBox.Show( l1.ToString() + ":" + l2.ToString());
        }
        #region IFacecall members
        public void Facecall(object sender, EventArgs e)
        {
        }
        #endregion
        private void te_Event(object sender, EventArgs e)
        {
        }
        private void te2_Event(object sender, EventArgs e)
        {
        }
        private void te3_Event(object sender, EventArgs e)
        {
        }
        private void te4_Event(object sender, EventArgs e)
        {
        }
    }
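
The listing above leaves the four-handler configuration commented out. As a rough sketch of that configuration, the block below registers four event handlers and four interface targets side by side. It is not the original harness: the class name `FourTargetBenchmark` and the call counters are made up for illustration, and it assumes a runtime that provides `System.Diagnostics.Stopwatch`, which the original listing does not use.

```csharp
using System;
using System.Diagnostics;

public interface IFacecall
{
    void Facecall(object sender, EventArgs e);
}

// Hypothetical harness: four event handlers versus four interface targets.
public sealed class FourTargetBenchmark : IFacecall
{
    public event EventHandler Event;
    private readonly IFacecall[] _face;

    // Counters are an addition for this sketch; they let a caller verify
    // that every target really ran.
    public long EventCalls;
    public long FaceCalls;

    public FourTargetBenchmark()
    {
        for (int i = 0; i < 4; i++)
        {
            Event += OnEvent;   // same handler registered four times
        }
        _face = new IFacecall[] { this, this, this, this };
    }

    private void OnEvent(object sender, EventArgs e) { EventCalls++; }

    public void Facecall(object sender, EventArgs e) { FaceCalls++; }

    // Raises the event `iterations` times and returns the elapsed time.
    public TimeSpan TimeEvent(int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            EventHandler handler = Event;
            if (handler != null)
            {
                handler(this, EventArgs.Empty);
            }
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Calls every interface target `iterations` times and returns the elapsed time.
    public TimeSpan TimeFace(int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            for (int j = 0; j < _face.Length; j++)
            {
                _face[j].Facecall(this, EventArgs.Empty);
            }
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Calling `TimeEvent(10000000)` and `TimeFace(10000000)` on one instance and printing both `TimeSpan` values gives the comparison. The numbers depend heavily on the runtime version and the hardware, so none are quoted here.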

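Run it and see what you get. A lot of the code in this example is commented out; try swapping the commented lines in for some of the active ones and see what changes, for example using an ArrayList instead of the array, or registering four event handlers and four interface targets. An ArrayList is of course slower than an array, but it does not seem to be much slower. Registering four event handlers, however, costs far more than registering four interface targets!

Why? To answer that we have to look at MulticastDelegate, because an event is built on MulticastDelegate. As far as I can tell, the delegate ends up calling a function named DynamicInvokeImpl, which looks like this: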
    protected sealed override object DynamicInvokeImpl(object[] args)
    {
        if (this._prev != null)
        {
            this._prev.DynamicInvokeImpl(args);
        }
        return base.DynamicInvokeImpl(args);
    }
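In other words, the multicast capability comes from MulticastDelegate's DynamicInvokeImpl function, which recursively walks the singly linked list maintained through _prev. That, as you have probably guessed, is where the performance is lost. By the way, summing an int array of 10,000 elements takes only 2.2, roughly the cost of two calls (see the table at the top). So a great deal of CPU is being consumed here. Whether that is really what happens, though, I do not know.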
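
To make the linked-list idea concrete, here is a minimal model of a handler chain that invokes itself recursively through a `_prev` link. It only mirrors the shape of the function shown above; `ChainedHandler` and `ChainDemo` are hypothetical names, and this is not the framework's actual source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a delegate node that chains to the previously
// registered node through a _prev link. Not the real MulticastDelegate.
public sealed class ChainedHandler
{
    private readonly ChainedHandler _prev;   // node registered before this one, or null
    private readonly Action<string> _target; // the method this node calls

    public ChainedHandler(ChainedHandler prev, Action<string> target)
    {
        _prev = prev;
        _target = target;
    }

    // Appends a new target and returns the new head of the chain.
    public ChainedHandler Combine(Action<string> target)
    {
        return new ChainedHandler(this, target);
    }

    // Recursive walk: earlier nodes run first, then this node.
    // Returns the number of targets that were invoked.
    public int Invoke(string arg)
    {
        int invoked = 0;
        if (_prev != null)
        {
            invoked = _prev.Invoke(arg);
        }
        _target(arg);
        return invoked + 1;
    }
}

public static class ChainDemo
{
    // Builds a chain of three targets and records the order they run in.
    public static List<string> Run()
    {
        List<string> log = new List<string>();
        ChainedHandler chain = new ChainedHandler(null, s => log.Add("first:" + s));
        chain = chain.Combine(s => log.Add("second:" + s));
        chain = chain.Combine(s => log.Add("third:" + s));
        chain.Invoke("x");
        return log;
    }
}
```

In this model every extra target adds one more level of recursion and one more pointer hop per invocation, whereas the interface version in the test code is a flat loop over an array.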

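Switching to interfaces will probably feel uncomfortable to many people; after all, the convenience of event is hard to give up. But read one article on MSDN and you will have to think about efficiency. If you would rather not read the text, its chart alone makes the point at a glance.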
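
For readers wondering what giving up event might look like, the following is one possible sketch of an interface-based subscription list. `FacecallSource`, `CountingListener` and `FacecallDemo` are invented for illustration, and the copy-on-write array is just one design choice; it keeps `Raise` a plain array loop, like `TestTheFace` in the test code, at the cost of an allocation on every subscribe or unsubscribe.

```csharp
using System;

public interface IFacecall
{
    void Facecall(object sender, EventArgs e);
}

// Hypothetical event source that notifies listeners through an interface.
public sealed class FacecallSource
{
    private IFacecall[] _listeners = new IFacecall[0];

    // Adds a listener by building a new array one element longer.
    public void Subscribe(IFacecall listener)
    {
        if (listener == null) throw new ArgumentNullException("listener");
        IFacecall[] grown = new IFacecall[_listeners.Length + 1];
        Array.Copy(_listeners, grown, _listeners.Length);
        grown[grown.Length - 1] = listener;
        _listeners = grown;
    }

    // Removes the first occurrence of a listener; returns false if absent.
    public bool Unsubscribe(IFacecall listener)
    {
        int index = Array.IndexOf(_listeners, listener);
        if (index < 0) return false;
        IFacecall[] shrunk = new IFacecall[_listeners.Length - 1];
        Array.Copy(_listeners, 0, shrunk, 0, index);
        Array.Copy(_listeners, index + 1, shrunk, index, shrunk.Length - index);
        _listeners = shrunk;
        return true;
    }

    // Notifies every listener with a flat loop over the current array.
    public void Raise()
    {
        IFacecall[] snapshot = _listeners;
        for (int i = 0; i < snapshot.Length; i++)
        {
            snapshot[i].Facecall(this, EventArgs.Empty);
        }
    }
}

// Listener that counts how many times it was notified.
public sealed class CountingListener : IFacecall
{
    public int Calls;
    public void Facecall(object sender, EventArgs e) { Calls++; }
}

public static class FacecallDemo
{
    // Returns the call counts of two listeners after two raises,
    // with the second listener unsubscribed before the second raise.
    public static int[] Run()
    {
        FacecallSource source = new FacecallSource();
        CountingListener a = new CountingListener();
        CountingListener b = new CountingListener();
        source.Subscribe(a);
        source.Subscribe(b);
        source.Raise();
        source.Unsubscribe(b);
        source.Raise();
        return new int[] { a.Calls, b.Calls };
    }
}
```

The trade-off is the one described above: subscribers must implement an interface instead of simply attaching a method with `+=`.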

posted on 2004-05-23 21:39 Sumtec
