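[Paper Archaeology] How to do Research At the MIT AI Lab
Today I'm sharing a paper that summarizes research experience, published at the MIT AI Lab in 1988.
It runs to a little over thirty pages and shares experience on reading, writing, and researching, to help new researchers avoid detours. Although it is now more than thirty years old, it remains an important reference. I also plan to summarize another work, Craft of Research, on this blog when I have time.
This post is organized into broad categories; under each one I quote the passages that struck me most and comment on them.
Reading
How important is reading?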
Many researchers spend more than half their time reading.
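Even the best cook cannot make a meal without rice: reading widely is what keeps you from speculating in a vacuum. It is also normal to spend a lot of time reading papers to learn about current progress and ideas, so there is no need to rush for results.
What to read?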
A reference graph is a web of citations.
It's very valuable to understand as many approaches as possible-often more so than understanding one approach in greater depth.
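Each time you read a paper, look up its references and then read the ones that are new to you. When this process converges, that is, when every reference you come across already looks familiar, you know the area.
Read the interesting parts of each paper; getting to know as many approaches as possible is more useful than understanding a single approach in depth.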
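The "follow the references until nothing new turns up" procedure is essentially a breadth-first traversal of the citation graph. Here is a minimal sketch, assuming the graph is available as a plain dictionary; the function name `read_until_converged` and the toy paper labels are hypothetical and do not come from the paper.

```python
from collections import deque


def read_until_converged(start, references):
    """Follow citations breadth-first until no unseen paper remains.

    `references` maps a paper to the list of papers it cites
    (a hypothetical, hand-built citation graph).
    """
    seen = {start}            # papers we already know about
    queue = deque([start])    # papers waiting to be read
    reading_order = []
    while queue:
        paper = queue.popleft()
        reading_order.append(paper)
        for cited in references.get(paper, []):
            if cited not in seen:   # only chase references that are new to us
                seen.add(cited)
                queue.append(cited)
    # The queue is empty: every reference now looks familiar (convergence).
    return reading_order


# Toy citation graph with made-up paper labels.
toy_graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": [],
}
order = read_until_converged("A", toy_graph)
print(order)
```

In practice the graph is never fully known in advance and nobody reads every cited paper in full, so this only illustrates what "convergence" means here.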
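Communication
I strongly agree with the paper on this part, but in today's research environment it feels hard to set up a Secret Paper Passing Network, since people tend to keep their newest research ideas to themselves. Still, one should at least recognize how important it is to talk and discuss with others; working behind closed doors certainly won't do.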
Talks should almost never go on for more than an hour.
Don't try to cram everything you know into a talk.
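A quick aside on giving talks. First, keep them short. The underlying point is not to cram too much into one talk: present the highlights, since anyone who wants to understand the details will have to read the paper anyway.
Learning and research
The paper suggests many ways to learn, because AI usually sits at the intersection of several fields. The most effective are probably attending talks and hanging out with graduate students in the field. My own work lies at the intersection with communications; I covered communications fairly thoroughly as an undergraduate, but for AI on the computer science side I have new things to learn.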
Take as much math as you can early, while you still can; other fields are more easily picked up later.
Never neglect math.
How much should you learn?
There's a trap here: thinking "if only I knew more X, this problem would be easy," for all X.
There's always more to know that could be relevant. Eventually you have to sit down and solve the problem.
Like papers, programs can be over-polished.
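Research should ultimately be problem-driven. If you keep thinking you need to learn this and that first, you end up putting the cart before the horse and neglecting the actual work.
As for copying someone else's code and modifying it for your own use, one piece of advice is: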
It's sometimes better to write your own.
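As someone who has suffered through a mountain of messy legacy code, I think this is exactly right.
In AI, the discussions of research methodology are mixed, even mutually conflicting. When choosing a research emphasis (practical or formal), don't just echo others; have your own views and make your own choice. You need the ability to prove theorems, but also a willingness to doubt them. When several methods are mixed, how well they are applied comes down to your own judgment.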
Any one piece of work, and any one person, should aim for a judicious balance, formalizing subproblems that seem to cry for it while keeping honest to the Big Picture.
So what styles of research are there?
Some work is like science. You look at how people learn arithmetic, how the brain works, how kangaroos hop, and try to figure it out and make a testable theory.
Some work is like engineering: you try to build a better problem solver or shape-from algorithm.
Some work is like mathematics: you play with formalisms, try to understand their properties, hone them, prove things about them.
Some work is example-driven, trying to explain specific phenomena.
The best work combines all these and more.
The best AI work is work that combines these effectively.
The rule of thumb is that any given subtask will take three times as long as you expect.
You can get a lot more work done by regularly setting short and medium term goals, weekly and monthly for instance.
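When working, we usually estimate the time a task needs, and it usually ends up taking three times the estimate, whether or not you account for this rule... Still, setting short-term goals is a good way to work more efficiently.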
If you find yourself seriously stuck, with nothing at all happening for a week or more, promise to work one hour a day.
If you find yourself inexplicably "unable" to get work done, ask whether you are avoiding putting your ideas to the test.
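Beyond efficiency, what should you do when you get stuck? First, don't give up: work at least an hour a day to get back into it. Second, if you are truly at a loss, what has really happened is that the "conjecture-and-test" loop has stalled. At that point, look for more ideas on the one hand, and actively try them out on the other.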
It's easy not to see the progress you have made. "If I can do it, it's trivial. My ideas are all obvious." They may be obvious to you in retrospect, but probably they are not obvious to anyone else. Explaining your work to lots of strangers will help you keep in mind just how hard it is to understand what now seems trivial to you. Write it up.
A common and important part of any scientific progress is constant critical evaluation, and some amount of uncertainty over the value of the work is an inevitable part of the process.
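This is about judging the importance of your own work. First, don't belittle it as trivial; it seems so only because you are the one who has already thought it through. Discussing it with others is a good way to find out which parts are actually hard, and those may be worth highlighting as the sexy stuff in the paper. Of course, constantly doubting your work is also a necessary step on the road to a Nobel Prize: apart from Zhuge Liang (Kongming), everyone who popped the champagne at half-time came to grief.
Notes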
Record in your notebook ideas as they come up.
Put in speculations, current problems in your work, possible solutions. Work through possible solutions there. Summarize for future reference interesting things you read.
Some people make a monthly summary for easy reference.
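With research ideas, newer is not necessarily better; an idea that popped up earlier may turn out to be the more valuable one. Write down ideas as they come up and organize them promptly, or they may become lost gems. I used to create new files all over the place and then couldn't find them, or couldn't find all of them. Stick to one system.
Writing
The importance of writing goes without saying.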
When writing a paper, read books that are well-written, thinking in background mode about the syntactic mechanics.
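This is also a trick for learning: always ask why the author wrote it this way, how you would have written it, and what the difference is. That is how you find the gap between your own writing and good writing. Rote memorization alone is useless; as the saying goes, those who learn from me thrive, while those who merely imitate me perish.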
Writing is sometimes painful, and it can seem a distraction from doing the "real" work.
Realize that writing is a debugging process. Write something sloppy first and go back and fix it up.
Starting sloppy gets the ideas out and gets you into the flow. If you "can't" write text, write an outline. Make it more and more detailed until it's easy to write the subsubsubsections.
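Writing is indeed painful, so what do you do when the words won't come? Write whatever little you can; writing is itself a process of repeated debugging. And what if you really can't write anything at all?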
Type whatever comes into your head, even if it seems like garbage. After you've got a lot of text out, turn the knob back up and edit what you've written into something sensible.
Don't be afraid! Even if what you write is utter garbage, you can gradually edit it into shape.
Writing is hard work and takes a long time; don't get frustrated and give up if you find you write only a page a day.
Writing really is painful and draining, so be mentally prepared and don't lose heart.
Put the sexy stuff up front, at all levels of organization from paragraph up to the whole paper.
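This is the mistake I made in my most recent paper: I did not spell out clearly what I had done, so the paper failed to show how much work went into it.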
Often you'll write a clause or sentence or paragraph that you know is bad, but you won't be able to find a way to fix it. This happens because you've worked yourself into a corner and no local choice can get you out. You have to back out and rewrite the whole passage.
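I run into this often: I know a passage is badly written, but I really cannot fix it. That is what being "stuck in it" looks like. It is best to start over from scratch, which is far better than polishing garbage.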
Don't just explain how your system is built and what it does, also explain why it works and why it's interesting.
Write for people, not machines. It's not enough that your argument be correct, it has to be easy to follow.
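Writing a paper is not writing a manual. The story has to hang together; only with explanations and easy readability does each step lead to the next and leave the reader nodding in agreement.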
After you have written a paper, delete the first paragraph or the first few sentences. You'll probably find that they were content-free generalities, and that a much better introductory sentence can be found at the end of the first paragraph or the beginning of the second.
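At the start of a piece you usually haven't found your rhythm yet and tend to write filler. So each writing session needs an uninterrupted block of time to get into the flow.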
Once you start working on a research project, it's a good idea to get into the habit of writing an informal paper explaining what you are up to and what you've learned every few months.
Take two days to write it.
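This is a good habit to keep from now on. First, it exercises your writing; second, the act of writing helps you sort out your current thinking; and finally, these informal papers can serve as important references for later papers.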
Be sure as a matter of courtesy to run the paper through a spelling corrector before asking for comments.
For LaTeX files, a spell check alone is probably not enough; afterwards I should also paste the text into Word to check for grammar errors.
Papers get rejected-don't get dejected.
Just as victory and defeat are routine for a soldier, don't be discouraged when a paper is rejected. Keep going!
The result is that you are still working on a paper years after you thought you were through with it and after the whole topic has become utterly boring.
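I haven't been through this myself, but submission really is a long process; if you don't actually like the topic, you will probably just give up on it. Perhaps that is also why my advisor doesn't hand me a specific direction and instead lets me explore on my own.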
Typically the resemblance is actually only superficial, so show the paper to some wise person who knows your work and ask them what they think.
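What if you come across a similar paper while writing yours? Don't panic! The chance that it is exactly the same is small; usually the overlap is only superficial. Just calm down and look for the more fundamental differences.
Thesis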
The essential requirement of a Master's thesis is that it literally demonstrate mastery: that you have fully understood the state of the art in your subfield and that you are capable of operating at that level.
PhD theses are required to extend the state of the art.
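For a Master's student, then, the key task is mastery rather than novelty. It is enough to understand the state of the art in the field and be able to operate at that level. The PhD stage raises the bar further.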
Your thesis can't achieve your vision, but it can point the way.
The vision is of course grand, but for a thesis, clearly addressing a few concrete problems is already good enough.
"what's the thesis of your thesis?" What are you trying to show? You should have one-sentence, one-paragraph, and five-minute answers.
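This is also why many overambitious PhD theses fail external review: so much is crammed in that the main point gets lost. You need to be able to distill the thesis of your thesis, and to have a clear answer ready at each level of detail.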