文/张巡
在说Hadoop之前,作为一个铁杆粉丝先粉一下Google。Google的伟大之处不仅在于它建立了一个强悍的搜索引擎,它还创造了几项革命性的技术:GFS,MapReduce,BigTable,即所谓的Google三驾马车。Google虽然没有公布这几项技术的实现代码,但它发表了详细的设计论文,这给业界带来了新鲜气息,很快就出现了类似于Google三驾马车的开源实现,Hadoop就是其中的一个。
关于MapReduce
Hadoop说起来很简单,一个存储系统(HDFS),一个计算系统(MapReduce)。仅此而已。模型虽然简单,但我觉得它的精妙之处也就在这里。目前,通过提高CPU主频来提升计算性能的时代已经结束了,因此并行计算、分布式计算在业界发展了起来,但是这也往往意味着复杂的设计与实现,如果能找到一种方法,只需要写简单的程序就能实现分布式计算,那就太好了。
我们可以回想下当初做的课堂作业,它可能是一个处理数据的程序,没有多线程,没有进程间通信,数据输入都是来自键盘(stdin),处理完数据之后打印到屏幕(stdout),这时的程序非常简单。后来我们学习了多线程、内存管理、设计模式,写出的程序越来越大,也越来越复杂。但是当学习使用Hadoop时,我们发现又回到了最初,尽管底层是一个巨大的集群,可是我们操作文件时就像在本地一样简单,写MapReduce程序时也只需要在单线程里实现数据处理。
Hadoop就是这样一个东西,简单的文件系统(HDFS),简单的计算模型(MapReduce)。这其中,我觉得HDFS是很自然的东西,如果我们自己实现一个,也很可能是这样的。但是仔细琢磨下MapReduce,会发现它似乎是个新奇事物,不像HDFS的界面那样通俗易懂老少皆宜。MapReduce虽然模型简单,却是一个新的、不广为人所知的概念。比如说,如果说要简化计算,为何要做成Map和Reduce两个阶段,而不是一个函数足矣呢?另外,它也不符合我们耳熟能详的那些设计模式。一句话,MapReduce实在太另类了。
Hadoop的思想来源于Google的几篇论文,Google的那篇MapReduce论文里说:Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages。这句话提到了MapReduce思想的渊源,大致意思是,MapReduce的灵感来源于函数式语言(比如Lisp)中的内置函数map和reduce。函数式语言也算是阳春白雪了,离我们普通开发者总是很远。简单来说,在函数式语言里,map表示对一个列表(List)中的每个元素做计算,reduce表示对一个列表中的每个元素做迭代计算。它们具体的计算是通过传入的函数来实现的,map和reduce提供的是计算的框架。不过从这样的解释到现实中的MapReduce还太远,仍然需要一个跳跃。再仔细看,reduce既然能做迭代计算,那就表示列表中的元素是相关的,比如我想对列表中的所有元素做相加求和,那么列表中至少都应该是数值吧。而map是对列表中每个元素做单独处理的,这表示列表中可以是杂乱无章的数据。这样看来,就有点联系了。在MapReduce里,Map处理的是原始数据,自然是杂乱无章的,每条数据之间互相没有关系;到了Reduce阶段,数据是以key后面跟着若干个value来组织的,这些value有相关性,至少它们都在一个key下面,于是就符合函数式语言里map和reduce的基本思想了。
这样我们就可以把MapReduce理解为,把一堆杂乱无章的数据按照某种特征归纳起来,然后处理并得到最后的结果。Map面对的是杂乱无章的互不相关的数据,它解析每个数据,从中提取出key和value,也就是提取了数据的特征。经过MapReduce的Shuffle阶段之后,在Reduce阶段看到的都是已经归纳好的数据了,在此基础上我们可以做进一步的处理以便得到结果。这就回到了最初,终于知道MapReduce为何要这样设计。
Hadoop, More and More
Hadoop/MapReduce的Job是一个昙花一现的东西,它仅仅是一个计算过程而已,所有数据都是从HDFS来,到HDFS去。当最后一个Reduce完成之后,整个Job就消失了,只在历史日志中还保存着它曾经存在的证据。
Hadoop中的节点是对等的,因此每个节点都是可替代的。也因此,JobTracker可以把任务分配到任何一个节点执行,而不影响最后的产出结果。如果不是这样,就破坏了Hadoop的对等性。对等性可以说是Hadoop扩展能力、容错能力的基础,它意味着计算不再局限于单个节点的能力。当然事情总有两面性,对等性造成的结果是,如果我们想让某些节点做特殊的事情,会发现很困难。
就像很多分布式系统中的文件、对象一样,Hadoop对数据的操作是原子性的,一个Job跑完之后,最终的数据是在一瞬间呈现的,如果Job失败了,则什么都没有,连部分数据也得不到。由于Job中的task都是并行的,因此这里适用于木桶理论,最后一个完成的那个task决定了整个Job完成的时刻。所以如果Hadoop发现某些Task特别慢,会在其它节点运行这些Task的副本,当其中某个副本完成后,Hadoop将Kill掉其余的副本,这也是基于对等性的一个应用,使得那些慢的节点不会拖慢Job。如果某个task在重试若干次之后仍然失败了,那么整个Job宣告失败。
Hadoop包括HDFS、MapReduce,此外还有Hbase、Hive等,普通的应用大概是存储海量的数据,并进行批量的处理、数据挖掘等,不过也有些比较巧妙的实际应用,比如用Hadoop搭建HTTP下载服务器,或FTP服务器等,还有人用Hadoop搭建视频点播服务器。我觉得Hadoop对这些应用的最大价值是可以降低硬件成本,以前也许需要购买昂贵的硬件,现在购买廉价的服务器就可以实现。不过,做这些应用都需要对Hadoop做定制,修改代码啥的,似乎还没有整体的解决方案。
Hadoop是Java写的。可能有不少人会想,如果Hadoop是用C或C++写的,会不会更快些?关于这个问题网上也有不少讨论,简单地说就是,不会更快。而且我觉得需要强调的一点是,当面对Hadoop要面对的问题域时,编程语言不是首先要考虑的问题,或者说,依赖于具体的实现方案。Hadoop要处理的是大批量数据,它强调的是吞吐量,当整个集群跑起来之后,系统的瓶颈往往是在磁盘IO和网络IO,这种情况下,用C/C++实现Hadoop不见得比Java实现性能更高。相反,由于Java语言的严谨、统一和一些高级特性,用Java实现Hadoop对开发者而言会更容易。一般来讲,无论是用C++写还是用Java写,当系统发展较成熟时,都已经经过了相当的优化,这时系统的瓶颈往往不在于软件了,而在于硬件的限制。当然,这并非意味着目前Hadoop已经实现得很好了,Hadoop还有很大的优化空间,它的开发进程依然火爆,很多人在优化Hadoop,还包括用C++来改写Hadoop的部分代码的。另外,对于超大的集群(比如几千台服务器),调度器的优化也很关键,否则它就可能成为整个集群的瓶颈。我想这就是Hadoop虽然已经广泛应用,但是版本号依然还是零点几的原因吧。
虽说Hadoop的模型很简单,但是开发Hadoop的Job并不容易,在业务逻辑与HadoopAPI之间,我们还必须写大量的胶水代码,这无异于在WindowsSDK基础上写一个画图板程序,或者在Linux系统调用基础上写一个支持多用户在线的聊天服务器,这些似乎都不难,但是比较繁琐,需要堆砌大量代码,写完一个之后,你大概不会再写第二个。我把这种情况称为用汇编语言写程序,也就是说,你面对的是业务问题,却必须不遗巨细与机器打交道。
面对上述困境,人们想出了很多解决办法,开发出高级语言来屏蔽底层的细节,让开发过程更加简单,我把这种情况称为用脚本写程序,可以很快的修改&调试。其中Facebook开发了Hive,这是类似于SQL的脚本语言,开发者基本上只要面对自己的业务数据,就能写出Hive脚本,虽然有一定的入门门槛,但是比起用HadoopAPI写程序,可以称得上是巨简单了。另外Yahoo开发了PIG-latin,也是类SQL的脚本语言,Hive和PIG都有很广泛的应用
- Product Marketing Manager
- Product Manager
- Software Engineer
- Software Engineer in Test
- Quantitative Compensation Analyst
- Engineering Manager
- AdWords Associate
这篇Blog例举了Google用来面试下面这几个职位的面试题。很多不是很容易回答,不过都比较经典与变态,是Google,Microsoft,Amazon之类的公司的风格。对于本文,我没有翻译,因为我相信,英文问题是最好的。不过对于有些问题,我做了一些注释,不一定对,但希望对你有帮助启发。对于一些问题,如果你百思不得其解,可以Google一下,StackOverflow或是Wikipedia上可能会给你非常全面的答案。
- Why do you want to join Google?
- What do you know about Google’s product and technology?
- If you are Product Manager for Google’s Adwords, how do you plan to market this?
- What would you say during an AdWords or AdSense product seminar?
- Who are Google’s competitors, and how does Google compete with them?
- Have you ever used Google’s products? Gmail?
- What’s a creative way of marketing Google’s brand name and product?
- If you are the product marketing manager for Google’s Gmail product, how do you plan to market it so as to achieve 100 million customers in 6 months?
- How much money you think Google makes daily from Gmail ads?
- Name a piece of technology you’ve read about recently. Now tell me your own creative execution for an ad for that product.
- Say an advertiser makes $0.10 every time someone clicks on their ad. Only 20% of people who visit the site click on their ad. How many people need to visit the site for the advertiser to make $20?
- Estimate the number of students who are college seniors, attend four-year schools, and graduate with a job in the United States every year.
- How would you boost the GMail subscription base?
- What is the most efficient way to sort a million integers? (陈皓:merge sort)
- How would you re-position Google’s offerings to counteract competitive threats from Microsoft?
- How many golf balls can fit in a school bus? (陈皓:这种题一般来说是考你的解题思路的,注意,你不能单纯地把高尔夫球当成一个小立方体,其是一个圆球,堆起来的时候应该是错开的——也就是三个相邻的球的圆心是个等边三角形)
- You are shrunk to the height of a nickel and your mass is proportionally reduced so as to maintain your original density. You are then thrown into an empty glass blender. The blades will start moving in 60 seconds. What do you do?
- How much should you charge to wash all the windows in Seattle?
- How would you find out if a machine’s stack grows up or down in memory?
- Explain a database in three sentences to your eight-year-old nephew. (陈皓:用三句话向8岁的侄子解释什么是数据库,考你的表达能力了)
- How many times a day does a clock’s hands overlap?(陈皓:经典的时钟问题)
- You have to get from point A to point B. You don’t know if you can get there. What would you do?
- Imagine you have a closet full of shirts. It’s very hard to find a shirt. So what can you do to organize your shirts for easy retrieval? (陈皓:很不错的一道题,不要以为分类查询很容易,想想图书馆图书的分类查询问题吧。另外,你处想想如何在你在你的衣柜里实现一个相当于Hash表或是一个Tree之类的数据结构)
- Every man in a village of 100 married couples has cheated on his wife. Every wife in the village instantly knows when a man other than her husband has cheated, but does not know when her own husband has. The village has a law that does not allow for adultery. Any wife who can prove that her husband is unfaithful must kill him that very day. The women of the village would never disobey this law. One day, the queen of the village visits and announces that at least one husband has been unfaithful. What happens? (陈皓:这个问题很有限制级,哈哈,非常搞的一个问题,注意wife们的递归,这类的问题是经典的分布式通讯问题,上网搜 一搜吧。)
- In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?(陈皓:第一反应是——这个国家是中国。一个概率问题,其实,无论你怎么生,50%的概率是永远不变的。)
- If the probability of observing a car in 30 minutes on a highway is 0.95, what is the probability of observing a car in 10 minutes (assuming constant default probability)?
- If you look at a clock and the time is 3:15, what is the angle between the hour and the minute hands? (The answer to this is not zero!)
- Four people need to cross a rickety rope bridge to get back to their camp at night. Unfortunately, they only have one flashlight and it only has enough light left for seventeen minutes. The bridge is too dangerous to cross without a flashlight, and it’s only strong enough to support two people at any given time. Each of the campers walks at a different speed. One can cross the bridge in 1 minute, another in 2 minutes, the third in 5 minutes, and the slow poke takes 10 minutes to cross. How do the campers make it across in 17 minutes?(陈皓:经典的过桥问题)
- You are at a party with a friend and 10 people are present including you and the friend. your friend makes you a wager that for every person you find that has the same birthday as you, you get $1; for every person he finds that does not have the same birthday as you, he gets $2. would you accept the wager?
- How many piano tuners are there in the entire world?
- You have eight balls all of the same size. 7 of them weigh the same, and one of them weighs slightly more. How can you find the ball that is heavier by using a balance and only two weighings?(陈皓:经典的称重问题。这样的问题花样很多,不过都不难回答)
- You have five pirates, ranked from 5 to 1 in descending order. The top pirate has the right to propose how 100 gold coins should be divided among them. But the others get to vote on his plan, and if fewer than half agree with him, he gets killed. How should he allocate the gold in order to maximize his share but live to enjoy it? (Hint: One pirate ends up with 98 percent of the gold.)
- You are given 2 eggs. You have access to a 100-story building. Eggs can be very hard or very fragile means it may break if dropped from the first floor or may not even break if dropped from 100th floor. Both eggs are identical. You need to figure out the highest floor of a 100-story building an egg can be dropped without breaking. The question is how many drops you need to make. You are allowed to break 2 eggs in the process. (陈皓:Binary Search,二分查)
- Describe a technical problem you had and how you solved it.
- How would you design a simple search engine?
- Design an evacuation plan for San Francisco.
- There’s a latency problem in South Africa. Diagnose it. (陈皓:这个问题完全是在考你的解决问题的能力。没有明确的答案。不过,解决性能问题的第一步通常是找出瓶颈,找瓶颈有很多种方法,工具,二分查,时间记录等等。)
- What are three long term challenges facing Google?
- Name three non-Google websites that you visit often and like. What do you like about the user interface and design? Choose one of the three sites and comment on what new feature or project you would work on. How would you design it?
- If there is only one elevator in the building, how would you change the design? How about if there are only two elevators in the building? (陈皓:经典的电梯设计问题,这种问题千变万化,主要是考你的设计能力和需求变化的适变能力,与此相似的是酒店订房系统。)
- How many vacuum’s are made per year in USA?
- Why are manhole covers round? (陈皓:为什么下水井盖是圆的?这是有N种答案的,上Wiki看看吧)
- What is the difference between a mutex and a semaphore? Which one would you use to protect access to an increment operation?
- A man pushed his car to a hotel and lost his fortune. What happened? (陈皓:脑筋急转弯?他在玩大富翁游戏?!!)
- Explain the significance of “dead beef”.(陈皓:要是你看到的是16进制 DEAD BEEF,你会觉得这是什么?IPv6的地址?)
- Write a C program which measures the the speed of a context switch on a UNIX/Linux system.
- Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.(陈皓:上StackOverflow看看吧,经典的问题)
- Describe the algorithm for a depth-first graph traversal.
- Design a class library for writing card games. (陈皓:用一系列的类来设计一个扑克游戏,设计题)
- You need to check that your friend, Bob, has your correct phone number, but you cannot ask him directly. You must write a the question on a card which and give it to Eve who will take the card to Bob and return the answer to you. What must you write on the card, besides the question, to ensure Bob can encode the message so that Eve cannot read your phone number?(陈皓:协议+数字加密,我试想了一个,纸条上可以这样写,“Bob,请把我的手机号以MD5算法加密后的字符串,比对下面的字符串——XXXXXX,它们是一样的吗?”)
- How are cookies passed in the HTTP protocol?
- Design the SQL database tables for a car rental database.
- Write a regular expression which matches a email address. (陈皓:上StackOverflow查相当的问题吧。)
- Write a function f(a, b) which takes two character string arguments and returns a string containing only the characters found in both strings in the order of a. Write a version which is order N-squared and one which is order N.(陈皓:算法题,不难,不说了。一个O(n^2)和一个O(n)的算法复杂度)
- You are given a the source to a application which is crashing when run. After running it 10 times in a debugger, you find it never crashes in the same place. The application is single threaded, and uses only the C standard library. What programming errors could be causing this crash? How would you test each one? (陈皓:和随机数有关系?或是时间?)
- Explain how congestion control works in the TCP protocol.
- In Java, what is the difference between final, finally, and finalize?
- What is multithreaded programming? What is a deadlock?
- Write a function (with helper functions if needed) called to Excel that takes an excel column value (A,B,C,D…AA,AB,AC,… AAA..) and returns a corresponding integer value (A=1,B=2,… AA=26..).
- You have a stream of infinite queries (ie: real time Google search queries that people are entering). Describe how you would go about finding a good estimate of 1000 samples from this never ending set of data and then write code for it.
- Tree search algorithms. Write BFS and DFS code, explain run time and space requirements. Modify the code to handle trees with weighted edges and loops with BFS and DFS, make the code print out path to goal state.
- You are given a list of numbers. When you reach the end of the list you will come back to the beginning of the list (a circular list). Write the most efficient algorithm to find the minimum # in this list. Find any given # in the list. The numbers in the list are always increasing but you don’t know where the circular list begins, ie: 38, 40, 55, 89, 6, 13, 20, 23, 36. (陈皓:循环排序数组的二分查找问题)
- Describe the data structure that is used to manage memory. (stack)
- What’s the difference between local and global variables?
- If you have 1 million integers, how would you sort them efficiently? (modify a specific sorting algorithm to solve this)
- In Java, what is the difference between static, final, and const. (if you don’t know Java they will ask something similar for C or C++).
- Talk about your class projects or work projects (pick something easy)… then describe how you could make them more efficient (in terms of algorithms).
- Suppose you have an NxN matrix of positive and negative integers. Write some code that finds the sub-matrix with the maximum sum of its elements.(陈皓:以前见过一维数组的这个问题,现在是二维的。感觉应该是把二维的第一行的最大和的区间算出来,然后再在这个基础之上进行二维的分析。思路应该是这个,不过具体的算法还需要想一想)
- Write some code to reverse a string.
- Implement division (without using the divide operator, obviously).(陈皓:想一想手算除法的过程。)
- Write some code to find all permutations of the letters in a particular string.
- What method would you use to look up a word in a dictionary? (陈皓:使用排序,哈希,树等算法和数据结构)
- Imagine you have a closet full of shirts. It’s very hard to find a shirt. So what can you do to organize your shirts for easy retrieval?
- You have eight balls all of the same size. 7 of them weigh the same, and one of them weighs slightly more. How can you fine the ball that is heavier by using a balance and only two weighings?
- What is the C-language command for opening a connection with a foreign host over the internet?
- Design and describe a system/application that will most efficiently produce a report of the top 1 million Google search requests. These are the particulars: 1) You are given 12 servers to work with. They are all dual-processor machines with 4Gb of RAM, 4x400GB hard drives and networked together.(Basically, nothing more than high-end PC’s) 2) The log data has already been cleaned for you. It consists of 100 Billion log lines, broken down into 12 320 GB files of 40-byte search terms per line. 3) You can use only custom written applications or available free open-source software.
- There is an array A[N] of N numbers. You have to compose an array Output[N] such that Output[i] will be equal to multiplication of all the elements of A[N] except A[i]. For example Output[0] will be multiplication of A[1] to A[N-1] and Output[1] will be multiplication of A[0] and from A[2] to A[N-1]. Solve it without division operator and in O(n).(陈皓:注意其不能使用除法。算法思路是这样的,把output[i]=a[i]左边的乘积 x a[i]右边的乘积,所以,我们可以分两个循环,第一次先把A[i]左边的乘积放在Output[i]中,第二次把A[i]右边的乘积算出来。我们先看第一次的循环,使用迭代累积的方式,代码如下:for(r=1; i=0; i<n-1; i++){ Output[i]=r; r*=a[i]; },看明白了吧。第二次的循环我就不说了,方法一样的。)
- There is a linked list of numbers of length N. N is very large and you don’t know N. You have to write a function that will return k random numbers from the list. Numbers should be completely random. Hint: 1. Use random function rand() (returns a number between 0 and 1) and irand() (return either 0 or 1) 2. It should be done in O(n).(陈皓:本题其实不难。在遍历链表的同时一边生成随机数,一边记录最大的K个随机数和其链接地址。)
- Find or determine non existence of a number in a sorted list of N numbers where the numbers range over M, M>> N and N large enough to span multiple disks. Algorithm to beat O(log n) bonus points for constant time algorithm.(陈皓:使用bitmap,如果一个长整形有64位,那么我们可以使用M/64个bitmap)
- You are given a game of Tic Tac Toe. You have to write a function in which you pass the whole game and name of a player. The function will return whether the player has won the game or not. First you to decide which data structure you will use for the game. You need to tell the algorithm first and then need to write the code. Note: Some position may be blank in the game। So your data structure should consider this condition also.
- You are given an array [a1 To an] and we have to construct another array [b1 To bn] where bi = a1*a2*…*an/ai. you are allowed to use only constant space and the time complexity is O(n). No divisions are allowed.(陈皓:前面说过了)
- How do you put a Binary Search Tree in an array in a efficient manner. Hint :: If the node is stored at the ith position and its children are at 2i and 2i+1(I mean level order wise)Its not the most efficient way.(陈皓:按顺序遍历树)
- How do you find out the fifth maximum element in an Binary Search Tree in efficient manner. Note: You should not use use any extra space. i.e sorting Binary Search Tree and storing the results in an array and listing out the fifth element.
- Given a Data Structure having first n integers and next n chars. A = i1 i2 i3 … iN c1 c2 c3 … cN.Write an in-place algorithm to rearrange the elements of the array ass A = i1 c1 i2 c2 … in cn(陈皓:这个算法其实就是从中间开始交换元素,代码:for(i=n-1; i>1; i++) { for(j=i; j<2*n-i; j+=2) { swap(a[j], a[j+1]); } },不好意思写在同一行上了。)
- Given two sequences of items, find the items whose absolute number increases or decreases the most when comparing one sequence with the other by reading the sequence only once.
- Given That One of the strings is very very long , and the other one could be of various sizes. Windowing will result in O(N+M) solution but could it be better? May be NlogM or even better?
- How many lines can be drawn in a 2D plane such that they are equidistant from 3 non-collinear points?
- Let’s say you have to construct Google maps from scratch and guide a person standing on Gateway of India (Mumbai) to India Gate(Delhi). How do you do the same?
- Given that you have one string of length N and M small strings of length L. How do you efficiently find the occurrence of each small string in the larger one?
- Given a binary tree, programmatically you need to prove it is a binary search tree.
- You are given a small sorted list of numbers, and a very very long sorted list of numbers – so long that it had to be put on a disk in different blocks. How would you find those short list numbers in the bigger one?
- Suppose you have given N companies, and we want to eventually merge them into one big company. How many ways are theres to merge?
- Given a file of 4 billion 32-bit integers, how to find one that appears at least twice? (陈皓:我能想到的是拆分成若干个小数组,排序,然后一点点归并起来)
- Write a program for displaying the ten most frequent words in a file such that your program should be efficient in all complexity measures.(陈皓:你可能需要看看这篇文章Finding Frequent Items in Data Streams)
- Design a stack. We want to push, pop, and also, retrieve the minimum element in constant time.
- Given a set of coin denominators, find the minimum number of coins to give a certain amount of change.(陈皓:你应该查看一下这篇文章:Coin Change Problem)
- Given an array, i) find the longest continuous increasing subsequence. ii) find the longest increasing subsequence.(陈皓:这个题不难,O(n)算法是边遍历边记录当前最大的连续的长度。)
- Suppose we have N companies, and we want to eventually merge them into one big company. How many ways are there to merge?
- Write a function to find the middle node of a single link list. (陈皓:我能想到的算法是——设置两个指针p1和p2,每一次,p1走两步,p2走一步,这样,当p1走到最后时,p2就在中间)
- Given two binary trees, write a compare function to check if they are equal or not. Being equal means that they have the same value and same structure.(陈皓:这个很简单,使用递归算法。)
- Implement put/get methods of a fixed size cache with LRU replacement algorithm.
- You are given with three sorted arrays ( in ascending order), you are required to find a triplet ( one element from each array) such that distance is minimum. Distance is defined like this : If a[i], b[j] and c[k] are three elements then distance=max(abs(a[i]-b[j]),abs(a[i]-c[k]),abs(b[j]-c[k]))” Please give a solution in O(n) time complexity(陈皓:三个指针,a, b, c分别指向三个数组头,假设:a[0]<b[0]<c[0],推进a直到a[i]>b[0],计算 abs(a[i-1] – c[0]),把结果保存在min中。现在情况变成找 a[i], b[0],c[0],重复上述过程,如果有一个新的值比min要小,那就取代现有的min。)
- How does C++ deal with constructors and deconstructors of a class and its child class?
- Write a function that flips the bits inside a byte (either in C++ or Java). Write an algorithm that take a list of n words, and an integer m, and retrieves the mth most frequent word in that list.
- What’s 2 to the power of 64?
- Given that you have one string of length N and M small strings of length L. How do you efficiently find the occurrence of each small string in the larger one? (陈皓:我能想到的是——把那M个小字串排个序,然后遍历大字串,并在那M个字串中以二分取中的方式查找。)
- How do you find out the fifth maximum element in an Binary Search Tree in efficient manner.
- Suppose we have N companies, and we want to eventually merge them into one big company. How many ways are there to merge?
- There is linked list of millions of node and you do not know the length of it. Write a function which will return a random number from the list.
- You need to check that your friend, Bob, has your correct phone number, but you cannot ask him directly. You must write a the question on a card which and give it to Eve who will take the card to Bob and return the answer to you. What must you write on the card, besides the question, to ensure Bob can encode the message so that Eve cannot read your phone number?
- How long it would take to sort 1 trillion numbers? Come up with a good estimate.
- Order the functions in order of their asymptotic performance: 1) 2^n 2) n^100 3) n! 4) n^n
- There are some data represented by(x,y,z). Now we want to find the Kth least data. We say (x1, y1, z1) > (x2, y2, z2) when value(x1, y1, z1) > value(x2, y2, z2) where value(x,y,z) = (2^x)*(3^y)*(5^z). Now we can not get it by calculating value(x,y,z) or through other indirect calculations as lg(value(x,y,z)). How to solve it?
- How many degrees are there in the angle between the hour and minute hands of a clock when the time is a quarter past three?
- Given an array whose elements are sorted, return the index of a the first occurrence of a specific integer. Do this in sub-linear time. I.e. do not just go through each element searching for that element.
- Given two linked lists, return the intersection of the two lists: i.e. return a list containing only the elements that occur in both of the input lists. (陈皓:把第一个链表存入hash表,然后遍历第二个链表。不知道还没有更好的方法。)
- What’s the difference between a hashtable and a hashmap?
- If a person dials a sequence of numbers on the telephone, what possible words/strings can be formed from the letters associated with those numbers?(陈皓:这个问题和美国的电话有关系,大家可以试着想一下我们发短信的手机,按数字键出字母,一个组合的数学问题。)
- How would you reverse the image on an n by n matrix where each pixel is represented by a bit?
- Create a fast cached storage mechanism that, given a limitation on the amount of cache memory, will ensure that only the least recently used items are discarded when the cache memory is reached when inserting a new item. It supports 2 functions: String get(T t) and void put(String k, T t).
- Create a cost model that allows Google to make purchasing decisions on to compare the cost of purchasing more RAM memory for their servers vs. buying more disk space.
- Design an algorithm to play a game of Frogger and then code the solution. The object of the game is to direct a frog to avoid cars while crossing a busy road. You may represent a road lane via an array. Generalize the solution for an N-lane road.
- What sort would you use if you had a large data set on disk and a small amount of ram to work with?
- What sort would you use if you required tight max time bounds and wanted highly regular performance.
- How would you store 1 million phone numbers?(陈皓:试想电话是有区段的,可以把区段统一保存,Flyweight设计模式)
- Design a 2D dungeon crawling game. It must allow for various items in the maze – walls, objects, and computer-controlled characters. (The focus was on the class structures, and how to optimize the experience for the user as s/he travels through the dungeon.)
- What is the size of the C structure below on a 32-bit system? On a 64-bit? (陈皓:注意编译器的对齐)
struct foo {
- Efficiently implement 3 stacks in a single array.
- Given an array of integers which is circularly sorted, how do you find a given integer.
- Write a program to find depth of binary search tree without using recursion.
- Find the maximum rectangle (in terms of area) under a histogram in linear time.
- Most phones now have full keyboards. Before there there three letters mapped to a number button. Describe how you would go about implementing spelling and word suggestions as people type.
- Describe recursive mergesort and its runtime. Write an iterative version in C++/Java/Python.
- How would you determine if someone has won a game of tic-tac-toe on a board of any size?
- Given an array of numbers, replace each number with the product of all the numbers in the array except the number itself *without* using division.
- Create a cache with fast look up that only stores the N most recently accessed items.
- How to design a search engine? If each document contains a set of keywords, and is associated with a numeric attribute, how to build indices?
- Given two files that has list of words (one per line), write a program to show the intersection.
- What kind of data structure would you use to index annagrams of words? e.g. if there exists the word “top” in the database, the query for “pot” should list that.
- What is the yearly standard deviation of a stock given the monthly standard deviation?
- How many resumes does Google receive each year for software engineering?
- Anywhere in the world, where would you open up a new Google office and how would you figure out compensation for all the employees at this new office?
- What is the probability of breaking a stick into 3 pieces and forming a triangle?
- You’re the captain of a pirate ship, and your crew gets to vote on how the gold is divided up. If fewer than half of the pirates agree with you, you die. How do you recommend apportioning the gold in such a way that you get a good share of the booty, but still survive?
- How would you work with an advertiser who was not seeing the benefits of the AdWords relationship due to poor conversions?
- How would you deal with an angry or frustrated advertisers on the phone?