TUP Masters Series, Session 4: Raghu Ramakrishnan, Chief Scientist for Search and Cloud Computing, on Cloud Computing in Practice (Live Q&A Transcript)

Q:So what is the performance if you have lots of updates and very intensive data mining queries that run for a long time?

A:If you look at the Yahoo workloads I mentioned, data mining kinds of queries would not run on PNUTS. What you would do is analyze them on Hadoop. Data in Hadoop can be moved into Sherpa, or from Sherpa into Hadoop, efficiently, and you would analyze the data that way. PNUTS supports insert, delete, and modify of single records. Timeline consistency is important. The main factor is availability rather than performance.
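
To make the idea of single-record updates with timeline consistency concrete, here is a minimal Python sketch. It is not PNUTS code: the `Replica` and `RecordMaster` classes, the per-record version counter, and the `upto` parameter are all assumptions made for illustration. The only point it tries to show is that every replica applies the updates to a given record in the same order, so a lagging replica can be stale but never out of order.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the table, stored as key -> (version, value)."""
    rows: dict = field(default_factory=dict)

    def apply(self, key, version, value):
        # Accept an update only if it is the next step on this record's
        # timeline. Anything else (already applied, or arriving too early)
        # is ignored in this toy model.
        current = self.rows.get(key, (0, None))[0]
        if version == current + 1:
            self.rows[key] = (version, value)
            return True
        return False

    def read(self, key):
        # May return an older version if this replica is lagging.
        return self.rows.get(key, (0, None))


class RecordMaster:
    """Hypothetical master that assigns one ordered timeline per record."""

    def __init__(self):
        self.versions = {}
        self.log = []  # (key, version, value) in commit order

    def write(self, key, value):
        # Insert and modify are both a write; a delete could be modelled
        # as a write of a tombstone value.
        version = self.versions.get(key, 0) + 1
        self.versions[key] = version
        self.log.append((key, version, value))
        return version

    def propagate(self, replica, upto=None):
        # Replay the log (optionally only a prefix) to a replica.
        for key, version, value in self.log[:upto]:
            replica.apply(key, version, value)


master = RecordMaster()
near, far = Replica(), Replica()
master.write("user:42", {"status": "online"})
master.write("user:42", {"status": "away"})
master.propagate(near)         # fully caught up
master.propagate(far, upto=1)  # lagging: has seen only the first update

print(near.read("user:42"))  # (2, {'status': 'away'})
print(far.read("user:42"))   # (1, {'status': 'online'})
```

Replaying the log is safe to repeat in this sketch, because a replica ignores any version other than the next one it expects.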


Q:Why does Cata use two clusters, one cluster for Hadoop and one cluster for indexing share?
A:I'm not familiar with Cata, so I cannot answer this question.

Q:I want to ask a general question. What is the impact of cloud computing on databases?
A:Some of the things we talked about today hopefully answer your question. Cloud computing essentially means you are building a system that is multitenant. The tenants may be users or developers who are using the cloud. They can ask for more capacity at any time and get it instantly.
And this is your goal: you need to be able to add capacity to your system, and the system should automatically distribute work across it.
If you want to do that, it requires many of the kinds of protocols I have talked about. You need high availability, which means asking how the many kinds of failures in a very large distributed system can be masked. That requires some of the protocols I have talked about.
Think about it this way: a conventional database system that is considered very large has a dozen nodes. Handling failures and replication at our scale requires a different way of thinking.


Q:One question about database serving in the cloud. I understand that Hadoop only stores the data. Do you store anything like stored procedures in the database?
A:The bottom line is that, as of today, we don't provide any special support for this. Some developers have tried it. As you know, we could store a code block and execute it in some way. We may do something to support that in the future.
Another thing people have asked for is the ability to make sure data stays in the same location where it was established [transcript unclear]. That is a feature we don't support yet.
So there are things like that which we know we need to do but have not gotten to yet.


Q:You mentioned Hadoop and PNUTS. Do you think traditional database management systems have any chance in a cloud computing environment?
A:There are many kinds of application scenarios. The scenario I talked about today is the Yahoo scenario. Take the Yahoo login system as an example. It is a login database supporting 640 million users. Supporting a large distributed system like this requires the kinds of things I have talked about today.
Let me give you a very different example of a cloud. There is a company called ADP (Automatic Data Processing). They provide database services for other companies. What they are actually doing is managing and running hundreds of thousands of small databases. Any one of these databases can run on a single box. They want transactions, ACID, and full SQL, but they don't need anything more than simple MySQL-style asynchronous replication. The design of a cloud system for that would be completely different.
It would still contain some of the features I have talked about, in that availability is important, multitenancy is important, and elasticity is important. But they need a very different design. They will try to run lots of copies of small traditional relational databases on a farm of a thousand servers, and be able to move them from one farm to another, but the unit is one entire database.
Many commercial vendors are thinking about how to cloudify their products. Take Microsoft Azure: it is a way of adapting SQL Server to support simple database deployments on the cloud.
So the short answer is yes, but it is a long story.

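Let me take another, very different cloud scenario as an example. There is a company called ADP (Automatic Data Processing) that provides data services to other, smaller companies. ADP's work is to manage thousands upon thousands of small databases and keep them running; these databases may or may not be on the same machine. The users of these databases have simple needs: transactions, ACID, SQL operations, and at most MySQL-style asynchronous data access programs. Designing a cloud system like this is completely different from designing Yahoo's login system.
In addition, some large software vendors are trying to "cloudify" their products. Take Microsoft's Azure as an example: it is in effect a SQL Server that supports simple database deployments in the cloud.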

Q:I have two questions.
First, does PNUTS support multi-key queries?
Second, have you heard of Greenplum, and what would you say is the difference between Hadoop and Greenplum?
A:When you say multi-key query, do you mean I give you several keys and ask you to give me all the matching records?
A:So that is the answer to the first question. (Laughs)
On the second question, Greenplum is fundamentally an OLAP system, a traditional relational OLAP system, but lately it has also started supporting an implementation of MapReduce, because it's popular and customers are asking for it. So it is an implementation of MapReduce with some OLAP capabilities.
An easy way to summarize: Hadoop is a particular implementation of MapReduce. Greenplum is another implementation of MapReduce, and it also has traditional OLAP capabilities.

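On the second question: in essence, Greenplum is an OLAP (Online Analytical Processing) system, a traditional relational OLAP system, though it has also begun to support an implementation of MapReduce. This is partly because MapReduce is increasingly popular and partly because customers have asked for it. Overall, Greenplum is a MapReduce implementation with OLAP capabilities.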
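
Since the answer treats MapReduce as a programming model with more than one implementation, a tiny single-process Python sketch of the model itself may help. It is neither Hadoop's nor Greenplum's API; the `map_reduce`, `count_words`, and `total` functions and the sample queries are made up for illustration. It shows the three phases: map each record to key/value pairs, group the values by key, then reduce each group.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map -> shuffle -> reduce pass over `records`."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_words(line):
    """Mapper: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def total(_key, values):
    """Reducer: sum the counts emitted for one word."""
    return sum(values)


queries = ["madonna music", "madonna movies", "music videos"]
counts = map_reduce(queries, count_words, total)
print(counts)  # {'madonna': 2, 'music': 2, 'movies': 1, 'videos': 1}
```

A distributed implementation runs the same phases across many machines; an OLAP system adds SQL-style analytical queries on top.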

Q:Imagine you want some information about Madonna. You mentioned that the user types Madonna in the search box. What requests are sent before the final results are presented to the user? Which different components are involved?
A:The question, as I understand it, is: someone says, tell me all about Madonna. What are the different steps that the request flows through before the user sees the results?
A:OK, the story has to begin before the user issues the query. If I really want to give you everything there is to know about Madonna, I have to anticipate that you are interested in celebrities. Therefore, I have to be able to get all the relevant information about celebrities from different feeds, from people who maintain feeds on videos or movies, and from crawling the web to do information extraction. I have to get the data, integrate it, and create the relevant tables in the web of concepts. All of this happens before you type the query.
Once you type the query, it goes through these steps: analyzing your query in light of similar queries that other users issue, to make sure we understand what you are really looking for. Are you looking for Madonna the actress, Madonna the musician, or Madonna the mother of Christ? That establishes your main intent. Then it invokes a call to the results of the previous aggregation. This may be a system like [...] which has the necessary data.
By the way, in order to interpret your query Madonna, I probably also have profile data about you. Every time you do something, I am updating your profile. That profile data is very likely stored in PNUTS or another system, and we look at it in Hadoop to interpret what you really want.
And the first pass of gathering your profile data and creating the semantic aggregation for Madonna all used Hadoop.

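Q:In your talk you mentioned next-generation search (Next-Gen search). Suppose you want to search for information about Madonna: you type Madonna in the search box, click search, and Yahoo returns the results (which may well differ from person to person). What operations are involved in this process?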
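
The flow described in the answer above (aggregate ahead of time, then use the user's profile to work out intent at query time) can be sketched roughly in Python. Everything here is hypothetical: the `CONCEPTS` table, its entries, the sense labels, and the profile-as-counter model are invented for illustration and are not Yahoo's design. The sketch also leaves out the step of comparing against other users' similar queries.

```python
# Stand-in for the precomputed "web of concepts" tables, built offline
# before any query arrives. All entries are made-up illustrative data.
CONCEPTS = {
    ("madonna", "musician"): ["albums", "tour dates"],
    ("madonna", "actress"): ["filmography"],
    ("madonna", "religious figure"): ["art history"],
}


def candidate_senses(query):
    """Return every known sense of the query term."""
    name = query.strip().lower()
    return [sense for (entry, sense) in CONCEPTS if entry == name]


def pick_intent(query, profile):
    """Pick the sense that the user's profile weights most heavily."""
    senses = candidate_senses(query)
    if not senses:
        return None
    return max(senses, key=lambda sense: profile.get(sense, 0))


def answer(query, profile):
    """Resolve the intent, then look up the precomputed aggregation."""
    sense = pick_intent(query, profile)
    if sense is None:
        return None
    return sense, CONCEPTS[(query.strip().lower(), sense)]


def record_click(profile, sense):
    """Update the profile every time the user does something."""
    profile[sense] = profile.get(sense, 0) + 1


profile = {"musician": 3, "actress": 1}
print(answer("Madonna", profile))  # ('musician', ['albums', 'tour dates'])
```

Because the profile is updated on every action, the same query can resolve differently for different users, or for the same user over time.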

Q:OK, I want to ask the last question. You have been very successful in research and academia, and have now moved into industry. I think there are a lot of students from IT majors at the conference today. I want to ask: what are the most important abilities they should develop to prepare for their future careers, to be successful in research or in industry?
A:I think the most important ability is to find good people to work with.
When I was a student I was fortunate to find a good teacher, and as a teacher I was fortunate to find good students. And in industry I was fortunate to find good colleagues.
I have to mention that Yahoo has recently opened the Beijing lab. They are doing work such as the web of concepts I mentioned. You could join us and work with us.
And the short answer is you learn from good people, you work with good people, and you will be successful.

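In short, as the saying goes: when three people walk together, one of them can surely be my teacher; choose what is good in them and follow it, and take what is not good as something to correct in yourself. (You learn from good people, you work with good people, and you will be successful.)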

posted @ 2011-04-06 21:50  _Luc_