Core Model of Sql

Let's talk about SQL in two appects: Data Model and Computation Model.

Data Model

　　On one hand, each row in a table is actually an "Object". Columns of a row are fields of an object. Someone may ask: in a programming language we can define recursive data structures, how can SQL achieve it? Yes, it can, only a bit circuitous. For example, we have a TreeNode class:

class TreeNode{

TreeNode left_node;

TreeNode right_node;

int value;

}

We can define a similar structure in sql table(where left_node_id points to another row whose id is "left_node_id"):

create table TreeNode(

node_id int

left_node_id int,

right_node_id int,

value int

)

　On the other hand, a table is a "Hash Table". The key of this hash table is the primary key or the unique key, the value is the entire row.

Finally, above "object" and "Hash Table " views do not conflict, but can be unified. When a program runs, all the objects in its memory actually compose a big hashtable. The mapping is just: Object Name->Object data in memory. So, each row is an object; the entire table is a hash table.

Computation Model

sql computation is driven by 3 operations: Group, Filter,Join. All the other operations and functions are auxiliary.

Group(group by, distribute by)

　　What indeed does a group operation do ?

　　1. divide rows into partitions.

　　 all the rows in the same partition has the same key.

　　2. use a function to do transformation on the partition.

　　　 when it's an aggregate function(such as sum/count/avg/min/max/first/last/order by ......), then:

　　　　　　rows in a partition=> one row

　　　 when it's not an aggregate function(such as "distribute by" operation), then:

　　　　　　rows in a partition=> rows in a partition

Filter(select where)

　　Silly simple: rows => rows (via where statement)

Join: set operation

As is shown in "Data Model", a table is actually a Hash Table. So a table is also a Set. (Because a Set is really a special form of Hash Table, the elements of the Set, are keys of the Hash Table)

　　Join without any filter condition: SetA + SetB

　　Inner Join: SetA intersection SetB

　　Left/Right Outer Join: SetA - SetB

Hive and Map Reduce

　　The key-value nature(Hash Table) of sql table fits very well with MapReduce. That's why Hive was born. Further more, SQL is a procedural language more than a query language. I implemented Dijkstra algorithm using SQL, click here

posted on 2015-03-25 20:30 一生只想往前飞阅读(177) 评论(0) 编辑收藏举报

刷新页面返回顶部

变回善良那个我

Core Model of Sql

导航

公告