stand on the shoulders of giants

[Translated] Microsoft Data Development Technologies: Past, Present, and Future

http://msdn.microsoft.com/library/ee730343.aspx (Kraig Brockschmidt, Program Manager, Microsoft Corporation)

Over the past two decades, Microsoft has developed many data access solutions. You may have noticed that some technologies have been retired, but most of them are still active, and more keep being developed.

The first thing to understand is that data is the core. It outlives everything else: the hardware, the design architectures, the database technologies, the data access APIs, and the applications built on those data access technologies.
Data > Database engine > Data access technology > Application

The Foundations of Data Development: Win32 ("Native") Technologies

(1) 1989: SQL Server 1.0

  • There was a single programmatic or “call-level” API called DB-Library.
    Through its 150 functions, an MS-DOS or OS/2 console application or a Windows or OS/2 GUI application could party on data with the create-retrieve-update-delete operations we still know and love today.
  • Also available was Embedded SQL for C (ESQL for C in its short form),
    a precompiler that allowed SQL statements directly in source code, a simple foreshadowing of what we’ll see later in LINQ. (If you’re interested, a basic piece of ESQL for C code can be found in the SQL Server 2000 documentation.)

History_Win32_1.png

Figure 1: Data access technologies circa 1990; Microsoft’s offerings were only those for SQL Server.

(2) September 1992: ODBC

In September 1992, Microsoft released the Open Database Connectivity specification, or ODBC.
ODBC, still in wide use today, is a call-level API of roughly 50 functions that talks to the underlying databases or database-like stores through ODBC drivers. ODBC abstracts over the proprietary APIs, giving applications a single, uniform way to access all kinds of data sources (from the oldest legacy dinosaur to the latest cutting-edge technology). Database vendors can also supply a native (that is, no-middlemen) ODBC driver to achieve better performance (as Microsoft does with SQL Server).

Applications, of course, are still free to use proprietary APIs; ODBC simply provided a way for applications to isolate themselves from such APIs, as when those applications needed to operate against multiple databases from different vendors. In short, ODBC talks directly to the stores, as shown in the middle of the figure below.
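ODBC itself is a flat C API (functions such as SQLAllocHandle and SQLExecDirect), but the uniform-access idea is easy to illustrate with the managed System.Data.Odbc wrapper, which routes calls through the same ODBC driver manager. The following is only a minimal sketch: the DSN name and the Orders table are hypothetical, and pointing the connection string at a different vendor's driver is all it takes to switch data sources.

```csharp
using System;
using System.Data.Odbc;

class OdbcSketch
{
    static void Main()
    {
        // "SalesDsn" is a hypothetical ODBC data source name; the driver behind it
        // could be SQL Server, Oracle, or a text-file driver. The code is the same.
        using (var connection = new OdbcConnection("DSN=SalesDsn"))
        {
            connection.Open();

            using (var command = new OdbcCommand("SELECT COUNT(*) FROM Orders", connection))
            {
                // ExecuteScalar sends the SQL through the ODBC driver manager
                // to whichever driver the DSN is configured to use.
                Console.WriteLine("Order count: {0}", command.ExecuteScalar());
            }
        }
    }
}
```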

History_Win32_2.png

Figure 2: Data access technologies in September 1992.

(3) 1996: RDO and OLE DB

A few years later, object-oriented technology came into the mainstream, and with it the need for new object-oriented data access layers.

  • Remote Data Objects (RDO): a Visual Basic-compatible object layer on top of ODBC.
  • In August 1996, Microsoft released the more general OLE DB technology to create an object-oriented access layer alongside ODBC, again working on a data store-specific provider model.
  • In other words, OLE DB sits at the same level as ODBC; the difference is that OLE DB is object-oriented.

OLE DB, also still in wide use, did not raise the level of abstraction beyond ODBC; it simply offered a programming model suited to a different style of development. Where it surpassed ODBC was in supporting more kinds of data sources, especially those that could be represented as tabular data (such as spreadsheets and text files).

History_Win32_3.png

Figure 3: Data access technologies in August 1996.

(4) October 1996: ADO (ActiveX Data Objects)

With the rapid growth of the Internet, web applications became increasingly important. OLE DB was aimed at pointer-capable languages like C and C++, but scripting languages like VBScript and JavaScript do not support pointers, so OLE DB fell short; a new technology was needed to let web applications reach data stores.

In October 1996, Microsoft released ActiveX Data Objects, or ADO, a higher-level object abstraction built on top of OLE DB, available to pointer-capable and pointerless programming languages alike.

History_Win32_4.png

Figure 4: Data Access in October 1996

There were then six different Microsoft APIs for data access depending on how an application was written and the kinds of data stores it wanted to access:

1996 Choices          | SQL Server DBs                        | DBs w/ ODBC driver | DBs w/ OLE DB driver
----------------------|---------------------------------------|--------------------|---------------------
Apps written in C/C++ | DB-Library, ESQL/C, ODBC, OLE DB, ADO | ODBC               | OLE DB, ADO
Apps written in VB    | RDO                                   | RDO                | ADO
Web applications      | ADO                                   |                    | ADO

(5) Older technologies are retired

As the field evolved, survival of the fittest took its course:

  • DB-Library ended with SQL Server 2000; it was replaced by a new and better API, SQL Server Native Client, introduced with SQL Server 2005.
  • Similarly, the development of Language-Integrated Query (LINQ) ended the long run of ESQL for C with SQL Server 2008.
  • RDO, similarly, was supported through Visual Basic version 6, after which Visual Basic became Visual Basic .NET and the focus shifted to ADO.NET (as we'll see in the next section).

The .NET Framework (version 1.0 in February 2002 and 1.1 in April 2003) moved development into the managed-code space.
The ODBC, OLE DB, and ADO technologies consequently began to show their age, serving mainly as the data access technologies for unmanaged (Win32) applications. Today they are collected in the Microsoft Data Access Components (MDAC) or the Windows Data Access Components (WDAC), and are part of the Windows SDK. They have continued to evolve to some degree, for example with ADO Multi-Dimensional (ADOMD), ADO Extensions for DDL and Security (ADOX), the SQL Server Java Database Connectivity (JDBC) driver, and the PHP driver for SQL Server, because with or without the .NET Framework there is still a need for direct, optimized access to data stores like SQL Server from unmanaged environments.

History_Win32_5.png

Figure 5: Data Access Technologies for Unmanaged Code, current to April 2010.

Current Choice Matrix (unmanaged code) | SQL Server DBs                       | DBs w/ ODBC driver                                                   | DBs w/ OLE DB driver
---------------------------------------|--------------------------------------|----------------------------------------------------------------------|---------------------
Apps written in C/C++                  | ODBC, OLE DB, ADO, SQL Native Client | ODBC                                                                 | OLE DB, ADO
Web applications                       | ADO, PHP Driver, JDBC Driver         | Accessible through the Microsoft OLE DB Provider for ODBC (MSDASQL)  | ADO

Data Development Today, Part 1: Direct Data Access

The .NET Framework changed how applications are developed, but it did not change the fundamental need for applications to access data.

(6) ADO.NET

For data access, ADO evolved into ADO.NET.

ADO.NET provides a handful of primary classes that form its core:
SqlConnection,
SqlCommand,
DataReader,
DataSet,
DataTable (along with server controls for ASP.NET applications).

ADO.NET is the .NET sibling of ADO. For a comparison of the two technologies, see ADO.NET for the ADO Programmer.

Like its predecessors, ADO.NET hides the details of the underlying data store behind an abstraction layer. That layer is supplied by ADO.NET “data providers”:
Microsoft ships a native provider for SQL Server, a provider for OLE DB, and a provider for ODBC (see .NET Framework Data Providers in the .NET 1.1 documentation).
In addition, ADO.NET supports transparent access to XML sources in the same manner (which ADO did not).
See Figure 6 below. (Note: for a more comprehensive version of this diagram, see the .NET Data Access Architecture Guide article.)
Note that the DataSet class belongs to the .NET System.Data assembly, while DataReader and DataAdapter are implemented by each provider; SqlConnection and SqlCommand are likewise provider-specific classes.
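To give a feel for the provider model, here is a minimal sketch (the connection string and table are hypothetical) that writes only provider-neutral code against the System.Data.Common base classes and selects the concrete provider by its invariant name at run time; swapping in the OLE DB or ODBC provider changes only the invariant name and connection string.

```csharp
using System;
using System.Data.Common;

class ProviderModelSketch
{
    static void Main()
    {
        // Swap "System.Data.SqlClient" for "System.Data.OleDb" or "System.Data.Odbc"
        // to target a different provider with the same code.
        DbProviderFactory factory = DbProviderFactories.GetFactory("System.Data.SqlClient");

        using (DbConnection connection = factory.CreateConnection())
        {
            // Hypothetical connection string for a local SQL Server instance.
            connection.ConnectionString =
                @"Data Source=.\SQLEXPRESS;Initial Catalog=Northwind;Integrated Security=True";
            connection.Open();

            using (DbCommand command = connection.CreateCommand())
            {
                command.CommandText = "SELECT COUNT(*) FROM Customers";
                Console.WriteLine("Customers: {0}", command.ExecuteScalar());
            }
        }
    }
}
```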

History_DotNet_1.png

Figure 6: The introduction of ADO.NET in 2002/2003.

It is worth pointing out that even in the ADO.NET era, database programming still comes down to:

  1. Open a database connection
  2. Execute a query in the database
  3. Get back a set of results
  4. Process those results
  5. Release the results
  6. Close the connection

In other words, the APIs and other details have changed a great deal, but the essential structure of the code has not. That is not a bad thing, because it matches the basic purpose of a database: to ask questions and get back answers.
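As a concrete illustration of those six steps in ADO.NET, here is a minimal sketch using the SqlClient provider classes named above; the connection string and the Northwind Customers table are assumptions made for the example.

```csharp
using System;
using System.Data.SqlClient;

class SixStepSketch
{
    static void Main()
    {
        // 1. Open a database connection (hypothetical connection string).
        using (var connection = new SqlConnection(
            @"Data Source=.\SQLEXPRESS;Initial Catalog=Northwind;Integrated Security=True"))
        {
            connection.Open();

            // 2. Execute a query in the database.
            using (var command = new SqlCommand(
                "SELECT CustomerID, CompanyName FROM Customers", connection))
            // 3. Get back a set of results.
            using (SqlDataReader reader = command.ExecuteReader())
            {
                // 4. Process those results.
                while (reader.Read())
                {
                    Console.WriteLine("{0}: {1}", reader["CustomerID"], reader["CompanyName"]);
                }
            }   // 5. Release the results (disposing the reader closes it).
        }       // 6. Close the connection (disposing the connection closes it).
    }
}
```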

(7) LINQ and Entity Framework

Two other important technologies introduced with the .NET Framework 3.5 deserve special emphasis:

  • Language-Integrated Query (LINQ)
  • the ADO.NET Entity Framework

Compared with the transition from ADO to ADO.NET, these represent even more significant strides.

In November 2007, LINQ arrived, keeping alive the ghost of ESQL for C:

The idea behind LINQ is to embed queries directly in code. As evidenced in its overview, LINQ is a far, far better solution than its distant ancestor (ESQL for C) in these ways:

  • LINQ can be used from any .NET language that supports it (like C# and Visual Basic)
  • LINQ allows for many modern performance optimizations
  • LINQ can operate against (that is, query and update) arbitrary data sources

Those data sources can be anything:

  • in-memory CLR objects
  • XML
  • databases
  • ADO.NET DataSets
  • any other source for which someone implements a .NET class with the IEnumerable or IQueryable interfaces

These are shown in Figure 7.

History_DotNet_2.png

Figure 7: The introduction of LINQ in November 2007 including LINQ to Objects,
LINQ to XML, LINQ to DataSet, and LINQ to SQL.

Beyond the “LINQ to XYZ” implementations that Microsoft provides, there are plenty of third-party ones: LINQ to Flickr, LINQ to Amazon, LINQ to CSV, LINQ to MAPI, LINQ to Twitter, and so on.
If something looks like a queryable data store, you can put an IQueryable interface on it and make it work with LINQ.
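For example, here is a minimal LINQ to Objects sketch (the SaleRecord class and its data are invented for illustration); the same query syntax works against XML, DataSets, or SQL Server simply by swapping the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SaleRecord
{
    public string Customer { get; set; }
    public decimal Amount { get; set; }
}

class LinqSketch
{
    static void Main()
    {
        // Any IEnumerable<T> is a valid LINQ to Objects source.
        var sales = new List<SaleRecord>
        {
            new SaleRecord { Customer = "Contoso",  Amount = 120.00m },
            new SaleRecord { Customer = "Fabrikam", Amount =  75.50m },
            new SaleRecord { Customer = "Contoso",  Amount =  42.25m }
        };

        // The query is embedded directly in the language, not hidden in a string.
        var totals = from s in sales
                     group s by s.Customer into g
                     select new { Customer = g.Key, Total = g.Sum(x => x.Amount) };

        foreach (var t in totals)
        {
            Console.WriteLine("{0}: {1}", t.Customer, t.Total);
        }
    }
}
```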

Although LINQ offers a remarkably simple yet powerful query mechanism, especially against DataSets and SQL Server databases, it does not change how applications view the database: as relational data. In fact, all the technologies we have seen so far (DB-Library, ESQL for C, ODBC, OLE DB, ADO, ADO.NET, and LINQ) are united in that they inherently work directly against the relational structure of database tables.

That is fine for simple databases and simple applications. As databases grow larger and applications grow more complex, however, we see an ever-widening gap between how an application views its data (conceptually, as an object model) and how the information is organized in the database (as a storage and relational model).
For example, a manufacturer's ordering system might store order information across multiple relational tables; if the application really needs a single, conceptual “order” entity, it ends up having to perform a complex JOIN in every order-related query.

For this reason, programmers commonly implement their own mapping layers to create these application-specific, conceptual entities. Such a layer might expose a single “Order” object with Retrieve and Update methods; internally, those methods issue the database-level queries needed to shuttle information between the Order object and the underlying tables.

Mapping layers conveniently isolate the application from the specific logical structure of the database, and they are a common pattern; you will find them in projects built with everything from DB-Library to ADO.NET and LINQ to SQL. The inconvenience is that these mapping layers are tedious to implement and maintain.
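The sketch below shows the flavor of such a hand-written mapping layer: a conceptual Order object whose Retrieve method hides the JOIN across hypothetical OrderHeader, Customer, and OrderDetail tables. All the names and the schema are invented for illustration.

```csharp
using System.Data.SqlClient;

// A conceptual "Order" entity backed by several relational tables.
public class Order
{
    public int OrderId { get; set; }
    public string CustomerName { get; set; }
    public decimal Total { get; set; }

    // Hides the relational JOIN behind a single conceptual entity.
    public static Order Retrieve(string connectionString, int orderId)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            @"SELECT h.OrderId, c.Name, SUM(d.Quantity * d.UnitPrice) AS Total
              FROM OrderHeader h
              JOIN Customer c ON c.CustomerId = h.CustomerId
              JOIN OrderDetail d ON d.OrderId = h.OrderId
              WHERE h.OrderId = @id
              GROUP BY h.OrderId, c.Name", connection))
        {
            command.Parameters.AddWithValue("@id", orderId);
            connection.Open();

            using (SqlDataReader reader = command.ExecuteReader())
            {
                if (!reader.Read())
                {
                    return null;
                }
                return new Order
                {
                    OrderId = reader.GetInt32(0),
                    CustomerName = reader.GetString(1),
                    Total = reader.GetDecimal(2)
                };
            }
        }
    }

    // An Update method would contain similar hand-written plumbing in the
    // opposite direction; it is omitted to keep the sketch short.
}
```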

August 2008: the Entity Framework, with the .NET Framework 3.5 SP1

That changes with the Entity Framework introduced in August 2008 with the .NET Framework version 3.5 SP1 (see overview) and greatly improved with the .NET Framework 4 in April 2010.
The Entity Framework, as it's simply called, automatically creates a mapping layer of .NET classes from an existing database, along with a concise textual description of the mapping between the conceptual and relational levels, which you can customize however you like. Whatever the case, the Entity Framework provides full LINQ access to the resulting entities (LINQ to Entities), and because it is built on top of ADO.NET, it directly leverages the ADO.NET providers.

With the Visual Studio designer it is easy to create the conceptual and relational representations, and the Entity Framework designer creates a default mapping between them. Using the Entity Framework does not make your work more complicated; on the contrary, it makes things easier for core scenarios and offers much more flexibility beyond the basics. The bottom line is that you can focus your efforts on what you really care about, the conceptual objects and the mapping, and not labor over all the tedious, error-prone plumbing code that's traditionally left to summer interns.
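A minimal LINQ to Entities sketch might look like the following. It assumes a designer-generated OrderEntities context with Orders and related Customer and Lines entity sets; every name here is hypothetical and would normally be produced by the Entity Data Model designer rather than written by hand.

```csharp
using System;
using System.Linq;

class EntityFrameworkSketch
{
    static void Main()
    {
        // OrderEntities stands in for the ObjectContext the Entity Framework
        // designer generates from the Entity Data Model (name is hypothetical).
        using (var context = new OrderEntities())
        {
            // The query targets conceptual entities; the Entity Framework
            // translates it to SQL (JOINs included) via the ADO.NET provider.
            var bigOrders = from o in context.Orders
                            where o.Lines.Sum(l => l.Quantity * l.UnitPrice) > 1000
                            orderby o.OrderDate descending
                            select new { o.OrderId, CustomerName = o.Customer.Name };

            foreach (var order in bigOrders)
            {
                Console.WriteLine("{0}: {1}", order.OrderId, order.CustomerName);
            }
        }
    }
}
```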

This brings us to the current state of the art as of April 2010 as shown in Figure 8.

History_DotNet_3.png

Figure 8: The Entity Framework, first released in August 2008, automates the hard work of conceptual mapping. An Entity Data Model is used at compile time to generate classes for a mapping layer.

Data Development Today, Part 2: Data in the Cloud

As subtly betrayed by the title of the last section, everything we’ve seen to this point applies primarily to applications that can access a database in some direct manner, such as the file system, named pipes, etc. This applies to client applications accessing databases on an intranet as well as web applications accessing backing stores on their web servers (and web applications written in ASP.NET have full access to ADO.NET and Entity Framework, of course).

What we haven’t talked about are applications that access data more indirectly, such as web or rich internet applications that wish to create and consume REST-based data services.

To be honest, this subject starts to bridge us into the world of the Windows Communication Foundation (WCF) and the world of web services in general, which goes beyond the scope of this article. We mention it, however, to introduce WCF Data Services (formerly ADO.NET Data Services and code name “Astoria”), a framework that facilitates this communication behind the scenes of what we’ve been exploring so far (see Figure 9).

History_DotNet_4.png

Figure 9: WCF Data Services facilitates creating and consuming
REST-based data services.

The goal of Data Services is to facilitate near-turnkey creation of flexible data services that are naturally integrated with the web, using URIs to point to pieces of data (such as entities in an Entity Data Model) and simple, well-known formats to represent that data (such as JSON and XML). Other web applications (and agents) can interact with the resulting REST-style resource collection using the usual HTTP verbs such as GET, POST, or DELETE. Indeed, so universal are the conventions for working with data services that they have been formalized as the Open Data Protocol, information on which can be found at www.odata.org.
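To give a feel for how universal those conventions are, here is a minimal sketch that consumes a hypothetical OData feed with nothing more than an HTTP GET; the service URI and entity set are assumptions, while the $filter/$orderby/$top query options follow the Open Data Protocol conventions documented at www.odata.org.

```csharp
using System;
using System.Net;

class ODataSketch
{
    static void Main()
    {
        // Hypothetical WCF Data Services endpoint exposing an Orders entity set.
        // The URI addresses the data; the query options shape the result.
        string uri = "http://example.com/Northwind.svc/Orders" +
                     "?$filter=Freight gt 100&$orderby=OrderDate desc&$top=5";

        using (var client = new WebClient())
        {
            // A plain HTTP GET returns the entities (AtomPub/XML by default;
            // JSON can be requested via the Accept header).
            string payload = client.DownloadString(uri);
            Console.WriteLine(payload);
        }
    }
}
```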

Many of the Microsoft cloud data services, such as Windows Azure tables, SQL Data Services, and SharePoint, expose data using the Data Services protocol. This means that the Data Services client libraries and developer tools can be used to work with both on-premise and hosted cloud services.

Data Development Futures: Modeling

If we take a step back from the whole arc of the discussion we’ve had in this article, we can see a definite trend toward greater levels of abstraction. The earliest data access solutions for SQL Server, like DB-Library and ESQL for C, were sufficient though quite rudimentary. Technologies like ODBC created an abstraction layer above the proprietary APIs of different databases, a model that OLE DB, ADO, and ADO.NET continue to follow.

Most recently, the Entity Framework has gone a step further to create an additional abstraction layer not over the data access API but over the structure of a relational database itself. Similarly, Data Services transform any number of diverse data sources into something accessible through a simple REST-based exchange protocol. (In fact, Microsoft expects that using such protocols will become increasingly popular, as it allows data providers and consumers to evolve independently from their programming model.)

Taken as a whole, this trend can be described as a trend toward modeling, the act of creating representations of real-world concepts that are internally translated into the representations that computer systems (like database engines) inherently understand.

What’s key here is that modeling itself is something that takes place outside of the database, in large part because data often spans multiple databases and multiple formats. Thus what matters is the richness of the pipeline between what's in the data store and what runs outside that store.

Increasing that richness is the purpose of Microsoft's next wave of investments beyond further enhancements to the Entity Framework and Data Services (see Entity Framework futures and Data Services futures), namely a body of technologies collectively called the SQL Server Modeling CTP (Community Technology Preview). Rather than replacing existing data development methods, these technologies introduce new ways of working with SQL Server databases along with greater availability of metadata alongside the data itself.

Metadata—or information about data and applications—is the key to the next advances in developer productivity with data-oriented applications. Modeling, in other words, is a recognition that data about an application, and data about data, is just as important as the application and data themselves. And the components of the SQL Server Modeling CTP, shown in Figure 10 as a projection from Figure 9, are the ways in which Microsoft is beginning to explore this new territory.

History_Futures_1.png

Figure 10: Future technologies in the SQL Server Modeling CTP.

The code name “M” language is like a more manageable (though more limited) form of Transact-SQL, the language normally used to describe data schema and values. It's also closely aligned with the purpose of the EDMX files used in the Entity Framework to implement the conceptual layer (or Entity Data Model). The "M" language is, in fact, being developed as a textual language that also implements the Entity Data Model, and will serve alongside the XML-based EDMX dialect. Furthermore, "M" includes powerful support for the creation of domain-specific languages (DSLs) in which application logic, data access layers, and even an application’s user interface can be easily defined by even non-developers.

SQL Server Modeling Services, for its part, deals with enterprise-wide metadata and the kinds of applications that enterprises are interested in building around that metadata. And the code name “Quadrant” tool provides a visual means of interacting with relational data in ways that have traditionally been either very difficult or have necessitated a custom application of some kind. Many “forms over data” applications that have to date been written using the other data access technologies we’ve seen can be quickly assembled directly within “Quadrant”.

Much more can be said on these future technologies, of course, which you can find on the MSDN Data Developer Center along with resources for most of the other current technologies we’ve covered. Indeed, this historical journey has been written to introduce you to those many resources and to put all these technologies into a context that not only spans several decades in the past, but will span at least as many into the future. Indeed, the very cutting-edge developments we've covered here will themselves, one day, be seen as but first steps in an even greater arc of technological development. For within all our systems, data is again the piece with the greatest longevity…and the piece that will undoubtedly continue to expand and take on an increasingly central role in our lives.

Posted 2010-06-07 16:44 by DylanWind