MSSQL forensics (1) - MDF fundamentals
Last year I participated in the Digital Forensics Challenge 2019 (DFC2019) and enjoyed it a lot.
The organizers prepared a lot of exciting questions. In particular, I spent a fair amount of time on the challenges around Microsoft SQL Server (MSSQL). While working on them I looked for useful tools and articles on MSSQL forensics, but there does not seem to be much information on the Internet.
That is why I am writing a series on what I have learned about MSSQL from a forensic perspective.
Scope
MSSQL has a lot of components. This series covers the following:
- MDF file structure
- Page header
- Data page
- Recover deleted records
- Recover large object (LOB)
Sample
Unfortunately, the DFC2019 questions and datasets are no longer available because the challenge has ended, so I created a sample MSSQL database.
Download
The sample database was created as follows.
Software
- Windows Server 2016 Standard Edition
- SQL Server 2017
- SQL Server Management Studio v18.2 (SSMS)
Database
- Database name: 4n6ist_sample
- Table name: pictures
- Schema:
- Records: Inserted 3 records, then deleted 1 record (id=3) with the following query
After the query runs, 2 allocated records remain.
Just for reference, the data column of these two records holds JPG pictures as follows:
(I got these pictures from PIXNIO, which provides public domain images)
Now my goal is to recover as much of the deleted record (i.e. id=3) as possible.
The sample database consists of two files, "4n6ist_sample.mdf" and "4n6ist_sample_log.ldf". Here I focus only on the "4n6ist_sample.mdf" file.
MDF File Structure
Paul Randal has already covered the MSSQL page structure in this article. Here is the big picture.
In summary, what we should understand is:
- An MDF file consists of multiple pages
- The size of a page is 8 KB (8,192 bytes)
- A page consists of a header, records and a slot array
- Header: holds the page type, the count of records, the free space in the page, and so on.
- Records: vary depending on the page type, but generally data records hold the actual data associated with a table.
- Slot array: tracks the position of each record. Each entry (2 bytes) holds the offset of one record on the page.
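The slot array can be walked with a few lines of Python. This is only a minimal sketch under some assumptions that are not stated above: the slot array sits at the very end of the page and grows backwards (slot 0 in the last two bytes), entries are little-endian, and the page header is 96 bytes long. The function name read_slot_array and the synthetic page are made up for illustration.

```python
import struct

PAGE_SIZE = 8192    # each page is 8 KB
HEADER_SIZE = 96    # assumed fixed header size at the start of every page


def read_slot_array(page: bytes, slot_count: int) -> list[int]:
    """Return the record offsets stored in the slot array, slot 0 first."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected a full 8192-byte page")
    offsets = []
    for slot in range(slot_count):
        # Assumption: slot 0 occupies the last two bytes of the page,
        # slot 1 the two bytes before it, and so on.
        pos = PAGE_SIZE - 2 * (slot + 1)
        (offset,) = struct.unpack_from("<H", page, pos)
        offsets.append(offset)
    return offsets


# Synthetic page with two slots, so the sketch runs without an MDF file.
demo_page = bytearray(PAGE_SIZE)
struct.pack_into("<H", demo_page, PAGE_SIZE - 2, HEADER_SIZE)  # slot 0 -> offset 96
struct.pack_into("<H", demo_page, PAGE_SIZE - 4, 120)          # slot 1 -> offset 120
print(read_slot_array(bytes(demo_page), 2))
```

On a real page the slot count would come from the page header rather than being passed in by hand.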
Page header
We can get information about the page header using the DBCC IND and DBCC PAGE commands. Here is an example output on SSMS.
DBCC IND shows a summary of all pages associated with the specified table.
DBCC IND('database name', 'table name', -1)
DBCC PAGE shows detailed information about the specified page.
DBCC PAGE('4n6ist_sample', 1, 368, 0)
We can see the record area in a hex view if we set the fourth parameter to 1, like "DBCC PAGE('4n6ist_sample', 1, 368, 1)"
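Since pages have a fixed size, the page ID used in DBCC PAGE can be turned into a byte offset inside the MDF file. The sketch below assumes that pages are stored back to back starting at offset 0 of an offline copy of the data file, so page N starts at N * 8192. The helper names page_offset and read_page are hypothetical.

```python
PAGE_SIZE = 8192  # each page is 8 KB


def page_offset(page_id: int) -> int:
    """Byte offset of a page inside the data file (assumes contiguous pages)."""
    if page_id < 0:
        raise ValueError("page_id must be non-negative")
    return page_id * PAGE_SIZE


def read_page(mdf_path: str, page_id: int) -> bytes:
    """Read one raw 8 KB page from an offline copy of an MDF file."""
    with open(mdf_path, "rb") as f:
        f.seek(page_offset(page_id))
        page = f.read(PAGE_SIZE)
    if len(page) != PAGE_SIZE:
        raise ValueError(f"page {page_id} is beyond the end of the file")
    return page


# Page 368 from the DBCC PAGE example above.
print(page_offset(368))  # 3014656
```

For example, read_page("4n6ist_sample.mdf", 368) would return the raw bytes of the page shown by the DBCC PAGE call, provided the file is not locked by a running SQL Server instance.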
What interests us in the page header is:
- m_type: 1 (data page), 3 (text mix page) and 4 (text tree page)
- pminlen: Size of the fixed-length portion of a record
- m_slotCnt: Count of records (slots) on the page
- m_freeCnt: Size of free space on the page, in bytes
- m_freeData: Offset to the first byte after the end of the last record
From my understanding, m_freeCnt and m_freeData are illustrated as follows:
In addition to Paul's article, Mark S. Rasmussen has described the details of the page header structure. I have written a Python script for parsing the MDF page header. The script makes it possible to parse an MDF file without a SQL Server environment.
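To give an idea of what such parsing involves, here is a minimal sketch that is not the script mentioned above. The byte offsets are assumptions based on the publicly documented 96-byte header layout (m_type at byte 1, pminlen at 14, m_slotCnt at 22, m_freeCnt at 28, m_freeData at 30, page ID at 32, file ID at 36, all little-endian) and should be checked against DBCC PAGE output. The names PageHeader, parse_page_header and contiguous_free are hypothetical.

```python
import struct
from typing import NamedTuple

PAGE_SIZE = 8192
HEADER_SIZE = 96  # assumed header length


class PageHeader(NamedTuple):
    m_type: int
    pminlen: int
    m_slotCnt: int
    m_freeCnt: int
    m_freeData: int
    page_id: int
    file_id: int


def parse_page_header(page: bytes) -> PageHeader:
    """Extract the fields of interest from the first 96 bytes of a page."""
    if len(page) < HEADER_SIZE:
        raise ValueError("need at least 96 bytes of page data")
    m_type = page[1]                                    # assumed offset 1
    (pminlen,) = struct.unpack_from("<H", page, 14)     # assumed offset 14
    (slot_cnt,) = struct.unpack_from("<H", page, 22)    # assumed offset 22
    free_cnt, free_data = struct.unpack_from("<HH", page, 28)  # 28 and 30
    page_id, file_id = struct.unpack_from("<IH", page, 32)     # 32 and 36
    return PageHeader(m_type, pminlen, slot_cnt, free_cnt, free_data,
                      page_id, file_id)


def contiguous_free(header: PageHeader) -> int:
    """Bytes between m_freeData and the start of the slot array."""
    return PAGE_SIZE - 2 * header.m_slotCnt - header.m_freeData


# Synthetic data page: 2 records occupying bytes 96..187 of page (1:368).
demo = bytearray(PAGE_SIZE)
demo[1] = 1
struct.pack_into("<H", demo, 14, 8)
struct.pack_into("<H", demo, 22, 2)
struct.pack_into("<HH", demo, 28, 8000, 188)
struct.pack_into("<IH", demo, 32, 368, 1)
header = parse_page_header(bytes(demo))
print(header)
print(contiguous_free(header))
```

When m_freeCnt is larger than the value returned by contiguous_free, the difference is space that is no longer in use between the header and m_freeData, which is one place where remnants of deleted records may remain.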
I will cover how to handle the output and data page structure next time.