Programmatically parsing Transact SQL (T-SQL) with the ScriptDom parser
Programmatically parsing Transact SQL (T-SQL) with the ScriptDom parser
.NET Developers are perhaps familiar with libraries like Roslyn and CodeDOM, which allow in-depth analysis of code (amongst many other things). Java developers use tools like SpotBugs for code analysis. As a SQL developer, perhaps you might have wondered if there are any equivalent functionality for Azure SQL and SQL Server code? Well, wonder no more!
Over the years, I have leveraged the very useful TransactSql ScriptDom .NET library, to programmatically parse Transact-SQL (T-SQL) statements, expressions, and scripts. This library accurately produces an Abstract Syntax Tree (AST) representation of the T-SQL code. Once you have such an AST, you can “walk” the tree to analyze it for various code patterns. As an example, you can use it to detect in a robust way if you have any DML statement other than SELECT – which may be useful to detect some forms of SQL injection. You can even change the AST if needed and produce a modified T-SQL script with the re-written tree. The possibilities are endless!
Use Cases
I am aware of several interesting use cases where this parser has been leveraged:
- A customer uses Azure SQL auditing, coupled with the XELite .NET library, to read XEL audit files containing T-SQL text. The T-SQL text is then parsed to get analytics on which tables, views and columns are queried most frequently.
- boB Taylor uses ScriptDom in his SqlCommandFilters project to auto-parameterize T-SQL statements so that they could be used in the context of Always Encrypted.
- The SQL Metadata Toolkit uses ScriptDom to find and map dependencies across the SQL code base.
- The SQL Sharpener project uses ScriptDom to parse T-SQL and then generate C# wrappers for those stored procedures.
- HealthCatalyst ‘HCPosh’ Powershell module, use the parser to programmatically understand the structure of SQL statements.
- Last but definitely not the least, many T-SQL static code analysis projects use ScriptDom!
In addition, members of the SQL community have some great articles showing practical usage of ScriptDom / SQLDOM to do useful tasks:
- Parse parameter default values using PowerShell – Part 1 – SQLPerformance.com
- Microsoft SQL Server Script DOM | Dan Guzman’s Blog (dbdelta.com)
- Removing Comments from SQL | Michael J. Swart
- How to get started with the ScriptDom · The Agile SQL Club
Forgive me if I missed some useful examples – my attempt was not to list all possible related ones, just recent ones I am familiar with. Please do leave comments if you feel there is a relevant example which should be linked.
Within the realm of official Microsoft releases, ScriptDom is also one of the core building blocks for DacFx and associated tooling like SqlPackage. It is also part of the SqlServer PowerShell module, and many other projects, both at Microsoft, and in the broader community. This library is immensely powerful, and I believe that all SQL developers should be aware of the power and scope it provides.
Get Started!
If you are interested in learning about how you can use this library, my GitHub repo has some quite simple examples to get started. Once you add the necessary references to ScriptDom, you can instantiate a TSqlParser class specific to the target version of SQL that you are working with. Currently for Azure SQL and SQL Server 2019, that would be the TSqlParser150 class:
using (var rdr = new StringReader(textBox1.Text))
{
IList<ParseError> errors = null;
var parser = new TSql150Parser(true, SqlEngineType.All);
var tree = parser.Parse(rdr, out errors);
...
BTW, this parser now supports many constructs of the T-SQL syntax specific to Azure Synapse Analytics as well. For example, it can parse CTAS statements! Anyways, once the parser produces the AST, you can use the well-known Visitor pattern to walk the tree. Usually this is done by implementing a class which can then store state as the tree is walked:
class MyVisitor : TSqlFragmentVisitor
{
internal bool containsOnlySelects = true;
public override void Visit(TSqlStatement node)
{
if ((node as SelectStatement) is null)
{
containsOnlySelects = false;
}
base.Visit(node);
}
}
And then, in the main code, we call .Parse(), thereby creating an instance of the “walker” class and walk the AST:
MyVisitor checker = new MyVisitor();
tree.Accept(checker);
Video
To know more, you can examine the samples on my repo. You can also watch a detailed presentation on this topic done to the PASS AppDev virtual chapter session: T-SQL Swiss Knife using the ScriptDom T-SQL Parser by Arvind Shyamsundar. And I have blogged in the past about usage of this parser on my personal blog, including using ScriptDom to format T-SQL etc.: Arvind Shyamsundar’s technical blog
ScriptDom in SQL Projects (Code Analysis)
ScriptDom being a full-fidelity parser, is great for statically analyzing T-SQL code to proactively identify anti-patterns. Historically, we have had Code Analysis capabilities within Visual Studio SQL Projects. Visual Studio SQL Projects allow you to extend the built-in code analysis rules and implement your own custom code analysis rules. Here are some examples of OSS projects for custom code analysis within Visual Studio SQL Projects (earlier known as “SSDT”). They all leverage ScriptDom:
BTW, SQL Projects is recently also available as an preview extension for Azure Data Studio. Also, a related community-led project which allows developers to build DACPACs without Visual Studio is rr-wfm/MSBuild.Sdk.SqlProj. While this project does not use ScriptDom directly, it shows how to leverage the broader DacFx library.
Again, forgive me if I missed some useful code analysis extension which uses ScriptDom. Please do leave comments if you feel there is a relevant post which should be linked.
Conclusion
The parser is immensely powerful if you are interested in programmatically parsing T-SQL. Since the parser is an official release from Microsoft, and largely based on the same grammar that the core SQL engine uses, it is the most full-fidelity choice for parsing T-SQL currently available. And best of all it is freely available, as part of the DacFx NuGet!
作者:Chuck Lu GitHub |
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 没有源码,如何修改代码逻辑?
· 全程不用写代码,我用AI程序员写了一个飞机大战
· DeepSeek 开源周回顾「GitHub 热点速览」
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 记一次.NET内存居高不下排查解决与启示
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
2020-11-17 TLS Security
2017-11-17 web forms page和control的生命周期life cycle交互,以及page生命周期中每个event中需要做什么事情
2016-11-17 sqlite创建表
2015-11-17 What's New for Visual C# 6.0
2015-11-17 分批次获取git for windows的源代码
2015-11-17 Interpolated Strings
2014-11-17 sqlite中的时间