使用Impala parser解析SQL
Impala对于hive引擎的语法和hive原生的有些许不同,虽然使用hive的parser也能部分兼容,但是由于impala的parser是使用flex(Fast Lexical Analyzer Generator,快速词法分析生成器)和java cup(Java Constructor of Useful Parsers,生成语法分析器(parser)的工具)开发的,所以对impala的query进行语法解析的时候建议还是使用Impala原生的parser
1.在安装了impala的机器下找到impala-frontend的jar包(环境中的impala版本为2.12.0+cdh5.15.1+0)
1 2 | lintong@master: /opt/cloudera/parcels/CDH/jars $ ls | grep impala-frontend impala-frontend-0.1-SNAPSHOT.jar |
2.使用mvn install安装到本地仓库中,或者上传到私服仓库中
1 | mvn install : install - file -Dfile= /home/lintong/ 下载 /impala-frontend-0 .1-SNAPSHOT.jar -DgroupId=org.apache.impala -DartifactId=impala-frontend -Dversion=0.1-SNAPSHOT -Dpackaging=jar |
3.在工程中引入impala-frontend和java-cup,java-cup的版本可以使用反编译工具打开impala-frontend的jar进行确认
1 2 3 4 5 6 7 8 9 10 | <dependency> <groupId>org.apache.impala< /groupId > <artifactId>impala-frontend< /artifactId > <version>0.1-SNAPSHOT< /version > < /dependency > <dependency> <groupId>net.sourceforge.czt.dev< /groupId > <artifactId>java-cup< /artifactId > <version>0.11-a-czt02-cdh< /version > < /dependency > |
在解析select语句的时候如果报
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | java.lang.NoClassDefFoundError: org /apache/sentry/core/model/db/DBModelAction at org.apache.impala.analysis.TableRef.<init>(TableRef.java:138) at org.apache.impala.analysis.CUP$SqlParser$actions.case421(SqlParser.java:18035) at org.apache.impala.analysis.CUP$SqlParser$actions.CUP$SqlParser$do_action(SqlParser.java:5976) at org.apache.impala.analysis.SqlParser.do_action(SqlParser.java:1349) at java_cup.runtime.lr_parser.parse(lr_parser.java:587) at com.xxxx.xx.core.parser.XXXXTest.getLineageInfo(XXXXTest.java:41) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69) at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220) at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53) Caused by: java.lang.ClassNotFoundException: org.apache.sentry.core.model.db.DBModelAction at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 28 more Process finished with exit code 255 |
在pom中添加
1 2 3 4 5 | <dependency> <groupId>org.apache.sentry< /groupId > <artifactId>sentry-core-model-db< /artifactId > <version>1.5.1-cdh5.15.1< /version > < /dependency > |
4.参考Impala的源代码中parser的demo
1 | https: //github .com /cloudera/Impala/blob/master/fe/src/test/java/com/cloudera/impala/analysis/ParserTest .java |
解析select和create kudu table等语句
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | import org.apache.impala.analysis.*; import java.io.StringReader; String impalaSelectQuery = "SELECT `ds` FROM `db1`.`table1` WHERE (`ds`='test') OR (`ds`='2020-08-02') OR (`ds`='2020-08-01') LIMIT 100" ; // select 语句 String hiveSelectQuery = "select city,array_contains(city, 'Paris') from default.arraydemo limit 5" ; String kuduCreateTableQuery = "CREATE TABLE `db1`.`my_first_table`\n" + "(\n" + " id BIGINT,\n" + " name STRING,\n" + " PRIMARY KEY(id)\n" + ")\n" + "PARTITION BY HASH PARTITIONS 16\n" + "STORED AS KUDU\n" + "TBLPROPERTIES (\n" + " 'kudu.master_addresses' = 'hadoop01:7051,hadoop02:7051,hadoop03:7051', \n" + " 'kudu.table_name' = 'my_first_table'\n" + ");" ; // kudu建表语句 String invalidQuery = "INVALIDATE METADATA db1.tb1" ; // 刷新元数据语句 String refreshQuery = "REFRESH db1.tb1 partition(ds='2021-05-02')" ; // 刷新元数据语句 String computeQuery = "COMPUTE INCREMENTAL STATS db1.tb1" ; // compute stats语句 String describeQuery = "Describe db1.tb1;" ; // describe语句 String renameQuery = "ALTER TABLE my_db.customers RENAME TO my_db.users;" ; // rename语句 String addColQuery = "ALTER TABLE db1.tb1 ADD COLUMNS (col1 string)" ; // add col语句 String alterColQuery = "ALTER TABLE db1.tb1 CHANGE col1 col2 bigint" ; // alter col语句 String setQuery = "set mem_limit = 5gb" ; String useQuery = "use default" ; String query = impalaSelectQuery; SqlScanner input = new SqlScanner(new StringReader(query)); SqlParser parser = new SqlParser(input); ParseNode node = null; try { node = (ParseNode) parser.parse().value; if (node instanceof SelectStmt) { System.out.println( "查询语句" ); // with语句也属于查询语句 SelectStmt selectStmt = (SelectStmt) node; String databaseName = selectStmt.getTableRefs().get(0).getPath().get(0); String tableName = selectStmt.getTableRefs().get(0).getPath().get(1); System.out.println(databaseName); System.out.println(tableName); } else if (node instanceof CreateTableStmt) { System.out.println( "建表语句" ); CreateTableStmt createTableStmt = (CreateTableStmt) node; System.out.println(createTableStmt.getTbl()); for (ColumnDef def : createTableStmt.getColumnDefs()) { System.out.println(def.getColName() + " " + def.getTypeDef()); } } else if (node instanceof ResetMetadataStmt) { System.out.println( "刷新元数据语句" ); } else if (node instanceof ComputeStatsStmt) { System.out.println( "compute stats语句" ); } else if (node instanceof DescribeTableStmt) { System.out.println( "describe语句" ); } else if (node instanceof AlterTableOrViewRenameStmt) { System.out.println( "rename语句" ); } else if (node instanceof AlterTableAddReplaceColsStmt) { System.out.println( "add col语句" ); } else if (node instanceof AlterTableAlterColStmt) { System.out.println( "alter col语句" ); } else if (node instanceof UseStmt) { System.out.println( "use语句" ); } else if (node instanceof SetStmt) { System.out.println( "set语句" ); } else { System.out.println(node.getClass()); } } catch (Exception e) { e.printStackTrace(); fail( "\nParser error:\n" + parser.getErrorMsg(query)); } |
输出
1 2 3 4 | 建表语句 my_first_table id BIGINT name STRING |
impala建textfile表语句
1 2 3 4 5 | create table IF NOT EXISTS default.bbb ( column1 string, column2 int, column3 bigint ); |
不添加其他参数默认建立的是TEXTFILE格式的hive表
1 | CREATE TABLE default.bbb ( column1 STRING, column2 INT, column3 BIGINT ) STORED AS TEXTFILE LOCATION 'hdfs://xx-nameservice/user/hive/warehouse/bbb' |
impala建parquet表语句
1 2 3 4 5 6 | create table IF NOT EXISTS default.bbb ( column1 string, column2 int, column3 bigint ) stored as parquet; |
表结构
1 | CREATE TABLE default.bbb ( column1 STRING, column2 INT, column3 BIGINT ) STORED AS PARQUET LOCATION 'hdfs://xx-nameservice/user/hive/warehouse/bbb' |
本文只发表于博客园和tonglin0325的博客,作者:tonglin0325,转载请注明原文链接:https://www.cnblogs.com/tonglin0325/p/5243824.html
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· AI与.NET技术实操系列:向量存储与相似性搜索在 .NET 中的实现
· 基于Microsoft.Extensions.AI核心库实现RAG应用
· Linux系列:如何用heaptrack跟踪.NET程序的非托管内存泄露
· 开发者必知的日志记录最佳实践
· SQL Server 2025 AI相关能力初探
· winform 绘制太阳,地球,月球 运作规律
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· AI与.NET技术实操系列(五):向量存储与相似性搜索在 .NET 中的实现
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 超详细:普通电脑也行Windows部署deepseek R1训练数据并当服务器共享给他人