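Fixing SparkSQL's Inability to Create Multiple Sessions
1. Symptom
SparkSQL reports an error when multiple sessions are created: a new session cannot open a connection to Derby, the embedded database bundled with Spark.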
2024-01-25 19:50:59.053 [INFO ]24/01/25 19:50:59 INFO !PLExecution!: Execute SQL: DROP TABLE IF EXISTS ibor_nfsd_instjmport
2024-01-25 19:51:01.628 [INFO ]24/01/25 19:51:01 INFO HiveUtils: Initializing HiveMetastoreConnection version 2.3.9 using Spark classes.
2024-01-25 19:51:02.009 [INFO ]24/01/25 19:51:02 INFO HiveClientImpl: Warehouse location for Hive client (version 2.3.9) is file:/opt/bdata/studio-axas/platform-server/spark-warehouse
2024-01-25 19:51:03.286 [INFO ]19:51:03.209 [main] ERROR DataNucleus.Datastore.Schema - Failed initialising database.
2024-01-25 19:51:03.286 [INFO ]org.datanucleus.exceptions.NucleusDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception:..........................
2024-01-25 19:51:03.286 [INFO ]java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@2c3c36df, see the next exception for details.
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at java.security.AccessController.doPrivileged(Native Method)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.jdbc.InternalDriver.getNewEmbedConnection(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
2024-01-25 19:51:03.286 [INFO ] at java.sql.DriverManager.getConnection(DriverManager.java:664)
2024-01-25 19:51:03.286 [INFO ] at java.sql.DriverManager.getConnection(DriverManager.java:208)
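2. Root cause
Connections to SparkSQL are created through the Thrift server, with Derby as the metastore database. Derby creates a metastore_db directory under the configured location (by default, the working directory of the process). Embedded Derby allows only one JVM to open a given database at a time, so when several jobs run concurrently against the same location, they contend for the same metastore_db and the later ones fail to start it.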
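3. Solution
Give each job its own metastore directory: pass the JVM option -Dderby.system.home so that every session gets a separate directory for its metastore files.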
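
One possible way to apply this is sketched below in Python. It assumes that jobs are launched with spark-submit and that the option is handed to the driver JVM through --driver-java-options; the source does not say how the -D flag is passed in this platform, so treat the launcher, the function name build_spark_submit_args and the directory naming as illustrative assumptions. Derby resolves the relative databaseName=metastore_db against derby.system.home, so a distinct directory per job gives each session its own metastore_db.

```python
import tempfile
from pathlib import Path


def build_spark_submit_args(job_name, app, base_dir=None):
    """Build a spark-submit command line with a private Derby home.

    Hypothetical helper: the name, parameters and directory layout are
    assumptions made for illustration, not part of Spark or Derby.
    """
    # Parent directory that holds one Derby home per job.
    base = Path(base_dir) if base_dir else Path(tempfile.gettempdir())
    base.mkdir(parents=True, exist_ok=True)

    # mkdtemp creates a new, uniquely named directory on every call,
    # so two concurrent jobs never share the same metastore_db.
    derby_home = Path(tempfile.mkdtemp(prefix=f"derby-{job_name}-", dir=base))

    args = [
        "spark-submit",
        # The metastore client runs in the driver JVM, so the system
        # property is set on the driver.
        "--driver-java-options",
        f"-Dderby.system.home={derby_home}",
        app,
    ]
    return args, derby_home


if __name__ == "__main__":
    # "my_job.py" is a placeholder application path.
    cmd, home = build_spark_submit_args("nightly_import", "my_job.py")
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run to start the job. The per-job directories accumulate over time, so a real deployment would probably remove each one once its job has finished.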