Pentaho Kettle
https://hitachiedge1.jfrog.io/ui/native/pntpub-maven-release-cache/org/pentaho/di/pdi-ce/9.4.0.0-343
https://pentaho.com/pentaho-community-edition
pentaho-server doc:https://docs.hitachivantara.com/r/en-us/pentaho-data-integration-and-analytics/10.2.x/mk-95pdia001
pdi doc:https://docs.hitachivantara.com/r/en-us/pentaho-data-integration-and-analytics/10.2.x/mk-95pdia003
If the community download list on that page is hidden, it can be shown from the browser console: document.getElementById('communityProducts').style.display = ''
pentaho-server
Configure MySQL
1. Run the scripts under data\mysql: create_jcr_mysql.sql, create_quartz_mysql.sql, create_repository_mysql.sql
Before running them, you can change DEFAULT CHARACTER SET latin1 in the scripts to DEFAULT CHARACTER SET utf8mb4
By default the scripts create three users: hibuser@localhost/password, jcr_user@localhost/password and pentaho_user@localhost/password
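The character set change in step 1 can be applied to the script text before it is run. The following is a minimal sketch, assuming the scripts spell the clause exactly as quoted above (apart from case and spacing); the sample CREATE DATABASE line is hypothetical and only stands in for a real script.

```python
import re

# Matches the clause quoted in the notes, tolerating case and extra whitespace.
LATIN1_CLAUSE = re.compile(r"DEFAULT\s+CHARACTER\s+SET\s+latin1", re.IGNORECASE)


def to_utf8mb4(sql_text: str) -> str:
    """Return the SQL text with the latin1 default charset replaced by utf8mb4."""
    return LATIN1_CLAUSE.sub("DEFAULT CHARACTER SET utf8mb4", sql_text)


# Hypothetical line, used only to demonstrate the substitution.
sample = "CREATE DATABASE IF NOT EXISTS `quartz` DEFAULT CHARACTER SET latin1;"
print(to_utf8mb4(sample))
```

To use it on the real scripts, read each file under data\mysql, pass its text through to_utf8mb4, and write the result to a copy so the originals stay untouched.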
2. Configure Quartz
Open pentaho-solutions/system/scheduler-plugin/quartz/quartz.properties
In the #_replace_jobstore_properties section, set:
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
In the # Configure Datasources section, set:
org.quartz.dataSource.myDS.jndiURL = Quartz
3. Configure Hibernate
Open pentaho-solutions/system/hibernate/hibernate-settings.xml and point it at the MySQL config file
<config-file>system/hibernate/mysql5.hibernate.cfg.xml</config-file>
Open pentaho-solutions/system/hibernate/mysql5.hibernate.cfg.xml; with the newer MySQL Connector/J (8.x) the driver class is com.mysql.cj.jdbc.Driver
<property name="connection.driver_class">com.mysql.jdbc.Driver</property>
Copy pentaho-solutions/system/dialects/mysql5/audit_sql.xml into the pentaho-solutions/system directory
4. Configure Jackrabbit
Open pentaho-solutions/system/jackrabbit/repository.xml, comment out the existing entries, and uncomment or add the ones you need
Repository-FileSystem section:
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
  <param name="driver" value="javax.naming.InitialContext"/>
  <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
  <param name="schema" value="mysql"/>
  <param name="schemaObjectPrefix" value="fs_repos_"/>
</FileSystem>
Repository-DataStore section:
<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
  <param name="driver" value="javax.naming.InitialContext"/>
  <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
  <param name="databaseType" value="mysql"/>
  <param name="minRecordLength" value="1024"/>
  <param name="maxConnections" value="3"/>
  <param name="copyWhenReading" value="true"/>
  <param name="tablePrefix" value=""/>
  <param name="schemaObjectPrefix" value="ds_repos_"/>
</DataStore>
Repository-Workspaces-FileSystem section:
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
  <param name="driver" value="javax.naming.InitialContext"/>
  <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
  <param name="schema" value="mysql"/>
  <param name="schemaObjectPrefix" value="fs_ws_"/>
</FileSystem>
Repository-Workspaces-PersistenceManager section:
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
  <param name="driver" value="javax.naming.InitialContext"/>
  <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
  <param name="schema" value="mysql"/>
  <param name="schemaObjectPrefix" value="${wsp.name}_pm_ws_"/>
</PersistenceManager>
Repository-Versioning-FileSystem section:
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
  <param name="driver" value="javax.naming.InitialContext"/>
  <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
  <param name="schema" value="mysql"/>
  <param name="schemaObjectPrefix" value="fs_ver_"/>
</FileSystem>
Repository-Versioning-PersistenceManager section:
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
  <param name="driver" value="javax.naming.InitialContext"/>
  <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
  <param name="schema" value="mysql"/>
  <param name="schemaObjectPrefix" value="pm_ver_"/>
</PersistenceManager>
Repository-Cluster section:
<Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
  <param name="revision" value="${rep.home}/revision.log"/>
  <param name="driver" value="javax.naming.InitialContext"/>
  <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
  <param name="schema" value="mysql"/>
  <param name="schemaObjectPrefix" value="J_C_"/>
  <param name="janitorEnabled" value="true"/>
  <param name="janitorSleep" value="86400"/>
  <param name="janitorFirstRunHourOfDay" value="3"/>
</Journal>
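Every Jackrabbit section in step 4 is expected to point at the same JNDI URL, so a quick consistency check can catch a section that was left on its old setting. This is a rough sketch using only the Python standard library; the sample XML is a cut-down stand-in for repository.xml, and the second url value in it is a hypothetical leftover, not something taken from a real install.

```python
import xml.etree.ElementTree as ET

EXPECTED_URL = "java:comp/env/jdbc/jackrabbit"


def jndi_report(xml_text: str) -> list:
    """Return (tag, class, url) for every element with a <param name="url"> child."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        for param in elem.findall("param"):
            if param.get("name") == "url":
                rows.append((elem.tag, elem.get("class", ""), param.get("value", "")))
    return rows


# Cut-down stand-in for repository.xml; the DataStore url is a hypothetical leftover.
sample = """
<Repository>
  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
  </FileSystem>
  <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="url" value="jdbc:h2:file:placeholder"/>
  </DataStore>
</Repository>
"""

rows = jndi_report(sample)
mismatches = [row for row in rows if row[2] != EXPECTED_URL]
for tag, cls, url in mismatches:
    print(f"{tag} ({cls}) still uses {url}")
```

For the real file, pass the contents of pentaho-solutions/system/jackrabbit/repository.xml to jndi_report; sections that are commented out are ignored by the parser, which is the desired behaviour here.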
5. Configure Tomcat
Download the JDBC driver, copy the jar into the tomcat/lib folder, then delete the other JDBC driver jars that are not needed
Open tomcat/webapps/pentaho/META-INF/context.xml, remove the existing entries, and add the following configuration
<Resource validationQuery="select 1" url="jdbc:mysql://localhost:3306/hibernate" driverClassName="com.mysql.jdbc.Driver" password="password" username="hibuser" initialSize="0" maxActive="20" maxIdle="10" maxWait="10000" factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory" type="javax.sql.DataSource" auth="Container" name="jdbc/Hibernate"/>
<Resource validationQuery="select 1" url="jdbc:mysql://localhost:3306/hibernate" driverClassName="com.mysql.jdbc.Driver" password="password" username="hibuser" initialSize="0" maxActive="20" maxIdle="10" maxWait="10000" factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory" type="javax.sql.DataSource" auth="Container" name="jdbc/Audit"/>
<Resource validationQuery="select 1" url="jdbc:mysql://localhost:3306/quartz" driverClassName="com.mysql.jdbc.Driver" password="password" username="pentaho_user" testOnBorrow="true" initialSize="0" maxActive="20" maxIdle="10" maxWait="10000" factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory" type="javax.sql.DataSource" auth="Container" name="jdbc/Quartz"/>
<Resource validationQuery="select 1" url="jdbc:mysql://localhost:3306/pentaho_operations_mart" driverClassName="com.mysql.jdbc.Driver" password="password" username="hibuser" initialSize="0" maxActive="20" maxIdle="10" maxWait="10000" factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory" type="javax.sql.DataSource" auth="Container" name="jdbc/pentaho_operations_mart"/>
<Resource validationQuery="select 1" url="jdbc:mysql://localhost:3306/pentaho_operations_mart" driverClassName="com.mysql.jdbc.Driver" password="password" username="hibuser" initialSize="0" maxActive="20" maxIdle="10" maxWait="10000" factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory" type="javax.sql.DataSource" auth="Container" name="jdbc/PDI_Operations_Mart"/>
<Resource name="jdbc/live_logging_info" auth="Container" type="javax.sql.DataSource" factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory" initialSize="0" maxActive="20" maxIdle="10" maxWait="10000" username="hibuser" password="password" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/pentaho_dilogs" validationQuery="select 1"/>
<Resource name="jdbc/jackrabbit" auth="Container" type="javax.sql.DataSource" factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory" maxActive="20" minIdle="0" maxIdle="5" initialSize="0" maxWait="10000" username="jcr_user" password="password" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/jackrabbit" validationQuery="select 1"/>
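The seven Resource entries above differ mainly in JNDI name, database and user, so they can be generated from a small table when the host, port or driver class needs to change everywhere at once. The sketch below is one possible way to do that, not the way Pentaho ships the file. It emits only the attributes common to all entries and deliberately leaves out the per-entry differences (testOnBorrow on jdbc/Quartz, minIdle and the lower maxIdle on jdbc/jackrabbit), which would need to be added by hand.

```python
import xml.etree.ElementTree as ET

# (JNDI name, database, user), copied from the Resource entries in the notes.
DATASOURCES = [
    ("jdbc/Hibernate", "hibernate", "hibuser"),
    ("jdbc/Audit", "hibernate", "hibuser"),
    ("jdbc/Quartz", "quartz", "pentaho_user"),
    ("jdbc/pentaho_operations_mart", "pentaho_operations_mart", "hibuser"),
    ("jdbc/PDI_Operations_Mart", "pentaho_operations_mart", "hibuser"),
    ("jdbc/live_logging_info", "pentaho_dilogs", "hibuser"),
    ("jdbc/jackrabbit", "jackrabbit", "jcr_user"),
]


def resource(name, database, user, password="password",
             host="localhost", port=3306, driver="com.mysql.jdbc.Driver"):
    """Build one <Resource/> element as text, using the attributes shared by all entries."""
    attrs = {
        "name": name,
        "auth": "Container",
        "type": "javax.sql.DataSource",
        "factory": "org.pentaho.di.core.database.util.DecryptingDataSourceFactory",
        "initialSize": "0",
        "maxActive": "20",
        "maxIdle": "10",
        "maxWait": "10000",
        "username": user,
        "password": password,
        "driverClassName": driver,
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "validationQuery": "select 1",
    }
    return ET.tostring(ET.Element("Resource", attrs), encoding="unicode")


for row in DATASOURCES:
    print(resource(*row))
```

Passing driver="com.mysql.cj.jdbc.Driver" switches every entry to the newer Connector/J class name in one place.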
Open http://127.0.0.1:8080/pentaho; the default username/password is admin/password
Check the running status of Kettle jobs and transformations: http://127.0.0.1:8080/pentaho/kettle/status
Logging extension: https://www.datamensional.com/setting-up-logging-tables-for-pdi-both-locally-and-on-server
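The status URL above can also be polled from a script. The following sketch only builds the request. It assumes the server accepts HTTP Basic authentication for this URL, which these notes do not confirm, and it uses the default admin/password account mentioned above. Sending the request is left commented out because it needs a running server.

```python
import base64
import urllib.request


def status_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for <base_url>/kettle/status with a Basic auth header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(base_url.rstrip("/") + "/kettle/status")
    req.add_header("Authorization", "Basic " + token)
    return req


req = status_request("http://127.0.0.1:8080/pentaho", "admin", "password")
print(req.full_url)

# To actually send it (requires a running server):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```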
6. Other settings
pentaho-solutions/system/applicationContext-spring-security-jdbc.properties
datasource.driver.classname=com.mysql.jdbc.Driver
datasource.url=jdbc:mysql://localhost:3306/hibernate
datasource.username=hibuser
datasource.password=password
datasource.validation.query=SELECT 1
pentaho-solutions/system/applicationContext-spring-security-hibernate.properties & pentaho-solutions/system/dialects/mysql5/applicationContext-spring-security-hibernate.properties
jdbc.driver=com.mysql.jdbc.Driver
jdbc.url=java:comp/env/jdbc/Hibernate
hibernate.dialect=org.hibernate.dialect.MySQL5InnoDBDialect
pentaho-solutions/system/simple-jndi/jdbc.properties
SampleData/type=javax.sql.DataSource
SampleData/driver=com.mysql.jdbc.Driver
SampleData/url=jdbc:mysql://localhost:3306/sampledata
SampleData/user=pentaho_user
SampleData/password=password
Hibernate/type=javax.sql.DataSource
Hibernate/driver=com.mysql.jdbc.Driver
Hibernate/url=jdbc:mysql://localhost:3306/hibernate
Hibernate/user=hibuser
Hibernate/password=password
Quartz/type=javax.sql.DataSource
Quartz/driver=com.mysql.jdbc.Driver
Quartz/url=jdbc:mysql://localhost:3306/quartz
Quartz/user=pentaho_user
Quartz/password=password
Shark/type=javax.sql.DataSource
Shark/driver=com.mysql.jdbc.Driver
Shark/url=jdbc:mysql://localhost:3306/shark
Shark/user=sa
Shark/password=
SampleDataAdmin/type=javax.sql.DataSource
SampleDataAdmin/driver=com.mysql.jdbc.Driver
SampleDataAdmin/url=jdbc:mysql://localhost:3306/sampledata
SampleDataAdmin/user=pentaho_admin
SampleDataAdmin/password=password
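The simple-jndi entries above follow a Name/field=value pattern, which makes it easy to check that each data source has all five fields after editing. Here is a small sketch of such a check; the REQUIRED set is inferred from the entries in these notes rather than from any specification, and the sample text is a shortened copy with one field removed on purpose.

```python
from collections import defaultdict

# Fields that every data source in the notes defines (inferred, not from a spec).
REQUIRED = {"type", "driver", "url", "user", "password"}


def parse_simple_jndi(text: str) -> dict:
    """Group Name/field=value lines into {name: {field: value}}."""
    sources = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        name, _, field = key.strip().partition("/")
        sources[name][field] = value.strip()
    return dict(sources)


def missing_fields(sources: dict) -> dict:
    """Return {name: missing field names} for incomplete data sources."""
    return {name: REQUIRED - fields.keys()
            for name, fields in sources.items()
            if REQUIRED - fields.keys()}


# Shortened copy of the entries above; Quartz/password is removed on purpose.
sample = """
Hibernate/type=javax.sql.DataSource
Hibernate/driver=com.mysql.jdbc.Driver
Hibernate/url=jdbc:mysql://localhost:3306/hibernate
Hibernate/user=hibuser
Hibernate/password=password
Quartz/type=javax.sql.DataSource
Quartz/driver=com.mysql.jdbc.Driver
Quartz/url=jdbc:mysql://localhost:3306/quartz
Quartz/user=pentaho_user
"""

parsed = parse_simple_jndi(sample)
print(missing_fields(parsed))
```

An empty value such as Shark/password= still counts as present, since the key exists.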
Open pentaho-solutions/system/systemListeners.xml and comment out the following beans
<!-- <bean id="pooledDataSourceSystemListener" class="org.pentaho.platform.engine.services.connection.datasource.dbcp.PooledDatasourceSystemListener"/> -->
<!-- <bean id="nonPooledDataSourceSystemListener" class="org.pentaho.platform.engine.services.connection.datasource.dbcp.NonPooledDatasourceSystemListener"/> -->
<!-- <bean id="dynamicallyPooledDataSourceSystemListener" class="org.pentaho.platform.engine.services.connection.datasource.dbcp.DynamicallyPooledDatasourceSystemListener"/> -->
<!-- <bean id="quartzSystemListener" class="org.pentaho.platform.scheduler2.quartz.EmbeddedQuartzSystemListener"/> -->
Open tomcat/webapps/pentaho/WEB-INF/web.xml and comment out the following configuration
<!-- [BEGIN HSQLDB DATABASES] -->
<!-- <context-param> -->
<!-- <param-name>hsqldb-databases</param-name> -->
<!-- <param-value>sampledata@../../data/hsqldb/sampledata,hibernate@../../data/hsqldb/hibernate,quartz@../../data/hsqldb/quartz</param-value> -->
<!-- </context-param> -->
<!-- [END HSQLDB DATABASES] -->
<!-- [BEGIN HSQLDB STARTER] -->
<!-- <listener> -->
<!-- <listener-class>org.pentaho.platform.web.http.context.HsqldbStartupListener</listener-class> -->
<!-- </listener> -->
<!-- [END HSQLDB STARTER] -->
Open pentaho-solutions/system/GettingStartedDB-spring.xml & GettingStartedDB.properties and change the <bean> tag's init-method="start" attribute to init-method=""
Explicitly set the locale to Chinese
Open pentaho-solutions/system/server.properties
locale-language=zh
locale-country=CN
Garbled console output on Windows: change the encoding to GBK
Open tomcat/conf/logging.properties
java.util.logging.ConsoleHandler.encoding = GBK
Open start-pentaho.bat
set CATALINA_OPTS=-Xms2048m -Xmx6144m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Dfile.encoding=GBK -Djava.locale.providers=COMPAT,SPI -DDI_HOME=%DI_HOME%
GenericServlet.ERROR_0004 - resource /pentaho-cdf-dd/lang/messages_zh.properties not found in plugin pentaho-cdf-dd
pentaho-solutions\system\common-ui\resources\web\dojo\pentaho\common\nls\messages_zh.properties
pentaho-solutions\system\common-ui\resources\web\compressed\dojo\pentaho\common\nls\messages_zh.properties
pentaho-solutions\system\pentaho-cdf-dd\lang\messages_CN.properties
pentaho-solutions\system\pentaho-cdf-dd\lang\messages_zh_CN.properties
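pdi (Pentaho Data Integration, pentaho-kettle)
Spoon
Graphical tool for quickly designing and maintaining complex ETL workflows
Kitchen
Command-line tool for running jobs
Pan
Command-line tool for running transformations
Carte
Lightweight web service for running transformations or jobs remotely. A machine running a Carte process can act as a slave server, and slave servers are part of a Kettle cluster
Its configuration files are in the pwd directory
Start: Carte 127.0.0.1 8081; the default username/password is cluster/cluster. In pwd/kettle.pwd, the OBF prefix means the password is obfuscated (with the Encr tool; this is not the same as encryption). To add a new account, append a new line in username:password format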
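Adding a Carte account, as described above, amounts to appending one username:password line to pwd/kettle.pwd. The sketch below does that with a plain-text password, which is the format these notes describe. It does not attempt the OBF obfuscation, since that is produced by the Encr tool. The account name etl_user, its password, and the placeholder contents of the existing file are all made up for the demonstration.

```python
import tempfile
from pathlib import Path


def add_carte_user(pwd_file: Path, username: str, password: str) -> None:
    """Append a username:password line, keeping existing lines intact."""
    if ":" in username or "\n" in username or "\n" in password:
        raise ValueError("username must not contain ':' and neither value may contain a newline")
    text = pwd_file.read_text(encoding="utf-8") if pwd_file.exists() else ""
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the new account starts on its own line
    pwd_file.write_text(text + f"{username}:{password}\n", encoding="utf-8")


# Demonstration on a temporary copy; the existing line is only a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "kettle.pwd"
    demo.write_text("cluster: OBF:placeholder", encoding="utf-8")
    add_carte_user(demo, "etl_user", "s3cret")
    print(demo.read_text(encoding="utf-8"))
```

Carte presumably needs a restart before it picks up the new account, though these notes do not say so explicitly.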
https://docs.hitachivantara.com & https://javadoc.pentaho.com
https://www.hitachi-solutions.cn/products/wisdom/pentaho/index.html
https://blog.csdn.net/SUMU31707/article/details/119063746
https://www.cnblogs.com/fuqian/p/16668506.html
https://blog.csdn.net/qq_36434219/article/details/134813608
KettlePack:https://congjing.net/h-col-147.html