Big Data Basics: Kudu (4) Reading and Writing Kudu with Spark

Spark 2.4.3 + Kudu 1.9

 

1 Batch Read

// requires the kudu-spark package on the classpath, e.g.
//   spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.9.0
// tables created through Impala are addressed with the "impala::db.table" prefix
val df = spark.read.format("kudu")
      .options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test_db.test_table"))
      .load
df.createOrReplaceTempView("tmp_table")
spark.sql("select * from tmp_table limit 10").show()
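Filters applied through the DataFrame API can also be pushed down to Kudu as scan predicates where possible, so only matching rows are read. A minimal sketch; the "id" and "name" columns are assumptions, substitute real column names:

import org.apache.spark.sql.functions.col

// simple comparison filters like this one can be pushed down to the Kudu scanner
// ("id" and "name" are assumed column names of the test table)
val filtered = df.select("id", "name").filter(col("id") === "testid")
filtered.show()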

2 Batch Write

import org.apache.kudu.spark.kudu.{KuduContext, KuduWriteOptions}

val kuduMaster = "master:7051"
val table = "impala::test_db.test_table"

val kuduContext = new KuduContext(kuduMaster, sc)

// df: any DataFrame whose schema matches the Kudu table (e.g. the one read in section 1)
// KuduWriteOptions(ignoreDuplicateRowErrors = false, ignoreNull = true)
kuduContext.upsertRows(df, table, new KuduWriteOptions(false, true))
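Besides upsertRows, KuduContext offers the analogous batch operations insertRows, updateRows and deleteRows. A minimal sketch, assuming df matches the table schema and "id" is the primary key column:

// insert fails on existing keys unless ignoreDuplicateRowErrors is enabled in KuduWriteOptions
kuduContext.insertRows(df, table)
// update existing rows
kuduContext.updateRows(df, table)
// delete rows: the DataFrame only needs the primary key column(s), here assumed to be "id"
kuduContext.deleteRows(df.select("id"), table)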

3 Single-Row Read / Read with Predicates

cd $SPARK_HOME
bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.9.0

import org.apache.kudu.client.{KuduPredicate, RowResult}
import org.apache.kudu.spark.kudu.KuduContext

val kuduMaster = "master:7051"
val tableName = "impala::test_db.test_table"

val kuduContext = new KuduContext(kuduMaster, sc)
val kuduTable = kuduContext.syncClient.openTable(tableName)
// equality predicate on the "id" column
val predicate = KuduPredicate.newComparisonPredicate(kuduTable.getSchema().getColumn("id"), KuduPredicate.ComparisonOp.EQUAL, "testid")
val scanner = kuduContext.syncClient.newScannerBuilder(kuduTable).addPredicate(predicate).build()

scanner.hasMoreRows
val rows = scanner.nextRows
rows.hasNext
val row = rows.next

println(row.getString(0))
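The snippet above only reads the first row of the first batch. To consume the whole result set, drain the scanner batch by batch; a minimal sketch:

// iterate over all batches, then over the rows of each batch
while (scanner.hasMoreRows) {
  val batch = scanner.nextRows()
  while (batch.hasNext) {
    val r = batch.next()
    println(r.getString(0))
  }
}
scanner.close()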

4 Single-Row Write

cd $SPARK_HOME
bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.9.0

import org.apache.kudu.client.{KuduPredicate, RowResult}
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.kudu.client.SessionConfiguration

val kuduMaster = "172.26.192.219:7051"

val kuduContext = new KuduContext(kuduMaster, sc)
val kuduClient = kuduContext.syncClient
val kuduTable = kuduClient.openTable("impala::dataone_xishaoye.tbl_order_union")
val kuduSession = kuduClient.newSession()

// flush modes: AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND, MANUAL_FLUSH
kuduSession.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_SYNC)
kuduSession.setMutationBufferSpace(1000)

val insert = kuduTable.newInsert()
val row = insert.getRow()
row.addString(0, "hello")   // set the value of column 0 (a string column is assumed)
kuduSession.apply(insert)
//kuduSession.flush   // an explicit flush is only needed with MANUAL_FLUSH / AUTO_FLUSH_BACKGROUND

The other row operations work the same way: newInsert / newUpdate / newDelete / newUpsert, for example:
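An update and a delete follow the same pattern as the insert above, only the operation type changes. A minimal sketch; the non-key column "name" is an assumption:

// update: set the primary key, then the columns to change
val update = kuduTable.newUpdate()
update.getRow().addString(0, "hello")          // primary key column
update.getRow().addString("name", "world")     // assumed non-key column
kuduSession.apply(update)

// delete: only the primary key column(s) need to be set
val delete = kuduTable.newDelete()
delete.getRow().addString(0, "hello")
kuduSession.apply(delete)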

5 Troubleshooting

If a change does not take effect after apply, even though you are sure it was submitted, an error may have been reported without an exception being thrown. In that case, print the error information from the OperationResponse:

val opResponse = session.apply(op)
if (opResponse != null && opResponse.hasRowError) println(opResponse.getRowError.toString)

Note that FlushMode.AUTO_FLUSH_SYNC must be used here: in the other flush modes apply returns null, so there is no OperationResponse to inspect. See the source code:

org.apache.kudu.client.KuduSession

    public OperationResponse apply(Operation operation) throws KuduException {
        while(true) {
            try {
                Deferred<OperationResponse> d = this.session.apply(operation);
                if(this.getFlushMode() == FlushMode.AUTO_FLUSH_SYNC) {
                    return (OperationResponse)d.join();
                }

                return null;
            } catch (PleaseThrottleException var5) {
                PleaseThrottleException ex = var5;

                try {
                    ex.getDeferred().join();
                } catch (Exception var4) {
                    LOG.error("Previous batch had this exception", var4);
                }
            } catch (Exception var6) {
                throw KuduException.transformException(var6);
            }
        }
    }
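If a non-sync flush mode is used anyway, apply returns null (as the source above shows) and row errors accumulate on the session instead; they can be collected after an explicit flush. A minimal sketch:

// with MANUAL_FLUSH or AUTO_FLUSH_BACKGROUND: flush first, then inspect accumulated row errors
kuduSession.flush()
val pendingErrors = kuduSession.getPendingErrors()
pendingErrors.getRowErrors.foreach(e => println(e.toString))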

 

Reference:

https://kudu.apache.org/docs/developing.html

 
