Deleting data from an HBase database with Scala

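This post simply records one straightforward way to delete HBase data. There are other ways to do it, so feel free to explore. The code is as follows: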

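// Imports used by this method (HBase 0.98/1.x client API, HTableInterface)
import org.apache.hadoop.hbase.client.{Delete, Result, Scan}
import org.apache.hadoop.hbase.filter.{CompareFilter, RegexStringComparator, RowFilter}
import org.apache.hadoop.hbase.util.Bytes
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.collection.JavaConverters._

// Delete erroneous data by date.
// Config, SQLLogger, log, workID, rmDay and finalSucc are defined elsewhere in the job.
private def rmRazorError(table: String)(implicit args: Array[String]): Unit = {
  var isSucc = false
  var msg = ""
  val JOB_NAME = s"$table-$rmDay"
  val jobID = s"$JOB_NAME-$workID"
  // Skip the job if a previous run already completed it
  if (SQLLogger.isJobSucc(jobID)) {
    log.info(s"$jobID has already been executed successfully.")
    return
  }
  SQLLogger.insJobStart(workID, jobID, JOB_NAME)
  log.info(s"$JOB_NAME start ...")
  val hTable = Config.getHBaseConn.getTable(table)
  hTable.setAutoFlushTo(false)
  try {
    // args carries the date range as two yyyyMMdd strings
    val Array(startTime, endTime) = args
    // Delete a single row
    val delRow = (r: Result) => {
      val row = r.getRow
      log.info("Deleting row: " + Bytes.toString(row))
      hTable.delete(new Delete(row))
    }
    val fmt = DateTimeFormatter.BASIC_ISO_DATE // yyyyMMdd
    var tmpTime = startTime
    // Visit every day from startTime to endTime, inclusive
    while (tmpTime.compareTo(endTime) <= 0) {
      log.info(s"Deleting rows in table: $table using $tmpTime")
      // Select rows whose rowKey contains "-yyyyMMdd"
      val scan = new Scan()
      scan.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
        new RegexStringComparator(".*-" + tmpTime + ".*")))
      // Close the scanner even if a delete fails
      val scanner = hTable.getScanner(scan)
      try scanner.asScala.foreach(delRow)
      finally scanner.close()
      // The original called getBeforeOneDay(tmpTime); a helper that steps back a day
      // would never reach endTime, so advance by one day instead
      tmpTime = LocalDate.parse(tmpTime, fmt).plusDays(1).format(fmt)
    }
    isSucc = true
  } catch {
    case ex: Exception =>
      isSucc = false
      finalSucc = false // flag for the overall run, defined elsewhere
      msg = s"$table's job failed: ${ex.getMessage}"
      log.error(msg, ex)
  } finally {
    hTable.flushCommits()
    hTable.close()
  }
  SQLLogger.insJobEnd(jobID, isSucc, msg)
  log.info(s"$JOB_NAME end.")
}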
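The while loop has to visit every day from startTime to endTime in order. Below is a minimal sketch of that enumeration, independent of HBase and assuming both bounds are yyyyMMdd strings; DateRange and datesBetween are hypothetical names, not part of the original code.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object DateRange {
  // BASIC_ISO_DATE reads and writes dates as yyyyMMdd, e.g. "20180421"
  private val fmt = DateTimeFormatter.BASIC_ISO_DATE

  // Hypothetical helper: every date from start to end, inclusive, as yyyyMMdd strings.
  // Returns an empty list when start is after end.
  def datesBetween(start: String, end: String): List[String] = {
    val first = LocalDate.parse(start, fmt)
    val last = LocalDate.parse(end, fmt)
    Iterator.iterate(first)(_.plusDays(1))
      .takeWhile(!_.isAfter(last))
      .map(_.format(fmt))
      .toList
  }

  def main(args: Array[String]): Unit = {
    // Crosses a month boundary to show that plusDays handles calendar rollover
    println(datesBetween("20180429", "20180502"))
  }
}
```

Stepping through LocalDate rather than doing arithmetic on the string itself keeps month and year boundaries (and leap days) correct.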

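In the code, table is the table name, and the implicit parameter args carries two values, startTime and endTime, both dates in yyyyMMdd form. The method deletes every row of the table whose rowKey contains a date from startTime through endTime (inclusive). The deletion criterion is the yyyyMMdd date embedded in the rowKey, matched by the regular expression .*-yyyyMMdd.*, so the date must directly follow a hyphen. If your rowKeys contain such a date field, you can delete rows using the same condition.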
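The deletion criterion can be tried out without a cluster. The sketch below applies the same pattern the RowFilter is built with, .*-<date>.*, to a few made-up rowKeys using java.util.regex. The key layout <id>-<yyyyMMdd>-<seq> and the names RowKeyFilter and selectForDeletion are hypothetical; this only illustrates which keys the pattern selects, not how HBase evaluates the filter on the server.

```scala
import java.util.regex.Pattern

object RowKeyFilter {
  // Same pattern string the RowFilter uses: the date must be preceded by a hyphen
  def datePattern(date: String): Pattern =
    Pattern.compile(".*-" + date + ".*")

  // Hypothetical helper: keep the rowKeys whose date field falls on one of the given days
  def selectForDeletion(rowKeys: Seq[String], dates: Seq[String]): Seq[String] = {
    val patterns = dates.map(datePattern)
    rowKeys.filter(key => patterns.exists(_.matcher(key).find()))
  }

  def main(args: Array[String]): Unit = {
    // Made-up rowKeys of the form <id>-<yyyyMMdd>-<seq>, plus one with the date first
    val keys = Seq("u001-20180420-01", "u002-20180421-07", "u003-20180425-02", "20180421-u004")
    println(selectForDeletion(keys, Seq("20180420", "20180421")))
  }
}
```

Only the first two keys are selected: the third has a date outside the range, and the last one starts with the date, so no hyphen precedes it. Note that the pattern is not anchored, so any hyphen followed by those eight digits anywhere in the key will match.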

posted @ 2018-04-21 13:08  yarcl