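HBase Lab

This post follows the previous one on the HDFS lab; the next one covers the NoSQL lab. The exercises that come after an introductory course really are kept simple, and I will probably have to work through each component one by one later on.

Introduction to HBase

HBase is a distributed, column-oriented open-source database that grew out of Google's paper "Bigtable: A Distributed Storage System for Structured Data". To locate a cell in HBase you supply the table name, the row key, the column family, and the column (qualifier). Because data is organized by column, adding and removing data dynamically performs very well.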
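To make the "table, row, column family, column" addressing concrete, here is a minimal sketch of the logical layout as nested sorted maps. It is a toy model, not the HBase API: `ToyTable`, `put` and `get` are hypothetical names, and timestamps/versions are left out for brevity.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of HBase's logical layout (not the HBase API):
// row key -> column family -> column qualifier -> value, all kept sorted.
class ToyTable {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Writing a cell needs the full coordinate: row, family, qualifier.
    void put(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Returns null when the row, family, or qualifier does not exist.
    String get(String row, String family, String qualifier) {
        Map<String, Map<String, String>> families = rows.get(row);
        if (families == null) return null;
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    public static void main(String[] args) {
        ToyTable student = new ToyTable();
        student.put("95002", "course", "Chinese", "150");
        // A row can gain a new qualifier at any time; no schema change is involved.
        student.put("95002", "course", "Math", "98");
        System.out.println(student.get("95002", "course", "Chinese")); // 150
        System.out.println(student.get("95002", "course", "English")); // null
    }
}
```

The point of the nesting is that a row only stores the qualifiers it actually has, which is why adding a new column to one row costs nothing for the others.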

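Installing and configuring HBase

Base environment

Hadoop version: 2.7.7

Java version: 1.8

Choosing an HBase version

The HBase version has to match the Hadoop version. I used HBase 1.2.1 and ran into essentially no problems. Check the Hadoop version support matrix in the HBase documentation before picking a release.

HBase can be downloaded directly from the official website.

Installation

Unpack the archive

sudo tar -zxf /mnt/hgfs/windowShare/hbase-1.2.1-bin.tar.gz -C /usr/local
sudo mv /usr/local/hbase-1.2.1 /usr/local/hbase # rename for convenience

Set the environment variable

vim ~/.bashrc # edit for the current user only
export PATH=$PATH:/usr/local/hbase/bin # add this as the last environment-variable line in the file, then save and quit
source ~/.bashrc # take effect immediately

Set permissions on the HBase directory

sudo chown -R hadoop /usr/local/hbase # make hadoop the owner of everything under hbase

Check the version to confirm that HBase is installed correctly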
```
hbase version
```

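Pseudo-distributed configuration

/usr/local/hbase/conf/hbase-env.sh

vim /usr/local/hbase/conf/hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_251 # JDK installation directory (the JDK root, not its bin subdirectory)
export HBASE_CLASSPATH=/usr/local/hadoop/conf # Hadoop's local configuration directory (on a default Hadoop 2.x layout this is etc/hadoop under the install directory)
export HBASE_MANAGES_ZK=true # let HBase start and stop its own bundled ZooKeeper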

/usr/local/hbase/conf/hbase-site.xml

gedit /usr/local/hbase/conf/hbase-site.xml

<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://localhost:9000/hbase</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
</configuration>

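hbase.rootdir: the path on HDFS where HBase stores its data
hbase.cluster.distributed: true selects distributed mode (here, pseudo-distributed on a single machine)

Testing HBase

Start HDFS first, then start HBase.

start-dfs.sh # start HDFS
start-hbase.sh # start HBase
jps # check that everything started (look for HMaster, HRegionServer and HQuorumPeer)
hbase shell # open the HBase shell

The most important step

If you are working in a virtual machine, take a snapshot. Snapshot, snapshot, snapshot.

If you are dual-booting, make a backup. Backup, backup, backup.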

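HBase lab

1. Implement the following functions in code, and carry out the same tasks with the HBase Shell commands that come with Hadoop:

Overall code; each sub-question only needs its own function added.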

import java.io.IOException;
import java.util.Scanner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class Prac1 {
	
	private static Configuration conf;
	private static Connection conn;
	private static Admin admin;

	public static void main(String[] args) throws IOException {
		init();
		
		int cmd=5;// which sub-question to run
		switch(cmd) {
		case 1:list();break;
		case 2:showTable();break;
		case 3:choose();break;
		case 4:deleteAll();break;
		case 5:countRows();break;
		}
		close();
	}
    
    public static void init()
	{
		conf = HBaseConfiguration.create();
		conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
		try {
			conn = ConnectionFactory.createConnection(conf);
			admin = conn.getAdmin();
		}catch(IOException e) {
			e.printStackTrace();
		}
	}
	
	public static void close()
	{
		try {
			if(admin!=null)
				admin.close();
			if(conn!=null)
				conn.close();
		}catch(IOException e) {
			e.printStackTrace();
		}
	}
}

(1) List information about all tables in HBase, such as the table names;

shell:

list  # print the tables

java:

public static void list() throws IOException
	{
		HTableDescriptor htds[] = admin.listTables();
		for(HTableDescriptor htd:htds)
			System.out.println(htd.getNameAsString());
		System.out.println("Finished listing tables");
	}

(2) Print all records of a given table to the terminal;

shell:

scan 'student'

java:

public static void showTable() throws IOException
	{
		System.out.print("Enter the table name: ");
		Scanner input = new Scanner(System.in);
		TableName tableName = TableName.valueOf(input.next());
		// conn.getTable() never returns null, so ask the Admin whether the table exists
		if(!admin.tableExists(tableName)) {
			System.out.println("Table does not exist");
			return ;
		}
		Table table = conn.getTable(tableName);
		Scan scan = new Scan();
		ResultScanner rsr = table.getScanner(scan); // an unrestricted Scan iterates over every row
		for(Result res:rsr)
			showCell(res);
		rsr.close();
		table.close();
	}
	
	public static void showCell(Result res)
	{
		// a row is printed with a loop as well, because a Result holds one cell per column
		Cell[] cells = res.rawCells();
		for(Cell cell:cells) {
			System.out.println("RowName: "+Bytes.toString(CellUtil.cloneRow(cell))+"\t"
					+"TimeStamp: "+cell.getTimestamp()+"\t"
					+"Column Family: "+Bytes.toString(CellUtil.cloneFamily(cell))+"\t"
					+"Column: "+Bytes.toString(CellUtil.cloneQualifier(cell))+"\t"
					+"Value: "+Bytes.toString(CellUtil.cloneValue(cell)));
		}
	}

(3) Add and delete a specified column family or column in an existing table;

shell:

# write to a column family and to a column; the column family must already exist, the column qualifier can be new
put 'student','95002','Sname','Mari'
put 'student','95002','course:Chinese','150'
# delete from the column family and from the column
delete 'student','95002','Sname'
delete 'student','95002','course:Chinese'

java:

public static void choose() throws IOException
	{
		System.out.print("Enter the operation (insert or delete): ");
		Scanner input = new Scanner(System.in);
		String cmd = input.next();
		if(cmd.equals("insert")||cmd.equals("delete"))
			option(cmd);
		else
			System.out.println("Unknown command");
	}
	
	public static void option(String opt) throws IOException
	{
		System.out.print("Enter table, row, column family, column, value (separated by spaces): ");
		Scanner input = new Scanner(System.in);
		// the literal token "esc" is stripped, so it can be typed in place of an empty field
		String tableName = input.next().replaceFirst("esc","");
		String rowKey = input.next().replaceFirst("esc","");
		String colFamily = input.next().replaceFirst("esc","");
		String col = input.next().replaceFirst("esc","");
		
		TableName tn = TableName.valueOf(tableName);
		// conn.getTable() never returns null, so ask the Admin whether the table exists
		if(!admin.tableExists(tn)){
			System.out.println("Table does not exist");
			return;
		}
		Table table = conn.getTable(tn);
		
		if(opt.equals("insert"))
		{
			String val = input.next().replaceFirst("esc", "");
			Put put = new Put(rowKey.getBytes());
			put.addColumn(colFamily.getBytes(), col.getBytes(), val.getBytes());
			table.put(put);
		}else
		{
			Delete delete = new Delete(rowKey.getBytes());
			// only a family given: delete every column of that family in this row
			if(!colFamily.equals("")&&col.equals(""))
				delete.addFamily(colFamily.getBytes());
			else if(!colFamily.equals("")&&!col.equals(""))
				delete.addColumn(colFamily.getBytes(),col.getBytes());
			else if(colFamily.equals("")&&!col.equals("")) {
				System.out.println("A column cannot be given without a column family");
				table.close();
				return;
			}
			table.delete(delete);
		}
		System.out.println("Operation finished");
		table.close();
	}

(4) Remove all records from a given table;

shell:

truncate 'student'

java:

public static void deleteAll() throws IOException
	{
		System.out.print("Enter the name of the table to empty: ");
		Scanner input = new Scanner(System.in);
		TableName tableName = TableName.valueOf(input.next());
		// getTableDescriptor() throws for a missing table instead of returning null, so check first
		if(!admin.tableExists(tableName)) {
			System.out.println("Table does not exist");
			return ;
		}
		// keep the table's descriptor so that an identical, empty table can be recreated
		HTableDescriptor htd = admin.getTableDescriptor(tableName);
		admin.disableTable(tableName);
		admin.deleteTable(tableName);
		admin.createTable(htd);
		System.out.println("Table contents deleted");
	}

(5) Count the rows of a table.

shell:

count 'student'

java:

public static void countRows() throws IOException
	{
		System.out.print("Enter the name of the table to count: ");
		Scanner input = new Scanner(System.in);
		TableName tableName = TableName.valueOf(input.next());
		// conn.getTable() never returns null, so ask the Admin whether the table exists
		if(!admin.tableExists(tableName)) {
			System.out.println("Table does not exist");
			return ;
		}
		Table table = conn.getTable(tableName);
		Scan scan = new Scan();
		ResultScanner rsr = table.getScanner(scan);
		int count=0;
		// each Result is one row
		for(Result res:rsr)
			count++;
		rsr.close();
		table.close();
		System.out.println(count+" rows in total");
	}

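2. Given the following tables and data from a relational database, convert them into tables suitable for storage in HBase and insert the data:

Student table (Student)

Student number (S_No)	Name (S_Name)	Sex (S_Sex)	Age (S_Age)
2015001	Zhangsan	male	23
2015002	Mary	female	22
2015003	Lisi	male	24

Course table (Course)

Course number (C_No)	Course name (C_Name)	Credits (C_Credit)
123001	Math	2.0
123002	Computer Science	5.0
123003	English	3.0

Course selection table (SC)

Student number (SC_Sno)	Course number (SC_Cno)	Score (SC_Score)
2015001	123001	86
2015001	123003	69
2015002	123002	77
2015002	123003	99
2015003	123001	98
2015003	123002	95

In addition, implement the following functions:

(1) createTable(String tableName, String[] fields)

Create a table. The parameter tableName is the name of the table, and the string array fields holds the names of the record's fields. If a table named tableName already exists in HBase, delete the existing table first and then create the new one.

(2) addRecord(String tableName, String row, String[] fields, String[] values)

Add the data in values to the cells identified by table tableName, row row (denoted by S_Name), and the string array fields. If an element of fields has a column qualifier under its column family, write it as "columnFamily:column". For example, to add scores to the three columns "Math", "Computer Science" and "English" at once, fields is {"Score:Math", "Score:Computer Science", "Score:English"} and values holds the scores of the three courses.

(3) scanColumn(String tableName, String column)

Browse the data of one column of table tableName; if a row has no data in that column, return null. When column is the name of a column family that has several column qualifiers under it, list the data of every column those qualifiers stand for; when column is a specific column name (for example "Score:Math"), list only that column's data.

(4) modifyData(String tableName, String row, String column)

Modify the data of the cell identified by table tableName, row row (which may be denoted by the student name S_Name), and column column.

(5) deleteRow(String tableName, String row)

Delete the record of the row identified by row from table tableName.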
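The "columnFamily:column" convention used by addRecord and scanColumn can be illustrated with a small parser. This is only a sketch under stated assumptions: `ColumnSpec`, `parse` and `families` are hypothetical names, and nothing here calls HBase. It splits on the first colon only, so a qualifier such as "Computer Science" survives intact.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not part of the HBase API): splits a field spec such as
// "Score:Math" into a column family and a column qualifier.
class ColumnSpec {
    final String family;
    final String qualifier; // empty string when the spec names only a family

    ColumnSpec(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    static ColumnSpec parse(String field) {
        int colon = field.indexOf(':');
        if (colon < 0) {
            return new ColumnSpec(field, "");
        }
        // Split on the first colon only, so the qualifier may contain spaces or ':'.
        return new ColumnSpec(field.substring(0, colon), field.substring(colon + 1));
    }

    // Distinct families in first-seen order: what a createTable call would declare.
    static Set<String> families(String[] fields) {
        Set<String> result = new LinkedHashSet<>();
        for (String field : fields) {
            result.add(parse(field).family);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] fields = {"Score:Math", "Score:Computer Science", "Score:English", "S_Name"};
        for (String field : fields) {
            ColumnSpec spec = parse(field);
            System.out.println(spec.family + " / " + spec.qualifier);
        }
        System.out.println(families(fields)); // [Score, S_Name]
    }
}
```

In an addRecord implementation, each parsed pair would supply the family and qualifier arguments of one `Put.addColumn` call, with the matching entry of values as the cell value.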

I imported the data with a program, adapted from all of the code for question 1, as follows:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Scanner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class Prac2 {
	
	private static Configuration conf;
	private static Connection conn;
	private static Admin admin;

	public static void main(String[] args) throws IOException {
		
		init();
		
		String filenamePrefix="/home/hadoop/Desktop/HPractice/HbasePractice/";
		
		String[] t1 = {"S_No","S_Name","S_Sex","S_Age"};
		String t1Name = "Student";
		createTable(t1Name, t1);
		lead(filenamePrefix+t1Name.toLowerCase()+".txt",t1Name,t1,false);
		
		String[] t2 = {"C_No","C_Name","C_Credit"};
		String t2Name = "Course";
		createTable(t2Name, t2);
		lead(filenamePrefix+t2Name.toLowerCase()+".txt",t2Name,t2,false);
		
		String[] t3 = {"SC_Sno","SC_Cno","SC_Score"};
		String t3Name = "SC";
		createTable(t3Name, t3);
		lead(filenamePrefix+t3Name.toLowerCase()+".txt",t3Name,t3,true);
		
		close();
	}
	
	public static void lead(String filename,String tableName,String[] colFamilys,boolean dupl) throws IOException
	{
		TableName tn = TableName.valueOf(tableName);
		// conn.getTable() never returns null, so ask the Admin whether the table exists
		if(!admin.tableExists(tn)){
			System.out.println("Table does not exist");
			return;
		}
		Table table = conn.getTable(tn);
		FileInputStream is = new FileInputStream(filename);
		BufferedReader bfr = new BufferedReader(new InputStreamReader(is));
		String line;
		String rowKey,colFamily,col;
		int j;
		int count=1;
		while((line = bfr.readLine())!=null)
		{
			// fields are separated by single spaces, so a value that itself contains a space
			// (such as "Computer Science") fails the length check below
			String[] values = line.split(" ");
			
			if(values.length!=colFamilys.length)
			{
				System.out.println("Failed to read line "+count);
				count++;
				continue;
			}
			
			// dupl=false: the first field is unique and becomes the row key;
			// dupl=true: the first field repeats (as in SC), so the line number is the row key
			if(!dupl)
			{
				rowKey = values[0];
				j=1;
			}else
			{
				rowKey = String.valueOf(count);
				j=0;
			}
			
			for(int i=j;i<values.length;i++)
			{
				if(colFamilys[i].contains(":"))
				{
					colFamily = colFamilys[i].split(":")[0];
					col = colFamilys[i].split(":")[1];
				}else {
					colFamily = colFamilys[i];
					col = "";
				}
				option(table,"insert",rowKey,colFamily,col,values[i]);
			}
			count++;
		}
		
		bfr.close();
		table.close();
		System.out.println("Imported "+tableName+" successfully");
	}
	
	public static void init()
	{
		conf = HBaseConfiguration.create();
		conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
		try {
			conn = ConnectionFactory.createConnection(conf);
			admin = conn.getAdmin();
		}catch(IOException e) {
			e.printStackTrace();
		}
	}

	public static void close()
	{
		try {
			if(admin!=null)
				admin.close();
			if(conn!=null)
				conn.close();
		}catch(IOException e) {
			e.printStackTrace();
		}
	}
	
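	public static void createTable(String tableName,String[] colFamily) throws IOException
	{
		TableName table = TableName.valueOf(tableName);
		// the assignment asks for an existing table to be deleted and then recreated
		if(admin.tableExists(table))
		{
			System.out.println("Table already exists, deleting it first");
			if(admin.isTableEnabled(table))
				admin.disableTable(table);
			admin.deleteTable(table);
		}
		HTableDescriptor htd = new HTableDescriptor(table);
		
		String realInfo;
		for(String info:colFamily) 
		{
			// "family:qualifier" contributes only its family part
			if(info.contains(":"))
				realInfo = info.split(":")[0];
			else
				realInfo = info;
			// addFamily() rejects a duplicate family, so skip one that was already added
			if(!htd.hasFamily(Bytes.toBytes(realInfo)))
				htd.addFamily(new HColumnDescriptor(realInfo));
		}
		admin.createTable(htd);
		System.out.println("Table created");
		
	}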
	
	public static void list() throws IOException
	{
		HTableDescriptor htds[] = admin.listTables();
		for(HTableDescriptor htd:htds)
			System.out.println(htd.getNameAsString());
		System.out.println("Finished listing tables");
	}
	
	public static void showTable(String tableName) throws IOException
	{
		TableName tn = TableName.valueOf(tableName);
		// conn.getTable() never returns null, so ask the Admin whether the table exists
		if(!admin.tableExists(tn)) {
			System.out.println("Table does not exist");
			return ;
		}
		Table table = conn.getTable(tn);
		Scan scan = new Scan();
		ResultScanner rsr = table.getScanner(scan); // an unrestricted Scan iterates over every row
		for(Result res:rsr)
			showCell(res);
		rsr.close();
		table.close();
	}
	
	public static void showCell(Result res)
	{
		// a row is printed with a loop as well, because a Result holds one cell per column
		Cell[] cells = res.rawCells();
		for(Cell cell:cells) {
			System.out.println("RowName: "+Bytes.toString(CellUtil.cloneRow(cell))+"\t"
					+"TimeStamp: "+cell.getTimestamp()+"\t"
					+"Column Family: "+Bytes.toString(CellUtil.cloneFamily(cell))+"\t"
					+"Column: "+Bytes.toString(CellUtil.cloneQualifier(cell))+"\t"
					+"Value: "+Bytes.toString(CellUtil.cloneValue(cell)));
		}
	}
		
	public static void option(Table table,String opt,String rowKey,String colFamily,String col,String val) throws IOException
	{	
		if(opt.equals("insert"))
		{
			Put put = new Put(rowKey.getBytes());
			put.addColumn(colFamily.getBytes(), col.getBytes(), val.getBytes());
			table.put(put);
		}else
		{
			// with neither family nor column set, the Delete removes the whole row
			Delete delete = new Delete(rowKey.getBytes());
			// only a family given: delete every column of that family in this row
			if(!colFamily.equals("")&&col.equals(""))
				delete.addFamily(colFamily.getBytes());
			else if(!colFamily.equals("")&&!col.equals(""))
				delete.addColumn(colFamily.getBytes(),col.getBytes());
			else if(colFamily.equals("")&&!col.equals("")) {
				System.out.println("A column cannot be given without a column family");
				return; // the caller owns the table and closes it
			}
			table.delete(delete);
		}
		System.out.println("Operation finished");
	}
	
	public static void deleteAll(String tableName) throws IOException
	{
		TableName table_name = TableName.valueOf(tableName);
		// getTableDescriptor() throws for a missing table instead of returning null, so check first
		if(!admin.tableExists(table_name)) {
			System.out.println("Table does not exist");
			return ;
		}
		// keep the table's descriptor so that an identical, empty table can be recreated
		HTableDescriptor htd = admin.getTableDescriptor(table_name);
		admin.disableTable(table_name);
		admin.deleteTable(table_name);
		admin.createTable(htd);
		System.out.println("Table contents deleted");
	}
	
	public static void countRows(String tableName) throws IOException
	{
		TableName tn = TableName.valueOf(tableName);
		// conn.getTable() never returns null, so ask the Admin whether the table exists
		if(!admin.tableExists(tn)) {
			System.out.println("Table does not exist");
			return ;
		}
		Table table = conn.getTable(tn);
		Scan scan = new Scan();
		ResultScanner rsr = table.getScanner(scan);
		int count=0;
		// each Result is one row
		for(Result res:rsr)
			count++;
		rsr.close();
		table.close();
		System.out.println(count+" rows in total");
	}
}
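One caveat about lead above: it splits each line on a single space, so the Course row containing "Computer Science" yields four fields instead of three and is rejected by the length check. The sketch below shows the effect and one possible way around it, assuming the input files were tab-separated instead (that file format and the name `splitOnTab` are assumptions, not something the original program uses).

```java
import java.util.Arrays;

// Sketch: why splitting on a single space breaks the Course rows, and what a
// tab-separated file (an assumed input format) would give instead.
class SplitDemo {
    static String[] splitOnTab(String line) {
        // limit -1 keeps trailing empty fields, so the field count stays meaningful
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String spaceLine = "123002 Computer Science 5.0";
        String tabLine = "123002\tComputer Science\t5.0";
        System.out.println(spaceLine.split(" ").length);  // 4: one field too many
        System.out.println(splitOnTab(tabLine).length);   // 3
        System.out.println(Arrays.toString(splitOnTab(tabLine)));
    }
}
```

With space-separated files the alternative is to avoid spaces inside values, for example by writing Computer_Science.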

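What I did not expect is that the remaining programming tasks can mostly be completed by changing the parameters and tweaking this code a little. Take deleteRow: deleting the record of one row can be done like this: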

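public static void deleteRow(String tableName,String row) throws IOException
{
    TableName tablename = TableName.valueOf(tableName);
    if(!admin.tableExists(tablename))
    {
        System.out.println("Table does not exist");
        return;
    }
    Table table = conn.getTable(tablename);
    // row is the key of the row to delete; with an empty family and column, option() deletes the whole row
    option(table,"delete",row,"","","");
    table.close();
}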

I also wrote some of the functions for the tasks above, namely scanColumn and modifyData:

public static void scanColumn(String tableName,String columns) throws IOException
	{
		TableName tablename = TableName.valueOf(tableName);
		
		if(!admin.tableExists(tablename)) {
			System.out.println("Table does not exist");
			return;
		}
		Table table = conn.getTable(tablename);
		Scan scan = new Scan();
		// the Scan restricts the columns, while the ResultScanner iterates over rows: with an unrestricted Scan the cells
		// seen in showCell are all column families and their columns, otherwise only the selected family or column
		String[] info = columns.split(":");
		if(info.length==1)
			scan.addFamily(columns.getBytes());
		else
			scan.addColumn(info[0].getBytes(), info[1].getBytes());
		
		ResultScanner rsr = table.getScanner(scan);
		// next() returns null once the scan is exhausted
		for(Result res = rsr.next();res!=null;res = rsr.next())
			showCell(res);
		rsr.close();
		table.close();
	}

	public static void modifyData(String tableName,String row,String column) throws IOException
	{
		TableName tablename = TableName.valueOf(tableName);
		if(!admin.tableExists(tablename))
		{
			System.out.println("Table does not exist");
			return ;
		}
		System.out.print("Enter the new value: ");
		Scanner input = new Scanner(System.in);
		String val = input.next();
		
		Table table = conn.getTable(tablename);
		// a Put on an existing cell writes a newer version, which is how HBase "modifies" data
		Put put = new Put(row.getBytes());
		
		String[] info = column.split(":");
		if(info.length==1)
			put.addColumn(info[0].getBytes(),"".getBytes(), val.getBytes());
		else
			put.addColumn(info[0].getBytes(),info[1].getBytes(), val.getBytes());
		table.put(put);
		System.out.println("Modified successfully");
		table.close();
	}

I have verified all of these myself; if anything is wrong, I would welcome corrections.

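3. Use HBase and MapReduce to complete the following task:

Suppose HBase has 2 tables; their logical view and part of the data are shown below:

Logical view of the table and part of the data

Book name (bookName)	Price (price)
Database System Concept	30$
Thinking in Java	60$
Data Mining	25$

Requirement: read the data of the two tables above from HBase, sort by "price", and store the result in HBase.

create 'book','price','bookname'
put 'book','30$','bookname','Database System Concept'
put 'book','60$','bookname','Thinking in Java'
put 'book','25$','bookname','Data Mining'
# scanning a table returns the rows already sorted by row key (byte-wise, lexicographic order)
scan 'book'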
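Using the price as the row key works for these three books, but two things are worth keeping in mind. Row keys are compared as byte strings, so a price such as 100$ would sort before 25$, and two books with the same price would share one row key and overwrite each other. The sketch below illustrates the ordering issue with plain Java collections; zero-padding the number is one possible remedy, and `paddedKey` with its five-digit width is a hypothetical key format, not something the commands above use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Sketch: for ASCII text, byte-wise row key order matches String.compareTo,
// so a TreeSet of strings mimics the order a scan would return.
class RowKeyOrder {
    // "25$" -> "00025": drop the currency sign and left-pad to a fixed width.
    static String paddedKey(String price) {
        int amount = Integer.parseInt(price.replace("$", ""));
        return String.format("%05d", amount);
    }

    static List<String> sorted(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    public static void main(String[] args) {
        List<String> prices = Arrays.asList("30$", "60$", "25$", "100$");
        System.out.println(sorted(prices)); // [100$, 25$, 30$, 60$]

        List<String> padded = new ArrayList<>();
        for (String price : prices) {
            padded.add(paddedKey(price));
        }
        System.out.println(sorted(padded)); // [00025, 00030, 00060, 00100]
    }
}
```

Appending the book name to the padded price (for example 00030_Database System Concept) would be one way to keep equal prices from colliding, at the cost of a longer key.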

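Summary

A recap of what I learned.

HTableDescriptor is the descriptor of an HBase table; once you have it, you can create an identical table.
scan selects columns and does not need a row key, whereas get must be given a row key.
ResultScanner iterates over Results, that is, over rows. If the Scan behind it has no restrictions, every column family and column of each row is returned in its Result; with restrictions, each Result contains only the selected columns, and rows with no matching cell are skipped rather than returned as null.
~~To be updated if there is more~~

At this point in life, absolute optimism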

posted @ 2020-07-05 09:36 CodeDancing
