
Hive Introduction

Posted by Kylin Soong on Jul 8, 2015 categorized as data

Installation

This section includes step-by-step procedures for installing Apache Hive.

Step.1 Prerequisites

Hadoop is a prerequisite; for how to install it, refer to the Installation section of http://ksoong.org/hadoop-intro/.

Step.2 Install

$ tar -xvf apache-hive-1.2.1-bin.tar.gz
$ cd apache-hive-1.2.1-bin

Step.3 Configure

Create ‘hive-env.sh’ under ‘conf’ from the bundled template:

$ cd conf/
$ cp hive-env.sh.template hive-env.sh
$ vim hive-env.sh

Uncomment HADOOP_HOME and make sure it points to a valid Hadoop home directory, for example:

HADOOP_HOME=/home/kylin/server/hadoop-1.2.1

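Before running Hive, go to the Hadoop home directory, create ‘/tmp’ and ‘/user/hive/warehouse’ in HDFS, make both group-writable (chmod g+w), and make Hive's scratch directory ‘/tmp/hive’ writable by everyone: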

$ ./bin/hadoop fs -mkdir /tmp
$ ./bin/hadoop fs -mkdir /user/hive/warehouse
$ ./bin/hadoop fs -chmod g+w /tmp
$ ./bin/hadoop fs -chmod g+w /user/hive/warehouse
$ ./bin/hadoop fs -chmod 777 /tmp/hive

NOTE: Restarting the Hadoop services is needed to avoid ‘java.io.IOException: Filesystem closed’ being thrown from DFSClient.checkOpen().

Step.4 Start and Test

$ ./bin/hive
hive>

Create/Drop database:

hive> CREATE DATABASE userdb;
hive> DROP DATABASE IF EXISTS userdb;

Create/Alter/Drop table:

hive> CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary String, destination String) STORED AS TEXTFILE;
-- alternative: add a table comment and explicit field/line delimiters
hive> CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary String, destination String) COMMENT 'Employee details' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

hive> LOAD DATA LOCAL INPATH '/home/kylin/hive-sample.txt' OVERWRITE INTO TABLE employee;
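The post does not show the contents of ‘/home/kylin/hive-sample.txt’. As a minimal sketch, assuming one record per line with the four columns in table order (eid, name, salary, destination) separated by tabs, which is what FIELDS TERMINATED BY '\t' and LINES TERMINATED BY '\n' expect, such a file could be generated like this. The class name EmployeeSampleWriter and the sample rows are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes rows in the layout the "employee" table expects:
// eid, name, salary, destination separated by '\t', one record per '\n'-terminated line.
class EmployeeSampleWriter {

    // Joins one record's fields with tabs. A tab or newline inside a field would
    // shift Hive's column split, so such values are rejected up front.
    static String toLine(int eid, String name, String salary, String destination) {
        for (String field : Arrays.asList(name, salary, destination)) {
            if (field.contains("\t") || field.contains("\n")) {
                throw new IllegalArgumentException("field contains a delimiter: " + field);
            }
        }
        return eid + "\t" + name + "\t" + salary + "\t" + destination;
    }

    // Writes every line followed by '\n' (matches LINES TERMINATED BY '\n').
    static void write(Path file, List<String> lines) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data; replace with real records.
        List<String> lines = Arrays.asList(
                toLine(1, "Alice", "45000", "Engineer"),
                toLine(2, "Bob", "40000", "Analyst"));
        Path out = Paths.get(args.length > 0 ? args[0] : "hive-sample.txt");
        write(out, lines);
        System.out.println("Wrote " + lines.size() + " rows to " + out.toAbsolutePath());
    }
}
```

Rejecting embedded delimiters is a deliberate choice: the TEXTFILE format used here has no quoting, so a stray tab would silently misalign the columns after LOAD DATA.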

hive> SELECT * FROM employee;

hive> ALTER TABLE employee RENAME TO emp;
hive> ALTER TABLE emp CHANGE name ename String;
hive> ALTER TABLE emp CHANGE salary salary Double;
hive> ALTER TABLE emp ADD COLUMNS(dept STRING COMMENT 'Department name');

hive> DROP TABLE IF EXISTS emp;

Configure and Start HiveServer2

Configure

Create a ‘hive-site.xml’ file in the ‘conf’ folder:

$ cd apache-hive-1.2.1-bin/conf/
$ touch hive-site.xml

Edit ‘hive-site.xml’ and add the following content:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.server2.thrift.min.worker.threads</name>
        <value>5</value>
    </property>
    <property>
        <name>hive.server2.thrift.max.worker.threads</name>
        <value>500</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>0.0.0.0</value>
    </property>
</configuration>

NOTE: There are other optional properties; for more, refer to the Setting Up HiveServer2 page of the Hive wiki.
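Before starting HiveServer2 it can help to confirm that ‘hive-site.xml’ is well-formed and holds the values you expect. This is a minimal sketch using only the JDK's DOM parser, not a Hive API; the class name HiveSiteCheck and the default path ‘conf/hive-site.xml’ (relative to apache-hive-1.2.1-bin) are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical checker (plain JDK DOM parsing, not a Hive API): collects the
// <property><name>..</name><value>..</value></property> pairs of hive-site.xml.
class HiveSiteCheck {

    static Map<String, String> readProperties(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed XML (for example an unclosed <property>) ends up here
            throw new IllegalArgumentException("cannot parse configuration: " + e.getMessage(), e);
        }
    }

    static Map<String, String> readProperties(String xml) {
        return readProperties(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Trimmed text of the first descendant element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws IOException {
        // Default path assumes the working directory is apache-hive-1.2.1-bin
        String path = args.length > 0 ? args[0] : "conf/hive-site.xml";
        try (InputStream in = new FileInputStream(path)) {
            Map<String, String> props = readProperties(in);
            props.forEach((name, value) -> System.out.println(name + " = " + value));
            // 10000 is also HiveServer2's default when the property is not set
            System.out.println("Thrift port: " + props.getOrDefault("hive.server2.thrift.port", "10000"));
        }
    }
}
```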

Start

$ ./bin/hiveserver2
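HiveServer2 may take a few seconds to open its Thrift port. A minimal sketch that polls host:port with a plain java.net.Socket before you run a JDBC client; the class name PortCheck and the default host ‘localhost’ are assumptions, and the port 10000 comes from hive.server2.thrift.port above. It only proves that something accepts TCP connections, not that it is HiveServer2.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether a TCP listener accepts connections on host:port.
// It does not speak the Thrift protocol, so it cannot tell HiveServer2 apart from
// any other service bound to the same port.
class PortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
        // Retry up to 30 times, sleeping one second between attempts.
        for (int attempt = 1; attempt <= 30; attempt++) {
            if (isListening(host, port, 1000)) {
                System.out.println(host + ":" + port + " is accepting connections");
                return;
            }
            Thread.sleep(1000);
        }
        System.out.println("Nothing listening on " + host + ":" + port);
    }
}
```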

HiveJdbcClient

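// Register the HiveServer2 JDBC driver (class from the hive-jdbc jar)
Class.forName("org.apache.hive.jdbc.HiveDriver");
Connection conn = DriverManager.getConnection("jdbc:hive2://192.168.1.105:10000/default", "hive", "");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM employee");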

If HiveJdbcClient is part of a Maven project, add the following dependencies:

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-jdbc</artifactId>
	<version>1.2.1</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-core</artifactId>
	<version>1.2.1</version>
</dependency>

To run HiveJdbcClient outside a Maven project, add the required jars to the classpath; refer to HiveServer2Clients-JDBCClientSampleCode.
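Putting the JDBC snippet together, here is a minimal, self-contained sketch of HiveJdbcClient that closes its resources with try-with-resources and prints every column of each row. It assumes the hive-jdbc dependency is on the runtime classpath, that HiveServer2 is reachable at the URL used earlier, and that the employee table exists in the default database (the walkthrough above drops it, so recreate it first). The jdbcUrl helper is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of a HiveServer2 JDBC client. Compiles with the JDK alone; running it
// needs hive-jdbc (and its Hadoop dependencies) on the classpath.
class HiveJdbcClient {

    private static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

    // Hypothetical helper: builds a jdbc:hive2://host:port/database URL.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        Class.forName(DRIVER); // register the Hive driver with DriverManager
        String url = jdbcUrl("192.168.1.105", 10000, "default");
        // try-with-resources closes ResultSet, Statement and Connection in reverse order
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM employee")) {
            ResultSetMetaData meta = rs.getMetaData();
            int columns = meta.getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= columns; i++) { // JDBC columns are 1-based
                    if (i > 1) {
                        row.append('\t');
                    }
                    row.append(rs.getString(i));
                }
                System.out.println(row);
            }
        }
    }
}
```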