'好程序員 (Good Programmer) big data learning path: Hive data types'

Hive, programmers, databases, big data, UNIX, HDFS, job-hopping stories | 好程序員 | 2019-08-09

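1. Primitive data types

| Type | Description | Example |
| --- | --- | --- |
| TINYINT | 1-byte signed integer | |
| SMALLINT | 2-byte signed integer | |
| INT | 4-byte signed integer | |
| BIGINT | 8-byte signed integer | |
| FLOAT | 4-byte single-precision floating-point number | 1.0 |
| DOUBLE | 8-byte double-precision floating-point number | 1.0 |
| BOOLEAN | true/false | TRUE |
| STRING | Character string | 'a', "a" |
| BINARY | Byte array | |
| TIMESTAMP | Timestamp with nanosecond precision | 132550245000, '2016-01-01 03:04:05.123456789' |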

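A value of the newly added TIMESTAMP type can be:

• An integer: the number of seconds since the Unix epoch (midnight UTC, January 1, 1970)

• A floating-point number: the number of seconds since the Unix epoch, with nanosecond precision (9 digits after the decimal point)

• A string: a time string in the format agreed by JDBC, YYYY-MM-DD hh:mm:ss.fffffffff

The BINARY type stores variable-length binary data.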
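The three TIMESTAMP forms above can be tried out directly. This is a minimal sketch, assuming Hive 0.13 or later (where SELECT without a FROM clause is allowed); the column aliases are made up for illustration, and the results of the epoch functions depend on the session time zone.

```sql
SELECT
  -- String in the JDBC format, with up to 9 fractional digits
  CAST('2016-01-01 03:04:05.123456789' AS TIMESTAMP) AS ts_from_string,
  -- String to seconds since the Unix epoch
  unix_timestamp('2016-01-01 03:04:05')              AS epoch_seconds,
  -- Seconds since the Unix epoch back to a formatted string
  from_unixtime(0)                                   AS epoch_start;
```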

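2. Complex data types

| Type | Description | Example |
| --- | --- | --- |
| ARRAY | An ordered collection of fields; all fields must have the same type. | array(1,2) |
| MAP | An unordered collection of key-value pairs. Keys must be of a primitive type; values can be of any type. Within one map, all keys must share one type and all values must share one type. | map('a',1,'b',2) |
| STRUCT | A collection of named fields; the fields may have different types. | struct('a',1,1,0) |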
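The constructor functions from the table can be combined with the matching access syntax. This is a small sketch, assuming Hive 0.13 or later so that the inner SELECT needs no FROM clause; the aliases arr, m, s and t are illustrative only.

```sql
SELECT
  t.arr[1]  AS second_elem,   -- ARRAY: index starts at 0
  t.m['b']  AS value_for_b,   -- MAP: look up a value by its key
  t.s.col1  AS first_field    -- STRUCT: struct() names its fields col1, col2, ...
FROM (
  SELECT
    array(1, 2)          AS arr,
    map('a', 1, 'b', 2)  AS m,
    struct('a', 1, 1.0)  AS s
) t;
```

When readable field names matter, named_struct('name', 'a', 'n', 1) builds a STRUCT whose fields are called name and n instead of col1 and col2.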

3. Data type usage example

## Create the employee table, using the default delimiters

CREATE TABLE employee(

name STRING,

salary FLOAT,

leader ARRAY<STRING>,

deductions MAP<STRING,FLOAT>,

address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>

)

;
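Once the table holds data, each complex column is read with its own notation. This is a minimal sketch against the employee table above; the map key 'tax' is a hypothetical example, since the article does not show the data.

```sql
SELECT
  name,
  leader[0]          AS first_leader,   -- ARRAY: zero-based index
  deductions['tax']  AS tax_deduction,  -- MAP: look up by key ('tax' is hypothetical)
  address.city       AS city            -- STRUCT: dot notation
FROM employee;
```

An index past the end of the array, or a key that is absent from the map, yields NULL rather than an error.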

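4. Column delimiters

Default delimiters Hive uses when encoding text files

| Delimiter | Description |
| --- | --- |
| \n | In a text file each line is one record, so the newline character separates records. |
| ^A (Ctrl+A) | Separates fields (columns). In a CREATE TABLE statement it can be written as the octal code \001. |
| ^B (Ctrl+B) | Separates the elements of an ARRAY or STRUCT, or the key-value pairs of a MAP. In a CREATE TABLE statement it can be written as the octal code \002. |
| ^C (Ctrl+C) | Separates a key from its value inside a MAP. In a CREATE TABLE statement it can be written as the octal code \003. |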

CREATE TABLE employee(

name STRING,

salary FLOAT,

subordinates ARRAY<STRING>,

deductions MAP<STRING,FLOAT>,

address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>

)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\001'

COLLECTION ITEMS TERMINATED BY '\002'

MAP KEYS TERMINATED BY '\003'

LINES TERMINATED BY '\n'

STORED AS TEXTFILE;

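• ROW FORMAT DELIMITED introduces the delimiter clauses; they tell Hive which separators to expect when it reads the table's data files.

• FIELDS TERMINATED BY '\001': \001 is the octal code for ^A. This clause tells Hive to use ^A as the column delimiter.

• COLLECTION ITEMS TERMINATED BY '\002': \002 is the octal code for ^B. This clause tells Hive to use ^B as the delimiter between collection elements.

• MAP KEYS TERMINATED BY '\003': \003 is the octal code for ^C. This clause tells Hive to use ^C as the delimiter between a map key and its value.

• STORED AS TEXTFILE is a separate clause and does not need the ROW FORMAT DELIMITED keywords. LINES TERMINATED BY '\n', by contrast, is one of the sub-clauses of ROW FORMAT DELIMITED in Hive's DDL grammar.

• Hive currently accepts only '\n' for LINES TERMINATED BY, so rows can only be separated by newlines.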
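One way to check which delimiters a table actually ended up with is to inspect its stored description. This is a minimal sketch, assuming the employee table above exists; the exact layout of the output differs between Hive versions.

```sql
-- Print the full table description. The "Storage Desc Params" section
-- typically lists the delimiters recorded for the table, such as field.delim.
DESCRIBE FORMATTED employee;
```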

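Basic Hive commands

1. Creating a database:

Creating a database essentially creates a directory on HDFS. Use COMMENT to add a description of the database; the description is written in quotes. Database properties come after the description and are added with WITH DBPROPERTIES. The properties go inside parentheses, each property name and value is quoted and joined with an equals sign, and multiple properties are separated by commas.

## Create a database named myhive with a description and properties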

create database myhive comment 'this is myhive db'

with dbproperties ('author'='me','date'='2018-4-21')

;

## View the properties

describe database extended myhive;

## Add new properties to the existing database

alter database myhive set dbproperties ('id'='1');

## Switch to the database

use myhive;

## Drop the database

drop database myhive;
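A few guarded variants of the commands above are useful in scripts that may run more than once. This is a short sketch using the same myhive database; note that CASCADE also removes every table in the database.

```sql
-- List all databases
SHOW DATABASES;

-- Create the database only if it does not exist yet
CREATE DATABASE IF NOT EXISTS myhive;

-- By default (RESTRICT) DROP DATABASE fails while the database still
-- contains tables; CASCADE drops those tables first.
DROP DATABASE IF EXISTS myhive CASCADE;
```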

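2. Creating tables

Tables are created in the current database by default (default is Hive's built-in default database). Creating a table also essentially creates a directory on HDFS.

================== Exercise: using array, loading local data, comparing Hive with MySQL ========================

## Create table t_array, mapped to the data file array.txt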

create table if not exists t_array(

id int comment 'this is id',

score array<tinyint>

)

comment 'this is my table'

row format delimited fields terminated by ','

collection items terminated by '|'

tblproperties ('id'='11','author'='me')

;

## Load array.txt from the local file system

load data local inpath '/testdata/array.txt' into table t_array;

## Query the data in the table

select * from t_array;

## Query the first score for id=1

select score[0] from t_array where id=1;

## Query the number of scores for id=2

select size(score) from t_array where id=2;

## Count the total number of rows

select count(*) from t_array;

## Append array1.txt from the local file system to this table

load data local inpath '/testdata/array1.txt' into table t_array;

## Append test.txt from the local file system to this table

load data local inpath '/testdata/test.txt' into table t_array;

## Load array.txt from the local file system into t_array, overwriting the existing data

load data local inpath '/testdata/array.txt' overwrite into table t_array;
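To make the array exercise easier to picture, the sketch below shows what lines in array.txt could look like given the delimiters declared for t_array, and how to expand the array into rows. The file contents are hypothetical (the article does not show them), and the aliases exploded and s are illustrative.

```sql
-- Hypothetical lines in /testdata/array.txt, matching ',' and '|':
--   1,90|80|70
--   2,60|75

-- Turn every array element into a row of its own
SELECT id, s AS single_score
FROM t_array
LATERAL VIEW explode(score) exploded AS s;
```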

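==================== Exercise: using map, viewing a table's CREATE statement, specifying the data location at creation ===================

## Create table t_map, mapped to the data file map.txt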

create table if not exists t_map(

id int,

score map<string,int>

)

row format delimited fields terminated by ','

collection items terminated by '|'

map keys terminated by ':'

stored as textfile

;

## Load data from HDFS; map.txt is moved from its original HDFS location into the table directory.

load data inpath '/testdata/map.txt' into table t_map;

## Query the math score for id=1

select score['math'] from t_map where id=1;

## Query how many subjects each person took

select size(score) from t_map;
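Beyond key lookup and size(), a map can be split into its keys and values or expanded into one row per entry. This is a minimal sketch against t_map; the sample line and the aliases subject, points and exploded are assumptions for illustration.

```sql
-- Hypothetical line in map.txt, matching ',', '|' and ':':
--   1,math:90|english:85

-- Keys and values as two separate arrays
SELECT id, map_keys(score) AS subjects, map_values(score) AS points
FROM t_map;

-- One output row per key-value pair
SELECT id, subject, points
FROM t_map
LATERAL VIEW explode(score) exploded AS subject, points;
```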

## View the table's CREATE statement

show create table t_map;

CREATE TABLE `t_map`(

`id` int,

`score` map<string,int>)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

COLLECTION ITEMS TERMINATED BY '|'

MAP KEYS TERMINATED BY ':'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://linux5:8020/user/hive/warehouse/t_map'

;

## Specify the data location when creating the table

create table if not exists t_map2(

id int,

score map<string,int>

)

row format delimited fields terminated by ','

collection items terminated by '|'

map keys terminated by ':'

stored as textfile

location '/test'

;

## Drop a table

drop table test2;

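==================== Exercise: using struct, creating external tables, comparing managed and external tables =====================

## Create table t_struct, mapped to the data file struct.txt (an external table, created with the EXTERNAL keyword and an explicit data location)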

create external table if not exists t_struct(

id int,

grade struct<score:int,desc:string,point:string>

)

row format delimited fields terminated by ','

collection items terminated by '|'

location '/external'

;

## Query rows with score > 90

select * from t_struct where grade.score>90;

## Create external table t_struct1

create external table if not exists t_struct1(

id int,

grade struct<score:int,desc:string,point:string>

)

row format delimited fields terminated by ','

collection items terminated by '|'

;

## Append data with INSERT INTO

insert into table t_struct1 select * from t_struct;

## Drop the table: only the metadata is deleted; the data files remain on HDFS

drop table t_struct;

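3. Loading data into Hive tables:

Loading copies the data file into the table's directory (if the source is a path on HDFS, the file is moved rather than copied).

## LOAD from the local file system: the data is copied into the table's HDFS directory

# Append

load data local inpath 'local data path' into table tablename

# Overwrite

load data local inpath 'local data path' overwrite into table tablename

## LOAD from HDFS: the data is moved into the table's HDFS directory

# Append

load data inpath 'HDFS data path' into table tablename

# Overwrite

load data inpath 'HDFS data path' overwrite into table tablename

## Insert data into a table with a query

# Append

insert into table table1 select * from table2

# Overwrite

insert overwrite table table1 select * from table2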
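The difference between appending and overwriting through a query can be seen on two throwaway tables. This is a small sketch; t_src and t_dst are hypothetical names, and INSERT ... VALUES assumes Hive 0.14 or later.

```sql
-- Hypothetical source and target tables with the same two columns
CREATE TABLE IF NOT EXISTS t_src (id INT, name STRING);
CREATE TABLE IF NOT EXISTS t_dst (id INT, name STRING);

-- Seed the source table (INSERT ... VALUES needs Hive 0.14+)
INSERT INTO TABLE t_src VALUES (1, 'a'), (2, 'b');

-- Append: existing rows in t_dst are kept
INSERT INTO TABLE t_dst SELECT * FROM t_src;

-- Overwrite: t_dst is emptied first, then filled with the query result
INSERT OVERWRITE TABLE t_dst SELECT * FROM t_src WHERE id = 1;

SELECT * FROM t_dst;
```

After the overwrite, t_dst holds only the rows with id = 1, whatever it contained before.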

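4. Managed (internal) tables and external tables

Managed table: when you create a table in Hive, Hive manages the data by default. That is, Hive moves the data into its warehouse directory.

External table: the user controls the creation and deletion of the data. The location of the external data is normally specified when the table is created. With the EXTERNAL keyword, Hive knows that it does not manage the data, so it does not move the data into its own warehouse directory. In fact, when the table is defined Hive does not even check whether the external location exists. This is a very useful property, because it means you can postpone creating the data until after the table has been created.

Difference: dropping a managed table deletes the table together with its metadata and its data. Dropping an external table leaves the data untouched; Hive deletes only the metadata, not the data files.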
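The contrast can be reproduced with two tiny tables. This is a minimal sketch; the table names and the path '/tmp/external_demo' are hypothetical and chosen only for this illustration.

```sql
-- Managed table: its data lives under Hive's warehouse directory
CREATE TABLE IF NOT EXISTS t_managed_demo (id INT);

-- External table: Hive only records the location of the data
CREATE EXTERNAL TABLE IF NOT EXISTS t_external_demo (id INT)
LOCATION '/tmp/external_demo';

-- The "Table Type" line shows MANAGED_TABLE or EXTERNAL_TABLE
DESCRIBE FORMATTED t_managed_demo;
DESCRIBE FORMATTED t_external_demo;

-- Removes the metadata and the data files
DROP TABLE t_managed_demo;

-- Removes the metadata only; files under /tmp/external_demo remain
DROP TABLE t_external_demo;
```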

5. Modifying table properties

## Create table log2

CREATE external TABLE log2(

id string COMMENT 'this is id column',

phonenumber bigint,

mac string,

ip string,

url string,

status1 string,

status2 string,

up int,

down int,

code int,

dt String

)

COMMENT 'this is log table' -- add a table description

ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '

LINES TERMINATED BY '\n'

stored as textfile;

## Load data

load data local inpath '/home/data.log.txt' into table log2;

Rename a table: rename to

alter table old_name rename to new_name

alter table log2 rename to log4;

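Rename a column: change column

alter table table_name change column old_column_name new_column_name column_type [comment 'description'];

## Rename a column

alter table log4 change column ip myip String;

## Rename a column and add a column comment

alter table log4 change column myip ip String comment 'this is myip' ;

## Use the AFTER keyword to place the changed column after another column

alter table log4 change column myip ip String comment 'this is myip' after code;

## Use the FIRST keyword to move the changed column to the first position

alter table log4 change column ip myip int comment 'this is myip' first;

Add columns: add columns

## Add columns with ADD COLUMNS followed by parentheses listing the new columns and their optional comments; separate multiple columns with commas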

alter table log4 add columns(

x int comment 'this x',

y int

);

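Drop columns:

## Columns are dropped with REPLACE COLUMNS: the parentheses list the columns to keep, separated by commas, and every column not listed is removed from the table definition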

alter table log4 replace columns(x int,y int);

alter table log4 replace columns(

myip int,

id string,

phonenumber bigint,

mac string,

url string,

status1 string,

status2 string,

up int,

down int,

code int,

dt string

);

Convert a managed table to an external table (setting 'EXTERNAL' back to 'FALSE' makes it managed again):

alter table log4 set tblproperties(

'EXTERNAL' = 'TRUE'

);

alter table log4 set tblproperties(

'EXTERNAL' = 'false'

);

alter table log4 set tblproperties(

'EXTERNAL' = 'FALSE'

);
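After any of the ALTER TABLE statements in this section, the result can be checked from the metadata. This is a short sketch, assuming the table log4 used above exists.

```sql
-- Column names, types and comments after CHANGE, ADD or REPLACE COLUMNS
DESCRIBE log4;

-- All table properties, or one property by name
SHOW TBLPROPERTIES log4;
SHOW TBLPROPERTIES log4('EXTERNAL');

-- The "Table Type" line reflects the current EXTERNAL setting
DESCRIBE FORMATTED log4;
```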
