Let's use Hive to create a database, load data, and run queries.
You can use ordinary SQL much as you would with an RDB.
[hdspark@node01 ~]$ hive
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> create database mydb;
OK
Time taken: 1.786 seconds
hive> use mydb;
OK
Time taken: 1.277 seconds
hive> create table uryo (time string,text string,uryo int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
OK
Time taken: 0.683 seconds
hive> desc uryo;
OK
time string
text string
uryo int
Time taken: 0.116 seconds, Fetched: 3 row(s)
First we create a database, then a table named uryo.
hive> LOAD DATA LOCAL INPATH "/tmp/Tokyo_Suii_20151215.csv" INTO TABLE uryo;
Loading data to table mydb.uryo
Table mydb.uryo stats: [numFiles=1, totalSize=248952]
OK
Time taken: 2.431 seconds
東京都水防災総合情報システム (Tokyo Metropolitan Flood Control Information System)
http://www.kasen-suibo.metro.tokyo.jp/im/other/tsim0110g.html
We loaded the CSV file downloadable from this site into the table created above.
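The table was defined with `FIELDS TERMINATED BY ','`, so each CSV line is expected to split into the three columns `time`, `text`, `uryo`. A minimal sketch of that layout (the sample rows here are hypothetical, not taken from the actual file):

```python
import csv
import io

# Hypothetical rows in the three-column layout the uryo table expects:
# time (observation time), text (station name), uryo (rainfall, int).
sample = "10:10,新河岸橋,3\n10:20,新河岸橋,0\n"

rows = list(csv.reader(io.StringIO(sample)))
for time, text, uryo in rows:
    # Hive will parse the third column as INT; emulate that here.
    print(time, text, int(uryo))
```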
hive> SELECT text,sum(uryo) FROM uryo WHERE text LIKE "新%" GROUP BY text ORDER BY text;
Query ID = hdspark_20151218010651_89554c13-68ab-4cd8-b321-38cda9d34918
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-18 01:06:53,902 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1762820536_0001
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-18 01:06:55,285 Stage-2 map = 100%, reduce = 100%
Ended Job = job_local1968841378_0002
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 596976 HDFS Write: 596976 SUCCESS
Stage-Stage-2: HDFS Read: 596976 HDFS Write: 596976 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
新橋戸橋 4896
新河岸橋 49109
新田橋 10517
新鶴見橋 7488
Time taken: 4.113 seconds, Fetched: 4 row(s)
SELECT, aggregate functions, and the LIKE predicate are all available.
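For comparison, the filtered GROUP BY query above can be mimicked in plain Python; the in-memory rows here are hypothetical, while the real query reads the CSV-backed table:

```python
from collections import defaultdict

# Hypothetical (time, text, uryo) rows.
rows = [
    ("10:10", "新田橋", 5),
    ("10:20", "新田橋", 7),
    ("10:10", "新河岸橋", 3),
    ("10:10", "渋谷", 1),  # excluded: name does not start with 新
]

totals = defaultdict(int)
for _, text, uryo in rows:
    if text.startswith("新"):   # WHERE text LIKE '新%'
        totals[text] += uryo    # SUM(uryo) ... GROUP BY text

for text in sorted(totals):     # ORDER BY text
    print(text, totals[text])
```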
hive> SELECT COUNT(*) FROM uryo;
・
・
・
OK
30240
Time taken: 4.848 seconds, Fetched: 1 row(s)
hive> SELECT CONCAT(SUBSTR(time,1,2),':00') AS hour,SUM(uryo)
> FROM uryo
> WHERE text LIKE '渋谷%'
> GROUP BY CONCAT(SUBSTR(time,1,2),':00') ORDER BY hour;
・
・
・
00:00 177
01:00 174
02:00 173
03:00 169
04:00 173
05:00 173
06:00 168
07:00 173
08:00 178
09:00 180
10:00 180
11:00 180
12:00 180
13:00 180
14:00 180
15:00 180
16:00 155
17:00 142
18:00 175
19:00 180
20:00 180
21:00 179
22:00 180
23:00 180
Time taken: 6.688 seconds, Fetched: 24 row(s)
We have run a variety of SQL statements.
The last query uses an aggregate function to compute the total rainfall for each hour of the day.
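The hourly bucketing in that last query can be sketched in Python. Note that Hive's `SUBSTR(time,1,2)` is 1-indexed, which corresponds to `time[:2]` in Python; the sample rows are hypothetical:

```python
from collections import defaultdict

# Hypothetical (time, text, uryo) rows with times like "13:40".
rows = [
    ("13:40", "渋谷", 2),
    ("13:50", "渋谷", 1),
    ("14:00", "渋谷", 4),
    ("13:40", "新田橋", 9),  # excluded by LIKE '渋谷%'
]

hourly = defaultdict(int)
for time, text, uryo in rows:
    if text.startswith("渋谷"):     # WHERE text LIKE '渋谷%'
        hour = time[:2] + ":00"     # CONCAT(SUBSTR(time,1,2),':00')
        hourly[hour] += uryo        # SUM(uryo) ... GROUP BY hour

for hour in sorted(hourly):         # ORDER BY hour
    print(hour, hourly[hour])
```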