HiveQL Select Group By

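This chapter explains the GROUP BY clause of the SELECT statement. The GROUP BY clause groups the rows of a result set by the values of one or more columns. It is typically combined with an aggregate function such as count(*), so that the query returns one row per group.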

Syntax

The syntax of the GROUP BY clause is as follows:

SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table_reference [WHERE where_condition] [GROUP BY col_list] [HAVING having_condition] [ORDER BY col_list] [LIMIT number];

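Example

Let us take an example of the SELECT ... GROUP BY clause. Assume the employee table has the fields Id, Name, Salary, Designation, and Dept, as shown below. We want a query that retrieves the number of employees in each department.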

+------+--------------+--------+-------------------+-------+
| ID   | Name         | Salary | Designation       | Dept  |
+------+--------------+--------+-------------------+-------+
| 1201 | Gopal        | 45000  | Technical manager | TP    |
| 1202 | Manisha      | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali  | 40000  | Technical writer  | TP    |
| 1204 | Krian        | 45000  | Proofreader       | PR    |
| 1205 | Kranthi      | 30000  | Op Admin          | Admin |
+------+--------------+--------+-------------------+-------+

The following query retrieves the number of employees in each department for the scenario above.

hive> SELECT Dept, count(*) FROM employee GROUP BY Dept;

On successful execution of the query, you get to see the following response:

+-------+----------+
| Dept  | Count(*) |
+-------+----------+
| Admin | 1        |
| PR    | 2        |
| TP    | 2        |
+-------+----------+
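To make the grouping step concrete, here is a minimal in-memory sketch in Java of what the query computes. It is not how Hive executes the query. The class name GroupByCountSketch, the Employee holder, and the helper methods are hypothetical names introduced only for this illustration, and the rows are copied from the sample employee table. Counting those rows by department gives one row for Admin, two for PR, and two for TP.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupByCountSketch {

    // Hypothetical in-memory stand-in for one row of the employee table.
    public static class Employee {
        final int id;
        final String name;
        final int salary;
        final String designation;
        final String dept;

        Employee(int id, String name, int salary, String designation, String dept) {
            this.id = id;
            this.name = name;
            this.salary = salary;
            this.designation = designation;
            this.dept = dept;
        }
    }

    // The five sample rows from the employee table in this chapter.
    public static List<Employee> sampleRows() {
        return Arrays.asList(
                new Employee(1201, "Gopal", 45000, "Technical manager", "TP"),
                new Employee(1202, "Manisha", 45000, "Proofreader", "PR"),
                new Employee(1203, "Masthanvali", 40000, "Technical writer", "TP"),
                new Employee(1204, "Krian", 45000, "Proofreader", "PR"),
                new Employee(1205, "Kranthi", 30000, "Op Admin", "Admin"));
    }

    // Mirrors SELECT Dept, count(*) FROM employee GROUP BY Dept:
    // rows are bucketed by dept, then each bucket is reduced to its size.
    // A TreeMap is used only so the printed order is alphabetical by dept.
    public static Map<String, Long> countByDept(List<Employee> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(e -> e.dept, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        countByDept(sampleRows())
                .forEach((dept, count) -> System.out.println(dept + "\t" + count));
    }
}
```

The sorted map is a choice made for readable output in this sketch. A GROUP BY query without an ORDER BY clause does not promise any particular row order.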

JDBC Program

Given below is a JDBC program that runs the GROUP BY example above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveQLGroupBy {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register driver and create driver instance
      // (Class.forName can throw ClassNotFoundException, hence the throws clause)
      Class.forName(driverName);

      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // create statement
      Statement stmt = con.createStatement();

      // execute statement
      // (the trailing semicolon is dropped; the Hive JDBC driver typically rejects it)
      ResultSet res = stmt.executeQuery("SELECT Dept, count(*) " + "FROM employee GROUP BY Dept");

      // print one line per department: name, then employee count
      System.out.println(" Dept \t count(*)");
      while (res.next()) {
         System.out.println(res.getString(1) + " " + res.getInt(2));
      }
      con.close();
   }
}

Save the program in a file named HiveQLGroupBy.java. Use the following commands to compile and run it.

$ javac HiveQLGroupBy.java
$ java HiveQLGroupBy

Output:

Dept Count(*)
Admin 1
PR 2
TP 2