Converting a Unicode-Encoded String to a String of Letters

1. Overview

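In software development, we sometimes need to convert a string that contains Unicode escape sequences into a readable string of letters. This conversion is especially useful when we process data that comes from different sources.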

In this article, we'll explore how to convert such a Unicode-encoded string into a string of letters in Java.

2. Understanding Unicode Encoding

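Unicode is a universal character encoding standard that assigns every character a unique number, called a code point, regardless of platform or program. A Unicode escape sequence represents a character as "\uXXXX", where XXXX is the character's code point written as four hexadecimal digits. Since four hex digits only reach U+FFFF, a character beyond that range is written as two consecutive escapes that form a UTF-16 surrogate pair.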

For example, the string "\u0048\u0065\u006C\u006C\u006F World" uses Unicode escape sequences for the letters of "Hello" and represents the phrase "Hello World".
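To make the notation concrete, the following sketch goes in the opposite direction: it turns each character of a string into its escape sequence with String.format. It assumes the input contains only characters up to U+FFFF, where one Java char equals one code point; the class and method names are illustrative, not part of any library.

```java
public class UnicodeEscapeDemo {

    // Illustrative helper: writes every char of the input as a backslash,
    // the letter 'u', and four uppercase hex digits of its UTF-16 value
    static String toUnicodeEscapes(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            // The format string yields a literal backslash, then 'u', then %04X
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'H' is 0x48, 'e' is 0x65, 'l' is 0x6C, 'o' is 0x6F
        System.out.println(toUnicodeEscapes("Hello"));
    }
}
```

Running main prints \u0048\u0065\u006C\u006C\u006F, which is exactly the escaped form of "Hello" in the example string.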

3. Using Apache Commons Text

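The Apache Commons Text library provides a reliable utility class, StringEscapeUtils, whose unescapeJava() method decodes Unicode escape sequences in a string (along with other Java escapes such as \n and \t):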

import org.apache.commons.text.StringEscapeUtils;

String encodedString = "\\u0048\\u0065\\u006C\\u006C\\u006F World";
String expectedDecodedString = "Hello World";
assertEquals(expectedDecodedString, StringEscapeUtils.unescapeJava(encodedString));
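The test writes "\\u0048" rather than "\u0048" on purpose. The Java compiler translates Unicode escapes in source code before anything else, so "\u0048" in source is already the one-character string "H" at runtime and there would be nothing left to decode. Doubling the backslash keeps the six characters of the escape sequence in the string. A minimal sketch of the difference (the class name is illustrative):

```java
public class SourceVsRuntimeEscapes {
    public static void main(String[] args) {
        // Translated by the compiler: this literal is the single character H
        String compiled = "\u0048";
        // Escaped backslash: the runtime text keeps all six characters
        String literal = "\\u0048";

        System.out.println(compiled.length());     // 1
        System.out.println(literal.length());      // 6
        System.out.println(compiled.equals("H"));  // true
    }
}
```

Only strings of the second kind, where the escape survives as text at runtime, need unescapeJava() or the plain-Java decoder.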

4. Using Plain Java

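Alternatively, we can use the Pattern and Matcher classes from the java.util.regex package to find all Unicode escape sequences in the input string, and then replace each one with the character it represents (the StringBuilder overloads of appendReplacement() and appendTail() require Java 9 or later):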

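import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String decodeWithPlainJava(String input) {
    // Matches a literal backslash, 'u', and exactly four hex digits
    Pattern pattern = Pattern.compile("\\\\u[0-9a-fA-F]{4}");
    Matcher matcher = pattern.matcher(input);

    StringBuilder decodedString = new StringBuilder();

    while (matcher.find()) {
        String unicodeSequence = matcher.group();
        // Drop the leading backslash and 'u', then parse the four hex digits
        char unicodeChar = (char) Integer.parseInt(unicodeSequence.substring(2), 16);
        // quoteReplacement keeps a decoded '$' or backslash from being
        // read as a group reference or escape in the replacement string
        matcher.appendReplacement(decodedString, Matcher.quoteReplacement(Character.toString(unicodeChar)));
    }

    // Copy the remaining text after the last match
    matcher.appendTail(decodedString);
    return decodedString.toString();
}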

The regular expression breaks down as follows:

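\\\\u: in the Java string literal this becomes the regex \\u, which matches the literal characters "\u".
[0-9a-fA-F]: matches any valid hexadecimal digit, in upper or lower case.
{4}: matches exactly four such hexadecimal digits in a row.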
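To see which substrings the pattern picks out, the sketch below runs the same regular expression on its own and collects every match. The input string and names are made up for illustration; the input contains one valid escape, one with too few digits, and one with lowercase digits followed by an extra letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeMatches {

    // Same pattern as decodeWithPlainJava: backslash, 'u', four hex digits
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u[0-9a-fA-F]{4}");

    // Collects every substring the pattern matches, in order of appearance
    static List<String> findEscapes(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = UNICODE_ESCAPE.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // A valid escape, a too-short escape, and a lowercase escape followed by Z
        String input = "A=\\u0041, short=\\u12, mixed=\\u00e9Z";
        System.out.println(findEscapes(input));
    }
}
```

Running main prints [\u0041, \u00e9]: the sequence \u12 is skipped because {4} demands four hex digits, lowercase digits are accepted, and the trailing Z is simply left outside the match.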

For example, let's decode the following string:

String encodedString = "Hello \\u0057\\u006F\\u0072\\u006C\\u0064";
String expectedDecodedString = "Hello World";
assertEquals(expectedDecodedString, decodeWithPlainJava(encodedString));
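On Java 9 or later, the explicit find() loop can also be expressed with Matcher.replaceAll(Function<MatchResult, String>), which computes each replacement from the current match. The sketch below is an alternative formulation rather than the method shown above, and the class and method names are illustrative. Each decoded character still goes through Matcher.quoteReplacement, because the returned string is treated as a replacement string: without it, an escape such as \u0024 ($) would make the call fail with an IllegalArgumentException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeDecoder {

    // Matches a literal backslash, 'u', and exactly four hex digits
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u[0-9a-fA-F]{4}");

    // Java 9+: replaceAll(Function) builds each replacement from the match
    static String decode(String input) {
        Matcher matcher = UNICODE_ESCAPE.matcher(input);
        return matcher.replaceAll(match -> {
            // Skip the backslash and 'u', then parse the hex digits
            char decoded = (char) Integer.parseInt(match.group().substring(2), 16);
            // Quote so that '$' or a backslash is inserted literally
            return Matcher.quoteReplacement(String.valueOf(decoded));
        });
    }

    public static void main(String[] args) {
        System.out.println(decode("Hello \\u0057\\u006F\\u0072\\u006C\\u0064")); // Hello World
        System.out.println(decode("Price: \\u0024 5"));                          // Price: $ 5
    }
}
```

Compiling the pattern once into a static field also avoids recompiling it on every call.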

5. Conclusion

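In this tutorial, we explored two ways to convert a Unicode-encoded string into a string of letters in Java: StringEscapeUtils.unescapeJava() from Apache Commons Text, and a plain-Java approach based on Pattern and Matcher.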

The sample code from this article is available on GitHub.

This work is original or translated and is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 International license.