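Elasticsearch "Not Contains" Query

1. Overview

When working with Elasticsearch, we often need to return only the documents in which a field does not contain a particular substring. Elasticsearch has no direct 'not contains' operator, but we can achieve this behavior in several ways. In this article, we'll explore the different approaches to implementing not contains behavior.

2. Index Setup

Before we start, let's run an Elasticsearch instance as usual. Next, we create an index to store the transaction logs: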

curl -X PUT "http://localhost:9200/transaction-logs" -H "Content-Type: application/json" -d'
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}'

Finally, let's prepare a few documents containing user transactions:

curl -X POST "http://localhost:9200/transaction-logs/_doc/1" \
  -H "Content-Type: application/json" \
  -d '{ "message": "User1 deposited 1000 AP1 points" }'

curl -X POST "http://localhost:9200/transaction-logs/_doc/2" \
  -H "Content-Type: application/json" \
  -d '{ "message": "User1 deposited 1000 AP2 points" }'

curl -X POST "http://localhost:9200/transaction-logs/_doc/3" \
  -H "Content-Type: application/json" \
  -d '{ "message": "User1 deposited 1000 AP3 points" }'

curl -X POST "http://localhost:9200/transaction-logs/_doc/4" \
  -H "Content-Type: application/json" \
  -d '{ "message": "User1 deposited 1000 PP1 points" }'

Now that we have an index populated with documents, we can start exploring the different ways to filter them.
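As an alternative to four separate requests, the same documents could be loaded in a single call to the _bulk endpoint. The following is a minimal sketch, not part of the original setup: it only builds the newline-delimited JSON payload, and the names MESSAGES and build_bulk_body are made up for this illustration.

```python
import json

# The four sample messages from this section, keyed by document id.
MESSAGES = {
    "1": "User1 deposited 1000 AP1 points",
    "2": "User1 deposited 1000 AP2 points",
    "3": "User1 deposited 1000 AP3 points",
    "4": "User1 deposited 1000 PP1 points",
}


def build_bulk_body(index, docs):
    """Build an NDJSON payload: one action line plus one source line per document."""
    lines = []
    for doc_id, message in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"message": message}))
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("transaction-logs", MESSAGES), end="")
```

Saved to a file such as bulk.ndjson (a hypothetical name), the output could then be sent with curl -X POST "http://localhost:9200/_bulk" -H "Content-Type: application/x-ndjson" --data-binary @bulk.ndjson.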

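3. Using Regular Expressions with `must_not`

Regular expressions give us flexible pattern matching that can handle complex exclusion cases. Let's query the transaction-logs index and include only the log messages that don't contain any value between AP2 and AP9: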

curl -X GET "http://localhost:9200/transaction-logs/_search" -H "Content-Type: application/json" -d'
{
  "query": {
    "bool": {
      "must_not": [
        { "regexp": { "message.keyword": ".*AP[2-9].*" } }
      ]
    }
  }
}'

We used the [regexp](https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-regexp-query) query to find all such occurrences and must_not to negate the match. In the response, we'll see:

{
  "hits": [
    {
      "_index": "transaction-logs",
      "_id": "1",
      "_score": 0.0,
      "_source": {
        "message": "User1 deposited 1000 AP1 points"
      }
    },
    {
      "_index": "transaction-logs",
      "_id": "4",
      "_score": 0.0,
      "_source": {
        "message": "User1 deposited 1000 PP1 points"
      }
    }
  ]
}

We should keep in mind that regular expression queries are expensive, so they're appropriate only when we have no other option.
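The pattern in this section is wrapped in .* on both sides because a regexp query has to match the entire term, and on the message.keyword sub-field the term is the whole message. The sketch below reproduces that behavior locally with Python's re.fullmatch. It is only an approximation, since the Lucene regular expression syntax differs from Python's in general (this particular pattern means the same in both), and the helper name not_matching is made up for the illustration.

```python
import re

# The four sample messages, keyed by document id.
MESSAGES = {
    "1": "User1 deposited 1000 AP1 points",
    "2": "User1 deposited 1000 AP2 points",
    "3": "User1 deposited 1000 AP3 points",
    "4": "User1 deposited 1000 PP1 points",
}


def not_matching(pattern, docs):
    """Return the ids of documents whose whole message does NOT match the pattern."""
    compiled = re.compile(pattern)
    # fullmatch mimics anchored matching: the pattern must cover the entire value.
    return [doc_id for doc_id, message in docs.items() if not compiled.fullmatch(message)]


if __name__ == "__main__":
    print(not_matching(r".*AP[2-9].*", MESSAGES))  # ['1', '4']
    # Without the surrounding .* nothing matches, so nothing is excluded.
    print(not_matching(r"AP[2-9]", MESSAGES))      # ['1', '2', '3', '4']
```

The second call shows why dropping the leading and trailing .* would silently return every document.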

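4. Using Wildcards with `must_not`

We can use a wildcard query, which is a more efficient way to exclude a substring. It comes with a limitation: we can't use the full regular expression syntax. However, we can still exclude a substring from the results. Let's query the index and try to exclude all transactions that carry an AP code: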

curl -X GET "http://localhost:9200/transaction-logs/_search" -H "Content-Type: application/json" -d'
{
  "query": {
    "bool": {
      "must_not": [
        { "wildcard": { "message.keyword": "*AP*" } }
      ]
    }
  }
}'

Here, we again use must_not to negate the wildcard query. The result is:

{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": "transaction-logs",
        "_id": "4",
        "_score": 0.0,
        "_source": {
          "message": "User1 deposited 1000 PP1 points"
        }
      }
    ]
  }
}

As expected, all AP transactions were filtered out.
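To get a feel for how the wildcard pattern behaves, we can imitate it locally. The sketch below uses Python's fnmatchcase as a stand-in for the wildcard query on the keyword sub-field: like that query with its default settings, it is case-sensitive and must match the whole value. It is an approximation only (fnmatch also understands [seq] classes, which the wildcard query does not), and the helper name exclude_wildcard is made up for the illustration.

```python
from fnmatch import fnmatchcase

# The four sample messages, keyed by document id.
MESSAGES = {
    "1": "User1 deposited 1000 AP1 points",
    "2": "User1 deposited 1000 AP2 points",
    "3": "User1 deposited 1000 AP3 points",
    "4": "User1 deposited 1000 PP1 points",
}


def exclude_wildcard(pattern, docs):
    """Return the ids of documents whose message does NOT match the wildcard pattern."""
    return [doc_id for doc_id, message in docs.items() if not fnmatchcase(message, pattern)]


if __name__ == "__main__":
    print(exclude_wildcard("*AP*", MESSAGES))  # ['4']
    # Matching is case-sensitive, so a lowercase pattern excludes nothing here.
    print(exclude_wildcard("*ap*", MESSAGES))  # ['1', '2', '3', '4']
```

The second call illustrates a common pitfall: on a keyword field the pattern has to use the same letter case as the stored value.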

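5. Using a Query String with `must_not`

We can also use the query string syntax with wildcards. In effect, we express the same wildcard query in a more compact request. Let's run the query to filter out the same AP transactions: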

curl -X GET "http://localhost:9200/transaction-logs/_search" -H "Content-Type: application/json" -d'
{
  "query": {
    "bool": {
      "must_not": [
        { "query_string": { "query": "message:*AP*" } }
      ]
    }
  }
}'

Here, we used the query_string syntax together with must_not. As expected, we see the same PP transaction log:

{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": "transaction-logs",
        "_id": "4",
        "_score": 0.0,
        "_source": {
          "message": "User1 deposited 1000 PP1 points"
        }
      }
    ]
  }
}

Wildcards are faster than regular expressions, but they're still relatively expensive operations and can run slowly.

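6. Using Match, `must_not`, and a Custom Analyzer

If we know the search terms in advance, we can implement the not-contains behavior in the most efficient way. When creating the index, we can specify a custom analyzer. In its properties, we can add a word delimiter filter or even define a custom tokenizer.

Let's recreate the transaction-logs index with the delimiter filter configured. Since an index with this name already exists, we have to delete it first and index the documents again afterwards: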

curl -X PUT "http://localhost:9200/transaction-logs" \
  -H "Content-Type: application/json" \
  -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "message_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "word_delimiter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "message_analyzer"
      }
    }
  }
}'

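With this configuration, the word AP1 is broken into the tokens ap and 1 (the unsplit ap1 is kept as well only if the filter's preserve_original option is enabled). Now we just query the index using must_not and match: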

curl -X GET "http://localhost:9200/transaction-logs/_search" \
  -H "Content-Type: application/json" \
  -d'
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "message": "AP" } }
      ]
    }
  }
}'

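In the response, we'll see that the same AP transactions have been filtered out. This query is more efficient than a regular expression or a wildcard. However, we should weigh the trade-off: the index configuration becomes more complex and heavier.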
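To see why a plain match on AP is enough once the analyzer splits the codes, here is a rough local imitation of the analysis chain. It is a simplified sketch, not the real word_delimiter filter: it splits on whitespace, lowercases, and then breaks each word into runs of letters and runs of digits. Whether the unsplit token is also emitted is not modeled, since it doesn't affect this query. The function names analyze and exclude_match are made up for the illustration.

```python
import re

# The four sample messages, keyed by document id.
MESSAGES = {
    "1": "User1 deposited 1000 AP1 points",
    "2": "User1 deposited 1000 AP2 points",
    "3": "User1 deposited 1000 AP3 points",
    "4": "User1 deposited 1000 PP1 points",
}


def analyze(text):
    """Whitespace split, lowercase, then split each word at letter/digit boundaries."""
    tokens = []
    for word in text.split():
        # Runs of letters and runs of digits become separate tokens.
        tokens.extend(re.findall(r"[a-z]+|[0-9]+", word.lower()))
    return tokens


def exclude_match(query, docs):
    """Return the ids of documents that share no token with the analyzed query."""
    query_tokens = set(analyze(query))
    return [doc_id for doc_id, message in docs.items()
            if not query_tokens & set(analyze(message))]


if __name__ == "__main__":
    print(analyze("AP1"))                 # ['ap', '1']
    print(exclude_match("AP", MESSAGES))  # ['4']
```

Because the query text goes through the same analysis as the documents, AP becomes ap and is compared against ready-made tokens. That exact-token comparison is what makes this approach cheaper than scanning terms with a pattern.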

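7. Conclusion

In this article, we reviewed the different ways of implementing not contains behavior in Elasticsearch. All of them rely on the must_not clause, which inverts the match condition. Each approach is a trade-off between the required functionality and performance.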

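When performance isn't a concern, we can use regexp to build the most flexible queries. On the other hand, we can make tokenization more elaborate and rely only on substrings known in advance, and in return we get faster queries.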
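Since every variant in this article differs only in the clause placed inside must_not, the request bodies can be generated from one small helper. This is a minimal sketch for illustration. The helper name not_contains is made up, and the code only builds the JSON bodies without sending them anywhere.

```python
import json


def not_contains(clause):
    """Wrap a query clause in bool/must_not so that matching documents are excluded."""
    return {"query": {"bool": {"must_not": [clause]}}}


# The four request bodies used in this article.
REGEXP = not_contains({"regexp": {"message.keyword": ".*AP[2-9].*"}})
WILDCARD = not_contains({"wildcard": {"message.keyword": "*AP*"}})
QUERY_STRING = not_contains({"query_string": {"query": "message:*AP*"}})
MATCH = not_contains({"match": {"message": "AP"}})

if __name__ == "__main__":
    for body in (REGEXP, WILDCARD, QUERY_STRING, MATCH):
        print(json.dumps(body))
```

Each printed line can be passed as the -d payload of the corresponding _search request.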

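As always, the code is available on GitHub.

This work is an original or a translation, licensed under the Attribution-NonCommercial-NoDerivatives 4.0 International license.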
