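I need to implement the following feature on the backend: a user types a query and gets back hits as well as statistics for those hits. Below is a simplified example.
Suppose the query is Grif. The user then gets back the matching terms (random words, just for example), along with each term's frequency and the number of documents it occurs in.
I'm new to Elasticsearch, so I'm not sure where to start implementing something like this. Which type of query is best suited for this? What can I use to get this kind of statistics? Any other suggestions would also be appreciated.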
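There are several layers to this. You need partial (n-gram) matching, a way to group the hits by their original terms, and a way to look up the term and document frequencies for those terms.
Start by creating an index with an n-gram analyzer. The content field gets two sub-fields: .analyzed, which uses that analyzer, and .keyword, which is used for grouping later on.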
PUT my-index
{
  "settings": {
    "index": {
      "max_ngram_diff": 20
    },
    "analysis": {
      "tokenizer": {
        "my_ngrams": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "analyzer": {
        "my_ngrams_analyzer": {
          "tokenizer": "my_ngrams",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "fields": {
          "analyzed": {
            "type": "text",
            "analyzer": "my_ngrams_analyzer"
          },
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
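To get a feel for what this analyzer indexes, here is a rough pure-Python approximation of it. This is only a sketch: the function name ngrams is made up for illustration, and str.isalnum() stands in for the tokenizer's letter and digit character classes, so edge cases may differ from what Elasticsearch produces. Note that max_gram - min_gram is 17 here, which is why the index setting max_ngram_diff has to be raised above its default of 1.

```python
def ngrams(text, min_gram=3, max_gram=20):
    """Approximate my_ngrams_analyzer: n-grams over letter/digit runs, lowercased."""
    # Split the input into runs of letters/digits (token_chars: letter, digit).
    runs, current = [], []
    for ch in text:
        if ch.isalnum():
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))

    # Emit every substring of length min_gram..max_gram, lowercased.
    tokens = []
    for run in runs:
        run = run.lower()
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(run):
                    break
                tokens.append(run[start:start + size])
    return tokens


print(ngrams("Grif"))  # ['gri', 'grif', 'rif']
```

Every indexed word that contains the lowercase substring grif (Griffith, Griffin, Grift, and so on) ends up with a grif token in content.analyzed, which is what makes the partial match possible.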
Next, bulk-insert some sample documents with text in the content field. Note that each document also has an _id; these will be needed later.
POST _bulk
{"index":{"_index":"my-index", "_id":1}}
{"content":"Griffith"}
{"index":{"_index":"my-index", "_id":2}}
{"content":"Griffin"}
{"index":{"_index":"my-index", "_id":3}}
{"content":"Grif"}
{"index":{"_index":"my-index", "_id":4}}
{"content":"Grift"}
{"index":{"_index":"my-index", "_id":5}}
{"content":"Griffins"}
{"index":{"_index":"my-index", "_id":6}}
{"content":"Griffith"}
{"index":{"_index":"my-index", "_id":7}}
{"content":"Griffins"}
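If the documents come from application code rather than the console, the same bulk payload can be generated programmatically. The sketch below only builds the newline-delimited JSON body and does not send it; the helper name bulk_body and the sequential numeric IDs are assumptions made for this example.

```python
import json


def bulk_body(index, contents):
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, content in enumerate(contents, start=1):
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"content": content}))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"


words = ["Griffith", "Griffin", "Grif", "Grift", "Griffins", "Griffith", "Griffins"]
print(bulk_body("my-index", words))
```

The resulting string can be sent to the _bulk endpoint with the Content-Type header set to application/x-ndjson.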
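Now search for the n-gram in the content.analyzed field and group the matching documents by their original terms with a terms aggregation. At the same time, retrieve the _id of one document per bucket with a top_hits aggregation. It does not matter which _id is returned for a given bucket, since every document in a bucket contains the same term.
POST my-index/_search?filter_path=aggregations.*.buckets.key,aggregations.*.buckets.doc_count,aggregations.*.buckets.*.hits.hits._id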
{
  "size": 0,
  "query": {
    "term": {
      "content.analyzed": "grif"
    }
  },
  "aggs": {
    "full_terms": {
      "terms": {
        "field": "content.keyword",
        "size": 10
      },
      "aggs": {
        "top_doc": {
          "top_hits": {
            "size": 1,
            "_source": false
          }
        }
      }
    }
  }
}
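One detail worth spelling out: a term query is not analyzed, so the user's input has to be lowercased by the caller to match the lowercased n-grams in the index. A minimal sketch of building this request body from user input follows; the function name build_search_body is hypothetical, and the code only constructs the dictionary without contacting a cluster.

```python
def build_search_body(user_query, size=10):
    """Build the search body for a partial match grouped by original term."""
    return {
        "size": 0,
        "query": {
            # term queries skip analysis, so lowercase here to match the index.
            "term": {"content.analyzed": user_query.lower()}
        },
        "aggs": {
            "full_terms": {
                "terms": {"field": "content.keyword", "size": size},
                "aggs": {
                    "top_doc": {"top_hits": {"size": 1, "_source": False}}
                },
            }
        },
    }


body = build_search_body("Grif")
print(body["query"])  # {'term': {'content.analyzed': 'grif'}}
```

With the analyzer settings shown earlier, inputs shorter than 3 or longer than 20 characters never equal an indexed n-gram, so they would need separate handling.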
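The filter_path URL parameter in the previous request reduces the response to just the attributes we need: the untouched, original full_terms plus one of the underlying IDs per bucket:
{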
  "aggregations" : {
    "full_terms" : {
      "buckets" : [
        {
          "key" : "Griffins",
          "doc_count" : 2,
          "top_doc" : {
            "hits" : {
              "hits" : [
                {
                  "_id" : "5"
                }
              ]
            }
          }
        },
        {
          "key" : "Griffith",
          "doc_count" : 2,
          "top_doc" : {
            "hits" : {
              "hits" : [
                {
                  "_id" : "1"
                }
              ]
            }
          }
        },
        {
          "key" : "Grif",
          "doc_count" : 1,
          "top_doc" : {
            "hits" : {
              "hits" : [
                {
                  "_id" : "3"
                }
              ]
            }
          }
        },
        {
          "key" : "Griffin",
          "doc_count" : 1,
          "top_doc" : {
            "hits" : {
              "hits" : [
                {
                  "_id" : "2"
                }
              ]
            }
          }
        },
        {
          "key" : "Grift",
          "doc_count" : 1,
          "top_doc" : {
            "hits" : {
              "hits" : [
                {
                  "_id" : "4"
                }
              ]
            }
          }
        }
      ]
    }
  }
}
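On the application side, the bucket list can be flattened into simple rows before the next step. The sketch below assumes the response has been parsed into a Python dictionary with exactly the shape shown above; extract_buckets is a hypothetical helper name, and the sample is trimmed to two buckets.

```python
def extract_buckets(response):
    """Flatten the full_terms buckets into (term, doc_count, doc_id) rows."""
    rows = []
    for bucket in response["aggregations"]["full_terms"]["buckets"]:
        rows.append({
            "term": bucket["key"],
            "doc_count": bucket["doc_count"],
            # top_hits was requested with size 1, so there is one hit per bucket.
            "doc_id": bucket["top_doc"]["hits"]["hits"][0]["_id"],
        })
    return rows


sample = {
    "aggregations": {
        "full_terms": {
            "buckets": [
                {"key": "Griffins", "doc_count": 2,
                 "top_doc": {"hits": {"hits": [{"_id": "5"}]}}},
                {"key": "Grif", "doc_count": 1,
                 "top_doc": {"hits": {"hits": [{"_id": "3"}]}}},
            ]
        }
    }
}
print(extract_buckets(sample))
```

The doc_id values collected here are the ones to pass on when requesting term statistics.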
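There is a dedicated Elasticsearch API called Term Vectors that does exactly what you need: it retrieves field and term statistics from the whole index. To hand you these statistics it needs document IDs, and those are the IDs you obtained from the aggregation above.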
Since there are several IDs, the request below goes to the _mtermvectors endpoint, and filter_path again condenses the response. The IDs are listed in the same order as the buckets above, which guarantees the order of the response:
POST /my-index/_mtermvectors?filter_path=docs.term_vectors.*.*.*.doc_freq,docs.term_vectors.*.*.*.term_freq
{
  "docs": [
    {
      "_id": "5",
      "fields": [
        "content.keyword"
      ],
      "payloads": false,
      "positions": false,
      "offsets": false,
      "field_statistics": false,
      "term_statistics": true
    },
    {
      "_id": "1",
      "fields": [
        "content.keyword"
      ],
      "payloads": false,
      "positions": false,
      "offsets": false,
      "field_statistics": false,
      "term_statistics": true
    },
    {
      "_id": "3",
      "fields": [
        "content.keyword"
      ],
      "payloads": false,
      "positions": false,
      "offsets": false,
      "field_statistics": false,
      "term_statistics": true
    },
    {
      "_id": "2",
      "fields": [
        "content.keyword"
      ],
      "payloads": false,
      "positions": false,
      "offsets": false,
      "field_statistics": false,
      "term_statistics": true
    },
    {
      "_id": "4",
      "fields": [
        "content.keyword"
      ],
      "payloads": false,
      "positions": false,
      "offsets": false,
      "field_statistics": false,
      "term_statistics": true
    }
  ]
}
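Because every entry in docs repeats the same options, the body is easy to generate from a list of IDs. This is a minimal sketch that only builds the dictionary; build_mtermvectors_body is a hypothetical name, and the ID list is the one taken from the buckets in this example.

```python
def build_mtermvectors_body(doc_ids, field="content.keyword"):
    """Build an _mtermvectors body requesting term statistics for each ID."""
    return {
        "docs": [
            {
                "_id": doc_id,
                "fields": [field],
                "payloads": False,
                "positions": False,
                "offsets": False,
                "field_statistics": False,
                "term_statistics": True,
            }
            # Keep the caller's order so results line up with the buckets.
            for doc_id in doc_ids
        ]
    }


body = build_mtermvectors_body(["5", "1", "3", "2", "4"])
print([doc["_id"] for doc in body["docs"]])  # ['5', '1', '3', '2', '4']
```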
The response contains A) the full original term, B) the number of documents that term occurs in (doc_freq), and C) the term frequency (term_freq):
{
  "docs" : [
    {
      "term_vectors" : {
        "content.keyword" : {
          "terms" : {
            "Griffins" : {
              "doc_freq" : 2,
              "term_freq" : 1
            }
          }
        }
      }
    },
    {
      "term_vectors" : {
        "content.keyword" : {
          "terms" : {
            "Griffith" : {
              "doc_freq" : 2,
              "term_freq" : 1
            }
          }
        }
      }
    },
    {
      "term_vectors" : {
        "content.keyword" : {
          "terms" : {
            "Grif" : {
              "doc_freq" : 1,
              "term_freq" : 1
            }
          }
        }
      }
    },
    {
      "term_vectors" : {
        "content.keyword" : {
          "terms" : {
            "Griffin" : {
              "doc_freq" : 1,
              "term_freq" : 1
            }
          }
        }
      }
    },
    {
      "term_vectors" : {
        "content.keyword" : {
          "terms" : {
            "Grift" : {
              "doc_freq" : 1,
              "term_freq" : 1
            }
          }
        }
      }
    }
  ]
}
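To turn this into the list the question asks for, the nested response can be flattened into one row per term. The sketch below assumes the response has been parsed into a dictionary shaped like the one above; term_stats is a hypothetical helper name and the sample is shortened to two documents. Keep in mind that term_freq is the frequency of the term within that one document, while doc_freq counts documents across the index.

```python
def term_stats(mtv_response, field="content.keyword"):
    """Flatten an _mtermvectors response into per-term statistics rows."""
    stats = []
    for doc in mtv_response["docs"]:
        terms = doc["term_vectors"][field]["terms"]
        for term, info in terms.items():
            stats.append({
                "term": term,
                "doc_freq": info["doc_freq"],    # documents containing the term
                "term_freq": info["term_freq"],  # occurrences within this document
            })
    return stats


sample = {
    "docs": [
        {"term_vectors": {"content.keyword": {"terms": {
            "Griffins": {"doc_freq": 2, "term_freq": 1}}}}},
        {"term_vectors": {"content.keyword": {"terms": {
            "Grif": {"doc_freq": 1, "term_freq": 1}}}}},
    ]
}
for row in term_stats(sample):
    print(row)
```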
Shameless plug: if you are new to Elasticsearch and, like me, learn best from practical examples, consider buying the Elasticsearch Handbook.