Elasticsearch phrase suggester is giving me suggestions that do not exist in my index


I have an Elasticsearch index that contains some data. I implemented a did-you-mean feature so that when a user types something misspelled, they receive a suggestion with the right words.

I used the phrase suggester because I need suggestions for short phrases, names for example. The problem is that some of the suggestions do not exist in the index.

Example:

Document in index: coding like a master
Search: codning boss
Suggestion: <em>coding</em> boss
Search result: not found

My problem is that there is no phrase in my index that matches the specified suggestion, so it is recommending phrases that do not exist, and searching for them gives me no results.

What can I do about this? Shouldn't the phrase suggester only give suggestions for phrases that actually exist in the index?

Here I'll leave the corresponding query, mapping and settings in case you need them.

Settings and mappings

{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "search.slowlog.threshold.fetch.warn": "2s",
      "index.analysis.analyzer.default.filter.0": "standard",
      "index.analysis.analyzer.default.tokenizer": "standard",
      "index.analysis.analyzer.default.filter.1": "lowercase",
      "index.analysis.analyzer.default.filter.2": "asciifolding",
      "index.priority": 3,
      "analysis": {
        "analyzer": {
          "suggests_analyzer": {
            "tokenizer": "lowercase",
            "filter": [
              "lowercase",
              "asciifolding",
              "shingle_filter"
            ],
            "type": "custom"
          }
        },
        "filter": {
          "shingle_filter": {
            "min_shingle_size": 2,
            "max_shingle_size": 3,
            "type": "shingle"
          }
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "suggest_field": {
          "analyzer": "suggests_analyzer",
          "type": "string"
        }
      }
    }
  }
}

Query

{
  "didyoumean": {
    "text": "codning boss",
    "phrase": {
      "field": "suggest_field",
      "size": 1,
      "gram_size": 1,
      "confidence": 2.0
    }
  }
}

Thanks for your help.

This is actually expected. If you analyze your document with the analyze API, you will get a better picture of what is happening.

GET suggest_index/_analyze?text=coding like a master&analyzer=suggests_analyzer

This is the output

{
  "tokens": [
    { "token": "coding",        "start_offset": 0,  "end_offset": 6,  "type": "word",    "position": 1 },
    { "token": "coding like",   "start_offset": 0,  "end_offset": 11, "type": "shingle", "position": 1 },
    { "token": "coding like a", "start_offset": 0,  "end_offset": 13, "type": "shingle", "position": 1 },
    { "token": "like",          "start_offset": 7,  "end_offset": 11, "type": "word",    "position": 2 },
    { "token": "like a",        "start_offset": 7,  "end_offset": 13, "type": "shingle", "position": 2 },
    { "token": "like a master", "start_offset": 7,  "end_offset": 20, "type": "shingle", "position": 2 },
    { "token": "a",             "start_offset": 12, "end_offset": 13, "type": "word",    "position": 3 },
    { "token": "a master",      "start_offset": 12, "end_offset": 20, "type": "shingle", "position": 3 },
    { "token": "master",        "start_offset": 14, "end_offset": 20, "type": "word",    "position": 4 }
  ]
}

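As you can see, there is a token "coding" generated from the text, and hence it is in your index. The suggester is not suggesting something that is not in your index. If you strictly want to match whole phrases, you might want to consider using the keyword tokenizer. For example, if you change your mapping to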

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "suggests_analyzer": {
            "tokenizer": "lowercase",
            "filter": [
              "lowercase",
              "asciifolding",
              "shingle_filter"
            ],
            "type": "custom"
          },
          "raw_analyzer": {
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        },
        "filter": {
          "shingle_filter": {
            "min_shingle_size": 2,
            "max_shingle_size": 3,
            "type": "shingle"
          }
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "suggest_field": {
          "analyzer": "suggests_analyzer",
          "type": "string",
          "fields": {
            "raw": {
              "analyzer": "raw_analyzer",
              "type": "string"
            }
          }
        }
      }
    }
  }
}

then this query will give you the expected results

{
  "didyoumean": {
    "text": "codning lke master",
    "phrase": {
      "field": "suggest_field.raw",
      "size": 1,
      "gram_size": 1
    }
  }
}

It won't suggest anything for "codning boss".
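To make the difference between the two analyzers concrete, here is a rough Python sketch that imitates what they put into the index for the document shown in the analyze output above. It is only an approximation written for illustration: the function names are made up, asciifolding is skipped, and it is not how Elasticsearch is implemented internally. It only reproduces the token list.

```python
def lowercase_tokenize(text):
    """Split on non-letters and lowercase, roughly like the `lowercase` tokenizer."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def shingles(tokens, min_size=2, max_size=3):
    """Emit each single word plus the word n-grams that start at it."""
    out = []
    for i in range(len(tokens)):
        out.append(tokens[i])  # the shingle filter also outputs unigrams by default
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


def keyword_analyze(text):
    """The keyword tokenizer keeps the whole input as one single term."""
    return [text.lower()]


doc = "coding like a master"
shingled_terms = shingles(lowercase_tokenize(doc))  # imitates suggests_analyzer
raw_terms = keyword_analyze(doc)                    # imitates raw_analyzer

print(shingled_terms)
print("coding" in shingled_terms)  # True: "coding" alone is an indexed term
print("coding" in raw_terms)       # False: only the full phrase is a term
```

With the shingled field, "coding" is a term of its own, so correcting "codning" to "coding" is a valid suggestion even though "coding boss" never occurs as a phrase. With the keyword-based field, the only term is the complete phrase.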

Edit 1

From your comments, and after running some phrase suggestions on my own dataset, I feel a better approach would be to use the collate option that the phrase suggester provides, so that every suggestion can be checked against a query and is returned only if it is going to match a document in the index. I have also added stemmers to the mapping so that only the root word is considered. I am using light_english because it is less aggressive.

The analyzer part of the mapping now looks like this

"analysis": {
  "analyzer": {
    "suggests_analyzer": {
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "english_possessive_stemmer",
        "light_english_stemmer",
        "asciifolding",
        "shingle_filter"
      ],
      "type": "custom"
    }
  },
  "filter": {
    "light_english_stemmer": {
      "type": "stemmer",
      "language": "light_english"
    },
    "english_possessive_stemmer": {
      "type": "stemmer",
      "language": "possessive_english"
    },
    "shingle_filter": {
      "min_shingle_size": 2,
      "max_shingle_size": 4,
      "type": "shingle"
    }
  }
}

Now this query will give you the desired results.

{
  "suggest": {
    "text": "appel on tabel",
    "simple_phrase": {
      "phrase": {
        "field": "suggest_field",
        "size": 5,
        "collate": {
          "query": {
            "inline": {
              "match_phrase": {
                "{{field_name}}": "{{suggestion}}"
              }
            }
          },
          "params": { "field_name": "suggest_field" },
          "prune": false
        }
      }
    }
  },
  "size": 0
}

This will give back "apple on table". Here a match_phrase query is used, which runs every suggested phrase against the index. You can set "prune": true to see all the results that have been suggested, regardless of whether they match. You might also want to consider using a stop filter to avoid stopwords.

Hope this helps!
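If it helps to picture what collate does, here is a small Python simulation of the idea. It is only a sketch under assumptions: the documents and candidate suggestions are made-up sample data, `collate` and `contains_phrase` are hypothetical helper names, the phrase check is a plain whitespace-token comparison (no stemming or analysis), and none of this is Elasticsearch client code. The one thing taken from the real response is the `collate_match` flag that is attached to each suggestion when prune is enabled.

```python
def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur consecutively in doc_tokens (a crude match_phrase)."""
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def collate(suggestions, documents, prune=False):
    """Check every suggestion against the documents.

    prune=False: keep only suggestions that match at least one document.
    prune=True:  keep all suggestions and flag each one with collate_match.
    """
    results = []
    for suggestion in suggestions:
        phrase = suggestion.lower().split()
        matched = any(
            contains_phrase(doc.lower().split(), phrase) for doc in documents
        )
        if prune:
            results.append({"text": suggestion, "collate_match": matched})
        elif matched:
            results.append({"text": suggestion})
    return results


# Hypothetical index contents and suggester candidates, for illustration only.
documents = ["apple on table", "coding like a master"]
candidates = ["apple on table", "coding boss"]

print(collate(candidates, documents))
print(collate(candidates, documents, prune=True))
```

In the first call "coding boss" is dropped because no document contains that phrase. In the second call it is kept but marked with collate_match set to False, so the application can decide what to do with it.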

