Elasticsearch phrase suggester returns suggestions that do not exist in my index
I have an Elasticsearch index with data in it. I implemented a did-you-mean feature so that when a user types a misspelled query, they receive a suggestion with the right words.
I used the phrase suggester because I need suggestions for short phrases, names for example. The problem is that some of the suggestions do not exist in the index.
Example:
Document in the index: coding like a master
Search: codning boss
Suggestion: <em>coding</em> boss
Search result: not found
My problem is that no phrase in the index matches the suggested phrase, so it is recommending phrases that do not exist, and the follow-up search returns nothing.
What can I do about this? Shouldn't the phrase suggester only suggest phrases that exist in the index?
I'll leave the corresponding query, mapping, and settings here in case you need them.
Settings and mappings:
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "search.slowlog.threshold.fetch.warn": "2s",
      "index.analysis.analyzer.default.filter.0": "standard",
      "index.analysis.analyzer.default.tokenizer": "standard",
      "index.analysis.analyzer.default.filter.1": "lowercase",
      "index.analysis.analyzer.default.filter.2": "asciifolding",
      "index.priority": 3,
      "analysis": {
        "analyzer": {
          "suggests_analyzer": {
            "tokenizer": "lowercase",
            "filter": [ "lowercase", "asciifolding", "shingle_filter" ],
            "type": "custom"
          }
        },
        "filter": {
          "shingle_filter": {
            "min_shingle_size": 2,
            "max_shingle_size": 3,
            "type": "shingle"
          }
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "suggest_field": {
          "analyzer": "suggests_analyzer",
          "type": "string"
        }
      }
    }
  }
}
Query:
{
  "didyoumean": {
    "text": "codning boss",
    "phrase": {
      "field": "suggest_field",
      "size": 1,
      "gram_size": 1,
      "confidence": 2.0
    }
  }
}
Thanks for your help.
This is expected, actually. If you analyze your document with the Analyze API, you will get a better picture of what is happening.
GET suggest_index/_analyze?text=coding like a master&analyzer=suggests_analyzer
This is the output:
{
  "tokens": [
    { "token": "coding", "start_offset": 0, "end_offset": 6, "type": "word", "position": 1 },
    { "token": "coding like", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 1 },
    { "token": "coding like a", "start_offset": 0, "end_offset": 13, "type": "shingle", "position": 1 },
    { "token": "like", "start_offset": 7, "end_offset": 11, "type": "word", "position": 2 },
    { "token": "like a", "start_offset": 7, "end_offset": 13, "type": "shingle", "position": 2 },
    { "token": "like a master", "start_offset": 7, "end_offset": 20, "type": "shingle", "position": 2 },
    { "token": "a", "start_offset": 12, "end_offset": 13, "type": "word", "position": 3 },
    { "token": "a master", "start_offset": 12, "end_offset": 20, "type": "shingle", "position": 3 },
    { "token": "master", "start_offset": 14, "end_offset": 20, "type": "word", "position": 4 }
  ]
}
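As you can see, the token "coding" is generated from the text and is therefore in the index, so the suggester is not suggesting anything that is missing from the index. By default it corrects individual terms; it does not check that the whole phrase occurs in a document. If you strictly want phrase search, you might want to consider using the keyword tokenizer. E.g., if you change the mapping to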
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "suggests_analyzer": {
            "tokenizer": "lowercase",
            "filter": [ "lowercase", "asciifolding", "shingle_filter" ],
            "type": "custom"
          },
          "raw_analyzer": {
            "tokenizer": "keyword",
            "filter": [ "lowercase", "asciifolding" ]
          }
        },
        "filter": {
          "shingle_filter": {
            "min_shingle_size": 2,
            "max_shingle_size": 3,
            "type": "shingle"
          }
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "suggest_field": {
          "analyzer": "suggests_analyzer",
          "type": "string",
          "fields": {
            "raw": {
              "analyzer": "raw_analyzer",
              "type": "string"
            }
          }
        }
      }
    }
  }
}
then this query will give you the expected results:
{
  "didyoumean": {
    "text": "codning lke master",
    "phrase": {
      "field": "suggest_field.raw",
      "size": 1,
      "gram_size": 1
    }
  }
}
It won't return a suggestion for "codning boss".
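To make the difference between the two analyzers concrete, here is a minimal Python sketch that imitates them in memory. It is only an approximation under stated assumptions: the function names are hypothetical, asciifolding is skipped, and nothing here calls Elasticsearch.

```python
import re

def shingles(words, min_size=2, max_size=3):
    """Emit each word followed by the 2- and 3-word shingles that start at it."""
    tokens = []
    for start, word in enumerate(words):
        tokens.append(word)  # unigrams are kept, as in the analyze output
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                tokens.append(" ".join(words[start:start + size]))
    return tokens

def suggests_analyzer(text):
    # Rough stand-in for the "lowercase" tokenizer plus shingle_filter.
    # asciifolding is skipped because the sample text is plain ASCII.
    return shingles(re.findall(r"[a-z]+", text.lower()))

def raw_analyzer(text):
    # Rough stand-in for the keyword tokenizer plus lowercase:
    # the whole field value becomes a single term.
    return [text.lower()]

doc = "coding like a master"
print(suggests_analyzer(doc))
print(raw_analyzer(doc))
```

With the shingle analyzer, "coding" is an indexed term on its own, so correcting "codning" to "coding" is legitimate from the suggester's point of view. With the raw analyzer, the only term is the full phrase, so there is no standalone "coding" to correct towards.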
EDIT 1
After the comments, and after running the phrase suggester on my own dataset, I feel a better approach is to use the collate option that the phrase suggester provides. With it, every suggestion is checked against a query and is only returned if it matches a document in the index. I have also added stemmers to the mapping so that only the root word is considered. I am using light_english, which is less aggressive.
The analyzer part of the mapping now looks like this:
"analysis": {
  "analyzer": {
    "suggests_analyzer": {
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "english_possessive_stemmer",
        "light_english_stemmer",
        "asciifolding",
        "shingle_filter"
      ],
      "type": "custom"
    }
  },
  "filter": {
    "light_english_stemmer": {
      "type": "stemmer",
      "language": "light_english"
    },
    "english_possessive_stemmer": {
      "type": "stemmer",
      "language": "possessive_english"
    },
    "shingle_filter": {
      "min_shingle_size": 2,
      "max_shingle_size": 4,
      "type": "shingle"
    }
  }
}
Now this query will give you the desired results:
{
  "suggest": {
    "text": "appel on tabel",
    "simple_phrase": {
      "phrase": {
        "field": "suggest_field",
        "size": 5,
        "collate": {
          "query": {
            "inline": {
              "match_phrase": {
                "{{field_name}}": "{{suggestion}}"
              }
            }
          },
          "params": { "field_name": "suggest_field" },
          "prune": false
        }
      }
    }
  },
  "size": 0
}
This will give you "apple on table". Here a match_phrase query is used, so every suggested phrase is run against the index. You can set "prune": true to see all the phrases that were suggested regardless of whether they match; each one then carries a collate_match flag. You might also want to consider using a stop filter to avoid stopwords.
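The effect of collate and prune can be pictured with a small in-memory sketch in Python. This is a simplified model, not how Elasticsearch implements it: the documents, the candidate list, and the helper names are all hypothetical, the phrase match is a plain contiguous-token check, and no stemming is applied.

```python
def tokenize(text):
    # Simplified analysis: lowercase and split on whitespace.
    return text.lower().split()

def phrase_matches(doc, phrase):
    """True if the phrase's tokens appear contiguously and in order in doc."""
    d, p = tokenize(doc), tokenize(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))

def collate(candidates, docs, prune=False):
    """Check each candidate phrase against the documents.

    prune=False: keep only candidates that match at least one document.
    prune=True:  keep every candidate and flag it with collate_match.
    """
    results = []
    for phrase in candidates:
        matched = any(phrase_matches(doc, phrase) for doc in docs)
        if prune:
            results.append({"text": phrase, "collate_match": matched})
        elif matched:
            results.append({"text": phrase})
    return results

# Hypothetical index contents and suggester candidates, for illustration only.
docs = ["apple on table", "coding like a master"]
candidates = ["apple on table", "appel on table", "coding boss"]

print(collate(candidates, docs))
print(collate(candidates, docs, prune=True))
```

In this model "coding boss" is dropped when prune is false, because no document contains that phrase, which is the behaviour the original question was asking for.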
Hope this helps!