improve highlighting/analyzers
When searching for a term (e.g. a variable name), it doesn’t highlight occurrences that are followed by a dot.
For example, it highlights ‘out’ in ‘var out;’, but it does not highlight it in ‘out.print(q);’. We probably need to change the analyzers for this to work, if that is possible at all.
Submitted by Elmer van Chastelet on 15 February 2012 at 10:44
Issue Log
To be investigated: use of the shingle filter or the word delimiter filter (both are token filters), possibly in combination with token expansion by PatternReplaceFilter.
I took another approach, now using phrase queries. All non-whitespace characters are preserved in the index, and sequences matching the regex:
([a-zA-Z_]\\w*)|\\d+|[!-/:-@\\[-`{-~]
are treated as single tokens.
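This is a minimal sketch of how that pattern splits source text. It uses plain java.util.regex, not whatever Lucene analyzer reposearch actually uses, and the class name CodeTokenizer is hypothetical. The pattern above is written as a Java string literal, so `\\w` is the regex `\w`. Because ‘.’ becomes a token of its own, the ‘out’ in ‘out.print(q);’ is now a separate token that can be matched and highlighted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of reposearch.
public class CodeTokenizer {
    // The token pattern quoted in the issue: an identifier, a run of digits,
    // or a single ASCII punctuation character.
    private static final Pattern TOKEN =
        Pattern.compile("([a-zA-Z_]\\w*)|\\d+|[!-/:-@\\[-`{-~]");

    /** Splits text into tokens; whitespace never matches the pattern, so it is skipped. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("var out;"));      // [var, out, ;]
        System.out.println(tokenize("out.print(q);")); // [out, ., print, (, q, ), ;]
    }
}
```

The pattern only covers ASCII. In this sketch, any other non-whitespace character (e.g. an accented letter) is silently skipped.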
By default, user queries are translated into phrase queries with a slop of 0. This means the tokens from the user query must be adjacent, with no other tokens between them.
I named this ‘exact search’ in reposearch. Disabling exact search sets the slop to 100000. This allows the query tokens to appear ‘anywhere’ in a file, while files in which they appear closer to each other are ranked higher :) This approach increases recall and is more transparent and flexible. It doesn’t assume anything about the identifiers in the source code. Previously, we used separate search fields for identifiers that may include dots and hyphens and for identifiers that may not. This also fixes an issue where not all matches were highlighted.
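The slop behaviour can be illustrated with a simplified sketch; the class and method names are hypothetical. It only considers in-order occurrences and counts the extra tokens between the query tokens, whereas Lucene's sloppy phrase matching also tolerates reordered terms at a higher cost. The proximity weight mimics the 1/(distance + 1) sloppy frequency of Lucene's classic/default similarity, but it scores only the best occurrence instead of summing over all of them.

```java
import java.util.List;

// Hypothetical illustration class, not reposearch's implementation.
public class SloppyPhrase {
    /**
     * Smallest number of extra tokens between the query tokens over all
     * in-order occurrences of the query in doc, or -1 if there is none.
     * Simplification: reordered occurrences are not considered.
     */
    public static int minGap(List<String> doc, List<String> query) {
        if (query.isEmpty()) return -1;
        int best = -1;
        for (int start = 0; start < doc.size(); start++) {
            if (!doc.get(start).equals(query.get(0))) continue;
            // Greedily take the earliest next occurrence of each remaining token;
            // for a fixed start this yields the shortest in-order span.
            int pos = start;
            int qi = 1;
            while (qi < query.size()) {
                int next = doc.subList(pos + 1, doc.size()).indexOf(query.get(qi));
                if (next < 0) break;
                pos = pos + 1 + next;
                qi++;
            }
            if (qi < query.size()) continue; // no complete occurrence from this start
            int gap = (pos - start) - (query.size() - 1);
            if (best < 0 || gap < best) best = gap;
        }
        return best;
    }

    /** Phrase match: the query occurs with at most `slop` extra tokens in between. */
    public static boolean matches(List<String> doc, List<String> query, int slop) {
        int gap = minGap(doc, query);
        return gap >= 0 && gap <= slop;
    }

    /** Proximity weight in the style of Lucene's sloppyFreq: 1 / (distance + 1). */
    public static double proximityScore(List<String> doc, List<String> query, int slop) {
        return matches(doc, query, slop) ? 1.0 / (minGap(doc, query) + 1) : 0.0;
    }

    public static void main(String[] args) {
        List<String> query = List.of("out", ".", "print");
        List<String> close = List.of("out", ".", "print", "(", "q", ")", ";");
        List<String> far = List.of("out", ".", "flush", "(", ")", ";", "print", "(", "q", ")", ";");
        System.out.println(matches(close, query, 0));             // true
        System.out.println(matches(far, query, 0));               // false: exact search
        System.out.println(matches(far, query, 100000));          // true: exact search disabled
        System.out.println(proximityScore(close, query, 100000)); // 1.0
        System.out.println(proximityScore(far, query, 100000));   // 0.2
    }
}
```

With a slop of 0, only the file containing ‘out.print’ written contiguously matches. With a slop of 100000, both files match, but the contiguous one ranks higher.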
Fixed in r51-52.