Strict matching not respected when analyzer expands terms
Illustrative example
In Reposearch, we index repository locations using the PathHierarchyTokenizer, which expands a single string such as
https://svn.strategoxt.org/repos/webdsl/webdsls/trunk
into the tokens:
[svn.strategoxt.org
svn.strategoxt.org/repos
svn.strategoxt.org/repos/WebDSL
svn.strategoxt.org/repos/WebDSL/webdsls
svn.strategoxt.org/repos/WebDSL/webdsls/trunk]

If we define our search constraints to be matched strictly (default combinator AND, i.e. all terms must match), the query parser only marks terms separated by whitespace as required. It does not do so for the terms produced by the analyzer.
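The expansion above can be mimicked in a few lines of plain Java. This is an illustrative sketch, not Lucene's PathHierarchyTokenizer: the class name `PathExpansion` is hypothetical, only '/' is treated as a delimiter, and the input is assumed to have had its "https://" scheme stripped already, as the token list above suggests.

```java
import java.util.ArrayList;
import java.util.List;

public class PathExpansion {
    /**
     * Returns every prefix of the path that ends just before a '/',
     * followed by the full path itself. Simplified illustration only;
     * this is not Lucene's tokenizer implementation.
     */
    public static List<String> expand(String path) {
        List<String> tokens = new ArrayList<>();
        int idx = path.indexOf('/');
        while (idx != -1) {
            tokens.add(path.substring(0, idx)); // prefix up to (not including) this '/'
            idx = path.indexOf('/', idx + 1);
        }
        tokens.add(path); // the full path is the last token
        return tokens;
    }

    public static void main(String[] args) {
        // Assumption: the "https://" scheme was stripped before tokenizing.
        for (String token : expand("svn.strategoxt.org/repos/WebDSL/webdsls/trunk")) {
            System.out.println(token);
        }
    }
}
```

Run on the example path, this prints the same five tokens as listed above, from the host alone down to the full path.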
So the following does not work as expected:
search Entry matching repoPath:q [no lucene, strict matching]
The incorrect Lucene query that is produced (each clause is optional, i.e. at least one clause must match):
(repoPath:svn.strategoxt.org repoPath:svn.strategoxt.org/repos repoPath:svn.strategoxt.org/repos/WebDSL repoPath:svn.strategoxt.org/repos/WebDSL/webdsls repoPath:svn.strategoxt.org/repos/WebDSL/webdsls/trunk)
The query we expected (each clause is required, i.e. all clauses must match):
(+repoPath:svn.strategoxt.org +repoPath:svn.strategoxt.org/repos +repoPath:svn.strategoxt.org/repos/WebDSL +repoPath:svn.strategoxt.org/repos/WebDSL/webdsls +repoPath:svn.strategoxt.org/repos/WebDSL/webdsls/trunk)
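To see why the difference matters, the following simplified model evaluates both queries against the tokens of a document from another project on the same host. It is plain Java with no Lucene API or scoring; the class `ClauseSemantics` and the path `svn.strategoxt.org/repos/other-project/trunk` are hypothetical. With optional clauses the shared host token alone is enough for a match; with required clauses it is not.

```java
import java.util.HashSet;
import java.util.Set;

public class ClauseSemantics {
    // Simplified '/'-prefix expansion, mirroring the token list above.
    public static Set<String> expand(String path) {
        Set<String> tokens = new HashSet<>();
        int idx = path.indexOf('/');
        while (idx != -1) {
            tokens.add(path.substring(0, idx));
            idx = path.indexOf('/', idx + 1);
        }
        tokens.add(path);
        return tokens;
    }

    // Optional clauses: the document matches if ANY query token is indexed for it.
    public static boolean matchesAny(Set<String> docTokens, Set<String> queryTokens) {
        for (String q : queryTokens) {
            if (docTokens.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // Required clauses: the document matches only if EVERY query token is indexed for it.
    public static boolean matchesAll(Set<String> docTokens, Set<String> queryTokens) {
        return docTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        Set<String> query = expand("svn.strategoxt.org/repos/WebDSL/webdsls/trunk");
        // Hypothetical entry from a different project on the same host.
        Set<String> other = expand("svn.strategoxt.org/repos/other-project/trunk");
        System.out.println("optional clauses match: " + matchesAny(other, query)); // true
        System.out.println("required clauses match: " + matchesAll(other, query)); // false
    }
}
```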
This issue is related to Lucene's MultiFieldQueryParser. It may be a won't-fix, but we need some way to work around it. It currently causes Reposearch issue #37: Disappearing entries.
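One possible direction for a workaround is to expand the search term ourselves and turn every resulting token into a required clause. The hypothetical helper `StrictQueryText` below is only a sketch under that assumption: it renders text in the same form as the expected query. Passing such a string back through the query parser would presumably run the analyzer over each term again, so an actual workaround would more likely build the required clauses programmatically from un-analyzed terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class StrictQueryText {
    // Hypothetical helper: expand the path into '/'-prefixes, as in the token list above.
    public static List<String> expand(String path) {
        List<String> tokens = new ArrayList<>();
        int idx = path.indexOf('/');
        while (idx != -1) {
            tokens.add(path.substring(0, idx));
            idx = path.indexOf('/', idx + 1);
        }
        tokens.add(path);
        return tokens;
    }

    // Render every token as a required (+) clause on the given field, wrapped in parentheses.
    public static String requiredClauses(String field, List<String> tokens) {
        StringJoiner joiner = new StringJoiner(" ", "(", ")");
        for (String token : tokens) {
            joiner.add("+" + field + ":" + token);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String path = "svn.strategoxt.org/repos/WebDSL/webdsls/trunk";
        // Prints the same text as the expected query in this report.
        System.out.println(requiredClauses("repoPath", expand(path)));
    }
}
```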
Submitted by Elmer van Chastelet on 15 January 2013 at 11:17