Illustrative example

In Reposearch we index the repository locations using the PathHierarchyTokenizer, which expands a single string like https://svn.strategoxt.org/repos/WebDSL/webdsls/trunk into the tokens: [svn.strategoxt.org, svn.strategoxt.org/repos, svn.strategoxt.org/repos/WebDSL, svn.strategoxt.org/repos/WebDSL/webdsls, svn.strategoxt.org/repos/WebDSL/webdsls/trunk]
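To make the expansion concrete, here is a plain-Java simulation of the token stream. It does not call Lucene. The `PathHierarchySketch` class and its `tokenize` method are hypothetical, and the sketch assumes the `https://` scheme has already been stripped, since the token list above starts at the host name. It also records a position increment per token. To our understanding, Lucene's PathHierarchyTokenizer gives the first token an increment of 1 and every following token 0, so all tokens end up stacked at the same position, which matters for how the query parser combines them. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the tokens a path-hierarchy tokenizer emits for a
// delimited path. This is NOT Lucene code; it only mirrors the output shape.
class PathHierarchySketch {

    // One emitted token together with its (assumed) position increment.
    record Token(String term, int positionIncrement) {}

    // Assumption: the URL scheme ("https://") was stripped beforehand.
    // This sketch does not handle repeated or trailing delimiters specially.
    static List<Token> tokenize(String path, char delimiter) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == delimiter && i > 0) {
                // Emit every prefix that ends right before a delimiter.
                // First token advances the position (1), the rest stack on it (0).
                tokens.add(new Token(path.substring(0, i), tokens.isEmpty() ? 1 : 0));
            }
        }
        // The full path is always the last (longest) token.
        tokens.add(new Token(path, tokens.isEmpty() ? 1 : 0));
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("svn.strategoxt.org/repos/WebDSL/webdsls/trunk", '/')) {
            System.out.println(t.positionIncrement() + "  " + t.term());
        }
    }
}
```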

If we define our search constraints to be matched strictly (default combinator AND, i.e. all terms must match), the query parser only marks the terms separated by whitespace as required, not the tokens that the analyzer produces from a single term.
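A simplified model of this behaviour, written in plain Java rather than against Lucene's actual parser classes (`DefaultAndSketch`, `analyze` and `parse` are hypothetical names): the default AND operator puts a `+` only on whitespace-separated terms, while the tokens obtained by analysing one term are grouped as optional alternatives. As far as we understand it, Lucene's QueryParser does this because the analyzer places all path tokens at the same position, which the parser treats like a set of synonyms.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (NOT Lucene's code) of how a query parser with default
// operator AND treats an analyzed field value.
class DefaultAndSketch {

    // Same simplified path-hierarchy expansion: every prefix ending right
    // before a '/' plus the full term.
    static List<String> analyze(String term) {
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i < term.length(); i++) {
            if (term.charAt(i) == '/') {
                tokens.add(term.substring(0, i));
            }
        }
        tokens.add(term);
        return tokens;
    }

    // Each whitespace-separated term becomes one clause. Tokens from a single
    // term share one position, so they form a group of optional alternatives.
    static String parse(String field, String input) {
        List<String> clauses = new ArrayList<>();
        for (String term : input.trim().split("\\s+")) {
            List<String> tokens = analyze(term);
            if (tokens.size() == 1) {
                clauses.add(field + ":" + tokens.get(0));
            } else {
                List<String> alternatives = new ArrayList<>();
                for (String t : tokens) {
                    alternatives.add(field + ":" + t); // no "+": optional
                }
                clauses.add("(" + String.join(" ", alternatives) + ")");
            }
        }
        // A lone clause is returned as-is, mirroring the output in the issue.
        if (clauses.size() == 1) {
            return clauses.get(0);
        }
        // Default operator AND: only whitespace-separated clauses get a "+".
        List<String> required = new ArrayList<>();
        for (String c : clauses) {
            required.add("+" + c);
        }
        return String.join(" ", required);
    }

    public static void main(String[] args) {
        System.out.println(parse("repoPath", "svn.strategoxt.org/repos/WebDSL/webdsls/trunk"));
        System.out.println(parse("repoPath", "svn.strategoxt.org trunk"));
    }
}
```

For the path from the issue, `parse` returns the same optional group as the incorrect query below, whereas a whitespace-separated input such as `svn.strategoxt.org trunk` yields `+repoPath:svn.strategoxt.org +repoPath:trunk`.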

So this is not going to work as expected:

search Entry matching repoPath:q [no lucene, strict matching]

The incorrect Lucene query that comes out (each clause is optional, i.e. at least one clause must match):

(repoPath:svn.strategoxt.org repoPath:svn.strategoxt.org/repos repoPath:svn.strategoxt.org/repos/WebDSL repoPath:svn.strategoxt.org/repos/WebDSL/webdsls repoPath:svn.strategoxt.org/repos/WebDSL/webdsls/trunk)

What we expected to get (each clause is required, i.e. all clauses must match):

(+repoPath:svn.strategoxt.org +repoPath:svn.strategoxt.org/repos +repoPath:svn.strategoxt.org/repos/WebDSL +repoPath:svn.strategoxt.org/repos/WebDSL/webdsls +repoPath:svn.strategoxt.org/repos/WebDSL/webdsls/trunk)
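The difference between the two queries comes down to clause occurrence. The sketch below is again self-contained plain Java; the `Occur` enum is a hypothetical stand-in named after Lucene's `BooleanClause.Occur`, and the entry path `svn.strategoxt.org/repos/other` is made up for illustration. It evaluates both forms against an entry from another repository on the same host: the optional form matches it through the shared host token, the required form does not. With the real API, one possible workaround (assuming Lucene 3.x/4.x, where `BooleanQuery` is still mutable) is to run the analyzer ourselves and add one `TermQuery` per token with `BooleanClause.Occur.MUST`, instead of handing the raw string to the query parser.

```java
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

class RequiredClausesSketch {

    // Hypothetical stand-in named after Lucene's BooleanClause.Occur values,
    // so the sketch runs without Lucene on the classpath.
    enum Occur { SHOULD, MUST }

    // Boolean semantics for a flat group of term clauses:
    // SHOULD -> at least one query token is indexed, MUST -> all of them are.
    static boolean matches(Set<String> indexedTokens, List<String> queryTokens, Occur occur) {
        return occur == Occur.MUST
            ? indexedTokens.containsAll(queryTokens)
            : queryTokens.stream().anyMatch(indexedTokens::contains);
    }

    // Renders the group in the same shape as the queries shown in the issue.
    static String render(String field, List<String> queryTokens, Occur occur) {
        StringJoiner joiner = new StringJoiner(" ", "(", ")");
        for (String token : queryTokens) {
            joiner.add((occur == Occur.MUST ? "+" : "") + field + ":" + token);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> query = List.of(
            "svn.strategoxt.org",
            "svn.strategoxt.org/repos",
            "svn.strategoxt.org/repos/WebDSL",
            "svn.strategoxt.org/repos/WebDSL/webdsls",
            "svn.strategoxt.org/repos/WebDSL/webdsls/trunk");
        // Hypothetical entry from a different repository on the same host.
        Set<String> otherRepo = Set.of(
            "svn.strategoxt.org",
            "svn.strategoxt.org/repos",
            "svn.strategoxt.org/repos/other");

        System.out.println(render("repoPath", query, Occur.SHOULD)
            + " matches other repo: " + matches(otherRepo, query, Occur.SHOULD)); // true
        System.out.println(render("repoPath", query, Occur.MUST)
            + " matches other repo: " + matches(otherRepo, query, Occur.MUST));   // false
    }
}
```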

This is an issue related to the Lucene MultiFieldQueryParser. It may be a won't-fix, but we need to be able to work around this problem somehow. It currently causes Reposearch #37: Disappearing entries.

Submitted by Elmer van Chastelet on 15 January 2013 at 11:17

On 15 January 2013 at 11:17 Elmer van Chastelet tagged search
