Friday, March 14, 2014

custom filters score query performance issue

Environment Details
  • No of shards - 1
  • Data nodes - 34
  • Master nodes - 3
  • No of document types - 2 (type 1 - parent docs & type 2 - child docs) 
  • Total no of documents - 1.6 million 
  • Document Store size - 2GB
  • Request volume - 600000 per minute

Search Query

{
   "from":0,
   "size":5,
   "timeout":450,
   "query":{
      "filtered":{
         "query":{
            "custom_filters_score":{
               "query":{
                  "match_all":{

                  }
               },
               "filters":[],
               "score_mode":"multiply"
            }
         },
         "filter":{
            "and":{
               "filters":[
                  {
                     "bool":{}
                  },
                  {
                     "bool":{}
                  },
                  {
                     "or":{}
                  }
               ]
            }
         }
      }
   },
   "sort":[
      {
         "priority":{
            "order":"asc",
            "ignore_unmapped":true
         }
      },
      {
         "_score":{

         }
      }
   ]
}

Expected Behavior

The above mentioned query is a filtered query comprising of a custom_filters_score query and a set of eliminating filters highlighted in orange. There are 1.6 million documents out of which 1300 documents are of type 1 (parent) and the remaining are type 2 (child). The expected behavior was, eliminating filters will end up  fetching a smaller set of type 1 documents which will be passed to the custom_filters_score query, so that the scoring process will be executed only for those filtered set of documents.

For example,  the eliminating filter has a filter on document type (value - type1) that will narrow the search to 1300 documents (all docs of type 1) and then reduce the result count further by applying other eliminating filters. Documents that passed all eliminating filters will then be fed to the custom_filters_score query and the scoring process will be limited to those filtered documents. Top 5 documents will be then returned as the end result.

Actual Behavior

For every request, 1.6 million docs is processed as the match all query under custom_filters_score was getting executed. Each of those document is evaluated against the eliminating filters (AND DocIdset comprising of two BOOL filters and one OR filter with an mvel script) to be considered for scoring process. Though the scoring process is executed only on a limited set of documents, fetching all documents for every request is a major performance bottleneck. Query response time was between ~50 to ~80ms and the 8 core cpu was on its max capacity (800%) all the time. Refer to the "Lucene Internals" section below that explains this behavior.

Query Optimization

For the scenario discussed here, it was sufficient to process parent documents and so the match all query was replaced with a _type term query ({"query":{"term":{"_type":"type1"}}). Post the change the query under custom_filters_score fetches only type 1 docs (1300 only instead of 1.6 million) and evaluates it against the eliminating filters. This has a significant impact on the response time and the CPU utilization. Query response time is between ~2ms to ~24ms and the 8 core cpu utilization is less than 400%.

Lucene Internals

On parsing the filtered query, FilteredQueryParser.java parse method determines the strategy to be used for processing as CustomRandomAccessFilterStrategy with a threshold value of -1. Refer to the  code snippet from CustomRandomAccessFilterStrategy.java:

            if (threshold == -1) {
                // default  value, don't iterate on only apply filter after query if its not a "fast" docIdSet
                if (!DocIdSets.isFastIterator(docIdSet)) {
                    return FilteredQuery.QUERY_FIRST_FILTER_STRATEGY.filteredScorer(context, scoreDocsInOrder, topScorer, weight, docIdSet);
                }
            }

The eliminating filter (query section highlighted in orange) results in AndDocIdSet. As AndDocIdSet is not a fast iterator (instance of FixedBitSet is a fast iterator), Lucene uses the QUERY_FIRST_FILTER_STRATEGY to process the request. Refer to the comments from FilteredQuery.java about use of QUERY_FIRST_FILTER_STRATEGY:

   * <p>
   * Use this strategy if the filter computation is more expensive than document
   * scoring or if the filter has a linear running time to compute the next
   * matching doc like exact geo distances.
   * </p>

The above mentioned lucene code path, resulted in the execution of the query first (query under custom_filters_score) followed by filtering. Documents returned by the query is then evaluated with the AndDocIdSet. Only those documents that evaluated to true in the AndDocIdSet (if (filterbits.get(scorerDoc))) will be considered for scoring process and the remaining will be skipped.