-
Notifications
You must be signed in to change notification settings - Fork 42
(bugfix)VespaMatchEvaluator
with nearestNeighbor
#1125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This is only a draft workaround for now, as we should probably do this automatically, but then also be very clear on the limitations. |
We also need |
@glebashnik |
Re: |
Let's wait a bit with this and see whether grouping approach is better! |
@boeker fyi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug in the VespaMatchEvaluator
class when used with nearestNeighbor queries. The issue was with the previous naive assumption that query filters and recall parameters were independent, which doesn't hold for NN queries where the recall parameter gets translated to an IN term that can significantly alter query behavior. The fix replaces the recall-based approach with a grouping-based method that properly handles nearestNeighbor queries.
Key changes:
- Replaces recall parameter with grouping queries to check document matches
- Makes id_field parameter required with proper validation
- Removes dual query approach (limit + recall) in favor of single grouping query
- Updates all tests to work with the new grouping-based approach
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
vespa/evaluation.py | Main implementation changes - switches from recall to grouping approach, adds required id_field validation, implements new grouping filter and ID extraction methods |
tests/unit/test_evaluator.py | Updates all unit tests to work with new grouping response structure, adds comprehensive tests for new methods, removes dual-query test scenarios |
tests/integration/test_integration_evaluation.py | Updates integration tests to include required id_field parameter, adds new test cases for small targetHits scenarios |
docs/sphinx/source/evaluating-vespa-application-cloud.ipynb | Updates example notebook to include required id_field parameter in VespaMatchEvaluator usage |
from typing import Iterable | ||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import of re
and Iterable
should be added to the existing import block rather than creating a new one. Consider organizing imports by grouping standard library imports together at the top.
Copilot uses AI. Check for mistakes.
def create_grouping_filter( | ||
self, yql: str, id_field: str, relevant_ids: Union[str, Iterable] | ||
) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type hint Union[str, Iterable]
is too broad as str
is also iterable. Consider using Union[str, Iterable[str]]
or Union[str, List[str]]
for better type safety and clarity.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think List[str] is even better.
def create_mock_response_json(matched_doc_id, searchtime, total_count): | ||
return { | ||
"root": { | ||
"children": [ | ||
{ | ||
"id": "group:root:0", | ||
"relevance": 1.0, | ||
"continuation": {"this": ""}, | ||
"children": [ | ||
{ | ||
"id": "group:string:id", | ||
"children": [ | ||
{ | ||
"id": f"group:string:{matched_doc_id}", | ||
"relevance": 0.4395870752632618, | ||
"value": matched_doc_id, | ||
"fields": {"count()": 1}, | ||
} | ||
], | ||
} | ||
], | ||
} | ||
], | ||
"fields": {"totalCount": total_count}, | ||
}, | ||
"timing": {"searchtime": searchtime}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This nested JSON structure is repeated across multiple test methods with slight variations. Consider extracting this into a reusable helper method or test fixture to reduce code duplication and improve maintainability.
Copilot uses AI. Check for mistakes.
@thomasht86 How this handles query timeouts and other errors during evaluation? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments.
logger.info(f"Wrote verbose match evaluation results to {csv_path}") | ||
|
||
def create_grouping_filter( | ||
self, yql: str, id_field: str, relevant_ids: Union[str, Iterable] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is unnecessary to allow both str
and Iterable
for relevant_ids
because it can be interpreted as string with several ids, e.g. "d1,d2" etc. I would suggest to keep only list of strings as an option - as specified in docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, it will actually be either a str or Set[str] if we look at what should be passed as relevant_docs
to the init method.
using that and fixing docstring. agree?
modified_yql = yql.strip().rstrip(";") | ||
return modified_yql + grouping_clause | ||
|
||
def extract_matched_ids(self, resp: VespaQueryResponse, id_field: str) -> Set[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a static method.
def create_grouping_filter( | ||
self, yql: str, id_field: str, relevant_ids: Union[str, Iterable] | ||
) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think List[str] is even better.
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
Background:
VespaMatchEvaluator
tries to exploit the recall query-parameter to find out whether or not a docid is matched by a query.I took this statement too literally:
Hence, the previous implementation was made under the naive assumption that the query and the recall-parameter are independent of each other, which does not hold true, at least for NN queries, as the
recall
-parameter will be translated to IN term and added to the query, and add a strong filter to the query.To illustrate, consider the following queries:
A.
select * from sources * where ({targetHits:1000}nearestNeighbor(embedding,q));
B.
select * from sources * where ({targetHits:1000}nearestNeighbor(embedding,q)) AND docid IN(“id01”, “id02” ..);
(There can be many ids)
We want to make sure that all of the matched documents for query B will also have been matched by query A.
By setting the following parameters:
ranking.matching.postFilterThreshold: 0.0
, # We always want postfiltering. Post-filtering is chosen when the estimated filter hit ratio of the query is larger than this threshold.ranking.matching.approximateThreshold: 0.0
, # We never want fallback to exact search. The fallback to exact search is chosen when the estimated filter hit ratio of the query is less than this threshold.it will work for all cases like the above, but setting these parameters themselves will change the behavior of NN-queries, if the original queries themselves would have filters, which should not be postfiltered or which should actually fall back to exact search)
Maybe there is a better way of doing it..?