-
Notifications
You must be signed in to change notification settings - Fork 5.7k
SERVER-25957 Optimize $filter + $arrayElemAt to avoid scanning entire array #1236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
src/mongo/db/pipeline/expression.cpp
Outdated
} | ||
|
||
intrusive_ptr<Expression> ExpressionArrayElemAt::optimize() { | ||
// If ExpressionArrayElemAt is passed an ExpressionFilter as its first arugment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to what you did in the ExpressionIndexOf optimization, I think you should first call ExpressionNary::optimize()
:
mongo/src/mongo/db/pipeline/expression.cpp
Lines 2809 to 2813 in c799c6c
// This will optimize all arguments to this expression. | |
auto optimized = ExpressionNary::optimize(); | |
if (optimized.get() != this) { | |
return optimized; | |
} |
(Also, while you're there, it looks like the comment right below that got wrapped at a strange place, can you fix that?)
src/mongo/db/pipeline/expression.cpp
Outdated
str::stream() << getOpName() << "'s second argument must be a numeric value," | ||
<< " but is " | ||
<< typeName(indexArg.getType()), | ||
indexArg.numeric()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check is redundant with the one below. If something is integral it must be numeric. https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/value.cpp#L1010
src/mongo/db/pipeline/expression.cpp
Outdated
<< " a 32-bit integer: " | ||
<< indexArg.coerceToDouble(), | ||
indexArg.integral()); | ||
auto ind = indexArg.coerceToInt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use full names, index
src/mongo/db/pipeline/expression.cpp
Outdated
str::stream() << "Second argument to $slice must be a numeric value," | ||
<< " but is of type: " | ||
<< typeName(secondVal.getType()), | ||
secondVal.numeric()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I don't think you need the assertion about numeric().
src/mongo/db/pipeline/expression.cpp
Outdated
if (auto thirdArg = dynamic_cast<ExpressionConstant*>(vpOperand[2].get())) { | ||
auto thirdVal = thirdArg->getValue(); | ||
|
||
uassert(50799, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can condense these three asserts into 'uassertIfNotIntegralAndNonNegative': https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/expression.cpp#L2704
src/mongo/db/pipeline/expression.cpp
Outdated
|
||
int arg3 = thirdVal.coerceToInt(); | ||
if (arg2 >= 0) { | ||
// If ExpressionSlice is given three arguments set limit to 'firstArg' + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate a little bit more here? Maybe something like If we have three arguments, then we can stop the $filter once we have seen 'firstArg' + 'secondArg' + 1. We need to skip 'firstArg' elements, then include 'secondArg' things after that. For example, if we want to slice starting from 2 with length 3, we skip the first 2, then include 3 after that.
Wait, I think we don't need the +1
here? Is that right?
src/mongo/db/pipeline/expression.h
Outdated
boost::intrusive_ptr<Expression> _input; | ||
// The expression determining whether each element should be present in the result array. | ||
boost::intrusive_ptr<Expression> _filter; | ||
// For expression ArrayElemAt and Slice we can optimize filter by setting a limit to end the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd make it clearer that this optimization applies when $filter is being used as an argument to either $slice or $arrayElemAt. Also, feel free to use the $slice name instead of ExpressionSlice here.
|
||
} // namespace Object | ||
|
||
TEST(ExpressionFilter, ExpressionFilterWithASetLimitShouldReturnAArrayNoGreaterThanTheLimit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use AnArray
instead of AArray
.
auto arrayElemAtSpec = BSON("$arrayElemAt" << BSON_ARRAY(filterSpec << 2)); | ||
|
||
auto expArrayElemAt = ExpressionArrayElemAt::parse(expCtx, arrayElemAtSpec.firstElement(), vps); | ||
auto xpArrayElemAt = dynamic_cast<ExpressionArrayElemAt*>(expArrayElemAt.get())->optimize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a better variable name, maybe 'optimized'?
auto val = xpArrayElemAt->evaluate(Document()); | ||
ASSERT_VALUE_EQ(val, Value(6)); | ||
|
||
auto eA = ExpressionArrayElemAt::parse(expCtx, BSON("$arrayElemAt" << BSON_ARRAY(BSON_ARRAY(1 << 2 <<3 << 4 <<5) << 1)).firstElement(), vps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can re-use the variable names from before, I think that'd be clearer than two-letter variable names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with below. Alternatively, you could separate each one into it's own test.
Hey @KevinCybura! I was checking through old PR's and saw that SERVER-25957 is actually still open, as this wasn't merged. I am poking that ticket to see where it's at. If you are still looking to get these changes in, can you please resolve the merge conflicts and get this caught up with master? I saw you already signed our contributor's agreement. I know it's been a few years, so it's okay if that's not on the table. (I'm trying to clean up our pull request queue right now) |
No description provided.