Optimize memory allocation when rendering partials #8
Re-implements rails#591 for our fork.
We're seeing calls to `reverse_merge!`, `merge!`, and `merge` from `JbuilderTemplate` come up as CPU and memory hot spots in our profiles.

The changes proposed in this PR are inspired by https://github.com/fastruby/fast-ruby#hashmerge-vs-hash-code, and favour mutating the `options` hash via element assignment over merge methods. This saves on both CPU and memory allocation.

Comparing `options[:locals].merge!(json: self)` to `options[:locals][:json] = self`, for example, produced:

This PR replaces all instances of `reverse_merge!` with `[] ||=`, and all instances of `merge!` with `[]=`. The `options` were already being mutated, so this introduces no change in behaviour.

There are a handful of non-mutating calls to `merge` as well that I was hesitant to change, but upon further analysis the `options` hash ends up being mutated further down the call chain anyway; every place where the `options` hash is merged is on a code path that renders partials, which already mutate the options.

I've run some benchmarks against something simple yet representative of a template structure that would exercise some of the changes being proposed.
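For readers who want to see the substitutions described earlier side by side, here is a minimal sketch in plain Ruby. It is not the code from this PR: the option keys, the method names, and the `reverse_merge_bang` helper (a stand-in for ActiveSupport's `Hash#reverse_merge!`, so the snippet runs without Rails) are all assumptions made for illustration.

```ruby
# Stand-in for ActiveSupport's Hash#reverse_merge! (assumption, so that this
# runs without Rails): keys already present in `hash` win over `defaults`.
def reverse_merge_bang(hash, defaults)
  hash.merge!(defaults) { |_key, existing, _default| existing }
end

# "Before" style: every call builds a temporary hash just to merge it in.
# The option keys here are hypothetical.
def build_with_merge(options)
  reverse_merge_bang(options, { handlers: [:erb], formats: [:json] })
  options[:locals].merge!({ json: :builder })
  options
end

# "After" style: element assignment writes the same keys directly.
def build_with_assignment(options)
  options[:handlers] ||= [:erb]
  options[:formats]  ||= [:json]
  options[:locals][:json] = :builder
  options
end

merged   = build_with_merge({ locals: {}, handlers: [:jbuilder] })
assigned = build_with_assignment({ locals: {}, handlers: [:jbuilder] })
puts merged == assigned
```

One caveat worth keeping in mind with this pattern: `||=` also overwrites a key that is present but set to `nil` or `false`, whereas `reverse_merge!` leaves such a key alone. The two forms are only interchangeable when the options never carry explicit `nil` or `false` values for those keys.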
The measurements below are for 100 posts, each with a single author.
CPU
Memory
I was surprised to see no difference in IPS given the earlier benchmarks, but that can be explained by `actionview` diluting it; this benchmark includes the entire `render` lifecycle, which means that my code changes are only running a couple hundred times per second.

The impactful improvement is the ~20% reduction in memory. Note that the memory allocation savings would depend entirely on your template: templates rendering fewer or no partials would see less of an improvement, while templates rendering more partials could see a much larger improvement. As your API serves requests over time, this improvement would go a long way towards saving on garbage collection cycles.
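If you want to check the allocation difference in isolation, outside of the full `render` lifecycle, a rough sketch along these lines may help. It is not the benchmark used for the numbers in this PR: the helper and method names are hypothetical, it uses a plain hash instead of real template options, and it relies on CRuby's `GC.stat(:total_allocated_objects)` counter.

```ruby
# Returns how many objects were allocated while the block ran, based on
# CRuby's cumulative allocation counter.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# merge! receives a freshly built argument hash on every call.
def merge_allocations(iterations)
  locals = {}
  value  = Object.new # stand-in for the `self` passed as `json:`
  allocated_objects do
    iterations.times { locals.merge!({ json: value }) }
  end
end

# Element assignment stores the value without building an argument hash.
def assignment_allocations(iterations)
  locals = {}
  value  = Object.new
  allocated_objects do
    iterations.times { locals[:json] = value }
  end
end

iterations = 10_000
puts "merge!: #{merge_allocations(iterations)} objects"
puts "[]=:    #{assignment_allocations(iterations)} objects"
```

The absolute counts will vary by Ruby version, so treat them as a relative comparison only; how much of this carries over to a real template depends on how many partials it renders.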