This popped up while I was working on a presentation about the importance of CPU cache and the habit of checking your assumptions. Yes! I did mash together those two subjects. Maybe soon™ I will either do a youtube video of that presentation, or write a post about it, but now I want to talk about what I believe is an interesting performance issue with vector::insert.
TLDR version for lazy people: Microsoft’s vector::insert is not as cache-friendly as it could be in some cases. It also misses the opportunity to use one single call to memmove in the case of trivially copyable data types.
For the not so lazy people, I invite you on a journey where we get to see a glimpse of what’s in the lower levels of Microsoft’s STL implementation.
I made for the presentation that I mentioned before a simple experiment that was supposed to highlight the importance of the CPU cache. The test example method (full code is shown towards the end of this post) was just doing container.insert(it, EpicStruct()) at random positions in the container. It was a classic example of list vs vector and memory access patterns.
Since half of my presentation was about checking assumptions, I decided to do exactly that – not leave my assumptions unchecked. My assumptions were that it would be hard to beat vector’s timing on that test due to the way memory works.
As we will see, vector::insert is implemented in an interesting way.