Microsoft’s vector::insert is slower than it should be

This popped up while I was working on a presentation about the importance of CPU cache and the habit of checking your assumptions. Yes! I did mash together those two subjects. Maybe soon™ I will either do a youtube video of that presentation, or write a post about it, but now I want to talk about what I believe is an interesting performance issue with vector::insert.

TLDR version for lazy people: Microsoft’s vector::insert is not as cache-friendly as it could be in some cases. It also misses the opportunity to use one single call to memmove in the case of trivially copyable data types.

For the not so lazy people, I invite you on a journey where we get to see a glimpse of what’s in the lower levels of Microsoft’s STL implementation.

I made for the presentation that I mentioned before a simple experiment that was supposed to highlight the importance of the CPU cache. The test example method (full code is shown towards the end of this post) was just doing container.insert(it, EpicStruct()) at random positions in the container. It was a classic example of list vs vector and memory access patterns.

Since half of my presentation was about checking assumptions, I decided to do exactly that – not leave my assumptions unchecked. My assumptions were that it would be hard to beat vector’s timing on that test due to the way memory works.

As we will see, vector::insert is implemented in an interesting way.

Continue reading “Microsoft’s vector::insert is slower than it should be”

Intersecting lists of intervals

Some evenings I like to sit at home, make myself a tea, and start working on random problems that pop into my head or that I find through various contexts. I force myself NOT to look on stackoverflow or wikipedia for a copy-paste solution because I enjoy bashing my head against something that I don’t know how to solve.

A few days ago I started thinking about the following problem.
Let’s say we have a list of intervals, each interval corresponding to a ID. Something like this:
ID 0 : [start0, end0], [start1, end1], [start2, end2]
ID 1 : [start3, end3], [start4, end4]

Given a query made up of one or more IDs, how can we return the intervals that contain ALL the IDs specified in the query.
We need to return only the overlapping parts of the intervals.

Continue reading “Intersecting lists of intervals”

Candy crush bot

Hello again, I am really lazy with posting stuff, but due to popular demand (2 friends) I decided to offer a more in-depth explanation regarding the candy crush bot that I made. I did start writing this post quite some time ago, but things always got in the way of me finishing it. Now remember that this is an explanation for beginners so if you do know the basics you can skip it and check out the code on github.

Continue reading “Candy crush bot”

Adventures with OpenGL compute shaders and particles

Some time ago I was quite bored so I wanted to learn something new. Probably inspired by the Steam Dev Days talks, OpenGL seemed like a nice idea and started reading about it. While there are some great books out there, nothing compares to doing some work yourself and figuring stuff out.

After getting accustomed with the basics of OpenGL, I began thinking about drawing lots of particles. There are not enough particles in current games. If I would make a game it would be probably have everything made out of of particles, but let’s not get off-track.

Continue reading “Adventures with OpenGL compute shaders and particles”