Implementing multi-threading support for sonifyCPP
SonifyCPP is my software, written in C++, that "sonifies" any input image, that is, creates sound from it. I explained exactly how this is done in my last blog post. The pixels of the input image have to be read one by one, and in my approach I generate a sine wave for each pixel. Clearly, the computational cost grows as the image dimensions increase. One simple way to speed up sonification is to naively downscale the input image to a "sonifiable" size (something like 200x200). But this approach is just... meh. So I thought: instead of sending the pixels one by one to the sine wave generator, how about sending them a column at a time? And that's what I did. Sadly, this did not result in a considerable speedup. So the only option left was to implement multi-threading.
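To make "one sine wave per pixel, sent a column at a time" concrete, here is a minimal sketch. The mapping is an assumption for illustration and not necessarily the one SonifyCPP uses: each row gets its own frequency between `minFreq` and `maxFreq` (top row highest), the pixel's 8-bit brightness sets the amplitude of that row's sine wave, and all sine waves of the column are mixed into one buffer of `samplesPerColumn` samples. `sonifyColumn` and its parameters are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mapping (not necessarily SonifyCPP's): row -> frequency,
// brightness (0-255) -> amplitude. One call handles a whole column.
std::vector<double> sonifyColumn(const std::vector<std::uint8_t>& column,
                                 std::size_t samplesPerColumn,
                                 double sampleRate,
                                 double minFreq = 200.0,
                                 double maxFreq = 2000.0) {
    const double pi = 3.14159265358979323846;
    std::vector<double> samples(samplesPerColumn, 0.0);
    const std::size_t height = column.size();
    if (height == 0) return samples;

    for (std::size_t row = 0; row < height; ++row) {
        const double amplitude = column[row] / 255.0;
        if (amplitude == 0.0) continue; // black pixels stay silent
        // Top row gets the highest frequency, bottom row the lowest.
        const double t = height > 1
            ? static_cast<double>(height - 1 - row) / static_cast<double>(height - 1)
            : 0.0;
        const double freq = minFreq + t * (maxFreq - minFreq);
        for (std::size_t n = 0; n < samplesPerColumn; ++n) {
            samples[n] += amplitude *
                std::sin(2.0 * pi * freq * static_cast<double>(n) / sampleRate);
        }
    }
    // Divide by the number of pixels so the mix stays within [-1, 1].
    for (double& s : samples) s /= static_cast<double>(height);
    return samples;
}
```

Handling a whole column per call saves some per-pixel overhead, but the number of sine samples computed stays the same (height × `samplesPerColumn` for every column), which may be why batching by columns alone doesn't buy much speed.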
Multi-threading
For those who have been living in a cave: modern computers have processors with more than one working "brain", called cores. (Note: I'll use processor and CPU interchangeably from here on.) Processors on the market come with different core counts, like 8, 16 or 32. On top of that, many cores can run two "minute brains" at once, called hardware threads (simultaneous multithreading, which Intel calls Hyper-Threading), so the CPU can execute more instruction streams in parallel than it has cores. Normally, a program written in C++ (or any other language, for that matter) runs as a single thread, which means it uses only one core at a time, no matter how many your CPU has.
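If you want to check how many threads your own machine can run in parallel, C++11's `std::thread::hardware_concurrency()` reports it. It's only a hint: the standard allows it to return 0 when the value can't be determined, so this small helper (`usableThreads` is a made-up name) falls back to 1.

```cpp
#include <thread>

// Number of threads the hardware can run at the same time
// (cores x hardware threads per core). hardware_concurrency() is only
// a hint and may return 0 if the value is unknown.
unsigned int usableThreads() {
    const unsigned int n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n; // fall back to one thread when unknown
}
```

On an 8-core CPU with two hardware threads per core, this typically reports 16. That is a natural choice for the number of chunks to split the work into.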
Why do we even need multi-threading?
I'll give you an example. Let's say we are writing a really basic program that computes the sum of the integers from 1 to 100,000. The easiest way is to add the numbers up in a for loop, keeping a running total. This works, but every addition happens one after the other on a single core. Instead, how about we write two different programs, one calculating the sum from 1 to 50,000 and the other from 50,001 to 100,000, and add up their two results at the end? And why stop at two splits? We could run 10 programs that each add up a different range, and sum their results at the end. This is basically why we benefit from multi-threading.
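Here is the sum example as a single C++ program using `std::thread`, as a minimal sketch. `parallelSum` is a made-up name. Each thread adds up one contiguous range and stores its result in its own slot of `partial`, so no two threads ever write to the same memory; the main thread adds up the partial sums after `join()`. The total is kept in a 64-bit integer because 1 + 2 + ... + 100,000 = 5,000,050,000 doesn't fit in 32 bits.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Sums the integers 1..n by giving each thread its own contiguous range.
std::uint64_t parallelSum(std::uint64_t n, unsigned int numThreads) {
    if (numThreads == 0) numThreads = 1;
    std::vector<std::uint64_t> partial(numThreads, 0); // one slot per thread: no data race
    std::vector<std::thread> workers;
    const std::uint64_t chunk = (n + numThreads - 1) / numThreads; // ceiling division

    for (unsigned int t = 0; t < numThreads; ++t) {
        const std::uint64_t begin = t * chunk + 1;               // first number in this range
        const std::uint64_t end = std::min(n, (t + 1) * chunk);  // last number (inclusive)
        workers.emplace_back([&partial, t, begin, end] {
            std::uint64_t sum = 0;
            for (std::uint64_t i = begin; i <= end; ++i) sum += i;
            partial[t] = sum;
        });
    }
    for (auto& w : workers) w.join(); // wait for every thread to finish

    std::uint64_t total = 0;
    for (std::uint64_t s : partial) total += s;
    return total;
}
```

Calling `parallelSum(100000, 4)` returns 5000050000. For a sum this small, creating the threads will likely cost more than it saves, so treat this as an illustration of splitting the work rather than a benchmark.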
In the example above I talked about running separate programs. The operating system can actually schedule separate programs on different cores, but each one is its own process with its own memory, so starting them and collecting their results (through files, pipes or other inter-process communication) adds extra work. Instead, we write the logic in a single program and give each range to a different thread. Threads share the program's memory, so each one can leave its partial result where the others can read it, and the operating system can run them on different cores at the same time. This way we parallelize the code (run parts of it simultaneously), which can make it run much faster. This is just the gist of multi-threading and parallelizing code; there are plenty of other resources on the internet if you want to read more.
So, now that the basics are out of the way, I'll move on to how I used multi-threading to speed up my software.
Speeding up my software
With single-core execution, parsing each individual pixel of, say, a 1000x1000 image is really, really slow, and I couldn't just leave it like that.
So I first implemented multi-threading for the left-to-right traversal of the image. This is really easy. You split the image into a number of chunks (equal to the number of threads the CPU can run) and parse them as before, except that this time each chunk is handled by a different thread. Each thread parses its part of the image, generates the corresponding sine wave signal and writes it back to the audio array in memory.
In my software a single array holds the audio data, which means that all of these threads write to the same array. Here comes the problem of indexing. Let's say the image is 200x200 and the first thread parses the first 12 columns (0 to 11). It writes to the first 12 column slots of the audio array, that is, from position 0 up to 12 times the number of samples each column produces. This is trivial for the first thread. Now take the second thread, which parses columns 12 to 23. It must not write starting at position 0, because the first thread has already put its data there, and because the audio has to follow the order of the image: if the order is wrong, the audio won't sound correct. So it has to start writing at column slot 12 instead, and in general a thread whose chunk starts at column x writes from slot x onwards.
These numbers are only illustrative; my software does it slightly differently. Because each pixel generates a sine wave of a fixed duration, each column occupies a fixed number of samples in the audio array (around 1024). So after each column of the image is read, we shift the write position by 1024, or whatever number we have decided on.
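A minimal sketch of this chunked left-to-right scheme, under some assumptions of mine: `Image` is a hypothetical alias for rows of grayscale brightness values in [0, 1], `renderColumn` is a hypothetical stand-in for the sine wave generator (with a made-up row-to-frequency mapping and an assumed 44.1 kHz sample rate), and `samplesPerColumn` plays the role of the roughly 1024 samples per column. The part that matters is the indexing: column `c` always writes to `audio[c * samplesPerColumn]` onwards, so every thread fills its own disjoint range of the shared array and the result follows the image order.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical image type: image[row][col] is a brightness in [0, 1].
using Image = std::vector<std::vector<float>>;

// Hypothetical stand-in for the sine wave generator: mixes one sine wave per
// pixel of column `col` (row r -> 200 + 100*r Hz, brightness -> amplitude)
// and writes samplesPerColumn samples to `out`.
void renderColumn(const Image& image, std::size_t col,
                  std::size_t samplesPerColumn, float* out) {
    const double pi = 3.14159265358979323846;
    const double sampleRate = 44100.0; // assumed sample rate
    const std::size_t height = image.size();
    for (std::size_t n = 0; n < samplesPerColumn; ++n) {
        double s = 0.0;
        for (std::size_t row = 0; row < height; ++row) {
            const double freq = 200.0 + 100.0 * static_cast<double>(row);
            s += image[row][col] *
                 std::sin(2.0 * pi * freq * static_cast<double>(n) / sampleRate);
        }
        out[n] = static_cast<float>(s / static_cast<double>(height));
    }
}

// Left-to-right traversal split into one contiguous chunk of columns per thread.
std::vector<float> sonifyLeftToRight(const Image& image, std::size_t samplesPerColumn,
                                     unsigned int numThreads) {
    const std::size_t width = image.empty() ? 0 : image[0].size();
    // Allocate the whole audio array before any thread starts writing.
    std::vector<float> audio(width * samplesPerColumn, 0.0f);
    float* base = audio.data();
    if (numThreads == 0) numThreads = 1;
    const std::size_t chunk = (width + numThreads - 1) / numThreads; // columns per thread

    std::vector<std::thread> workers;
    for (unsigned int t = 0; t < numThreads; ++t) {
        const std::size_t firstCol = t * chunk;
        const std::size_t endCol = std::min(width, firstCol + chunk); // one past the last column
        workers.emplace_back([&image, base, samplesPerColumn, firstCol, endCol] {
            for (std::size_t c = firstCol; c < endCol; ++c) {
                // Column c owns audio[c * samplesPerColumn, (c + 1) * samplesPerColumn).
                renderColumn(image, c, samplesPerColumn, base + c * samplesPerColumn);
            }
        });
    }
    for (auto& w : workers) w.join();
    return audio;
}
```

Because the write position depends only on the column index, the output is identical no matter how many threads are used, and since no two threads touch the same samples, no mutex is needed. The audio vector is sized before the threads start; resizing it while they run would invalidate the pointer they write through.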
This was easy to implement for the left-to-right, right-to-left, top-to-bottom and bottom-to-top traversals (only the indexing strategy changes). The problem arose with the other traversals: clockwise, anticlockwise, and circular inwards and outwards. Maybe it has something to do with the way I'm allocating chunk sizes to the threads; I don't know. I tried changing different things, but I couldn't get it to work. So, for now, these traversals run single-threaded.
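Here is a sketch of what "only the indexing strategy changes" can look like for the four linear traversals. `Traversal`, `LineRef`, `lineForStep` and `stepCount` are hypothetical names of mine, not SonifyCPP's API. The idea: step `s` of a traversal always writes to audio offset `s` times the samples per line; only which image line is read at step `s` differs.

```cpp
#include <cstddef>

enum class Traversal { LeftToRight, RightToLeft, TopToBottom, BottomToTop };

// Which image line is read at a given step.
struct LineRef {
    bool isColumn;     // true: read a column, false: read a row
    std::size_t index; // which column or row of the image
};

// Hypothetical helper: maps a traversal step to the image line it reads.
// The audio offset for that step is always step * samplesPerLine.
LineRef lineForStep(Traversal t, std::size_t step,
                    std::size_t width, std::size_t height) {
    switch (t) {
        case Traversal::LeftToRight: return {true, step};
        case Traversal::RightToLeft: return {true, width - 1 - step};
        case Traversal::TopToBottom: return {false, step};
        case Traversal::BottomToTop: return {false, height - 1 - step};
    }
    return {true, step}; // not reached for valid enum values
}

// Number of steps: columns for horizontal traversals, rows for vertical ones.
std::size_t stepCount(Traversal t, std::size_t width, std::size_t height) {
    return (t == Traversal::LeftToRight || t == Traversal::RightToLeft) ? width : height;
}
```

With this mapping, the threads can split the step range from 0 to `stepCount(...)` into chunks exactly like the columns in the left-to-right case, and each step still writes to its own offset in the audio array.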
The new multi-threaded implementation can load a 1000x1000 PNG image in under 30 s. Previously, on a single thread, it didn't even reach 1 percent progress within a minute.