SonifyCPP - Software for converting images to music, magically

Well... it's not magic; it's programming and mathematics that help us produce the music, or rather the audio signals. A while back, around November of 2023, I participated in the NASA Hackathon. There was a variety of projects to choose from, and I picked the one called "Image Sonification". My team members and I had 48 hours to code the software in any language we wanted; we chose to do it in Python for some reason. Okay, I'll step out of the flashback for a second to describe what image sonification actually is. Just as a microphone captures sound as audio signals that are then processed in a digital audio workstation to produce digital audio, image sonification refers to the process of converting images into audio signals.

Coming back to the hackathon flashback, we finished coding the project so that it produced some sort of "music" from low-resolution input images (we couldn't figure out a way to optimize the code to read higher-resolution images). The end product is shown below.



You can check out the code for the project here.


This is how the new SonifyCPP, written in C++, looks.

We won a prize at the local level (we couldn't get nominated to the global level) for best creation in the "Art and Creativity" department. I was happy. What I was not happy about was how the project functioned and its visual presentation. I had created a repo on GitHub for rewriting the software in C++ around November of 2023, but had not found the time or interest to do so. Recently, I thought I'd get my hands dirty working on this, and I'm glad I did, because I got to learn much more about digital audio, audio generation and audio processing.

On a side note, NASA has been doing image sonifications for a long time now, and it looks awesome. If you don't know what I'm talking about, you can check it out over here. You can find a lot more of these if you search for "NASA Image Sonification" on YouTube.

I wanted to add visual feedback similar to NASA's to my software, so I started with a few basics. First, the way in which we read pixels from an image matters and defines the audio that gets produced. I implemented a few of these "traversals": left to right, right to left, top to bottom, bottom to top, circular inwards and outwards, and clockwise and anti-clockwise, all of which can be seen in my GitHub project repo.
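To give a feel for what a traversal is, here is a minimal sketch (function and type names are my own for illustration, not SonifyCPP's actual API): each traversal simply produces the order in which pixel coordinates get visited.

```cpp
// Sketch of two of the simpler traversals. A traversal is just the
// sequence of (x, y) pixel coordinates in the order they are read.
#include <utility>
#include <vector>

using Path = std::vector<std::pair<int, int>>;

// Left-to-right: visit columns in order, each column top to bottom.
Path leftToRight(int width, int height) {
    Path p;
    for (int x = 0; x < width; ++x)
        for (int y = 0; y < height; ++y)
            p.push_back({x, y});
    return p;
}

// Right-to-left is the same walk with the column order reversed.
Path rightToLeft(int width, int height) {
    Path p;
    for (int x = width - 1; x >= 0; --x)
        for (int y = 0; y < height; ++y)
            p.push_back({x, y});
    return p;
}
```

The circular and clockwise traversals follow the same idea, just with a more involved coordinate walk.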

Now, with traversal out of the way, I want to explain how we convert image data to an audio signal. Take a look at the image below, which I shamelessly borrowed from the seeingwithsound page.

Sonification algorithm

To make things easier, we take the grayscale of the image, so that we only work with the intensity of the pixels and not their colors (at least for now; I plan to incorporate colors for extra depth). Each pixel's intensity is represented by a number from 0 to 255 (8 bits): 0 is black and 255 is white. I followed the image shown above and implemented the same mapping algorithm: take the y position of a pixel in the image and use it as the frequency of a sine wave, and use the pixel's intensity as that wave's amplitude. If you don't know what sine waves are, or if you do but don't know how audio signals relate to them, don't worry, I'll explain.
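That mapping can be sketched in a few lines. The 200-4000 Hz range and the function names here are my own assumptions for illustration; SonifyCPP's actual values may differ.

```cpp
#include <cstdint>

// Map the row index y (0 = top of image) to a frequency: pixels nearer
// the top get a higher pitch, as in the algorithm diagram above.
// fMin/fMax are assumed bounds, not SonifyCPP's actual range.
double pixelFrequency(int y, int height,
                      double fMin = 200.0, double fMax = 4000.0) {
    double t = 1.0 - static_cast<double>(y) / (height - 1); // top row -> 1.0
    return fMin + t * (fMax - fMin);
}

// Map an 8-bit grayscale intensity (0 = black, 255 = white)
// to a sine-wave amplitude in [0, 1].
double pixelAmplitude(std::uint8_t intensity) {
    return intensity / 255.0;
}
```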

So, basically, any sound that we hear is due to compressions and decompressions (technically called rarefactions) of air molecules, produced by the sound source. These vibrations are periodic in nature and are therefore represented by sine waves (or cosine waves).


The frequency of a sine wave is how many cycles occur in one second. The wavelength is the distance between two consecutive crests or troughs. And the amplitude is the maximum displacement, the distance between the wave's peak and the zero line. If a wave is produced at a frequency between 20 and 20,000 Hz (hertz is the unit of frequency), we can hear it; this range is the audible range for humans. Animals have different perceivable frequency ranges.

So, for every pixel in a column, I get a sine wave (refer to the algorithm image), and I sum all of them to produce a single new wave for that column.
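The summing step can be sketched like this (the frequency range, the normalization, and the function name are my own assumptions for illustration, not SonifyCPP's exact code):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// One sine wave per pixel in the column (y -> frequency, intensity ->
// amplitude, as described above), all summed into a single waveform.
std::vector<double> sonifyColumn(const std::vector<std::uint8_t>& column,
                                 int numSamples, double sampleRate) {
    const double PI = 3.14159265358979323846;
    const int height = static_cast<int>(column.size());
    std::vector<double> out(numSamples, 0.0);
    for (int y = 0; y < height; ++y) {
        // Assumed 200-4000 Hz mapping; top rows give higher pitches.
        double freq = 200.0 + (1.0 - static_cast<double>(y) / (height - 1)) * 3800.0;
        double amp  = column[y] / 255.0;
        for (int n = 0; n < numSamples; ++n)
            out[n] += amp * std::sin(2.0 * PI * freq * n / sampleRate);
    }
    // Normalize so the summed wave stays within [-1, 1] before playback.
    double peak = 0.0;
    for (double s : out) peak = std::max(peak, std::abs(s));
    if (peak > 0.0)
        for (double& s : out) s /= peak;
    return out;
}
```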

Image credit: link

For every column of the image, I get a sine wave. At the end, I am left with as many sine waves as the width of the image. Each sine wave is sampled at a constant number of samples. Then I write this data to a special audio file format called WAV and get the audio. If you want to try out the program, you can head over to my GitHub repo. The code is still a work in progress, and I have a lot of things in mind to add to it.
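As an illustration of that last step, here is a minimal sketch of writing mono 16-bit PCM samples to a WAV file, following the standard RIFF/WAVE header layout. This assumes a little-endian machine (which the WAV format expects) and is not necessarily how SonifyCPP's writer is structured.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write mono 16-bit PCM samples (each in [-1, 1]) as a WAV file.
void writeWav(const std::string& path, const std::vector<double>& samples,
              std::uint32_t sampleRate) {
    std::ofstream f(path, std::ios::binary);
    // Little-endian helpers; WAV stores all fields little-endian.
    auto put16 = [&](std::uint16_t v) { f.write(reinterpret_cast<char*>(&v), 2); };
    auto put32 = [&](std::uint32_t v) { f.write(reinterpret_cast<char*>(&v), 4); };

    std::uint32_t dataBytes = static_cast<std::uint32_t>(samples.size()) * 2;
    f.write("RIFF", 4); put32(36 + dataBytes); f.write("WAVE", 4);
    f.write("fmt ", 4); put32(16);  // fmt chunk is 16 bytes for plain PCM
    put16(1);                       // audio format: PCM
    put16(1);                       // channels: mono
    put32(sampleRate);
    put32(sampleRate * 2);          // byte rate = sampleRate * channels * 2
    put16(2);                       // block align = channels * 2
    put16(16);                      // bits per sample
    f.write("data", 4); put32(dataBytes);
    for (double s : samples) {      // clamp, then quantize to int16
        double c = s < -1.0 ? -1.0 : (s > 1.0 ? 1.0 : s);
        put16(static_cast<std::uint16_t>(
            static_cast<std::int16_t>(c * 32767.0)));
    }
}
```

Once the file is written, any audio player that understands WAV can play the result.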



