Music, together with air, water and food, is one of the few things we need to survive. It is heard everywhere, all day, every day: in shops, on the radio, on TV, in elevators… We are surrounded by music, and it is therefore a vital part of our daily lives. We usually understand ‘music’ as sounds that provoke a reaction in us, affecting our mood and emotions, but can we go further than sound and recreate visuals from those sounds?
Well, this art project uses a machine learning model to create a visual representation that matches the sounds being heard: a smooth stream of images that shift along with the song’s pitch, tempo, rhythm, depth… More specifically, it is built on BigGAN (Brock et al., 2018), a large-scale generative adversarial network for image synthesis, which is repurposed here to visualize music.
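To give a sense of how such a mapping can work, here is a minimal sketch of the idea, not the project’s actual code: per-frame audio features steer BigGAN’s class and noise inputs, so the imagery shifts with the music. It assumes the librosa and pytorch_pretrained_biggan packages; the generate_frames name, the ImageNet class list, the sensitivity knobs and the step size are illustrative choices of mine, not the original settings.

```python
import librosa
import torch
from pytorch_pretrained_biggan import BigGAN, one_hot_from_int, truncated_noise_sample

def generate_frames(song_path, resolution=256, duration=None,
                    pitch_sensitivity=1.0, tempo_sensitivity=1.0):
    """Turn a song into a sequence of BigGAN images driven by its pitch and rhythm."""
    sr, hop = 22050, 512
    audio, _ = librosa.load(song_path, sr=sr, duration=duration)
    # Per-frame pitch content (12 pitch classes) and onset strength (a rhythm/tempo proxy).
    chroma = librosa.feature.chroma_cqt(y=audio, sr=sr, hop_length=hop)
    onset = librosa.onset.onset_strength(y=audio, sr=sr, hop_length=hop)

    model = BigGAN.from_pretrained(f"biggan-deep-{resolution}")
    # One arbitrary ImageNet class per pitch class; the chroma weights blend them,
    # so whichever pitches dominate decide what the image depicts.
    basis = one_hot_from_int([1, 88, 130, 323, 402, 417, 513, 546, 579, 812, 881, 987],
                             batch_size=12)
    class_basis = torch.tensor(basis, dtype=torch.float)            # (12, 1000)

    noise = torch.tensor(truncated_noise_sample(truncation=0.5, batch_size=1),
                         dtype=torch.float)                         # (1, 128) latent vector
    frames = []
    for t in range(0, min(chroma.shape[1], len(onset)), 30):        # subsample feature frames
        w = torch.tensor(chroma[:, t], dtype=torch.float) + 1e-6
        w = w ** pitch_sensitivity                                   # sharpen the pitch weighting
        class_vec = ((w / w.sum()) @ class_basis).unsqueeze(0)       # pitch -> image content
        # Stronger onsets push the latent vector further -> faster visual transitions.
        noise = noise + tempo_sensitivity * float(onset[t]) * 0.05 * torch.randn_like(noise)
        with torch.no_grad():
            frames.append(model(noise, class_vec, 0.5))              # (1, 3, res, res) in [-1, 1]
    return frames  # the frames can then be written to a video with e.g. moviepy or ffmpeg
```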
I first tried the model with the sample Beethoven song provided in the GitHub repository and, after some debugging, managed to make it work. I then experimented with the adjustable parameters and applied the model to the song ‘Crazy’ by Gnarls Barkley (2006). In the end, I settled on a mid-high sensitivity for both pitch and tempo, so that the image transitions follow both, along with a 2-minute duration and a resolution of 256.
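With the sketch above, that run would look roughly like the call below; the file name and the numeric sensitivity values are assumptions standing in for “mid-high”.

```python
# Illustrative call with the settings described above; only the resolution and
# duration are the actual values, the rest are assumed stand-ins.
frames = generate_frames("crazy_gnarls_barkley.mp3",  # hypothetical file name
                         resolution=256,              # 256x256 output images
                         duration=120,                # 2-minute excerpt, in seconds
                         pitch_sensitivity=1.5,       # assumed "mid-high" pitch response
                         tempo_sensitivity=1.5)       # assumed "mid-high" tempo response
```

The result can be observed below: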
Tools used: