You have probably heard of the song [Equation] by Aphex Twin. It is known for having a scary face hidden in its spectrum.
So I asked myself, how can I do this by myself? It is possible using a bit of matlab and basic knowledge in signal processing.
I choose a short mp3 track to modify: Cartoon (Interlude) by French House Artist SabastiAn.
This is the image we’ll try to hide:
Here is the matlab code and an explanation for it.
clear; fname = '12 Cartoon (Interlude).mp3'; imgname = 'img1.png'; outname = 'skull.wav'; % duration/pos. of projected image (seconds) duration = 15; offset = 2; window_fact = 0; %apply_func = @(S, Img) S .* (-(Img > 0)*5 + 1); apply_func = @(S, Img) S ./ (1 + Img * 2); %Hz Fmin = 7000; Fmax = 12000; % vertical resolution of projected image % bigger value makes stripe-pattern, sound % gets chirping hres = 200; % read audio [y,Fs] = audioread(fname); track_len = size(y,1); % chunk size chunk_s = floor(hres / (Fmax-Fmin) * Fs/2) * 2; % channels y1 = y(:,1); y2 = y(:,2); pad_len = chunk_s - mod(track_len, chunk_s); track_len = track_len + pad_len; y1 = [y1; zeros(pad_len, 1)]; y2 = [y2; zeros(pad_len, 1)]; y1 = reshape(y1, chunk_s, track_len/chunk_s); y2 = reshape(y2, chunk_s, track_len/chunk_s); % we have real data, so only take first half of fft get_half = @(Y) flipud(Y(1:chunk_s/2+1,:)); get_whole = @(Y) [flipud(Y); conj(Y(2:end-1,:))]; Y1 = get_half(fft(y1)); Y2 = get_half(fft(y2)); %%%% find top frequency for later filtering freq = fliplr(log(sum(abs(Y1) + abs(Y2),2))'); [m, index] = max(smooth(freq < min(freq) * 1.1, chunk_s/10) > 0.5); filter_freq = floor(index / size(Y1,1) * Fs / 2); %%%% load the image, put in right shape %%%% I = imread(imgname); if ndims(I) == 3 I = rgb2gray(I); end I = imcomplement(I); % I = imcomplement(I); I = imresize(I, [hres duration*Fs/chunk_s]); I = double(I); % append zeros left and right I = [zeros(hres, floor(offset*Fs/chunk_s)) I]; I = [I zeros(hres, size(Y1,2)-size(I,2))]; % append zeros top and bottom I = [zeros(floor(chunk_s/2 - (chunk_s*Fmax/Fs)), size(Y1,2)); I]; I = [I; zeros((size(Y1,1) - size(I,1)), size(Y1,2))]; %%%% append window to filter %%%% I = ifft(get_whole(I)); I = reshape(I, 1, track_len)'; window = hanning(size(I,1)); I = I .* (1 - window_fact + window_fact * window); I = reshape(I', chunk_s, track_len/chunk_s); I = get_half(fft(I)); %%%% append image %%%% % plot before colormap(hsv); ax1 = subplot(2,1,1); image(sqrt(abs(Y1)) * 15); ax1.XTick = linspace(0, size(Y1,2), track_len/Fs); ax1.XTickLabel = 0:track_len/Fs; max_freq = floor(Fs / 2 / 1000) * 1000; ax1.YTickLabel = linspace(max_freq, 0, max_freq/2000 + 1); ax1.YTick = fliplr(linspace(max_freq, 0, max_freq/2000 + 1)/(Fs/2)*chunk_s/2); Y1 = apply_func(Y1, I); Y2 = apply_func(Y2, I); % plot after ax2 = subplot(2,1,2); image(sqrt(abs(Y1)) * 15); ax2.XTick = ax1.XTick; ax2.XTickLabel = ax1.XTickLabel; ax2.YTick = ax1.YTick; ax2.YTickLabel = ax1.YTickLabel; %%%% reversion %%%% y1 = ifft(get_whole(Y1)); y2 = ifft(get_whole(Y2)); y1 = reshape(y1, 1, track_len)'; y2 = reshape(y2, 1, track_len)'; %%%% low-pass filter %%%% df = designfilt('lowpassfir','FilterOrder',200,'CutoffFrequency',filter_freq/(Fs/2)); D = mean(grpdelay(df)); y1 = filter(df,[y1; zeros(D,1)]); y2 = filter(df,[y2; zeros(D,1)]); y1 = y1(D+1:end); y2 = y2(D+1:end); audiowrite(outname, [y1, y2], Fs);
First, there are many parameters to set:
- names of the files to read in and write out
- how long the part of the song with modified spectrum should be
- where in the track to place the image
- between which frequencies to place the image
- spectrum resolution to work with
Basically, we then read in the song and apply a FFT on it. We than throw away the first half of the FFT because our input data was real, and thus it is just a reflection of the second half. Then we find the maximum frequency occurring in the track now, so we can use a lowpass filter later to get rid of artifacts.
The comes the interesting step: we load the image, and add it to the FFT after scaling. After that, we just reverse all the transformations we did before, and apply a lowpass filter.
This is the resulting image (I used Spek to view the spectrum):
If you listen closely in the modified track, you can hear the chirping and missing of some frequencies in the modified version. This is why I choose a french house track: There are many high frequencies so it won’t be that bad when some go missing.