You have probably heard of the song [Equation] by Aphex Twin. It is known for having a scary face hidden in its spectrum.

So I asked myself, how can I do this by myself? It is possible using a bit of matlab and basic knowledge in signal processing.

I choose a short mp3 track to modify: Cartoon (Interlude) by French House Artist SabastiAn.

This is the image we’ll try to hide:

Here is the matlab code and an explanation for it.

```
clear;
fname = '12 Cartoon (Interlude).mp3';
imgname = 'img1.png';
outname = 'skull.wav';
% duration/pos. of projected image (seconds)
duration = 15;
offset = 2;
window_fact = 0;
%apply_func = @(S, Img) S .* (-(Img > 0)*5 + 1);
apply_func = @(S, Img) S ./ (1 + Img * 2);
%Hz
Fmin = 7000;
Fmax = 12000;
% vertical resolution of projected image
% bigger value makes stripe-pattern, sound
% gets chirping
hres = 200;
% read audio
[y,Fs] = audioread(fname);
track_len = size(y,1);
% chunk size
chunk_s = floor(hres / (Fmax-Fmin) * Fs/2) * 2;
% channels
y1 = y(:,1);
y2 = y(:,2);
pad_len = chunk_s - mod(track_len, chunk_s);
track_len = track_len + pad_len;
y1 = [y1; zeros(pad_len, 1)];
y2 = [y2; zeros(pad_len, 1)];
y1 = reshape(y1, chunk_s, track_len/chunk_s);
y2 = reshape(y2, chunk_s, track_len/chunk_s);
% we have real data, so only take first half of fft
get_half = @(Y) flipud(Y(1:chunk_s/2+1,:));
get_whole = @(Y) [flipud(Y); conj(Y(2:end-1,:))];
Y1 = get_half(fft(y1));
Y2 = get_half(fft(y2));
%%%% find top frequency for later filtering
freq = fliplr(log(sum(abs(Y1) + abs(Y2),2))');
[m, index] = max(smooth(freq < min(freq) * 1.1, chunk_s/10) > 0.5);
filter_freq = floor(index / size(Y1,1) * Fs / 2);
%%%% load the image, put in right shape %%%%
I = imread(imgname);
if ndims(I) == 3
I = rgb2gray(I);
end
I = imcomplement(I);
% I = imcomplement(I);
I = imresize(I, [hres duration*Fs/chunk_s]);
I = double(I);
% append zeros left and right
I = [zeros(hres, floor(offset*Fs/chunk_s)) I];
I = [I zeros(hres, size(Y1,2)-size(I,2))];
% append zeros top and bottom
I = [zeros(floor(chunk_s/2 - (chunk_s*Fmax/Fs)), size(Y1,2)); I];
I = [I; zeros((size(Y1,1) - size(I,1)), size(Y1,2))];
%%%% append window to filter %%%%
I = ifft(get_whole(I));
I = reshape(I, 1, track_len)';
window = hanning(size(I,1));
I = I .* (1 - window_fact + window_fact * window);
I = reshape(I', chunk_s, track_len/chunk_s);
I = get_half(fft(I));
%%%% append image %%%%
% plot before
colormap(hsv);
ax1 = subplot(2,1,1);
image(sqrt(abs(Y1)) * 15);
ax1.XTick = linspace(0, size(Y1,2), track_len/Fs);
ax1.XTickLabel = 0:track_len/Fs;
max_freq = floor(Fs / 2 / 1000) * 1000;
ax1.YTickLabel = linspace(max_freq, 0, max_freq/2000 + 1);
ax1.YTick = fliplr(linspace(max_freq, 0, max_freq/2000 + 1)/(Fs/2)*chunk_s/2);
Y1 = apply_func(Y1, I);
Y2 = apply_func(Y2, I);
% plot after
ax2 = subplot(2,1,2);
image(sqrt(abs(Y1)) * 15);
ax2.XTick = ax1.XTick;
ax2.XTickLabel = ax1.XTickLabel;
ax2.YTick = ax1.YTick;
ax2.YTickLabel = ax1.YTickLabel;
%%%% reversion %%%%
y1 = ifft(get_whole(Y1));
y2 = ifft(get_whole(Y2));
y1 = reshape(y1, 1, track_len)';
y2 = reshape(y2, 1, track_len)';
%%%% low-pass filter %%%%
df = designfilt('lowpassfir','FilterOrder',200,'CutoffFrequency',filter_freq/(Fs/2));
D = mean(grpdelay(df));
y1 = filter(df,[y1; zeros(D,1)]);
y2 = filter(df,[y2; zeros(D,1)]);
y1 = y1(D+1:end);
y2 = y2(D+1:end);
audiowrite(outname, [y1, y2], Fs);
```

First, there are many parameters to set:

- names of the files to read in and write out
- how long the part of the song with modified spectrum should be
- where in the track to place the image
- between which frequencies to place the image
- spectrum resolution to work with

Basically, we then read in the song and apply a FFT on it. We than throw away the first half of the FFT because our input data was real, and thus it is just a reflection of the second half. Then we find the maximum frequency occurring in the track now, so we can use a lowpass filter later to get rid of artifacts.

The comes the interesting step: we load the image, and add it to the FFT after scaling. After that, we just reverse all the transformations we did before, and apply a lowpass filter.

This is the resulting image (I used Spek to view the spectrum):

If you listen closely in the modified track, you can hear the chirping and missing of some frequencies in the modified version. This is why I choose a french house track: There are many high frequencies so it won’t be that bad when some go missing.