audio fingerprinting in browser

17 march 2026

I first paid attention to audio fingerprinting while looking through a third-party analytics script. Most of the script was the usual collection code, but one small piece stood out: it created an AudioContext, pushed a signal through a few Web Audio nodes, and read back floating-point samples.

No microphone prompt. No permission dialog. Nothing visible to the user.

That is the part that made me stop. The script was not recording sound. It was using the browser's audio stack as a math engine, then turning the tiny differences in the result into another fingerprinting signal.

Why Audio Works

Audio fingerprinting sounds odd until you look at what is being measured.

Different browsers, CPUs, operating systems, drivers, and audio backends do floating-point audio processing slightly differently. The differences are small. Usually they show up in the last few decimal places of a rendered sample.

Small does not mean useless. If the same machine gives the same small differences every time, and other machines give slightly different ones, that becomes a tracking signal.
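To see how the same arithmetic can land on different floats, here is a small sketch in plain JavaScript. It does not touch Web Audio. It only sums the same float32 values in two different orders, which is the kind of freedom an audio engine has when it reorders or vectorizes a loop. The function names and the test signal are mine, picked for illustration.

```javascript
// Sum the values in single precision, front to back.
function sumForward(values) {
  let acc = 0;
  for (let i = 0; i < values.length; i++) {
    acc = Math.fround(acc + values[i]); // round to float32 after every add
  }
  return acc;
}

// Same values, same precision, back to front.
function sumBackward(values) {
  let acc = 0;
  for (let i = values.length - 1; i >= 0; i--) {
    acc = Math.fround(acc + values[i]);
  }
  return acc;
}

// A made-up test signal: 1000 samples of a decaying sine.
const values = new Float32Array(1000);
for (let i = 0; i < values.length; i++) {
  values[i] = Math.sin(i * 0.37) * Math.exp(-i / 300);
}

const a = sumForward(values);
const b = sumBackward(values);
console.log(a, b, Math.abs(a - b)); // any gap shows up far to the right of the decimal point
```

Both results are valid answers to the same question. A fingerprint only needs the machine to keep giving the same one.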

The Web Audio API makes this easy because it is a normal browser feature. Games use it. Music tools use it. Visualizers use it. Creating an OfflineAudioContext does not ask the user for permission because it is not touching the microphone. It just renders audio into memory.

That is enough.

The Basic Setup

The useful object here is OfflineAudioContext.

A normal AudioContext plays through speakers. An offline context renders into a buffer silently, usually faster than real time. The result is an AudioBuffer, and calling getChannelData() on it returns a Float32Array containing the generated waveform.

Most demos start with something like this:

const ctx = new OfflineAudioContext(1, 44100, 44100);

That means:

channels:    1
samples:     44100
sample rate: 44100
duration:    1 second

No audio comes out. The page just gets numbers.
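The arithmetic behind those three constructor arguments is simple enough to write down. This is a small helper of my own (describeOfflineRender is not part of any API) that shows how length and sample rate turn into duration and memory:

```javascript
// Describe what new OfflineAudioContext(channels, length, sampleRate) will render.
function describeOfflineRender(channels, length, sampleRate) {
  return {
    durationSeconds: length / sampleRate,
    totalSamples: channels * length,
    // Each rendered sample is a 32-bit float, so 4 bytes.
    approxBytes: channels * length * Float32Array.BYTES_PER_ELEMENT,
  };
}

console.log(describeOfflineRender(1, 44100, 44100));
// { durationSeconds: 1, totalSamples: 44100, approxBytes: 176400 }
```

A script that only reads the first 5000 samples does not need the full second. 5000 frames at 44100 Hz is roughly 0.11 seconds of audio.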

Oscillator And Compressor

The older technique is still the easiest one to understand. Generate a known waveform, run it through a compressor, render it, and hash the result.

const ctx = new OfflineAudioContext(1, 44100, 44100);

const osc = ctx.createOscillator();
const comp = ctx.createDynamicsCompressor();

// Known input: a 10 kHz triangle wave.
osc.type = "triangle";
osc.frequency.setValueAtTime(10000, ctx.currentTime);

// oscillator -> compressor -> offline destination
osc.connect(comp);
comp.connect(ctx.destination);

osc.start(0);

// Render silently, then reduce a slice of the output to one number.
ctx.startRendering().then(buf => {
  const x = buf.getChannelData(0); // Float32Array of rendered samples
  let n = 0;

  for (let i = 4500; i < 5000; i++) {
    n += Math.abs(x[i]);
  }

  console.log(n);
});

The input is simple: a triangle wave at 10 kHz. The compressor then applies gain reduction. The final buffer contains 44,100 floats, one per rendered sample.

The fingerprint can be a hash of the buffer, a sum of selected samples, or a few sample positions known to vary across systems.
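Those three reductions can be sketched as plain functions over a Float32Array. The function names, the choice of FNV-1a as the hash, and the demo data are my assumptions. A real script could use any hash and any sample positions.

```javascript
// Option 1: sum of absolute values over the sample range [start, end).
function sumAbs(samples, start, end) {
  let total = 0;
  for (let i = start; i < end && i < samples.length; i++) {
    total += Math.abs(samples[i]);
  }
  return total;
}

// Option 2: keep the raw values at a few fixed positions.
function pickSamples(samples, positions) {
  return positions.map((p) => samples[p]);
}

// Option 3: FNV-1a (32-bit) over the raw bytes of the buffer.
// The byte view follows the machine's byte order.
function hashBytes(samples) {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  let h = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    h ^= bytes[i];
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Stand-in data; in a browser this would be buf.getChannelData(0).
const demo = new Float32Array([0, 0.5, -0.25, 0.125, -1]);

console.log(sumAbs(demo, 1, 4));        // 0.875
console.log(pickSamples(demo, [1, 4])); // [ 0.5, -1 ]
console.log(hashBytes(demo));           // 8 hex characters
```

The hash is the most sensitive of the three: a change in the last bit of a single sample gives a different result. The sum is more forgiving, which is one reason it shows up so often.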

I do not treat one audio number as a unique ID by itself. But it is a useful piece of entropy when combined with canvas, WebGL, fonts, timezone, GPU strings, and the rest of the usual fingerprinting pile.

Why The Numbers Differ

The Web Audio specification does not force every implementation to produce bit-identical audio output in every case. The pipeline touches a lot of things:

browser engine code
CPU floating-point behavior
operating system audio APIs
sample-rate conversion
compressor implementation details
sometimes driver and hardware behavior

Chrome on Windows and Chrome on macOS can diverge. Firefox and Safari can diverge. Linux setups can diverge depending on the audio stack underneath.

The differences are not dramatic. That is what makes the technique annoying. Nothing looks broken. The audio API is doing what it is allowed to do. It just leaves a measurable trace.

Compressor Values

The DynamicsCompressorNode is popular because it has enough internal behavior to create variation.

It exposes parameters like:

comp.threshold.value;
comp.knee.value;
comp.ratio.value;
comp.attack.value;
comp.release.value;
comp.reduction;

The reduction value is the interesting one. Unlike the other five, it is not an AudioParam. It is a read-only number, in decibels, that reports the gain reduction the compressor is currently applying. Different engines can compute that curve a little differently.

A tracker can set the compressor to known values, render a short signal, then read the output samples and sometimes the compressor state.

It is not that one browser is "wrong". The spec gives enough room that implementations do not always land on the same exact floats.
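To make "that curve" concrete, here is a textbook soft-knee compressor curve in plain JavaScript. This is a sketch, not the code any browser runs. The knee formula is a common audio-engineering one that I am assuming for illustration, and the settings are made-up values. The point is how many small arithmetic steps sit between input level and output level, each one a place for engines to round differently.

```javascript
// Textbook static compressor curve. All levels are in dB.
// NOT taken from any browser engine; the knee shape is an assumption.
function compressDb(inputDb, { threshold, knee, ratio }) {
  const over = inputDb - threshold;
  if (knee > 0 && 2 * Math.abs(over) <= knee) {
    // Inside the knee: blend smoothly from 1:1 into ratio:1.
    const t = over + knee / 2;
    return inputDb + ((1 / ratio - 1) * t * t) / (2 * knee);
  }
  if (over <= 0) return inputDb;   // below the knee: unchanged
  return threshold + over / ratio; // above the knee: compressed
}

// Gain reduction in dB: 0 means untouched, negative means attenuated.
function reductionDb(inputDb, settings) {
  return compressDb(inputDb, settings) - inputDb;
}

const settings = { threshold: -50, knee: 40, ratio: 12 }; // illustrative values

console.log(reductionDb(-80, settings)); // 0, well below the knee
console.log(reductionDb(-10, settings)); // about -36.67
```

A real compressor adds attack and release smoothing on top of this static curve, so the actual per-sample gain depends on even more intermediate floats.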

FFT As Another Signal

Another version uses AnalyserNode and FFT output.

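const analyser = ctx.createAnalyser();
analyser.fftSize = 2048;

// Route the oscillator through the analyser on its way to the output.
osc.connect(analyser);
analyser.connect(ctx.destination);
osc.start(0);

// Read the bins only after audio has flowed through the analyser.
// Before any audio is processed, the bins carry no signal.
ctx.startRendering().then(() => {
  const bins = new Float32Array(analyser.frequencyBinCount);
  analyser.getFloatFrequencyData(bins);
});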

The FFT breaks the signal into frequency bins. With an fftSize of 2048, the array has 1024 values.

The idea is the same: feed the browser a known signal and look at the exact numbers it returns. Some implementations compare the measured frequency bins with the theoretical bins for a clean sine wave, then use the error as the signal.

I would not start with FFT if I were writing a quick detector. It is slower and a little more awkward. But as a second signal, it can help separate machines that look similar on the basic oscillator test.
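The bin arithmetic and the "compare against theory" step can be sketched without a browser. The helpers below are my own names, and the measured data is faked. In a real page, the bins would come from getFloatFrequencyData.

```javascript
// Width of one FFT bin in Hz.
function binWidthHz(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

// Index of the bin where a pure tone at `frequency` should peak.
function expectedPeakBin(frequency, sampleRate, fftSize) {
  return Math.round(frequency / binWidthHz(sampleRate, fftSize));
}

// Index of the loudest bin in measured data (dB values).
function measuredPeakBin(bins) {
  let best = 0;
  for (let i = 1; i < bins.length; i++) {
    if (bins[i] > bins[best]) best = i;
  }
  return best;
}

// Mean absolute difference between two bin arrays.
// Non-finite differences (for example, silent bins at -Infinity) are skipped.
function meanAbsError(measured, reference) {
  const n = Math.min(measured.length, reference.length);
  let total = 0;
  let count = 0;
  for (let i = 0; i < n; i++) {
    const d = measured[i] - reference[i];
    if (Number.isFinite(d)) {
      total += Math.abs(d);
      count++;
    }
  }
  return count ? total / count : 0;
}

const sampleRate = 44100;
const fftSize = 2048;

console.log(binWidthHz(sampleRate, fftSize));             // 21.533203125
console.log(expectedPeakBin(10000, sampleRate, fftSize)); // 464

// Fake measurement: a quiet floor with one loud bin.
const fake = new Float32Array(fftSize / 2).fill(-100);
fake[464] = -20;
console.log(measuredPeakBin(fake));                       // 464
```

The peak position is rarely where machines differ. The per-bin error against a reference is the part that carries the signal.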

Using A Buffer Directly

A cleaner variant is to create the input samples yourself.

const buf = ctx.createBuffer(1, 4096, ctx.sampleRate);
const ch = buf.getChannelData(0);

// Fill the buffer with a 440 Hz sine wave at the context's sample rate.
for (let i = 0; i < ch.length; i++) {
  ch[i] = Math.sin((2 * Math.PI * 440 * i) / ctx.sampleRate);
}

const src = ctx.createBufferSource();
src.buffer = buf;
src.connect(comp);
comp.connect(ctx.destination);
src.start(0);

This removes oscillator synthesis from the measurement. JavaScript creates the input, then the browser's audio pipeline processes it.

That gives you a slightly cleaner question:

same input samples -> what does this browser/audio stack return?

The tradeoff is small, but real: you spend time filling the buffer in JavaScript before rendering.
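One wrinkle with "same input samples": Math.sin is implementation-approximated in the ECMAScript spec, so the input itself is not strictly guaranteed to match across engines. A sketch of one way around that is to build the input from arithmetic that rounds identically everywhere. The triangle shape, the period of 64 samples, and the function names are my choices, not something lifted from a real tracker.

```javascript
// Build a triangle wave using only exactly representable steps.
// With a power-of-two period, every value here is exact in float32.
function triangleBuffer(length, period) {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const t = (i % period) / period;    // phase in [0, 1)
    out[i] = 4 * Math.abs(t - 0.5) - 1; // 1 at phase 0, -1 at phase 0.5
  }
  return out;
}

// Browser-only helper: copy the samples into an AudioBuffer for a given context.
function toAudioBuffer(ctx, samples) {
  const buf = ctx.createBuffer(1, samples.length, ctx.sampleRate);
  buf.copyToChannel(samples, 0);
  return buf;
}

const samples = triangleBuffer(4096, 64);
console.log(samples[0], samples[16], samples[32]); // 1 0 -1
```

With input like this, any difference in the rendered output comes from the audio pipeline and not from how the page generated the samples.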

How Trackers Use It

Audio fingerprinting is usually not the whole fingerprint. It is one row in a larger table.

A library might collect:

audio sample sum
compressor output
canvas hash
WebGL renderer
installed font hints
screen size
timezone
language
platform strings

Then it combines those into one identifier.
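The combining step can be sketched as "serialize in a fixed order, then hash". Everything below is hypothetical: the component names, the values, and the choice of a small djb2-style hash. A production library would likely use a stronger hash, but the shape is the same.

```javascript
// Serialize components in sorted key order so the same inputs
// always produce the same string.
function canonicalize(components) {
  return Object.keys(components)
    .sort()
    .map((key) => `${key}=${String(components[key])}`)
    .join("|");
}

// Small non-cryptographic string hash (djb2 variant), as 8 hex characters.
function hashString(text) {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = (Math.imul(h, 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

function combine(components) {
  return hashString(canonicalize(components));
}

// All values below are made up for illustration.
const id = combine({
  audioSum: 35.73833402246237,
  canvasHash: "a3f1c2d4",
  timezone: "Europe/Berlin",
  language: "en-US",
  screen: "1920x1080",
});

console.log(id); // one short identifier for the whole table
```

Because the audio value is just one field in the serialized string, a change in its last digit changes the whole identifier. That is why trackers prefer audio reductions that stay stable from run to run.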

The audio part often looks rougher than people expect:

let fp = 0;

for (let i = 200; i < 5000; i++) {
  fp += Math.abs(samples[i]);
}

Skipping the first stretch of samples is common because startup behavior can be noisy. The loop above ignores the first 200. Summing across a range smooths out small one-off glitches while keeping the systematic differences.

That is the pattern I usually look for in third-party scripts: OfflineAudioContext, oscillator/compressor setup, render, then a sum or hash of the returned channel data.

Defending Against It

There is no perfect defense that keeps every audio feature working exactly as-is.

One option is randomization. Some privacy extensions add small noise to the audio output so the value changes between reads. That can break stable tracking, but it can also break legitimate audio apps. A determined tracker may also average repeated reads if the noise is too small or too simple.
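The averaging attack is easy to simulate. This sketch assumes the simplest possible defense: fresh, zero-mean, uniform noise on every read. The reader function, the noise amplitude, and the "true" value are all invented for the demo, and the seeded PRNG is only there to make it repeatable.

```javascript
// Small deterministic PRNG (mulberry32) so the demo is repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Stand-in for a defense that adds fresh zero-mean noise on every read.
function makeNoisyReader(trueValue, amplitude, random) {
  return () => trueValue + (random() * 2 - 1) * amplitude;
}

// What a tracker could do in response: read many times and average.
function averageReads(read, count) {
  let total = 0;
  for (let i = 0; i < count; i++) total += read();
  return total / count;
}

const trueValue = 35.7383; // made-up fingerprint value
const read = makeNoisyReader(trueValue, 1e-4, mulberry32(42));

const single = read();
const averaged = averageReads(read, 1000);

console.log(Math.abs(single - trueValue));   // anywhere up to 1e-4
console.log(Math.abs(averaged - trueValue)); // usually far smaller
```

This only works against noise that is re-drawn on every read. Noise that stays fixed for a whole session averages to the same offset every time, so repeated reads tell the tracker nothing new within that session.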

Another option is blocking or normalizing the API. Firefox's privacy.resistFingerprinting setting changes several fingerprinting surfaces. Tor Browser goes further and tries to make all of its users look the same, which is the point for that threat model.

The tradeoff is compatibility. Web Audio is not a weird API. Plenty of real apps use it.

What does not help much:

clearing cookies
changing User-Agent
using incognito mode alone

Audio fingerprinting is not reading your cookies. It is reading output from a local rendering path.

Timing Leaks

The output samples are not the only signal.

The time it takes to render an OfflineAudioContext can also carry information. A faster or slower CPU, different browser scheduling, or a different audio implementation can all move the timing.

A tracker can measure:

start timer
render offline context
stop timer

That timing is noisy, but fingerprinting does not need every signal to be perfect. It only needs enough weak signals to improve the composite result.
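The start/render/stop steps can be sketched like this. Because a single timing is so noisy, the sketch repeats the measurement and keeps the median. That choice, the run count, and the helper names are mine. The browser-only part is guarded so the rest runs anywhere.

```javascript
// Median of a list of durations; less sensitive to outliers than the mean.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time one async task in milliseconds. `task` is any function returning a promise.
async function timeOnce(task) {
  const start = performance.now();
  await task();
  return performance.now() - start;
}

// Repeat the measurement and keep the median.
async function timeMedian(task, runs) {
  const durations = [];
  for (let i = 0; i < runs; i++) durations.push(await timeOnce(task));
  return median(durations);
}

// Browser-only: each run builds a fresh context, because an
// OfflineAudioContext can only be rendered once.
function renderOnce() {
  const ctx = new OfflineAudioContext(1, 44100, 44100);
  const osc = ctx.createOscillator();
  osc.connect(ctx.destination);
  osc.start(0);
  return ctx.startRendering();
}

if (typeof OfflineAudioContext !== "undefined") {
  timeMedian(renderOnce, 5).then((ms) => console.log("median render ms:", ms));
}
```

Browsers also coarsen performance.now(), which adds to the noise. It does not remove the signal, it just means the tracker needs more runs.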

Audio worklets can expose similar throughput-style clues. How quickly can the browser process small chunks? How stable is the timing? Those measurements can say something about the machine even if the sample values are normalized.

What I Look For In Scripts

When I review third-party JavaScript, I usually search for:

OfflineAudioContext
createOscillator
createDynamicsCompressor
getChannelData
getFloatFrequencyData
startRendering

Any one of those can be legitimate. A game or audio tool will obviously use Web Audio. But if I see them inside analytics code, ad tech, fraud scoring, or a tag loaded on every page, I look closer.
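That search is easy to automate as a first pass. The sketch below is a crude triage helper of my own, not a real detector. It does plain substring matching, so it will miss scripts that build API names dynamically or hide them behind obfuscation.

```javascript
const AUDIO_TOKENS = [
  "OfflineAudioContext",
  "createOscillator",
  "createDynamicsCompressor",
  "getChannelData",
  "getFloatFrequencyData",
  "startRendering",
];

// Return the tokens that appear in a script's source text.
function findAudioTokens(source) {
  return AUDIO_TOKENS.filter((token) => source.includes(token));
}

// Crude triage: offline context + render + readback is the shape
// worth a closer look. It is a hint, not a verdict.
function looksLikeAudioFingerprint(source) {
  const hits = new Set(findAudioTokens(source));
  return (
    hits.has("OfflineAudioContext") &&
    hits.has("startRendering") &&
    (hits.has("getChannelData") || hits.has("getFloatFrequencyData"))
  );
}

// Made-up snippets standing in for downloaded third-party scripts.
const suspicious =
  "var c=new OfflineAudioContext(1,44100,44100);c.startRendering().then(function(b){b.getChannelData(0)})";
const harmless = "var c=new AudioContext();c.createOscillator().start(0)";

console.log(looksLikeAudioFingerprint(suspicious)); // true
console.log(looksLikeAudioFingerprint(harmless));   // false
```

A match only tells me where to start reading. Whether it is a fingerprint depends on what happens to the floats afterwards.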

The rough shape is easy to spot:

make offline audio context
generate signal
render silently
read floats
hash or sum floats
send result somewhere

That last step is the one I care about most. Rendering audio locally is not the issue by itself. Shipping the derived identifier to a third party is.

Closing Notes

Audio fingerprinting is a good example of how normal browser APIs become tracking surfaces.

The API was built for audio. The fingerprint comes from the tiny implementation differences around that audio.

I would not panic every time I see Web Audio on a page. But if a third-party analytics script silently renders audio and sends the result away, I want to know why.