Latent Granular
Latent Granular is an implementation of corpus-based granular resynthesis that operates entirely in the latent space of a neural audio codec, following the approach described by Tokui and Baker in their 2025 paper "Latent Granular Resynthesis using Neural Audio Codecs".
The basic idea: instead of matching and splicing grains of raw audio, we encode a source corpus into a latent representation, then find the best-matching grain for each moment of a target signal using cosine similarity in that latent space. The matched grains are assembled into a new latent sequence and decoded back to audio.
Codecs
The paper uses Music2Latent (Pasini, Lattner, Fazekas; ISMIR 2024), a consistency autoencoder from Sony CSL that encodes audio into a continuous 64-dimensional latent sequence at roughly 10.75 Hz, or about one vector per 93 ms of audio. The latent space is compact, continuous, and smooth enough that cosine similarity between vectors is perceptually meaningful. We also implemented support for DAC (Descript Audio Codec, MIT license) as an alternative, which produces 1024-dimensional vectors at ~86 Hz before residual vector quantization (RVQ). DAC is finer-grained but considerably noisier at the grain boundaries, for reasons described below.
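The relationship between frame rate and grain length is simple arithmetic, sketched below. The two rates are the approximate figures quoted above; the function names are hypothetical and not part of either codec's API.

```python
# Approximate latent frame rates quoted in the text, in Hz.
FRAME_RATES_HZ = {"music2latent": 10.75, "dac": 86.0}

def frame_period_ms(rate_hz: float) -> float:
    """Duration of audio covered by one latent vector, in milliseconds."""
    return 1000.0 / rate_hz

def frames_for(seconds: float, rate_hz: float) -> int:
    """Approximate number of latent vectors for a clip of the given length."""
    return round(seconds * rate_hz)

for name, rate in FRAME_RATES_HZ.items():
    print(f"{name}: {frame_period_ms(rate):.1f} ms per vector, "
          f"about {frames_for(10.0, rate)} vectors per 10 s of audio")
```

At these rates a ten-second clip is on the order of a hundred Music2Latent vectors, against several hundred for DAC.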
Codebook
The source corpus is encoded once and saved as a codebook of grains. To increase the diversity of available grains, we augment over a full grid: every pitch shift is combined with every volume scaling, so n shifts and n scalings yield n² variants of each source file, with file names like source_p+5_v30.wav. The encoded latents are cached per file so that changing grain parameters does not require re-encoding through the network.
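A minimal sketch of how such an augmentation grid could be enumerated. The naming pattern is inferred from the single example file name above; the shift and volume values, their units, and the function name are assumptions for illustration only.

```python
from itertools import product

def augmented_names(stem, pitch_shifts, volumes):
    """Return one file name per (pitch shift, volume) combination.

    The "_p<signed shift>_v<volume>" pattern is an assumption based on
    the example name source_p+5_v30.wav.
    """
    return [f"{stem}_p{p:+d}_v{v}.wav" for p, v in product(pitch_shifts, volumes)]

# Hypothetical grid: 3 pitch shifts x 3 volume scalings.
names = augmented_names("source", pitch_shifts=[-5, 0, 5], volumes=[30, 60, 100])
print(len(names))  # 3 x 3 = 9 variants
for name in names:
    print(name)
```

The codebook grows with the product of the two lists, which is the cost that per-file latent caching is meant to offset.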
For each target frame, we compute cosine similarity against the entire codebook on GPU, sample a grain from the resulting distribution with a temperature parameter, and reassemble the chosen grains in latent space. The codec decoder then converts the reassembled latent sequence back to audio.
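The matching step can be sketched as follows, using NumPy on CPU as a stand-in for the GPU implementation. Turning similarities into a distribution with a softmax is an assumption, as are the function name, the default temperature, and the random test data.

```python
import numpy as np

def match_grains(target, codebook, temperature=0.1, rng=None):
    """Pick one codebook index per target frame.

    target:   (T, D) latent frames of the target signal
    codebook: (N, D) latent grains of the source corpus
    """
    rng = np.random.default_rng() if rng is None else rng
    # Normalise rows so that the dot product equals cosine similarity.
    t = target / (np.linalg.norm(target, axis=1, keepdims=True) + 1e-8)
    c = codebook / (np.linalg.norm(codebook, axis=1, keepdims=True) + 1e-8)
    sim = t @ c.T                                # (T, N), values in [-1, 1]
    if temperature <= 0:
        return sim.argmax(axis=1)                # greedy: always the best match
    # Assumed sampling rule: softmax over similarities, sharpened by temperature.
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(codebook), p=p) for p in probs])

rng = np.random.default_rng(0)
codebook = rng.normal(size=(500, 64))            # stand-in for encoded source grains
target = codebook[[3, 42, 7]] + 0.01 * rng.normal(size=(3, 64))

greedy = match_grains(target, codebook, temperature=0.0)
sampled = match_grains(target, codebook, temperature=0.05, rng=rng)
resynth = codebook[greedy]                       # latent sequence handed to the decoder
```

Lower temperatures concentrate the choice on the nearest grain; higher ones trade fidelity to the target for variety.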
Envelope Follower
One additional matching strategy replaces cosine similarity with amplitude. Codebook grains are sorted by L2 norm of their latent vectors as an energy proxy. The target’s RMS envelope, computed per grain window, then maps directly to a position in the sorted list. Quiet target moments pull from low-energy source grains; loud moments pull from the energetic end of the codebook.
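A rough sketch of the envelope follower under stated assumptions: the RMS envelope is min-max normalised to [0, 1] before being mapped onto the sorted list (the mapping is not specified above), and the function name, grain size, and test signal are hypothetical.

```python
import numpy as np

def envelope_match(target_audio, grain_size, codebook):
    """Map each target grain window to a codebook index by loudness rank."""
    # Sort codebook grains by L2 norm, used as an energy proxy.
    order = np.argsort(np.linalg.norm(codebook, axis=1))
    # RMS of the target, one value per non-overlapping grain window.
    n = len(target_audio) // grain_size
    windows = target_audio[: n * grain_size].reshape(n, grain_size)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    # Assumed mapping: min-max normalise, then scale to a rank in the sorted list.
    span = rms.max() - rms.min()
    pos = (rms - rms.min()) / span if span > 0 else np.zeros_like(rms)
    ranks = np.round(pos * (len(codebook) - 1)).astype(int)
    return order[ranks]

rng = np.random.default_rng(0)
# Stand-in codebook whose grains have widely varying norms.
codebook = rng.normal(size=(100, 64)) * rng.uniform(0.1, 2.0, size=(100, 1))
# Test target: a 220 Hz tone that fades in over eight grain windows.
ramp = np.linspace(0.0, 1.0, 8 * 1024)
target_audio = ramp * np.sin(2 * np.pi * 220 * np.arange(ramp.size) / 44100)

idx = envelope_match(target_audio, 1024, codebook)
```

With the fade-in target, the first window selects the lowest-norm grain and the last window the highest-norm one.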
Latent Space Exploration
To get a feel for what the individual latent dimensions actually control, we built a cell that steps through a set of codebook entries while progressively modulating dimensions with sinusoids. Dimension 0 begins oscillating at a base rate; each subsequent dimension joins with a staggered onset and doubled frequency. The session runs for about 80 seconds and is saved to a file.
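The modulation schedule might look something like the sketch below. The base rate, stagger interval, depth, number of dimensions, and the choice to produce an additive offset are all assumptions; only the doubling of frequency, the staggered onsets, and the roughly 80-second length come from the description above.

```python
import numpy as np

def modulation(n_frames, n_dims, frame_rate=10.75,
               base_hz=0.1, stagger_s=5.0, depth=1.0):
    """Return an (n_frames, n_dims) array of sinusoidal offsets.

    Dimension d starts at d * stagger_s seconds and oscillates at
    base_hz * 2**d. All default values are hypothetical.
    """
    t = np.arange(n_frames) / frame_rate      # time of each latent frame, in seconds
    mod = np.zeros((n_frames, n_dims))
    for d in range(n_dims):
        freq = base_hz * 2 ** d               # doubled frequency per dimension
        onset = d * stagger_s                 # staggered onset
        active = t >= onset
        mod[active, d] = depth * np.sin(2 * np.pi * freq * (t[active] - onset))
    return mod

# About 80 seconds of Music2Latent frames, modulating the first 8 dimensions.
mod = modulation(n_frames=860, n_dims=8)
```

The offsets would then be applied to the corresponding dimensions of the codebook entries being stepped through before decoding.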
Modulation frequencies above the Nyquist frequency of the latent sequence (about 5.4 Hz for Music2Latent, half its ~10.75 Hz frame rate) alias back to lower frequencies, with results that are hard to predict by ear. The lower dimensions produce recognizable sweeps; the upper ones produce something more chaotic, which is not unwelcome.
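To see where a given modulation frequency ends up, the standard folding rule for sampled sinusoids can be applied at the latent frame rate. This is a sketch: the 0.1 Hz base rate is a hypothetical value, and the function name is ours.

```python
def aliased_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a sinusoid at f_hz once sampled at sample_rate_hz."""
    f = f_hz % sample_rate_hz            # sampling cannot tell f from f + k * fs
    return min(f, sample_rate_hz - f)    # fold into the range [0, fs / 2]

FRAME_RATE_HZ = 10.75                    # approximate Music2Latent latent rate
BASE_HZ = 0.1                            # hypothetical base modulation rate

for d in range(9):
    f = BASE_HZ * 2 ** d
    heard = aliased_frequency(f, FRAME_RATE_HZ)
    note = "" if f <= FRAME_RATE_HZ / 2 else "  (aliased)"
    print(f"dim {d}: intended {f:6.2f} Hz, effective {heard:5.2f} Hz{note}")
```

With these numbers the first six dimensions stay below the Nyquist frequency, while 6.4 Hz folds down to roughly 4.35 Hz and the dimensions after it land at rates unrelated to the doubling pattern.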
Latent Scoring
Beyond frame-by-frame cosine matching, the codebook can be used as a palette for composed sequences. A score is simply an ordered list of codebook indices with durations. Between entries, we linearly interpolate (lerp) in latent space from the current grain's vector to the next, one output vector at a time, so the decoder sees a continuously varying input rather than a hard cut or a tiled repetition.
# Each entry is (codebook index, duration).
score = [(30, 3), (2, 1), (30, 3), (2, 1), (0, 7), (8, 1), (0, 7)]
The result is something between granular synthesis and a kind of timbral score: you are not sequencing notes or samples but positions in a learned acoustic space, with the codec decoder interpreting the trajectory between them. The transitions are smooth in the sense that the latent space is smooth, which is a property of the codec’s training rather than anything we impose.
For Music2Latent, with vectors at ~10.75 Hz, the movement between two codebook entries is audible as a slow morph over whatever duration you assign. For DAC the transitions are finer-grained, but an architectural buzz at the vector hop rate (one vector every 512 samples, ~86 Hz) remains present underneath.
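A minimal sketch of how a score could be expanded into a latent sequence. It assumes that each tuple is (codebook index, duration), that durations count output latent vectors, that interpolation spans the whole duration of an entry, and that the final entry is simply held; the function name and the random codebook are placeholders.

```python
import numpy as np

def render_score(score, codebook):
    """Expand (index, duration) pairs into an (n_frames, D) latent sequence."""
    frames = []
    # Pair each entry with its successor; the last entry is paired with itself.
    for (idx, dur), (next_idx, _) in zip(score, score[1:] + [score[-1]]):
        a, b = codebook[idx], codebook[next_idx]
        for step in range(dur):
            w = step / dur                       # 0 at the entry's start, rising toward 1
            frames.append((1 - w) * a + w * b)   # linear interpolation
    return np.stack(frames)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 64))             # stand-in for encoded grains
score = [(30, 3), (2, 1), (30, 3), (2, 1), (0, 7), (8, 1), (0, 7)]

latents = render_score(score, codebook)
print(latents.shape)  # 3 + 1 + 3 + 1 + 7 + 1 + 7 = 23 frames of 64 dimensions
```

Each entry begins exactly on its own codebook vector and drifts toward the next, so an entry of duration 1 appears as a single unblended frame.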
Source
Full source and notebook at github.com/lucaskuzma/LatentGranular.