Matrix multiplication broadcasting?

zxygentoo · June 17, 2023, 5:11am

Hi everyone,

I’m trying to work through the nanoGPT tutorial by Andrej Karpathy and I’m stuck on a small piece of code that looks something like this:

import torch

B, T, C = 32, 8, 65

x = torch.randn(B, T, C)

wei = torch.tril(torch.ones(T, T))
wei = wei / wei.sum(1, keepdim=True)

xbow2 = wei @ x

Everything above the last line is easy to translate to Julia:

using Flux
using LinearAlgebra  # provides tril

B, T, C = 32, 8, 65

x = randn(B, T, C)

wei = tril(ones(T, T))
wei = wei ./ sum(wei, dims=2)

For the last line, PyTorch seems to broadcast wei from (T, T) to (B, T, T) and then do a batched matrix multiplication. How can I achieve the same effect in Julia? (repeat and broadcast don’t seem to work, or I’m doing it wrong. Maybe batched_mul?) Or is there anywhere I can look this up? (My linear algebra is really rusty at this point.)
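To make the question concrete, here is a minimal pure-Python sketch of what I understand `wei @ x` to do for shapes (T, T) @ (B, T, C): the same T×T matrix is applied to every T×C slice of the batch. Nested lists stand in for tensors, and the helper names `matmul` and `broadcast_matmul` are made up for illustration; they are not PyTorch functions.

```python
# Sketch only: nested lists stand in for tensors, helper names are hypothetical.

def matmul(a, b):
    """Plain 2-D matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def broadcast_matmul(wei, x):
    """Apply the same (T x T) matrix to every (T x C) slice of the batch."""
    return [matmul(wei, x_b) for x_b in x]

B, T, C = 2, 3, 2

# Lower-triangular ones with each row normalised to sum to 1,
# so row t averages time steps 0..t.
wei = [[1.0 / (t + 1) if s <= t else 0.0 for s in range(T)] for t in range(T)]

# x[b][t][c]: small values so the running averages are easy to check by hand.
x = [[[float(10 * b + 2 * t + c) for c in range(C)] for t in range(T)]
     for b in range(B)]

xbow2 = broadcast_matmul(wei, x)  # nested list with shape (B, T, C)
```

Each `xbow2[b][t]` is then the average of `x[b][0]` through `x[b][t]`, which is the effect I am trying to reproduce in Julia.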

thx~

Alex

nilshg · June 17, 2023, 5:47am

It would be good if you could show a minimal example of what the result in Python is, but maybe you’re looking for the Kronecker product (kron)?

bertschi · June 17, 2023, 6:59am

For a direct translation you will need to handle the batching yourself, e.g.

# stack is in Base from Julia 1.9 on; on older versions it comes from Compat.jl (using Compat)
xbow2 = Compat.stack(wei * x[b, :, :] for b ∈ axes(x, 1); dims = 1)

The real issue is that Julia arrays are column-major whereas Torch is row-major. Thus, Flux has the tensor dimensions reversed compared to Torch, i.e., the batch dimension comes last instead of first, and batched matrix multiplication works correctly when you also reverse the dimensions of your tensors:

x = randn(C, T, B)

xbow2 = batched_mul(x, wei')  # Note: Transpose wei as it was constructed according to Torch conventions
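The transpose in that call follows from the identity (wei · X)ᵀ = Xᵀ · weiᵀ applied to each batch slice. Below is a small pure-Python check of that identity for a single slice; it is only a sketch, with nested lists standing in for arrays and hypothetical helper names `matmul` and `transpose`, and it does not call Flux or Torch.

```python
# Sketch only: checks (wei @ X)^T == X^T @ wei^T for one batch slice.

def matmul(a, b):
    """Plain 2-D matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap rows and columns of a 2-D nested list."""
    return [list(col) for col in zip(*m)]

T, C = 3, 2

# Row-normalised lower-triangular averaging matrix, built the Torch way.
wei = [[1.0 / (t + 1) if s <= t else 0.0 for s in range(T)] for t in range(T)]

# One batch slice in Torch layout: T rows (time), C columns (channels).
x_b = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]

torch_style = matmul(wei, x_b)                        # shape (T, C)
julia_style = matmul(transpose(x_b), transpose(wei))  # shape (C, T)

# Largest element-wise difference between julia_style and torch_style transposed.
max_diff = max(abs(julia_style[c][t] - torch_style[t][c])
               for t in range(T) for c in range(C))
```

Here `julia_style` plays the role of one `(C, T)` slice of the `batched_mul` result, and `max_diff` should be zero up to rounding.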

zxygentoo · June 17, 2023, 9:33am

The real issue is that Julia arrays are column-major whereas Torch is row-major.

Yes, exactly this. I noticed the difference in earlier code but haven’t yet wrapped my head around it (for some one-hot encoding code, I was able to line things up using Julia’s default layout and avoid reshaping to directly match torch).

Thanks for pointing it out! The goal here is not to mimic torch’s behavior but to work through the tutorial and learn something about transformers and Julia.

Thank you again.
