Is there a recommended way to one-hot encode a batch of sequences?

More precisely, I am modelling sequences from an alphabet with q letters. A sequence of length N can be one-hot encoded as a q x N one-hot matrix (e.g., using Flux.OneHotMatrix). Then it seems that to encode a batch of B sequences I would need a q x N x B “one-hot tensor”.
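For concreteness, here is a minimal sketch of the single-sequence case (sizes `q` and `N` are illustrative; I am using `Flux.onehotbatch`, which returns a `OneHotMatrix`):

```julia
using Flux

q, N = 4, 10                        # alphabet size, sequence length
seq = rand(1:q, N)                  # one sequence, integer-coded letters
X = Flux.onehotbatch(seq, 1:q)      # q x N one-hot matrix
size(X)                             # (4, 10)
```

For a batch of B such sequences, the natural container would then be a q x N x B array.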

One possibility is to put the data into a q x (N * B) one-hot matrix, and then reshape this into q x N x B for downstream processing. But this is slow.
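The approach just described would look roughly like this (a sketch, assuming the sequences are stored as a vector `seqs` of integer vectors; `reshape` on a `OneHotMatrix` should preserve the one-hot structure when the first dimension is kept):

```julia
using Flux

q, N, B = 4, 10, 32
seqs = [rand(1:q, N) for _ in 1:B]       # batch of B integer-coded sequences
flat = reduce(vcat, seqs)                # concatenate into a length N*B vector
M = Flux.onehotbatch(flat, 1:q)          # q x (N*B) one-hot matrix
T = reshape(M, q, N, B)                  # q x N x B tensor for downstream use
size(T)                                  # (4, 10, 32)
```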

I currently store each one-hot matrix in a vector, so the batch is a vector of one-hot matrices whose N columns are the sequence steps, and inside the training loop I restack the data via

x_batch = [Flux.stack([Float32.(x[ii][:, tt]) for ii in 1:length(x)], dims=2) for tt in 1:size(x[1], 2)]
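With dummy data, that restacking produces a length-N vector whose tt-th entry is the q x B `Float32` matrix of all sequences at time step tt (the per-time-step input format I am feeding downstream; names and sizes here are illustrative):

```julia
using Flux

q, N, B = 4, 10, 32
# x: vector of B one-hot q x N matrices, one per sequence
x = [Flux.onehotbatch(rand(1:q, N), 1:q) for _ in 1:B]
x_batch = [Flux.stack([Float32.(x[ii][:, tt]) for ii in 1:length(x)], dims=2)
           for tt in 1:size(x[1], 2)]
length(x_batch)      # N entries, one per time step
size(x_batch[1])     # (q, B)
```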

I agree this is suboptimal, and I plan to ask a more detailed follow-up question in the hope of getting a best-practice answer.