einsum('bf,fc->bfc', batched_inputs, channel_embedding)
Then carry that info through the network and project it down at the end. It's roughly equivalent to the token embedding step in an LLM.
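A minimal sketch of what that einsum computes, using NumPy and made-up shapes (batch of 4, 3 features, 8 embedding channels — all hypothetical): each scalar feature value scales its own learned channel vector, so `out[b, f, c] = inputs[b, f] * embedding[f, c]`.

```python
import numpy as np

# Hypothetical shapes: 4 examples, 3 scalar features,
# each feature lifted into an 8-dim channel embedding.
B, F, C = 4, 3, 8
rng = np.random.default_rng(0)
batched_inputs = rng.standard_normal((B, F))     # (batch, features)
channel_embedding = rng.standard_normal((F, C))  # (features, channels)

# out[b, f, c] = batched_inputs[b, f] * channel_embedding[f, c]
out = np.einsum('bf,fc->bfc', batched_inputs, channel_embedding)
assert out.shape == (B, F, C)

# Equivalent without einsum: an outer-product-style broadcast.
out_broadcast = batched_inputs[:, :, None] * channel_embedding[None, :, :]
assert np.allclose(out, out_broadcast)
```

The broadcast form makes the analogy to token embedding concrete: a token ID indexes a row of the embedding table, while here a continuous feature value scales its row instead.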