acceptable-knife-37130
02/12/2025, 4:41 PM

context_vec = context_vec.contiguous().view(b, num_tokens, self.d_out)
context_vec = self.out_proj(context_vec)
Here are my questions:
a) Why do I need the contiguous() call before merging the dimensions? I can write it as below and get the same result. Is there an issue if I do it this way?
context_vec = context_vec.view(b, num_tokens, self.d_out)
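For context, whether the plain `view` works depends on how `context_vec` was produced. A minimal sketch (with made-up sizes, not the ones from the original code) showing the common case in multi-head attention where `view` fails without `contiguous`, because a `transpose` has made the tensor non-contiguous:

```python
import torch

# Hypothetical sizes for illustration.
b, num_heads, num_tokens, head_dim = 2, 4, 6, 8

x = torch.randn(b, num_heads, num_tokens, head_dim)

# Typical attention code transposes heads and tokens before merging them:
x_t = x.transpose(1, 2)  # shape: (b, num_tokens, num_heads, head_dim)
print(x_t.is_contiguous())  # False

# view() requires compatible memory strides; after the transpose the last
# two dims can no longer be merged without copying, so this raises.
try:
    x_t.view(b, num_tokens, num_heads * head_dim)
    view_failed = False
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

# contiguous() copies the data into a contiguous layout, after which
# view() is valid. reshape() would do this copy implicitly when needed.
merged = x_t.contiguous().view(b, num_tokens, num_heads * head_dim)
print(merged.shape)  # torch.Size([2, 6, 32])
```

If the plain `view` happens to succeed in a particular setup, the strides must have been compatible there; `contiguous()` is the safe general pattern.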
b) I don't understand the use of an output projection linear layer. The book says we include it because GPT uses it, without explaining why it is needed:
context_vec = self.out_proj(context_vec)
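For reference, a minimal sketch of what such a projection does (with hypothetical sizes, and assuming `d_out` is split evenly across heads): after concatenation, the features from different heads sit in separate column slices, and a `Linear(d_out, d_out)` lets the model learn to mix information across those slices.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration.
b, num_tokens, num_heads, head_dim = 2, 6, 4, 8
d_out = num_heads * head_dim  # 32

# Concatenated head outputs, shape (b, num_tokens, d_out).
# Columns 0..7 come from head 0, columns 8..15 from head 1, etc.
heads_concat = torch.randn(b, num_tokens, d_out)

# The output projection is a learned linear map over the last dimension:
# every output feature is a weighted sum of ALL d_out input features, so
# it can combine features across heads, which plain concatenation cannot.
out_proj = nn.Linear(d_out, d_out)
context_vec = out_proj(heads_concat)

print(context_vec.shape)  # torch.Size([2, 6, 32])
```

This is a sketch of the mechanics only, not a claim about why GPT's authors chose it; whether the layer is strictly necessary is exactly what the question asks.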