# ai-reading-club
Hi, I have a question related to the *Build a Large Language Model (From Scratch)* book by Sebastian Raschka. The discussion happened 2 months back, but I have a few questions, if anyone is aware and can help me out. In the chapter where we code attention, in the MultiHeadAttention section (section 3.6.2), the example code is:
```python
context_vec = context_vec.contiguous().view(b, num_tokens, self.d_out)
context_vec = self.out_proj(context_vec)
```
Here are my questions:

a) Why do I need the contiguous() call before merging the dimensions? I can write it as below and get the same result. Is there an issue if I do it this way?

```python
context_vec = context_vec.view(b, num_tokens, self.d_out)
```
b) I don't understand the use of the output projection linear layer. The book says we do it because GPT uses it, without explaining why it is needed:

```python
context_vec = self.out_proj(context_vec)
```
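For reference, the layer in question is just nn.Linear(d_out, d_out); here is a minimal sketch of its shape behavior (dimensions again arbitrary stand-ins):

```python
import torch
import torch.nn as nn

b, num_tokens, d_out = 2, 6, 32
out_proj = nn.Linear(d_out, d_out)  # what the book assigns to self.out_proj

merged_heads = torch.randn(b, num_tokens, d_out)  # concatenated head outputs
out = out_proj(merged_heads)
print(out.shape)  # torch.Size([2, 6, 32]): shape is unchanged

# Note: before this projection, each block of head_dim channels comes from a
# single head; the Linear layer lets every output channel mix all heads.
```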