A tokenizer typically exposes an encode and a decode method: encode converts text into a sequence of token IDs, and decode converts token IDs back into text.

import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
text = "Hello, do you like tea? <|endoftext|> In the sunlit terraces of someunknownPlace."
integers = tokenizer.encode(text, allowed_special={"<|endoftext|>"})
strings = tokenizer.decode(integers)
print(strings)

Hello, do you like tea? <|endoftext|> In the sunlit terraces of someunknownPlace.
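As context for the `allowed_special` argument above: by default, tiktoken refuses to encode text that contains a special token such as `<|endoftext|>`, so it has to be allowed explicitly. The following is a minimal sketch of the difference; 50256 is the reserved GPT-2 ID for `<|endoftext|>`.

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
text = "Hello, do you like tea? <|endoftext|> In the sunlit terraces of someunknownPlace."

# Without allowed_special, the special token is disallowed and encode raises.
try:
    tokenizer.encode(text)
except ValueError as err:
    print("refused:", err)

# Explicitly allowing it maps <|endoftext|> to its reserved ID (50256 for GPT-2).
ids = tokenizer.encode(text, allowed_special={"<|endoftext|>"})
print(ids)
print(tokenizer.decode(ids))  # round-trips back to the original text
```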
Create an embedding layer with a vocabulary size of 50,257 (the GPT-2 BPE vocabulary) and an embedding dimension of 3; the vocabulary size fixes the set of token IDs the model can represent.

vocab_size = 50257
output_dim = 3

Create the token embeddings (seeded for reproducibility):

import torch

torch.manual_seed(123)
pos_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)

tensor([[[ 0.3793,  1.0554, -0.4246],
         [ 1.4180,  0.1776, -0.2737],
         [ 0.6189, -3.0485, -1.0450],
         [-1.1296, -0.5921, -0.0588],
         [ 1.6772, -0.8353,  0.7531],
         [-0.1515,  0.2832,  0.1554],
         [-0.7367,  2.1855,  0.2716],
         [ 0.0744, -0.8683, -0.5622],
         [ 0.7998,  1.8777,  1.0335],
         [-0.4080, -0.0293,  0.2531],
         [-2.1542,  1.3953,  1.1845],
         [ 0.5945, -0.4951, -0.5756],
         [-1.4126,  0.5412, -1.2169],
         [-0.0322, -0.4761, -0.8343],
         [ 0.9031, -0.7218, -0.5951]]], grad_fn=<EmbeddingBackward0>)
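A minimal sketch of how such an embedding layer is used, assuming a made-up batch of token IDs: each ID indexes one row of the 50257 x 3 weight matrix, so an input of shape (1, 4) maps to embeddings of shape (1, 4, 3).

```python
import torch

vocab_size = 50257   # GPT-2 BPE vocabulary size
output_dim = 3       # toy embedding dimension used in this note

torch.manual_seed(123)
embedding_layer = torch.nn.Embedding(vocab_size, output_dim)

# Hypothetical token IDs for one sequence of 4 tokens (values made up).
token_ids = torch.tensor([[15496, 11, 466, 345]])
token_embeddings = embedding_layer(token_ids)   # lookup: one row per token ID
print(token_embeddings.shape)                   # torch.Size([1, 4, 3])
```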
- Compute the attention scores as dot products between the query and each input vector.
query = inputs[1]                               # the second input token acts as the query
attn_scores_2 = torch.empty(inputs.shape[0])    # one score per input token
for i, x_i in enumerate(inputs):
    attn_scores_2[i] = torch.dot(x_i, query)    # dot product of the query with each input
print(attn_scores_2)
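The same scores can be computed without an explicit loop as a matrix-vector product. Note that `inputs` is not defined in this section; the 6 x 3 tensor below is an illustrative stand-in so the sketch runs on its own.

```python
import torch

# Illustrative stand-in for the `inputs` tensor assumed above:
# 6 token embeddings with 3 dimensions each (values made up).
inputs = torch.tensor([
    [0.43, 0.15, 0.89],
    [0.55, 0.87, 0.66],
    [0.57, 0.85, 0.64],
    [0.22, 0.58, 0.33],
    [0.77, 0.25, 0.10],
    [0.05, 0.80, 0.55],
])

query = inputs[1]                 # the second token acts as the query
attn_scores_2 = inputs @ query    # one dot product per row, shape (6,)
print(attn_scores_2)
```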
Normalize the scores so they sum to 1 to obtain attention weights:

attn_weights_2_tmp = attn_scores_2 / attn_scores_2.sum()
print("Attention weights:", attn_weights_2_tmp)
print("Sum:", attn_weights_2_tmp.sum())