Tue May 2 06:43:11 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 60C P8 11W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Download data
If the Google Drive links below do not work, use the Dropbox link below instead, or download the data from Kaggle and upload it manually to the workspace.
# Google Drive link
# !gdown --id '1BjXalPZxq9mybPKNjF3h5L3NcF7XKTS-' --output covid_train.csv
# !gdown --id '1B55t74Jg2E5FCsKCsUEkPKIuqaY7UIi1' --output covid_test.csv
def trainer(train_loader, valid_loader, model, config, device):
    # Mean squared error between the predictions and the true targets.
    criterion = nn.MSELoss(reduction='mean')  # Define your loss function, do not modify this.

    # Define your optimization algorithm.
    # TODO: Please check https://pytorch.org/docs/stable/optim.html to get more available algorithms.
    # TODO: L2 regularization (optimizer(weight decay...) or implement by yourself).
    # SGD over model.parameters() with learning rate config['learning_rate'] and momentum 0.7.
    optimizer = torch.optim.SGD(model.parameters(), lr=config['learning_rate'], momentum=0.7)
    writer = SummaryWriter()  # TensorBoard writer that logs the training metrics.

    n_epochs = config['n_epochs']
    best_loss, step, early_stop_count = float('inf'), 0, 0

    for epoch in range(n_epochs):  # Train for n_epochs epochs.
        model.train()  # Set your model to train mode.
        loss_record = []  # Per-batch training losses of this epoch.

        # tqdm is a package to visualize your training progress.
        train_pbar = tqdm(train_loader, position=0, leave=True)

        for x, y in train_pbar:  # Iterate over (features, target) batches.
            optimizer.zero_grad()  # Set gradient to zero so the previous step does not leak into this one.
            x, y = x.to(device), y.to(device)  # Move your data to device.
            pred = model(x)  # Forward pass.
            loss = criterion(pred, y)  # Loss between predictions and targets.
            loss.backward()  # Compute gradient (backpropagation).
            optimizer.step()  # Update parameters.
            step += 1
            loss_record.append(loss.detach().item())  # Record the batch loss.

            # Display current epoch number and loss on tqdm progress bar.
            train_pbar.set_description(f'Epoch [{epoch+1}/{n_epochs}]')
            train_pbar.set_postfix({'loss': loss.detach().item()})

        mean_train_loss = sum(loss_record) / len(loss_record)

        model.eval()  # Set your model to evaluation mode.
        loss_record = []  # Per-batch validation losses of this epoch.
        for x, y in valid_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():  # Validation does not need gradients.
                pred = model(x)
                loss = criterion(pred, y)
            loss_record.append(loss.item())

        mean_valid_loss = sum(loss_record) / len(loss_record)
        print(f'Epoch [{epoch+1}/{n_epochs}]: Train loss: {mean_train_loss:.4f}, Valid loss: {mean_valid_loss:.4f}')

        if mean_valid_loss < best_loss:
            # New best validation loss: remember it and save the model.
            best_loss = mean_valid_loss
            torch.save(model.state_dict(), config['save_path'])  # Save your best model
            print('Saving model with loss {:.3f}...'.format(best_loss))
            early_stop_count = 0
        else:
            # No improvement in this epoch.
            early_stop_count += 1

        if early_stop_count >= config['early_stop']:
            # No improvement for config['early_stop'] consecutive epochs: stop training.
            print('\nModel is not improving, so we halt the training session.')
            return
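The early-stopping rule at the end of trainer can be studied on its own, without a model. The following is a minimal, framework-free sketch; the helper name epochs_until_stop and the loss values are made up for illustration and are not part of the notebook.

```python
def epochs_until_stop(valid_losses, early_stop):
    """Return (epochs run, best loss) under the early-stopping rule.

    Hypothetical helper: it mirrors the counter logic of trainer, where the
    counter resets on every improvement and training halts once the counter
    reaches `early_stop`.
    """
    best_loss = float('inf')
    early_stop_count = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            best_loss = loss        # this is the point where the checkpoint is saved
            early_stop_count = 0
        else:
            early_stop_count += 1
        if early_stop_count >= early_stop:
            return epoch, best_loss
    return len(valid_losses), best_loss


# Made-up validation losses: the last improvement before the stop is at epoch 3.
losses = [5.0, 4.0, 3.5, 3.6, 3.7, 3.8, 3.4]
print(epochs_until_stop(losses, early_stop=3))   # (6, 3.5)
```

With early_stop=3 the run halts at epoch 6 and never sees the better loss at epoch 7, which is why a generous threshold such as 600 is a reasonable default for a run of 5000 epochs.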
Configurations
config contains the hyperparameters for training and the path where your model is saved.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
config = {
    'seed': 5201314,      # Your seed number, you can pick your lucky number. :)
    'select_all': True,   # Whether to use all features.
    'valid_ratio': 0.2,   # validation_size = train_size * valid_ratio
    'n_epochs': 5000,     # Number of epochs.
    'batch_size': 256,
    'learning_rate': 1e-5,
    'early_stop': 600,    # If model has not improved for this many consecutive epochs, stop training.
    'save_path': './models/model.ckpt'  # Your model will be saved here.
}
Dataloader
Read data from files and set up training, validation, and testing sets. You do not need to modify this part.
train_data size: (2408, 89)
valid_data size: (601, 89)
test_data size: (997, 88)
number of features: 88
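The sizes printed above are consistent with valid_ratio = 0.2 applied to 3009 labelled rows: int(0.2 * 3009) = 601 rows go to validation and the remaining 2408 to training. The notebook's own split function is not shown here, so the sketch below is only an assumed equivalent written with the standard library; the name train_valid_split and the seeded shuffle are illustrative choices.

```python
import random


def train_valid_split(rows, valid_ratio, seed):
    """Shuffle row indices with a fixed seed and split off a validation set."""
    valid_size = int(valid_ratio * len(rows))   # truncates, e.g. 601.8 -> 601
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)        # local RNG, global state untouched
    valid_idx, train_idx = indices[:valid_size], indices[valid_size:]
    return [rows[i] for i in train_idx], [rows[i] for i in valid_idx]


rows = list(range(3009))                        # stand-in for the labelled rows
train_rows, valid_rows = train_valid_split(rows, valid_ratio=0.2, seed=5201314)
print(len(train_rows), len(valid_rows))         # 2408 601
```

Fixing the seed keeps the split identical between runs, so validation losses from different experiments remain comparable.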
Start training!
model = My_Model(input_dim=x_train.shape[1]).to(device)  # Put your model and data on the same computation device.
trainer(train_loader, valid_loader, model, config, device)
Epoch [1/5000]: 100%|██████████| 10/10 [00:03<00:00, 3.26it/s, loss=131]
Epoch [1/5000]: Train loss: 263.8631, Valid loss: 94.9638
Saving model with loss 94.964...
TensorBoard is a tool that allows you to visualize your training progress.
If this block does not display your learning curve, wait a few minutes and re-run it. It may take some time to load your logging information.
Helper functions to pre-process the training data from the raw MFCC features of each utterance.
A phoneme may span several frames and depends on past and future frames.
Hence we concatenate neighboring frames for training to achieve higher accuracy. The concat_feat function concatenates the past k and future k frames with the current one (2k+1 = n frames in total), and we predict the label of the center frame.
Feel free to modify the data preprocessing functions, but do not drop any frames (if you modify the functions, remember to check that the number of frames is the same as mentioned in the slides).
def shift(x, n):
    # Shift the rows of a 2-D tensor x by n positions (n may be negative, zero, or positive).
    # This is not a circular shift: the vacated rows are filled with copies of the edge row.
    # x[0].repeat(m, 1) tiles the row x[0] into an (m, feature_dim) tensor; in the 1-D case,
    # torch.tensor([1, 2, 3]).repeat(3) gives [1, 2, 3, 1, 2, 3, 1, 2, 3].
    if n < 0:
        # Rows move toward the end: -n copies of the first row,
        # followed by x without its last -n rows.
        left = x[0].repeat(-n, 1)
        right = x[:n]
    elif n > 0:
        # Rows move toward the start: x without its first n rows,
        # followed by n copies of the last row.
        right = x[-1].repeat(n, 1)
        left = x[n:]
    else:
        # n == 0: no shift, return x unchanged.
        return x
    # Concatenate the two parts along dim 0 (the row dimension).
    return torch.cat((left, right), dim=0)
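To see how edge-padded shifting yields the 2k+1 frame window described above, here is a small pure-Python sketch that works on lists instead of tensors. The notebook's concat_feat is not shown in this excerpt, so this is an assumed equivalent for illustration, not the original implementation, and it assumes an utterance has more than k frames.

```python
def shift(frames, n):
    """Shift a list of frames by n positions, repeating the edge frame as padding."""
    if n == 0:
        return list(frames)
    if n < 0:
        # Past context: frames move toward the end, the first frame pads the start.
        return [frames[0]] * (-n) + frames[:n]
    # Future context: frames move toward the start, the last frame pads the end.
    return frames[n:] + [frames[-1]] * n


def concat_feat(frames, concat_n):
    """Concatenate each frame with its k past and k future neighbours (concat_n = 2k + 1)."""
    if concat_n % 2 != 1:
        raise ValueError("concat_n must be odd")
    k = concat_n // 2
    shifted = [shift(frames, offset) for offset in range(-k, k + 1)]
    # Row t becomes [frame t-k, ..., frame t, ..., frame t+k], flattened.
    return [sum((s[t] for s in shifted), []) for t in range(len(frames))]


frames = [[1], [2], [3], [4]]      # four frames with one feature each
print(concat_feat(frames, 3))
# [[1, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 4]]
```

The number of rows is unchanged (four in, four out), which is the property the assignment asks you to preserve when editing the preprocessing functions.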
label_dict = {}
if mode == 'train':
    # Read the frame-level labels of every utterance into label_dict.
    for line in open(os.path.join(phone_path, f'{mode}_labels.txt')).readlines():
        line = line.strip('\n').split(' ')
        label_dict[line[0]] = [int(p) for p in line[1:]]

    # split training and validation data
    # Read the list of training utterances and split it according to train_ratio.
    usage_list = open(os.path.join(phone_path, 'train_split.txt')).readlines()
    random.shuffle(usage_list)
    train_len = int(len(usage_list) * train_ratio)
    usage_list = usage_list[:train_len] if split == 'train' else usage_list[train_len:]
# data parameters
concat_nframes = 3    # the number of frames to concat with, n must be odd (total 2k+1 = n frames)
train_ratio = 0.75    # the ratio of data used for training, the rest will be used for validation

# training parameters
seed = 1213           # random seed
batch_size = 512      # batch size
num_epoch = 10        # the number of training epochs
learning_rate = 1e-4  # learning rate
model_path = './model.ckpt'  # the path where the checkpoint will be saved

# model parameters
input_dim = 39 * concat_nframes  # the input dim of the model, you should not change the value
hidden_layers = 2     # the number of hidden layers
hidden_dim = 64       # the hidden dim
DEVICE: cuda
[Dataset] - # phone classes: 41, number of utterances for train: 2571
2571it [00:23, 107.38it/s]
[INFO] train set
torch.Size([1588590, 117])
torch.Size([1588590])
[Dataset] - # phone classes: 41, number of utterances for val: 858
858it [00:02, 308.31it/s]
[INFO] val set
torch.Size([525078, 117])
torch.Size([525078])
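The feature width of 117 in the tensor shapes above follows directly from the settings: each frame has 39 MFCC features and concat_nframes = 3 frames are concatenated. The helper below is a hypothetical convenience for checking this arithmetic before changing the window size; it is not part of the notebook.

```python
def context_sizes(concat_nframes, feat_dim=39):
    """Return (k, input_dim) for an odd window of concat_nframes frames."""
    if concat_nframes % 2 == 0:
        raise ValueError("concat_nframes must be odd (2k + 1)")
    k = concat_nframes // 2                 # frames taken on each side of the centre
    return k, feat_dim * concat_nframes     # 39 features per frame


print(context_sizes(3))    # (1, 117)
print(context_sizes(11))   # (5, 429)
```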
best_acc = 0.0
for epoch in range(num_epoch):
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0

    # training
    model.train()  # set the model to training mode
    for i, batch in enumerate(tqdm(train_loader)):
        # Move the features and labels of each batch to the selected device.
        features, labels = batch
        features = features.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = model(features)

        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()                    # compute the gradients by backpropagation
        optimizer.step()                   # update the model parameters

        _, train_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
        # train_pred == labels is a boolean tensor that marks the correct predictions;
        # sum() counts them and item() converts the count to a Python int.
        # detach() returns tensors that are excluded from gradient tracking.
        train_acc += (train_pred.detach() == labels.detach()).sum().item()
        train_loss += loss.item()

    # validation
    model.eval()  # set the model to evaluation mode
    with torch.no_grad():
        for i, batch in enumerate(tqdm(val_loader)):
            features, labels = batch
            features = features.to(device)
            labels = labels.to(device)
            outputs = model(features)  # forward pass

            loss = criterion(outputs, labels)

            _, val_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
            val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
            val_loss += loss.item()

    # At the end of each epoch, report the accuracy (averaged over samples)
    # and the loss (averaged over batches) to monitor training.
    print(f'[{epoch+1:03d}/{num_epoch:03d}] Train Acc: {train_acc/len(train_set):3.5f} Loss: {train_loss/len(train_loader):3.5f} | Val Acc: {val_acc/len(val_set):3.5f} loss: {val_loss/len(val_loader):3.5f}')

    # if the model improves, save a checkpoint at this epoch
    # val_acc and best_acc are counts of correct predictions; they are divided
    # by len(val_set) only when printed.
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), model_path)
        print(f'saving model with acc {best_acc/len(val_set):.5f}')
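The accuracy bookkeeping in this loop accumulates a count of correct predictions per batch and divides by the dataset size only at the end. A minimal framework-free sketch of the same idea, using made-up scores and a hypothetical helper count_correct:

```python
def count_correct(outputs, labels):
    """Count samples whose highest-scoring class equals the label."""
    correct = 0
    for scores, label in zip(outputs, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax over classes
        correct += int(predicted == label)
    return correct


# Two made-up batches of (scores per class, labels) with three classes.
batches = [
    ([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]], [1, 0]),   # both correct
    ([[0.2, 0.1, 0.9], [0.3, 0.8, 0.1]], [2, 2]),   # first correct, second wrong
]
total_correct = sum(count_correct(outputs, labels) for outputs, labels in batches)
num_samples = sum(len(labels) for _, labels in batches)
print(total_correct, total_correct / num_samples)   # 3 0.75
```

Dividing the total count by the number of samples gives an exact accuracy even when the last batch is smaller than the others, which averaging per-batch accuracies would not.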
[Dataset] - # phone classes: 41, number of utterances for test: 857
857it [00:08, 103.10it/s]
[INFO] test set
torch.Size([527364, 117])
# load model
model = Classifier(input_dim=input_dim, hidden_layers=hidden_layers, hidden_dim=hidden_dim).to(device)
model.load_state_dict(torch.load(model_path))
<All keys matched successfully>
Make prediction.
pred = np.array([], dtype=np.int32)  # empty array that will collect the predicted classes

model.eval()  # set the model to evaluation mode
with torch.no_grad():  # disable gradient tracking, which inference does not need
    for i, batch in enumerate(tqdm(test_loader)):
        # Test batches contain features only; move them to the selected device.
        features = batch
        features = features.to(device)

        outputs = model(features)  # forward pass

        # torch.max(outputs, 1) returns (maximum values, indices) along the class dimension.
        # The maximum scores are discarded; the indices are the predicted classes.
        _, test_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
        # Append the predictions of this batch, so that pred holds the predictions
        # for the whole test set once the loop finishes.
        pred = np.concatenate((pred, test_pred.cpu().numpy()), axis=0)
Tue May 2 10:05:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 |
| N/A 33C P0 25W / 250W | 0MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Import Packages
_exp_name = "sample"
# Import necessary packages.
import numpy as np
import pandas as pd
import torch
import os
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image
# "ConcatDataset" and "Subset" are possibly useful when doing semi-supervised learning.
from torch.utils.data import ConcatDataset, DataLoader, Subset, Dataset
from torchvision.datasets import DatasetFolder, VisionDataset
# This is for the progress bar.
from tqdm.auto import tqdm
import random
myseed = 6666  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)
# Normally, we don't need augmentations in testing and validation.
# All we need here is to resize the PIL image and transform it into a Tensor.
test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

# However, it is also possible to use augmentation in the testing phase.
# You may use train_tfm to produce a variety of images and then test using ensemble methods.
train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    # You may add some transforms here.

    # ToTensor() should be the last one of the transforms.
    transforms.ToTensor(),
])
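The comment about testing with ensemble methods refers to test-time augmentation: the same image is passed through the model once under test_tfm and several times under random train_tfm draws, and the resulting logits are combined. One possible combination rule is sketched below in plain Python; the function name ensemble_logits, the equal weighting, and all numbers are assumptions for illustration, not the notebook's method.

```python
def ensemble_logits(test_logits, aug_logits_list, test_weight=0.5):
    """Blend logits from the plain test transform with the mean of augmented views."""
    num_classes = len(test_logits)
    aug_mean = [
        sum(view[c] for view in aug_logits_list) / len(aug_logits_list)
        for c in range(num_classes)
    ]
    return [
        test_weight * test_logits[c] + (1.0 - test_weight) * aug_mean[c]
        for c in range(num_classes)
    ]


test_logits = [2.0, 1.0, 0.0]                      # one image under test_tfm
aug_logits = [[0.0, 3.0, 0.0], [0.0, 1.0, 0.0]]    # same image under two train_tfm draws
blended = ensemble_logits(test_logits, aug_logits)
print(blended)                                     # [1.0, 1.5, 0.0]
print(blended.index(max(blended)))                 # 1
```

In this made-up case the plain view alone would predict class 0, while the blended logits favour class 1. Whether such blending helps on real data, and which weight works best, has to be checked on the validation set.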
# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize a model, and put it on the device specified.
model = Classifier().to(device)

# The batch size.
batch_size = 64

# The number of training epochs.
n_epochs = 8

# If there is no improvement in 'patience' epochs, stop early.
patience = 300

# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5)
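nn.CrossEntropyLoss expects raw logits because it combines log-softmax and negative log-likelihood internally, so the model should not apply softmax itself. The plain-Python sketch below illustrates the identity for a single sample; the function names and the example logits are made up, and it is a numerical illustration rather than PyTorch's implementation.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample computed directly from raw logits."""
    m = max(logits)                                     # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]                  # equals -log(softmax(logits)[label])


def softmax(logits):
    """Convert logits to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
loss_from_logits = cross_entropy(logits, label=0)
loss_from_probs = -math.log(softmax(logits)[0])
print(abs(loss_from_logits - loss_from_probs) < 1e-9)   # True
```

Passing already-normalized probabilities to a loss that applies log-softmax again would normalize twice and distort the loss, which is why the model's last layer outputs logits.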
Dataloader
# Construct train and valid datasets.
# The argument "loader" tells how torchvision reads the data.
train_set = FoodDataset("/kaggle/input/ml2023spring-hw3/train", tfm=train_tfm)
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)
valid_set = FoodDataset("/kaggle/input/ml2023spring-hw3/valid", tfm=test_tfm)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)
# Initialize trackers, these are not parameters and should not be changed
stale = 0
best_acc = 0

for epoch in range(n_epochs):

    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    model.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        #imgs = imgs.half()
        #print(imgs.shape,labels.shape)

        # Forward the data. (Make sure data and model are on the same device.)
        logits = model(imgs.to(device))

        # Calculate the cross-entropy loss.
        # We don't need to apply softmax before computing cross-entropy as it is done automatically.
        loss = criterion(logits, labels.to(device))

        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Update the parameters with the computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)

    # The average loss and accuracy of the epoch are the averages of the recorded values.
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")
    # ---------- Validation ----------
    # Make sure the model is in eval mode so that some modules like dropout are disabled and work normally.
    model.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        #imgs = imgs.half()

        # We don't need gradient in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(imgs.to(device))

        # We can still compute the loss (but not the gradient).
        loss = criterion(logits, labels.to(device))

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc)
        #break

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

    # save models
    if valid_acc > best_acc:
        print(f"Best model found at epoch {epoch}, saving model")
        torch.save(model.state_dict(), f"{_exp_name}_best.ckpt")  # only save best to prevent output memory exceed error
        best_acc = valid_acc
        stale = 0
    else:
        stale += 1
        if stale > patience:
            print(f"No improvement {patience} consecutive epochs, early stopping")
            break
0%| | 0/157 [00:00<?, ?it/s]
[ Train | 001/008 ] loss = 1.87167, acc = 0.34385
0%| | 0/57 [00:00<?, ?it/s]
[ Valid | 001/008 ] loss = 1.87423, acc = 0.34339
[ Valid | 001/008 ] loss = 1.87423, acc = 0.34339 -> best
Best model found at epoch 0, saving model
# --- Dataset ---
import os
import json
import torch
import random
from pathlib import Path
from torch.utils.data import Dataset
from torch.nn.utils.rnn import pad_sequence

# --- Dataloader ---
import torch
from torch.utils.data import DataLoader, random_split
from torch.nn.utils.rnn import pad_sequence
def collate_batch(batch):
    """Collate a batch of data."""
    # Process features within a batch.
    mel, speaker = zip(*batch)
    # Because we train the model batch by batch, we need to pad the features in the same batch to make their lengths the same.
    # pad_sequence pads the shorter mel sequences up to the longest one with padding_value,
    # so that sequences of different lengths can form one batch.
    mel = pad_sequence(mel, batch_first=True, padding_value=-20)  # pad with log 10^(-20), which is a very small value.
    # mel: (batch size, length, 40)
    return mel, torch.FloatTensor(speaker).long()
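What pad_sequence does inside collate_batch can be pictured with plain lists: every utterance is extended with frames of the padding value until it matches the longest utterance in the batch. The sketch below is a pure-Python illustration under that assumption; pad_batch and the sample data are hypothetical and do not replace pad_sequence.

```python
def pad_batch(sequences, padding_value=-20.0):
    """Pad variable-length sequences of frames to the longest length in the batch."""
    max_len = max(len(seq) for seq in sequences)
    feat_dim = len(sequences[0][0])
    padded = []
    for seq in sequences:
        # Each padding frame is a fresh list so that frames never share storage.
        padding = [[padding_value] * feat_dim for _ in range(max_len - len(seq))]
        padded.append(list(seq) + padding)
    return padded


# Two made-up utterances with 3 frames and 1 frame of 2 features each.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
padded = pad_batch(batch)
print([len(seq) for seq in padded])   # [3, 3]
print(padded[1])                      # [[0.7, 0.8], [-20.0, -20.0], [-20.0, -20.0]]
```

Because the mel features are on a log scale, a padding value of -20 stands for a near-silent frame rather than an ordinary feature value, which 0 could be mistaken for.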
# Save the best model so far.
if (step + 1) % save_steps == 0 and best_state_dict is not None:
    torch.save(best_state_dict, save_path)
    pbar.write(f"Step {step + 1}, best model saved. (accuracy={best_accuracy:.4f})")