Summary of PyTorch bugs (updated)

Always write down the bugs you run into; they are all valuable experience. Otherwise, the next time you hit the same one you will still waste time tracking it down.

Missing keys or unexpected keys when loading a model

For example:

RuntimeError: Error(s) in loading state_dict for SiameseAlexNet:
	Missing key(s) in state_dict: "corr_bias", "features.0.bias", "features.0.weight", "features.1.bias", "features.1.running_mean", "features.1.running_var", "features.1.weight", "features.4.bias", "features.4.weight", "features.5.bias", "features.5.running_mean", "features.5.running_var", "features.5.weight", "features.8.bias", "features.8.weight", "features.9.bias", "features.9.running_mean", "features.9.running_var", "features.9.weight", "features.11.bias", "features.11.weight", "features.12.bias", "features.12.running_mean", "features.12.running_var", "features.12.weight", "features.14.bias", "features.14.weight".
	Unexpected key(s) in state_dict: "module.corr_bias", "module.features.0.weight", "module.features.0.bias", "module.features.1.weight", "module.features.1.bias", "module.features.1.running_mean", "module.features.1.running_var", "module.features.1.num_batches_tracked", "module.features.4.weight", "module.features.4.bias", "module.features.5.weight", "module.features.5.bias", "module.features.5.running_mean", "module.features.5.running_var", "module.features.5.num_batches_tracked", "module.features.8.weight", "module.features.8.bias", "module.features.9.weight", "module.features.9.bias", "module.features.9.running_mean", "module.features.9.running_var", "module.features.9.num_batches_tracked", "module.features.11.weight", "module.features.11.bias", "module.features.12.weight", "module.features.12.bias", "module.features.12.running_mean", "module.features.12.running_var", "module.features.12.num_batches_tracked", "module.features.14.weight", "module.features.14.bias".

Looking at the two lists, the two models are actually identical, yet loading fails to match them up. This bug is not easy to figure out at first, but if you compare the keys you will notice that every key in the latter is the same as in the former, just with a 'module.' prefix added. One fix, of course, is to strip that prefix from the keys before loading.

  • Root cause

During training the dataset is often large, so multiple GPUs are used, which usually means the training code contains a line like this:


model = nn.DataParallel(model)

Without this line the default is a single GPU; with it the model runs on multiple GPUs (you still need to specify which ones). The bug typically shows up at test or inference time, when a separate script is written: one GPU is usually enough for testing, so the DataParallel line is left out, and loading then fails with the error above. The reason is that the checkpoint was saved in multi-GPU (DataParallel) mode, while the test script builds the model in single-GPU mode. Adding the same line in the test script makes the error go away as well.
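
A minimal sketch of both fixes, assuming the checkpoint file holds a raw state_dict saved from a DataParallel-wrapped model (the path and the SiameseAlexNet class below are placeholders taken from the traceback above):

import torch
import torch.nn as nn

model = SiameseAlexNet()                                       # placeholder: the model class from the traceback
state_dict = torch.load("checkpoint.pth", map_location="cpu")  # placeholder path

# Fix 1: wrap the model in DataParallel before loading, so its keys also carry the "module." prefix
nn.DataParallel(model).load_state_dict(state_dict)

# Fix 2 (alternative): strip the "module." prefix from every key and load into the plain model
stripped = {k.replace("module.", "", 1): v for k, v in state_dict.items()}
model.load_state_dict(stripped)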

EOFError

 magic_number = pickle_module.load(f)
EOFError: Ran out of input

The most likely cause of this error is that the model file is corrupted, or that it is simply an empty file; checking its size tells you whether it is empty.
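
A quick way to check before loading (the path is a placeholder):

import os
import torch

path = "checkpoint.pth"        # placeholder path
print(os.path.getsize(path))   # 0 bytes means an empty file; torch.load will raise EOFError on it

if os.path.getsize(path) > 0:
    state_dict = torch.load(path, map_location="cpu")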

upper bound and larger bound


>>> import torch
>>> torch.arange(0,-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: invalid argument 2: upper bound and larger bound inconsistent with step sign at /opt/conda/conda-bld/pytorch_1532582123400/work/aten/src/TH/generic/THTensorMath.cpp:2948
>>>

This happens with arange: with the default positive step, the first argument must not be larger than the second. To count downwards, pass an explicit negative step, just like Python's range(0, -10, -1).
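
For instance (the failing call from the traceback above is kept as a comment so the snippet runs):

import torch

# torch.arange(0, -1)            # raises the error above: with the default step of +1, start must be <= end
print(torch.arange(0, -10, -1))  # tensor([ 0, -1, -2, -3, -4, -5, -6, -7, -8, -9])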

Things to watch out for when using expand

  • First, a case that triggers an error

>>> a = torch.rand(2,3)
>>> a.expand(4,2,3)
tensor([[[0.9970, 0.7571, 0.6880],
         [0.7481, 0.8553, 0.8888]],

        [[0.9970, 0.7571, 0.6880],
         [0.7481, 0.8553, 0.8888]],

        [[0.9970, 0.7571, 0.6880],
         [0.7481, 0.8553, 0.8888]],

        [[0.9970, 0.7571, 0.6880],
         [0.7481, 0.8553, 0.8888]]])
>>> b=a.expand(4,2,3)
>>> b.view(-1,3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at /opt/conda/conda-bld/pytorch_1532582123400/work/aten/src/TH/generic/THTensor.cpp:237
>>>

To fix this error, call .contiguous() before .view():

b = b.contiguous().view(-1, 3)
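
Putting it together, a minimal sketch:

import torch

a = torch.rand(2, 3)
b = a.expand(4, 2, 3)             # expand returns a non-contiguous view without copying data
c = b.contiguous().view(-1, 3)    # .contiguous() makes a compact copy, so .view() succeeds
print(c.shape)                    # torch.Size([8, 3])

(b.reshape(-1, 3) also works, since reshape falls back to copying when the tensor is not contiguous.)
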
  • Another error with expand: it can only broadcast along dimensions of size 1, so a non-singleton dimension must first be reshaped to size 1 (via view) before it can be expanded
In [23]: a = torch.rand(2,3)

In [24]: a.expand(2,4,3)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-24-ae91e3d924f0> in <module>
----> 1 a.expand(2,4,3)

RuntimeError: The expanded size of the tensor (4) must match the existing size (2) at non-singleton dimension 1

In [25]: a.view(2,1,3).expand(2,4,3)
Out[25]:
tensor([[[0.8609, 0.9992, 0.2209],
         [0.8609, 0.9992, 0.2209],
         [0.8609, 0.9992, 0.2209],
         [0.8609, 0.9992, 0.2209]],

        [[0.4864, 0.2837, 0.7643],
         [0.4864, 0.2837, 0.7643],
         [0.4864, 0.2837, 0.7643],
         [0.4864, 0.2837, 0.7643]]])
