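Transfer Learning and Pretrained Models

Simply put, transfer learning means taking a pretrained model that was trained to predict one set of classes and either using it directly or retraining only a small part of it to predict another set of classes. For example, you can take a model pretrained to identify breeds of cats, retrain only a small part of it on breeds of dogs, and then use it to predict dog breeds.

Without transfer learning, training a huge model on a large dataset can take days or even months. With transfer learning, we take a pretrained model and train only the last few layers, which saves much of the time needed to train a model from scratch.

Transfer learning is also useful when you do not have a large dataset. A model trained on a small dataset may fail to detect features that a model trained on a large dataset can detect. With transfer learning, you can therefore obtain a better model even from a small dataset.

In this article, we take pretrained models and train them on new objects. We show examples of pretrained models for images and apply them to image classification problems. You should try to find other pretrained models and apply them to different problems, such as object detection, text generation, or machine translation. This article covers the following topics:

  • The ImageNet dataset
  • Retraining or fine-tuning models
  • The COCO animals dataset and preprocessing
  • Image classification using the pretrained VGG16 model in TensorFlow
  • Image preprocessing in TensorFlow for the pretrained VGG16
  • Image classification using the retrained VGG16 in TensorFlow
  • Image classification using the pretrained VGG16 in Keras
  • Image classification using the retrained VGG16 in Keras
  • Image classification using Inception v3 in TensorFlow
  • Image classification using the retrained Inception v3 in TensorFlow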

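The ImageNet dataset

According to http://image-net.org:

ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a synonym set, or synset.

ImageNet has about 100,000 synsets, with on average about 1,000 human-annotated images per synset. ImageNet stores only references to the images, while the images themselves remain at their original locations on the internet. In deep learning papers, ImageNet-1K refers to the dataset released as part of ImageNet's Large Scale Visual Recognition Challenge (ILSVRC), in which the images are classified into 1,000 categories.

The 1,000 challenge categories can be found at the following URLs: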

http://image-net.org/challenges/LSVRC/2017/browse-synsets

http://image-net.org/challenges/LSVRC/2016/browse-synsets

http://image-net.org/challenges/LSVRC/2015/browse-synsets

http://image-net.org/challenges/LSVRC/2014/browse-synsets

http://image-net.org/challenges/LSVRC/2013/browse-synsets

http://image-net.org/challenges/LSVRC/2012/browse-synsets

http://image-net.org/challenges/LSVRC/2011/browse-synsets

http://image-net.org/challenges/LSVRC/2010/browse-synsets

We wrote a custom function to download the ImageNet labels from Google:

def build_id2label(self):
    base_url = 'https://raw.githubusercontent.com/tensorflow/models/master/research/inception/inception/data/'
    synset_url = '{}/imagenet_lsvrc_2015_synsets.txt'.format(base_url)
    synset_to_human_url = '{}/imagenet_metadata.txt'.format(base_url)

    filename, _ = urllib.request.urlretrieve(synset_url)
    synset_list = [s.strip() for s in open(filename).readlines()]
    num_synsets_in_ilsvrc = len(synset_list)
    assert num_synsets_in_ilsvrc == 1000
    filename, _ = urllib.request.urlretrieve(synset_to_human_url)
    synset_to_human_list = open(filename).readlines()
    num_synsets_in_all_imagenet = len(synset_to_human_list)

    assert num_synsets_in_all_imagenet == 21842
    synset2name = {}
    for s in synset_to_human_list:
        parts = s.strip().split('\t')
        assert len(parts) == 2
        synset = parts[0]
        name = parts[1]
        synset2name[synset] = name

    if self.n_classes == 1001:
        id2label={0:'empty'}
        id=1
    else:
        id2label = {}
        id=0

    for synset in synset_list:
        label = synset2name[synset]
        id2label[id] = label
        id += 1

    return id2label

We load these labels into our Jupyter notebook as follows:

### Load ImageNet dataset for labels
from datasetslib.imagenet import imageNet
inet = imageNet()
inet.load_data(n_classes=1000)
# n_classes is 1001 for Inception models and 1000 for VGG models
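
To make the difference between 1,000 and 1,001 classes concrete, the following is a minimal sketch of the same id-to-label mapping, built from in-memory data instead of downloaded files. The three synsets are a small hand-picked sample, and the function name make_id2label is hypothetical; it is not part of datasetslib.

```python
def make_id2label(synset_list, synset2name, n_classes=1000):
    """Map integer class ids to human-readable labels.

    With n_classes == 1001, id 0 is reserved for an extra 'empty'
    class and the real labels start at id 1.
    """
    offset = 1 if n_classes == 1001 else 0
    id2label = {0: 'empty'} if offset else {}
    for i, synset in enumerate(synset_list):
        id2label[i + offset] = synset2name[synset]
    return id2label


# A tiny sample standing in for the downloaded synset files.
synsets = ['n01440764', 'n01443537', 'n01484850']
names = {
    'n01440764': 'tench, Tinca tinca',
    'n01443537': 'goldfish, Carassius auratus',
    'n01484850': 'great white shark, white shark',
}

vgg_labels = make_id2label(synsets, names, n_classes=1000)
inception_labels = make_id2label(synsets, names, n_classes=1001)
print(vgg_labels[0])
print(inception_labels[0], '|', inception_labels[1])
```

The only difference between the two mappings is the shift by one, which is why the label dictionary has to be built with the class count that matches the model.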

Popular pretrained image classification models trained on the ImageNet-1K dataset are listed in the following table (a dash marks a value that is not given):

Model | Top-1 accuracy (%) | Top-5 accuracy (%) | Top-5 error rate | Original paper
AlexNet | - | - | 15.3% | https://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
Inception (also known as Inception V1) | 69.8 | 89.6 | 6.67% | https://arxiv.org/abs/1409.4842
BN-Inception-v2 (also known as Inception V2) | 73.9 | 91.8 | 4.9% | https://arxiv.org/abs/1502.03167
Inception V3 | 78.0 | 93.9 | 3.46% | https://arxiv.org/abs/1512.00567
Inception V4 | 80.2 | 95.2 | - | https://arxiv.org/abs/1602.07261
Inception-Resnet-V2 | 80.4 | 95.2 | - | https://arxiv.org/abs/1602.07261
VGG16 | 71.5 | 89.8 | 7.4% | https://arxiv.org/abs/1409.1556
VGG19 | 71.1 | 89.8 | 7.3% | https://arxiv.org/abs/1409.1556
ResNet V1 50 | 75.2 | 92.2 | 7.24% | https://arxiv.org/abs/1512.03385
ResNet V1 101 | 76.4 | 92.9 | - | https://arxiv.org/abs/1512.03385
ResNet V1 152 | 76.8 | 93.2 | - | https://arxiv.org/abs/1512.03385
ResNet V2 50 | 75.6 | 92.8 | - | https://arxiv.org/abs/1603.05027
ResNet V2 101 | 77.0 | 93.7 | - | https://arxiv.org/abs/1603.05027
ResNet V2 152 | 77.8 | 94.1 | - | https://arxiv.org/abs/1603.05027
ResNet V2 200 | 79.9 | 95.2 | - | https://arxiv.org/abs/1603.05027
Xception | 79.0 | 94.5 | - | https://arxiv.org/abs/1610.02357
MobileNet V1 versions | 41.3 to 70.7 | 66.2 to 89.5 | - | https://arxiv.org/abs/1704.04861

In the table above, the Top-1 and Top-5 metrics refer to the model's performance on the ImageNet validation dataset.
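
As a rough illustration of what these metrics measure, here is a minimal NumPy sketch of top-k accuracy: a prediction counts as correct if the true label is among the k classes with the highest scores. The function name top_k_accuracy and the toy scores are made up for this example.

```python
import numpy as np


def top_k_accuracy(probs, labels, k=1):
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row.
    top_k = np.argsort(-probs, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return hits.mean()


probs = np.array([
    [0.70, 0.20, 0.10],   # true class 0: hit at k=1
    [0.30, 0.60, 0.10],   # true class 0: miss at k=1, hit at k=2
    [0.50, 0.30, 0.20],   # true class 2: miss at k=1 and k=2
    [0.10, 0.20, 0.70],   # true class 2: hit at k=1
])
labels = np.array([0, 0, 2, 2])

top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
print(top1, top2)
```

With 1,000 classes, many of them visually close (for example, dog breeds), top-5 is the more forgiving of the two metrics.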

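Google Research recently released a new family of models called MobileNets. MobileNets were developed with a mobile-first strategy, trading some accuracy for low resource usage. They are designed to consume little power and provide low latency, for a better experience on mobile and embedded devices. Google provides 16 pretrained checkpoint files for MobileNet models, each with a different number of parameters and Multiply-Accumulates (MACs). The higher the MACs and the parameter count, the higher the resource usage and latency. You can therefore choose between higher accuracy and higher resource usage and latency.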

Model checkpoint | Million MACs | Million parameters | Top-1 accuracy (%) | Top-5 accuracy (%)
MobileNet_v1_1.0_224 | 569 | 4.24 | 70.7 | 89.5
MobileNet_v1_1.0_192 | 418 | 4.24 | 69.3 | 88.9
MobileNet_v1_1.0_160 | 291 | 4.24 | 67.2 | 87.5
MobileNet_v1_1.0_128 | 186 | 4.24 | 64.1 | 85.3
MobileNet_v1_0.75_224 | 317 | 2.59 | 68.4 | 88.2
MobileNet_v1_0.75_192 | 233 | 2.59 | 67.4 | 87.3
MobileNet_v1_0.75_160 | 162 | 2.59 | 65.2 | 86.1
MobileNet_v1_0.75_128 | 104 | 2.59 | 61.8 | 83.6
MobileNet_v1_0.50_224 | 150 | 1.34 | 64.0 | 85.4
MobileNet_v1_0.50_192 | 110 | 1.34 | 62.1 | 84.0
MobileNet_v1_0.50_160 | 77 | 1.34 | 59.9 | 82.5
MobileNet_v1_0.50_128 | 49 | 1.34 | 56.2 | 79.6
MobileNet_v1_0.25_224 | 41 | 0.47 | 50.6 | 75.0
MobileNet_v1_0.25_192 | 34 | 0.47 | 49.0 | 73.6
MobileNet_v1_0.25_160 | 21 | 0.47 | 46.0 | 70.7
MobileNet_v1_0.25_128 | 14 | 0.47 | 41.3 | 66.2
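
One way to use this table is to pick the most accurate checkpoint that fits a compute budget. The sketch below does this with the numbers copied from the table; the helper name best_under_budget and the budget of 150 million MACs are arbitrary choices for illustration.

```python
# (checkpoint, million MACs, million parameters, top-1 %, top-5 %)
MOBILENET_V1 = [
    ('MobileNet_v1_1.0_224', 569, 4.24, 70.7, 89.5),
    ('MobileNet_v1_1.0_192', 418, 4.24, 69.3, 88.9),
    ('MobileNet_v1_1.0_160', 291, 4.24, 67.2, 87.5),
    ('MobileNet_v1_1.0_128', 186, 4.24, 64.1, 85.3),
    ('MobileNet_v1_0.75_224', 317, 2.59, 68.4, 88.2),
    ('MobileNet_v1_0.75_192', 233, 2.59, 67.4, 87.3),
    ('MobileNet_v1_0.75_160', 162, 2.59, 65.2, 86.1),
    ('MobileNet_v1_0.75_128', 104, 2.59, 61.8, 83.6),
    ('MobileNet_v1_0.50_224', 150, 1.34, 64.0, 85.4),
    ('MobileNet_v1_0.50_192', 110, 1.34, 62.1, 84.0),
    ('MobileNet_v1_0.50_160', 77, 1.34, 59.9, 82.5),
    ('MobileNet_v1_0.50_128', 49, 1.34, 56.2, 79.6),
    ('MobileNet_v1_0.25_224', 41, 0.47, 50.6, 75.0),
    ('MobileNet_v1_0.25_192', 34, 0.47, 49.0, 73.6),
    ('MobileNet_v1_0.25_160', 21, 0.47, 46.0, 70.7),
    ('MobileNet_v1_0.25_128', 14, 0.47, 41.3, 66.2),
]


def best_under_budget(checkpoints, max_macs):
    """Return the checkpoint with the best top-1 accuracy within the MAC budget."""
    candidates = [c for c in checkpoints if c[1] <= max_macs]
    if not candidates:
        return None  # nothing fits the budget
    return max(candidates, key=lambda c: c[3])


choice = best_under_budget(MOBILENET_V1, max_macs=150)
print(choice)
```

In practice the budget would come from measurements on the target device rather than from a fixed number.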

For more information about MobileNets, see the following resources:

https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html

https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md

https://arxiv.org/pdf/1704.04861.pdf

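Retraining or fine-tuning models

Models trained on large and diverse datasets such as ImageNet learn to detect and capture generic features such as curves, edges, and shapes. Some of these features carry over readily to other kinds of datasets. In transfer learning, we therefore take such generic models and use some of the following techniques to fine-tune or retrain them on our dataset:

  • Replace the last layer: the most common practice is to remove the last layer and add a new classification layer that matches our dataset. For example, ImageNet models are trained with 1,000 categories, but our COCO animals dataset has only 8 classes, so we remove the softmax layer that generates probabilities for 1,000 classes and replace it with one that generates probabilities for 8 classes. This technique is generally used when the new dataset is very similar to the dataset the model was trained on, so that only the last layer needs retraining.
  • Freeze the first few layers: another common practice is to freeze the first few layers so that only the weights of the last, unfrozen layers are updated with the new dataset. We will see an example in which we freeze the first 15 layers and retrain only the last 10. This technique is generally used when the new dataset is very dissimilar to the dataset the model was trained on, so that more than just the last layer needs training.
  • Tune the hyperparameters: you can also tune the hyperparameters before retraining, for example by changing the learning rate or trying a different loss function or optimizer.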
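
The first two techniques can be sketched without any deep learning framework. In the toy NumPy example below, a random matrix stands in for the frozen pretrained layers (an assumption made purely for illustration; a real model would load these weights from a checkpoint), and only a new 8-class softmax head is trained with plain gradient descent on random toy data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen, pretrained part of a network.
W_frozen = rng.normal(size=(10, 16)) / np.sqrt(10)
W_frozen_before = W_frozen.copy()


def features(x):
    """Frozen feature extractor: a linear map followed by ReLU."""
    return np.maximum(x @ W_frozen, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(p, y):
    return -np.log(p[np.arange(len(y)), y]).mean()


# Toy "new dataset": 64 samples, 8 classes.
n_classes = 8
x = rng.normal(size=(64, 10))
y = rng.integers(0, n_classes, size=64)

# New classification head that replaces the original output layer.
W_head = np.zeros((16, n_classes))
b_head = np.zeros(n_classes)

f = features(x)  # computed once, because the frozen part never changes
initial_loss = cross_entropy(softmax(f @ W_head + b_head), y)

learning_rate = 0.1
for step in range(200):
    p = softmax(f @ W_head + b_head)
    grad_logits = p.copy()
    grad_logits[np.arange(len(y)), y] -= 1.0
    grad_logits /= len(y)
    # Only the head is updated; W_frozen is deliberately left untouched.
    W_head -= learning_rate * (f.T @ grad_logits)
    b_head -= learning_rate * grad_logits.sum(axis=0)

final_loss = cross_entropy(softmax(f @ W_head + b_head), y)
print(initial_loss, final_loss)
```

Because the frozen layers never change, their outputs can be computed once and reused, which is one reason why retraining only the head is so much cheaper than training the full network.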

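Pretrained models are available in both TensorFlow and Keras.

We demonstrate our examples with TensorFlow Slim, which at the time of writing provides several pretrained models in the folder tensorflow/models/research/slim/nets. We use TensorFlow Slim to instantiate the pretrained models and then load the weights from the downloaded checkpoint files. The loaded models are then used to make predictions on the new dataset. After that, we retrain the models to fine-tune the predictions.

We also demonstrate transfer learning with the Keras pretrained models available in the keras.applications module. While TensorFlow has more than 20 pretrained models, keras.applications offers only 7.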

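The COCO animals dataset and image preprocessing

For our examples, we use the COCO animals dataset, a small subset of the COCO dataset made available by researchers at Stanford University at the following link: http://cs231n.stanford.edu/coco-animals.zip. The COCO animals dataset has 800 training images and 200 test images across 8 animal classes: bear, bird, cat, dog, giraffe, horse, sheep, and zebra. The images are downloaded and preprocessed for the VGG16 and Inception models.

For the VGG model, the image size is 224 x 224, and the preprocessing steps are as follows:

  1. The image is resized to 224 x 224 by a function that works like tf.image.resize_image_with_crop_or_pad in TensorFlow. We implemented this function as follows: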
def resize_image(self,in_image:PIL.Image, new_width, new_height, crop_or_pad=True):
    img = in_image
    if crop_or_pad:
        half_width = img.size[0] // 2
        half_height = img.size[1] // 2
        half_new_width = new_width // 2
        half_new_height = new_height // 2
        img = img.crop((half_width-half_new_width, half_height-half_new_height, half_width+half_new_width, half_height+half_new_height))
    img = img.resize(size=(new_width, new_height))

    return img
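  2. After resizing, convert the image from PIL.Image to a NumPy array and check whether it has a channel dimension, because some images in the dataset are greyscale only.
img = self.pil_to_nparray(img)
if len(img.shape)==2:
    # greyscale or no channels then add three channels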
    h=img.shape[0]
    w=img.shape[1]
    img = np.dstack([img]*3)
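  3. We then subtract the VGG dataset means from the image to center the data. We center the new training images so that their features fall in a range similar to that of the data originally used to train the model. Keeping the features in a similar range ensures that the gradients during retraining do not become too large or too small. In addition, centering the data speeds up learning, because the gradients for each channel become more uniform when the channel is centered around a zero mean.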
means = np.array([[[123.68, 116.78, 103.94]]]) #shape=[1, 1, 3]
img = img - means
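
The last two steps can be tried on a synthetic array without loading any image. The following is a minimal sketch, with a hypothetical function name to_centered_rgb, that stacks a greyscale image into three channels and subtracts the per-channel means used above.

```python
import numpy as np

# Per-channel means in RGB order, shape (1, 1, 3) so they broadcast over H x W.
VGG_MEANS = np.array([[[123.68, 116.78, 103.94]]])


def to_centered_rgb(img):
    """Ensure three channels, then subtract the per-channel means."""
    img = np.asarray(img, dtype=np.float64)
    if img.ndim == 2:
        # Greyscale: repeat the single channel three times.
        img = np.dstack([img] * 3)
    return img - VGG_MEANS


grey = np.full((224, 224), 128, dtype=np.uint8)   # a uniform grey test image
out = to_centered_rgb(grey)
print(out.shape, out[0, 0])
```

The result is a float array with negative and positive values, which is why the preprocessed images cannot be displayed directly without reversing this step.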

The complete preprocessing function is as follows:

def preprocess_for_vgg(self, incoming, height=None, width=None):
    if isinstance(incoming, six.string_types):
        img = self.load_image(incoming)
    else:
        img=incoming
    img_size = vgg.vgg_16.default_image_size
    height = img_size
    width = img_size
    img = self.resize_image(img,height,width)
    img = self.pil_to_nparray(img)
    if len(img.shape)==2:
       # greyscale or no channels then add three channels
       h=img.shape[0]
       w=img.shape[1]
       img = np.dstack([img]*3)

    means = np.array([[[123.68, 116.78, 103.94]]]) #shape=[1, 1, 3]
    try:
        img = img - means
    except Exception as ex:
        print('Error preprocessing ',incoming)
        print(ex)

    return img

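For the Inception model, the image size is 299 x 299, and the preprocessing steps are as follows:

  1. The image is resized to 299 x 299 by a function that works like tf.image.resize_image_with_crop_or_pad in TensorFlow. We reuse the function defined earlier in the VGG preprocessing steps.
  2. The image is then scaled to the range (-1, +1) with the following code: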
img = ((img/255.0) - 0.5) * 2.0

The complete preprocessing function is as follows:

def preprocess_for_inception(self,incoming):
    img_size = inception.inception_v3.default_image_size
    height = img_size
    width = img_size
    if isinstance(incoming, six.string_types):
        img = self.load_image(incoming)
    else:
        img=incoming
    img = self.resize_image(img,height,width)
    img = self.pil_to_nparray(img)
    if len(img.shape)==2:
        # greyscale or no channels then add three channels
        h=img.shape[0]
        w=img.shape[1]
        img = np.dstack([img]*3)
    img = ((img/255.0) - 0.5) * 2.0

    return img

Let us load the COCO animals dataset:

from datasetslib.coco import coco_animals
coco = coco_animals()
x_train_files, y_train, x_val_files, y_val = coco.load_data()

We take one image from each class in the validation set to build the list x_test, and then preprocess these images to build the list images_test:

x_test = [x_val_files[25*x] for x in range(8)]
images_test=np.array([coco.preprocess_for_vgg(x) for x in x_test])

We use this helper function to display the images and the probabilities of the top five classes for each image:

# helper function
def disp(images,id2label=None,probs=None,n_top=5,scale=False):
    if scale:
        imgs = np.abs(images + np.array([[[[123.68, 116.78, 103.94]]]]))/255.0
    else:
        imgs = images
    ids={}
    for j in range(len(images)):
        if scale:
            plt.figure(figsize=(5,5))
            plt.imshow(imgs[j])
        else:
            plt.imshow(imgs[j].astype(np.uint8) )
        plt.show()
        if probs is not None:
            ids[j] = [i[0] for i in sorted(enumerate(-probs[j]), key=lambda x:x[1])]
            for k in range(n_top):
                id = ids[j][k]
                print('Probability {0:1.2f}% of[{1:}]'.format(100*probs[j,id],id2label[id]))

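The following line in the function above reverses the effect of preprocessing, so that the original image is displayed rather than the preprocessed one: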

imgs = np.abs(images + np.array([[[[123.68, 116.78, 103.94]]]]))/255.0

For the Inception models, the code that reverses the preprocessing is as follows:

 imgs = (images / 2.0) + 0.5
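
A quick round trip shows how the Inception scaling and its inverse fit together. This is a small sketch with hypothetical function names; note that the inverse returns values in the range 0 to 1, which is what plt.imshow expects for float images, rather than the original 0 to 255.

```python
import numpy as np


def scale_for_inception(img):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return ((img / 255.0) - 0.5) * 2.0


def unscale_from_inception(img):
    """Map values from [-1, 1] back to [0, 1] for display."""
    return (img / 2.0) + 0.5


pixels = np.array([0.0, 127.5, 255.0])
scaled = scale_for_inception(pixels)
restored = unscale_from_inception(scaled)
print(scaled, restored)
```

Multiplying the restored values by 255 recovers the original pixel values.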

You can view the test images with the following code:

images=np.array([mpimg.imread(x) for x in x_test])
disp(images)

Follow the code in the Jupyter notebook to see the images. They all appear to have different dimensions, so let us print their original dimensions:

print([x.shape for x in images])

The dimensions are:

[(640, 425, 3), (373, 500, 3), (367, 640, 3), (427, 640, 3), (428, 640, 3), (426, 640, 3), (480, 640, 3), (612, 612, 3)]

Let us preprocess the test images and check the dimensions:

images_test=np.array([coco.preprocess_for_vgg(x) for x in x_test])
print(images_test.shape)

The dimensions are:

(8, 224, 224, 3)

For Inception, the dimensions are:

(8, 299, 299, 3)

The preprocessed images for Inception are not viewable, but let us display the preprocessed images for VGG to get an idea of how they look:

disp(images_test)

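The images are in fact cropped. Reversing the preprocessing while keeping the crop shows what the cropped images look like.

Now that we have the labels from ImageNet and both the images and labels from the COCO animals dataset, let us try the transfer learning examples.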

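VGG16 in TensorFlow

You can follow the code in the Jupyter notebook ch-12a_VGG16_TensorFlow.

For all the VGG16 examples in TensorFlow, we first download the checkpoint file from http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz and initialize the variables with the following code: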

model_name='vgg_16'
model_url='http://download.tensorflow.org/models/'
model_files=['vgg_16_2016_08_28.tar.gz']
model_home=os.path.join(models_root,model_name)

dsu.download_dataset(source_url=model_url, source_files=model_files, dest_dir = model_home, force=False, extract=True)

We also define some common imports and variables:

from tensorflow.contrib import slim
from tensorflow.contrib.slim.nets import vgg
image_height=vgg.vgg_16.default_image_size
image_width=vgg.vgg_16.default_image_size

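Image classification using the pretrained VGG16 in TensorFlow

Let us first try to predict the classes of the test images without any retraining. First, we clear the default graph and define a placeholder for the images: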

tf.reset_default_graph()
x_p = tf.placeholder(shape=(None,image_height, image_width,3), dtype=tf.float32,name='x_p')

The shape of the placeholder x_p is (?, 224, 224, 3). Next, load the vgg16 model:

with slim.arg_scope(vgg.vgg_arg_scope()):
    logits,_ = vgg.vgg_16(x_p,num_classes=inet.n_classes, is_training=False)

Add a softmax layer to produce the class probabilities:

probabilities = tf.nn.softmax(logits)
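
For readers who want to see what the softmax layer computes, here is a minimal NumPy sketch, independent of TensorFlow. It subtracts the row maximum before exponentiating, a common trick to avoid overflow, and then ranks the classes by probability. The labels and logits are made up for illustration.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_n(probs_row, labels, n=2):
    """Return the n most probable (label, probability) pairs, best first."""
    order = np.argsort(-probs_row)[:n]
    return [(labels[i], float(probs_row[i])) for i in order]


labels = ['zebra', 'horse', 'cat']
logits = np.array([[4.0, 1.0, 0.5],
                   [1000.0, 1000.0, 1000.0]])   # large values would overflow a naive exp
probs = softmax(logits)
ranked = top_n(probs[0], labels)
for label, p in ranked:
    print('Probability {0:1.2f}% of [{1}]'.format(100 * p, label))
```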

Define the initialization function that restores the variables, such as weights and biases, from the checkpoint file:

init = slim.assign_from_checkpoint_fn(os.path.join(model_home, '{}.ckpt'.format(model_name)), slim.get_variables_to_restore())

In a TensorFlow session, initialize the variables and run the probabilities tensor to get the probabilities for each image:

with tf.Session() as tfs:
    init(tfs)
    probs = tfs.run([probabilities],feed_dict={x_p:images_test})
    probs=probs[0]

Let us see what classes we get:

disp(images_test,id2label=inet.id2label,probs=probs,scale=True)
Output probabilities for each input image:
Probability 99.15% of [zebra]
Probability 0.37% of [tiger cat]
Probability 0.33% of [tiger, Panthera tigris]
Probability 0.04% of [goose]
Probability 0.02% of [tabby, tabby cat]
Probability 99.50% of [horse cart, horse-cart]
Probability 0.37% of [plow, plough]
Probability 0.06% of [Arabian camel, dromedary, Camelus dromedarius]
Probability 0.05% of [sorrel]
Probability 0.01% of [barrel, cask]
Probability 19.32% of [Cardigan, Cardigan Welsh corgi]
Probability 11.78% of [papillon]
Probability 9.01% of [Shetland sheepdog, Shetland sheep dog, Shetland]
Probability 7.09% of [Siamese cat, Siamese]
Probability 6.27% of [Pembroke, Pembroke Welsh corgi]
Probability 97.09% of [chickadee]
Probability 2.52% of [water ouzel, dipper]
Probability 0.23% of [junco, snowbird]
Probability 0.09% of [hummingbird]
Probability 0.04% of [bulbul]
Probability 24.98% of [whippet]
Probability 16.48% of [lion, king of beasts, Panthera leo]
Probability 5.54% of [Saluki, gazelle hound]
Probability 4.99% of [brown bear, bruin, Ursus arctos]
Probability 4.11% of [wirehaired fox terrier]
Probability 98.56% of [brown bear, bruin, Ursus arctos]
Probability 1.40% of [American black bear, black bear, Ursus americanus, Euarctos americanus]
Probability 0.03% of [sloth bear, Melursus ursinus, Ursus ursinus]
Probability 0.00% of [wombat]
Probability 0.00% of [beaver]
Probability 20.84% of [leopard, Panthera pardus]
Probability 12.81% of [cheetah, chetah, Acinonyx jubatus]
Probability 12.26% of [banded gecko]
Probability 10.28% of [jaguar, panther, Panthera onca, Felis onca]
Probability 5.30% of [gazelle]
Probability 8.09% of [shower curtain]
Probability 3.59% of [binder, ring-binder]
Probability 3.32% of [accordion, piano accordion, squeeze box]
Probability 3.12% of [radiator]
Probability 1.81% of [abaya]

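The pretrained model, which has never seen the images in our dataset and knows nothing about its classes, correctly identified the zebra, the horse cart, the bird, and the bear. It failed to recognize the giraffe, because it has never seen a giraffe before. We will retrain this model on our dataset, with much less effort and a small dataset of only 800 images. Before we do that, let us look at doing the same image preprocessing in TensorFlow.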

Image preprocessing in TensorFlow for the pretrained VGG16

We define a function for the preprocessing steps in TensorFlow as follows:

def tf_preprocess(filelist):
    images=[]
    for filename in filelist:
        image_string = tf.read_file(filename)
        image_decoded = tf.image.decode_jpeg(image_string, channels=3)
        image_float = tf.cast(image_decoded, tf.float32)
        resize_fn = tf.image.resize_image_with_crop_or_pad
        image_resized = resize_fn(image_float, image_height, image_width)
        means = tf.reshape(tf.constant([123.68, 116.78, 103.94]), [1, 1, 3])
        image = image_resized - means
        images.append(image)
    images = tf.stack(images)
    return images

Here, we create the images variable instead of a placeholder:

images=tf_preprocess([x for x in x_test])

We follow the same process as before to define the VGG16 model, restore the variables, and then run the predictions:

with slim.arg_scope(vgg.vgg_arg_scope()):
    logits,_ = vgg.vgg_16(images, num_classes=inet.n_classes, is_training=False)
    probabilities = tf.nn.softmax(logits)

    init = slim.assign_from_checkpoint_fn(os.path.join(model_home, '{}.ckpt'.format(model_name)), slim.get_variables_to_restore())

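We get the same class probabilities as before. We only wanted to demonstrate that the preprocessing can also be done in TensorFlow. However, preprocessing in TensorFlow is limited to the functions that TensorFlow provides, and it ties your pipeline closely to the framework.

We recommend keeping the preprocessing pipeline separate from the TensorFlow model training and prediction code. Keeping it separate makes it modular and has other advantages; for example, you can save the preprocessed data for reuse across multiple models.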

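Image classification using the retrained VGG16 in TensorFlow

Now we retrain the VGG16 model for the COCO animals dataset. Let us start by defining three placeholders:

  • is_training is a placeholder that specifies whether we use the model for training or for prediction.
  • x_p is the placeholder for the input, with shape (None, image_height, image_width, 3).
  • y_p is the placeholder for the output, with shape (None, 1).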
is_training = tf.placeholder(tf.bool,name='is_training')
x_p = tf.placeholder(shape=(None,image_height, image_width,3), dtype=tf.float32,name='x_p')
y_p = tf.placeholder(shape=(None,1),dtype=tf.int32,name='y_p')

As explained in the strategy section, we restore the layers from the checkpoint file except for the last layer, which is called vgg_16/fc8:

with slim.arg_scope(vgg.vgg_arg_scope()):
    logits, _ = vgg.vgg_16(x_p,num_classes=coco.n_classes, is_training=is_training)

probabilities = tf.nn.softmax(logits)
# restore all variables except the last layer, fc8
fc7_variables=tf.contrib.framework.get_variables_to_restore(exclude=['vgg_16/fc8'])
fc7_init = tf.contrib.framework.assign_from_checkpoint_fn(os.path.join(model_home, '{}.ckpt'.format(model_name)), fc7_variables)

Next, define the variables of the last layer, which are initialized but not restored:

# fc8 layer
fc8_variables = tf.contrib.framework.get_variables('vgg_16/fc8')
fc8_init = tf.variables_initializer(fc8_variables)

As we learned in earlier articles, define the loss function with tf.losses.sparse_softmax_cross_entropy():

tf.losses.sparse_softmax_cross_entropy(labels=y_p, logits=logits)
loss = tf.losses.get_total_loss()

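Train the last layer for a few epochs, and then train the entire network for a few epochs. To do so, define two separate optimizers and training operations: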

learning_rate = 0.001
fc8_optimizer = tf.train.GradientDescentOptimizer(learning_rate)
fc8_train_op = fc8_optimizer.minimize(loss, var_list=fc8_variables)

full_optimizer = tf.train.GradientDescentOptimizer(learning_rate)
full_train_op = full_optimizer.minimize(loss)

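We use the same learning rate for both optimizers, but you can define separate learning rates if you want to tune the hyperparameters further.

Define the accuracy function as usual: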

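y_pred = tf.to_int32(tf.argmax(logits, 1))
# y_p has shape (None, 1); squeeze it to (None,) so the comparison is element-wise
n_correct_pred = tf.equal(y_pred, tf.squeeze(y_p, axis=1))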
accuracy = tf.reduce_mean(tf.cast(n_correct_pred, tf.float32))
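
One detail worth checking in an accuracy computation like this is the shape of the two tensors being compared. tf.equal follows NumPy-style broadcasting, so comparing predictions of shape (N,) with labels of shape (N, 1), the shape of y_p here, produces an N x N matrix instead of N element-wise comparisons. The NumPy sketch below, with made-up values, shows the difference.

```python
import numpy as np

y_pred = np.array([0, 1, 2, 1])            # shape (4,)
y_true = np.array([[0], [1], [1], [1]])    # shape (4, 1), like the y_p placeholder

broadcast = (y_pred == y_true)                  # shape (4, 4): every pair is compared
aligned = (y_pred == y_true.squeeze(axis=1))    # shape (4,): one comparison per sample

print(broadcast.shape, broadcast.mean())
print(aligned.shape, aligned.mean())
```

Only the aligned comparison gives the fraction of correctly classified samples; the broadcast version averages over all pairs of predictions and labels.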

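Finally, we train the last layer for 10 epochs and then the full network for 10 epochs, with a batch size of 32. We also use the same session to predict the classes: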

fc8_epochs = 10
full_epochs = 10
coco.y_onehot = False
coco.batch_size = 32
coco.batch_shuffle = True

total_images = len(x_train_files)
n_batches = total_images // coco.batch_size

with tf.Session() as tfs:
    fc7_init(tfs)
    tfs.run(fc8_init)
    for epoch in range(fc8_epochs):
        print('Starting fc8 epoch ',epoch)
        coco.reset_index()
        epoch_accuracy=0
        for batch in range(n_batches):
            x_batch, y_batch = coco.next_batch()
            images=np.array([coco.preprocess_for_vgg(x) \
                for x in x_batch])
            feed_dict={x_p:images,y_p:y_batch,is_training:True}
            tfs.run(fc8_train_op, feed_dict = feed_dict)
            feed_dict={x_p:images,y_p:y_batch,is_training:False}
            batch_accuracy = tfs.run(accuracy,feed_dict=feed_dict)
            epoch_accuracy += batch_accuracy
        epoch_accuracy /= n_batches
        print('Train accuracy in epoch {}:{}'.format(epoch,epoch_accuracy))
    for epoch in range(full_epochs):
        print('Starting full epoch ',epoch)
        coco.reset_index()
        epoch_accuracy=0
        for batch in range(n_batches):
            x_batch, y_batch = coco.next_batch()
            images=np.array([coco.preprocess_for_vgg(x) \
                for x in x_batch])
            feed_dict={x_p:images,y_p:y_batch,is_training:True}
            tfs.run(full_train_op, feed_dict = feed_dict )
            feed_dict={x_p:images,y_p:y_batch,is_training:False}
            batch_accuracy = tfs.run(accuracy,feed_dict=feed_dict)
            epoch_accuracy += batch_accuracy
        epoch_accuracy /= n_batches
        print('Train accuracy in epoch {}:{}'.format(epoch,epoch_accuracy))
    # now run the predictions
    feed_dict={x_p:images_test,is_training: False}
    probs = tfs.run([probabilities],feed_dict=feed_dict)
    probs=probs[0]

Let us print the results of our predictions:

disp(images_test,id2label=coco.id2label,probs=probs,scale=True)
Output probabilities for each input image:
Probability 100.00% of [zebra]
Probability 100.00% of [horse]
Probability 98.88% of [cat]
Probability 100.00% of [bird]
Probability 68.88% of [bear]
Probability 31.06% of [sheep]
Probability 0.02% of [dog]
Probability 0.02% of [bird]
Probability 0.01% of [horse]
Probability 100.00% of [bear]
Probability 0.00% of [dog]
Probability 0.00% of [bird]
Probability 0.00% of [sheep]
Probability 0.00% of [cat]
Probability 100.00% of [giraffe]
Probability 61.36% of [cat]
Probability 16.70% of [dog]
Probability 7.46% of [bird]
Probability 5.34% of [bear]
Probability 3.65% of [giraffe]

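It correctly identified the cat and the giraffe, and raised the other probabilities to 100%. It still makes some mistakes: the last image is classified as a cat, although after cropping it is really just a noisy image. We leave improving the results to you.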

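VGG16 in Keras

You can follow the code in the Jupyter notebook ch-12a_VGG16_Keras.

Now let us do the same classification and retraining with Keras. You will see how easily the VGG16 pretrained model can be used in Keras, with much less code.

Image classification using the pretrained VGG16 in Keras

Loading the model is a one-line operation: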

from keras.applications import VGG16
model=VGG16(weights='imagenet')

We can use this model to predict the class probabilities:

probs = model.predict(images_test)

The results of this classification are as follows:

Output probabilities for each input image:
Probability 99.41% of [zebra]
Probability 0.19% of [tiger cat]
Probability 0.13% of [goose]
Probability 0.09% of [tiger, Panthera tigris]
Probability 0.02% of [mushroom]
Probability 87.50% of [horse cart, horse-cart]
Probability 5.58% of [Arabian camel, dromedary, Camelus dromedarius]
Probability 4.72% of [plow, plough]
Probability 1.03% of [dogsled, dog sled, dog sleigh]
Probability 0.31% of [wreck]
Probability 34.96% of [Siamese cat, Siamese]
Probability 12.71% of [toy terrier]
Probability 10.15% of [Boston bull, Boston terrier]
Probability 6.53% of [Italian greyhound]
Probability 6.01% of [Cardigan, Cardigan Welsh corgi]
Probability 56.41% of [junco, snowbird]
Probability 38.08% of [chickadee]
Probability 1.93% of [bulbul]
Probability 1.35% of [hummingbird]
Probability 1.09% of [house finch, linnet, Carpodacus mexicanus]
Probability 54.19% of [brown bear, bruin, Ursus arctos]
Probability 28.07% of [lion, king of beasts, Panthera leo]
Probability 0.87% of [Norwich terrier]
Probability 0.82% of [Lakeland terrier]
Probability 0.73% of [wild boar, boar, Sus scrofa]
Probability 88.64% of [brown bear, bruin, Ursus arctos]
Probability 7.22% of [American black bear, black bear, Ursus americanus, Euarctos americanus]
Probability 4.13% of [sloth bear, Melursus ursinus, Ursus ursinus]
Probability 0.00% of [badger]
Probability 0.00% of [wombat]
Probability 38.70% of [jaguar, panther, Panthera onca, Felis onca]
Probability 33.78% of [leopard, Panthera pardus]
Probability 14.22% of [cheetah, chetah, Acinonyx jubatus]
Probability 6.15% of [banded gecko]
Probability 1.53% of [snow leopard, ounce, Panthera uncia]
Probability 12.54% of [shower curtain]
Probability 2.82% of [binder, ring-binder]
Probability 2.28% of [toilet tissue, toilet paper, bathroom tissue]
Probability 2.12% of [accordion, piano accordion, squeeze box]
Probability 2.05% of [bath towel]

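It failed to recognize the sheep, the giraffe, and the last, noisy image, from which the dog was cropped out. Now, let us retrain the model in Keras with our dataset.

Image classification using the retrained VGG16 in Keras

Let us use the COCO animals dataset to retrain the model and fine-tune it for the classification task. We will remove the last layer of the Keras model and add our own fully connected layer with softmax activation for 8 classes. We will also demonstrate freezing the first few layers by setting the trainable attribute of the first 15 layers to False.

  1. First, import the VGG16 model without the top layers by setting include_top to False: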
# load the vgg model
from keras.applications import VGG16
base_model=VGG16(weights='imagenet',include_top=False, input_shape=(224,224,3))

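We also specify input_shape in the code above; otherwise Keras throws an exception later.

  2. Now we build the classifier model to place on top of the imported VGG model:
from keras.models import Sequential, Model
from keras.layers import Flatten, Dense, Dropout
from keras import optimizers

top_model = Sequential()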
top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(coco.n_classes, activation='softmax'))
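  3. Next, add the classifier model on top of the VGG base:
model=Model(inputs=base_model.input, outputs=top_model(base_model.output))
  4. Freeze the first 15 layers:
for layer in model.layers[:15]:
    layer.trainable = False
  5. The choice of 15 frozen layers is arbitrary; you may want to experiment with this number. Let us compile the model and print the model summary: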
model.compile(loss='categorical_crossentropy', optimizer=optimizers.SGD(lr=1e-4, momentum=0.9), metrics=['accuracy'])
model.summary()
Layer (type) | Output shape | Param #
input_1 (InputLayer) | (None, 224, 224, 3) | 0
block1_conv1 (Conv2D) | (None, 224, 224, 64) | 1792
block1_conv2 (Conv2D) | (None, 224, 224, 64) | 36928
block1_pool (MaxPooling2D) | (None, 112, 112, 64) | 0
block2_conv1 (Conv2D) | (None, 112, 112, 128) | 73856
block2_conv2 (Conv2D) | (None, 112, 112, 128) | 147584
block2_pool (MaxPooling2D) | (None, 56, 56, 128) | 0
block3_conv1 (Conv2D) | (None, 56, 56, 256) | 295168
block3_conv2 (Conv2D) | (None, 56, 56, 256) | 590080
block3_conv3 (Conv2D) | (None, 56, 56, 256) | 590080
block3_pool (MaxPooling2D) | (None, 28, 28, 256) | 0
block4_conv1 (Conv2D) | (None, 28, 28, 512) | 1180160
block4_conv2 (Conv2D) | (None, 28, 28, 512) | 2359808
block4_conv3 (Conv2D) | (None, 28, 28, 512) | 2359808
block4_pool (MaxPooling2D) | (None, 14, 14, 512) | 0
block5_conv1 (Conv2D) | (None, 14, 14, 512) | 2359808
block5_conv2 (Conv2D) | (None, 14, 14, 512) | 2359808
block5_conv3 (Conv2D) | (None, 14, 14, 512) | 2359808
block5_pool (MaxPooling2D) | (None, 7, 7, 512) | 0
sequential_1 (Sequential) | (None, 8) | 6424840

Total params: 21,139,528
Trainable params: 13,504,264
Non-trainable params: 7,635,264

We see that roughly 36% of the parameters (7,635,264 of 21,139,528) are frozen and non-trainable.
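
The numbers in the summary can be reproduced by hand, which is a useful sanity check when changing the number of frozen layers. The sketch below uses the standard parameter-count formulas for 3 x 3 convolutions and dense layers; the helper names are made up for this example.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square convolution kernel."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


# (in_channels, filters) for the 13 convolution layers of VGG16, in order.
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
conv_counts = [conv2d_params(3, i, f) for i, f in VGG16_CONVS]

# Classifier head: Flatten(7 x 7 x 512) -> Dense(256) -> Dense(8).
head = dense_params(7 * 7 * 512, 256) + dense_params(256, 8)

total = sum(conv_counts) + head
# The first 15 layers (input, blocks 1 to 4) contain the first 10 convolutions.
frozen = sum(conv_counts[:10])
trainable = total - frozen
print(total, trainable, frozen, round(frozen / total, 3))
```

Pooling, flatten, and dropout layers have no parameters, so they do not appear in the count.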

  6. Next, train the Keras model for 20 epochs with a batch size of 32:
from keras.utils import np_utils

batch_size=32
n_epochs=20

total_images = len(x_train_files)
n_batches = total_images // batch_size
for epoch in range(n_epochs):
    print('Starting epoch ',epoch)
    coco.reset_index_in_epoch()
    for batch in range(n_batches):
        try:
            x_batch, y_batch = coco.next_batch(batch_size=batch_size)
            images=np.array([coco.preprocess_image(x) for x in x_batch])
            y_onehot = np_utils.to_categorical(y_batch, num_classes=coco.n_classes)
            model.fit(x=images,y=y_onehot,verbose=0)
        except Exception as ex:
           print('error in epoch {} batch {}'.format(epoch,batch))
           print(ex)
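  7. Let us classify the images with the newly retrained model:
probs = model.predict(images_test)

The classification results are as follows:

Output probabilities for each input image: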
Probability 100.00% of [zebra]
Probability 0.00% of [dog]
Probability 0.00% of [horse]
Probability 0.00% of [giraffe]
Probability 0.00% of [bear]
Probability 96.11% of [horse]
Probability 1.85% of [cat]
Probability 0.77% of [bird]
Probability 0.43% of [giraffe]
Probability 0.40% of [sheep]
Probability 99.75% of [dog]
Probability 0.22% of [cat]
Probability 0.03% of [horse]
Probability 0.00% of [bear]
Probability 0.00% of [zebra]
Probability 99.88% of [bird]
Probability 0.11% of [horse]
Probability 0.00% of [giraffe]
Probability 0.00% of [bear]
Probability 0.00% of [cat]
Probability 65.28% of [bear]
Probability 27.09% of [sheep]
Probability 4.34% of [bird]
Probability 1.71% of [giraffe]
Probability 0.63% of [dog]
Probability 100.00% of [bear]
Probability 0.00% of [sheep]
Probability 0.00% of [dog]
Probability 0.00% of [cat]
Probability 0.00% of [giraffe]
Probability 100.00% of [giraffe]
Probability 0.00% of [bird]
Probability 0.00% of [bear]
Probability 0.00% of [sheep]
Probability 0.00% of [zebra]
Probability 81.05% of [cat]
Probability 15.68% of [dog]
Probability 1.64% of [bird]
Probability 0.90% of [horse]
Probability 0.43% of [bear]

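All the classes were identified correctly except for the last, noisy image. With proper hyperparameter tuning, the results can be improved further.

So far, you have seen examples of classifying with a pretrained model and of fine-tuning a pretrained model. Next, we show a classification example using the Inception v3 model.

Inception v3 in TensorFlow

You can follow the code in the Jupyter notebook ch-12c_InceptionV3_TensorFlow.

TensorFlow's Inception v3 is trained on 1,001 labels rather than 1,000. The images used for training are also preprocessed differently. We showed the preprocessing code in an earlier section. Let us dive directly into restoring the Inception v3 model with TensorFlow.

Let us first download the checkpoint file for Inception v3: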

# load the inception V3 model
model_name='inception_v3'
model_url='http://download.tensorflow.org/models/'
model_files=['inception_v3_2016_08_28.tar.gz']
model_home=os.path.join(models_root,model_name)

dsu.download_dataset(source_url=model_url, source_files=model_files, dest_dir = model_home, force=False, extract=True)

Define the common imports and variables for the Inception module:

### define common imports and variables
from tensorflow.contrib.slim.nets import inception
image_height=inception.inception_v3.default_image_size
image_width=inception.inception_v3.default_image_size

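Image classification using Inception v3 in TensorFlow

Image classification works the same way as explained in the previous section for the VGG16 model. The complete code for the Inception v3 model is as follows: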

x_p = tf.placeholder(shape=(None, image_height, image_width, 3 ), dtype=tf.float32, name='x_p')
with slim.arg_scope(inception.inception_v3_arg_scope()):
    logits,_ = inception.inception_v3(x_p, num_classes=inet.n_classes, is_training=False)

probabilities = tf.nn.softmax(logits)

init = slim.assign_from_checkpoint_fn(os.path.join(model_home, '{}.ckpt'.format(model_name)), slim.get_variables_to_restore())

with tf.Session() as tfs:
    init(tfs)
    probs = tfs.run([probabilities],feed_dict={x_p:images_test})
    probs=probs[0]

Let us see how our model does on the test images:

Output probabilities for each input image:
Probability 95.15% of [zebra]
Probability 0.07% of [ostrich, Struthio camelus]
Probability 0.07% of [hartebeest]
Probability 0.03% of [sock]
Probability 0.03% of [warthog]
Probability 93.09% of [horse cart, horse-cart]
Probability 0.47% of [plow, plough]
Probability 0.07% of [oxcart]
Probability 0.07% of [seashore, coast, seacoast, sea-coast]
Probability 0.06% of [military uniform]
Probability 18.94% of [Cardigan, Cardigan Welsh corgi]
Probability 8.19% of [Pembroke, Pembroke Welsh corgi]
Probability 7.86% of [studio couch, day bed]
Probability 5.36% of [English springer, English springer spaniel]
Probability 4.16% of [Border collie]
Probability 27.18% of [water ouzel, dipper]
Probability 24.38% of [junco, snowbird]
Probability 6.91% of [chickadee]
Probability 0.99% of [magpie]
Probability 0.73% of [brambling, Fringilla montifringilla]
Probability 93.00% of [hog, pig, grunter, squealer, Sus scrofa]
Probability 2.23% of [wild boar, boar, Sus scrofa]
Probability 0.65% of [ram, tup]
Probability 0.43% of [ox]
Probability 0.23% of [marmot]
Probability 84.27% of [brown bear, bruin, Ursus arctos]
Probability 1.57% of [American black bear, black bear, Ursus americanus, Euarctos americanus]
Probability 1.34% of [sloth bear, Melursus ursinus, Ursus ursinus]
Probability 0.13% of [lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens]
Probability 0.12% of [ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus]
Probability 20.20% of [honeycomb]
Probability 6.52% of [gazelle]
Probability 5.14% of [sorrel]
Probability 3.72% of [impala, Aepyceros melampus]
Probability 2.44% of [Saluki, gazelle hound]
Probability 41.17% of [harp]
Probability 13.64% of [accordion, piano accordion, squeeze box]
Probability 2.97% of [window shade]
Probability 1.59% of [chain]
Probability 1.55% of [pay-phone, pay-station]

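Although it fails in almost the same places as the VGG model, the result is not too bad. Now let us retrain this model with the COCO animals images and labels.

Image classification using the retrained Inception v3 in TensorFlow

Retraining Inception v3 differs from VGG16 in that we use a softmax activation layer for the output with tf.losses.softmax_cross_entropy() as the loss function, so the labels are one-hot encoded and y_p has shape (None, coco.n_classes).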
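
The two loss functions compute the same quantity and differ only in how the labels are supplied: class ids for the sparse version, one-hot vectors for the other. The NumPy sketch below illustrates this with made-up logits; the function names are hypothetical and are not the TensorFlow implementations.

```python
import numpy as np


def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def sparse_cross_entropy(logits, class_ids):
    """Labels are integer class ids, one per sample."""
    logp = log_softmax(logits)
    return -logp[np.arange(len(class_ids)), class_ids].mean()


def onehot_cross_entropy(logits, onehot):
    """Labels are one-hot rows, one per sample."""
    logp = log_softmax(logits)
    return -(onehot * logp).sum(axis=1).mean()


logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
class_ids = np.array([0, 2])
onehot = np.eye(3)[class_ids]      # [[1, 0, 0], [0, 0, 1]]

sparse_loss = sparse_cross_entropy(logits, class_ids)
onehot_loss = onehot_cross_entropy(logits, onehot)
print(sparse_loss, onehot_loss)
```

The choice between the two therefore mostly depends on the label format the data pipeline produces, which is why coco.y_onehot is set differently in the two retraining examples.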

  1. First, define the placeholders:
is_training = tf.placeholder(tf.bool,name='is_training')
x_p = tf.placeholder(shape=(None, image_height, image_width, 3 ), dtype=tf.float32, name='x_p')
y_p = tf.placeholder(shape=(None,coco.n_classes), dtype=tf.int32, name='y_p')
  2. Next, load the model:
with slim.arg_scope(inception.inception_v3_arg_scope()):
    logits,_ = inception.inception_v3(x_p, num_classes=coco.n_classes, is_training=True )

probabilities = tf.nn.softmax(logits)
  3. Next, define the function that restores all the variables except those of the last layer:
# restore except last layer
checkpoint_exclude_scopes=["InceptionV3/Logits", "InceptionV3/AuxLogits"]
exclusions = [scope.strip() for scope in checkpoint_exclude_scopes]

variables_to_restore = []
for var in slim.get_model_variables():
    excluded = False
    for exclusion in exclusions:
        if var.op.name.startswith(exclusion):
            excluded = True
            break
    if not excluded:
        variables_to_restore.append(var)

init_fn = slim.assign_from_checkpoint_fn(
    os.path.join(model_home, '{}.ckpt'.format(model_name)), variables_to_restore)
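
The exclusion logic above is a plain prefix filter on variable names, so it can be tried in isolation. In the following sketch, the variable names are illustrative examples written in the style of the Inception v3 scopes, and filter_by_scope is a hypothetical helper, not a TensorFlow Slim function.

```python
def filter_by_scope(names, excluded_scopes):
    """Keep only the names that do not start with any excluded scope."""
    excluded = tuple(scope.strip() for scope in excluded_scopes)
    return [name for name in names if not name.startswith(excluded)]


# Illustrative variable names, assumed for this example.
names = [
    'InceptionV3/Conv2d_1a_3x3/weights',
    'InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/weights',
    'InceptionV3/Logits/Conv2d_1c_1x1/weights',
    'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights',
]
to_restore = filter_by_scope(names, ['InceptionV3/Logits', 'InceptionV3/AuxLogits'])
print(to_restore)
```

Both output heads are excluded because their shapes depend on the number of classes, which changes from 1,001 to 8 in this example.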
  4. Define the loss function, the optimizer, and the training operation:
tf.losses.softmax_cross_entropy(onehot_labels=y_p, logits=logits)
loss = tf.losses.get_total_loss()
learning_rate = 0.001
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss)
  5. Train the model and run the predictions in the same session once the training is complete:
n_epochs=10
coco.y_onehot = True
coco.batch_size = 32
coco.batch_shuffle = True
total_images = len(x_train_files)
n_batches = total_images // coco.batch_size

with tf.Session() as tfs:
    tfs.run(tf.global_variables_initializer())
    init_fn(tfs)
    for epoch in range(n_epochs):
        print('Starting epoch ',epoch)
        coco.reset_index()
        epoch_accuracy=0
        epoch_loss=0
        for batch in range(n_batches):
            x_batch, y_batch = coco.next_batch()
            images=np.array([coco.preprocess_for_inception(x) \
                for x in x_batch])
            feed_dict={x_p:images,y_p:y_batch,is_training:True}
            batch_loss,_ = tfs.run([loss,train_op], feed_dict = feed_dict)
            epoch_loss += batch_loss
        epoch_loss /= n_batches
        print('Train loss in epoch {}:{}' .format(epoch,epoch_loss))
    # now run the predictions
    feed_dict={x_p:images_test,is_training: False}
    probs = tfs.run([probabilities],feed_dict=feed_dict)
    probs=probs[0]

We see that the loss decreases with each epoch:

INFO:tensorflow:Restoring parameters from /home/armando/models/inception_v3/inception_v3.ckpt
Starting epoch 0
Train loss in epoch 0:2.7896385192871094
Starting epoch 1
Train loss in epoch 1:1.6651896286010741
Starting epoch 2
Train loss in epoch 2:1.2332031989097596
Starting epoch 3
Train loss in epoch 3:0.9912329530715942
Starting epoch 4
Train loss in epoch 4:0.8110128355026245
Starting epoch 5
Train loss in epoch 5:0.7177265572547913
Starting epoch 6
Train loss in epoch 6:0.6175705575942994
Starting epoch 7
Train loss in epoch 7:0.5542363750934601
Starting epoch 8
Train loss in epoch 8:0.523461252450943
Starting epoch 9
Train loss in epoch 9:0.4923107647895813

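This time the sheep is identified correctly, but the cat picture is wrongly identified as a dog:

Output probabilities for each input image: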
Probability 98.84% of [zebra]
Probability 0.84% of [giraffe]
Probability 0.11% of [sheep]
Probability 0.07% of [cat]
Probability 0.06% of [dog]
Probability 95.77% of [horse]
Probability 1.34% of [dog]
Probability 0.89% of [zebra]
Probability 0.68% of [bird]
Probability 0.61% of [sheep]
Probability 94.83% of [dog]
Probability 4.53% of [cat]
Probability 0.56% of [sheep]
Probability 0.04% of [bear]
Probability 0.02% of [zebra]
Probability 42.80% of [bird]
Probability 25.64% of [cat]
Probability 15.56% of [bear]
Probability 8.77% of [giraffe]
Probability 3.39% of [sheep]
Probability 72.58% of [sheep]
Probability 8.40% of [bear]
Probability 7.64% of [giraffe]
Probability 4.02% of [horse]
Probability 3.65% of [bird]
Probability 98.03% of [bear]
Probability 0.74% of [cat]
Probability 0.54% of [sheep]
Probability 0.28% of [bird]
Probability 0.17% of [horse]
Probability 96.43% of [giraffe]
Probability 1.78% of [bird]
Probability 1.10% of [sheep]
Probability 0.32% of [zebra]
Probability 0.14% of [bear]
Probability 34.43% of [horse]
Probability 23.53% of [dog]
Probability 16.03% of [zebra]
Probability 9.76% of [cat]
Probability 9.02% of [giraffe]

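Summary

Transfer learning is a powerful technique that saves time by applying models trained on larger datasets to a different dataset. It also helps warm-start the training process when the dataset is small. In this article, we learned how to use pretrained models such as VGG16 and Inception v3 to classify images from a dataset other than the one they were trained on. We also learned how to retrain the pretrained models, with examples in TensorFlow and Keras, and how to preprocess the images for feeding into both models.

We also learned that several models are trained on the ImageNet dataset. Try to find other models trained on different datasets, such as video datasets, speech datasets, or text/NLP datasets. Try retraining these models and using them for your own deep learning problems on your own datasets.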
