# Object Detection Annotations in the COCO Dataset

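The COCO dataset has five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning. Annotations are stored in JSON. Note that the COCO API described on the download page can be used to access and manipulate all annotations. All annotations share the basic data structure below: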

```
{
"info" : info,
"images" : [image],
"annotations" : [annotation],
"licenses" : [license],
}

info{
"year" : int,
"version" : str,
"description" : str,
"contributor" : str,
"url" : str,
"date_created" : datetime,
}

image{
"id" : int,
"width" : int,
"height" : int,
"file_name" : str,
"license" : int,
"flickr_url" : str,
"coco_url" : str,
"date_captured" : datetime,
}

license{
"id" : int,
"name" : str,
"url" : str,
}
```
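
To make the layout concrete, here is a minimal Python sketch that builds a tiny dataset in this shape, round-trips it through JSON, and groups annotations by image. The file names, ids, and the helper `index_annotations` are hypothetical and only a subset of the fields is filled in; this is not the COCO API itself.

```python
import json
from collections import defaultdict

# Hypothetical miniature dataset following the structure above.
# Only a few fields are populated; all values are made up for illustration.
dataset = {
    "info": {"year": 2017, "version": "1.0", "description": "toy example"},
    "images": [
        {"id": 1, "width": 640, "height": 480, "file_name": "a.jpg"},
        {"id": 2, "width": 320, "height": 240, "file_name": "b.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 1},
        {"id": 12, "image_id": 2},
    ],
}


def index_annotations(data):
    """Group annotation records by the image they belong to."""
    by_image = defaultdict(list)
    for ann in data["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return by_image


# Serialize and parse again, as one would when reading an annotation file.
loaded = json.loads(json.dumps(dataset))
anns_by_image = index_annotations(loaded)

for img in loaded["images"]:
    ann_ids = [a["id"] for a in anns_by_image[img["id"]]]
    print(img["file_name"], ann_ids)
```

Every annotation refers back to its image through `image_id`, so a lookup table like this is usually the first thing built after loading the file.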

## I. Object Detection

```
annotation{
"id" : int,
"image_id" : int,
"category_id" : int,
"segmentation" : RLE or [polygon],
"area" : float,
"bbox" : [x,y,width,height],
"iscrowd" : 0 or 1,
}

categories[{
"id" : int,
"name" : str,
"supercategory" : str,
}]
```
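
Because `bbox` is stored as `[x,y,width,height]` rather than as two corner points, a conversion step is usually needed before boxes can be compared. The sketch below shows one way to do this and to compute the intersection over union (IoU) of two boxes. The function names are hypothetical, and the sketch deliberately ignores the `iscrowd` flag, which the reference evaluation code treats specially.

```python
def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def bbox_iou(a, b):
    """IoU of two boxes given in [x, y, width, height] form.

    Simplified sketch: crowd regions (iscrowd=1) are not handled.
    """
    ax1, ay1, ax2, ay2 = xywh_to_xyxy(a)
    bx1, by1, bx2, by2 = xywh_to_xyxy(b)
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping in a 5x5 region: IoU = 25 / 175.
print(bbox_iou([0, 0, 10, 10], [5, 5, 10, 10]))
```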

TODO: describe the other four annotation formats.

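## II. Object Detection Examples

The COCO dataset provides an application programming interface (API) together with usage examples, both hosted in a GitHub repository.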

## III. Object Detection Evaluation

The COCO dataset allows detection results to be uploaded to the evaluation server.

### 1. Metrics

- Average Precision (AP)
  - $AP$: AP averaged over IoU=.50:.05:.95 (primary challenge metric)
  - $AP^{IoU=.50}$: AP at IoU=.50 (PASCAL VOC metric)
  - $AP^{IoU=.75}$: AP at IoU=.75 (strict metric)
- Average Precision across scales
  - $AP^{small}$: AP for small objects, area below $32^2$
  - $AP^{medium}$: AP for medium objects, area between $32^2$ and $96^2$
  - $AP^{large}$: AP for large objects, area above $96^2$
- Average Recall (AR)
  - $AR^{max=1}$: AR given 1 detection per image
  - $AR^{max=10}$: AR given 10 detections per image
  - $AR^{max=100}$: AR given 100 detections per image
- Average Recall across scales
  - $AR^{small}$: AR for small objects, area below $32^2$
  - $AR^{medium}$: AR for medium objects, area between $32^2$ and $96^2$
  - $AR^{large}$: AR for large objects, area above $96^2$
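
The IoU range `.50:.05:.95` and the three size classes in the list above can be written out as follows. This is a small illustrative sketch, not the reference implementation: the function names are hypothetical, and the treatment of areas exactly equal to $32^2$ or $96^2$ is an assumption, since the list does not say which class the boundary values belong to.

```python
def iou_thresholds():
    """The ten IoU thresholds 0.50, 0.55, ..., 0.95 used for the primary AP."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def size_bucket(area):
    """Classify an object by its area in pixels.

    Assumption: boundary values (exactly 32**2 or 96**2) are counted as
    "medium"; the metric description above leaves this open.
    """
    if area < 32 ** 2:
        return "small"
    if area <= 96 ** 2:
        return "medium"
    return "large"


print(iou_thresholds())
print(size_bucket(20 * 20), size_bucket(50 * 50), size_bucket(120 * 120))
```

The primary $AP$ is then the mean of the per-threshold AP values over these ten thresholds.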

### 2. Evaluation Code

```
params{
"imgIds" : [all] N img ids to use for evaluation
"catIds" : [all] K cat ids to use for evaluation
"iouThrs" : [.5:.05:.95] T=10 IoU thresholds for evaluation
"recThrs" : [0:.01:1] R=101 recall thresholds for evaluation
"areaRng" : [all,small,medium,large] A=4 area ranges for evaluation
"maxDets" : [1 10 100] M=3 thresholds on max detections per image
"useSegm" : [1] if true evaluate against ground-truth segments
"useCats" : [1] if true use category labels for evaluation
}
```

```
evalImgs[{
"dtIds" : [1xD] id for each of the D detections (dt)
"gtIds" : [1xG] id for each of the G ground truths (gt)
"dtImgIds" : [1xD] image id for each dt
"gtImgIds" : [1xG] image id for each gt
"dtMatches" : [TxD] matching gt id at each IoU or 0
"gtMatches" : [TxG] matching dt id at each IoU or 0
"dtScores" : [1xD] confidence of each dt
"dtIgnore" : [TxD] ignore flag for each dt at each IoU
"gtIgnore" : [1xG] ignore flag for each gt
}]
```

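```
eval{
"params" : parameters used for evaluation
"date" : date evaluation was performed
"counts" : [T,R,K,A,M] parameter dimensions (see above)
"precision" : [TxRxKxAxM] precision for every evaluation setting
"recall" : [TxKxAxM] max recall for every evaluation setting
}
```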
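
The `R=101` recall thresholds suggest how a single entry along the `R` axis of `precision` can be obtained: precision is sampled at recall 0, 0.01, ..., 1 and then averaged to give AP. The sketch below illustrates that idea for one category, one IoU threshold, one area range, and one `maxDets` setting. It is a simplified illustration under stated assumptions, not the reference evaluation code: the function name is hypothetical, detections are assumed to be already matched to ground truth (`is_tp`), and ignore flags are not handled.

```python
from bisect import bisect_left


def average_precision(scores, is_tp, num_gt):
    """101-point interpolated AP for one evaluation setting.

    scores : confidence of each detection
    is_tp  : True if the detection was matched to a ground truth
    num_gt : number of ground-truth objects
    Returns None when there is no ground truth to evaluate against.
    """
    if num_gt == 0:
        return None
    rec_thrs = [i / 100 for i in range(101)]  # 0, 0.01, ..., 1

    # Walk through detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    recall, precision = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_gt)
        precision.append(tp / (tp + fp))

    # Make precision non-increasing as recall grows (upper envelope).
    for i in range(len(precision) - 1, 0, -1):
        if precision[i] > precision[i - 1]:
            precision[i - 1] = precision[i]

    # Sample precision at each recall threshold; 0 if that recall is never reached.
    sampled = []
    for t in rec_thrs:
        idx = bisect_left(recall, t)
        sampled.append(precision[idx] if idx < len(precision) else 0.0)
    return sum(sampled) / len(sampled)


# Three detections, the second one a false positive, two ground-truth objects.
print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2))
```

Averaging such values over the `T` IoU thresholds and `K` categories would give a single summary number comparable in spirit to the metrics in the previous section.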

### 3. Analysis Code

1. C75: PR at IoU=.75 (AP at strict IoU), area under curve corresponds to the $AP^{IoU=.75}$ metric.
2. C50: PR at IoU=.50 (AP at PASCAL IoU), area under curve corresponds to the $AP^{IoU=.50}$ metric.
3. Loc: PR at IoU=.10 (localization errors ignored, but not duplicate detections). All remaining settings use IoU=.1.
4. Sim: PR after supercategory false positives (fps) are removed. Specifically, any matches to objects with a different class label but that belong to the same supercategory don’t count as either a fp (or tp). Sim is computed by setting all objects in the same supercategory to have the same class label as the class in question and setting their ignore flag to 1. Note that person is a singleton supercategory so its Sim result is identical to Loc.
5. Oth: PR after all class confusions are removed. Similar to Sim, except now if a detection matches any other object it is no longer a fp (or tp). Oth is computed by setting all other objects to have the same class label as the class in question and setting their ignore flag to 1.
6. BG: PR after all background (and class confusion) fps are removed. For a single category, BG is a step function that is 1 until max recall is reached then drops to 0 (the curve is smoother after averaging across categories).
7. FN: PR after all remaining errors are removed (trivially AP=1).