
[Object Detection] vol.15: Changes from Darknet YOLOv3 to YOLOv4 (personal notes)

If loose paraphrasing bothers you, please read AlexeyAB's GitHub repository instead.

Note: this is a memo of the changes.

I will add notes later on the parts I have verified hands-on.

Improvements in this repository: a rough, incomplete paraphrase

・developed State-of-the-Art object detector YOLOv4

　The state-of-the-art object detector YOLOv4 was developed.

・added State-of-Art models: CSP, PRN, EfficientNet
　State-of-the-art models were added:
　　　　CSP：　
　　　　PRN：
　　　　EfficientNet：

・added layers: [conv_lstm], [scale_channels] SE/ASFF/BiFPN, [local_avgpool], [sam], [Gaussian_yolo], [reorg3d] (fixed [reorg]), fixed [batchnorm]
　Added layers:
　　　　[conv_lstm]
　　　　[scale_channels] SE/ASFF/BiFPN
　　　　[local_avgpool]
　　　　[sam]
　　　　[Gaussian_yolo]
　　　　[reorg3d] (fixed [reorg])
　　　　fixed [batchnorm]

・added the ability for training recurrent models (with layers conv-lstm[conv_lstm]/conv-rnn[crnn]) for accurate detection on video
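　Recurrent models (with conv-lstm [conv_lstm] / conv-rnn [crnn] layers) can now be trained for accurate detection on video.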

・added data augmentation: [net] mixup=1 cutmix=1 mosaic=1 blur=1. Added activations: SWISH, MISH, NORM_CHAN, NORM_CHAN_SOFTMAX
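　Added data augmentation options (set in the [net] section):
　　　　mixup：
　　　　cutmix：
　　　　mosaic：
　　　　blur：
　New activations were also added: SWISH, MISH, NORM_CHAN, NORM_CHAN_SOFTMAX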

In yolov4.cfg, mosaic is enabled by default (presumably because it helps in general).
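To get a feel for what mosaic does, here is a minimal sketch in Python with NumPy. It is a simplified illustration, not Darknet's implementation: the function name `simple_mosaic` is hypothetical, the four images are assumed to have the same size, and they are tiled into a fixed 2x2 grid (as far as I understand, the real augmentation also picks a random cut point and crops). Boxes are assumed to be (x_min, y_min, x_max, y_max) in pixels.

```python
import numpy as np


def simple_mosaic(images, boxes_per_image):
    """Tile four HxWxC images into one 2Hx2W canvas and shift their boxes.

    Hypothetical helper for illustration only. Boxes are
    (x_min, y_min, x_max, y_max) in pixels of their source image.
    """
    if len(images) != 4 or len(boxes_per_image) != 4:
        raise ValueError("exactly four images and four box lists are required")
    h, w, c = images[0].shape
    if any(img.shape != (h, w, c) for img in images):
        raise ValueError("all images must share the same shape")

    canvas = np.zeros((2 * h, 2 * w, c), dtype=images[0].dtype)
    # (dx, dy) offset of each quadrant: top-left, top-right, bottom-left, bottom-right
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]

    merged = []
    for img, boxes, (dx, dy) in zip(images, boxes_per_image, offsets):
        canvas[dy:dy + h, dx:dx + w] = img
        for x0, y0, x1, y1 in boxes:
            # Move each box by the offset of the quadrant it was pasted into.
            merged.append((x0 + dx, y0 + dy, x1 + dx, y1 + dy))
    return canvas, merged
```

One training sample then contains objects from four different scenes, which is probably why it is useful with small mini-batches.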

・added the ability for training with GPU-processing using CPU-RAM to increase the mini_batch_size and increase accuracy (instead of batch-norm sync)
　Training can now run GPU processing backed by CPU RAM.
　This allows a larger mini_batch_size, which improves accuracy (used instead of batch-norm sync).

・improved binary neural network performance 2x-4x times for Detection on CPU and GPU if you trained your own weights by using this XNOR-net model (bit-1 inference) : https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny_xnor.cfg
　Binary neural network detection on CPU and GPU is 2x to 4x faster, provided you trained your own weights with this XNOR-net model (1-bit inference).

・improved neural network performance ~7% by fusing 2 layers into 1: Convolutional + Batch-norm
　Neural network performance improved by about 7% by fusing two layers, Convolutional and Batch-norm, into one.
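The fusion works because, at inference time, batch normalization is just a per-channel scale and shift, so it can be folded into the convolution's weights and bias. Below is a minimal NumPy sketch of that arithmetic. It is not Darknet's code: the function name `fuse_conv_bn`, the (out_channels, in_channels, kh, kw) weight layout, and the `eps` value are my assumptions.

```python
import numpy as np


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch norm into the preceding convolution.

    weight: (out_c, in_c, kh, kw); bias, gamma, beta, mean, var: (out_c,)
    Batch norm computes gamma * (y - mean) / sqrt(var + eps) + beta,
    where y is the convolution output. eps is an assumed value.
    """
    scale = gamma / np.sqrt(var + eps)              # one factor per output channel
    fused_weight = weight * scale[:, None, None, None]
    fused_bias = (bias - mean) * scale + beta
    return fused_weight, fused_bias
```

After fusing, a single convolution with `fused_weight` and `fused_bias` gives the same output as convolution followed by batch norm, so one whole layer of memory reads and writes disappears.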

・improved performance: Detection 2x times, on GPU Volta/Turing (Tesla V100, GeForce RTX, ...) using Tensor Cores if CUDNN_HALF defined in the Makefile or darknet.sln
　Detection is 2x faster on GPUs with Tensor Cores (Volta/Turing, e.g. Tesla V100 or GeForce RTX) when compiled with CUDNN_HALF defined in the Makefile or darknet.sln.

・improved performance ~1.2x times on FullHD, ~2x times on 4K, for detection on the video (file/stream) using darknet detector demo...
　Video detection (file or stream) with the darknet detector demo command is about 1.2x faster on FullHD and about 2x faster on 4K.

・improved performance 3.5 X times of data augmentation for training (using OpenCV SSE/AVX functions instead of hand-written functions) - removes bottleneck for training on multi-GPU or GPU Volta
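　Data augmentation for training is 3.5x faster thanks to OpenCV SSE/AVX functions replacing hand-written ones, which removes a bottleneck when training on multi-GPU setups or Volta GPUs.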

・improved performance of detection and training on Intel CPU with AVX (Yolo v3 ~85%)
　Detection and training on Intel CPUs with AVX are faster (by roughly 85% for Yolo v3).

・optimized memory allocation during network resizing when random=1
　Memory allocation during network resizing is optimized when random=1.

・optimized GPU initialization for detection - we use batch=1 initially instead of re-init with batch=1　
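　GPU initialization for detection is optimized: the network is initialized with batch=1 from the start instead of being re-initialized with batch=1 afterwards.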

・added correct calculation of mAP, F1, IoU, Precision-Recall using command darknet detector map...
　The darknet detector map command now correctly calculates:
　　　　mAP
　　　　F1
　　　　IoU
　　　　Precision-Recall

　Note: until now, I ran the test command against each weights file and calculated these individually.
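For reference, the metrics listed above are built from a few simple formulas. The sketch below shows IoU for two boxes and precision, recall, and F1 from true-positive / false-positive / false-negative counts. The function names are hypothetical, boxes are assumed to be (x_min, y_min, x_max, y_max), and this is only the textbook arithmetic, not what the map command does internally.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1
```

A detection usually counts as a true positive when its IoU with a ground-truth box of the same class exceeds a threshold; mAP then averages the area under the precision-recall curve over all classes.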

・added drawing of chart of average-Loss and accuracy-mAP (-map flag) during training
　A chart of average loss and accuracy (mAP) is now drawn during training when the -map flag is set.

・run ./darknet detector demo ... -json_port 8070 -mjpeg_port 8090 as JSON and MJPEG server to get results online over the network by using your soft or Web-browser
　Running ./darknet detector demo ... -json_port 8070 -mjpeg_port 8090 starts a JSON and MJPEG server, so detection results can be fetched over the network from a web browser or your own software.

・added calculation of anchors for training
　Anchor calculation for training was added.
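Anchors are typically obtained by clustering the widths and heights of the ground-truth boxes in the training set. The sketch below is a generic k-means over (width, height) pairs using IoU as the similarity measure, in the spirit of the YOLOv2 paper. I have not checked how Darknet does it internally, so treat the function names (`wh_iou`, `kmeans_anchors`), the random initialization, and the parameters as assumptions.

```python
import numpy as np


def wh_iou(boxes, centroids):
    """IoU between (w, h) pairs, as if all boxes shared the same top-left corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors, returned sorted by area."""
    boxes = np.asarray(boxes, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Start from k distinct boxes picked at random.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Assign every box to the centroid it overlaps most.
        assign = np.argmax(wh_iou(boxes, centroids), axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]
```

The resulting (w, h) pairs would go into the anchors= line of the cfg, scaled to the network input size.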

・added example of Detection and Tracking objects: https://github.com/AlexeyAB/darknet/blob/master/src/yolo_console_dll.cpp
　An example of object detection and tracking was added (yolo_console_dll.cpp → yolo_console_dll.exe).

・run-time tips and warnings if you use incorrect cfg-file or dataset
　Tips and warnings are shown at run time if you specify an incorrect cfg file or dataset.

・added support for Windows
　Windows support was added.
　Note: wasn't it already supported before?

・many other fixes of code...
　Many other code fixes were made.

What is new in YOLOv4?: a rough, incomplete paraphrase

The original text of "What's new in YOLOv4?" is here.

What is new in YOLOv4?

YOLOv4's architecture is composed of CSPDarknet53 as a backbone, spatial pyramid pooling additional module, PANet path-aggregation neck and YOLOv3 head.
In short: YOLOv4 has CSPDarknet53 as its backbone, adds a spatial pyramid pooling module, and consists of a PANet path-aggregation neck and the YOLOv3 head.

CSPDarknet53 is a novel backbone that can enhance the learning capability of CNN. The spatial pyramid pooling block is added over CSPDarknet53 to increase the receptive field and separate out the most significant context features. Instead of Feature pyramid networks (FPN) for object detection used in YOLOv3, the PANet is used as the method for parameter aggregation for different detector levels.

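(Paraphrase, based on Google Translate)

CSPDarknet53 is a new backbone that strengthens the learning capability of a CNN. A spatial pyramid pooling block is added on top of CSPDarknet53 to enlarge the receptive field and separate out the most significant context features. Where YOLOv3 used a Feature Pyramid Network (FPN) for object detection, YOLOv4 uses PANet as the method of parameter aggregation for the different detector levels.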
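To make the spatial pyramid pooling block a little more concrete, here is a minimal NumPy sketch: the same feature map is max-pooled at several window sizes with stride 1, and the results are concatenated with the input along the channel axis. The kernel sizes 5, 9, and 13 are my assumption (they are what I remember from yolov4.cfg), and the function names are hypothetical. A real implementation would use the framework's pooling layers instead of Python loops.

```python
import numpy as np


def max_pool_same(x, k):
    """Stride-1 max pooling over a (C, H, W) array, output size equals input size."""
    x = np.asarray(x, dtype=np.float32)
    pad = k // 2
    # Pad with -inf so the border never wins the max.
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    _, h, w = x.shape
    out = np.full_like(x, -np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + w])
    return out


def spp_block(x, kernel_sizes=(5, 9, 13)):
    """Concatenate the input with its max-pooled versions along the channel axis."""
    x = np.asarray(x, dtype=np.float32)
    pooled = [max_pool_same(x, k) for k in kernel_sizes]
    return np.concatenate([x] + pooled, axis=0)
```

With three kernel sizes, a (C, H, W) input becomes (4C, H, W): the spatial size is unchanged, but each position now also sees the strongest activation in progressively larger neighborhoods, which is how the receptive field grows.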

What is an improvement in results?

YOLOv4 is twice as fast as EfficientDet (competitive recognition model) with comparable performance. In addition, AP (Average Precision) and FPS (Frames Per Second) increased by 10% and 12% compared to YOLOv3.

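(Paraphrase, based on Google Translate)

YOLOv4 runs twice as fast as EfficientDet (a competing recognition model) at comparable performance. In addition, AP (Average Precision) and FPS (Frames Per Second) rose by 10% and 12%, respectively, compared with YOLOv3.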

Conclusion

YOLO is a futuristic recognizer that has faster FPS and is more accurate than available detectors. The detector can be trained and used on a conventional GPU which enables widespread adoption. New features in YOLOv4 improve accuracy of the classifier and detector and may be used for other research projects.

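In other words: YOLO is an advanced recognizer with higher FPS and better accuracy than the currently available detectors. The detector can be trained and used on a commonly available GPU, which makes widespread adoption possible. The new features in YOLOv4 improve the accuracy of both the classifier and the detector, and may be used in other research projects.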
