Experimenting with training a facial recognition model to detect specific people.
I first fine-tuned the pretrained yolov8n model, then the yolov8n-face model (yolov8n trained on the WIDER FACE dataset).
Training on top of yolov8n model
- First, I fine-tuned the pretrained yolov8n model on my own dataset, which consists of images of random people's faces and images of my own face.
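A custom dataset like this is usually described to Ultralytics with a small YAML file. The sketch below generates one. The folder layout (`images/train`, `images/val`, `images/test`, with matching `labels/...` folders of YOLO-format `.txt` files) and the two class names `me` and `other` are assumptions, because the write-up does not describe how the faces were labeled.

```python
from pathlib import Path

# Hypothetical class layout: the write-up does not say how the faces were labeled.
CLASS_NAMES = ["me", "other"]


def build_data_yaml(dataset_root: Path, class_names: list) -> str:
    """Return an Ultralytics-style dataset YAML for images/{train,val,test} folders."""
    lines = [
        f"path: {dataset_root.as_posix()}",  # dataset root; split paths below are relative to it
        "train: images/train",
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    # Class IDs are list positions and must match the first column of each label .txt file.
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


def write_data_yaml(dataset_root: Path, class_names: list) -> Path:
    """Write data.yaml into the dataset root and return its path."""
    yaml_path = dataset_root / "data.yaml"
    yaml_path.write_text(build_data_yaml(dataset_root, class_names), encoding="utf-8")
    return yaml_path


print(build_data_yaml(Path("datasets/faces"), CLASS_NAMES))
```

The path `datasets/faces` is a placeholder; `write_data_yaml` would save the same text next to the images so it can be passed as the `data` argument when training.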
Results
- Reasonable results, given that I didn't tune any hyperparameters and my dataset contained only 82 images in total across the train, validation, and test splits.
- However, when I tested it live with another person, it could not tell the two of us apart.
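With only 82 images, how they are divided between train, validation, and test matters. A minimal, reproducible way to split them is sketched below; the 70/15/15 ratios and the placeholder file names are assumptions, since the original split is not stated.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Deterministically shuffle items and split them into train/val/test lists.

    The 70/15/15 default is a hypothetical choice, not the original split.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test split")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test split
    return train, val, test


images = [f"img_{i:03d}.jpg" for i in range(82)]  # placeholder names for the 82 images
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # → 57 12 13
```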
Training on top of yolov8n model with frozen backbone
- Next, I fine-tuned the yolov8n model again, but with the first 10 layers of the network (the backbone) frozen.
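In Ultralytics this corresponds to the `freeze=10` training argument. In YOLOv8n, layers 0 to 9 form the backbone (ending with the SPPF block), so freezing 10 layers freezes exactly the backbone. As far as I understand the Ultralytics trainer, it does this by matching parameter names that start with `model.0.` through `model.9.` and disabling their gradients. The sketch below shows only that name-matching rule in plain Python; the parameter names are made-up examples in the YOLOv8 naming style.

```python
def frozen_parameter_names(param_names, freeze=10):
    """Return the names of parameters that belong to layers 0..freeze-1.

    Simplified sketch: it matches the `model.<layer index>.` prefix used in YOLOv8
    parameter names; in PyTorch those parameters would get requires_grad = False.
    """
    prefixes = tuple(f"model.{i}." for i in range(freeze))
    # The trailing dot keeps "model.1." from also matching "model.10.", "model.12.", etc.
    return [name for name in param_names if name.startswith(prefixes)]


# Made-up parameter names in the same style as YOLOv8's named_parameters() output.
names = [
    "model.0.conv.weight",
    "model.9.cv2.conv.weight",
    "model.12.cv1.conv.weight",
    "model.22.cv2.0.0.conv.weight",
]
print(frozen_parameter_names(names))
# → ['model.0.conv.weight', 'model.9.cv2.conv.weight']
```

Frozen layers keep their pretrained weights, so only the neck and detection head adapt to the small dataset.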
Results
Training on top of yolov8n-face model
- Next, I took the yolov8n-face model mentioned above (a yolov8n model trained on the WIDER FACE dataset) and fine-tuned it on my own dataset.
Training on top of yolov8n-face model with frozen backbone
- Finally, I fine-tuned the yolov8n-face model on my own dataset with the YOLO backbone (the first 10 layers) frozen.
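The four runs differ only in the starting checkpoint and in whether the backbone is frozen. A sketch of that experiment grid, assuming the Ultralytics Python package (`YOLO(...)`, `.train()`, `.val()`), is below. `yolov8n-face.pt` stands for a locally downloaded WIDER FACE checkpoint (hypothetical path), `data.yaml` is an assumed dataset file, and `epochs=100`, `imgsz=640` are Ultralytics' defaults rather than confirmed settings from these experiments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Experiment:
    name: str
    weights: str            # starting checkpoint
    freeze: Optional[int]   # number of leading layers to freeze, or None for full fine-tuning


# "yolov8n-face.pt" is a hypothetical local path to the WIDER FACE-trained checkpoint.
EXPERIMENTS = [
    Experiment("yolov8n", "yolov8n.pt", None),
    Experiment("yolov8n_frozen10", "yolov8n.pt", 10),
    Experiment("yolov8n-face", "yolov8n-face.pt", None),
    Experiment("yolov8n-face_frozen10", "yolov8n-face.pt", 10),
]


def train_kwargs(exp: Experiment, data_yaml: str, epochs: int = 100) -> dict:
    """Build keyword arguments for Ultralytics' YOLO.train()."""
    kwargs = {"data": data_yaml, "epochs": epochs, "imgsz": 640, "name": exp.name}
    if exp.freeze is not None:
        kwargs["freeze"] = exp.freeze  # freeze=10 freezes layers 0-9, the YOLOv8 backbone
    return kwargs


def run(exp: Experiment, data_yaml: str) -> float:
    """Fine-tune one configuration and return its validation mAP50."""
    from ultralytics import YOLO  # imported here so the grid can be inspected without it

    model = YOLO(exp.weights)
    model.train(**train_kwargs(exp, data_yaml))
    return float(model.val().box.map50)
```

Calling `run(exp, "data.yaml")` for each entry in `EXPERIMENTS` would train all four variants in turn; keeping the import inside `run` means the configuration itself has no dependency on Ultralytics.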
Conclusion
The four models achieved the following mAP50 scores:
- yolov8n: 0.887
- yolov8n with 10 frozen layers: 0.918
- yolov8n-face: 0.928
- yolov8n-face with 10 frozen layers: 0.948
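mAP50 is mean average precision computed with an IoU (intersection over union) threshold of 0.5: a predicted box counts as a correct detection only if it has the right class and overlaps a ground-truth box with IoU of at least 0.5. The sketch below shows just that matching test for boxes given as `(x1, y1, x2, y2)` corners; it is not a full AP computation and ignores confidence ranking and one-to-one matching. The class label `"me"` is a hypothetical example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Width and height of the overlap rectangle (zero if the boxes do not intersect).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred_box, pred_cls, true_box, true_cls):
    """True if a prediction would count as a true positive at the mAP50 threshold."""
    return pred_cls == true_cls and iou(pred_box, true_box) >= 0.5


truth = (0, 0, 10, 10)
print(iou((0, 0, 10, 5), truth))                          # → 0.5
print(matches_at_50((5, 0, 15, 10), "me", truth, "me"))   # → False
```

The second box overlaps the ground truth by only a third (IoU of 50 / 150), so it would count as a false positive at this threshold even though it covers half of the face.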
Transfer learning helped considerably, raising mAP50 from 0.887 to 0.948. The model performed best when it started from weights already trained on a large face dataset (WIDER FACE) and the YOLO backbone was frozen during fine-tuning.
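The size of each improvement can be made concrete by computing every run's gain over the plain yolov8n baseline, using only the four mAP50 values reported above.

```python
# mAP50 scores reported above for the four training runs.
MAP50 = {
    "yolov8n": 0.887,
    "yolov8n, 10 frozen layers": 0.918,
    "yolov8n-face": 0.928,
    "yolov8n-face, 10 frozen layers": 0.948,
}


def gains_over(scores, baseline):
    """Absolute mAP50 change of every run relative to the baseline run, rounded to 3 places."""
    base = scores[baseline]
    return {name: round(score - base, 3) for name, score in scores.items()}


for name, gain in gains_over(MAP50, "yolov8n").items():
    print(f"{name:32} {gain:+.3f}")
```

This gives gains of +0.031 (frozen backbone), +0.041 (face pretraining) and +0.061 (both). Put differently, the gap to a perfect mAP50 of 1.0 shrinks from 0.113 to 0.052.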
In live testing, the models I trained often detected other people's faces as me (many false positives). Although the measured precision was high, this suggests that my dataset was not large or representative enough.
Significant improvements could be made with a larger, more representative dataset and with image augmentation.
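Ultralytics already applies some random augmentation during training by default (for example horizontal flips, controlled by the `fliplr` hyperparameter). Augmenting offline, by saving transformed copies of each image, is another option that also enlarges a small dataset. For a mirrored copy, the image itself can be flipped with an image library such as Pillow (`ImageOps.mirror`), and the YOLO label must be mirrored to match. A sketch of the label half, assuming standard normalized YOLO detection labels (`class x_center y_center width height`):

```python
def hflip_yolo_label(line: str) -> str:
    """Mirror one normalized YOLO label line: 'class x_center y_center width height'."""
    cls, x_center, y_center, width, height = line.split()
    # A left-right flip mirrors the horizontal center; y, width and height are unchanged.
    return f"{cls} {1.0 - float(x_center):.6f} {y_center} {width} {height}"


def hflip_label_file_text(text: str) -> str:
    """Apply hflip_yolo_label to every non-empty line of a label file's contents."""
    flipped = [hflip_yolo_label(line) for line in text.splitlines() if line.strip()]
    return "\n".join(flipped) + ("\n" if flipped else "")


print(hflip_yolo_label("0 0.25 0.5 0.1 0.2"))  # → 0 0.750000 0.5 0.1 0.2
```

Each flipped image would be saved alongside its flipped label file under a new name, doubling the number of training examples.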