For anyone in the same boat as myself: I later found out that this is actually quite easy to achieve (thanks to ChatGPT). In outline, this is how it is done:
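1. Encode the faces. The `face_recognition` library can find faces in pictures and encode each one as a numeric vector.
2. Group the face encodings by computing their distances with `pairwise_distances(encodings, metric='euclidean')`; the only extra library you need for this is sklearn. Encodings that are close together most likely belong to the same person.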
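In case it helps, here is a rough sketch of how the two steps could fit together. It is only my guess at a workable version, not a tested solution: the function names (`encode_faces`, `group_by_distance`, `group_faces`) are made up for illustration, the grouping rule (chain together any faces closer than a threshold) is just one simple option, and the `0.6` tolerance is an assumption that you would probably need to tune. It only covers photos, not videos.

```python
def encode_faces(image_paths):
    """Return a list of (path, encoding) pairs, one per face found."""
    import face_recognition  # third-party, imported here so the rest runs without it

    faces = []
    for path in image_paths:
        image = face_recognition.load_image_file(path)
        # face_encodings returns one vector per face detected in the image
        for encoding in face_recognition.face_encodings(image):
            faces.append((path, encoding))
    return faces


def group_by_distance(distances, tolerance=0.6):
    """Give each item a group label, chaining together items whose
    distance is at most `tolerance` (assumed value, tune as needed)."""
    n = len(distances)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a group
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and distances[i][j] <= tolerance:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def group_faces(image_paths, tolerance=0.6):
    """Map each group label to the set of image paths containing that face."""
    from sklearn.metrics import pairwise_distances

    faces = encode_faces(image_paths)
    if not faces:
        return {}
    encodings = [encoding for _, encoding in faces]
    distances = pairwise_distances(encodings, metric='euclidean')
    labels = group_by_distance(distances, tolerance)

    groups = {}
    for (path, _), label in zip(faces, labels):
        groups.setdefault(label, set()).add(path)
    return groups
```

Calling `group_faces` on a list of photo paths should give one set of files per detected person. You would then still have to check by hand which groups are family members before deleting anything.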
Amongst all the WhatsApp media on my phone, I would like to get a list of all the videos and photos with my family in them and then delete the rest.
Is something like this possible with Immich?