I was expecting the use of sentence-transformers (sbert.net). If you have a long list of entities, you could use an approximate similarity search library such as Annoy. The authors store the embeddings in a database and decode JSON for each comparison, which is very inefficient in my opinion. At the very least, load the whole table of embeddings into a np.array up front; np.dot is plenty fast if your list isn't huge.
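A minimal sketch of that in-memory approach, with synthetic embeddings standing in for whatever model actually produced them (the entity count and dimension here are made up):

```python
import numpy as np

# Hypothetical corpus: 100k entities with 384-dim embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100_000, 384)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # normalize once up front

def most_similar(query_vec, k=10):
    """Return indices of the k nearest entities by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = emb @ q              # one matrix-vector product, no JSON decoding
    return np.argsort(-scores)[:k]

top = most_similar(emb[42])       # entity 42's neighbours; itself ranks first
```

One brute-force matrix-vector product over the whole table replaces per-row database reads and JSON decoding; only beyond a few million rows would you reach for Annoy or similar.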
The problem is still not solved at that point. Having a list of the most similar entities does not tell you which are actually similar and which are merely related. You need a classifier. For that, you can label a few hundred positive and negative pairs and use sbert.net itself to fine-tune a transformer on them. The authors take the easier route of thresholding the cosine similarity score at 0.8, but that threshold might not transfer to your case.
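Even if you stick with thresholding, you can at least fit the cutoff to your own labeled pairs rather than hard-coding 0.8. A sketch with synthetic cosine scores standing in for real model output (the score distributions here are invented for illustration):

```python
import numpy as np

# Hypothetical labeled pairs: cosine scores plus human labels
# (1 = same entity, 0 = merely related).
rng = np.random.default_rng(1)
pos = np.clip(rng.normal(0.85, 0.07, 200), 0, 1)   # true matches
neg = np.clip(rng.normal(0.55, 0.15, 200), 0, 1)   # related but distinct
scores = np.concatenate([pos, neg])
labels = np.concatenate([np.ones(200), np.zeros(200)])

def best_threshold(scores, labels):
    """Pick the cutoff that maximizes F1 on the labeled pairs."""
    best_t, best_f1 = 0.0, 0.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        if tp == 0:
            continue
        prec = tp / pred.sum()
        rec = tp / (labels == 1).sum()
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

t = best_threshold(scores, labels)  # data-driven cutoff, not a fixed 0.8
```

The same few hundred labeled pairs also serve as training data if you later graduate to fine-tuning a proper pairwise classifier with sentence-transformers.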