My suspicion is O(kn), where k is constant but rather large. That would rely heavily on an extremely good blocking algorithm. Maybe trending towards O(n log n)? I don't know. It's really tough.
The blocking algorithm for this hypothetical solution would most likely be an approximate hashing scheme, something resembling a Bloom filter. The input data would have to be amenable to hashing, like a single string field; add multiple fields and things get complicated fast. Look up q-gram blocking if you want a relatively simple example of this kind of blocking algorithm; a rough sketch follows.
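To make that concrete, here's a minimal sketch of q-gram blocking in Python. It's a toy, not a real implementation: every record is indexed under each of its overlapping character q-grams, so records that share even a few q-grams land in a common block despite typos. The padding characters and the bare-list-of-names record layout are my own assumptions for illustration.

```python
from collections import defaultdict

def qgrams(s, q=3):
    """Break a string into overlapping character q-grams, with edge padding."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def qgram_blocks(names, q=3):
    """Index every record under each of its q-grams.

    Blocks overlap (one record appears in many blocks), which is what
    buys typo tolerance; the price is redundant comparisons later.
    """
    blocks = defaultdict(set)
    for idx, name in enumerate(names):
        for gram in qgrams(name, q):
            blocks[gram].add(idx)
    return blocks

names = ["Acme Inc.", "Acme Incorporated", "Smith & Co", "Smyth and Co"]
blocks = qgram_blocks(names)
# blocks["acm"] == {0, 1}: the two Acme variants share a block, while
# the Smiths never get compared against the Acmes at all.
```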
Thinking about this, I think you would end up with something like O(s^2 * b), where s is the size of your biggest block (e.g. all the "Smith"s, or every company name containing "Inc." if that token doesn't get stripped) and b is the number of blocks. Since every record lands in at least one block, s * b is at least n, so O(kn) with a big k makes some sense. If you have some way to avoid comparing every element in a block to every other element, e.g. sorting within a block when an exact string match is good enough, I think you could get that down to O(s * log(s) * b), but most of the time the matches are not going to be that clean. The sketch below shows the expensive inner loop.
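Continuing the sketch above (this reuses `names` and the `blocks` built by `qgram_blocks`), here's where the s^2 per block comes from: score every pair inside every block. `SequenceMatcher` is just a stand-in for whatever comparator you'd really use (Jaro-Winkler, edit distance, etc.), and the 0.6 threshold is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Toy stand-in for a real comparator (Jaro-Winkler, edit distance, ...)."""
    return SequenceMatcher(None, a, b).ratio()

def match_within_blocks(names, blocks, threshold=0.6):
    """Score every pair inside every block: O(s^2) per block, O(s^2 * b) overall.

    The seen set keeps overlapping blocks from re-scoring the same pair,
    which q-gram blocking would otherwise cause constantly.
    """
    seen, matches = set(), []
    for members in blocks.values():
        for i, j in combinations(sorted(members), 2):
            if (i, j) in seen:
                continue
            seen.add((i, j))
            if similarity(names[i], names[j]) >= threshold:
                matches.append((i, j))
    return matches

print(match_within_blocks(names, blocks))
# flags (0, 1) and (2, 3) on the toy data above; order may vary
```

Note that the seen set trades memory for time: without it, a pair sharing twenty q-grams gets scored twenty times.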