My suspicion is O(kn), where k is constant but rather large. That would rely heavily on an extremely good blocking algorithm. Maybe trending towards O(n log n)? I don't know. It's really tough.
The blocking algorithm for this hypothetical solution would most likely be an approximate hashing scheme, something resembling a Bloom filter. The input data would have to be amenable to hashing, like a single string field; add multiple fields and things get complicated fast. Look up q-gram blocking if you want a relatively simple example of this kind of blocking algorithm; a rough sketch follows.
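To make that concrete, here's a minimal sketch of q-gram blocking in Python. It's a toy, not a real implementation: every record is indexed under each of its overlapping character q-grams, so records that share even a few q-grams land in a common block despite typos. The padding characters and the bare-list-of-names record layout are my own assumptions for illustration.

```python
from collections import defaultdict

def qgrams(s, q=3):
    """Break a string into overlapping character q-grams, with edge padding."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def qgram_blocks(names, q=3):
    """Index every record under each of its q-grams.

    Blocks overlap (one record appears in many blocks), which is what
    buys typo tolerance; the price is redundant comparisons later.
    """
    blocks = defaultdict(set)
    for idx, name in enumerate(names):
        for gram in qgrams(name, q):
            blocks[gram].add(idx)
    return blocks

names = ["Acme Inc.", "Acme Incorporated", "Smith & Co", "Smyth and Co"]
blocks = qgram_blocks(names)
# blocks["acm"] == {0, 1}: the two Acme variants share a block, while
# the Smiths never get compared against the Acmes at all.
```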
Thinking about this, I think you would end up with something like O(s^2 * b), where s is the size of your biggest block (e.g. all the "Smith"s, or every company name containing "Inc." if that token doesn't get stripped) and b is the number of blocks. Since every record lands in at least one block, s * b is at least n, so O(kn) with a big k makes some sense. If you have some way to avoid comparing every element in a block to every other element, e.g. sorting within a block when an exact string match is good enough, I think you could get that down to O(s * log(s) * b), but most of the time the matches are not going to be that clean. The sketch below shows the expensive inner loop.
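Continuing the sketch above (this reuses `names` and the `blocks` built by `qgram_blocks`), here's where the s^2 per block comes from: score every pair inside every block. `SequenceMatcher` is just a stand-in for whatever comparator you'd really use (Jaro-Winkler, edit distance, etc.), and the 0.6 threshold is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Toy stand-in for a real comparator (Jaro-Winkler, edit distance, ...)."""
    return SequenceMatcher(None, a, b).ratio()

def match_within_blocks(names, blocks, threshold=0.6):
    """Score every pair inside every block: O(s^2) per block, O(s^2 * b) overall.

    The seen set keeps overlapping blocks from re-scoring the same pair,
    which q-gram blocking would otherwise cause constantly.
    """
    seen, matches = set(), []
    for members in blocks.values():
        for i, j in combinations(sorted(members), 2):
            if (i, j) in seen:
                continue
            seen.add((i, j))
            if similarity(names[i], names[j]) >= threshold:
                matches.append((i, j))
    return matches

print(match_within_blocks(names, blocks))
# flags (0, 1) and (2, 3) on the toy data above; order may vary
```

Note that the seen set trades memory for time: without it, a pair sharing twenty q-grams gets scored twenty times.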