We used Tesseract OCR, plus other image-processing libraries to improve the OCR's chances, and although we got close to usable results, many challenges remained. One major problem was that different data sources had completely different page layouts, and often the original documents were out of reach: the client only had scans of fax transmissions!
The bottom line was that we were able to achieve very good results consistently, but the margin of error was not acceptable for our specific use case.
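To give a flavor of the kind of image cleanup meant above, here is a minimal sketch of one common preprocessing step, binarizing a grayscale scan with Otsu's method before handing it to OCR. This is an illustration only, not the pipeline we actually ran: the function names `otsu_threshold` and `binarize` are made up for this example, and the page is assumed to be a plain list of rows of 0-255 gray values. In practice you would likely reach for an image library rather than pure Python.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    `pixels` is a flat iterable of integer gray values in 0..255 (an assumption
    of this sketch). Values <= t are treated as dark, values > t as light.
    """
    hist = [0] * 256
    total = 0
    for p in pixels:
        hist[p] += 1
        total += 1

    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of gray values of those pixels

    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet, nothing to split
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(page):
    """Map a grayscale page (list of rows) to pure black (0) and white (255)."""
    t = otsu_threshold(p for row in page for p in row)
    return [[255 if p > t else 0 for p in row] for row in page]


# Tiny hypothetical "scan": dark ink around 30-40, light paper around 200-220.
page = [[30, 40, 200],
        [35, 210, 220]]
clean = binarize(page)
```

A global threshold like this tends to struggle with exactly the material we had (uneven fax scans, varying layouts), which is part of why preprocessing alone did not close the gap.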