Hacker News

We used Tesseract OCR, plus other image-processing libraries to improve the OCR's chances, and although we got close to usable results, many challenges remained. One major problem was that different data sources had completely different page layouts, and often the original documents were out of reach: all the client had were scans of fax transmissions!

The bottom line was that we achieved very good results consistently, but the remaining margin of error was not acceptable for our specific use case.


