
Re: [newtech-1] Ocr for handwriting

From: Jonathan J.
Sent on: Thursday, August 2, 2012, 8:54 AM
Hi Josh,

From my recent experience with ABBYY FlexiCapture, ICR (intelligent character recognition, which encompasses handwriting) is still not that accurate unless you've created your form initially with ICR in mind, so here are a few questions:

  • Are the forms all the same? (if so you can zone each field)
  • Do they have square marks at the corner of each page? (you'll need that to deskew)
  • Do you have a blank form? (you'll need that to mark out the initial fields and adjust the dropout level for text on the form you don't want)
  • Are there circles/bubbles/checkboxes on the form you need to capture? (if so, you'll need more of an OMR (optical mark recognition) tool to process them)
  • Do the forms have comb fields or boxes restricting to one letter in each spot? |_|_|_|_| (this is much more successful)
  • Do the forms have a lot of free form fields where someone could write a paragraph? (these are low accuracy; a constrained field is much easier, e.g. a DOB field can be restricted to a valid date)
  • How many forms total?
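
To illustrate why a constrained field like DOB is easier than free text, here is a minimal Python sketch of the kind of check you could run on whatever string the ICR engine returns for that field. Everything in it is an assumption for illustration: the function name parse_dob, the MM/DD/YYYY layout, the 1900 cutoff, and the look-alike map are hypothetical and are not part of FlexiCapture or any other product.

```python
from datetime import date

# Hypothetical map of letters that handwriting recognition may confuse with digits.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def parse_dob(raw, today=None):
    """Return a date if raw looks like a plausible MM/DD/YYYY birth date, else None."""
    today = today or date.today()
    # Swap look-alike letters for digits, then keep only the digits.
    cleaned = raw.strip().translate(DIGIT_LOOKALIKES)
    digits = "".join(ch for ch in cleaned if ch.isdigit())
    if len(digits) != 8:
        return None  # wrong number of characters: send to a proofer
    month, day, year = int(digits[:2]), int(digits[2:4]), int(digits[4:])
    try:
        dob = date(year, month, day)  # rejects impossible dates such as 02/30
    except ValueError:
        return None
    # Assumed plausibility window: not in the future, not before 1900.
    if dob > today or year < 1900:
        return None
    return dob
```

Anything that comes back as None would go to a human proofer. A free form paragraph has no comparable rule to check against, which is part of why its accuracy is so much lower.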
If the number of forms is relatively low, you need relatively little information, and the information on the forms is not sensitive (account numbers, Social Security numbers, electronic patient information), you might be better off using oDesk or Elance to hire a small offshore team to extract the data you need and proof the extraction.

If you go the ICR route, you will need proofers, not to mention that ABBYY FlexiCapture isn't cheap and can be a pain to set up, especially if you don't need this as a repeatable process.

Feel free to reach out off list.

Good luck,

Jonathan Jaffe | Founding Owner
www.its-your-internet.com 
business on cloud2020

7050 Austin Street | Suite 120LL | Forest Hills, NY 11375 |[masked]


On Twitter: @jkjaffe
On LinkedIn: http://www.linkedin.com/in/jonathankjaffe
On Facebook: @jkjaffe
On Quora: http://www.quora.com/Jonathan-Jaffe-1




On Thu, Aug 2, 2012 at 12:39 AM, Josh Cohen <[address removed]> wrote:
I've got a very large set of scanned pages I'm looking to OCR.
The scans are of forms filled out by people, including signatures.

Anyone know of software that could handle this?

Sent from mobile



--
http://www.meetup.com/ny-tech/
This message was sent by Josh Cohen ([address removed]) from NY Tech Meetup.
To learn more about Josh Cohen, visit his/her member profile: http://www.meetup.com/ny-tech/members/13520482/