One of my friends who works in a lab was given a list of ≈1,000 patients. For each of these patients, she was supposed to navigate through the software that stores scan data and copy their measurements into an Excel spreadsheet. Basically, her job was just to click through and copy. Click, type a bit, click a bit more, copy over some numbers, and repeat a thousand times.
I found out that she was assigned this task when I visited her one day, and I told her that a computer could easily do this – there's no reason that she should be wasting so many hours of her time serving as a robot, clicking through some software and copying down a few numbers. I told her to stop doing it manually, because I was determined to figure out a way to make the computer do this itself – to automate it.
So, as promised, I wrote a Python script that interfaced with the GUI of the database software, using the Tesseract OCR engine to read the text and numbers on the screen. Because this was GUI-level automation, the script had to do the clicking and typing itself, so she unfortunately couldn't use her computer while it ran; I had her run it overnight. At least in the end, as she once joked, she was practically "working in her sleep."
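For the curious, here is a minimal sketch of what one "look at the screen and read a number" step might look like. It is not the original script: it assumes the `pyautogui` and `pytesseract` packages (plus an installed Tesseract binary), and the screen region and helper names are hypothetical placeholders.

```python
import re
from typing import Optional, Tuple

# Hypothetical screen region (left, top, width, height) where one measurement is shown.
MEASUREMENT_REGION: Tuple[int, int, int, int] = (400, 300, 200, 40)


def read_region_text(region: Tuple[int, int, int, int]) -> str:
    """Screenshot one region of the screen and run OCR on it."""
    # Imported lazily so the parsing helper below works without a display.
    import pyautogui
    import pytesseract

    image = pyautogui.screenshot(region=region)
    # "--psm 7" asks Tesseract to treat the image as a single line of text.
    return pytesseract.image_to_string(image, config="--psm 7")


def parse_measurement(ocr_text: str) -> Optional[float]:
    """Return the number in the OCR text, or None unless exactly one number is found."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", ocr_text)
    if len(matches) != 1:
        return None  # nothing readable, or something ambiguous
    return float(matches[0])
```

Calling `parse_measurement(read_region_text(MEASUREMENT_REGION))` would then give either a number or `None`. Refusing to guess when the text is ambiguous is deliberate: a skipped chart can be redone by hand, while a wrong number in the spreadsheet may never be noticed.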
The script ran at a rate of about one patient every 20 seconds, so within just a few nights of running, it was all done. It had about a 90% overall success rate (900/1,000 charts): whenever the script was not fully confident in what it was reading, it flagged the patient's chart and moved on rather than record potentially wrong measurements. In addition to the extensive testing I did while developing the script, we randomly sampled the "successful" charts (charts for which it did record measurements) and found 0 incorrect readings.
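The "flag it and move on" rule can be sketched roughly as follows. This is an illustration under assumptions, not the original code: it supposes per-word confidence scores on a 0-100 scale, like the `text` and `conf` fields that `pytesseract.image_to_data` reports, and the threshold and function names are made up for the example.

```python
from typing import Dict, List, Optional

MIN_CONFIDENCE = 90.0  # hypothetical cutoff; the real script's criterion is not stated


def accept_reading(words: List[str], confidences: List[float],
                   min_confidence: float = MIN_CONFIDENCE) -> Optional[str]:
    """Return the recognized text only if every non-empty word is confident enough."""
    # Tesseract emits empty entries for layout boxes; ignore those.
    kept = [(w, c) for w, c in zip(words, confidences) if w.strip()]
    if not kept:
        return None  # nothing was read at all
    if any(c < min_confidence for _, c in kept):
        return None  # at least one word is doubtful
    return " ".join(w for w, _ in kept)


def process_chart(patient_id: str, words: List[str], confidences: List[float],
                  results: Dict[str, str], flagged: List[str]) -> None:
    """Record a confident reading, or flag the chart for manual review."""
    reading = accept_reading(words, confidences)
    if reading is None:
        flagged.append(patient_id)
    else:
        results[patient_id] = reading
```

The trade-off is the one described above: a strict cutoff sends more charts to the manual pile (about 10% here), in exchange for much more trust in the ones that were recorded.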
Thank you very, very much to the following people for their help and support on this project: