Retinal Scan Data Automation

Oct. 2021

Project Summary

One of my friends who works in a lab was given a list of ≈1,000 patients. For each patient, she was supposed to navigate through the software that stores the scan data and copy the measurements into an Excel spreadsheet. Basically, her job was just to click through and copy. Click, type a bit, click a bit more, copy over some numbers, and repeat a thousand times.

I found out that she had been assigned this task when I visited her one day, and I told her that a computer could easily do this – there was no reason for her to waste so many hours of her time serving as a robot, clicking through some software and copying down a few numbers. I told her to stop doing it manually, because I was determined to figure out a way to make the computer do it itself – to automate it.

So, as promised, I created a Python script that interfaced with the GUI of the database software, using the Tesseract OCR Engine to read the text and numbers on the screen. Because the script was GUI-level automation, meaning it had to click and type on the live screen, she unfortunately couldn't use her computer while it ran, so I had her run it overnight. At least in the end, as she once joked, she was practically "working in her sleep."
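I won't reproduce the full script here, but the following is a minimal sketch of the general approach, assuming pyautogui for the simulated clicking/typing and pytesseract as the wrapper around Tesseract. The screen coordinates, region, and patient IDs are hypothetical placeholders, not values from the real database software.

```python
# Minimal sketch of the GUI-automation loop (coordinates and IDs are hypothetical).
import time

import pyautogui    # simulates mouse clicks and keystrokes
import pytesseract  # Python wrapper around the Tesseract OCR engine

SEARCH_BOX = (120, 85)                # hypothetical position of the patient-ID field
RESULT_REGION = (400, 300, 350, 120)  # hypothetical (left, top, width, height) of the measurements panel

def read_measurements(patient_id: str) -> str:
    """Look up one patient in the GUI and OCR the measurements panel."""
    pyautogui.click(*SEARCH_BOX)                # focus the search field
    pyautogui.write(patient_id, interval=0.05)  # type the ID like a human would
    pyautogui.press("enter")
    time.sleep(2)                               # wait for the chart to load

    panel = pyautogui.screenshot(region=RESULT_REGION)
    return pytesseract.image_to_string(panel)

if __name__ == "__main__":
    for pid in ["000123", "000124"]:            # placeholder IDs
        print(pid, read_measurements(pid))
```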

The script ran at a rate of about one patient every 20 seconds, so it was all done within just a few nights of running. It had about a 90% overall success rate (900/1,000 charts): if the script was not fully confident in what it was reading, it would flag the patient's chart and move on rather than record potentially wrong measurements. On top of extensive testing on my part while developing the script, we also randomly sampled the "successful" charts (charts for which it did record measurements) and found 0 incorrect readings.
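The write-up above doesn't spell out how "not fully confident" was decided; one plausible way to implement it with pytesseract is to look at the per-word confidence values that Tesseract reports and flag the chart if any of them fall below a threshold. The cutoff below is an assumption for illustration.

```python
# Hypothetical confidence check: flag the chart instead of recording uncertain reads.
import pytesseract
from pytesseract import Output

CONF_THRESHOLD = 80  # assumed cutoff; Tesseract reports per-word confidence on a 0-100 scale

def ocr_with_confidence(image):
    """Return (text, ok); ok is False if any recognized word is below the threshold."""
    data = pytesseract.image_to_data(image, output_type=Output.DICT)
    words, confidences = [], []
    for word, conf in zip(data["text"], data["conf"]):
        if word.strip():                          # skip empty tokens
            words.append(word)
            confidences.append(float(conf))
    ok = bool(confidences) and min(confidences) >= CONF_THRESHOLD
    return " ".join(words), ok
```

A chart whose `ok` comes back False would simply be logged and skipped, which matches the "flag and move on" behavior described above.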

Skills Learned & Used

  • Python
    • This project was pretty much all done in Python 3.
  • GUI Automation/Scripting
    • While I've done countless automations/scripts before, this one was new for me, as it used Optical Character Recognition (OCR) to read what was on the screen. I much prefer conventional automation and scripting (i.e., using APIs), but it was really nice to know that when no API is available, GUI scripting – simulating the actions of a human user – is still a workable fallback.
  • Optical Character Recognition
    • Before this project, I had never used the Tesseract OCR engine. While it is extremely powerful, it's also extremely sensitive to even slight changes in its input, and the technology still has a long way to go.
  • Image Filtering & Processing
    • In order to make parts of the screen readable by Tesseract, I had to devise my own image-processing algorithms to get the crops extremely clean. This involved comparing R, G, and B values on a per-pixel basis, among other methods (see the first sketch after this list). I pretty much had to throw everything at the wall here, in terms of ideas, to see what would stick!
  • Computer Vision
    • I used OpenCV in this project to reliably crop the screenshot before reading text. For example, the location of the measurements and the white circle overlaying the retinal scan depended on where the patient's eye fell at the time of image capture – i.e., it was pretty much random. I thus first used the Circle Hough Transform to detect the circle, then used its center point to extrapolate the locations of the measurements for processing & reading (see the second sketch after this list).
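The original cleaning code isn't reproduced here, but the sketch below shows the kind of per-pixel R/G/B comparison described in the Image Filtering & Processing item, under the assumption that the measurement text is roughly white/grey on a darker, partly coloured background. The brightness and channel-spread thresholds are assumptions.

```python
# Hypothetical per-pixel cleanup: keep near-white pixels (text), blank out everything else.
import numpy as np
from PIL import Image

def clean_for_ocr(img: Image.Image, min_brightness=180, max_channel_spread=30) -> Image.Image:
    """Turn a noisy screenshot crop into black text on a white background for Tesseract."""
    rgb = np.asarray(img.convert("RGB"), dtype=np.int16)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    bright = (r > min_brightness) & (g > min_brightness) & (b > min_brightness)
    # Near-grey pixels have similar R, G, and B values; coloured overlays do not.
    grey = (rgb.max(axis=-1) - rgb.min(axis=-1)) < max_channel_spread

    text_mask = bright & grey
    out = np.where(text_mask, 0, 255).astype(np.uint8)   # text black, background white
    return Image.fromarray(out, mode="L")
```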
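Similarly, here is a sketch of the Circle Hough Transform step using OpenCV's HoughCircles. The radius bounds and the offsets from the circle's center to the measurement panel are hypothetical stand-ins for the real layout of the scan viewer.

```python
# Hypothetical circle detection: find the white overlay circle, then crop relative to it.
import cv2
import numpy as np

def find_scan_circle(screenshot_bgr: np.ndarray):
    """Return (x, y, r) of the strongest detected circle, or None if nothing is found."""
    gray = cv2.cvtColor(screenshot_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)    # suppress noise before the Hough transform
    circles = cv2.HoughCircles(
        gray, cv2.HOUGH_GRADIENT, dp=1, minDist=200,
        param1=100, param2=40, minRadius=80, maxRadius=200,   # assumed size of the overlay
    )
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)
    return x, y, r

def crop_measurements(screenshot_bgr: np.ndarray, cx: int, cy: int) -> np.ndarray:
    """Crop the measurement area at an assumed fixed offset from the circle's center."""
    left, top = cx + 150, cy - 60     # hypothetical offsets
    return screenshot_bgr[top:top + 120, left:left + 300]
```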

Acknowledgements


Thank you very, very much to the following people for their help and support on this project:

  • My friend

    Thank you very much to her for trusting me to finish this project. Even before I had started it, I told her to stop working on it (so as not to waste any more of her time). While it took a few weeks for me to code, debug, and test it, I'm glad that it worked out in the end!