Update (2015-09-08):
A pull request I submitted to Homebrew to add a --with-opencl option to the tesseract formula has now been accepted, so you should be able to just do brew install --HEAD --with-opencl tesseract. For issues with OpenCL-enabled Tesseract on OS X, please see this issue.
After coming across these instructions for building Tesseract with OpenCL support, I wanted to experiment with this feature to see if it would enable faster OCR processing. I also came across this blog post experimenting with the feature under Linux and Windows, but I wanted to try it on Mac OS X and AWS EC2 GPU instances.
Using Mac OS X with Homebrew
Here I built off my existing work modifying the Tesseract Homebrew formula to install the Tesseract training tools.
The only gotcha (as I serendipitously found out) is that there appears to be a bug in the OpenCL build under OS X that will cause it to fail if you don’t have a /opt/local directory for it to include. I didn’t feel like fixing this, but you can work around it by running sudo mkdir -p /opt/local before installing with the command:
brew install --training-tools --all-languages --opencl --HEAD https://github.com/ryanfb/homebrew/raw/tesseract_training/Library/Formula/tesseract.rb
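The workaround and the install can be wrapped together as a small script. This is only a sketch: the function names needs_workaround and install_tesseract_opencl and the WORKAROUND_DIR variable are made up for illustration, and the install function is left uncalled so nothing happens until you opt in.

```shell
#!/bin/sh
# Directory the OpenCL build appears to require on OS X (see the gotcha above).
# WORKAROUND_DIR is a hypothetical variable name, not something Homebrew reads.
WORKAROUND_DIR="${WORKAROUND_DIR:-/opt/local}"

# Succeeds (exit status 0) when the given directory is missing.
needs_workaround() {
  [ ! -d "$1" ]
}

# Create the directory only if it is missing, then run the install command
# quoted in the post. Requires sudo and Homebrew.
install_tesseract_opencl() {
  if needs_workaround "$WORKAROUND_DIR"; then
    sudo mkdir -p "$WORKAROUND_DIR"
  fi
  brew install --training-tools --all-languages --opencl --HEAD \
    https://github.com/ryanfb/homebrew/raw/tesseract_training/Library/Formula/tesseract.rb
}

# Uncomment to actually run the install:
# install_tesseract_opencl
```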
If all went well, you should now have an OpenCL-enabled build of Tesseract.
Using an AWS GPU-Enabled Docker Host
For this I built off my existing work using Docker for VisualSFM under AWS. I’ve published the Docker build for this on Docker Hub as ryanfb/tesseract-opencl. For clarity, I’ll repeat the instructions for using this on EC2 here:
- Launch an EC2 instance with instance type g2.2xlarge, community AMI ami-2cbf3e44, and 20+ GB of storage
- Connect to your EC2 instance
- Install docker inside your EC2 instance:
    sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
    sudo sh -c "echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
    sudo apt-get update
    sudo apt-get install lxc-docker
- Run a GPU-enabled Tesseract docker image:
  - Build the CUDA samples and run deviceQuery inside your Docker host (this seems to be necessary to init the nvidia devices in /dev):
      cd ~/nvidia_installers
      sudo ./cuda-samples-linux-6.5.14-18745345.run -noprompt -cudaprefix=/usr/local/cuda-6.5/
      cd /usr/local/cuda/samples/1_Utilities/deviceQuery
      sudo make
      ./deviceQuery
  - Find your nvidia devices with:
      ls -la /dev | grep nvidia
  - Set these as --device arguments in a variable you’ll pass to the docker run command:
      export DOCKER_NVIDIA_DEVICES="--device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm"
      sudo docker run -ti $DOCKER_NVIDIA_DEVICES ryanfb/tesseract-opencl /bin/bash
  - Follow the instructions here for more explanation and to verify CUDA access inside the container
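If your instance exposes a different set of nvidia devices, the --device arguments can be generated rather than typed by hand. The helper below is a sketch, and build_device_args is a hypothetical name. It maps each device path you give it to the same path inside the container, which is what the hand-written variable in the steps does.

```shell
#!/bin/sh
# Print one "--device PATH:PATH" pair per argument, space-separated.
build_device_args() {
  args=""
  for dev in "$@"; do
    args="$args --device $dev:$dev"
  done
  # Drop the single leading space left by the loop.
  printf '%s\n' "${args# }"
}

# On the Docker host, usage might look like this (left commented out here).
# Note that if nothing matches /dev/nvidia*, the shell passes the pattern
# through literally, so check the ls output first.
#   DOCKER_NVIDIA_DEVICES="$(build_device_args /dev/nvidia*)"
#   sudo docker run -ti $DOCKER_NVIDIA_DEVICES ryanfb/tesseract-opencl /bin/bash
```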
Results
With OpenCL support enabled, the first run of tesseract performs some automatic device detection and profiling, and saves the results to various .bin files and a tesseract_opencl_profile_devices.dat file in the current working directory, which it re-uses on subsequent runs.
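Since those files land in whatever directory you happened to run tesseract from, it can be handy to see which ones are present. A minimal sketch follows, with list_opencl_cache as a hypothetical helper name. It matches any .bin file in the directory, so it may also list unrelated files that share the extension.

```shell
#!/bin/sh
# List the OpenCL profiling files in a directory (default: current directory).
# The .dat filename is the one reported above; "*.bin" is a broad match and
# is not limited to files written by tesseract.
list_opencl_cache() {
  dir="${1:-.}"
  for f in "$dir"/tesseract_opencl_profile_devices.dat "$dir"/*.bin; do
    # Skip unmatched patterns and missing files.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
  return 0
}

# Example: list_opencl_cache .
```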
Here’s the diagnostic information for the three machines I tested with:
Here, (null) is the non-OpenCL Tesseract implementation (i.e. what you get if you build without OpenCL). You can see that on OS X, the OpenCL implementation also detects/reports the CPU as an available device for OpenCL. “Score” is the result of the timing profile, so higher values are worse. I’m not sure if the profiling/timing is correct on OS X or if the OpenCL implementation is simply always outperformed by the general implementation, but on both sets of hardware here the general implementation is what gets selected.
The AWS EC2 g2.2xlarge results appeared promising, but in practice (testing my OCR process against this 527-page volume) I didn’t notice a giant speed improvement over running it on my iMac (about 20 vs. 30 minutes).
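To put those timings in per-page terms, here is a quick back-of-the-envelope calculation. It assumes the 20-minute figure is the EC2 run and the 30-minute figure is the iMac, and both are only approximate; per_page_seconds is a hypothetical helper name.

```shell
#!/bin/sh
# Seconds per page for a run of MINUTES over PAGES, to two decimal places.
per_page_seconds() {
  awk -v minutes="$1" -v pages="$2" 'BEGIN { printf "%.2f\n", minutes * 60 / pages }'
}

pages=527
echo "EC2 (assumed 20 min):  $(per_page_seconds 20 "$pages") s/page"
echo "iMac (assumed 30 min): $(per_page_seconds 30 "$pages") s/page"
# Ratio of the two wall-clock times.
awk 'BEGIN { printf "ratio: %.1fx\n", 30 / 20 }'
```

That works out to roughly 2.3 versus 3.4 seconds per page, or about a 1.5x difference.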
So, I think I’ll be sticking to building Tesseract without OpenCL for now. I think there are still great parallelization improvements that could be made in Tesseract, especially in the training process, but the current OpenCL implementation doesn’t appear to have completely solved that problem.