Update (2015-09-08): A pull request I submitted to Homebrew to add a --with-opencl option to the tesseract formula has now been accepted, so you should be able to just do brew install --HEAD --with-opencl tesseract. For issues with OpenCL-enabled Tesseract on OS X, please see this issue.
After coming across these instructions for building Tesseract with OpenCL support, I wanted to experiment with this feature to see if it would enable faster OCR processing. I also came across this blog post experimenting with the feature under Linux and Windows, but I wanted to try it on Mac OS X and AWS EC2 GPU instances.
Using Mac OS X with Homebrew
Here I built off my existing work modifying the Tesseract Homebrew formula to install the Tesseract training tools.
The only gotcha (as I serendipitously found out) is that there appears to be a bug in the OpenCL build under OS X that will cause it to fail if you don't have a /opt/local directory for it to include. As I didn't feel like fixing this, you can simply work around it by running sudo mkdir -p /opt/local before installing with the command:
brew install --training-tools --all-languages --opencl --HEAD https://github.com/ryanfb/homebrew/raw/tesseract_training/Library/Formula/tesseract.rb
If all went well, you should now have an OpenCL-enabled build of Tesseract.
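To double-check that the resulting build actually has OpenCL enabled, one quick test (a minimal sketch; test.png is a hypothetical placeholder for any image in your current directory) is to run tesseract once and look for the OpenCL profiling files it writes on first run, described under Results below:
tesseract test.png output  # an OpenCL-enabled build detects and profiles devices on its first run
ls tesseract_opencl_profile_devices.dat *.bin  # these profiling files should only appear in an OpenCL-enabled build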
Using an AWS GPU-Enabled Docker Host
For this I built off my existing work using Docker for VisualSFM under AWS. I've published the Docker build for this on Docker Hub as ryanfb/tesseract-opencl. For clarity, I'll repeat the instructions for using this on EC2 here:
- Launch an EC2 instance with instance type g2.2xlarge, community AMI ami-2cbf3e44, and 20+ GB of storage
- Connect to your EC2 instance
- Install docker inside your EC2 instance:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
sudo sh -c "echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
sudo apt-get update
sudo apt-get install lxc-docker
- Run a GPU-enabled Tesseract docker image:
  - Build the CUDA samples and run deviceQuery inside your Docker host (this seems to be necessary to init the nvidia devices in /dev):
cd ~/nvidia_installers
sudo ./cuda-samples-linux-6.5.14-18745345.run -noprompt -cudaprefix=/usr/local/cuda-6.5/
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
  - Find your nvidia devices with:
ls -la /dev | grep nvidia
  - Set these as --device arguments in a variable you'll pass to the docker run command:
export DOCKER_NVIDIA_DEVICES="--device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm"
sudo docker run -ti $DOCKER_NVIDIA_DEVICES ryanfb/tesseract-opencl /bin/bash
  - Follow the instructions here for more explanation and to verify CUDA access inside the container (a quick sanity check is also sketched after this list)
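Once you're in a shell inside the container, a quick sanity check (a minimal sketch; sample.png is a hypothetical stand-in for any image you want to OCR) is to confirm the nvidia devices were passed through, then run tesseract once, which should trigger the OpenCL device detection and profiling described below:
ls -la /dev | grep nvidia  # the --device mappings above should make these visible inside the container
tesseract sample.png output  # basic CLI usage (tesseract <image> <output base>); first run profiles OpenCL devices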
Results
With OpenCL support enabled, the first run of tesseract will perform some automatic device detection and profiling, and save the results to various .bin files and a tesseract_opencl_profile_devices.dat file in the current working directory, which it will re-use on subsequent runs.
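Because these profiling results are cached, you can delete the cached files to force re-profiling, e.g. after a driver or hardware change. A minimal sketch, assuming you run tesseract from the same working directory each time (sample.png is again a placeholder):
rm -f tesseract_opencl_profile_devices.dat *.bin  # careful: removes all .bin files in the directory, not just Tesseract's
tesseract sample.png output  # the next run re-runs device detection and profiling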
Here’s the diagnostic information for the three machines I tested with:
Here, (null) is the non-OpenCL Tesseract implementation (i.e. what you get if you build without OpenCL). You can see that on OS X, the OpenCL implementation also detects/reports the CPU as an available device for OpenCL. "Score" is the result of the timing profile, so higher values are worse. I'm not sure whether the profiling/timing is incorrect on OS X or the OpenCL implementation is simply always outperformed by the general implementation, but we can see that on both sets of hardware here, it's the non-OpenCL implementation that gets selected.
The AWS EC2 g2.2xlarge results appeared promising, but in practice (testing my OCR process against this 527-page volume) I didn't notice a giant speed improvement over running it on my iMac (about 20 minutes on the EC2 instance vs. 30 minutes on the iMac).
So, I think I'll be sticking with building Tesseract without OpenCL for now. There are still great parallelization improvements that could be made in Tesseract, especially in the training process, but the current OpenCL implementation doesn't appear to have completely solved that problem.