Mar 16

Configuring OCR in Alfresco

OCR (Optical Character Recognition) is the recognition of printed or written text characters by a computer. It extracts the characters from images or scanned documents, which makes images that contain text searchable. OCR is a very useful feature for any ECM product. In this blog, we will see how to configure it in Alfresco Community Edition using Tesseract. We have tested this with Alfresco versions 5.1.f and 5.2.e; it should also work with other nearby versions.



Prerequisites:

  1. Alfresco Community / Enterprise Edition installed and running
  2. Basic knowledge of Alfresco administration

Steps to Configure Tesseract:

1. Download and install Tesseract (on Debian/Ubuntu):

apt-get install tesseract-ocr
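
After installation it can help to confirm that the binary is on the PATH and that the English language data used later (-l eng) is present. The following is only a minimal sketch: the helper name has_lang is hypothetical, and it assumes that tesseract --list-langs prints one language code per line (some older versions print the list to stderr, hence the 2>&1).

```shell
#!/bin/bash
# has_lang: hypothetical helper; reads a language listing on stdin and
# succeeds if the given language code appears on a line by itself.
has_lang() {
  grep -qx -- "$1"
}

# Only query tesseract if it is actually installed and on the PATH.
if command -v tesseract >/dev/null 2>&1; then
  tesseract --version 2>&1 | head -n 1
  if tesseract --list-langs 2>&1 | has_lang eng; then
    echo "eng language data found"
  else
    echo "eng language data missing"
  fi
else
  echo "tesseract not found on PATH"
fi
```

If the language data is reported as missing, the OCR scripts in step 4 would likely fail, because both of them pass -l eng.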

2. Stop the Alfresco Tomcat server

./ stop tomcat

3. Download the Linux/Windows context file and place it at



4. Place ocr.bat (Windows) or the Linux shell script at <ALFRESCO-HOME>/

a) ocr.bat (for Windows)

REM create the temp directory if it does not exist
if not exist C:\tmp mkdir C:\tmp
REM to see what happens
echo from %1 to %2 >> C:\tmp\ocrtransform.log
copy /Y %1 "C:\tmp\%~n1%~x1"
echo target %~d2%~p2%~n2
REM call tesseract; it writes the text to the target path (%2 without its extension)
"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe" "C:\tmp\%~n1%~x1" "%~d2%~p2%~n2" -l eng

b) Shell script (for Linux)

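#!/bin/bash
# save arguments to variables
SOURCE="$1"
TARGET="$2"
# The next two values are assumptions; the original assignments are missing from this post
TMPDIR=/tmp/tesseract
OCRFILE="$(basename "$SOURCE")"
# Create temp directory if it doesn't exist
sudo mkdir -p "$TMPDIR"
# to see what happens
#echo "from $SOURCE to $TARGET" >>/tmp/ocrtransform.log
# copy the source image to the temp directory (mirrors the copy step in ocr.bat)
sudo cp -f "$SOURCE" "$TMPDIR/$OCRFILE"
# call tesseract and write the output to $TARGET (without its extension)
sudo /usr/local/bin/tesseract "$TMPDIR/$OCRFILE" "${TARGET%.*}" -l eng
#sudo tesseract "$TMPDIR/$OCRFILE" "${TARGET%.*}" -l eng
sudo rm -f "$TMPDIR/$OCRFILE"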
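
The suffix-stripping expansion applied to $TARGET in the script matters because tesseract appends .txt to the output base name it is given. Below is a small sketch of what that expansion does; the helper strip_ext and the sample path are hypothetical and only serve as illustration.

```shell
#!/bin/bash
# strip_ext: hypothetical helper showing the same kind of expansion as the script.
# "%.*" removes the shortest trailing match of ".<anything>", i.e. the last extension.
strip_ext() {
  local target="$1"
  printf '%s\n' "${target%.*}"
}

# Hypothetical target path of the kind passed as the second argument
strip_ext "/tmp/alfresco/transform_123.txt"   # prints /tmp/alfresco/transform_123
```

Note that the expansion only looks for the last dot, so a target without an extension inside a directory whose name contains a dot would be cut in the wrong place.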

Note: Make sure that the path to the tesseract command is correct in your script file.

Linux: /usr/local/bin or /usr/bin

Windows: C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
or C:\Program Files\Tesseract-OCR\tesseract.exe
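
On Linux, one way to find out which of the two locations applies is to test each candidate path in turn. This is a sketch only; the helper name first_executable is hypothetical and is not part of Tesseract or Alfresco.

```shell
#!/bin/bash
# first_executable: hypothetical helper; prints the first argument that is an
# executable regular file and returns non-zero if none of them is.
first_executable() {
  local candidate
  for candidate in "$@"; do
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Check the two Linux locations mentioned above.
first_executable /usr/local/bin/tesseract /usr/bin/tesseract \
  || echo "tesseract not found in /usr/local/bin or /usr/bin"
```

Whatever path is printed is the one the Linux script should call.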

5. If the current user does not have read and execute permissions on the script, grant them.

chmod +rx /opt/<ALFRESCO-HOME>/
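
To see whether the chmod is needed at all, the permissions can be tested first. The sketch below uses a hypothetical helper, can_run_script, and demonstrates it on a throwaway temporary file rather than a real Alfresco path. It checks permissions for whichever user runs it, so it is most meaningful when run as the user that starts Alfresco.

```shell
#!/bin/bash
# can_run_script: hypothetical helper; succeeds only if the file exists
# and the current user may both read and execute it.
can_run_script() {
  [ -f "$1" ] && [ -r "$1" ] && [ -x "$1" ]
}

# Demonstration on a throwaway file rather than a real Alfresco path.
demo="$(mktemp)"
chmod 600 "$demo"
can_run_script "$demo" && echo "runnable before chmod" || echo "not runnable before chmod"
chmod +rx "$demo"
can_run_script "$demo" && echo "runnable after chmod" || echo "not runnable after chmod"
rm -f "$demo"
```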

6. Add the following properties to the file located at






7. Start the Tomcat server

./ start tomcat


Windows: run C:\<ALFRESCO-HOME>\tomcat\bin\startup.bat and press Enter,
or use manager-windows.exe.

Note: Existing files in Alfresco will not be OCRed; upload new image files to test.


If your content is not searchable, check the following:

  1. Make sure you are passing the correct arguments in the context file (the entries differ between Windows and Linux).
  2. Check that your .bat or .sh script works when run on its own.
  3. Verify that tesseract creates a text file for an image file. To do so, go to the directory where tesseract is installed and run the following command:

     tesseract ./<image file-name> ./<text file-name> -l eng

If the text file is created with content in it, your Tesseract installation is working.
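
The same check can be scripted. This is a sketch under stated assumptions: the helper ocr_produced_text and the file names sample.png and sample-out are placeholders, and it relies on tesseract appending .txt to the output base name.

```shell
#!/bin/bash
# ocr_produced_text: hypothetical helper; succeeds if the text file for the given
# output base exists and is non-empty. tesseract appends ".txt" to the base name.
ocr_produced_text() {
  [ -s "$1.txt" ]
}

# Usage sketch: sample.png and the output base "sample-out" are placeholder names.
if command -v tesseract >/dev/null 2>&1 && [ -f ./sample.png ]; then
  tesseract ./sample.png ./sample-out -l eng
  if ocr_produced_text ./sample-out; then
    echo "tesseract produced text"
  else
    echo "output is missing or empty"
  fi
fi
```

An empty text file usually points at the image or at the Tesseract installation rather than at the Alfresco configuration.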

Comment here if your content is still not searchable. We are happy to hear about your ECM challenges, as we love solving them. Contact us!

Kintu Barot

About The Author

Kintu is an Alfresco Certified Engineer (ACE501). Apart from shaping Alfresco projects to the requirements of clients, he keeps himself busy coordinating the Alfresco training program for new developers at ContCentric.


  1. Christa
    May 22, 2018 at 3:26 pm


  2. Christa
    May 22, 2018 at 3:28 pm

    How exactly did you integrate Tesseract into Alfresco?

  3. yusen
    July 9, 2018 at 1:18 am

    Can I configure it to add support for PDFs with images?

  4. Vikas
    October 2, 2018 at 7:12 am

    I followed your steps, but I am still not able to read image PDF files. What are the source and target that are mentioned? Do we need to mention a folder name?

    • ContCentric
      October 29, 2018 at 5:15 am

      Hello Vikas, we hope you were able to integrate it properly after we guided you on the call. Thanks!

  5. Madhushani
    March 6, 2019 at 1:12 pm

    Hi ContCentric,
    I followed these steps but still can’t read an image PDF. What should I do?

    • ContCentric
      March 8, 2019 at 4:44 am

      I am contacting you on your email to help you fix this. Thanks.

  6. Mo
    March 26, 2019 at 2:39 am

    I have already configured it with Alfresco, but when I upload an image to Alfresco, the resulting text file is blank.

    • ContCentric
      July 18, 2019 at 1:19 pm

      You can try running the following command manually. If it creates a text file, the integration should work. Otherwise there may be a problem with the tesseract installation or version.

      tesseract image-file-name text-file-name -l eng
