Tesseract OCR GUI fix + Romanian language patch


On Fedora, first make sure Tesseract OCR and the Romanian language pack are installed:

sudo yum install tesseract tesseract-langpack-ron

Download and extract tesseract-gui from SourceForge (version 2.8 was the latest at the time of writing, and the patch below applies to that version):

http://sourceforge.net/projects/tesseract-gui/files/tesseract-gui/tesseract-gui-2/

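Here is romanian.patch. It adds Romanian to the language list, detects installed languages by their .traineddata files instead of .unicharset files, and looks for the language data in /usr/share/tesseract/tessdata (the path used on Fedora) instead of /usr/share/tessdata: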

--- tesseract-gui.py    2012-03-20 14:20:37.000000000 +0200
+++ tesseract-gui.py.ro    2012-11-29 19:55:19.721193111 +0200
@@ -1321,6 +1321,7 @@
                     self.LCdutch:"nld",\
                     self.LCgerman:"deu",\
                     self.LCfrench:"fra",\
+                    self.LCromanian:"ron",\
                     self.LCenglish:"eng"}
         self.ListLanguages={}
         nnfin=len(Languages)
@@ -1329,7 +1330,7 @@
         for lang in Languages.keys():
             #print Languages[lang]
             #print os.path.join(TESSDATA_PATH, Languages[lang] + ".lm")
-            if os.path.isfile(os.path.join(TESSDATA_PATH, Languages[lang] + ".cube.lm")) or os.path.isfile(os.path.join(TESSDATA_PATH, Languages[lang] + ".unicharset")):
+            if os.path.isfile(os.path.join(TESSDATA_PATH, Languages[lang] + ".cube.lm")) or os.path.isfile(os.path.join(TESSDATA_PATH, Languages[lang] + ".traineddata")):

                 self.ListLanguages[lang] = Languages[lang]
                 self.cmbLang.append_text(lang)
@@ -1508,6 +1509,7 @@
             self.LCfrench = "Francés"
             self.LCitalian = "Italiano"
             self.LCportuguese = "Portugués"
+            self.LCromanian = "Romanian"
             #--- tips spanish
             self.LCtooltips_tip = "Mostrar o esconder mensajes de ayuda"
             self.LCselimg_tip = "Seleccionar las imágenes a convertir en texto"
@@ -1590,6 +1592,7 @@
             self.LCfrench = "Français"
             self.LCitalian = "Italien"
             self.LCportuguese = "Portugais"
+            self.LCromanian = "Romanian"
             #--- tips french
             self.LCtooltips_tip = "Montrez ou masquer messages d'aide"
             self.LCselimg_tip = "Sélectionner des fichiers d'images à changer en texte"
@@ -1672,6 +1675,7 @@
             self.LCfrench = "Francese"
             self.LCitalian = "Italiano"
             self.LCportuguese = "Portoghese"
+            self.LCromanian = "Romanian"
             #--- tips italian
             self.LCtooltips_tip = "Mostrare o nascondere i messaggi di aiuto"
             self.LCselimg_tip = "Selezionare le immagini da convertire in testo"
@@ -1757,6 +1761,7 @@
             self.LCfrench = "French"
             self.LCitalian = "Italian"
             self.LCportuguese = "Portuguese"
+            self.LCromanian = "Romanian"
             #--- tips english
             self.LCtooltips_tip = "Show or hide tooltips"
             self.LCselimg_tip = "Selecting images to convert to text"
@@ -1922,7 +1927,7 @@
     #figure out where '/usr/share' is relative to the direcotry where we are located
     SHARE_PATH = os.path.join(os.path.split(os.path.abspath(sys.argv[0]))[0], "..", "share")
     TESSDATA_PATH = "."
-    for path in ("/usr/share/tessdata",
+    for path in ("/usr/share/tesseract/tessdata",
                  "/usr/local/share/tessdata",
                  os.path.join(SHARE_PATH, "tesseract-ocr", "tessdata")):
         if os.path.isdir(path):

Save the patch as romanian.patch in the tesseract-gui-2.8/bin directory and apply it from there:

patch -p0 < romanian.patch

Then install tesseract-gui from the tesseract-gui-2.8 directory with:

make install

Romanian should now be available in the tesseract-gui language selector.
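
If Romanian does not show up in the selector, it may help to reproduce the check the patched script performs. The following is only a rough sketch in Python 3 (tesseract-gui itself is an older Python 2 script), not code taken from tesseract-gui. The helper names find_tessdata and has_language are made up for illustration, the SHARE_PATH-relative directory from the script is left out, and it assumes the first existing directory in the list is the one used, which the patch excerpt does not show.

```python
import os

# Directories probed by the patched tesseract-gui.py, in the order they
# appear in the patch. The SHARE_PATH-relative entry is omitted here.
CANDIDATE_DIRS = (
    "/usr/share/tesseract/tessdata",
    "/usr/local/share/tessdata",
)


def find_tessdata(candidates=CANDIDATE_DIRS):
    """Return the first candidate that is an existing directory, else None.

    Assumption: the first match wins; the patch excerpt does not show
    what the script does once it finds a directory.
    """
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def has_language(tessdata_path, code):
    """Mirror the patched test: <code>.cube.lm or <code>.traineddata exists."""
    return any(
        os.path.isfile(os.path.join(tessdata_path, code + suffix))
        for suffix in (".cube.lm", ".traineddata")
    )


if __name__ == "__main__":
    tessdata = find_tessdata()
    if tessdata is None:
        print("no tessdata directory found in the candidate list")
    else:
        print(tessdata, "ron:", has_language(tessdata, "ron"))
```

If the script reports that no directory was found, or that "ron" is missing, the language pack is probably not installed or its data lives somewhere other than the paths listed in the patch.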
