  1. Which image file types are supported?

    1. Text can be extracted from the following file types:

      1. png
      2. jpg
      3. jpeg
      4. bmp
      5. pdf (note that the PDF file should contain only images)

  2. Which languages are supported?

    1. The following languages are currently supported:

      1. English
      2. Danish
      3. Dutch
      4. French
      5. German
      6. Italian
      7. Polish
      8. Portuguese
      9. Russian
      10. Spanish
      11. Swedish

  3. When are the image files indexed for searching?

    1. When a new attachment is added in Jira, it is queued for processing. The ExtracText background job runs every minute and processes all queued items. Attachments that existed before ExtracText was installed are not processed unless an admin adds them to the queue. For more details visit here

  4. How do I index old image files that were added before ExtracText was installed?

    1. An admin can specify any JQL query, and the ExtracText background job then processes the attachments of the matching issues. For more details visit here

  5. How do I force text extraction to rerun for an image?

    1. By default, text extraction happens only once for a given attachment and ExtracText caches the result. To force the OCR to run again, choose the "Force Run" option when specifying the JQL in the admin configuration. For more details visit here

  6. How do I view the whole text extracted from an image?

    1. The whole text extracted from an image is added to the search index as an issue custom field named "ExtracText". To view it, go to Issues > Search for images and, while the image is shown as a thumbnail, hover over it and click the "ExtracText" button. A dialog opens showing the whole text extracted from that image, which you can copy in one click.

  7. How do I search for text extracted from images?

    The text is added to the Jira search index as a custom field called "ExtracText" on every issue, so you can find attachments with a global search or a JQL search. For example, the JQL ExtracText ~ gmail shows all image attachments from issues whose extracted text contains the string "gmail". Running this JQL from the "Issues > Search for images" menu option displays the images as thumbnails. To also filter out images in those issues that do not contain the text, enter the text in the "Text to search in images:" field as well. If no text is entered in the "JQL" field or the "Search text" field, all images processed by ExtracText are shown.
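
The JQL above can also be assembled programmatically. The following is a minimal sketch, not part of ExtracText: the helper names `extractext_jql` and `search_url` and the host `jira.example.com` are made up for illustration, and it assumes your Jira instance exposes the standard `/rest/api/2/search` endpoint, which returns matching issues rather than thumbnails. The URL is only built, never requested.

```python
from urllib.parse import urlencode


def extractext_jql(text: str) -> str:
    """Build a JQL clause matching issues whose ExtracText field contains `text`."""
    # Escape backslashes and double quotes so the phrase stays one quoted JQL string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'ExtracText ~ "{escaped}"'


def search_url(base_url: str, jql: str) -> str:
    """Return a Jira REST search URL for the given JQL (hypothetical host, no request sent)."""
    return f"{base_url.rstrip('/')}/rest/api/2/search?{urlencode({'jql': jql})}"


jql = extractext_jql("gmail")
print(jql)
print(search_url("https://jira.example.com", jql))
```

Quoting the search text keeps multi-word phrases together, and `urlencode` takes care of the spaces and quotes when the JQL is placed in a URL.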

  8. Why can't I find an attachment even though I use the correct search string?

    Make sure the attachment has actually been processed by ExtracText: go to the "Search for images" page and enter the attachment file name. If you do not see your attachment, it has not been processed by ExtracText yet. The background job runs every minute and processes all attachments added in the last minute. If the attachment was added before ExtracText was installed, your admin has to run a job to process it. For more details visit here. If the image shows up in the search, you can verify what text was extracted from it: hover over it and click the "ExtracText" button. Alternatively, try a fuzzy search in case some characters were not identified correctly by the OCR engine. For example, to search for a phrase like "Chrome browser" you could use a JQL like ExtracText ~ "Chrome~ browser~". This returns results even when the extracted text reads "Ohrome drowser".
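
The fuzzy query above can be generated from a plain phrase. The sketch below uses a made-up helper, `fuzzy_jql`, that appends `~` to each word, and a small `edit_distance` function that illustrates why "Ohrome drowser" can still match: each misread word is one character substitution away from the intended word. It assumes the fuzzy operator tolerates small edit distances, as Lucene-style text search does; the exact threshold Jira applies is not stated here.

```python
def fuzzy_jql(phrase: str, field: str = "ExtracText") -> str:
    """Append the fuzzy operator (~) to every word of `phrase`."""
    fuzzy_words = " ".join(f"{word}~" for word in phrase.split())
    return f'{field} ~ "{fuzzy_words}"'


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: single-character edits needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(fuzzy_jql("Chrome browser"))           # ExtracText ~ "Chrome~ browser~"
print(edit_distance("Chrome", "Ohrome"))     # 1
print(edit_distance("browser", "drowser"))   # 1
```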

  9. The ExtracText app is disabled after installing it or after restarting Jira. How do I enable it?

    1. ExtracText is not enabled if the runtime free memory is less than 300MB. You will see an error message in the log that says "Free memory is xxxMB which is less than required memory (300MB) for ExtracText. Hence not proceeding with the install". This can happen on instances where Jira runs with very little memory (around 1GB, say). Try enabling the plugin again after some time, in case memory has been freed up, or increase the memory allotted to Jira.
    2. If you are installing ExtracText on Windows, make sure the Visual C++ 2015 redistributable is installed before installing ExtracText. If it is not, uninstall ExtracText, install the redistributable from here, and then install ExtracText again.
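
To confirm that low memory is the cause, you can scan the Jira log for the message quoted above. This is a rough sketch only: the regular expression is modeled on that quoted message, the assumption that "xxx" is a whole number is mine, and the sample lines are synthetic rather than taken from a real log.

```python
import re

# Pattern modeled on the quoted message; the exact number format is an assumption.
LOW_MEMORY_PATTERN = re.compile(
    r"Free memory is (\d+)\s*MB which is less than required memory "
    r"\((\d+)\s*MB\) for ExtracText"
)


def find_low_memory_errors(log_lines):
    """Return a (free_mb, required_mb) tuple for each matching log line."""
    hits = []
    for line in log_lines:
        match = LOW_MEMORY_PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


# Synthetic sample lines, for illustration only.
sample_log = [
    "Some unrelated log line",
    "Free memory is 212MB which is less than required memory (300MB) "
    "for ExtracText. Hence not proceeding with the install",
]
for free_mb, required_mb in find_low_memory_errors(sample_log):
    print(f"ExtracText needs {required_mb - free_mb} MB more free memory")
```

Because the function accepts any iterable of lines, an open file object for your Jira log can be passed in place of `sample_log`.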

  10. What can I do to improve the quality of the extracted text?

    1. The following tips help get better text extraction results:
      1. Use images with a clear background. Although ExtracText can detect text on any background color, it may fail if a single block of text (a word, sentence, or paragraph) spans different background colors.
      2. ExtracText is tuned for detecting text in machine-generated images such as screenshots or scanned documents. It is not suited to natural images, so it may fail to detect text in images captured with a camera.
      3. When saving screenshots, choose PNG as the file format, since it provides lossless compression.
      4. Avoid noise in images ("noise" means randomly distributed pixels in the background).
      5. Make sure the text is neither too small (characters less than 10 pixels in height) nor too big (characters greater than 40 pixels in width).
      6. Make sure the text is properly segmented (e.g. if text from multiple overlapping windows appears in the screenshot, results may not be accurate).

  11. Where can I download the jar file for installation?

    Visit here and click the Download link.
  12. I am unable to install ExtracText on a 32-bit Linux system. How do I fix it?

    1. If you need support installing ExtracText on a 32-bit Linux system, please contact us.

  13. Why is the text from images in my PDF document not extracted?

    1. ExtracText extracts text only if the PDF document contains nothing but images (as in scanned documents). If the document contains text or any other content, ExtracText does not process the file.

  14. The output from images in my PDF document is totally wrong. How do I fix it?

    1. If your PDF document contains rotated images, the output may be nothing but junk characters. Verify that the image is positioned correctly in the PDF. If you are scanning documents, make sure to get the direction right. Note that ExtracText can handle some degree of skew.
  15. Why do some thumbnails show a generic file icon instead of the actual image thumbnail?
    1. Jira generates thumbnails for some files, depending on the size and type of the file. If Jira generated a thumbnail, ExtracText displays it. Otherwise, it displays a generic file type icon.