Running Oracle Outside In Technology (OIT) in Docker

Oracle Outside In Technology (OIT) provides a set of tools and SDKs for converting many file formats into readable documents. It also offers data extraction and reduction capabilities. Used by many companies across the tech industry for more than three decades, OIT is a large, mature product that is likely to be around for a long time.
Here I made a first attempt at running OIT's Image Export in Docker. I created a small Dockerfile based on an Oracle Java 8 image, loaded the Image Export jars and dependencies, created a few volumes to mount, and ran the export on a single sample PDF file. It was an enriching learning experience for me and my father alike. This post shows the steps I took to get Image Export working on my machine.

Prerequisites

  • Docker (installed on your machine; download it from Docker's website for your OS: Windows, Linux or macOS)
  • OIT Image Export SDK (Get it here)
The Image Export SDK also contains some sample files to play with. Make sure to carry out all the steps below in the same folder or in its subfolders.

Steps

  1. Create a Dockerfile
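    # Start from the Oracle Server JRE 8 image
    FROM store/oracle/serverjre:1.8.0_241-b07
    LABEL Author="Divyaksh (@divyaksh-shukla)"

    # Install additional tools (unzip is needed to unpack the SDK archive)
    RUN yum install -y unzip

    # Copy the Image Export SDK archive into the image and unpack it
    WORKDIR /home/oit
    COPY ix-8-5-4-linux-x86-64.zip .
    RUN unzip ix-8-5-4-linux-x86-64.zip -d ix_image

    # Run the demo setup script shipped with the SDK
    WORKDIR /home/oit/ix_image
    RUN sh makedemo.sh

    # Work from the demo directory, where the jars live
    WORKDIR /home/oit/ix_image/sdk/demo

    # Put the Image Export jars on the classpath (paths are relative to the demo directory)
    ENV CLASSPATH=oilink.jar:oitsample.jar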
  2. Create Docker volumes for the inputs (sample files), outputs and fonts (may not be needed in some cases)
    $ docker volume create oit_output
    oit_output
    $ docker volume create oit_samplefiles
    oit_samplefiles
    $ docker volume create oit_fonts
    oit_fonts

    Check that the volumes were created by listing them

    $ docker volume ls
    DRIVER              VOLUME NAME
    local               oit_fonts
    local               oit_output
    local               oit_samplefiles

    Inspect a volume for additional information, such as its mountpoint on the host

    $ docker volume inspect oit_output
    [
        {
            "CreatedAt": "2020-05-07T14:53:00+05:30",
            "Driver": "local",
            "Labels": {},
            "Mountpoint": "/var/lib/docker/volumes/oit_output/_data",
            "Name": "oit_output",
            "Options": {},
            "Scope": "local"
        }
    ]
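  3. Copy the contents of sdk/samplefiles from the downloaded Image Export SDK into the oit_samplefiles volume on the host system. The volume's location is the Mountpoint reported by docker volume inspect
  4. This example needs the Liberation fonts, found in /usr/share/fonts/liberation on Linux (the exact path can vary by distribution). Copy them into the oit_fonts volume in the same way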
  5. Build the Image
    docker build --rm -t oit-sample:1 -f Dockerfile .
  6. Run the image in a container
    docker run -it --rm \
        -v oit_output:/home/oit/outputs \
        -v oit_samplefiles:/home/oit/samplefiles \
        -v oit_fonts:/home/oit/fonts \
        oit-sample:1 \
        java -cp oilink.jar:oitsample.jar OITSample \
        /home/oit/samplefiles/adobe-acrobat.pdf /home/oit/outputs/test.tif tiff /home/oit/fonts
  7. The build and run steps are also available as a single shell script

    bash run-it.sh
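Steps 3 and 4 both come down to copying a host folder into the mountpoint of a volume. Below is a minimal sketch of that, assuming a Linux host where the mountpoint is directly reachable (usually only as root). The helper names volume_path and copy_into_volume, and the DOCKER and COPY override variables, are hypothetical and only there for illustration. On Docker Desktop for Windows or macOS the mountpoint typically lives inside a virtual machine, so this direct copy may not work there.

```shell
#!/usr/bin/env bash
# Sketch only: copy a host folder into a named Docker volume via its mountpoint.
# Assumes a Linux host; DOCKER and COPY are hypothetical override hooks.

DOCKER="${DOCKER:-docker}"
COPY="${COPY:-sudo cp -r}"

# Print the host directory that backs the named volume.
volume_path() {
  "$DOCKER" volume inspect --format '{{ .Mountpoint }}' "$1"
}

# Copy the contents of a source directory into the named volume.
copy_into_volume() {
  local src="$1" volume="$2" dest
  dest="$(volume_path "$volume")" || return 1
  [ -n "$dest" ] || return 1
  $COPY "$src"/. "$dest"/
}

# When called with exactly two arguments, treat them as <source-dir> <volume>.
if [ "$#" -eq 2 ]; then
  copy_into_volume "$1" "$2"
fi
```

Saved under a name of your choice (say copy-to-volume.sh), it could be called once for the sample files and once for the fonts, for example `bash copy-to-volume.sh sdk/samplefiles oit_samplefiles` and `bash copy-to-volume.sh /usr/share/fonts/liberation oit_fonts`. The source paths depend on where you unpacked the SDK and where your distribution keeps its fonts.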
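Running the script finally produces output like the following. Note that this log comes from a build that had an extra RUN ls step, which is why it shows 11 steps while the Dockerfile listed above has 10 instructions.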
$ bash run-it.sh
Sending build context to Docker daemon 125.5MB
Step 1/11 : FROM store/oracle/serverjre:1.8.0_241-b07
 ---> ef9c1a0152ab
Step 2/11 : LABEL Author="Divyaksh (@divyaksh-shukla)"
 ---> Using cache
 ---> 375998c9ca19
Step 3/11 : RUN yum install -y unzip
 ---> Using cache
 ---> 3116220fc555
Step 4/11 : WORKDIR /home/oit
 ---> Using cache
 ---> e9a5f7f9978b
Step 5/11 : COPY ix-8-5-4-linux-x86-64.zip .
 ---> Using cache
 ---> 6f07b04abbae
Step 6/11 : RUN unzip ix-8-5-4-linux-x86-64.zip -d ix_image
 ---> Using cache
 ---> 85d6e5a3ce1e
Step 7/11 : WORKDIR /home/oit/ix_image
 ---> Using cache
 ---> 20256453949d
Step 8/11 : RUN ls
 ---> Running in 5c9d8519c1de
makedemo.sh
README
redist
sdk
Removing intermediate container 5c9d8519c1de
 ---> 6b8673720c9b
Step 9/11 : RUN sh makedemo.sh
 ---> Running in 47f7e4b17e60
Removing intermediate container 47f7e4b17e60
 ---> 0fa1b311067f
Step 10/11 : WORKDIR /home/oit/ix_image/sdk/demo
 ---> Running in cd60511e8e72
Removing intermediate container cd60511e8e72
 ---> 53a3eb155b79
Step 11/11 : ENV CLASSPATH=oilink.jar:oitsample.jar
 ---> Running in 0d95affdf08d
Removing intermediate container 0d95affdf08d
 ---> bdad82cab940
Successfully built bdad82cab940
Successfully tagged oit-sample:1
File Identifier : Adobe Acrobat (PDF)(1557)
File Identifier (Raw): Adobe Acrobat (PDF)(1557)
Creating file: /home/oit/outputs/test.tif
Creating file: /home/oit/outputs/test0001.tiff
Creating file: /home/oit/outputs/test0002.tiff
Export Successful
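The run-it.sh script itself is not reproduced in this post. As a rough sketch, assuming it does nothing more than chain the build and run steps (which is what the log suggests), something like the following would behave similarly. The function names, the DOCKER and IMAGE variables and the `run` argument are illustrative assumptions and are not taken from the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for run-it.sh: build the image, then export one sample file.
# DOCKER and IMAGE are assumed override hooks, not part of the original script.

DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-oit-sample:1}"

# Build the image from the Dockerfile in the current folder (step 5).
build_image() {
  "$DOCKER" build --rm -t "$IMAGE" -f Dockerfile .
}

# Run the OITSample demo on one input file with the three volumes mounted (step 6).
# Arguments: <input file name> <output file name> <output format>
run_export() {
  local input="$1" output="$2" format="$3"
  "$DOCKER" run -it --rm \
    -v oit_output:/home/oit/outputs \
    -v oit_samplefiles:/home/oit/samplefiles \
    -v oit_fonts:/home/oit/fonts \
    "$IMAGE" \
    java -cp oilink.jar:oitsample.jar OITSample \
    "/home/oit/samplefiles/$input" "/home/oit/outputs/$output" "$format" /home/oit/fonts
}

# Only build and export when explicitly asked to, e.g. `bash run-it-sketch.sh run`.
if [ "${1:-}" = "run" ]; then
  build_image && run_export adobe-acrobat.pdf test.tif tiff
fi
```

Keeping the input name, output name and format as arguments to run_export makes it easy to try other sample files from the SDK without editing the docker run line each time.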
The Dockerfile and the run-it.sh bash script are available on my GitHub.
