Optical Character Recognition

Project Overview

OCR-DNN: Optical Character Recognition using Deep Neural Networks

OCR-DNN is a deep learning-based Optical Character Recognition (OCR) system that recognizes alphanumeric characters from both RGB and binary image inputs. The model is a deep neural network (DNN) that uses convolutional layers to extract features from the images and classifies each image into one of 62 classes (26 uppercase letters, 26 lowercase letters, and 10 digits).

GitHub: https://github.com/ViratSrivastava/OCR
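As an illustration of the 62-class output, the sketch below maps class indices to characters and back. The label ordering (digits, then uppercase, then lowercase) and the names CLASSES, index_to_char, char_to_index, and decode_prediction are assumptions made for this example; the repository may order its labels differently.

```python
import string

# Assumed label order: digits, then uppercase, then lowercase.
# The project does not state its ordering; adjust to match the training labels.
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def index_to_char(index: int) -> str:
    """Return the character for a class index in the range 0..61."""
    return CLASSES[index]


def char_to_index(char: str) -> int:
    """Return the class index for a single supported character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return CLASSES.index(char)  # raises ValueError for unsupported characters


def decode_prediction(scores) -> str:
    """Pick the character with the highest score in a length-62 sequence."""
    if len(scores) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best]
```

With this ordering, index 0 is "0", index 10 is "A", and index 61 is "z".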

Key Features

Dual Input Architecture: The model accepts both RGB and binary image inputs, processing each through separate convolutional layers.
Convolutional Neural Networks (CNNs): Uses CNNs to extract hierarchical features from images.
Fully Connected Layers: After feature extraction, the features from both inputs are concatenated and passed through dense layers with dropout for classification.
62 Output Classes: The system recognizes uppercase letters (A-Z), lowercase letters (a-z), and digits (0-9).
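The features above can be sketched with the Keras functional API. This is a minimal sketch, not the project's actual code: the 28x28 input sizes, filter counts (64, 128, 256), and dense widths (256, 128) follow the layer summary in the Performance section, while the 3x3 kernels with "same" padding are inferred from that summary's shapes and parameter counts. The ReLU activations, the dropout rate of 0.5, the final 62-unit softmax layer, and the names conv_branch and build_ocr_model are assumptions.

```python
NUM_CLASSES = 62  # uppercase letters + lowercase letters + digits


def conv_branch(inputs, layers):
    """Three Conv2D + MaxPooling2D stages followed by Flatten."""
    x = inputs
    for filters in (64, 128, 256):
        # 3x3 kernels with 'same' padding keep the spatial size; pooling halves it.
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    return layers.Flatten()(x)


def build_ocr_model(num_classes=NUM_CLASSES, dropout_rate=0.5):
    """Build a dual-input CNN; dropout_rate and the output layer are assumed."""
    # Imported lazily so the snippet can be loaded without TensorFlow installed.
    from tensorflow import keras

    layers = keras.layers
    rgb_in = keras.Input(shape=(28, 28, 3), name="rgb_input")
    bin_in = keras.Input(shape=(28, 28, 1), name="binary_input")

    # Each input gets its own convolutional branch; the features are then merged.
    merged = layers.Concatenate()(
        [conv_branch(rgb_in, layers), conv_branch(bin_in, layers)]
    )
    x = layers.Dense(256, activation="relu")(merged)
    x = layers.Dropout(dropout_rate)(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs=[rgb_in, bin_in], outputs=outputs)
```

Calling build_ocr_model() returns an uncompiled model; one reasonable choice for integer labels would be to compile it with the "adam" optimizer and the "sparse_categorical_crossentropy" loss.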

Technologies Used

Programming: Python, C++ executable
Machine Learning Frameworks: TensorFlow, Keras
Data Processing Tools: Pandas, NumPy
Visualization: Matplotlib, Plotly
Version Control: Git, GitHub

Performance

The model's performance ranges from 70% to 88%, depending on the type of data it is trained on. The layer summary of the model is shown below, followed by the training data categories.

                        ┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
                        ┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
                        ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
                        │ input_layer_4       │ (None, 28, 28, 3) │          0 │ -                 │
                        │ (InputLayer)        │                   │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ input_layer_5       │ (None, 28, 28, 1) │          0 │ -                 │
                        │ (InputLayer)        │                   │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ conv2d_12 (Conv2D)  │ (None, 28, 28,    │      1,792 │ input_layer_4[0]… │
                        │                     │ 64)               │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ conv2d_15 (Conv2D)  │ (None, 28, 28,    │        640 │ input_layer_5[0]… │
                        │                     │ 64)               │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ max_pooling2d_12    │ (None, 14, 14,    │          0 │ conv2d_12[0][0]   │
                        │ (MaxPooling2D)      │ 64)               │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ max_pooling2d_15    │ (None, 14, 14,    │          0 │ conv2d_15[0][0]   │
                        │ (MaxPooling2D)      │ 64)               │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ conv2d_13 (Conv2D)  │ (None, 14, 14,    │     73,856 │ max_pooling2d_12… │
                        │                     │ 128)              │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ conv2d_16 (Conv2D)  │ (None, 14, 14,    │     73,856 │ max_pooling2d_15… │
                        │                     │ 128)              │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ max_pooling2d_13    │ (None, 7, 7, 128) │          0 │ conv2d_13[0][0]   │
                        │ (MaxPooling2D)      │                   │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ max_pooling2d_16    │ (None, 7, 7, 128) │          0 │ conv2d_16[0][0]   │
                        │ (MaxPooling2D)      │                   │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ conv2d_14 (Conv2D)  │ (None, 7, 7, 256) │    295,168 │ max_pooling2d_13… │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ conv2d_17 (Conv2D)  │ (None, 7, 7, 256) │    295,168 │ max_pooling2d_16… │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ max_pooling2d_14    │ (None, 3, 3, 256) │          0 │ conv2d_14[0][0]   │
                        │ (MaxPooling2D)      │                   │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ max_pooling2d_17    │ (None, 3, 3, 256) │          0 │ conv2d_17[0][0]   │
                        │ (MaxPooling2D)      │                   │            │                   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ flatten_4 (Flatten) │ (None, 2304)      │          0 │ max_pooling2d_14… │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ flatten_5 (Flatten) │ (None, 2304)      │          0 │ max_pooling2d_17… │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ concatenate_2       │ (None, 4608)      │          0 │ flatten_4[0][0],  │
                        │ (Concatenate)       │                   │            │ flatten_5[0][0]   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ dense_6 (Dense)     │ (None, 256)       │  1,179,904 │ concatenate_2[0]… │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ dropout_4 (Dropout) │ (None, 256)       │          0 │ dense_6[0][0]     │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ dense_7 (Dense)     │ (None, 128)       │     32,896 │ dropout_4[0][0]   │
                        ├─────────────────────┼───────────────────┼────────────┼───────────────────┤
                        │ dropout_5 (Dropout) │ (None, 128)       │          0 │ dense_7[0][0]     │
                        └─────────────────────┴───────────────────┴────────────┴───────────────────┘
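The Param # and Output Shape columns in the summary can be checked by hand. The sketch below reproduces them, assuming 3x3 convolution kernels and 2x2 max pooling; neither is stated in the summary, but both are consistent with its numbers. The helper names conv2d_params, dense_params, and branch_summary are made up for this example.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights (kernel * kernel * in_channels * filters) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


def branch_summary(in_channels, size=28, filter_counts=(64, 128, 256)):
    """Return (per-layer Conv2D parameter counts, flattened feature length)."""
    params = []
    channels = in_channels
    for filters in filter_counts:
        params.append(conv2d_params(channels, filters))
        channels = filters
        size //= 2  # 2x2 pooling halves the size and floors odd sizes (7 -> 3)
    return params, size * size * channels


rgb_params, rgb_flat = branch_summary(in_channels=3)
bin_params, bin_flat = branch_summary(in_channels=1)
merged = rgb_flat + bin_flat

print(rgb_params, rgb_flat)       # [1792, 73856, 295168] 2304
print(bin_params, bin_flat)       # [640, 73856, 295168] 2304
print(merged)                     # 4608
print(dense_params(merged, 256))  # 1179904
print(dense_params(256, 128))     # 32896
```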

Training data categories:
Handwritten Characters: uppercase and lowercase letters and digits, as both RGB and binary images of individual characters
Digital Characters: uppercase and lowercase letters and digits, as binary images of individual characters
Multi-Font Digital Characters: uppercase and lowercase letters and digits, as binary images of individual characters in multiple fonts
Handwritten Text: uppercase and lowercase letters and digits, as both RGB and binary images of text
Digital Text: uppercase and lowercase letters and digits, as binary images of text
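For categories that provide both RGB and binary images, one possible way to prepare a matching input pair is to derive the binary image from the RGB one by thresholding. This is a sketch under assumptions: the project does not say how its binary images are produced, and the function name make_dual_inputs, the fixed threshold of 128, and the luminance weights are choices made for this example.

```python
import numpy as np


def make_dual_inputs(rgb_image, threshold=128):
    """Return (rgb, binary) arrays of shape (H, W, 3) and (H, W, 1).

    rgb_image is expected to hold values in 0..255. The threshold and the
    grayscale conversion are assumptions, not the project's documented method.
    """
    rgb = np.asarray(rgb_image, dtype=np.float32)
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")

    # Convert to grayscale using ITU-R BT.601 luminance weights.
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)

    # Pixels at or above the threshold become 1.0, the rest 0.0.
    binary = (gray >= threshold).astype(np.float32)[..., np.newaxis]

    # Scale the RGB channels to the 0..1 range.
    return rgb / 255.0, binary
```

For a 28x28 RGB image, the two returned arrays have shapes (28, 28, 3) and (28, 28, 1), which match the two input layers in the summary once a batch dimension is added.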