Optical Character Recognition

Project Overview
OCR-DNN: Optical Character Recognition using Deep Neural Networks
OCR-DNN is a deep learning-based Optical Character Recognition (OCR) system designed to recognize alphanumeric characters from both RGB and binary image inputs. The model leverages a deep neural network (DNN) architecture with convolutional layers to extract features from images and classify them into 62 possible outputs (letters and digits). GitHub: https://github.com/ViratSrivastava/OCR
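The 62 outputs correspond to ten digits plus 26 uppercase and 26 lowercase letters. A minimal sketch of a label mapping is shown below. The class order (digits, then uppercase, then lowercase) and the helper names `index_to_char` and `char_to_index` are assumptions for illustration; the repository may order its labels differently.

```python
import string

# Assumed class order: digits, uppercase, lowercase (hypothetical ordering;
# check the training labels before relying on it).
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def index_to_char(index: int) -> str:
    """Map a class index in the range 0-61 to its character."""
    if not 0 <= index < len(CLASSES):
        raise ValueError(f"class index out of range: {index}")
    return CLASSES[index]


def char_to_index(char: str) -> int:
    """Map a single alphanumeric character to its class index."""
    if len(char) != 1 or char not in CLASSES:
        raise ValueError(f"not a supported character: {char!r}")
    return CLASSES.index(char)
```

With this ordering, the index of the largest softmax output can be passed to `index_to_char` to obtain the predicted character.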
Key Features
- Dual Input Architecture: The model accepts both RGB and binary image inputs, processing each through separate convolutional layers.
- Convolutional Neural Networks (CNNs): Uses CNNs to extract hierarchical features from images.
- Fully Connected Layers: After feature extraction, the features from both branches are concatenated and passed through dense layers with dropout for robust classification.
- 62 Output Classes: The system can recognize uppercase and lowercase letters along with digits (0-9).
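The features above can be sketched with the Keras functional API. The input sizes (28x28), filter counts (64, 128, 256), and dense widths (256, 128) follow the layer summary in the Performance section. The 3x3 kernels with "same" padding are inferred from that summary's shapes and parameter counts. The ReLU activations, the dropout rate, the final 62-way softmax layer, and the names `conv_branch` and `build_model` are assumptions, not taken from the repository.

```python
from tensorflow import keras
from tensorflow.keras import layers


def conv_branch(inputs):
    """Three Conv2D + MaxPooling2D stages followed by Flatten."""
    x = inputs
    for filters in (64, 128, 256):
        # 3x3 kernels and "same" padding are inferred from the summary;
        # the ReLU activation is an assumption.
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    return layers.Flatten()(x)


def build_model(num_classes=62, dropout_rate=0.5):
    """Dual-input classifier; dropout_rate is an assumed value."""
    rgb_in = keras.Input(shape=(28, 28, 3), name="rgb_input")
    bin_in = keras.Input(shape=(28, 28, 1), name="binary_input")

    # Each input gets its own convolutional branch; the flattened
    # features are concatenated before the dense layers.
    merged = layers.Concatenate()([conv_branch(rgb_in), conv_branch(bin_in)])

    x = layers.Dense(256, activation="relu")(merged)
    x = layers.Dropout(dropout_rate)(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(dropout_rate)(x)

    # Assumed output layer: one softmax unit per character class.
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs=[rgb_in, bin_in], outputs=outputs)


model = build_model()
```

Calling `model.summary()` on this sketch should list the same layer shapes as the project's summary, plus the assumed output layer.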
Technologies Used
- Programming: Python, C++ executable
- Machine Learning Frameworks: TensorFlow, Keras
- Data Processing Tools: Pandas, NumPy
- Visualization: Matplotlib, Plotly
- Version Control: Git, GitHub
Performance
The model's performance varies between 70% and 88%, depending on the type of data it is trained on. The training data fall into the following categories:
- Handwritten Characters: uppercase and lowercase letters and digits, as both RGB and binary images of individual characters
- Digital Characters: uppercase and lowercase letters and digits, as binary images of individual characters
- Multi-Font Digital Characters: uppercase and lowercase letters and digits, as binary images of characters in multiple fonts
- Handwritten Text: uppercase and lowercase letters and digits, as both RGB and binary images of text
- Digital Text: uppercase and lowercase letters and digits, as binary images of text
The Keras layer summary of the model, up to the last dropout layer, is:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape        ┃   Param # ┃ Connected to                     ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ input_layer_4 (InputLayer)      │ (None, 28, 28, 3)   │         0 │ -                                │
│ input_layer_5 (InputLayer)      │ (None, 28, 28, 1)   │         0 │ -                                │
│ conv2d_12 (Conv2D)              │ (None, 28, 28, 64)  │     1,792 │ input_layer_4[0]…                │
│ conv2d_15 (Conv2D)              │ (None, 28, 28, 64)  │       640 │ input_layer_5[0]…                │
│ max_pooling2d_12 (MaxPooling2D) │ (None, 14, 14, 64)  │         0 │ conv2d_12[0][0]                  │
│ max_pooling2d_15 (MaxPooling2D) │ (None, 14, 14, 64)  │         0 │ conv2d_15[0][0]                  │
│ conv2d_13 (Conv2D)              │ (None, 14, 14, 128) │    73,856 │ max_pooling2d_12…                │
│ conv2d_16 (Conv2D)              │ (None, 14, 14, 128) │    73,856 │ max_pooling2d_15…                │
│ max_pooling2d_13 (MaxPooling2D) │ (None, 7, 7, 128)   │         0 │ conv2d_13[0][0]                  │
│ max_pooling2d_16 (MaxPooling2D) │ (None, 7, 7, 128)   │         0 │ conv2d_16[0][0]                  │
│ conv2d_14 (Conv2D)              │ (None, 7, 7, 256)   │   295,168 │ max_pooling2d_13…                │
│ conv2d_17 (Conv2D)              │ (None, 7, 7, 256)   │   295,168 │ max_pooling2d_16…                │
│ max_pooling2d_14 (MaxPooling2D) │ (None, 3, 3, 256)   │         0 │ conv2d_14[0][0]                  │
│ max_pooling2d_17 (MaxPooling2D) │ (None, 3, 3, 256)   │         0 │ conv2d_17[0][0]                  │
│ flatten_4 (Flatten)             │ (None, 2304)        │         0 │ max_pooling2d_14…                │
│ flatten_5 (Flatten)             │ (None, 2304)        │         0 │ max_pooling2d_17…                │
│ concatenate_2 (Concatenate)     │ (None, 4608)        │         0 │ flatten_4[0][0], flatten_5[0][0] │
│ dense_6 (Dense)                 │ (None, 256)         │ 1,179,904 │ concatenate_2[0]…                │
│ dropout_4 (Dropout)             │ (None, 256)         │         0 │ dense_6[0][0]                    │
│ dense_7 (Dense)                 │ (None, 128)         │    32,896 │ dropout_4[0][0]                  │
│ dropout_5 (Dropout)             │ (None, 128)         │         0 │ dense_7[0][0]                    │
└─────────────────────────────────┴─────────────────────┴───────────┴──────────────────────────────────┘
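The parameter counts in the layer summary above can be reproduced by hand, which is a useful sanity check when modifying the architecture. The sketch below assumes 3x3 convolution kernels: the kernel size is not printed in the summary, but it is the value consistent with the listed counts. The helper names `conv2d_params` and `dense_params` are illustrative.

```python
def conv2d_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights (k * k * in * out) plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_units: int, out_units: int) -> int:
    """Weights (in * out) plus one bias per output unit."""
    return in_units * out_units + out_units


KERNEL = 3  # assumption: inferred from the counts, not shown in the summary

param_counts = {
    "conv2d_12": conv2d_params(KERNEL, 3, 64),     # RGB branch, 3 input channels
    "conv2d_15": conv2d_params(KERNEL, 1, 64),     # binary branch, 1 input channel
    "conv2d_13": conv2d_params(KERNEL, 64, 128),
    "conv2d_16": conv2d_params(KERNEL, 64, 128),
    "conv2d_14": conv2d_params(KERNEL, 128, 256),
    "conv2d_17": conv2d_params(KERNEL, 128, 256),
    # Two flattened 3 x 3 x 256 feature maps are concatenated: 2 * 2304 = 4608.
    "dense_6": dense_params(2 * 3 * 3 * 256, 256),
    "dense_7": dense_params(256, 128),
}

for name, count in param_counts.items():
    print(f"{name}: {count:,}")
```

The printed values match the Param # column of the summary: 1,792 and 640 for the first convolutions, 73,856 and 295,168 for the deeper ones, and 1,179,904 and 32,896 for the two dense layers.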