Skip to content

Commit

Permalink
Box extraction using opencv
Browse files Browse the repository at this point in the history
  • Loading branch information
KananVyas committed Jul 22, 2018
0 parents commit cef6d08
Show file tree
Hide file tree
Showing 17 changed files with 102 additions and 0 deletions.
Binary file added 41.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/10.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/3.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/4.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/5.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/6.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/7.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/8.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Cropped/9.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added Image_bin.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
12 changes: 12 additions & 0 deletions README
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@

When you are working with Optical character recognition(OCR) or any data or
object recognition problem, the first thing to do is preprocessing. Here preprocessing means
to extract the location where our information is located. After extracting the location,
any machine algorithm will be performed on that image. 

The problem arises when you have to detect objects which are located in any tables/boxes or
in row-column format. If the image is like this then you have to detect boxes and extract them one by one.
Now it should be done accurately for all images.

This algorithm helps to detect every boxes accurately and save it in a "Cropped" folder.The code is shown in box_detection.py
and the test image is "41.jpg".
90 changes: 90 additions & 0 deletions box_detection.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
import cv2
import numpy as np

def sort_contours(cnts, method="left-to-right"):
# initialize the reverse flag and sort index
reverse = False
i = 0

# handle if we need to sort in reverse
if method == "right-to-left" or method == "bottom-to-top":
reverse = True

# handle if we are sorting against the y-coordinate rather than
# the x-coordinate of the bounding box
if method == "top-to-bottom" or method == "bottom-to-top":
i = 1

# construct the list of bounding boxes and sort them from top to
# bottom
boundingBoxes = [cv2.boundingRect(c) for c in cnts]
(cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
key=lambda b: b[1][i], reverse=reverse))

# return the list of sorted contours and bounding boxes
return (cnts, boundingBoxes)

def box_extraction(img_for_box_extraction_path, cropped_dir_path):

img = cv2.imread(img_for_box_extraction_path, 0) # Read the image
(thresh, img_bin) = cv2.threshold(img, 128, 255,
cv2.THRESH_BINARY | cv2.THRESH_OTSU) # Thresholding the image
img_bin = 255-img_bin # Invert the image

cv2.imwrite("Image_bin.jpg",img_bin)

# Defining a kernel length
kernel_length = np.array(img).shape[1]//40

# A verticle kernel of (1 X kernel_length), which will detect all the verticle lines from the image.
verticle_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, kernel_length))
# A horizontal kernel of (kernel_length X 1), which will help to detect all the horizontal line from the image.
hori_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_length, 1))
# A kernel of (3 X 3) ones.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

# Morphological operation to detect verticle lines from an image
img_temp1 = cv2.erode(img_bin, verticle_kernel, iterations=3)
verticle_lines_img = cv2.dilate(img_temp1, verticle_kernel, iterations=3)
cv2.imwrite("verticle_lines.jpg",verticle_lines_img)

# Morphological operation to detect horizontal lines from an image
img_temp2 = cv2.erode(img_bin, hori_kernel, iterations=3)
horizontal_lines_img = cv2.dilate(img_temp2, hori_kernel, iterations=3)
cv2.imwrite("horizontal_lines.jpg",horizontal_lines_img)

# Weighting parameters, this will decide the quantity of an image to be added to make a new image.
alpha = 0.5
beta = 1.0 - alpha
# This function helps to add two image with specific weight parameter to get a third image as summation of two image.
img_final_bin = cv2.addWeighted(verticle_lines_img, alpha, horizontal_lines_img, beta, 0.0)
img_final_bin = cv2.erode(~img_final_bin, kernel, iterations=2)
(thresh, img_final_bin) = cv2.threshold(img_final_bin, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# For Debugging
# Enable this line to see verticle and horizontal lines in the image which is used to find boxes
cv2.imwrite("img_final_bin.jpg",img_final_bin)
# Find contours for image, which will detect all the boxes
im2, contours, hierarchy = cv2.findContours(
img_final_bin, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Sort all the contours by top to bottom.
(contours, boundingBoxes) = sort_contours(contours, method="top-to-bottom")

idx = 0
for c in contours:
# Returns the location and width,height for every contour
x, y, w, h = cv2.boundingRect(c)

# If the box height is greater then 20, widht is >80, then only save it as a box in "cropped/" folder.
if (w > 80 and h > 20) and w > 3*h:
idx += 1
new_img = img[y:y+h, x:x+w]
cv2.imwrite(cropped_dir_path+str(idx) + '.png', new_img)

# For Debugging
# Enable this line to see all contours.
# cv2.drawContours(img, contours, -1, (0, 0, 255), 3)
# cv2.imwrite("./Temp/img_contour.jpg", img)


box_extraction("41.jpg", "./Cropped/")
Binary file added horizontal_lines.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added img_final_bin.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added verticle_lines.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit cef6d08

Please sign in to comment.