- What is Image recognition?
- How does Image recognition work?
- Working of Convolutional and Pooling layers
- Image recognition using Python
- Image recognition with a pre-trained network
Before starting this blog, it helps to have a basic introduction to CNNs. Human visual performance is much better than that of computers, probably because of superior high-level image understanding, contextual knowledge, and massively parallel processing. But human capabilities deteriorate drastically after an extended period of surveillance, and certain working environments are either inaccessible or too hazardous for human beings. For these reasons, automatic recognition systems are developed for various applications. Driven by advances in computing capability and image processing technology, computer mimicry of human vision has recently gained ground in a number of practical applications.
What is Image recognition?
Image recognition refers to technologies that identify places, logos, people, objects, buildings, and several other variables in digital images. It may be very easy for humans like you and me to recognise different images, such as images of animals. We can easily recognise the image of a cat and differentiate it from an image of a horse. But it may not be so simple for a computer.
A digital image is an image composed of picture elements, also known as pixels, each with finite, discrete quantities of numeric representation for its intensity or grey level. So the computer sees an image as numerical values of these pixels and in order to recognise a certain image, it has to recognise the patterns and regularities in this numerical data.
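To make the idea of "an image as numbers" concrete, here is a minimal sketch in plain Python. The 4×4 image and the helper names `normalise` and `flatten` are made up for illustration; real images are far larger and are usually held in NumPy arrays.

```python
# A hypothetical 4x4 greyscale image: each number is one pixel's intensity
# (0 = black, 255 = white). The left half is dark, the right half is bright.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]


def normalise(img):
    """Scale 8-bit intensities into the range [0, 1]."""
    return [[pixel / 255 for pixel in row] for row in img]


def flatten(img):
    """Turn the 2-D grid into one long list of numbers."""
    return [pixel for row in img for pixel in row]


height, width = len(image), len(image[0])
vector = flatten(normalise(image))
print(height, width, len(vector))  # 4 4 16
print(vector[:4])                  # [0.0, 0.0, 1.0, 1.0]
```

Everything a model learns about this image has to come from patterns in those 16 numbers, such as the jump from 0.0 to 1.0 in every row.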
Image recognition should not be confused with object detection. In object detection, we analyse an image and locate the different objects within it, while image recognition deals with recognising an image as a whole and classifying it into one of several categories.
How does Image recognition work?
Typically the task of image recognition involves the creation of a neural network that processes the individual pixels of an image. These networks are fed with as many pre-labelled images as we can, in order to “teach” them how to recognize similar images.
So let me break the process down for you into some simple steps:
- We need a dataset containing images with their respective labels. For example, an image of a dog must be labelled as a dog or something that we can understand.
- Next, these images are fed into a neural network, which is then trained on them. For image-related tasks we usually use a convolutional neural network. These networks consist of convolutional layers and pooling layers in addition to multilayer perceptron (MLP) layers. The working of convolutional and pooling layers is explained below.
- We feed in an image that is not in the training set and get predictions.
In the coming sections, by following these simple steps we will make a classifier that can recognise RGB images of 10 different kinds of animals.
Note: The model will only be able to recognise animals that are in the dataset. For example, a model trained to recognise dogs and cats cannot recognise boats.
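One way to see why this limitation exists: the final layer of the classifier built later in this article is a softmax over a fixed list of classes, so every input ends up assigned to one of them. The snippet below is only a sketch of that behaviour in plain Python; the two-class list and the raw scores are invented for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # subtracting the largest score keeps exp() numerically stable
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["cat", "dog"]            # hypothetical two-class model
scores_for_boat_photo = [0.3, 0.1]  # made-up raw outputs for an unfamiliar image

probs = softmax(scores_for_boat_photo)
best = max(range(len(classes)), key=lambda i: probs[i])
print(classes[best])  # cat
```

Even for a photo of a boat, the answer can only be "cat" or "dog", because those are the only outputs the network has.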
Working of Convolutional and Pooling layers
Convolutional layers and pooling layers are the major building blocks used in convolutional neural networks. Let us look at them in detail.
How does Convolutional Layer work?
The convolutional layer’s parameters consist of a set of learnable filters (or kernels), which have a small receptive field. These filters scan through image pixels and gather information in the batch of pictures/photos. Convolutional layers convolve the input and pass its result to the next layer. This is like the response of a neuron in the visual cortex to a specific stimulus.
Below is an example of how the convolution operation is performed on an image. A similar process is carried out for all the pixels.
Here is an example of an image in our test set that has been convolved with four different filters, which gives us four different images.
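The sliding-filter arithmetic can also be written out in a few lines of plain Python. This is a minimal sketch, assuming stride 1 and no padding; the function name `convolve2d`, the tiny image and the hand-written vertical-edge filter are illustrative only. In a real CNN the filter weights are learned during training, and libraries such as Keras do this far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, no kernel flip,
    which is how CNN layers apply their filters) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# dark region on the left, bright region on the right
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
# hand-written vertical-edge filter (a learned filter would play this role in a CNN)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The output is zero over the flat regions and large where the window straddles the dark-to-bright boundary, which is the sense in which a filter "responds" to a particular pattern.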
How does Pooling Layer work?
The pooling operation involves sliding a two-dimensional filter over each channel of the feature map and summarising the features lying within the region covered by the filter. A pooling layer is usually incorporated between two successive convolutional layers. The pooling layer reduces the amount of computation, and the number of parameters in later layers, by down-sampling the representation. The pooling function can be either max or average. Max pooling is the more common choice, as it tends to work better in practice.
This process is illustrated below.
When we pass the four images we got after convolution through a max-pooling layer of dimension 2×2, we get this as output.
As we can see, each spatial dimension has been halved, but the most prominent information in the image is still preserved.
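The halving described above follows directly from how a 2×2 max-pooling window moves. Here is a minimal sketch in plain Python, assuming non-overlapping windows (stride equal to the window size); the function name `max_pool2d` and the example values are made up for illustration.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 7, 8],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

A 4×4 input becomes 2×2, and the strongest response in each region survives.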
Image recognition using Python
Here I am going to use deep learning, more specifically a convolutional neural network, to recognise RGB images of ten different kinds of animals. An RGB image can be viewed as three different images (a red scale image, a green scale image and a blue scale image) stacked on top of each other; when fed into the red, green and blue inputs of a colour monitor, they produce a colour image on the screen. We use a dataset known as Animals-10 from Kaggle.
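The "three stacked images" view can be sketched with a tiny made-up example. The 2×2 image and the helper `split_channels` below are hypothetical; they only illustrate how one colour image separates into red, green and blue planes.

```python
# A hypothetical 2x2 colour image: every pixel is an (R, G, B) triple in 0..255.
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def split_channels(img):
    """Return three greyscale planes (red, green, blue) from an RGB image."""
    return [[[pixel[c] for pixel in row] for row in img] for c in range(3)]


red, green, blue = split_channels(rgb_image)
print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 255]]

# (height, width, channels), the same ordering as the (128, 128, 3) input shape
# used for the network in this article
shape = (len(rgb_image), len(rgb_image[0]), 3)
print(shape)  # (2, 2, 3)
```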
So, let us start making a classifier using Python and Keras. We are going to implement the program in Colab as we need a lot of processing power and Google Colab provides free GPUs. The overall structure of the neural network we are going to use can be seen in this image.
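As a rough guide to that structure, the sketch below traces how the spatial size of a 128×128 input changes through the convolution and pooling layers defined in the code later in this section. It assumes the standard Keras rules: `padding='same'` gives an output of ceil(size / stride), and `MaxPooling2D((2,2))` halves the size. The helper names are made up.

```python
import math


def conv_same(size, stride=1):
    """Spatial size after a convolution with padding='same'."""
    return math.ceil(size / stride)


def pool(size, window=2):
    """Spatial size after non-overlapping pooling with the given window."""
    return size // window


size = 128                    # input images are resized to 128 x 128
size = conv_same(size)        # Conv2D(64, 3x3, same)            -> 128
size = conv_same(size, 2)     # Conv2D(32, 5x5, stride 2, same)  -> 64
size = pool(size)             # MaxPooling2D(2x2)                -> 32
size = conv_same(size, 2)     # Conv2D(64, 5x5, stride 2, same)  -> 16
size = pool(size)             # MaxPooling2D(2x2)                -> 8
flattened = size * size * 64  # Flatten: 8 * 8 positions, 64 channels each
print(size, flattened)        # 8 4096
```

So the dense layers at the end of the network see a vector of 4096 numbers per image rather than the original 128 × 128 × 3 = 49152 pixel values.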
The very first step is to get the data into your Colab notebook. You don’t need high-speed internet for this, as it is downloaded directly into Google Cloud from the Kaggle cloud.
For getting the data, follow these steps:
- Go to your Kaggle account and open your account settings. In case you don’t have a Kaggle account, create one; it is free.
- Next, download the kaggle.json file by clicking on the button ‘Create New API Token’.
- Go to your Colab notebook and start coding
In this tutorial, we are using ImageDataGenerator to label the images. So, in case you are using some other dataset, be sure to put all images of the same class in the same folder, and then place all of these class folders in a single parent folder.
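To illustrate why the folder layout matters, here is a small sketch that derives integer labels from sub-folder names in sorted order, which mirrors how Keras' `flow_from_directory` infers classes (it exposes the resulting mapping as the generator's `class_indices` attribute). The function below is a stand-in written for this example, not the Keras implementation, and the folder names are hypothetical.

```python
import os
import tempfile


def class_indices(root):
    """Map each sub-folder name to an integer label, in sorted order."""
    folders = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    return {name: index for index, name in enumerate(folders)}


# Build a throwaway dataset layout: one sub-folder per class.
with tempfile.TemporaryDirectory() as root:
    for label in ["dog", "cat", "horse"]:
        os.makedirs(os.path.join(root, label))
        # an empty placeholder standing in for a real image file
        open(os.path.join(root, label, "example.jpg"), "w").close()
    labels = class_indices(root)

print(labels)  # {'cat': 0, 'dog': 1, 'horse': 2}
```

This is also why the order of the class names matters when turning a predicted index back into an animal name later on.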
# These steps are to be followed when using Google Colab
# and importing data from Kaggle
# Install the Kaggle library
!pip install -q kaggle
from google.colab import files
# Upload the kaggle.json file
uploaded = files.upload()
# Make the directory in which kaggle.json is stored
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# Restrict access to the API token so that only you can read it
!chmod 600 ~/.kaggle/kaggle.json
# Download the dataset into Colab
!kaggle datasets download -d alessiocorrado99/animals10
# Unzip the data
!unzip /content/animals10.zip
# In case you are using a local machine, start from here.
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Sequential,Model
from tensorflow.keras.layers import BatchNormalization,Dropout,Flatten
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.preprocessing import image
import numpy as np
import os
import cv2
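#on Colab the archive is unzipped to /content/raw-img;
#on a Kaggle notebook use '/kaggle/input/animals10/raw-img/' instead
train_data_dir='/content/raw-img/'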
img_height=128
img_width=128
batch_size=64
nb_epochs=20
train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2) # set validation split
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='training') # set as training data
validation_generator = train_datagen.flow_from_directory(
    train_data_dir, # same directory as training data
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation') # set as validation data
model = Sequential()
inputShape = (128, 128, 3)
model.add(Conv2D(64, (3, 3), padding="same", activation='relu', input_shape=inputShape))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size = 5, strides=2, padding='same', activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.4))
model.add(Conv2D(64, kernel_size = 5, strides=2, padding='same', activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(BatchNormalization())
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dropout(0.4))
model.add(Dense(64, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))
model.summary()
#compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
#train the model; this step takes a lot of time (hours)
#model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2
model.fit(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator,
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)
#save the model for later use
model.save('pathname of model')
#order of the animals array is important: it must match the sorted folder names
#animals=["dog", "horse","elephant", "butterfly", "chicken", "cat", "cow", "sheep","spider", "squirrel"]
bio_animals=sorted(os.listdir('/content/raw-img'))
#the folder names are in Italian, so map them to English
categories = {'cane': 'dog', "cavallo": "horse", "elefante": "elephant", "farfalla": "butterfly", "gallina": "chicken", "gatto": "cat", "mucca": "cow", "pecora": "sheep", "scoiattolo": "squirrel","ragno":"spider"}
def recognise(pred):
    animals=[categories.get(item,item) for item in bio_animals]
    print("The image consists of ",animals[pred])
from tensorflow.keras.preprocessing import image
import numpy as np
img = image.load_img("/kaggle/input/testttt/OIF-e2bexWrojgtQnAPPcUfOWQ.jpeg", target_size=(128, 128))
x = image.img_to_array(img)
#apply the same 1/255 rescaling that was used during training
x = x / 255.0
x = np.expand_dims(x, axis=0)
prediction=model.predict(x)
# prediction
recognise(np.argmax(prediction))
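#cv2_imshow is Colab's replacement for cv2.imshow
from google.colab.patches import cv2_imshow
test_data_path="/content/test data/test_animals"
#ignore hidden entries (for example .ipynb_checkpoints) that are not images
test_files=sorted(f for f in os.listdir(test_data_path) if not f.startswith('.'))
for name in test_files:
    bgr=cv2.imread(os.path.join(test_data_path,name))
    cv2_imshow(bgr)
    #match the training preprocessing: RGB order, training size, values in [0, 1]
    rgb=cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    x=cv2.resize(rgb,(img_width,img_height)).astype("float32")/255.0
    prediction=model.predict(np.expand_dims(x,axis=0))
    recognise(np.argmax(prediction))
    print("")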
Output: I downloaded some images from Google and used this model to label them. Here are the results.
To predict on your own images, you need to upload them to Colab (they are deleted automatically when the session ends), or you can store them permanently on your Google Drive.
Follow the steps below to create a directory for test data
- Create a new folder called test data.
- Next, create another folder inside it named test_animals.
- Upload your images to this folder.
As we can see, this model did a decent job and predicted all images correctly except the one with a horse. This is probably because the images are quite large and the model needs much longer training to give reliable results; by my estimate, at least 100 epochs. But due to the large size of the dataset and images, I could only train it for 20 epochs (which took 4 hours on Colab).
To increase the accuracy and get an accurate prediction, we can use a pre-trained model and then customise that according to our problem.
Image Recognition with a pre-trained model
In this example, I am going to use the Xception model that has been pre-trained on the ImageNet dataset. This technique is called transfer learning.
The Xception model was proposed by François Chollet. Xception is an extension of the Inception architecture which replaces the standard Inception modules with depthwise separable convolutions. This model is available in Keras and we just need to import it. So let’s start coding.
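As a brief aside before the full script, the appeal of depthwise separable convolutions can be seen from a simple weight count. A standard k×k convolution needs k·k·C_in·C_out weights, whereas a depthwise k×k convolution followed by a 1×1 pointwise convolution needs k·k·C_in + C_in·C_out. The layer sizes below are hypothetical and biases are ignored; this is a back-of-the-envelope sketch, not a description of any specific Xception layer.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels
standard = standard_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(standard, separable)  # 294912 33920
```

For this example the separable version uses roughly a ninth of the weights of the standard one.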
# Install the Kaggle library
!pip install -q kaggle
from google.colab import files
# Upload the kaggle.json file
uploaded = files.upload()
# Make the directory in which kaggle.json is stored
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# Restrict access to the API token so that only you can read it
!chmod 600 ~/.kaggle/kaggle.json
# Download the dataset into Colab
!kaggle datasets download -d alessiocorrado99/animals10
# Unzip the data
!unzip /content/animals10.zip
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Sequential,Model
from tensorflow.keras.layers import BatchNormalization,Dropout,Flatten
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.preprocessing import image
import numpy as np
import os
import cv2
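#on Colab the archive is unzipped to /content/raw-img;
#on a Kaggle notebook use '/kaggle/input/animals10/raw-img/' instead
train_data_dir='/content/raw-img/'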
img_height=299
img_width=299
batch_size=64
nb_epochs=20
train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2) # set validation split
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='training') # set as training data
validation_generator = train_datagen.flow_from_directory(
    train_data_dir, # same directory as training data
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation') # set as validation data
#import a pre-trained model, without the top layers. We will customise
#the top layers for our problem
base_model = tf.keras.applications.Xception(include_top=False, input_shape=(299,299,3))
#For now freeze the pre-trained layers and do not train them
for layer in base_model.layers:
    layer.trainable = False
# create a custom top classifier
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(516, activation='relu')(x)
#since our problem has 10 different animals we have 10 classes
#thus we keep 10 nodes in the last layer
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=base_model.inputs, outputs=predictions)
model.summary()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
#model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2
model.fit(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator,
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)
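#Now unfreeze the layers and train the whole model
for layer in base_model.layers:
    layer.trainable = True
#recompile so that the change to `trainable` takes effect; the low learning rate
#is a suggested value (not from the original run) to avoid disrupting the pre-trained weights
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator,
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)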
model.save('pathname of model')
#order of the animals array is important: it must match the sorted folder names
#animals=["dog", "horse","elephant", "butterfly", "chicken", "cat", "cow", "sheep","spider", "squirrel"]
bio_animals=sorted(os.listdir('/content/raw-img'))
#the folder names are in Italian, so map them to English
categories = {'cane': 'dog', "cavallo": "horse", "elefante": "elephant", "farfalla": "butterfly", "gallina": "chicken", "gatto": "cat", "mucca": "cow", "pecora": "sheep", "scoiattolo": "squirrel","ragno":"spider"}
def recognise(pred):
    animals=[categories.get(item,item) for item in bio_animals]
    print("The image consists of ",animals[pred])
from tensorflow.keras.preprocessing import image
import numpy as np
img = image.load_img("/kaggle/input/testttt/OIF-e2bexWrojgtQnAPPcUfOWQ.jpeg", target_size=(299, 299))
x = image.img_to_array(img)
#apply the same 1/255 rescaling that was used during training
x = x / 255.0
x = np.expand_dims(x, axis=0)
prediction=model.predict(x)
# prediction
recognise(np.argmax(prediction))
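#cv2_imshow is Colab's replacement for cv2.imshow
from google.colab.patches import cv2_imshow
test_data_path="/content/test data/test_animals"
#ignore hidden entries (for example .ipynb_checkpoints) that are not images
test_files=sorted(f for f in os.listdir(test_data_path) if not f.startswith('.'))
for name in test_files:
    bgr=cv2.imread(os.path.join(test_data_path,name))
    cv2_imshow(bgr)
    #match the training preprocessing: RGB order, training size, values in [0, 1]
    rgb=cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    x=cv2.resize(rgb,(img_width,img_height)).astype("float32")/255.0
    prediction=model.predict(np.expand_dims(x,axis=0))
    recognise(np.argmax(prediction))
    print("")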
Output:
As we can see, the model makes accurate predictions on all of the images in our test dataset. I have saved this model, so it can be loaded again at any time using the function shown below:
from tensorflow import keras
model = keras.models.load_model('path .h5')
#e.g. model = keras.models.load_model('/content/simpleconvkag.h5')
In case you want a copy of the trained model or have any queries regarding the code, feel free to drop a comment.
This brings us to the end of this article. We have learned how image recognition works and classified different images of animals.