Semantic Segmentation

1. Introduction

Segmentation is fundamental to several computer vision domains, such as virtual reality, self-driving vehicles, and human-computer interaction. Many segmentation methods have been proposed over time, including thresholding, clustering-based segmentation, and edge-based segmentation. The introduction of deep networks to computer vision reshaped the segmentation problem and led to groundbreaking new methods such as Mask R-CNN and modern semantic segmentation networks. Saliency-based methods have also been applied to the segmentation problem in current setups.

The segmentation problem is significant because of its applications: image recognition and search, motion estimation, object tracking in video, and recognition of human actions. This project focuses on semantic segmentation, which has proven to be among the strongest of these segmentation methods. Semantic segmentation exploits deep architectures, for example convolutional neural networks, a flexible approach that has outperformed traditional methods in both efficiency and accuracy. Semantic segmentation stands out from other segmentation techniques because of its critical significance across the computer vision field. It partitions an image into semantically meaningful parts, and these segmented parts form the classes used in the subsequent computation.

In semantic segmentation, every pixel inside an object region is labeled with the class of that region, which enables fine-grained inference. The related tasks of "semantic segmentation," pixel-based classification, and "scene labeling" all pursue the same goal: semantically understanding the role of each pixel in an image. There are several routes to this goal, and the differences between them produce only small gradations in segmentation quality. It is therefore useful to review the standard deep networks that have driven major advances in computer vision, since these networks form the basis of semantic segmentation systems.

2. Model

The model selected for semantic segmentation uses skip layers together with up-sampling to perform image segmentation. This network is referred to as a Fully Convolutional Network (FCN). In an FCN, de-convolutional (transpose convolution) layers are used for up-sampling, because ordinary convolutional layers down-sample the image. Alongside up-sampling, the other significant aspect of this model is the fusion of layers, a technique that is essential when performing up-sampling. When an FCN down-samples, the number of nodes decreases; up-sampling the output directly by 32x or 16x produces a coarse label map. Semantic networks built on convolutional layers include AlexNet, which has five convolutional layers and dropout, and VGG-16, which consists of stacked convolutional layers with small receptive fields in the initial layers. The up-sampling at each stage is further refined by adding coarser maps to higher-resolution maps from lower layers of the VGG-16 model.

2.1 Up-sampling Methods

The resolution of a feature map can be up-sampled through a couple of different approaches. The first and most common is the transpose convolution, which produces a learned up-sampling. By contrast, a typical convolution takes the dot product of the values in the filter's view and produces a single value at the corresponding output position; the transpose convolution works in the opposite direction, projecting each input value onto a window of output positions.
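
A minimal sketch of a learned up-sampling step, assuming TensorFlow 1.x (as in the experiment code below) and hypothetical names (feature_map, num_classes) chosen only for illustration:

import tensorflow as tf

num_classes = 2  # hypothetical: e.g., road vs. background
# Hypothetical encoder output: (batch, height, width, channels).
feature_map = tf.placeholder(tf.float32, [None, 20, 72, 256])

# A kernel of 4 with stride 2 doubles the spatial resolution; setting the
# depth to num_classes turns the output into a per-class score map.
upsampled = tf.layers.conv2d_transpose(
    feature_map, filters=num_classes, kernel_size=4, strides=2, padding='same')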

3. Experiment

To implement the FCN, it is advisable to use the pre-trained VGG-16 model and load it. The VGG-16 model provides the image input and dropout keep-probability tensors, along with the outputs of its third, fourth, and seventh layers, as in the first semantic segmentation paper. The experiment follows the sequence below. The extraction code is:

import tensorflow as tf

# Assumes the pre-trained VGG-16 graph has already been loaded into the
# default graph, so its tensors can be retrieved by name.
image_input = tf.get_default_graph().get_tensor_by_name('image_input:0')
keep_prob = tf.get_default_graph().get_tensor_by_name('keep_prob:0')
layer3_out = tf.get_default_graph().get_tensor_by_name('layer3_out:0')
layer4_out = tf.get_default_graph().get_tensor_by_name('layer4_out:0')
layer7_out = tf.get_default_graph().get_tensor_by_name('layer7_out:0')

The next stage is the skip-layer stage, which builds the new model from the extracted layers of the pre-trained VGG-16 model. A 1x1 convolution of layer 7 (FCN layer 8) is scored and up-sampled, then fused with the layer 4 scores; that output is up-sampled again and fused with the layer 3 scores (FCN layers 9 and 10). FCN layer 11 up-samples the output of FCN layer 10 to match the spatial dimensions of the input image, so the original image size is recovered, with depth equal to the number of classes in the convolution network. A sketch of this construction follows.
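
The decoder can be sketched as follows, assuming TensorFlow 1.x, the tensors extracted above, and the num_classes variable; the kernel sizes and strides follow the common FCN-8s layout and are illustrative rather than a record of the exact configuration used:

# 1x1 convolutions reduce each extracted VGG-16 output to num_classes channels.
score7 = tf.layers.conv2d(layer7_out, num_classes, 1, padding='same')
score4 = tf.layers.conv2d(layer4_out, num_classes, 1, padding='same')
score3 = tf.layers.conv2d(layer3_out, num_classes, 1, padding='same')

# Up-sample the layer 7 scores 2x and fuse them with the layer 4 scores.
fcn9 = tf.add(
    tf.layers.conv2d_transpose(score7, num_classes, 4, strides=2, padding='same'),
    score4)

# Up-sample 2x again and fuse with the layer 3 scores.
fcn10 = tf.add(
    tf.layers.conv2d_transpose(fcn9, num_classes, 4, strides=2, padding='same'),
    score3)

# Final up-sampling back to the input image resolution; depth equals num_classes.
fcn11 = tf.layers.conv2d_transpose(fcn10, num_classes, 16, strides=8, padding='same')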

The final experimental stage is training the newly developed network. Training uses a softmax cross-entropy loss function and the Adam optimizer. One Adam update step follows the equations

m = beta1 * m + (1 - beta1) * dx
v = beta2 * v + (1 - beta2) * dx**2
x = x - learning_rate * m / (sqrt(v) + eps)

The network is trained using the train_nn function, and the inference data is saved for record purposes.
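
A minimal sketch of this training setup, assuming TensorFlow 1.x and hypothetical names (fcn11 from the decoder sketch above, correct_label for a ground-truth placeholder):

# Ground-truth labels, one-hot encoded per pixel (hypothetical placeholder).
correct_label = tf.placeholder(tf.float32, [None, None, None, num_classes])

# Flatten to (pixels, classes) so the loss is computed pixel-wise.
logits = tf.reshape(fcn11, (-1, num_classes))
labels = tf.reshape(correct_label, (-1, num_classes))

cross_entropy_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cross_entropy_loss)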

3.2 Loss Function Definition

The loss function used is pixel-wise cross-entropy, which is widely used in semantic image segmentation. It examines each pixel individually, comparing the predicted class probabilities to the one-hot encoded target vector. Because cross-entropy evaluates the class predictions of each pixel vector and then averages over all pixels, every single pixel contributes equally to learning.
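
As an illustration of this per-pixel evaluation and averaging (a NumPy sketch with hypothetical array names, not the project's training code):

import numpy as np

def pixelwise_cross_entropy(probs, one_hot_labels):
    # probs: (H, W, C) predicted class probabilities for each pixel.
    # one_hot_labels: (H, W, C) one-hot target vectors for each pixel.
    per_pixel = -np.sum(one_hot_labels * np.log(probs + 1e-12), axis=-1)
    return per_pixel.mean()  # average over all pixels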

3.3 Dataset

A suitable dataset is KITTI, which consists of various road images split into training and testing sets. The training set contains road images together with mask images as labels. The KITTI dataset was recorded from a moving platform while driving in and around a particular area. It includes camera images, high-precision GPS measurements, accelerations from a combined IMU/GPS unit, and laser scans. Such datasets substantially advance computer vision research and robotic algorithms in the autonomous driving field, which makes this dataset well suited to studying semantic segmentation.

3.4 Hyperparameters

During the training process it is necessary to tune the hyperparameters, and selecting appropriate values is fundamental to an effective deep learning model. The hyperparameters chosen for semantic segmentation were the number of epochs and the batch size. An epoch is one complete pass of the algorithm over the training data, while the batch size is the number of samples propagated through the network at a time.

A batch size smaller than the total number of samples is advantageous because it needs less memory to train the network, making the whole training process more efficient. Mini-batches also train the network faster, since the parameters are updated after every propagation. For example, propagating 11 batches (ten of 100 samples and one of 50) with a parameter update after each batch yields 11 updates per epoch, versus a single update if all samples were propagated at once. Despite these advantages, mini-batches have shortcomings: the gradient fluctuates more than with the full batch, because a small batch estimates the gradient less accurately.
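
A minimal sketch of such a mini-batch training loop, assuming TensorFlow 1.x, the tensors and ops sketched above, and a hypothetical get_batches_fn generator that yields image and label batches:

for epoch in range(epochs):
    for batch_images, batch_labels in get_batches_fn(batch_size):
        # Parameters are updated once per batch, not once per epoch.
        _, batch_loss = sess.run(
            [train_op, cross_entropy_loss],
            feed_dict={image_input: batch_images,
                       correct_label: batch_labels,
                       keep_prob: 0.5})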

4. Results

The trained network produced four images, as shown in the figure below, in which green marks the road segmented from the background. The result is convincing and demonstrates the importance of semantic segmentation as a breakthrough for the computer vision field. The pre-trained VGG-16 serves primarily as the encoder, and the decoder is built from the seventh layer of the VGG-16 model onward. FCN layer 8 replaces the fully connected layer of the VGG-16 model with a 1x1 convolution.

5. Conclusion

The semantic segmentation developed here works well, but the project leaves room for improvement, and further research should be performed to refine semantic segmentation for computer vision. Better results may come from applying the skip-layer and up-sampling techniques to other appropriate networks such as ResNet and GoogLeNet. Step-by-step convolution with the VGG-16 model, combined with accurate updates to the system, can also yield strong segmentation. Skip connections should be introduced at the right stage after every convolution block, so that the subsequent block can extract more abstract, class-salient features from the previously pooled features.

6. Future Work

Future work on semantic segmentation for computer vision may involve other network models such as GoogLeNet, ResNet, and AlexNet, and will also trial fusing other layers with the final layer. Future projects will explore improved U-Net variants and dilated convolutions to produce more appealing semantic segments.

 