Image Classification with a Deep Convolutional Neural Network Using TensorFlow and Transfer Learning

Deep learning algorithms have recently achieved great success, especially in computer vision. This research describes a classification method applied to a dataset containing multiple types of images: Synthetic Aperture Radar (SAR) images and non-SAR images. The classification uses transfer learning followed by fine-tuning, based on architectures pre-trained on the well-known ImageNet database. Specifically, the VGG16 model was used as a feature extractor, and a new classifier was trained on the extracted features. The dataset consists of five classes: a SAR image class (houses) and four non-SAR image classes (cats, dogs, horses and humans). A Convolutional Neural Network (CNN) was chosen for the training process because it achieves high accuracy. The final accuracy reached 91.18% over the five classes. The results are discussed in terms of the per-class prediction accuracy, expressed as a percentage: the cats class reached 99.6% and the houses class 100%, while the remaining classes averaged 90% or above.


INTRODUCTION
Image classification has recently become a growing trend among technology developers, driven by data growth in industries such as e-commerce, automotive, healthcare and gaming. For example, Facebook can now recognize a user's face with up to 98 percent accuracy from only a few labeled images and tag it in the user's Facebook folder. The technology nearly rivals human ability in classifying or identifying images (Golob and Regan, 2001, pp.87-121).
Deep learning is one of the leading approaches in this technology. It falls under the artificial intelligence umbrella, in which machines are designed to behave or think in a human-like way. A deep learning model is usually given hundreds of input samples to make training more effective and faster, and it learns by first being "trained" on all of this input data (Patterson and Gibson, 2017). Machine learning is also commonly used for image classification; however, conventional machine learning still has parts that must be adjusted manually. For this reason, this work uses deep learning for image classification. Image recognition falls within the field of machine vision, and combining artificial intelligence software with computer vision technology can achieve outstanding image classification results (Papernot, McDaniel, Jha, Fredrikson, Celik and Swami, 2016, pp.372-387).
The basic task of image classification is to ensure that every image is assigned to its particular category or group (Li, VanRullen, Koch & Perona, 2002, pp.9596-9601).
Classification is simple for humans but has proved to be a significant problem for computers. As in object detection, unidentified patterns must be assigned to the proper categories. Image recognition technology is used in numerous applications, such as vehicle navigation, robot navigation and remote sensing. Developing such systems is difficult, particularly when resources are limited (Lai and Fox, 2010, pp.1019-1037). Image classification has long been a significant obstacle in machine vision. The challenge stems from the wide variety of multi-class images, which differ in color, size, environmental conditions and shape. Large sets of labeled training images are needed, and preparing such data takes considerable time and cost before training can even begin (Jain, Nandakumar and Ross, 2016, pp.80-105).
Transfer learning addresses the problem of obtaining enough training data to build models from scratch. It aims to transfer knowledge learned from a large dataset (the source domain) to a smaller dataset (the target domain). Even when the feature spaces of the source and target domains differ, or the source and target tasks focus on different topics, the transferred knowledge can improve performance on the target task (Huang, Pan and Lei, 2017, p.907).
A Convolutional Neural Network (CNN) is a type of artificial neural network that is designed specifically to manipulate pixel data for image recognition and processing applications (Weimer, Scholz-Reiter and Shpitalni, 2016, pp.417-420).
A CNN is an efficient image-processing form of Artificial Intelligence (AI) that uses deep learning to perform both generative and descriptive tasks, often in machine vision applications such as image and video recognition (Hosny, Parmar, Quackenbush, Schwartz and Aerts, 2018, pp.500-510).
A CNN is a form of multilayer perceptron designed to reduce processing requirements. Its layers consist of an input layer, an output layer and hidden layers, which include convolutional layers, pooling layers, fully connected layers and normalization layers. Removing limitations and increasing image-processing capacity yields a method that is much more efficient and simpler to train, although it is specialized for image processing (Acharya, Oh, Hagiwara, Tan, Adam, Gertych and San Tan, 2017, pp.389-396). This paper combines deep convolutional neural networks and transfer learning to classify SAR and non-SAR images with high accuracy in a short time.
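To make the convolutional and pooling layers listed above concrete, the following minimal NumPy sketch shows what a single 3x3 convolution filter, a ReLU activation and a 2x2 max-pooling step compute on a toy single-channel image. It is an illustration only, assuming one hand-picked kernel; in a real CNN (including the VGG16 network used later in this paper) the kernels are learned, span many channels, and are computed by TensorFlow's optimized layers. The function names are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one 2-D kernel over a single-channel image (stride 1, no padding).

    Like TensorFlow's Conv2D, this computes cross-correlation (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Element-wise ReLU activation: negative responses become 0."""
    return np.maximum(x, 0.0)


def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))


# Toy 6x6 image: bright left half, dark right half (a vertical edge in the middle)
image = np.array([[1, 1, 1, 0, 0, 0]] * 6, dtype=float)
# Hand-picked vertical-edge kernel (a trained CNN would learn its kernels instead)
kernel = np.array([[1, 0, -1]] * 3, dtype=float)

feature_map = relu(conv2d_valid(image, kernel))  # 4x4, strongest where the edge is
pooled = max_pool2d(feature_map)                  # 2x2 summary of the feature map
print(feature_map)
print(pooled)
```

Weight sharing (one small kernel reused at every position) and pooling (which halves the spatial size here) are the main ways a CNN reduces processing needs compared with a fully connected multilayer perceptron.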

RELATED WORK
This section briefly reviews previous studies on image classification, SAR target recognition with CNNs, and transfer learning with CNNs.

Image classification:
Kamavisdar, Saluja and Agrawal (2013) describe the Decision Tree (DT) as an image classification technique. In a DT, the data are divided into several subsets at each level of the hierarchical classification, and the membership of each group must be calculated. At intermediate stages, the classifier allows some classes to be rejected. The method requires three steps: first, finding the terminal nodes; second, assigning classes to them; and third, partitioning the nodes. The approach is reported to be very simple, with a high efficiency rate (pp.1005-1009).
Pasolli, Melgani, Tuia, Pacifici and Emery (2013) address active learning with Support Vector Machines (SVMs), a topic of strongly growing interest at the time. They propose new ways of integrating spatial information with spectral information in the sample-selection process of active learning, using three criteria: the first is based on the Euclidean distance, in the spatial domain, between candidate samples and the training samples; the second is based on the Parzen window technique; and the third involves spatial entropy. The results on two very high-resolution images indicate consistent effectiveness (pp.2217-2233).
Gregor, Danihelka, Graves, Rezende and Wierstra (2015) present a neural network architecture for image generation. The system combines a spatial attention mechanism that mimics the foveation of the human eye with a sequential variational auto-encoding framework, which allows complex images to be constructed step by step. It improves on the state of the art for generative models on the Modified National Institute of Standards and Technology (MNIST) database, an openly available handwritten-digit database widely used as a training set. In experiments on the Street View House Numbers dataset, the generated images cannot be distinguished from real data by the naked eye (arXiv:1502.04623).
Rastegari, Ordonez, Redmon and Farhadi (2016) illustrate an image classification system based on the structure of a CNN. Training used a balanced number of face and non-face images, obtained by deriving additional face images from the existing face image data. The face classification method uses a CNN with 120 training data, and the auto-stage training produces a detection rate of 81.6% with just six false positives on the Face Detection Data Set and Benchmark (FDDB), whereas the existing state of the art reaches a detection rate of about 80% with 50 false positives (pp.525-542).
Korytkowski, Rutkowski and Scherer (2016) present fast image classification by boosting fuzzy classifiers, which offers an easy way to distinguish between known and unknown groups. The method boosts meta-knowledge about where most of the local features can be found. It was tested on large image datasets and compared with the bag-of-features image model. The results showed much higher classification accuracy, and the process completed in a short time, about 30% shorter than the previous approach (pp.175-182).
Ahmed and Mahmoud (2018) use a previously trained convolutional neural network model, configured with certain parameters and trained on hundreds of images, to predict gender from images using probability measurements. The accuracy and precision equal 0.68 and 0.3225, respectively (pp.1717-1732).
Abbas (2020) proposes a new method based on edge detection, one of the basic techniques used in many fields, including radar imaging, where it helps to view objects such as moving vehicles, ships, aircraft, and meteorological and terrain types. Accurate recognition of these objects requires detecting their edges. The proposed method combines the Ridgelet transform, a Bezier curve and the Sobel operator; the Ridgelet transform gave better results than the wavelet transform because it removes noise through soft thresholding before edge detection. The findings show that the proposed approach outperforms both Sobel edge detection and the wavelet-based process in subjective and objective evaluations, with Peak Signal-to-Noise Ratio (PSNR) values increased to 12.6542, 12.9514, 12.8574 and 12.3013, respectively (pp.185-192).

SAR Target Recognition with CNNs:
Chen and Wang (2014) show that a single convolutional layer, trained with unsupervised learning on randomly sampled SAR target patches, could successfully extract the features of SAR targets, reaching 84.7% accuracy in a 10-class classification task (pp.541-547). Morgan (2015) uses an architecture of three convolutional layers with a fully connected softmax layer as the classifier, improving the accuracy to 92.3% (p.94750).
Chen, Wang, Xu and Jin (2016) use a five-layer all-convolutional network. The authors applied dropout in a convolutional layer and removed the fully connected layers to prevent overfitting, because the limited training data were inadequate to train a conventional CNN. Their experiments showed state-of-the-art SAR target recognition performance on the MSTAR dataset, with an accuracy of 99.13% (pp.4806-4817).

Transfer learning using CNNs:
Oquab, Bottou, Laptev and Sivic (2014) show on the PASCAL VOC dataset that layers trained on ImageNet can be reused to effectively extract mid-level image features (pp.1717-1724).
Shin, Roth, Gao, Lu, Xu, Nogues, Yao, Mollura and Summers (2016) address medical image analysis, a domain that also suffers from scarce data. They show that transfer learning from appropriately annotated natural images is an efficient way to apply CNNs to medical images. By fine-tuning CNN models pre-trained on a natural image dataset, they addressed two specific computer-aided detection problems in medical images. They further examined various common CNN architectures and dataset sizes, concluding that the trade-off between better learning models and the use of more training data should be considered carefully. One of these problems involved classifying lung tissue patterns (pp.1285-1298).
Christodoulidis, Anthimopoulos, Ebner, Christe and Mougiakakou (2016) show that pre-training a network on six general texture databases and fine-tuning it on the target database, after transferring a particular number of layers, gives a 2% performance improvement over the same network trained only on the target data (pp.76-84).

TRAINING IMAGES
As shown in Table (1), the dataset contains 908 images in total. Figure (1) shows the framework of the image classification system, in which deep convolutional neural networks are applied. The whole cycle has four stages, all implemented with TensorFlow, an open-source library, using Python as the programming language. First, the images of each class are collected as inputs. Next, the pre-trained Visual Geometry Group model (VGG16) is downloaded to serve as a feature extractor, and a CNN is trained as a new classifier on the extracted features. Finally, all images are categorized into their classes.

IMAGE CLASSIFICATION SYSTEM STEPS
Figure (2) shows a flowchart of the image classification system. The system starts by collecting the images of the classes as input. There are five classes (cats, dogs, horses, humans and houses), labeled 0, 1, 2, 3 and 4, respectively. Specifying labels makes the learning supervised, which makes it possible to check whether a classification is correct. Next, the pre-trained model is downloaded for transfer learning and serves as a feature extractor. In the following step, a CNN is trained on the features extracted in the previous step, producing a trained model that is then run for validation or testing. The process ends when the output is classified into the correct class, with the CNN acting as a classifier that assigns the tested image to a predicted class; if the prediction does not match the image's actual class, the process starts over from the CNN step. Testing another image likewise starts again from the CNN step.
The result obtained from the classifier in this flowchart is a set of five probability values, one for each of the labels 0, 1, 2, 3 and 4, and the label with the highest probability is taken as the predicted class of the tested image. Comparing the label assigned to the image in the training step with the predicted label determines whether the prediction is correct.
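The labeling and decision rule described above can be sketched in plain Python as follows. The class-to-label mapping follows the text (cats 0, dogs 1, horses 2, humans 3, houses 4); the helper names and the probability vector in the example are hypothetical, standing in for the output of the trained classifier.

```python
# Label mapping as given in the text
CLASS_LABELS = {"cats": 0, "dogs": 1, "horses": 2, "humans": 3, "houses": 4}
LABEL_NAMES = {label: name for name, label in CLASS_LABELS.items()}
NUM_CLASSES = len(CLASS_LABELS)


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer label as a one-hot vector (the form used for categorical training)."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def predict_label(probabilities):
    """Return the label whose probability is highest (arg-max over the five outputs)."""
    if len(probabilities) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} probabilities, got {len(probabilities)}")
    return max(range(NUM_CLASSES), key=lambda i: probabilities[i])


def is_correct(probabilities, true_class):
    """Compare the predicted label with the label assigned to the image in training."""
    return predict_label(probabilities) == CLASS_LABELS[true_class]


# Hypothetical classifier output for one tested house image
probs = [0.02, 0.01, 0.03, 0.04, 0.90]
print(LABEL_NAMES[predict_label(probs)], is_correct(probs, "houses"))  # houses True
```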

PERFORMANCE MEASURES
The performance of the multi-class image classification system can be evaluated with four measures: Accuracy, F1 score, Precision and Recall.
a) Accuracy is used to verify the experiment. It is the number of correct predictions (both true positives and true negatives) divided by the total number of samples examined, expressed as a percentage, as shown in Equation (1) (Baratloo, Hosseini, Negida and El Ashal, 2015, pp. 48-49).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (1)$$
b) The F1 score is a metric for measuring the performance of a machine learning model, and it is a good metric when the data are imbalanced. It is the harmonic mean of precision and recall, computed as shown in Equation (2):
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (2)$$
c) Precision is computed by the following equation:
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (3)$$
d) Recall is computed by the following equation:
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (4)$$
where:
• TP (true positives) is the number of images of a given class that are correctly identified as that class.
• TN (true negatives) is the number of images outside a given class that are correctly rejected for that class.
• FP (false positives) is the number of images incorrectly identified as belonging to a given class.
• FN (false negatives) is the number of images of a given class that are incorrectly rejected for that class.
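Equations (1)-(4) can be computed per class from a confusion matrix by treating each class as "positive" and the other classes as "negative" (one-vs-rest). The plain-Python sketch below does that; it also computes the overall multi-class accuracy (correct predictions over all images), which is the usual meaning of a single accuracy figure such as the 91.18% reported here, although the text does not state exactly how that figure was computed. The 3-class matrix in the example is hypothetical and kept small so the numbers are easy to check by hand; the same code works unchanged for the five classes of this paper.

```python
def per_class_metrics(cm):
    """Per-class TP/TN/FP/FN and Equations (1)-(4) from a confusion matrix.

    cm[i][j] counts images whose true class is i and predicted class is j.
    Accuracy is returned as a fraction; multiply by 100 for a percentage.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true class k, predicted differently
        tn = total - tp - fp - fn                  # neither true nor predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"tp": tp, "tn": tn, "fp": fp, "fn": fn,
                        "accuracy": (tp + tn) / total,
                        "precision": precision, "recall": recall, "f1": f1})
    return results


def overall_accuracy(cm):
    """Multi-class accuracy: correctly classified images / all images."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Hypothetical 3-class confusion matrix (rows = true class, columns = predicted)
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 0, 10]]
for k, m in enumerate(per_class_metrics(cm)):
    print(k, f"P={m['precision']:.3f} R={m['recall']:.3f} F1={m['f1']:.3f}")
print(f"overall accuracy = {overall_accuracy(cm):.3f}")  # 0.833
```

Note that the one-vs-rest accuracy of Equation (1) is usually higher than the overall accuracy, because every correctly rejected image counts as a true negative for each class.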

Loading and Preprocessing the Dataset.
As shown in Figure (3), we built a dataset of five classes. The first four classes (cats, dogs, horses and humans) represent non-SAR images downloaded from the Kaggle site; the last class, houses, represents SAR images downloaded from earthexplorer.usgs.gov. The SAR images were preprocessed by selecting the desired object, the house.
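The SAR preprocessing step (cutting the house out of a larger scene) and the fixed 224x224 RGB input size can be sketched with NumPy as follows. This is a sketch under stated assumptions: the SAR scene is assumed to be a single-channel array, the bounding box is a hypothetical hand-chosen region, and nearest-neighbour resizing stands in for whatever resizing routine was actually used (the text does not say). Replicating the single channel three times is one simple way to feed grayscale data to an RGB network.

```python
import numpy as np

TARGET_SIZE = 224  # input size used in the text (224x224 RGB)


def crop_object(scene, box):
    """Cut the region box = (top, left, bottom, right) out of a SAR scene."""
    top, left, bottom, right = box
    return scene[top:bottom, left:right]


def resize_nearest(img, size=TARGET_SIZE):
    """Nearest-neighbour resize to size x size; works for (H, W) and (H, W, C) arrays."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row index for each output row
    cols = np.arange(size) * w // size  # source column index for each output column
    return img[rows][:, cols]


def to_rgb(img):
    """Repeat a single-channel image into 3 identical channels for an RGB network."""
    return np.stack([img] * 3, axis=-1) if img.ndim == 2 else img


# Hypothetical single-channel SAR scene and a hand-picked box around one house
scene = np.random.default_rng(0).random((1000, 1200)).astype(np.float32)
house = crop_object(scene, (100, 200, 300, 450))  # 200 x 250 pixels
sample = to_rgb(resize_nearest(house))
print(house.shape, sample.shape)  # (200, 250) (224, 224, 3)
```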

Figure (3): The CNN process applied to the images of each class.
The first step is to gather the data, which is the most complicated and troublesome part of the work. The inputs are fixed-size RGB images with dimensions of 224x224. The convolution stage is built with VGG16, as it provides a powerful convolutional neural network. Fortunately, the Kaggle images are already labeled and can be downloaded easily.
After all the necessary libraries are imported, the image dimensions are defined and the images are located. The images are set to 224x224, the standard input size of VGG16. A script then converts the pixels of every image into their corresponding numerical values and stores them in a file at a specified path. The time this takes depends on the number of classes and the number of images per class. Finally, the number of epochs and the batch size are defined.
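Converting the pixels to numbers once and storing them in a file, as described above, is essentially a cache. A minimal sketch, assuming NumPy's compressed .npz format as the storage file (the text does not name the format) and a hypothetical build_fn that reads and converts the images; the epoch and batch-size values are placeholders, since the text does not report them.

```python
import os
import tempfile

import numpy as np

EPOCHS = 10      # placeholder; the text does not state the value used
BATCH_SIZE = 32  # placeholder


def load_or_build_cache(cache_path, build_fn):
    """Return (images, labels), converting the images only if no cache file exists yet.

    cache_path should end in ".npz" (np.savez_compressed appends it otherwise).
    """
    if os.path.exists(cache_path):
        with np.load(cache_path) as data:
            return data["images"], data["labels"]
    images, labels = build_fn()  # slow step: read files and convert pixels to numbers
    np.savez_compressed(cache_path, images=images, labels=labels)
    return images, labels


calls = []


def fake_build():
    """Hypothetical stand-in for reading and converting the real image files."""
    calls.append(1)
    rng = np.random.default_rng(0)
    images = rng.integers(0, 256, size=(4, 224, 224, 3), dtype=np.uint8)
    labels = np.array([0, 1, 2, 4])
    return images, labels


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_pixels.npz")
    x_first, y_first = load_or_build_cache(path, fake_build)  # builds and saves
    x_again, y_again = load_or_build_cache(path, fake_build)  # loads from the file
print(len(calls), x_again.shape)  # 1 (4, 224, 224, 3)
```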

Downloading the Pre-trained Model in Keras (VGG16).
Transfer learning is useful because it provides ready-made, pre-trained neural networks. This step imports the transfer learning component of the CNN. Many transfer learning models are available, such as VGG16, VGG19 and ResNet50. This paper uses VGG16, which has 13 convolutional layers and 3 fully connected layers (16 weight layers in total).
Because this image classifier uses only five classes, applying transfer learning to the images did not take long; however, more images and classes increase the time needed to complete this step. Fortunately, the image pixels only need to be converted to numbers once, so this step is performed only once for each of the training, validation and test sets, unless an image is removed or corrupted. The resulting files are then saved, and the features are generated using VGG16 and its pre-trained weights.
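In Keras, downloading VGG16 and using it as a fixed feature extractor can look like the sketch below. tf.keras.applications.VGG16 and tf.keras.applications.vgg16.preprocess_input are real Keras APIs; the wrapper function names, the batch size and the choice to freeze the whole convolutional base are assumptions for illustration. With include_top=False and 224x224 inputs, each image becomes a 7x7x512 feature tensor on which the new classifier is trained, and these features only need to be computed and saved once.

```python
import numpy as np
import tensorflow as tf


def build_feature_extractor(weights="imagenet"):
    """VGG16 convolutional base without its 1000-class ImageNet classifier.

    weights="imagenet" downloads the pre-trained weights on first use;
    weights=None builds the same architecture with random weights (useful offline).
    """
    base = tf.keras.applications.VGG16(
        weights=weights,
        include_top=False,          # drop VGG16's 3 fully connected layers
        input_shape=(224, 224, 3),
    )
    base.trainable = False          # keep the transferred weights fixed
    return base


def extract_features(extractor, images, batch_size=32):
    """images: array of shape (N, 224, 224, 3) with RGB values in [0, 255]."""
    # VGG16's own preprocessing (RGB -> BGR and ImageNet mean subtraction)
    x = tf.keras.applications.vgg16.preprocess_input(np.array(images, dtype="float32"))
    return extractor.predict(x, batch_size=batch_size, verbose=0)  # (N, 7, 7, 512)
```

For example, extract_features(build_feature_extractor(), train_images) would produce the feature array to save alongside the labels; the extractor keeps VGG16's 13 convolutional layers, while the removed fully connected layers are replaced by the new classifier.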

Designing and Training a CNN Model in Keras.
As shown in Figure (4), after the model is created, the first step is to initialize it with the Sequential function. The data are then flattened, and three more hidden layers are added. Because this is a classification with categorical labels, the final activation is softmax. Finally, an evaluation phase is built to test the model's accuracy on the training set against the validation set. As shown in Figure (
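A Keras sketch of the classifier described above: a Sequential model that flattens the VGG16 features, passes them through three hidden Dense layers and ends with a five-way softmax. The layer widths, ReLU activations, Adam optimizer and categorical cross-entropy loss are assumptions (the text only specifies Sequential, flattening, three hidden layers and softmax); the labels must be one-hot encoded to match that loss.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 5  # cats, dogs, horses, humans, houses


def build_classifier(feature_shape=(7, 7, 512), hidden_units=(256, 128, 64)):
    """Flatten -> three hidden Dense layers -> softmax over the five classes."""
    model = tf.keras.Sequential(
        [tf.keras.Input(shape=feature_shape), tf.keras.layers.Flatten()]
        + [tf.keras.layers.Dense(units, activation="relu") for units in hidden_units]
        + [tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")]
    )
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # expects one-hot labels
                  metrics=["accuracy"])
    return model


def train_classifier(model, x_train, y_train, x_val, y_val, epochs=10, batch_size=32):
    """Fit on the training features, reporting validation accuracy after each epoch."""
    return model.fit(np.asarray(x_train, dtype="float32"), y_train,
                     validation_data=(np.asarray(x_val, dtype="float32"), y_val),
                     epochs=epochs, batch_size=batch_size, verbose=0)
```

The History object returned by train_classifier holds history.history["accuracy"], ["loss"], ["val_accuracy"] and ["val_loss"], which are the per-epoch values plotted in the next step.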

Plotting the Loss and Accuracy Curves.
The accuracy curve is very important for determining the best number of training epochs for the dataset to obtain high accuracy, instead of guessing an arbitrary number each time. For example, in this model an arbitrary number of epochs was assumed for the first training run, and the accuracy curve was then extracted. The point of intersection of the accuracy and loss curves indicates the best number of epochs for the next training session. In this way, the best weights and high accuracy are obtained, giving the classification system correct predictions in most tests. As the graphs in Figure (7) show, the range between epochs 2.5 and 5.0 was the strongest, as it contains the intersection point of the accuracy and loss curves.
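The epoch-selection heuristic described above (take the point where the accuracy and loss curves intersect) can be located numerically rather than by eye. A minimal sketch, assuming per-epoch lists such as Keras's history.history["accuracy"] and history.history["loss"]; the values in the example are hypothetical, and linear interpolation between epochs is an assumption about how to read the crossing point.

```python
import math


def crossing_epoch(accuracy, loss):
    """First (fractional) epoch at which the accuracy and loss curves intersect.

    Epochs are numbered from 1. Between consecutive epochs the curves are treated
    as straight lines. Returns None if the curves never meet.
    """
    diffs = [a - l for a, l in zip(accuracy, loss)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return float(i + 1)            # the curves touch exactly at this epoch
        if d0 * d1 < 0:                    # sign change: crossing between i+1 and i+2
            return (i + 1) + d0 / (d0 - d1)
    if diffs and diffs[-1] == 0:
        return float(len(diffs))
    return None


# Hypothetical per-epoch values, e.g. copied from history.history
accuracy = [0.40, 0.60, 0.80, 0.88, 0.90]
loss = [1.20, 0.90, 0.50, 0.35, 0.30]
epoch = crossing_epoch(accuracy, loss)
print(f"curves cross at epoch {epoch:.2f}; next run could use {math.ceil(epoch)} epochs")
```

Because accuracy and loss are measured on different scales, the crossing point depends on the scale of the loss values, so it is best treated as a starting point for the next training run.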

Plotting the confusion matrix for the result.
There are two useful ways to see how well the system predicts or classifies. The first is the classification metrics report, shown in Figure (8); the second is the confusion matrix, shown in Figure (9). Precision and F1-score are the main factors here, and a higher score indicates a better model.
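A confusion matrix like the one in Figure (9) can be built directly from the true and predicted labels. The plain-Python sketch below uses the label order from the text; the label lists in the example are hypothetical. In practice, scikit-learn's sklearn.metrics.confusion_matrix and classification_report produce the same matrix and a precision/recall/F1 report similar in content to Figure (8), though the text does not say which tool produced the figures.

```python
CLASS_NAMES = ["cats", "dogs", "horses", "humans", "houses"]  # labels 0-4 as in the text


def confusion_matrix(y_true, y_pred, num_classes=len(CLASS_NAMES)):
    """cm[t][p] counts images with true label t that were predicted as label p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def format_matrix(cm, names=CLASS_NAMES):
    """Render the matrix as a text table: rows = true class, columns = predicted class."""
    width = max(len(name) for name in names) + 2
    lines = [" " * width + "".join(f"{name:>{width}}" for name in names)]
    for name, row in zip(names, cm):
        lines.append(f"{name:<{width}}" + "".join(f"{count:>{width}}" for count in row))
    return "\n".join(lines)


# Hypothetical labels for seven test images (one cat image predicted as a dog)
y_true = [0, 0, 1, 2, 3, 4, 4]
y_pred = [0, 1, 1, 2, 3, 4, 4]
cm = confusion_matrix(y_true, y_pred)
print(format_matrix(cm))
```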

Saving and Loading the Trained weights and model.
This step stores the trained weights and the trained model on the device, so that new images can be predicted later without training from scratch, which saves effort and shortens testing time.
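In Keras, keeping the trained model on disk and reloading it for later predictions can be done with model.save and tf.keras.models.load_model, as in the sketch below. The file name and the tiny stand-in model are hypothetical, and the single-file .keras format is assumed (it is the format expected by recent Keras versions).

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def save_trained_model(model, path):
    """Store the architecture and trained weights so prediction needs no retraining."""
    model.save(path)


def load_trained_model(path):
    """Restore a model saved with save_trained_model, ready for model.predict."""
    return tf.keras.models.load_model(path)


# Tiny stand-in for the trained classifier (hypothetical shapes)
model = tf.keras.Sequential([tf.keras.Input(shape=(8,)),
                             tf.keras.layers.Dense(5, activation="softmax")])
x = np.random.default_rng(0).random((3, 8)).astype("float32")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "five_class_classifier.keras")
    save_trained_model(model, path)
    restored = load_trained_model(path)
    same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)  # True
```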

DISCUSSION
The result of this paper depends on the goal to be accomplished. A convolutional neural network (CNN) was used to evaluate the accuracy of image classification. The result was checked by classifying five class types, and the image classification system implemented with a CNN showed high accuracy. This is due to the amount of data used: four classes configured from images downloaded from Kaggle, and one class of SAR images (houses) that we prepared ourselves because of their limited availability. Preparing the house class started with capturing the images and then extracting the desired objects so that they could be used as samples for training, validation and testing. The CNN performed very well because there was a large amount of data. In addition, it was able to predict images captured by two different types of remote sensing sensors: passive sensors (such as a photographic camera, e.g. the cats class) and active sensors (such as SAR, e.g. the houses class), and both achieved high prediction accuracy. If the system were tested with images that were not used during training and do not belong to any of the categories in the dataset, classification errors would occur, because a trained system cannot recognize classes it was never trained on. Finally, a smaller model trains faster, but its accuracy may be slightly lower than that of a larger model. The model used in this paper, shown in Figure (5), trained in 15.74 seconds and reached an accuracy of 91.18%, so it succeeds in both training speed and accuracy.

CONCLUSIONS
In conclusion, this research addresses image classification and overcomes the difficulties of training a deep CNN on a dataset with limited SAR images by using the pre-trained VGG16 model for transfer learning as a feature extractor, which is an effective solution for anyone who needs to make predictions from a limited dataset, and then using a CNN as the classifier. This goal has been achieved, and all of the collected tests showed very promising results. The CNN approach was applied in detail, from collecting the training data to classifying the images. Monitoring the CNN training epochs made it possible to track accuracy and avoid problems such as overfitting. Implementing the CNN with TensorFlow also gave good results, as it can build, train and classify with the model until it becomes a trained model. This method saves considerable effort and time in image classification compared with building a Convolutional Neural Network (CNN) from scratch, which requires extensive tuning and testing.