This blog post is the second part of the series; in the previous article, the Fashion MNIST dataset was described and we wrote the data loader functions. In this article, we will focus on the Python implementation of a fully connected neural network model using TensorFlow. As discussed in the previous post, the Fashion MNIST dataset consists of 10 classes, just like the digit MNIST dataset.
So, let's start by defining a Python file, "config.py", to assign values to the parameters of the neural network.
```python
# number of nodes in input layer
n_input = 784

# number of nodes in 1st hidden layer
n_hidden1 = 128

# number of nodes in 2nd hidden layer
n_hidden2 = 128

# number of nodes in output layer
n_class = 10

# number of epochs to run
n_epoch = 20

# learning rate
learning_rate = 0.001

# training batch size
batch_size = 64
```
Now, we will define various functions to perform the following tasks.
- Building Neural Network (NN) Model
- Defining loss function
- Creating Optimizer
- One hot encoder
- Train, Validate & Test Model
1. Building Dense Neural Network
We will build a dense neural network with two hidden layers (784 input nodes → 128 → 128 → 10 output nodes). The output layer is a dense layer of 10 nodes (as there are 10 classes) with softmax activation. The architecture is depicted in the figure below; the figure is only for illustration, and the actual configuration, such as the number of nodes and output classes, can be seen in 'config.py'.
Let us write a Python function using TensorFlow to build the model. This also computes the forward propagation step.
```python
import numpy as np
import tensorflow as tf

from dataset import fashion_MNIST
from util import config


def model(batch_x):
    """
    Defines the learned variables, the weights and biases, and constructs
    the neural network. Variables named ``hn`` (``n`` an integer) hold the
    learned weights; variables named ``bn`` hold the learned biases.
    """
    # 1st hidden layer: n_input -> n_hidden1, ReLU activation
    b1 = tf.get_variable("b1", [config.n_hidden1], initializer=tf.zeros_initializer())
    h1 = tf.get_variable("h1", [config.n_input, config.n_hidden1],
                         initializer=tf.contrib.layers.xavier_initializer())
    layer1 = tf.nn.relu(tf.add(tf.matmul(batch_x, h1), b1))

    # 2nd hidden layer: n_hidden1 -> n_hidden2, ReLU activation
    b2 = tf.get_variable("b2", [config.n_hidden2], initializer=tf.zeros_initializer())
    h2 = tf.get_variable("h2", [config.n_hidden1, config.n_hidden2],
                         initializer=tf.contrib.layers.xavier_initializer())
    layer2 = tf.nn.relu(tf.add(tf.matmul(layer1, h2), b2))

    # output layer: n_hidden2 -> n_class, raw logits (softmax is applied in the loss)
    b3 = tf.get_variable("b3", [config.n_class], initializer=tf.zeros_initializer())
    h3 = tf.get_variable("h3", [config.n_hidden2, config.n_class],
                         initializer=tf.contrib.layers.xavier_initializer())
    layer3 = tf.add(tf.matmul(layer2, h3), b3)

    return layer3
```
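As a quick sanity check (a minimal sketch meant to be run on its own in a fresh graph, since `model()` creates variables with fixed names; the placeholder name here is just for illustration), the forward pass should produce one logit per class for each image in the batch:

```python
# minimal sketch (TF 1.x): verify the output shape of the forward pass
X = tf.placeholder(tf.float32, [None, config.n_input], name="X_shape_check")
logits = model(X)
print(logits.get_shape().as_list())  # -> [None, 10]: one raw score (logit) per class
```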
2. Defining Loss Function
We will define a loss function for the error in this discrete classification task. It computes the softmax cross-entropy between the logits and the labels, i.e. the probability error between the true labels and the predicted class probabilities.
The equation defining the multi-class cross-entropy loss (also called the multinomial logistic loss) is given below.
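$$ L(y, \hat{y}) = -\sum_{c=1}^{C} y_c \,\log(\hat{y}_c), \qquad \hat{y}_c = \mathrm{softmax}(z)_c = \frac{e^{z_c}}{\sum_{j=1}^{C} e^{z_j}} $$

Here, $y$ is the one-hot encoded true label, $z$ is the vector of logits produced by the output layer, and $C$ is the number of classes (10 in our case).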
It can be implemented in TensorFlow in two steps:

1. Compute the softmax function:

```python
y_softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis=1, keepdims=True)
```

2. Compute the cross-entropy loss, summed over all classes:

```python
loss = -tf.reduce_sum(y_true * tf.log(y_softmax), 1)
```
The above multi-class cross-entropy loss can be computed in TensorFlow with the single function call tf.nn.softmax_cross_entropy_with_logits_v2, which is also numerically more stable.
```python
def compute_loss(predicted, actual):
    """
    Computes the cross-entropy log loss over the output nodes/classes
    and returns the mean loss over the batch.
    """
    total_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=predicted, labels=actual)
    avg_loss = tf.reduce_mean(total_loss)
    return avg_loss
```
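As a quick sanity check (a minimal, self-contained sketch assuming TF 1.x; the example logits and label below are made up for illustration), the manual two-step computation and the fused op give the same value:

```python
import tensorflow as tf

# made-up example: one sample, three classes, true class is index 0
logits = tf.constant([[2.0, 1.0, 0.1]])
y_true = tf.constant([[1.0, 0.0, 0.0]])

# manual two-step computation
y_softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis=1, keepdims=True)
manual_loss = -tf.reduce_sum(y_true * tf.log(y_softmax), 1)

# fused, numerically stabler op
fused_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y_true)

with tf.Session() as sess:
    print(sess.run([manual_loss, fused_loss]))  # both approximately [0.417]
```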
3. Creating Optimizer
Here, we define a function to create an optimizer of our choice. You can explore the various optimization algorithms that compute weight gradients and apply them in order to minimize the loss. A few of them are Gradient Descent, Adam, Adagrad, Adadelta, Nesterov Adam (Nadam), and RMSProp.
```python
def create_optimizer():
    """
    We will use the Adam method for optimization (http://arxiv.org/abs/1412.6980)
    because, generally, it requires less fine-tuning.
    """
    optimizer = tf.train.AdamOptimizer(learning_rate=config.learning_rate)
    return optimizer
```
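A minimal usage sketch (assuming avg_loss is the tensor returned by compute_loss() above): the optimizer is turned into a training op by calling minimize(), which both computes and applies the gradients. This is exactly how it is wired up later in the train() function.

```python
# avg_loss is assumed to be the loss tensor built from the model's logits
train_op = create_optimizer().minimize(avg_loss)
# running train_op in a session performs one gradient update on all trainable variables
```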
4. One hot encoding the labels
Here we restructure the labels so they can be fed to the output layer of the neural network model. The labels come as a 1-D array containing the class index of each object (10 objects, indices 0-9), while the output layer of the model has 10 nodes, so the labels need to be passed in vector form. For example, the image label "Shirt" (index 6) in one-hot encoded form is
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```python
def one_hot(n_class, Y):
    """
    Returns one-hot encoded labels to train the output layer of the NN model.
    """
    return np.eye(n_class)[Y]
```
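For instance (a minimal usage sketch, assuming NumPy is imported as np as in the model file), encoding the "Shirt" label from above:

```python
import numpy as np

# encode the label "Shirt" (class index 6) among 10 classes
print(one_hot(10, np.array([6])))
# -> [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
```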
5. Train, Test & Validate the model
After defining all the supporting functions, we will now train the network, evaluating the loss and accuracy on the training and validation sets after each epoch. Finally, we evaluate the accuracy on the test set.
Comments are provided on each line of the code snippet below for clarity. It is important to understand the following parts of the code.
1. Running the computation graph inside the TensorFlow session.
2. Dividing the training data into batches.
3. Evaluating accuracy.
I would suggest going back to Part 1 of this blog post for an understanding of how TensorFlow works.
```python
def train(X_train, X_val, X_test, y_train, y_val, y_test, verbose=False):
    """
    Trains the network and finally evaluates it on the test data.
    """
    # creating placeholders for the image data and its labels
    X = tf.placeholder(tf.float32, [None, 784], name="X")
    Y = tf.placeholder(tf.float32, [None, 10], name="Y")

    # forward pass on the model
    logits = model(X)

    # computing softmax cross-entropy loss with logits
    avg_loss = compute_loss(logits, Y)

    # create the Adam optimizer, compute the gradients and apply them (minimize())
    optimizer = create_optimizer().minimize(avg_loss)

    # compute validation loss
    validation_loss = compute_loss(logits, Y)

    # evaluating accuracy on the various (train, val, test) sets
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    # initialize all the global variables
    init = tf.global_variables_initializer()

    # starting a session to actually execute the computation graph
    with tf.Session() as sess:
        # all the global variables hold actual values now
        sess.run(init)

        # looping over the number of epochs
        for epoch in range(config.n_epoch):
            epoch_loss = 0.

            # calculate the number of batches in the dataset
            num_batches = np.round(X_train.shape[0] / config.batch_size).astype(int)

            # looping over batches of the dataset
            for i in range(num_batches):
                # selecting batch data
                batch_X = X_train[(i * config.batch_size):((i + 1) * config.batch_size), :]
                batch_y = y_train[(i * config.batch_size):((i + 1) * config.batch_size), :]

                # executing the dataflow computational graph of the nodes optimizer, avg_loss
                _, batch_loss = sess.run([optimizer, avg_loss],
                                         feed_dict={X: batch_X, Y: batch_y})

                # summing up batch losses for the whole epoch
                epoch_loss += batch_loss

            # average epoch loss
            epoch_loss = epoch_loss / num_batches

            # compute validation loss
            val_loss = sess.run(validation_loss, feed_dict={X: X_val, Y: y_val})

            # display (train_loss, train_accuracy, valid_loss, valid_accuracy) for the epoch
            if verbose:
                print("epoch:{epoch_num}, train_loss: {train_loss}, train_accuracy: {train_acc}, val_loss: {valid_loss}, val_accuracy: {val_acc}".format(
                    epoch_num=epoch,
                    train_loss=round(epoch_loss, 3),
                    train_acc=round(float(accuracy.eval({X: X_train, Y: y_train})), 2),
                    valid_loss=round(float(val_loss), 3),
                    val_acc=round(float(accuracy.eval({X: X_val, Y: y_val})), 2)))

        # calculate final accuracy on never-seen test data
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: y_test}))
```
The Main Function
This is where we build the pipeline of loading the dataset, visualizing the images, and training and evaluating the model. The major steps are:
1. Instantiating the dataset class
2. Loading the dataset
3. Visualizing a few samples
4. One-hot encoding the labels
5. Training, validating and testing the NN model
```python
def main(_):
    # instantiating the dataset class
    fashion_mnist = fashion_MNIST.Dataset(data_download_path='../data/fashion',
                                          validation_flag=True, verbose=True)

    # loading the fashion MNIST data
    X_train, X_val, X_test, Y_train, Y_val, Y_test = fashion_mnist.load_data()

    # showing a few example images from the dataset in a 2D grid
    fashion_mnist.show_samples_in_grid(w=10, h=10)

    # one-hot encoding of the labels for output layer training
    y_train = one_hot(config.n_class, Y_train)
    y_val = one_hot(config.n_class, Y_val)
    y_test = one_hot(config.n_class, Y_test)

    # let's train and evaluate the fully connected NN model
    train(X_train, X_val, X_test, y_train, y_val, y_test, True)


if __name__ == '__main__':
    tf.app.run(main)
```
The results of the execution are shown in the screenshot below.
Final Thoughts
I hope it was easy to follow this series of tutorials on building a TensorFlow model from scratch. If you are completely new to neural networks, it may be a little burdensome to follow; I would suggest first going through the basics of neural networks from the abundant material available online. Before signing off, a few more thoughts on the post:
- A dense neural network with two hidden layers is employed for demonstration purposes. I would encourage readers to implement deeper models.
- The model was run on a PC with an Nvidia GeForce GTX 1080 Ti GPU (11 GB) and 32 GB RAM. It took hardly 30 seconds to train the NN model.
- For most computer vision applications, CNN architectures are very popular. I am looking forward to experimenting with these architectures on the same task, and I would encourage readers to try them out as well.
- Exploring other approaches like transfer learning may produce better results. Readers can try transfer learning by using a pre-trained model and retraining only the last few layers.
You can get the full Python implementation from the GitHub link here.
If you liked the post, follow this blog to get updates about upcoming articles. Also, share it so that it reaches readers who can actually benefit from it. Please feel free to discuss anything regarding the post; I would love to hear your feedback.
Happy deep learning 🙂