This post continues my previous article, where the data-set was described and the data loader functions were written. Here, we will focus on writing a Python implementation of a fully connected neural network model using TensorFlow. As discussed in the previous post, the fashion MNIST data-set consists of 10 classes, just like the digit MNIST data-set.

So, let’s start by defining a Python configuration file (imported later as config from the util package) that assigns values to the parameters of the neural network.

# number of nodes in input layer
n_input = 784
# number of nodes in 1st hidden layer
n_hidden1 = 128
# number of nodes in 2nd hidden layer
n_hidden2 = 128
# number of nodes in output layer
n_class = 10
# number of epochs to run
n_epoch = 20
# declaring learning rate
learning_rate = 0.001
# Training batch size
batch_size = 64

Now, we will define functions to perform the following tasks.

  1. Building Neural Network (NN) Model
  2. Defining loss function
  3. Creating Optimizer
  4. One hot encoder
  5. Train, Validate & Test Model

1. Building Dense Neural Network

We will build a dense neural network with 2 hidden layers. The output layer is a dense layer of 10 nodes (as there are 10 classes). The model function below returns the raw outputs (logits) of this layer; the soft-max activation is applied inside the loss function. The architecture of the dense neural network is depicted in the figure below. The figure is only illustrative; the actual configuration, such as the number of nodes and output classes, is given by the parameters defined above.

Figure: Dense neural network
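
To get a feel for the size of this architecture, here is a small sketch that counts the trainable parameters implied by the configuration values above. The helper name dense_params is only illustrative and is not part of the original code.

```python
# Layer sizes copied from the configuration shown earlier in this post.
n_input, n_hidden1, n_hidden2, n_class = 784, 128, 128, 10


def dense_params(fan_in, fan_out):
    """Number of weights plus biases in one fully connected layer."""
    return fan_in * fan_out + fan_out


layer_sizes = [n_input, n_hidden1, n_hidden2, n_class]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [100480, 16512, 1290]
print(total_params)  # 118282
```

Almost all of the parameters sit in the first layer, because it connects the 784 input pixels to the 128 hidden nodes.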

Let us write a Python function using TensorFlow to build the model. It also computes the forward propagation step.

import numpy as np
import tensorflow as tf
from dataset import fashion_MNIST
from util import config

def model(batch_x):
    """
    We define the learned variables, the weights and biases,
    within the method ``model()``, which also constructs the neural network.
    The variables named ``hn``, where ``n`` is an integer, hold the learned weight variables.
    The variables named ``bn``, where ``n`` is an integer, hold the learned bias variables.
    Returns the logits of the output layer (no soft-max applied).
    """

    b1 = tf.get_variable("b1", [config.n_hidden1], initializer = tf.zeros_initializer())
    h1 = tf.get_variable("h1", [config.n_input, config.n_hidden1],
                         initializer = tf.contrib.layers.xavier_initializer())
    layer1 = tf.nn.relu(tf.add(tf.matmul(batch_x,h1),b1))

    b2 = tf.get_variable("b2", [config.n_hidden2], initializer = tf.zeros_initializer())
    h2 = tf.get_variable("h2", [config.n_hidden1, config.n_hidden2],
                         initializer = tf.contrib.layers.xavier_initializer())
    layer2 = tf.nn.relu(tf.add(tf.matmul(layer1,h2),b2))

    b3 = tf.get_variable("b3", [config.n_class], initializer = tf.zeros_initializer())
    h3 = tf.get_variable("h3", [config.n_hidden2, config.n_class],
                         initializer = tf.contrib.layers.xavier_initializer())

    layer3 = tf.add(tf.matmul(layer2,h3),b3)

    return layer3
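
To make the shapes in the forward pass concrete, the same computation can be sketched in plain NumPy. This is only an illustration, not the TensorFlow graph itself: the helper names xavier_uniform and forward_numpy are hypothetical, and the initializer assumes the uniform Glorot/Xavier rule.

```python
import numpy as np


def xavier_uniform(rng, fan_in, fan_out):
    """Glorot/Xavier uniform initialization: U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def forward_numpy(batch_x, params):
    """Forward pass: two ReLU hidden layers, then a linear output layer."""
    layer1 = np.maximum(0.0, batch_x @ params["h1"] + params["b1"])
    layer2 = np.maximum(0.0, layer1 @ params["h2"] + params["b2"])
    return layer2 @ params["h3"] + params["b3"]  # raw logits, no soft-max


rng = np.random.default_rng(0)
sizes = [784, 128, 128, 10]
params = {}
for i, (fan_in, fan_out) in enumerate(zip(sizes[:-1], sizes[1:]), start=1):
    params[f"h{i}"] = xavier_uniform(rng, fan_in, fan_out)  # weights
    params[f"b{i}"] = np.zeros(fan_out)                     # biases

dummy_batch = rng.random((4, 784))  # four fake flattened 28x28 images
logits = forward_numpy(dummy_batch, params)
print(logits.shape)  # (4, 10)
```

Each row of the result holds 10 unnormalized scores, one per class, for one image in the batch.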

2. Defining Loss Function

We will define a loss function that measures the error in this discrete classification task. It computes the soft-max cross entropy between the logits and the labels, that is, the mismatch between the one hot encoded labels and the class probabilities obtained by applying soft-max to the logits.

The equation that defines the multi-class cross entropy loss (also called multinomial logistic loss) is given below.

Multi-class entropy loss
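
For reference, the standard form of this loss for a single example can be written as follows. The notation is chosen here for illustration: $C$ is the number of classes, $z$ the logits, $y$ the one hot encoded label and $\hat{y}$ the soft-max output.

```latex
\hat{y}_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \qquad
L(y, \hat{y}) = -\sum_{i=1}^{C} y_i \log \hat{y}_i
```

Because $y$ is one hot encoded, only the term of the true class survives the sum, so the loss is the negative log of the probability assigned to the correct class.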

It can be implemented in tensorflow in these 2 steps.

1. Compute the soft-max function
y_softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis)
2. Compute cross entropy loss summed over all classes
-tf.reduce_sum(y_true * tf.log(y_softmax), 1)

The above multi-class cross entropy loss can be computed in TensorFlow with a single function call, tf.nn.softmax_cross_entropy_with_logits_v2.

def compute_loss(predicted, actual):
    """
    This routine computes the soft-max cross entropy loss of each example,
    summed over the output nodes/classes.
    Returns the mean loss over all examples in the batch.
    """

    total_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits = predicted,labels = actual)
    avg_loss = tf.reduce_mean(total_loss)
    return avg_loss
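
The two steps listed earlier (soft-max, then cross entropy) can be checked by hand in plain NumPy. This is a minimal sketch with made-up logits; subtracting the row maximum before exponentiating is a common numerical-stability trick that I have added here and that does not change the result.

```python
import numpy as np


def softmax(logits):
    """Row-wise soft-max, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(y_true, logits):
    """Per-example cross entropy between one hot labels and soft-max(logits)."""
    y_softmax = softmax(logits)
    return -np.sum(y_true * np.log(y_softmax), axis=1)


# Two made-up examples with three classes each.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

per_example = cross_entropy(y_true, logits)  # one loss value per example
avg_loss = per_example.mean()                # mean over the batch
print(per_example)
print(avg_loss)
```

For the second example all logits are equal, so every class gets probability 1/3 and the loss is log(3), which is the value an untrained 3-class model would produce.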

3. Creating Optimizer

Here, we define a function to create an optimizer of our choice. You can explore the various optimization algorithms, which use the gradients of the loss with respect to the weights to update them so as to minimize the loss. A few of them are Gradient Descent, Adam, Adagrad, Adadelta, Nesterov Adam (Nadam), RMSProp, etc.

def create_optimizer():
    """
    We will use the Adam method for optimization
    because, generally, it requires less fine-tuning.
    """

    optimizer = tf.train.AdamOptimizer(learning_rate=config.learning_rate)
    return optimizer
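
To see what an Adam update does under the hood, here is a rough NumPy sketch of the textbook update rule applied to a toy one-dimensional problem. The function adam_step is hypothetical and only illustrates the idea; TensorFlow's own implementation differs in some details. The learning rate of 0.1 is chosen for the toy problem, not taken from our configuration.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns the new (theta, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)              # bias correction, step t >= 1
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Toy problem: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2.0 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)

print(x)  # should end up close to the minimum at 3
```

The step size is scaled by the running estimate of the squared gradient, which is one reason Adam tends to be less sensitive to the choice of learning rate.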

4. One hot encoding the labels

This step structures our labels so that they can be fed to the output layer of the neural network model. The labels form a 1D array containing the class index of each object (among 10 classes, index 0-9). The output layer of the model consists of 10 nodes, so the labels need to be passed in vector form. For example, an image label “Shirt” (index 6) can be denoted in one hot encoded form as

[0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

def one_hot(n_class, Y):
    """
    Returns one hot encoded labels to train the output layer of the NN model.
    """
    return np.eye(n_class)[Y]
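
As a quick usage example, the snippet below encodes three sample labels and then recovers the indices with argmax. The function is repeated so that the snippet runs on its own; note that the labels must be an integer array for the indexing to work.

```python
import numpy as np


def one_hot(n_class, Y):
    """Pick rows of the identity matrix, one row per label."""
    return np.eye(n_class)[Y]


labels = np.array([6, 0, 9])       # Shirt, T-shirt/top, Ankle boot
encoded = one_hot(10, labels)

print(encoded.shape)               # (3, 10)
print(encoded[0])                  # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
print(np.argmax(encoded, axis=1))  # [6 0 9]
```

The argmax call is the inverse operation, which is also how the class index is recovered from the network output when computing accuracy.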

5. Train, Test & Validate the model

Having defined all the supporting functions, we will now train the network and evaluate the loss and accuracy on the training and validation sets in every epoch. Finally, we evaluate the accuracy on the test set.

Comments are provided throughout the code snippet below for clarity. It is important to understand the following parts of the code.

1. Computing variables inside the tensorflow session.
2. Dividing the training data into batches.
3. Evaluating accuracy.

I would suggest going back to Part 1 of this blog-post for understanding how tensorflow works.
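
Before the full training function, here is a small stand-alone sketch of how a data-set can be divided into batches by slicing. The generator iterate_batches is a hypothetical helper. It uses np.ceil so that a smaller final batch is kept, whereas the training function in this post computes the number of batches with np.round, which can drop a small leftover batch.

```python
import numpy as np


def iterate_batches(X, y, batch_size):
    """Yield consecutive (X, y) slices; the last batch may be smaller."""
    n_samples = X.shape[0]
    n_batches = int(np.ceil(n_samples / batch_size))
    for i in range(n_batches):
        start, stop = i * batch_size, (i + 1) * batch_size
        yield X[start:stop], y[start:stop]


X_demo = np.arange(10 * 3).reshape(10, 3)  # 10 fake samples, 3 features each
y_demo = np.arange(10)                     # 10 fake labels
batch_sizes = [len(bx) for bx, _ in iterate_batches(X_demo, y_demo, batch_size=4)]
print(batch_sizes)  # [4, 4, 2]
```

Slicing past the end of a NumPy array simply returns the remaining rows, which is why the last batch needs no special handling.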

def train(X_train, X_val, X_test, y_train, y_val, y_test, verbose = False):
    """
    Trains the network, also evaluates on test data finally.
    """
    # Creating place holders for image data and its labels
    X = tf.placeholder(tf.float32, [None, 784], name="X")
    Y = tf.placeholder(tf.float32, [None, 10], name="Y")

    # Forward pass on the model
    logits = model(X)

    # computing softmax cross entropy loss with logits
    avg_loss = compute_loss(logits, Y)

    # create the Adam optimizer, compute the gradients and apply them (minimize())
    optimizer = create_optimizer().minimize(avg_loss)

    # compute validation loss
    validation_loss = compute_loss(logits, Y)

    # evaluating accuracy on various data (train, val, test) set
    correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(Y,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    # initialize all the global variables
    init = tf.global_variables_initializer()

    # starting session to actually execute the computation graph
    with tf.Session() as sess:

        # run the initializer; all the global variables hold actual values now
        sess.run(init)

        # looping over number of epochs
        for epoch in range(config.n_epoch):

            epoch_loss = 0.

            # calculate number of batches in dataset
            num_batches = np.round(X_train.shape[0]/config.batch_size).astype(int)

            # looping over batches of dataset
            for i in range(num_batches):

                # selecting batch data
                batch_X = X_train[(i*config.batch_size):((i+1)*config.batch_size),:]
                batch_y = y_train[(i*config.batch_size):((i+1)*config.batch_size),:]

                # execution of dataflow computational graph of nodes optimizer, avg_loss
                _, batch_loss = sess.run([optimizer, avg_loss],
                                         feed_dict = {X: batch_X, Y: batch_y})

                # summed up batch loss for whole epoch
                epoch_loss += batch_loss
            # average epoch loss
            epoch_loss = epoch_loss/num_batches

            # compute validation loss
            val_loss = sess.run(validation_loss, feed_dict = {X: X_val, Y: y_val})

            # display within an epoch (train_loss, train_accuracy, valid_loss, valid accuracy)
            if verbose:
                print("epoch:{epoch_num}, train_loss: {train_loss}, train_accuracy: {train_acc}, val_loss: {valid_loss}, val_accuracy: {val_acc} ".format(
                                                       epoch_num = epoch,
                                                       train_loss = round(epoch_loss,3),
                                                       train_acc = round(float(accuracy.eval({X: X_train, Y: y_train})),2),
                                                       valid_loss = round(float(val_loss),3),
                                                       val_acc = round(float(accuracy.eval({X: X_val, Y: y_val})),2)))

        # calculate final accuracy on never seen test data
        print ("Test Accuracy:", accuracy.eval({X: X_test, Y: y_test}))
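
The accuracy computation in the graph (argmax of the logits compared with argmax of the one hot labels) can be illustrated with a few made-up numbers in NumPy. The function name accuracy_from_logits is only for this sketch.

```python
import numpy as np


def accuracy_from_logits(logits, y_one_hot):
    """Fraction of rows where the highest logit matches the true class."""
    predicted = np.argmax(logits, axis=1)
    actual = np.argmax(y_one_hot, axis=1)
    return float(np.mean(predicted == actual))


# Four made-up examples with three classes each.
logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2, 0.3],
                   [0.0, 0.1, 0.9],
                   [0.4, 0.3, 0.2]])
y_one_hot = np.eye(3)[[1, 0, 2, 1]]  # true classes: 1, 0, 2, 1

acc = accuracy_from_logits(logits, y_one_hot)
print(acc)  # 0.75
```

Soft-max is not needed here, because it does not change which logit is the largest.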

The Main Function

This is where we build the pipeline for loading the data-set, visualizing the images, and training and evaluating the model. The main steps are:

1. Instantiating the data-set class
2. Loading the data-set
3. Visualizing a few samples
4. One hot encoding the labels
5. Training, validating and testing the NN model

def main(_):

    # Instantiating the dataset class
    fashion_mnist = fashion_MNIST.Dataset(data_download_path='../data/fashion', validation_flag=True, verbose=True)

    # Loading the fashion MNIST data
    X_train, X_val, X_test, Y_train, Y_val, Y_test = fashion_mnist.load_data()

    # Showing a few example images from the dataset in a 2D grid

    # One hot encoding of labels for output layer training
    y_train =  one_hot(config.n_class, Y_train)
    y_val = one_hot(config.n_class, Y_val)
    y_test = one_hot(config.n_class, Y_test)

    # Let's train and evaluate the fully connected NN model
    train(X_train, X_val, X_test, y_train, y_val, y_test, True)

if __name__ == '__main__' :
    # tf.app.run() parses the flags and then calls main(argv)
    tf.app.run()

The results of the execution are shown in the screenshot below.


Final Thoughts

I hope this series of TensorFlow tutorials was easy to follow. If you are completely new to neural networks, it will probably be a little burdensome to follow. I would suggest first going through the basics of neural networks from the abundant material available online. Before signing off, a few more thoughts on the post:

  1. I have employed a dense neural network with 2 hidden layers for demonstration purposes. I would encourage readers to implement deeper models.
  2. I ran the model on a PC with an Nvidia GeForce GTX 1080 Ti GPU (11 GB) and 32 GB RAM. It took barely 30 seconds to train the NN model.
  3. For most computer vision applications, CNN architectures are very popular. I am looking forward to experimenting on the same task with these architectures, and I would encourage readers to try the same.
  4. Exploring other approaches such as transfer learning may produce better results. Readers can try transfer learning by using a pre-trained model and retraining only the last few layers.

The full Python implementation is available on GitHub.

If you liked the post, follow this blog to get updates about upcoming articles. Also, share it so that it can reach readers who can actually benefit from it. Please feel free to discuss anything regarding the post. I would love to hear your feedback.

Happy deep learning 🙂