Building a Personal Deep Learning System from Scratch: On-Premise vs Cloud Comparison

Building a custom deep learning system

Recently, I built a system for pursuing deep learning (DL) exercises. A heads-up before we start: the right configuration depends purely on your requirements and on what you want to achieve. For example:

  1. If you want to develop large-scale AI applications, you will need a high-end, server-grade, multi-GPU system (mostly for commercial applications).
  2. If you want to learn, build prototypes and experiment a lot with open-source data-sets (mostly for personal usage), a single-GPU system will suffice.
  3. For a novice just getting into deep learning, I would suggest starting with free resources like Google Colab or Kaggle Kernels.

I am not a big fan of cloud-based DL, as it is expensive for personal or research purposes. We often need to run many experiments before converging on an accurate model, and cloud GPUs can incur a huge cost for that. A comparison between an AWS cloud instance and an on-premise DL system is presented in the last section of this blog-post.

In this concise blog-post, I will share my personal experience of building a deep learning system, which may help people who want to build one for personal use. This hardware setup is intended for learning, prototyping and research purposes.

1. Configuration

We will look at each component required for building a deep learning desktop.

Processor:

I purchased an Intel Core i7-8700 processor. It has 6 physical cores and 12 threads (hyper-threading), with a base frequency of 3.20 GHz and a maximum turbo frequency of 4.60 GHz, making it a good choice in the Intel family. For a multi-GPU system, one should instead consider a processor like the Intel Xeon E5-2630 v4, which has 10 physical cores and 20 threads.

It is important to note that the Intel Core i7-8700 has 16 PCI Express lanes, which is sufficient for a single GPU. The processor and the graphics card (GPU) communicate and push data over 1x16 or 2x8 lanes. For a dual-GPU system, it is better to have an Intel Xeon E5 v4 processor, as it has 40 PCIe lanes and can accommodate parallel communication with both GPUs.

For my usage, I went with the Intel Core i7-8700, as I was building a single-GPU system and had budget constraints. There are also cheaper multi-core AMD processors one can consider, but many packages, libraries and software are optimized for Intel by default rather than AMD.
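Once the machine is assembled, a quick sanity check confirms the OS sees the expected thread count (a minimal sketch using only the Python standard library; the expected value in the comment is specific to the i7-8700):

```python
import os

logical = os.cpu_count()  # logical CPUs (threads) visible to the OS
print(f"Logical CPUs: {logical}")  # an i7-8700 should report 12 (6 cores x 2 threads)
```

On Linux, `lscpu` gives the same information along with the physical core count.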

GPU:

Nvidia GTX 1080 Ti. Choose a GPU depending on your budget. I would suggest getting at least a GTX 1080 (8 GB of video RAM) in order to run deep learning experiments. I got the ZOTAC GTX 1080 Ti Mini (11 GB of video RAM).

Motherboard:

I bought the “MSI Z370 PC PRO”, which supports Intel 8th-gen processors and has two graphics card slots. The board comes with two PCIe x16 slots: one wired with 16 lanes directly to the processor, the other with four lanes via the chipset. So it offers the option of installing multiple graphics cards. The remaining PCIe lanes accommodate communication with other peripherals like NVMe SSDs.

Cabinet:

I bought the “Cooler Master 590 III Black Window” computer case, which comes pre-installed with two front 120 mm blue-LED fans. This is good enough for a single-GPU system.

SMPS:

An SMPS (switched-mode power supply) is an electronic power supply that incorporates a switching regulator to convert electrical power efficiently. A 500 W SMPS would be sufficient to power the RAM (~10 W), GPU (250 W), i7 processor (~70 W), SSD, HDD and the other circuits on the motherboard, but I purchased a “Cooler Master GOLD 750W” SMPS for extra headroom.
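The power budget can be sketched with rough arithmetic (the per-component wattages are the approximate figures above; the lump figure for the motherboard and drives, and the 50% headroom factor, are my own assumptions, not manufacturer specifications):

```python
# Approximate peak draw per component, in watts (figures from the build above)
components = {
    "GPU (GTX 1080 Ti)": 250,
    "CPU (i7-8700)": 70,
    "RAM": 10,
    "Motherboard, SSD, HDD, fans": 70,  # assumed lump figure
}

total = sum(components.values())
recommended_psu = total * 1.5  # assumed 50% headroom for load spikes and ageing
print(f"Estimated peak draw: {total} W")
print(f"PSU with headroom:   {recommended_psu:.0f} W")
```

With ~400 W of peak draw, a 500 W unit covers the load, and 750 W leaves comfortable headroom for a second GPU later.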

RAM:

I bought 32 GB of RAM: two 16 GB Corsair DDR4 3000 MHz modules.

SSD:

One of the important recommendations for a deep learning system is a solid-state drive, because it speeds up read/write operations from disk. You may have to run experiments that access millions of audio files or images from disk. It is recommended to install the operating system on the SSD and, obviously, to keep all working data-sets there as well. Archived data-sets can live on the HDD.

There are two types of SSD: SATA-based SSDs and PCIe-based NVMe SSDs. NVMe drives are much faster than SATA SSDs because they use the PCIe bus, which has much greater bandwidth. I bought a Western Digital 240 GB SATA SSD (I did not know about the SATA vs NVMe difference at the time). If budget allows, get a 500 GB NVMe SSD instead.
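You can get a rough feel for your drive's sequential read speed with a small script (a sketch using only the standard library; note that the result will be inflated by the OS page cache unless caches are dropped first, so treat it as an upper bound):

```python
import os
import tempfile
import time

# Write a test file of random data, then time a sequential read of it.
SIZE_MB = 64  # small enough to run quickly; raise for a more stable estimate
chunk = os.urandom(1024 * 1024)

with tempfile.NamedTemporaryFile(delete=False) as f:
    for _ in range(SIZE_MB):
        f.write(chunk)
    path = f.name

start = time.perf_counter()
with open(path, "rb") as f:
    while f.read(1024 * 1024):
        pass
elapsed = time.perf_counter() - start
os.remove(path)

print(f"Sequential read: {SIZE_MB / elapsed:.0f} MB/s")
```

Running this on the SSD versus the HDD mount point makes the gap between the two very concrete.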

Hard-Disk:

I purchased a Western Digital 1 TB HDD. Again, if budget allows, go for 2 TB; it's worth it, as large data-sets fill up an HDD quickly.

Monitor:

Choose one as per your preference. I bought an “LG 22 inch monitor”. Look for monitors that connect to the system over HDMI; you do not want a VGA-only monitor that needs a VGA-to-HDMI converter to connect to the cabinet.

UPS:

I got an “APC UPS 1100VA”. For a GPU-based system, it is important to get at least a 1100 VA UPS, which can back up roughly 600 W of load for about 5 minutes. That gives you enough time to safely abort the running process and shut down the system.
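The 5-minute figure can be sanity-checked with rough arithmetic (the power factor and the usable battery energy below are my own illustrative assumptions, not APC specifications):

```python
# Rough UPS runtime estimate -- all constants here are assumptions for illustration
va_rating = 1100          # UPS apparent-power rating (VA)
power_factor = 0.6        # assumed; max real power ~ 660 W
usable_energy_wh = 60     # assumed usable battery energy at this load (Wh)
load_w = 600              # approximate draw of the DL system under training

runtime_min = usable_energy_wh / load_w * 60
print(f"Max real power:  {va_rating * power_factor:.0f} W")
print(f"Estimated backup time at {load_w} W: {runtime_min:.0f} minutes")
```

A few minutes is all you need for a clean shutdown; always check the manufacturer's runtime chart for real numbers.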

Finally, do not forget to include sound and Wi-Fi cards in your build.

2. Installations

Once the hardware is assembled, you will want to install a Linux operating system (I prefer Ubuntu), the Nvidia drivers, CUDA, cuDNN and finally a neural-network framework like TensorFlow or PyTorch. Readers can follow these links to set up a deep learning system from scratch:

  1. Ubuntu 18.04 Deep Learning Environment Setup
  2. Up and Running with Ubuntu, Nvidia, Cuda, CuDNN, TensorFlow, and PyTorch
  3. Github link
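After the stack is installed, a quick check confirms the driver/CUDA/framework chain works end to end (PyTorch shown here; substitute the TensorFlow equivalents if that is your framework — the `try`/`except` lets it degrade gracefully on a machine where PyTorch is not installed yet):

```python
# Sanity-check that the framework can see the GPU through CUDA.
try:
    import torch
    print("PyTorch version:", torch.__version__)
    print("CUDA available: ", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))  # e.g. a GTX 1080 Ti
except ImportError:
    print("PyTorch not installed -- install it first (e.g. pip install torch)")
```

If `CUDA available` prints `False` despite the drivers being installed, `nvidia-smi` is the first place to look.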

3. Cloud GPU vs On-Premise GPU Comparison

Here is a comparison of running the same task, training a “DeepSpeech” BiLSTM model for automatic speech recognition, on the AWS cloud and on my personal deep learning system. The data-set used was 260 hours of telephonic conversations and their transcripts from the Switchboard corpus. For benchmarking purposes, all model parameters were kept the same: the number of layers, the number of hidden nodes per layer, the train/development/test batch sizes, etc.

Performance

The task took approximately 20 minutes on the GTX 1080 Ti (the DL system specified above). The screenshot below shows the elapsed time.

On-premise performance

On-premise GTX 1080 Ti (11 GB VRAM)

The task took approximately 10 minutes on an AWS p2.8xlarge instance with 8 K80 GPUs running in parallel and 32 vCPUs. The screenshot below shows the same.

On Cloud performance

AWS p2.8xlarge (8 K80 cards & 32 vCPUs)

We can conclude that the AWS p2.8xlarge instance with 8 parallel K80 cards is about 2x faster than the on-premise single-GPU GTX 1080 Ti system. Note that we get only twice the performance even after using 8 GPU cards on the AWS cloud.

Cost

The cost of running the above AWS instance for 65 hours (so far) is approximately 500 US dollars (it would be about 1000 US dollars using an instance in the ‘Asia Pacific (Mumbai)’ region).
The cost of building the DL system specified in this blog-post, including all the components mentioned, is about 2500 US dollars.

In a month's time, the cost incurred on the AWS cloud will surpass the entire purchase cost of the DL system, and in return we get only twice the performance. Note also that the on-premise system remains a usable resource for a few years. I do concede that the cloud is always the faster option when the concern is deployment and getting a machine up and running.
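The break-even point can be made concrete with the figures above (the hourly rate is derived from the ~$500 / 65-hour run, so treat it as approximate):

```python
aws_cost_per_hour = 500 / 65   # ~7.7 USD/h, from the 65-hour run above
on_premise_cost = 2500         # total build cost in USD

break_even_hours = on_premise_cost / aws_cost_per_hour
print(f"AWS hourly rate : {aws_cost_per_hour:.2f} USD")
print(f"Break-even after: {break_even_hours:.0f} hours "
      f"(~{break_even_hours / 24:.0f} days of continuous use)")
```

At roughly 325 GPU-hours, i.e. about two weeks of continuous training, the on-premise box has paid for itself.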

If you liked the post, follow this blog to get updates about upcoming articles. Also, share it so that it reaches readers who can actually benefit from it. Please feel free to discuss anything regarding the post; I would love to hear your feedback.

Happy deep learning 🙂

 
