This post is a simple yet illustrative application of K-means clustering technique. Using K-means clustering, we will perform quantization of colours present in the image which will further help in compressing the image.
In a coloured image, each pixel is of size 3 bytes (RGB), where each colour can have intensity values from 0 to 255. Following combinatorics, the total number of colours which can be represented are 256*256*256. Practically, we are able to visualize only a few colours in an image. Shown below is an image of 1280 x 720 pixels taking 1.71 MB in PNG format. PNG is a lossless compression technique for images. Our objective is to compress the image further using colour quantization, though the compression will be lossy.
K-means clustering
It is basically an optimization algorithm to find ‘k’ clusters in the given set of data points. Initially, it randomly assigns k-cluster centers and then on the basis of some distance metric (for example, euclidean distance) it aims to minimize within cluster sum of squared distance of the data points from the cluster center. There are two steps in k-means clustering algorithm:
a) Assignment step – Each data point is assigned to the cluster whose center is nearest to it.
b) Update step – New means (centroids) are calculated from the data points assigned to the new clusters.
Just to give you an idea of clustering data points, Below picture has been taken from internet to depict data points before and after K-means clustering.
In our problem of image compression, K-means clustering will group similar colours together in ‘k’ clusters (say ‘k’ = 128). Therefore, the centroid of each cluster is representative of the 3 dimensional colour vectors (RGB) falling in the respective cluster. By now you might have understood what are we trying to do. These ‘k’ centroids will replace all the colour vectors in their clusters, thereby keeping only ‘k’ colour combinations for the whole image. Thus, we need to keep only the label of each pixel in the image that tells about the cluster in which that pixel falls. Also, we keep the ‘k’ centroids as codebook which are the only colours seen in the compressed image.
Compression
We will write a simple python code to compress the image and store the compressed image along with the code book. The compressed image saved here is nothing but the cluster label of each pixel of the original image. Codebook is the fancy name given to the list of cluster centers (3-d RGB) achieved after running k-means algorithm. Afterwards, both the arrays (the cluster labels and the codebook) are saved in data type ‘unsigned integer’ as the range of intensity values (0-255) and value of ‘k’ is always going to be less than 255 . The code given below does all this.
from skimage import io from sklearn.cluster import KMeans import numpy as np image = io.imread('tiger.png') io.imshow(image) io.show() rows = image.shape[0] cols = image.shape[1] image = image.reshape(image.shape[0]*image.shape[1],3) kmeans = KMeans(n_clusters = 128, n_init=10, max_iter=200) kmeans.fit(image) clusters = np.asarray(kmeans.cluster_centers_,dtype=np.uint8) labels = np.asarray(kmeans.labels_,dtype=np.uint8 ) labels = labels.reshape(rows,cols); np.save('codebook_tiger.npy',clusters) io.imsave('compressed_tiger.png',labels);
We can select ‘k’ sufficient enough to represent the colours of image well. Here, ‘k’ has been chosen as 128. This means all the colour combinations in the original image have been quantized to 128 distinct colours only. These colours will be present in reconstructed image (after decompression) and it should be visually similar to original image.
Decompression
We also need to decompress the image in order to visualise the reconstructed image which is obviously an outcome of lossy compression performed. Below code does the decompression by assigning the 3-d colours from the code book to the each pixel depending upon its label.
from skimage import io import numpy as np centers = np.load('codebook_tiger.npy') c_image = io.imread('compressed_tiger.png') image = np.zeros((c_image.shape[0],c_image.shape[1],3),dtype=np.uint8 ) for i in range(c_image.shape[0]): for j in range(c_image.shape[1]): image[i,j,:] = centers[c_image[i,j],:] io.imsave('reconstructed_tiger.png',image); io.imshow(image) io.show()
We can see the reconstructed image after decompression below. Though the reconstructed image has lost a lot of pixel colour information but still you won’t find any major difference visually.
Also, you can visualise these 128 colours found in the reconstructed image by viewing the colours in the codebook separately (may be by displaying mono-coloured square box). These colours are the centroids of clusters formed after performing k-means on original image.
Caution
- If you will try to compress a ‘jpeg’ image in exactly the same way as followed in the blog-post, you will incur errors as jpeg does a lossy compression. Compression algorithm of jpeg changes the intensity values of the pixel, so the pixels in the compressed image containing the label may become more than ‘k’ which leads to error.
- K-means algorithm is an optimization problem of finding the clusters in the given data-set. Execution time increases as the image dimensions increases or ‘K’ increases. So, initially you can start with a lesser value of ‘k’ in order to quickly get results.
- There is a trade off between the execution time and the number of colours represented in reconstructed image. Higher ‘k’ will produce better quality of compressed image but will take longer to execute.
Conclusion
You can check the disk space taken by the images which were considered in the blog-post here. I have posted the snapshot of working directory. The original png image was of 1757 KB (tiger.png) whereas the compressed tiger image and codebook are of only 433 KB all together. The reconstructed image is also taking less space because png runs its own compression algorithm. As there are only 128 unique colours now, png is able to get compression ratio of more than 2.
One can conclude that the compression applied here is done only by reducing the number of colours in the image which is also called as Colour Quantization. We have not reduced either the size of image or the intensity ranges of pixels.
Hope it was easy to follow this blog-post. The full python implementation of image compression with K-means clustering can be found on Github link here.
If you liked the post, follow this blog to get updates about the upcoming articles. Also, share this article so that it can reach out to the readers who can actually gain from this. Please feel free to discuss anything regarding the post. I would love to hear feedback from you.
Happy machine learning 🙂
Why no response is shown on the screen when inputting “kmeans.fit(image)”? Does it consume too much time with k=128?
Like
Yes, for 128 clusters it would take lot of time. Though it’s been a long time, I remember it took around 20 minutes on my laptop.
Like
aha, maybe i should wait for it patiently. At first i guess it will take couple of hours and i can’t finish it before shutdown
Like
I mean, can you evaluate the time when k is as big as 128, like how many hours approximately does it need to execute
Like
Umm….You need to check that. I am sure it would take a lot of time.
You can fix the number of iterations or number of init of kmeans training algorithm to be less inorder to execute it fast.
Like
thx for your answer, I’m sorry that i didn’t see your reply at first
Like
Hi Abhijeet,
I tried running your code and I’m getting compressed grey scale image.
Can you please help?
Like
Can you please provide more detail ?
Like
Yes, I also got grayscale image after compression. How can we get colored image?
Like
You have to pay attention to the variable dimensions. By doing:
”’
clusters = np.asarray(kmeans.cluster_centers_,dtype=np.uint8)
labels = np.asarray(kmeans.labels_,dtype=np.uint8 )
labels = labels.reshape(rows,cols);
”’
You’ll get the variable labels as [rows x cols] dimensions, right? But for a colored image, you should have 3 “channels”, so [rows x cols x channel_colors]. In order to correct this, you can achieve this through Numpy’s broadcasting clusters[labels]:
”’
clusters = np.asarray(kmeans.cluster_centers_, dtype=np.uint8)
labels = np.asarray(kmeans.labels_,dtype=np.uint8)
colored_image = clusters[labels.reshape(linhas, colunas)]
”’
Like
Hello,
Can this be done as a college final year project? As a beginner in machine learning, I am searching for a method to implement image compression which one would be the best? Please suggest
Liked by 1 person
Yes, definitely it would be a good exercise. You can extend this more to make a proper compression software.
Like
Can this be used as final year project? As a beginner in machine learning, I am searching for a method to implement image compression. Can you please suggest me how should I do it?
Liked by 1 person
Yes, You can extend this one or make comparative study of different techniques of compression.
Like
Can I compress multiple images present in a folder using k means clustering?? Can you help me with this?
Like
It is very simple. Loop through all the images of your folder and apply the same way of compressing single image.
You may like to take lower value of K otherwise it would be very slow.
Like
Can K means clustering be used to compress the multiple images present in a folder?
Like
image = image.reshape(image.shape[0] * image.shape[1], 3)
ValueError: cannot reshape array of size 1093824 into shape (273456,3)
Why I am getting this error with the compression code
Like
Yes, there is mismatch in dimensions. Can you just print the dimensions of image and image array size to see if they are matching ?
Like
for f in os.listdir(‘.’):
if f.endswith(‘.png’):
image = io.imread(f)
rows = image.shape[0]
cols = image.shape[1]
I tried for loop as given but it gives an error: MemoryError. Does K means work well with around 40 image at a time?
Like
This is amazing man! I’m citing you in my lightning talk that will now be amazing too. Thanks!
Liked by 1 person
Hello,
I ran the code with a tif image with resolution of 8181*7221, and it appeared:
ValueError: cannot reshape array of size 59075001 into shape (59075001,3)
Why I am getting this error?
Like
Hi,
It occurs to me that you are using a gray scale image but the codes used in the blog is written for colored images.
For colored images, there are 3 channels i.e, RGB.
Thanks.
Like
Hello,
I ran this code with a tif image with resolution of 8181*7221, but a error appeared:
ValueError: cannot reshape array of size 59075001 into shape (59075001,3)
Why I am getting this error?
Thank you very much.
Like
Hello,
I need to convert images to colored images first, right?
Thank you very much.
Like
May be not, You may want to modify the python codes for compressing grey scale images.
for example:
image = np.zeros((c_image.shape[0],c_image.shape[1],3),dtype=np.uint8 )
would become
image = np.zeros((c_image.shape[0],c_image.shape[1]),dtype=np.uint8 )
Similarly, everywhere the code uses 3 channels, you can modify it to 1 channel and check if it works for you.
Also, tiff images are of large sizes, downsize it so that K-means can run in less time, otherwise system will take huge time to complete the execution.
Like
Hello,
I modified the code in your method. I ran it with Lena.bmp 258KB) and it worked. But for a larger image of 57,748KB it appeared error:
Traceback (most recent call last):
File “compress.py”, line 56, in
kmeans.fit(image)
File “/home/zsf/anaconda2/envs/tensorflow/lib/python3.6/site-packages/sklearn/cluster/k_means_.py”, line 972, in fit
return_n_iter=True)
File “/home/zsf/anaconda2/envs/tensorflow/lib/python3.6/site-packages/sklearn/cluster/k_means_.py”, line 381, in k_means
random_state=random_state)
File “/home/zsf/anaconda2/envs/tensorflow/lib/python3.6/site-packages/sklearn/cluster/k_means_.py”, line 445, in _kmeans_single_elkan
max_iter=max_iter, verbose=verbose)
File “sklearn/cluster/_k_means_elkan.pyx”, line 150, in sklearn.cluster._k_means_elkan.k_means_elkan
MemoryError
Thank you very much.
Like
Clearly “MemoryError”.
RAM is getting exhausted. Thanks
Like
Hello,
Right. I know it is a memory error. I am revising the code to deal with this error and not yet complete.
I tried to read the big original image or the image(nd.array) with chunksize, but it not worked. Do you have any suggestion?
Thank you very much.
Like
NICE BLOG, THANKS FOR SHARING
Like