Visualizing Color Usage
Our goal in this post is to implement color quantization in R, compare several methods of detecting segments of an image, and start creating the visualization UI. It’s no surprise that this topic lends itself to some preliminary data exploration and visualization, and R makes it easy to generate some really cool graphics.
Implementing Color Quantization in R
Quantization involves expressing a range of values by one quantum value. In images, it is applied to color spaces such as RGB, CMYK, and HSV. RGB is one of the most popular color representation models. Each pixel has three channel values: red, green, and blue. Each channel has a value for its intensity (e.g., an integer between 0 and 255 for an 8-bit color representation). An image’s pixels are rendered by combining the intensity values from the three RGB channels to produce a color value.
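As a small illustration of the 8-bit RGB representation described above, the sketch below converts one pixel to a hex code and back using base R. The pixel values are made up for the example.

```r
# One made-up 8-bit pixel: an intensity from 0 to 255 for each channel
pixel <- c(red = 233, green = 150, blue = 58)

# Combine the three channel intensities into a single hex color value
hex <- rgb(pixel[["red"]], pixel[["green"]], pixel[["blue"]], maxColorValue = 255)

# Convert the hex code back into its three channel intensities
channels <- col2rgb(hex)[, 1]

# Number of distinct colors an 8-bit RGB image can contain
n_colors <- 256^3
```

With more than 16 million possible colors per pixel, reducing an image to a handful of representative colors is what makes the palette readable.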
Color quantization reduces the number of distinct colors in an image by mapping each range of similar colors to a single representative color. Using Edvard Munch’s famous painting The Scream as an example, we applied k-means, median cut, and oct-tree based algorithms to partition the image and extract its top n dominant colors, using the following:
- k-means – We ran k-means clustering using the kmeans function in the stats package.
- median cut – We used the image_palette function in the RImagePalette package, which is a pure R implementation of the median cut algorithm.
- oct-tree based – We used the image_quantize function in the magick package. On top of its oct-tree algorithm, we built a function, get_my_colors, that reduces the image to k colors and returns the top hex values for simpler visualization.
We applied k-means to our data frame of pixel values with different values of k. Replacing the color of each pixel in the image with the mean R, G, and B values of the cluster in which the pixel resides yields a segmented image. Figure 3 below shows the segmented image when k=2 and k=6, i.e., breaking the image into 2 color regions or 6 color regions.
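The k-means step above can be sketched roughly as follows. This is a minimal illustration rather than our exact code: the random 20 x 20 image stands in for the painting, and the helper name segment_kmeans is hypothetical.

```r
set.seed(42)

# Synthetic 20 x 20 RGB image with values in [0, 1] (a stand-in, not the painting)
h <- 20
w <- 20
img <- array(runif(h * w * 3), dim = c(h, w, 3))

# One row per pixel, one column per channel
pixels <- data.frame(
  R = as.vector(img[, , 1]),
  G = as.vector(img[, , 2]),
  B = as.vector(img[, , 3])
)

# Hypothetical helper: cluster the pixels and recolor each one with its cluster mean
segment_kmeans <- function(pixels, k) {
  fit <- kmeans(pixels, centers = k, nstart = 5, iter.max = 50)
  list(
    palette = rgb(fit$centers[, "R"], fit$centers[, "G"], fit$centers[, "B"]),
    size    = fit$size,                                  # pixels per cluster
    pixels  = fit$centers[fit$cluster, , drop = FALSE]   # mean color per pixel
  )
}

seg <- segment_kmeans(pixels, k = 6)

# Reshape the recolored pixels back into an image array
segmented <- array(seg$pixels, dim = c(h, w, 3))
```

The size element gives the number of pixels in each cluster, which is what allows the k-means colors to be ranked by dominance.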
Median Cut and Oct-tree Based Algorithms
We got similar top colors by applying the median cut and oct-tree based algorithms, as shown in Figure 4 below. Both algorithms ran significantly faster than k-means. However, each has a disadvantage. In median cut, every cluster is equally weighted, so the resulting colors cannot be ranked by dominance, while the tree-based method doesn’t always return the specified number of k clusters.
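To show why median cut weights every cluster equally, here is a simplified base R sketch of the idea. It is not the RImagePalette implementation; the function name median_cut and the random pixel matrix are assumptions made for illustration.

```r
# Simplified median cut: recursively split the pixels at the median of the
# channel with the widest range, then average each final box.
median_cut <- function(pixels, depth) {
  # pixels: numeric matrix, one row per pixel, columns R, G, B
  if (depth == 0 || nrow(pixels) < 2) {
    return(list(colMeans(pixels)))
  }
  ranges <- apply(pixels, 2, function(ch) diff(range(ch)))
  widest <- which.max(ranges)
  ord    <- order(pixels[, widest])
  half   <- floor(nrow(pixels) / 2)
  lower  <- pixels[ord[seq_len(half)], , drop = FALSE]
  upper  <- pixels[ord[-seq_len(half)], , drop = FALSE]
  c(median_cut(lower, depth - 1), median_cut(upper, depth - 1))
}

set.seed(1)
px <- matrix(runif(300), ncol = 3, dimnames = list(NULL, c("R", "G", "B")))

# depth = 3 yields 2^3 = 8 colors, one row per color
palette_mc <- do.call(rbind, median_cut(px, depth = 3))
palette_hex <- rgb(palette_mc[, "R"], palette_mc[, "G"], palette_mc[, "B"])
```

Because every split happens at the median, each final box holds nearly the same number of pixels, so box size carries no information about which color dominates the image.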
Considering the computational expense of the k-means algorithm and the inability of median cut to rank color dominance, we chose to use the magick package going forward. This package provides a modern and simple image processing toolkit in R. Its core component is the ImageMagick STL, a leading open-source image processing library. By utilizing oct-trees, ImageMagick surveys the colors in an image using a technique known as “Adaptive Spatial Subdivision”, so the reduced palette is chosen adaptively for each specific image.
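A function along the lines of get_my_colors could look roughly like the sketch below. This is an illustrative reconstruction under assumptions, not our exact implementation: the output columns (hex, n) are chosen for the example, and the built-in ImageMagick "logo:" image stands in for a painting.

```r
library(magick)

# Sketch: quantize an image to at most k colors, then rank them by pixel count
get_my_colors <- function(img, k = 6) {
  quantized <- image_quantize(img, max = k)

  # image_data returns a raw array laid out as channels x width x height
  px <- image_data(quantized, channels = "rgb")
  m  <- matrix(as.integer(px), nrow = 3)   # one column per pixel

  hex    <- rgb(m[1, ], m[2, ], m[3, ], maxColorValue = 255)
  counts <- sort(table(hex), decreasing = TRUE)

  data.frame(
    hex = names(counts),
    n   = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Example with ImageMagick's built-in test image
top_colors <- get_my_colors(image_read("logo:"), k = 6)
```

Counting pixels per quantized color is what provides the dominance ranking, and the result can contain fewer than k rows when the quantizer settles on fewer colors.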
Data Wrangling using the Budget Collector Dataset
As a next step, we downloaded the Budget Collector dataset from AirTable by exporting it as a .csv file, with the images stored in our GitHub repository. We then looped through all 396 images in the dataset and applied the get_my_colors function with k=6. Our function returns a data frame with the top 6 colors of each image while keeping the UID as the primary lookup key. Lastly, we merged the .csv file exported from AirTable with the top 6 colors returned from our function by UID.
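The loop-and-merge step can be sketched as follows. Everything other than the UID key is an assumption for illustration: the three-row metadata table is made up, and fake_top_colors is a hypothetical stub standing in for the real color extraction so that the example runs without any image files.

```r
# Made-up stand-in for the AirTable export (the real file has 396 rows)
metadata <- data.frame(
  UID   = c("u001", "u002", "u003"),
  title = c("Artwork A", "Artwork B", "Artwork C"),
  stringsAsFactors = FALSE
)

# Hypothetical stub for the color extraction step: returns k ranked hex colors
fake_top_colors <- function(uid, k = 6) {
  data.frame(
    UID  = uid,
    rank = seq_len(k),
    hex  = rainbow(k),
    stringsAsFactors = FALSE
  )
}

# Loop over every image, keeping the UID on each row of the result
colors <- do.call(rbind, lapply(metadata$UID, fake_top_colors, k = 6))

# Join the color table back to the metadata by UID
merged <- merge(metadata, colors, by = "UID")
```

Here the colors are kept in long format, one row per image and rank, which is only one possible layout; a wide layout with one column per rank would merge the same way.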
Rudimentary Shiny App
We started building a front-end UI using Shiny, based on the processed Budget Collector dataset, so that we can play with the data and explore ways to better visualize the color information. Currently the UI contains widgets that enable selection of time period and region to slice and dice the data that feeds the visuals. We also intend to add more slicers and radio buttons to enable selection of artist, style, and media in the next phase.
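A minimal sketch of such an app, with a time period slider and a region selector, might look like this. The data rows, column names (year, region, hex), and the filter_artworks helper are assumptions for illustration and do not reflect the actual dataset or app.

```r
library(shiny)

# Made-up rows standing in for the merged dataset
artworks <- data.frame(
  year   = c(1850, 1893, 1910, 1950),
  region = c("Europe", "Europe", "Asia", "Americas"),
  hex    = c("#C0392B", "#E67E22", "#2E86C1", "#117A65"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows inside the selected period and region
filter_artworks <- function(data, years, region) {
  keep <- data$year >= years[1] & data$year <= years[2]
  if (region != "All") keep <- keep & data$region == region
  data[keep, , drop = FALSE]
}

ui <- fluidPage(
  titlePanel("Color usage explorer"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("years", "Time period", min = 1800, max = 2000,
                  value = c(1850, 1950), sep = ""),
      selectInput("region", "Region",
                  choices = c("All", sort(unique(artworks$region))))
    ),
    mainPanel(plotOutput("swatches"))
  )
)

server <- function(input, output, session) {
  selected <- reactive(filter_artworks(artworks, input$years, input$region))
  output$swatches <- renderPlot({
    d <- selected()
    validate(need(nrow(d) > 0, "No artworks match the current filters."))
    # One bar per artwork, filled with its dominant color
    barplot(rep(1, nrow(d)), col = d$hex, border = NA,
            axes = FALSE, names.arg = d$year)
  })
}

app <- shinyApp(ui, server)
# Launch interactively with: runApp(app)
```

Keeping the filtering logic in a plain function makes it easy to check outside the app and to extend later with artist, style, and media filters.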
I’m a data scientist working at a management consulting firm based in the US. I started Georgia Tech’s MS Analytics program in 2019. I’ve been really enjoying the program and have been working on it outside of my full-time job. Outside of work and study, I enjoy outdoor activities such as hiking, camping, kayaking, and paddleboarding.