What’s Your Favorite Color?
To find the dominant colors in paintings, we need to extract the colors from the image. K-means clustering is an efficient way to find the dominant colors. In this article, we explain how k-means clustering is used for color extraction. In the next few weeks, we will use the dominant colors from the Art Institute of Chicago’s digital painting collection to analyze the change of dominant colors over time.
Image recognition is revolutionizing the way we consume and process information online. Deeply integrated into websites and applications, it allows us to understand visual data, in small and large quantities, like never before.
Color extraction is one of the most important and revolutionary possibilities offered by computer vision. The ability to identify and analyze colors in images offers many opportunities for businesses to make better use of their visual libraries, monetize them, and even increase in-store product sales.
By extracting colors from images, one can tag visualization keywords by color. This allows one to easily navigate through large visualization databases. Multicolor search is also a typical part of color extraction technology. This allows you to perform more complex color searches. This means that complex objects containing more than one dominant color can be identified. It also allows for multi-color filtering of image searches on databases and hosted websites in the color palette feature.
How is color extraction done?
There are two general types of graphics: raster and vector. Vector graphics are based on geometric shapes such as points, curves, polygons, etc. Because vector data is drawn with clear geometric patterns, it conveys information correctly and accurately. When you zoom in on a vector image on your device, the boundaries are always clear.
Raster graphics can be recognized as data that looks like arrays of pixels, the smallest unit of a point that contains color information. The number of pixels in an image is called the resolution. For the same image, high resolution (more pixels) usually means higher graphic quality and more detail. Most of the graphic analysis is based on raster analysis.
Each pixel in a raster image is assigned a color. Images can be viewed as a color dataset. All one must do is define the colors with numbers. For example, a color map is a method of representing one-dimensional color information. Each dot represents a specific color, while color palettes and swatches can be viewed as a two-dimensional color index.
While 1d and 2d are easy to understand, the 3d color space is widely used in practice. Common 3D color spaces are RGB, HSV, Lab, etc. Among all color spaces, RGB is the default color space for machine learning. Photoshop’s color picker is a great example of colors and codes for different color spaces.
The images we extract using The API from Art Institute of Chicago using the IIIF Image API 2.0 is understood by the computer as a M*N*3 array where each array element represents a pixel of the image in an RGB format. For each pixel, conveying [R,G,B] data the function that we use changes the data into [R,R,R], [G,G,G] and [B,B,B] respectively. The photo is then deconstructed into pixels using 3D RGB data points that we get and can be visualized in a 3-D axis.
Using K-means for Color Extraction
What is K-means clustering?
A common method for finding dominant colors is K-means clustering. K-means clustering is a popular unsupervised machine learning algorithm. Typically, unsupervised algorithms try and make inferences from a dataset where we lack labelled outcomes and group a collection of data points given their similarities. This method splits some data into k disjoint clusters, where points in the same cluster are considered more “similar” than points in different clusters.
To process the learning data, the K-means algorithm starts with a first group of randomly selected centroids, which are used as beginning points for every cluster. In our case, we represent an image as a dataset of pixels with each pixel represented in an [R,G,B] format, the K-means algorithm, randomly picks a group of pixels in the given dataset as centroids and then performs an iterative calculation to optimize the positions of the centroids. In our K-means clustering we define 8 clusters meaning we are trying to get the first 8 dominant colors in the image.
The K-means algorithm iteratively performs the following steps
- Take random 8 pixels from the image dataset as centroids
- Assign every other pixel in the image to its closest centroid. The closest centroid is defined as the centroid that has the minimum Euclidean distance between the datapoint and one of the centroids selected in step 1 above.
- Label the datapoints that belong to a centroid, all the data points that belong to a centroid are called a cluster.
- For each cluster, the algorithm finds a new centroid by calculation a new center from the points.
- Repeat steps 2,3 and 4 above until the centroids stop changing.
One of the problems with the K-means algorithm is that it converges to a local minimum which is highly sensitive to the initial set of pixels selected which are random in our case. Currently, we don’t have a means to overcome this by eliminating sub optimal points for initial partition. We might have to do a heuristic analysis to check if this does indeed change the dominant colors for the images.
An image as defined above is made of three channels: Red, Green and Blue, each pixel in the image is represented as a point which is a combination of the primary colors red, green and blue. If these colors are represented in a 3D space, we would get a three-dimensional vector with the s coordinates representing red, y coordinates, the color green and z coordinates, the color blue.
Since we don’t have labelled data, we use K-means which is an unsupervised learning algorithm to perform a pixel-wise vector quantization of the image and reducing the number of colors required to show the image. In our case, we represent each image in 8 colors called the dominant colors of the image. The K-means cluster centers thus forms our color palette which represent the 8 dominant colors that are used in each image.
In our approach, as show below, for one of the paintings, we plotted the pixels in a 3D scatter plot.
Clearly, here we can find clusters of dominant colors used in the picture. The histogram shows the extent of dominant colors used in the painting. Intuitively, we can conclude that the data points or pixels in the painting form groups, and some colors are denser than others in the painting. Looking at the painting below we can fairly conclude that our k-means algorithm is doing a good job picking the right dominant colors.
As part of our research, we compared k-means to Agglomerative Clustering which is a hierarchical clustering mechanism. This algorithm, combines and divides existing groups creating a hierarchical structure that reflects the order in which groups are merged or divided. The objects in an agglomerative technique, which creates a hierarchy by merging, originally belong to a list of singleton sets S1,…, S2, Sn. The pair of sets “Si, Sj” that will merge “cheapest” from the list will then be found using a cost function. Si and Sj are combined into Si U Sj, which is then substituted for Si and Sj in the list of sets. Once every object is in a single group, this process is repeated.
The Agglomerative Clustering method from sklearn has a number of parameters. But for our study we will use only the affinity and the linkage parameters. A table listing all conceivable combinations of this is provided below for the image we used in our k-means algorithm above. The links shown by the lines are “ward,” “complete,” “average,” and “single,” respectively. The columns are labeled “euclidean,” “l1,” “l2,” “manhattan,” and “cosine,” which stand for various affinities.
We ran a time complexity analysis on k-means and hierarchical clustering for 2 random images of different sizes and found that the hierarchical clustering eats up much more memory and is a lot slower than the k-means algorithm. Our results are based on running these algorithms to fit images to 8 clusters on a machine that has 16Gb of Ram and 7 cores to process.
There is not a lot of improvement over k-means while using hierarchical clustering. We were able to identify all the relevant colors in k-means and with the additional time that the hierarchical clustering take, we decided to stick to k-means clustering to get the dominant colors.
There are other clustering algorithms like self organization map and expected maximization algorithms which we haven’t explored since k-means is more than sufficient for our purposes.
The largest portion of the work performed by the group thus far has been the extraction of the dominant colors. The other two components of our final visualization will be region and time. In preparation for those, our team has also been focusing our efforts on formatting the region and time for the paintings to be in a standardized format. We will then be able to combine these three efforts into the visualization. We will create a time series visualization that correlates region and color. Art pieces belonging to specific time periods will be grouped together and then further categorized by the region in which they originated. This way, we can analyze the trend of dominant and secondary colors of paintings.
My name is Christine. I am completing my MS in Analytics from Georgia Tech in May of 2023. I have experience working as a music therapist in various mental healthcare facilities, and I transitioned into data analysis to contribute to researching and developing data-driven solutions to improve community health outcomes.
My name is Shivani Garg. I did my undergraduate education at Virginia Tech where I double majored in Computer Science and Computational Modeling & Data Analytics. Since then, I have been working as a Data Scientist at Booz Allen Hamilton and recently started working at PNC Bank. What I love most about this field is how it can be applied to so many different industries and contexts. This is what makes me excited to embark on this project with my team.
My Name is Vishwanath. I did my undergraduate degree in India where I majored in Computer Science and Engineering and then went on to get my master's in finance from Harvard University. I am currently working as a Quantitative Analyst, managing a quant team focusing on Environment Social and Governance factors and creating value-based investment portfolios.