Uncovering deep connections between color changes and global art movements
Last time, we decided to work on some mock-ups so that everyone understands what we want the visualization to look like and how our aggregates will be represented. In this post, we discuss how we chose our front-end framework based on our mock-up evaluation. While choosing a front end, we ran into a challenge with our colors: we initially focused on HSL (Hue, Saturation, Lightness) values, but experimented with RGB and pixel-clustering methods after encountering issues when plotting paintings.
We tested both wireframes built with mock data and a prototype built with Svelte on real data.
- Mock data wireframes
  - PowerPoint visual ideas: (Left) squares that represent a color trend per period; the square size might vary depending on how many paintings are grouped within it. (Right) On hover, it samples some random paintings from that group along with their metadata.
  - LayerCake D3 wireframe: A colored dot per painting; the trend can be detected when many dots with similar colors are displayed together. This approach is better at keeping outliers visible that might not fit a period trend. However, periods with many paintings will need to be sampled.
- Real data tested on Svelte + D3: Testing the capabilities of Svelte (our chosen framework) using real data. It also helped us estimate the workload of developing visualizations, to better assess project time.
The framework that we chose for our front end is D3 + Svelte. We decided on this because of our experience with D3 (we all took CSE6242), the ease of data iteration, and the available template libraries like Carbon Design and LayerCake. Additionally, Svelte was created by Rich Harris, who worked at The New York Times, and The New York Times uses it for its world-class data visualization, as do many other newsrooms. Svelte was built with the goal of being used for data visualizations on the web.
Aggregating paintings by mean dominant color: Issues
We initially used HSL (Hue, Saturation, Lightness) values to group paintings, with a special focus on hue. We noticed that aggregating these colors by mean hue value resulted in all paintings being grouped as green across different years. Here is a sample; notice how most values are different shades of green.
We learned this happened because green sits near the middle of the hue range (figure below), so averaging hues taken from across the range pulls the result toward green. On top of that, it was difficult to group dark paintings, since they land on different hue values depending on lightness.
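The effect can be reproduced with a few numbers. The sketch below assumes the aggregation was a plain arithmetic mean of hue angles in degrees; the function names are illustrative and not taken from the project code. It also shows a circular mean, which treats hue as an angle on the color wheel, as one possible alternative rather than something we adopted.

```python
import math

def arithmetic_mean_hue(hues_deg):
    """Plain average of hue angles, ignoring that hue wraps around at 360."""
    return sum(hues_deg) / len(hues_deg)

def circular_mean_hue(hues_deg):
    """Average hue angles as unit vectors on the color wheel."""
    x = sum(math.cos(math.radians(h)) for h in hues_deg)
    y = sum(math.sin(math.radians(h)) for h in hues_deg)
    return math.degrees(math.atan2(y, x)) % 360

# Two reds on either side of the 0/360 wrap-around point.
reds = [350, 10]
print(arithmetic_mean_hue(reds))  # 180.0, which is cyan rather than red
print(circular_mean_hue(reds))    # very close to 0 (or 360), which is red

# Hypothetical dominant hues spread across the wheel (orange, yellow, blue, magenta).
mixed = [20, 45, 210, 300]
print(arithmetic_mean_hue(mixed))  # 143.75, which falls in the green range
```

Note that a circular mean is only informative when the hues are reasonably concentrated; for hues spread evenly around the wheel, no single average hue describes the group well.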
First hypothesis: “We were giving too much importance only to the primary color. What if we group using the top 5 dominant colors instead?”
Results: We kept getting a lot of randomness in the groups. This might be because some paintings are highly colorful.
Second hypothesis: “We might be able to find a balance between complexity and clustering results by using 2 or 3 dominant colors and removing Saturation values, keeping Hue and Lightness only.”
Results: The following samples were taken by applying K-means to the paintings' top 3 colors, using hue and lightness only. Some outlier paintings still sneak in, and black-dominant paintings remain hard to group together.
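A rough sketch of how such clustering input could be built, using Python's standard `colorsys` module. The function name and the `[h1, l1, h2, l2, h3, l3]` layout are assumptions made for illustration; the actual feature layout used in the project is not described here.

```python
import colorsys

def hue_lightness_features(dominant_rgb, top_n=3):
    """Flatten the top_n dominant colors into [h1, l1, h2, l2, ...].

    dominant_rgb: list of (r, g, b) tuples in 0-255, most dominant first.
    Saturation is dropped on purpose, as in the second hypothesis.
    """
    features = []
    for r, g, b in dominant_rgb[:top_n]:
        # colorsys returns (hue, lightness, saturation), each in the range 0-1.
        h, l, _s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        features.extend([h, l])
    return features

# Two almost-black colors that look the same to the eye...
dark_red = hue_lightness_features([(10, 0, 0)], top_n=1)
dark_blue = hue_lightness_features([(0, 0, 10)], top_n=1)
# ...have nearly identical lightness but very different hue values.
print(dark_red, dark_blue)
```

The last two lines illustrate one likely reason black-dominant paintings are hard to group: near-black colors can carry almost any hue, so a distance computed on hue can place visually similar paintings far apart.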
Third hypothesis: “Our model input might already be too complex (too many dominant colors, with similar greens, blues, yellows, etc). How might we simplify the grouping process?”
Research: We found a Medium post by Andrea Lalenti that stacked the artworks' pixels instead of extracting the colors from every piece, and then found clusters that form a palette representing the trends of the whole set. One thing we learned is that using the whole spectrum of colors to group paintings is complex and not as impactful to the human eye. For example, 3 different clusters might be detected for 3 different shades of blue; however, our users might not find those subtle differences relevant when interacting with our visualization.
Results: North America subset test (AIoC Dataset). We isolated North American paintings (resizing images to 100 x 100 pixels) and generated 15 pixel clusters. The figure below shows the pixel distribution (cluster size) among the top 15 colors (centroids). Notice that some similar grays could still be merged, so we might be able to tune our model with a lower K value.
For example, this is a random sample of paintings from 1975 to 1999 along with the color distribution for that period. We might always have some outlier paintings; we will explore how we can manage them.
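The pixel-stacking idea can be sketched in plain Python as follows. This is a minimal illustration, not our pipeline: it assumes images have already been resized and converted to lists of `(r, g, b)` tuples, it uses a basic Lloyd's K-means written by hand, and all function names are hypothetical. In practice a library implementation such as scikit-learn's `KMeans` would be the more likely choice.

```python
import random

def stack_pixels(images):
    """Concatenate the pixels of every image into one flat list.

    images: list of images, each a list of (r, g, b) tuples.
    """
    return [pixel for image in images for pixel in image]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(pixel, centroids):
    """Index of the centroid closest to the pixel."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(pixel, centroids[i]))

def kmeans_palette(pixels, k, iterations=20, seed=0):
    """Basic Lloyd's K-means over stacked pixels.

    Returns a list of (centroid, cluster_size) pairs, largest cluster first.
    """
    rng = random.Random(seed)
    # Start from k distinct pixel values so no two centroids coincide.
    centroids = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(channel) / len(members) for channel in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    sizes = [0] * k
    for p in pixels:
        sizes[nearest(p, centroids)] += 1
    return sorted(zip(centroids, sizes), key=lambda pair: pair[1], reverse=True)

# Two tiny made-up "paintings": one reddish, one bluish.
reddish = [(200 + i, 10, 10) for i in range(20)]
bluish = [(10, 10, 200 + i) for i in range(20)]
for centroid, size in kmeans_palette(stack_pixels([reddish, bluish]), k=2):
    print(centroid, size)
```

The cluster sizes returned alongside each centroid correspond to the pixel distribution described above; lowering `k` is how similar grays would end up merged into one palette color.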
8-color palette by style (WikiArts Dataset)
Using a larger dataset, we extracted the palettes relevant to each art category. The figure to the right shows the extracted 8-color palette for each artwork category throughout history, from Early Renaissance at the top to Minimalism at the bottom. The most dominant color is on the left and the least dominant color is on the right.
If we use the pixel method for our final deployment, we can experiment with different palette sizes (K values) depending on region and sample size. We can also tune the down-sampling factor to balance training time against results: when we down-sample more, computing is faster, but some pixels get blurred and colors are harder to differentiate. If we decide to fit a single model for the whole dataset, then after fitting it once we have a quick way to experiment with dimensions like period, style, and even region. This flexibility opens many opportunities to group the paintings while spending little time on testing.
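To illustrate the "fit once, regroup cheaply" idea, the sketch below takes an already fitted palette and computes the share of pixels per palette color for any metadata field. The record layout (`pixels`, `period`, `style`) and the function names are hypothetical and only serve the example.

```python
from collections import Counter, defaultdict

def nearest_color(pixel, palette):
    """Index of the palette color closest to the pixel in RGB space."""
    return min(
        range(len(palette)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, palette[i])),
    )

def palette_shares_by_group(paintings, palette, key):
    """Share of pixels assigned to each palette color, per group.

    paintings: list of dicts with a 'pixels' list and metadata fields.
    key: metadata field to group by, e.g. 'period', 'style' or 'region'.
    """
    counts = defaultdict(Counter)
    for painting in paintings:
        for pixel in painting["pixels"]:
            counts[painting[key]][nearest_color(pixel, palette)] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = [counter[i] / total for i in range(len(palette))]
    return shares

# A made-up 2-color palette and two made-up paintings.
palette = [(255, 0, 0), (0, 0, 255)]
paintings = [
    {"period": "1950-1974", "style": "A",
     "pixels": [(250, 5, 5)] * 3 + [(5, 5, 250)]},
    {"period": "1975-1999", "style": "B",
     "pixels": [(0, 0, 240)] * 2},
]
print(palette_shares_by_group(paintings, palette, key="period"))
print(palette_shares_by_group(paintings, palette, key="style"))
```

Because the palette is fixed, switching the grouping from period to style or region only changes the `key` argument and does not require refitting the clusters.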
- Prepare task split for the midterm presentation.
- Plan a timeline for a first draft of the backend and front end to iterate on top of that.