Analyzing Artworks¶
Analyzing artworks in a statistically meaningful way requires a substantial volume of art.
Fortunately, several museums now maintain public APIs.
These allow us to programmatically access not just the artwork itself, but also metadata about each piece.
One such museum is the Art Institute of Chicago, whose collection contains over 120,000 pieces.
The Institute maintains a feature-rich API, allowing for both metadata and image acquisition.
Let’s take a tour of these APIs, and see if we can fuse them into our overarching color science process.
import ast
import binascii
from itertools import repeat
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image
import requests
import scipy.cluster
import scipy.stats as ss
import sqlite3
from sqlalchemy import create_engine
First¶
Let’s write a small function to query the Art Institute of Chicago’s
Artworks API,
and test with a random integer.
def get_art_attributes(_id):
    # Request a single artwork record, limited to the fields we care about
    url = (
        f"https://api.artic.edu/api/v1/artworks/{_id}?"
        f"fields=id,image_id,date_end,place_of_origin,artwork_type_title"
    )
    r = requests.get(url)
    data = json.loads(r.text)['data']
    return data
get_art_attributes(897)
{'id': 897, 'date_end': 1852, 'place_of_origin': 'France', 'artwork_type_title': 'Painting', 'image_id': '5ae91cbf-66c5-cf9b-f355-629e458cb063'}
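As an aside, gathering the substantial volume of art we need means scaling this beyond one-off lookups. The same API's paginated listing endpoint supports that; the sketch below assumes its documented page, limit, and fields query parameters, and the helper name is hypothetical rather than anything used later in this post.
def get_many_art_attributes(pages=2, limit=100):
    # Hypothetical helper: page through the /artworks listing endpoint,
    # collecting the same fields we requested for the single artwork above
    records = []
    for page in range(1, pages + 1):
        url = (
            f"https://api.artic.edu/api/v1/artworks?"
            f"page={page}&limit={limit}"
            f"&fields=id,image_id,date_end,place_of_origin,artwork_type_title"
        )
        r = requests.get(url)
        records.extend(json.loads(r.text)['data'])
    return records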
Success!¶
We now have several useful attributes.
Notice the image_id, which we can use to retrieve the artwork itself.
Let's write a small function to do just that, then test it with an image_id fetched via our first function.
def get_art_image(image_id):
    # Fetch the artwork from the IIIF image endpoint and downsample it
    im = None
    url = (
        f"https://www.artic.edu/iiif/2/"
        f"{image_id}"
        f"/full/843,/0/default.jpg"
    )
    try:
        im = Image.open(requests.get(url, stream=True).raw)
    except (requests.RequestException, OSError):
        # Request or image decode failed; return None below
        pass
    if im is None:
        return None
    return im.resize((300, 300))
get_art_image(get_art_attributes(893)['image_id'])
Excellent!¶
Now we have three very important data points:
time, place, and the painting itself.
Let’s run this through our color clustering algorithm that we developed a couple weeks ago.
def get_color_stats(im=None, ar=None):
    # Accept either a PIL image or an already-flattened pixel array
    if im is not None and ar is None:
        ar = np.asarray(im)
    shape = ar.shape
    # One row per pixel, one column per channel
    ar = ar.reshape(np.prod(shape[:2]), shape[2]).astype(float)
    # k-means the pixels into four dominant colors
    codes, dist = scipy.cluster.vq.kmeans(ar, 4)
    vectors, distance = scipy.cluster.vq.vq(ar, codes)
    counts, bins = np.histogram(vectors, len(codes))
    # Rank cluster centers by how many pixels each one claims
    colors = dict(zip(ss.rankdata(-counts), codes.tolist()))
    colors = {int(k): {'rgb': v} for k, v in colors.items()}
    # Dicts preserve insertion order, so index i still lines up with counts[i]
    for i, v in enumerate(colors):
        colors[v]['count'] = counts[i]
    for v in colors.values():
        v['rgb'] = [round(n) for n in v['rgb']]
        v['hex'] = f"#{binascii.hexlify(bytearray(int(c) for c in v['rgb'])).decode('ascii')}"
        v['r'] = v['rgb'][0]
        v['g'] = v['rgb'][1]
        v['b'] = v['rgb'][2]
    df = pd.DataFrame.from_dict(
        data=colors, orient='index').reset_index().rename(
        columns={'index': 'rank'}).sort_values(by='rank')
    return df, ar
df, ar = get_color_stats(get_art_image(get_art_attributes(893)['image_id']))
df
|   | rank | rgb | count | hex | r | g | b |
|---|------|-----|-------|-----|---|---|---|
| 1 | 1 | [32, 23, 16] | 36387 | #201710 | 32 | 23 | 16 |
| 0 | 2 | [200, 191, 136] | 25097 | #c8bf88 | 200 | 191 | 136 |
| 3 | 3 | [140, 133, 79] | 17072 | #8c854f | 140 | 133 | 79 |
| 2 | 4 | [75, 66, 27] | 11444 | #4b421b | 75 | 66 | 27 |
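Given the sqlite3 and SQLAlchemy imports at the top, these per-painting statistics are presumably destined for a database rather than being recomputed each time. A minimal sketch of what that could look like, with a placeholder database path and table name (neither is established anywhere above):
# Placeholder persistence sketch: 'art.db' and 'color_stats' are made-up names
engine = create_engine('sqlite:///art.db')
(
    df.drop(columns=['rgb'])      # drop the list-valued column; r, g, b carry the same data
      .assign(artwork_id=893)     # tag the rows with the artwork they came from
      .to_sql('color_stats', engine, if_exists='append', index=False)
)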
We now have…¶
…the four most dominant colors and their counts.
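Before plotting raw pixels, a quick sanity check: the hex values in the table can be rendered directly as swatches. This is purely illustrative and not part of the pipeline above, just matplotlib applied to the df we already have.
# Render each dominant color as a flat panel, labeled with its hex code
fig, axes = plt.subplots(1, len(df), figsize=(8, 2))
for ax, (_, row) in zip(axes, df.iterrows()):
    ax.set_facecolor(row['hex'])
    ax.set_title(row['hex'], fontsize=9)
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()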
Let’s visualize the RGB pixels from the painting in 3D.
def plot_rgb(ar, s=0.1):
    # Split the flattened pixel array into its R, G, and B columns
    X = np.hsplit(ar, np.array([1, 2]))[0].flatten().tolist()
    Y = np.hsplit(ar, np.array([1, 2]))[1].flatten().tolist()
    Z = np.hsplit(ar, np.array([1, 2]))[2].flatten().tolist()
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection='3d')
    # Scatter every pixel in RGB space, colored by its own value
    ax.scatter(X, Y, Z, s=s, c=ar / 255.0)
    plt.show()
plot_rgb(ar)