Sigma Technology
We are given a webpage where we can modify the color of five pixels of a dog image. For each pixel we can choose the position (x, y) and the color (RGB values). The image has 32x32 pixels:
The robot classifies the image as one of these objects:
airplane
automobile
bird
cat
deer
dog
frog
horse
ship
truck
The classification is handled by a Machine Learning model (using tensorflow). We are also given the model (sigmanet.h5) and the Python code to load the model and classify images (model.py).
Understanding the attack
The objective is to fool the robot’s Artificial Intelligence so that it misclassifies the dog image after the pixel modification. This is a form of Adversarial Machine Learning, where a small amount of carefully chosen noise is added to an image to make the model misclassify it:
First of all, we might want to use a Jupyter Notebook to work with the model. This time I used Google Colab. I entered the model.py code as a cell and then I imported the model and the dog image:
import imageio
import numpy as np
from matplotlib.pyplot import imshow
img = imageio.imread('dog.png', pilmode='RGB')
dog = np.array(img)
Proof of concept
Messing around with the pixels, drawing crosses in red and white, we get the model to classify the image as a horse and as a cat:
However, if we try these values on the webpage, we don’t see the flag (although the AI misclassifies the image):
Then I started thinking that we might need to make the AI output each of the available classes, so I continued working on the frog manually.
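As a minimal sketch of the crosses mentioned above, the five editable pixels can be laid out as a plus sign around a center point. The `cross` helper and the plus-shaped layout are assumptions for illustration; the exact shapes used in the experiments are not recorded here.

```python
def cross(cx, cy, color, size=32):
    """Return five (x, y, r, g, b) tuples forming a plus sign at (cx, cy).

    Hypothetical helper: the plus-shaped layout is an assumption.
    """
    r, g, b = color
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    pixels = []
    for dx, dy in offsets:
        # Clamp to the image bounds; near the border this can yield
        # duplicate positions, so prefer centers between 1 and size - 2.
        x = min(size - 1, max(0, cx + dx))
        y = min(size - 1, max(0, cy + dy))
        pixels.append((x, y, r, g, b))
    return pixels


RED, WHITE = (255, 0, 0), (255, 255, 255)
red_cross = cross(16, 16, RED)
white_cross = cross(8, 24, WHITE)
print(red_cross)
```

Each tuple has the same shape as the values entered on the webpage: a position followed by an RGB color.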
Manual approach
The process was to start from random positions and random colors. Then I changed one value up and down and kept whichever option gave the higher confidence for frog (the value at the corresponding position of the confidence list). When that value is the maximum of all confidences, the model says that the image is a frog. Eventually I got it, after a lot of trial and error.
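One step of this manual procedure could be sketched as follows. The `nudge` function and `toy_score` are hypothetical names; `toy_score` only stands in for the model so that the snippet runs on its own. With the real model, the score would be the frog confidence returned for the modified image.

```python
def nudge(attempt, pixel, field, step, score):
    """Move one value of one pixel up and down; keep the best of the three.

    attempt: list of five (x, y, r, g, b) tuples
    pixel:   index of the pixel to change (0..4)
    field:   index of the value to change (0=x, 1=y, 2=r, 3=g, 4=b)
    score:   function mapping an attempt to a number (higher is better)
    """
    limits = (31, 31, 255, 255, 255)  # maximum for x, y, r, g, b
    best, best_score = attempt, score(attempt)
    for delta in (-step, step):
        values = list(attempt[pixel])
        values[field] = min(limits[field], max(0, values[field] + delta))
        candidate = attempt[:pixel] + [tuple(values)] + attempt[pixel + 1:]
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(attempt):
    """Stand-in for the model (assumption): prefers red values near 200."""
    return -sum(abs(r - 200) for _, _, r, _, _ in attempt)


start = [(3, 4, 100, 0, 0)] * 5
improved, improved_score = nudge(start, pixel=0, field=2, step=10, score=toy_score)
print(improved[0], improved_score)
```

Repeating this by hand for every pixel and every value is what makes the manual route so slow.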
Unfortunately, the webpage didn’t show the flag either. Hence, I thought that maybe we need to make the robot classify the image as an object that is not an animal. Namely, one of these:
airplane
automobile
ship
truck
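Assuming the confidence list follows the same order as the class list shown earlier (an assumption about how the model reports its results), these four classes sit at positions 0, 1, 8 and 9. A small sketch with a made-up confidence vector:

```python
# Class order as listed in the challenge; assumed to match the order of the
# confidence list returned by the model.
CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
TARGETS = {'airplane', 'automobile', 'ship', 'truck'}

# Positions of the non-animal classes in the confidence list.
target_indices = [i for i, name in enumerate(CLASSES) if name in TARGETS]


def predicted_class(conf):
    """Return the label whose confidence is the highest."""
    best = max(range(len(conf)), key=lambda i: conf[i])
    return CLASSES[best]


def target_score(conf):
    """Return the highest confidence among the non-animal classes."""
    return max(conf[i] for i in target_indices)


# Made-up confidence vector, for illustration only.
example = [0.05, 0.02, 0.01, 0.10, 0.02, 0.60, 0.05, 0.05, 0.07, 0.03]
print(target_indices)
print(predicted_class(example), target_score(example))
```

The image counts as misclassified into a non-animal class once `predicted_class` returns one of the four targets, which happens when `target_score` exceeds every other entry.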
The manual approach is very time-consuming and inefficient. Hence, we can design an algorithm that does something similar, but automatically.
Automatic approach
The idea is to take a list of attempts and generate new attempts based on the previous ones (randomly modifying a single position or color value of each pixel). Then we classify them using the model and sort them by their confidence in the four objects we need. Finally, we keep only the best ones and continue to the next iteration, until one of the four target classes has the highest confidence of all classes.
This is the Python code that implements this algorithm:
from random import randint as ri

# `dog` and `sigmanet` come from the previous notebook cells.
NUM_WINNERS = 4     # attempts kept after each iteration
MAX_DISTANCE = 10   # largest change of a coordinate in one step
MAX_COLOR = 4       # largest change of a color channel in one step
NUM_FIXES = 5       # pixels we are allowed to modify


def generate(x, y, r, g, b):
    # Change exactly one of the five values, chosen with equal probability
    n = ri(0, 99)
    if  0 <= n <  10: x = max(  0, x - ri(1, MAX_DISTANCE))
    if 10 <= n <  20: x = min( 31, x + ri(1, MAX_DISTANCE))
    if 20 <= n <  30: y = max(  0, y - ri(1, MAX_DISTANCE))
    if 30 <= n <  40: y = min( 31, y + ri(1, MAX_DISTANCE))
    if 40 <= n <  50: r = max(  0, r - ri(1, MAX_COLOR))
    if 50 <= n <  60: r = min(255, r + ri(1, MAX_COLOR))
    if 60 <= n <  70: g = max(  0, g - ri(1, MAX_COLOR))
    if 70 <= n <  80: g = min(255, g + ri(1, MAX_COLOR))
    if 80 <= n <  90: b = max(  0, b - ri(1, MAX_COLOR))
    if 90 <= n < 100: b = min(255, b + ri(1, MAX_COLOR))
    return x, y, r, g, b


def rand_pixel():
    return ri(0, 31), ri(0, 31), ri(0, 255), ri(0, 255), ri(0, 255)


# Each attempt is a list of NUM_FIXES pixels (x, y, r, g, b)
attempts = [[rand_pixel() for __ in range(NUM_FIXES)] for _ in range(NUM_WINNERS)]
done = False

while not done:
    # Keep the current winners and add 50 mutated children for each of them
    new_attempts = attempts.copy()

    for attempt in attempts:
        for _ in range(50):
            new_attempts.append([generate(*pixel) for pixel in attempt])

    max_confs = []

    for i, attempt in enumerate(new_attempts):
        # Apply the five pixel changes to a copy of the dog image
        new_img = dog.copy()

        for x, y, r, g, b in attempt:
            new_img[x, y] = [r, g, b]

        pred, conf = sigmanet.predict_one(new_img)
        # Best confidence among airplane, automobile, ship and truck
        max_conf = max(conf[0], conf[1], conf[8], conf[9])
        max_confs.append((i, max_conf))

        if pred in {'airplane', 'automobile', 'ship', 'truck'}:
            print(pred, attempt)
            done = True
            break

    # Keep the NUM_WINNERS attempts with the highest target confidence
    max_confs.sort(key=lambda x: x[1], reverse=True)
    indices = map(lambda x: x[0], max_confs[:NUM_WINNERS])
    attempts = [new_attempts[i] for i in indices]
    print(max_confs[:NUM_WINNERS])
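The same keep-the-winners selection can be tried in isolation on a toy problem. This is only a sketch: `evolve`, `toy_score` and `toy_mutate` are hypothetical names, and the integer objective has nothing to do with the model. It records the best score of each generation so the behavior of the selection step can be inspected.

```python
from random import Random


def evolve(score, mutate, population, keep=4, children=50, steps=20, seed=0):
    """Keep-the-winners search; returns the final winners and best scores."""
    rng = Random(seed)
    history = []
    for _ in range(steps):
        pool = list(population)  # previous winners compete again
        for parent in population:
            pool.extend(mutate(parent, rng) for _ in range(children))
        pool.sort(key=score, reverse=True)
        population = pool[:keep]
        history.append(score(population[0]))
    return population, history


def toy_score(value):
    """Toy objective (assumption): the closer to 42, the better."""
    return -(value - 42) ** 2


def toy_mutate(value, rng):
    """Move the value by a small random amount."""
    return value + rng.randint(-3, 3)


winners, history = evolve(toy_score, toy_mutate, [0, 5, 90, 100])
# The best score never gets worse from one generation to the next.
print(all(a <= b for a, b in zip(history, history[1:])))
```

Without the `pool = list(population)` line, a generation of unlucky mutations could lose the best candidate found so far.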
The best confidence for the desired classes never decreases from one iteration to the next, because the previous winners compete again in every round. After some time, we get this output:
These are values that make the model classify the image as an airplane. We can test them locally:
Flag
Alright. Finally, if we try these values on the webpage, we will get the flag:
HTB{0ne_tw0_thr33_f0ur_f1v3_p1xel_attack}