I created a set of 64×64 solid-color images.
The dataset contains 13 different colors that I quickly came up with, with 10 images for each class, so there are 130 images overall.
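Later in the post the network trains on 117 images and is tested on 13. Those numbers match holding out one image per color (13 × 9 = 117 for training, 13 × 1 for testing). Below is a minimal NumPy sketch of that kind of per-class split. The post does not say how the split was actually made, and the function name `split_per_class` and the placeholder class names are hypothetical.

```python
import numpy as np

def split_per_class(labels, n_test_per_class=1, seed=0):
    """Return (train_idx, test_idx), holding out n_test_per_class
    randomly chosen images of every class for testing."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the test part.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        test_idx.extend(idx[:n_test_per_class])
        train_idx.extend(idx[n_test_per_class:])
    return np.array(train_idx), np.array(test_idx)

# 13 hypothetical class names with 10 images each -> 130 labels.
labels = [f"color_{c}" for c in range(13) for _ in range(10)]
train_idx, test_idx = split_per_class(labels)
print(len(train_idx), len(test_idx))  # 117 13
```

Splitting per class keeps every color represented in the test set, which matters when each class has only 10 images.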
The first step is usually data preprocessing, but in my case there are no variations to account for, because I only want to see whether a neural network can learn this pattern.
However, I still have to account for the fact that all of my labels are words, so I need to convert them into a form my neural net can use. To do that, I use scikit-learn's LabelEncoder followed by a OneHotEncoder. If you don't know what a one-hot encoder does, read this.
In my case I have 13 labels, so, for example, “red” becomes a vector like this: [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
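LabelEncoder assigns integer indices to the sorted unique class names, and OneHotEncoder turns each index into a vector with a single 1. Here is a plain-Python sketch of those two steps combined. The helper name `one_hot_encode` and the 13 color names are hypothetical (the post does not list its labels). With these names “red” lands at index 10, while in the author's label set it lands at index 6. The position depends only on where the name falls alphabetically.

```python
def one_hot_encode(labels):
    """Mimic LabelEncoder + OneHotEncoder: sort the unique class names,
    give each an index, and return one one-hot vector per label."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    vectors = []
    for name in labels:
        vec = [0] * len(classes)
        vec[index[name]] = 1
        vectors.append(vec)
    return classes, vectors

# Hypothetical color names; the post does not list its 13 labels.
colors = ["black", "blue", "brown", "dark blue", "dark green", "green",
          "grey", "orange", "pink", "purple", "red", "white", "yellow"]
classes, vectors = one_hot_encode(colors)
print(vectors[classes.index("red")])  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```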
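```python
import os
import pickle
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def create_one_hot_encoder(y,
                           enc1_file='label_encoder.pkl',
                           enc2_file='hot_encoder.pkl'):
    # Skip fitting if encoders from a previous run are already on disk.
    if os.path.exists(enc1_file) and os.path.exists(enc2_file):
        print("The pickle files already exist")
        return False
    # Color name -> integer index (classes are sorted alphabetically).
    label_encoder = LabelEncoder()
    encoded = label_encoder.fit_transform(y).reshape(-1, 1)
    # Integer index -> one-hot vector. scikit-learn >= 1.2 names this
    # argument sparse_output; older versions use sparse=False instead.
    enc = OneHotEncoder(sparse_output=False)
    enc.fit(encoded)
    # Save both encoders so test labels get exactly the same encoding.
    with open(enc1_file, 'wb') as f:
        pickle.dump(label_encoder, f)
    with open(enc2_file, 'wb') as f:
        pickle.dump(enc, f)
    print("Encoders saved successfully")
    return True
```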
I save the encoders, which are fitted on the training labels, as pickle files so that I can apply exactly the same encoding when testing the model after training.
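The point of pickling is that test labels must be mapped with the same class order the training labels produced. Below is a stdlib-only sketch of that round trip. Instead of pickling fitted scikit-learn objects, it stores the class order. The helper names `save_classes` and `encode_with_saved` are hypothetical. With the author's setup you would instead `pickle.load` the two encoder files and call their `transform` methods.

```python
import os
import pickle
import tempfile

def save_classes(classes, path):
    # Store the class order learned from the training labels.
    with open(path, "wb") as f:
        pickle.dump(list(classes), f)

def encode_with_saved(labels, path):
    # Reload the training-time class order and one-hot encode new labels.
    with open(path, "rb") as f:
        classes = pickle.load(f)
    index = {name: i for i, name in enumerate(classes)}
    return [[1 if i == index[name] else 0 for i in range(len(classes))]
            for name in labels]

path = os.path.join(tempfile.mkdtemp(), "classes.pkl")
save_classes(sorted({"red", "green", "blue"}), path)
print(encode_with_saved(["green", "red"], path))  # [[0, 1, 0], [0, 0, 1]]
```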
Now I will quickly go over the code I developed. First, we need to define the input and output of the network. Because this is a very simple task, I use TensorFlow placeholders for them.
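```python
import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in TF 2.x)
X = tf.placeholder(tf.float32, shape=[None, 64, 64, 3])  # batch of RGB images
y = tf.placeholder(tf.float32, shape=[None, 13])         # one-hot labels
hold_prob = tf.placeholder(tf.float32)                   # dropout keep probability
```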
Here X represents the input images. Its shape is (None, 64, 64, 3), where None stands for the batch size, and y holds the one-hot labels. Because I have very few images, I use full batches: the batch size equals the number of training images (117 in my case), but you can change that parameter. “hold_prob” is the probability of keeping each unit when dropout is applied.
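In TensorFlow 1.x, `tf.nn.dropout` keeps each element with probability `keep_prob`, scales the kept elements by `1 / keep_prob`, and sets the rest to zero. The NumPy sketch below reproduces that behavior. The function name `inverted_dropout` is hypothetical. It shows why `hold_prob` is typically fed a value below 1 during training and 1.0 at evaluation time, where dropout becomes a no-op.

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and scale kept
    elements by 1 / keep_prob, so the expected value is unchanged."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones(100_000)
train_out = inverted_dropout(x, keep_prob=0.5, rng=rng)  # roughly half zeros, half 2.0
test_out = inverted_dropout(x, keep_prob=1.0, rng=rng)   # identical to x
print(train_out.mean())  # close to 1.0
```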
Next, I define the training steps. The loss is the softmax cross entropy between the labels and the network's output, and I minimize it with the Adam optimizer.
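```python
from model import CNN  # the author's class, see model.py on GitHub
cnn = CNN(batch_size=117, learning_rate=0.001, shape=[64, 64, 3],
          num_classes=13)
cnn.neural_net(X, hold_prob)  # builds the graph; output logits go to cnn.y_prob
# cnn.y_prob holds unscaled logits; the softmax is applied inside the loss.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=cnn.y_prob))
optimizer = tf.train.AdamOptimizer(learning_rate=cnn.learning_rate)
loss = optimizer.minimize(cross_entropy)  # the training op run at each step
```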
Here, CNN is a class I wrote, defined in the file “model.py”, which you can check on GitHub. Next, I call its neural_net method, which builds the network presented below and saves the final outputs (the logits) to cnn.y_prob, which the cross-entropy loss uses.
To train on my dataset, I built my own simple neural net with two convolutional layers, one pooling layer, and two fully connected layers, using ReLU as the activation function.
I use dropout to reduce overfitting. Here is the code:
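`tf.nn.softmax_cross_entropy_with_logits` applies a softmax to the logits and then takes the cross entropy with the one-hot labels for each example. `tf.reduce_mean` then averages over the batch. Below is a NumPy re-implementation of that computation using the log-sum-exp trick. The name `softmax_cross_entropy` is hypothetical and this is not TensorFlow's code.

```python
import numpy as np

def softmax_cross_entropy(labels, logits):
    """Per-example cross entropy between one-hot labels and softmax(logits),
    using the log-sum-exp trick for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)

labels = np.array([[1.0, 0.0, 0.0]])
logits = np.array([[2.0, 1.0, 0.0]])
loss = softmax_cross_entropy(labels, logits).mean()  # like tf.reduce_mean
print(round(float(loss), 4))  # 0.4076
```

This is also why the last layer has no ReLU: the loss expects raw logits, which may be negative.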
```python
def neural_net(self, X, hold_prob=0.5):
    with tf.variable_scope('convolution') as scope:
        # conv_layer and fc_layer are helpers in model.py (not shown).
        # Filter shape is [height, width, in_channels, out_channels].
        conv1 = self.conv_layer(X, [3, 3, 3, 32])
        conv2 = self.conv_layer(conv1, [5, 5, 32, 64])
    with tf.variable_scope('pooling') as scope:
        pool1 = tf.nn.max_pool(conv2,
                               ksize=[1, self.kernel_size, self.kernel_size, 1],
                               strides=[1, self.strides_size, self.strides_size, 1],
                               padding='VALID')
    with tf.variable_scope('fc_layer') as scope:
        # The reshape assumes pool1 is 8x8x64 (set by the sizes in model.py).
        flat = tf.reshape(pool1, [-1, 8 * 8 * 64])
        dropout_flat = tf.nn.dropout(flat, keep_prob=hold_prob)
        fc1 = self.fc_layer(dropout_flat, 1024)
        fc1 = tf.nn.relu(fc1)
        dropout_fc1 = tf.nn.dropout(fc1, keep_prob=hold_prob)
        fc2 = self.fc_layer(dropout_fc1, self.num_classes)
        # No ReLU here: fc2 holds logits, and the softmax is in the loss.
        #fc2 = tf.nn.relu(fc2)
        self.y_prob = fc2
```
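The reshape to `8 * 8 * 64` only works if the pooling layer shrinks the 64×64 feature maps to 8×8. The sketch below is a plain-Python shape check under stated assumptions. It assumes both convolutions use 'SAME' padding with stride 1, so `conv2` stays 64×64×64. It also assumes `kernel_size = strides_size = 8`. Neither value appears in the code above, and the helper `pool_output_size` is hypothetical. The parameter count further assumes `fc_layer` is a standard dense layer with a bias.

```python
def pool_output_size(n, k, s):
    """Output width of a 'VALID' pooling window: floor((n - k) / s) + 1."""
    return (n - k) // s + 1

# Assumption: conv2 output is 64x64 with 64 channels ('SAME' padding, stride 1).
side = pool_output_size(64, k=8, s=8)  # assumed kernel_size = strides_size = 8
flat_size = side * side * 64           # length of each flattened vector
fc1_params = flat_size * 1024 + 1024   # assumed weights + biases of fc1
print(side, flat_size, fc1_params)     # 8 4096 4195328
```

Under these assumptions, the first fully connected layer alone holds about 4.2 million parameters, which is large relative to a 117-image training set.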
This is the very simple model I decided to use. I trained it on 117 images for 1000 epochs on a GTX 1060, which took up to 4 minutes. At first I thought it would definitely reach 100% accuracy, but I was wrong: the training accuracy was 94%. I also tested it on the 13 images in the test set, and it classified 12 correctly, which is about 92% accuracy. The one mistake was an image I see as “dark green”, which the model identified as “dark blue”. It might seem easy to find a set of additions and multiplications that identifies 13 colors, but the training process did not find one. There are several possible reasons:
- The training time is short.
- There are only a few images.
- Each channel holds a single constant value across the entire image. For example, in a purple image every pixel has the same red, green, and blue values. There are 10 purple images: in the first, every pixel is (200, 0, 170); in the second, (203, 3, 173); and so on.
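Accuracy here is the share of images whose largest output matches the label. With 13 test images, one mistake gives 12/13 ≈ 92.3%. The NumPy sketch below shows that computation. The `accuracy` helper and the fake logits are hypothetical and were chosen only to reproduce one error out of 13.

```python
import numpy as np

def accuracy(logits, one_hot_labels):
    """Fraction of rows where the highest logit matches the true class."""
    predicted = np.argmax(logits, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

# Hypothetical test run: 13 images, one per class, one misclassified.
labels = np.eye(13)
logits = np.eye(13)
logits[4] = np.eye(13)[3]  # e.g. class 4 predicted as class 3
print(round(accuracy(logits, labels), 3))  # 0.923
```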
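Here is a NumPy sketch of how one such class could be generated. The helper `solid_color_images` is hypothetical. The constant +3 step per image is an assumption extrapolated from the two purple examples, because the post does not give the remaining values.

```python
import numpy as np

def solid_color_images(base_rgb, n_images=10, step=3, size=64):
    """Make n_images solid-color size x size RGB images, shifting every
    channel by `step` per image (assumed pattern), clipped to [0, 255]."""
    images = []
    for i in range(n_images):
        rgb = np.clip(np.array(base_rgb) + i * step, 0, 255).astype(np.uint8)
        # Every pixel of the image gets the same (R, G, B) value.
        images.append(np.broadcast_to(rgb, (size, size, 3)).copy())
    return np.stack(images)

purple = solid_color_images((200, 0, 170))
print(purple.shape)              # (10, 64, 64, 3)
print(purple[1, 0, 0].tolist())  # [203, 3, 173]
```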
In conclusion, I wanted to test a simple CNN on a very easy task: learning to identify the colors of images. It turned out to be just another classification problem that needs more training and well-structured data, since I could not reach 100% accuracy even on this simple problem with a small training set and a short training time. Maybe I could have come up with a better model without using convolutions or pooling at all, but the fun of this project was seeing how this type of network would handle the problem.
As I said in the introduction, my inspiration came from a little child who was learning to identify colors. She, too, struggled at first and made a lot of mistakes, and it took her more than 4 minutes to learn the color patterns. So how can we blame AI for not learning this simple task? Maybe it's time to start thinking about improving the neural nets in our own brains :).