Embarking on a deep learning journey is truly an enthralling experience, especially for someone hailing from a machine learning background like me. As I delved deeper, I came across intricate details and nuances I wasn’t previously aware of. Determined to chronicle this journey, I’m penning down this series, starting with the foundational concept: Tensors.
At the heart of TensorFlow is, predictably, the tensor. Simply put, tensors are multi-dimensional arrays or matrices, extending beyond the two dimensions we’re familiar with in traditional matrices. Now, one might ask, how are tensors different from the beloved NumPy arrays? A crucial distinction is that tensors can run on GPUs, providing a significant speed boost for numerical computations. This feature makes them the de facto choice for deep learning.
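For readers who want to verify the GPU claim on their own machine, here is a minimal sketch (assuming TensorFlow 2.x is installed); an empty list simply means TensorFlow will fall back to the CPU:
import tensorflow as tf

# list any GPUs visible to TensorFlow; an empty list means CPU-only execution
print(tf.config.list_physical_devices('GPU'))

# every tensor reports the device it lives on (e.g. .../GPU:0 on a GPU machine)
x = tf.constant([1.0, 2.0, 3.0])
print(x.device)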
When creating a tensor using tf.constant(), some of its inherent properties come to the fore:
- Shape of Tensor: Displays the number of elements along each dimension.
- Data Type: Specifies the type of elements in the tensor (e.g., float, int).
- NumPy Value: Given the symbiotic relationship between TensorFlow and NumPy, tensors can be easily converted to NumPy arrays (a short conversion snippet follows the example below).
import tensorflow as tf

# create a Tensor with float16 dtype
simple_matrix = tf.constant([[10., 7.],
                             [3., 2.],
                             [8., 9.]], dtype=tf.float16)
simple_matrix
<tf.Tensor: shape=(3, 2), dtype=float16, numpy=
array([[10., 7.],
[ 3., 2.],
[ 8., 9.]], dtype=float16)>
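To illustrate the NumPy interoperability mentioned above, here is a quick sketch reusing simple_matrix from the example:
import numpy as np

# tf.Tensor -> np.ndarray and back again
np_matrix = simple_matrix.numpy()        # returns a NumPy copy of the tensor's values
back_to_tensor = tf.constant(np_matrix)  # NumPy arrays are accepted directly
print(type(np_matrix), type(back_to_tensor))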
As one’s experience grows, manually creating tensors becomes impractical, since real-world tensors are far too large and complex to type out by hand. That’s where TensorFlow’s built-in functions become invaluable.
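As a small sketch of what those built-in helpers look like (the shapes and values below are arbitrary):
# common built-in tensor creation functions
zeros = tf.zeros(shape=(3, 2))           # tensor of all zeros
ones = tf.ones(shape=(3, 2))             # tensor of all ones
sequence = tf.range(start=1, limit=10)   # 1, 2, ..., 9
zeros, ones, sequence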
A recurring point of confusion is differentiating between the terms ‘dimension’ and ‘shape’. In the context of tensors, the ‘dimension’ (also known as rank) refers to the number of indices needed to access a particular element. For example, a scalar (a single number) is a tensor of rank 0, a vector (a 1D array of numbers) is a tensor of rank 1, a matrix (a 2D array of numbers) is a tensor of rank 2, and so on. To provide clarity: if someone says a tensor is of rank 3 (or has 3 dimensions), it means three axes or indices are required to access its elements, and its shape might look something like [height, width, depth].
# Dimension of Tensor simple_matrix
simple_matrix.ndim
2
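For comparison, here is a minimal sketch (with made-up values) showing the ranks of a scalar, a vector, and the matrix from above:
scalar = tf.constant(7)        # a single number
vector = tf.constant([10, 7])  # a 1D array of numbers
scalar.ndim, vector.ndim, simple_matrix.ndim  # (0, 1, 2)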
On the other hand, ‘shape’ denotes the tensor’s layout in terms of the number of elements along each dimension. The shape effectively provides a blueprint for accessing the tensor’s data. For a rank 2 tensor with shape [3, 4], you know that you need two indices to access any element inside it (one for the row and another for the column), and that there are 3 rows and 4 columns. In TensorFlow, the shape of a tensor is often one of the primary attributes one will work with, since it influences how operations like addition, multiplication, and reshaping can be performed.
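A tiny sketch of that [3, 4] case (the values are arbitrary) to make the row/column indexing concrete:
rank_2_example = tf.ones(shape=(3, 4))       # 3 rows, 4 columns
rank_2_example.shape, rank_2_example[1, 2]   # two indices: one for the row, one for the column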
Random tensors hold particular significance in deep learning. Why? Because when initializing neural networks, we often start with random weights. Over time, as the network trains, these weights adjust according to the input data and chosen hyperparameters.
Creating random tensors from uniform or normal distributions is a common operation, especially when initializing the weights of neural networks. One thing that cannot be ignored is setting seeds, both global and local, for several reasons:
- Reproducibility: Deep learning models are complex systems, and even a small change in initialization can lead to noticeable differences in outcomes. Setting a seed ensures that the random numbers generated are the same every time the code is run. This reproducibility is essential for debugging, sharing research, or deploying models.
- Consistent Results Across Runs: When sharing research or comparing the results of different algorithms, it’s crucial that the comparison is fair. If every algorithm had a different random initialization, some might get an unfair advantage just due to the initial weights. A consistent seed ensures fairness in these comparisons.
- Combining Both Seeds: When both global and operation-specific seeds are set in TensorFlow, they both affect the randomness. This combination ensures that even if the global seed is set for reproducibility, individual operations can have their own predictable randomness by setting their own seed.
- Regularizing and Diagnosing Issues: Sometimes, having different initializations can be useful. For instance, if a model works well with one random initialization but not another, it might indicate that the model is sensitive to its initial weights, which could be a sign of other underlying issues (like vanishing or exploding gradients).
- Global vs. Local Seeds:
- Global Seed: This sets the seed for an entire runtime or session. Once set, any random operation will use this seed (alongside any operation-specific seeds).
- Local Seed (Operation-specific seed): This sets the seed only for a specific operation. This provides more fine-grained control over the randomness, allowing for certain operations to be reproducible while others are truly random.
# creating a Random Tensor
tf.random.set_seed(42) # global seed
random_tensor = tf.random.uniform(shape = [50], seed=13) # with local seed
random_tensor
<tf.Tensor: shape=(50,), dtype=float32, numpy=
array([0.6645621 , 0.44100678, 0.3528825 , 0.46448255, 0.03366041,
0.68467236, 0.74011743, 0.8724445 , 0.22632635, 0.22319686,
0.3103881 , 0.7223358 , 0.13318717, 0.5480639 , 0.5746088 ,
0.8996835 , 0.00946367, 0.5212307 , 0.6345445 , 0.1993283 ,
0.72942245, 0.54583454, 0.10756552, 0.6767061 , 0.6602763 ,
0.33695042, 0.60141766, 0.21062577, 0.8527372 , 0.44062173,
0.9485276 , 0.23752594, 0.81179297, 0.5263394 , 0.494308 ,
0.21612847, 0.8457197 , 0.8718841 , 0.3083862 , 0.6868038 ,
0.23764038, 0.7817228 , 0.9671384 , 0.06870162, 0.79873943,
0.66028714, 0.5871513 , 0.16461694, 0.7381023 , 0.32054043],
dtype=float32)>
Moreover, when dealing with neural network inputs and weights, it’s essential to shuffle the order of tensor elements to minimize any ordering bias in the input data. This shuffling provides a more neutral playground for neural networks. A pivotal aspect to note is that because neural networks initialize with random weight patterns, running the same model multiple times might yield different outcomes. Hence, ensuring reproducibility involves consistently shuffling data and initializing the weights in the same pattern.
# shuffle
tf.random.set_seed(42)  # global level seed

not_shuffle = tf.constant([[10, 7],
                           [3, 4],
                           [2, 5]])
tf.random.shuffle(not_shuffle, seed=42)  # operation-level seed
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[10, 7],
[ 3, 4],
[ 2, 5]], dtype=int32)>
Apparently, it doesn’t show any shuffle; however, for truly reproducible shuffles across different runs of the program/script, one should set only the operation-specific seed and not the global seed.
Tensors possess certain attributes that help glean crucial insights:
- Shape (tensor.shape): Denotes the number of elements along each dimension.
- Rank (tensor.ndim): Highlights the number of tensor dimensions.
- Axis or Dimension: Pinpoints a specific dimension of a tensor, for instance tensor[0] or tensor[:, 1].
- Size (tf.size(tensor)): Represents the total number of items in the tensor.
# Rank 4 Tensor
rank_4_dim = tf.zeros(shape=[2, 3, 4, 5])
rank_4_dim
<tf.Tensor: shape=(2, 3, 4, 5), dtype=float32, numpy=
array([[[[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]],
[[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]],
[[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]]],
[[[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]],
[[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]],
[[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]]]], dtype=float32)>
# various attributes of tensor
print("Datatype of every element: ", rank_4_dim.dtype)
print("no of Dimensions (rank): ", rank_4_dim.ndim)
print("Shape of a Tensor: ", rank_4_dim.shape)
print("Elements along 0th axis: ", rank_4_dim.shape[0])
print("Elements along the last axis: ", rank_4_dim.shape[-1])
print("Total no of elements in out Tensors: ", tf.size(rank_4_dim).numpy())
Datatype of every element: <dtype: 'float32'>
no of Dimensions (rank): 4
Shape of a Tensor: (2, 3, 4, 5)
Elements along 0th axis: 2
Elements along the last axis: 5
Total no of elements in our Tensor: 120
Simple arithmetic operations, when applied to tensors, impact their values. However, specific rules govern tensor operations, especially multiplication. The critical rules to remember are:
- Inner dimensions of the tensors should match.
- The resultant tensor will have the shape of the outer dimensions.
Given these rules, it’s paramount to be cautious when reshaping or transposing tensors. An inadvertent alteration can yield vastly different results, especially during matrix multiplication. Either tf.matmul() or tf.tensordot() (with axes=1) can be used for matrix multiplication between tensors.
A = tf.constant([[1, 2],
                 [3, 4],
                 [5, 6]], dtype=tf.float32)  # 3x2 Tensor

B = tf.constant([[7, 8],
                 [9, 10],
                 [11, 12]], dtype=tf.float32)  # 3x2 Tensor

# 3x2 * 3x2 won't multiply
# Option 1: To Reshape matrix B - converts from 3x2 Tensor to 2x3 Tensor
C = tf.reshape(B, shape=(2,3))
# Option 2: Transpose matrix B - converts from 3x2 Tensor to 2x3 Tensor
D = tf.transpose(B)
C, D
(<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 7., 8., 9.],
[10., 11., 12.]], dtype=float32)>,
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 7., 9., 11.],
[ 8., 10., 12.]], dtype=float32)>)
# Tensor Multiplication
E = tf.matmul(A,C)
F = tf.tensordot(A, D, axes=1)
E, F
(<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 27., 30., 33.],
[ 61., 68., 75.],
[ 95., 106., 117.]], dtype=float32)>,
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 23., 29., 35.],
[ 53., 67., 81.],
[ 83., 105., 127.]], dtype=float32)>)
This shows a significant difference between reshape and transpose. Though both enable multiplication between the tensors, the outcomes are entirely different. Generally, when performing matrix multiplication on tensors and the axes don’t line up, one should transpose rather than reshape, because reshaping changes the matrix’s underlying structure in a way that doesn’t maintain the linear algebraic relationships required for valid matrix multiplication.
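A quick way to see why, as a minimal sketch using B from above: flattening B exposes its row-major element order, which reshape simply re-chunks, while transpose actually swaps the axes:
# B flattened in row-major order
tf.reshape(B, shape=[-1])    # [ 7.,  8.,  9., 10., 11., 12.]

# reshape re-chunks that same order into 2 rows of 3
tf.reshape(B, shape=(2, 3))  # [[ 7.,  8.,  9.], [10., 11., 12.]]

# transpose swaps the axes, so each value keeps its original (row, column) pairing
tf.transpose(B)              # [[ 7.,  9., 11.], [ 8., 10., 12.]]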
Aggregating tensors essentially involves reducing multiple values to a smaller set or even a single value. This condensation proves pivotal in determining neural network outputs. Aggregation can take various forms: extracting the minimum, maximum, mean, standard deviation, or sum of a tensor. Additionally, in scenarios involving neural network prediction probabilities, locating the index of the highest probability becomes vital.
# creating a Random tensor using the NumPy random generator
import numpy as np

random_tensor = tf.constant(np.random.randint(0, 100, size=50))
random_tensor
<tf.Tensor: shape=(50,), dtype=int64, numpy=
array([63, 65, 19, 78, 12, 44, 74, 42, 72, 62, 75, 3, 30, 87, 80, 23, 32,
21, 46, 47, 3, 77, 95, 39, 6, 29, 76, 0, 25, 24, 10, 78, 97, 26,
74, 98, 54, 10, 54, 45, 19, 48, 41, 15, 21, 48, 95, 81, 60, 10])>
# min, max, sum, mean
tf.reduce_min(random_tensor), tf.reduce_max(random_tensor), tf.reduce_sum(random_tensor), tf.reduce_mean(random_tensor)
(<tf.Tensor: shape=(), dtype=int64, numpy=0>,
<tf.Tensor: shape=(), dtype=int64, numpy=98>,
<tf.Tensor: shape=(), dtype=int64, numpy=2333>,
<tf.Tensor: shape=(), dtype=int64, numpy=46>)
# find variance - needs TensorFlow Probability
import tensorflow_probability as tfp
tfp.stats.variance(random_tensor)
<tf.Tensor: shape=(), dtype=int64, numpy=824>
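The aggregation paragraph above also mentions standard deviation and locating the index of the highest value; here is a minimal sketch of both on the same random_tensor (note that reduce_std expects a float tensor, hence the cast):
# standard deviation needs a float dtype, so cast the int64 tensor first
tf.math.reduce_std(tf.cast(random_tensor, dtype=tf.float32))

# positions of the largest and smallest values - the pattern used on prediction probabilities
tf.argmax(random_tensor), tf.argmin(random_tensor)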
In the world of deep learning, numeric data reigns supreme. This preference poses a problem when dealing with categorical or non-numeric data. The solution? One-Hot Encoding. It’s a technique where categorical values are converted into a numerical format, ensuring models can process them seamlessly.
One-Hot encoding can be accessed via tf.one_hot(tensor, depth). It is important to specify the depth, which is simply the number of unique categories or classes in the data you’re encoding.
For instance, if you’re working with a dataset that classifies images into five categories (Red, Blue, Green, Purple, Orange), then the depth is 5 because there are five possible classes.
# Create a list of indices
A = [0, 1, 2, 3, 4]  # Red, Blue, Green, Purple, Orange

# one-hot encode
tf.one_hot(A, depth=5)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]], dtype=float32)>
In conclusion, understanding tensors and their operations is fundamental for anyone venturing into deep learning using TensorFlow. They’re the backbone of neural network computations and serve as the gateway to more complex concepts. As I continue my deep learning voyage, I extend my gratitude to Daniel’s insightful YouTube channel and the ever-resourceful ChatGPT. Stay tuned for more articles as I further unravel the intricacies of deep learning.
🙂