Intro

You don't need to be an expert in linear algebra to get started in AI, but you do need to know the basics. This is part 1 of my Linear Algebra 101 for AI/ML series, which is my attempt to compress the 6+ months I spent learning linear algebra before I started my career in AI. With the benefit of hindsight, I know now that you don't need to spend 6+ months or even 6 weeks brushing up on linear algebra to dive into AI. Instead, you can quickly ramp up on the basics and get started coding in AI much faster. As you make progress in AI/ML, you can continue your math studies.

In this article, you will learn:

- the basics of vector and matrix math

- vector and matrix operations

- the basics of PyTorch, an open source ML framework

As you read this guide, keep an eye out for the Concept Check questions and the quiz to check your understanding of the material!

Without further ado, let's dive in.

Basic Definitions

Scalar – A scalar is a single numerical value that represents a magnitude without direction. In programming terms, you can think of scalars as simple variables holding a single number, like an integer or float. Examples of scalars include temperature, age, and weight.

Vector – A vector is an ordered list of scalars. Why do we say it's ordered? Because the position of each scalar in the vector matters. Below is an example of a vector. Pretend $\vec{y}$ is a vector representing the movie "Avengers: Endgame". The vector contains five numbers stacked on top of one another in a single column, each of which describes a specific attribute of the movie.

$$
\vec{y} = \left. \begin{bmatrix} 0.99 \\ 0.52 \\ 0.45 \\ 0.10 \\ 0.26 \end{bmatrix} \quad \begin{array}{l} \text{action} \\ \text{comedy} \\ \text{drama} \\ \text{horror} \\ \text{romance} \end{array} \right\} \text{5 rows}
$$

We see that the movie has a value of 0.99 for action and 0.10 for horror. This suggests the movie is more of an action movie than a horror movie. If we were to swap the value for action with the value for horror, the vector would no longer accurately represent "Avengers: Endgame", which is not a horror movie. This is why order matters.
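To make "order matters" concrete in code, here is a minimal sketch using plain Python lists (we'll switch to PyTorch later in the article; the variable names are just for illustration):

```python
# The movie vector as an ordered list of scalars.
# Positions: [action, comedy, drama, horror, romance]
y = [0.99, 0.52, 0.45, 0.10, 0.26]

# Swapping the action and horror entries yields a different vector,
# one that would describe a horror-heavy movie instead.
y_swapped = [0.10, 0.52, 0.45, 0.99, 0.26]

print(y == y_swapped)  # False: same numbers, different order
```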

$$
\begin{bmatrix} 0.99 \\ 0.52 \\ 0.45 \\ 0.10 \\ 0.26 \end{bmatrix}
\neq
\begin{bmatrix} 0.10 \\ 0.52 \\ 0.45 \\ 0.99 \\ 0.26 \end{bmatrix}
\quad
\begin{array}{l} \text{action} \\ \text{comedy} \\ \text{drama} \\ \text{horror} \\ \text{romance} \end{array}
$$

Are vectors always arranged in column form? No, not necessarily. Below are vectors of different lengths, some in row form and some in column form.

$$
\overbrace{\begin{bmatrix} 18 & 21 & 24 & 27 \end{bmatrix}}^{\text{4 columns}}
\qquad
\overbrace{\begin{bmatrix} 18 & 21 \end{bmatrix}}^{\text{2 columns}}
\qquad
\left. \begin{bmatrix} -1.5 \\ 0.89 \\ 0.41 \end{bmatrix} \right\} \text{3 rows}
$$

Notice a vector either has one row or one column. What if you want a mathematical object that has multiple rows and multiple columns? That's where a matrix comes into play.

Matrix – If a scalar is a single number, and a vector is a one-dimensional ordered list of scalars, then a matrix is a two-dimensional array of scalars. Below, $X$ is an example matrix. You can see it has four rows and two columns.

$$
X = \begin{bmatrix} 3 & 3 \\ 4 & 3 \\ 5 & 3 \\ 5 & 4 \end{bmatrix} \quad \begin{array}{l} \text{123 Maple Grove Lane} \\ \text{888 Ocean View Terrace} \\ \text{100 Birch Street} \\ \text{987 Sunflower Court} \end{array}
$$

Each row corresponds to the address of a single home. The first column represents the number of bedrooms in the home, and the second column represents the number of bathrooms.

Concept Check: How many bathrooms are in the home located at 100 Birch Street?

Answer: In the matrix

$$
X = \begin{bmatrix} 3 & 3 \\ 4 & 3 \\ 5 & 3 \\ 5 & 4 \end{bmatrix} \quad \begin{array}{l} \text{123 Maple Grove Lane} \\ \text{888 Ocean View Terrace} \\ \text{100 Birch Street} \\ \text{987 Sunflower Court} \end{array}
$$

the row $\begin{bmatrix} 5 & 3 \end{bmatrix}$ corresponds to 100 Birch Street. Since the second column represents the number of bathrooms, this home has three bathrooms.

A mathematician might find these definitions overly simplistic, but they are good enough to get us started. We'll see later how vectors and matrices hold data to be processed by machine learning models.
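To build intuition for how rows and columns carry meaning, here's a small sketch of the home matrix in plain Python (the list names are mine for illustration, not part of the article's math):

```python
# Rows are homes; columns are [bedrooms, bathrooms]
homes = ["123 Maple Grove Lane", "888 Ocean View Terrace",
         "100 Birch Street", "987 Sunflower Court"]
X = [[3, 3],
     [4, 3],
     [5, 3],
     [5, 4]]

row = homes.index("100 Birch Street")  # third row, index 2
bathrooms = X[row][1]                  # second column holds bathrooms
print(bathrooms)  # 3
```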

Mathematical Notation

Symbols like $\in$ or $\mathbb{R}$ can be daunting when you're reading math equations, so let's define them and build up familiarity. $\in$ means "in", and $\mathbb{R}$ means "the set of real numbers." The set of real numbers $\mathbb{R}$ is the mathematician's way of saying all the numbers you use in everyday life: all whole numbers, negative numbers, fractions, decimals, and irrational numbers on an infinite number line. Therefore, $x \in \mathbb{R}$ means $x$ is one of the infinitely many real numbers.

Next, let's see how we use this notation to indicate a vector's number of rows and/or columns, aka its dimensions.

$$
m \text{ rows} \left\{ \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{m-1} \end{bmatrix} \right. \in \mathbb{R}^{m}
$$

Above, we see a vector with $m$ rows and 1 column. Typically in machine learning, each of these $m$ numbers comes from the set of real numbers, which is why we use $\mathbb{R}$. Since the vector contains $m$ real numbers, we say it belongs to (aka $\in$) the set of vectors of $m$ real numbers, written $\in \mathbb{R}^{m}$. Finally, $X \in \mathbb{R}^{3 \times 5}$ means "$X$ is a matrix with 3 rows and 5 columns of values, each belonging to the set of real numbers."
Knowing this notation is important: imagine if I had to write out "a matrix $X$ with 3 rows and 5 columns" or "a matrix $Y$ with 100 rows and 27 columns" every time. It's quite verbose. Instead, what if I just wrote $X \in \mathbb{R}^{3 \times 5}$ or $Y \in \mathbb{R}^{100 \times 27}$? Isn't that more concise? With enough exposure, you'll be very comfortable with math notation.

Now let's take a look at a matrix:

$$
\overbrace{ \begin{bmatrix} x_{0,0} & x_{0,1} & \cdots & x_{0,n-1} \\ x_{1,0} & x_{1,1} & \cdots & x_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m-1,0} & x_{m-1,1} & \cdots & x_{m-1,n-1} \end{bmatrix} }^{n \text{ columns}} \in \mathbb{R}^{m \times n}
$$

The matrix above has $m$ rows and $n$ columns. If you count them all, there are a total of $m \cdot n$ real numbers. Thus, this matrix belongs to the set of $m \times n$ real matrices, $\mathbb{R}^{m \times n}$.

Concept Check: $M \in \mathbb{R}^{1000 \times 80}$. How many elements does $M$ contain?

Answer: Let's write out what $M$ looks like:

$$
M = \begin{bmatrix} m_{0,0} & m_{0,1} & \cdots & m_{0,79} \\ m_{1,0} & m_{1,1} & \cdots & m_{1,79} \\ \vdots & \vdots & \ddots & \vdots \\ m_{999,0} & m_{999,1} & \cdots & m_{999,79} \end{bmatrix}
$$

We see that $M$ has 1000 rows and 80 columns. That means $M$ has $1000 \times 80 = 80{,}000$ elements.

Element-wise Operations with PyTorch

Code Environment Setup

Now that we've established the definitions of vectors and matrices and their mathematical notation, let's play around with them in code to gain some intuition and familiarity. To do this, we're going to use an open source machine learning framework called PyTorch. PyTorch is widely used throughout academia and industry for cutting-edge AI research and production-grade software at institutions and companies such as OpenAI, Amazon, Meta, Salesforce, Stanford University, and thousands of startups, so building experience with the framework is practical. Visit the official PyTorch installation instructions page to get started.

After you install PyTorch, open up your Python REPL and enter the code below:

$$
a = \begin{bmatrix} 3 \\ 4 \\ 5 \\ 5 \end{bmatrix} \in \mathbb{R}^{4 \times 1}
$$

Python
import torch
a = torch.tensor([[3], [4], [5], [5]])

Above, we see a vector with four elements written in math notation, followed by its equivalent in code.
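As a quick sanity check (assuming you have PyTorch installed), a tensor's .shape attribute mirrors the $\mathbb{R}^{4 \times 1}$ notation:

```python
import torch

# a is a vector in R^{4x1}: four rows, one column
a = torch.tensor([[3], [4], [5], [5]])

# .shape lists the size of each dimension, matching the m x n notation
print(a.shape)  # torch.Size([4, 1])
```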

Concept Check: Now that we know how to create vectors, can you guess how to create the following matrix in PyTorch?

$$
m = \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}
$$

Answer:

Python
torch.tensor([[3, 4], [5, 6]])

Set up your REPL with the following before continuing.

Python
>>> import torch
>>> a = torch.tensor([1.0, 2.0, 4.0, 8.0])
>>> b = torch.tensor([1.0, 0.5, 0.25, 0.125])

We're going to look at a class of operations performed on vectors and matrices called element-wise operations. Element-wise operations are operations that are applied independently to each element of a vector or matrix, resulting in a new vector or matrix of the same shape. These operations include addition, subtraction, multiplication, division, and many more.

Element-wise addition

$$
\begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \end{bmatrix} + \begin{bmatrix} 1 \\ 0.5 \\ 0.25 \\ 0.125 \end{bmatrix} = \begin{bmatrix} 1 + 1 \\ 2 + 0.5 \\ 4 + 0.25 \\ 8 + 0.125 \end{bmatrix}
$$

Python
>>> a + b
tensor([2.0000, 2.5000, 4.2500, 8.1250])

Element-wise subtraction

$$
\begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \end{bmatrix} - \begin{bmatrix} 1 \\ 0.5 \\ 0.25 \\ 0.125 \end{bmatrix} = \begin{bmatrix} 1 - 1 \\ 2 - 0.5 \\ 4 - 0.25 \\ 8 - 0.125 \end{bmatrix}
$$

Python
>>> a - b
tensor([0.0000, 1.5000, 3.7500, 7.8750])

Element-wise multiplication

$$
\begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \end{bmatrix} \odot \begin{bmatrix} 1 \\ 0.5 \\ 0.25 \\ 0.125 \end{bmatrix} = \begin{bmatrix} 1 \cdot 1 \\ 2 \cdot 0.5 \\ 4 \cdot 0.25 \\ 8 \cdot 0.125 \end{bmatrix}
$$

Python
>>> a * b
tensor([1., 1., 1., 1.])

Element-wise division

$$
\begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \end{bmatrix} \oslash \begin{bmatrix} 1 \\ 0.5 \\ 0.25 \\ 0.125 \end{bmatrix} = \begin{bmatrix} 1 / 1 \\ 2 / 0.5 \\ 4 / 0.25 \\ 8 / 0.125 \end{bmatrix}
$$

Python
>>> a / b
tensor([ 1.,  4., 16., 64.])


There are also element-wise operations that act on a vector/matrix alone. Below are two commonly used operations in machine learning.

Sigmoid

$$
\sigma \left( \begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \end{bmatrix} \right) = \begin{bmatrix} \sigma(1) \\ \sigma(2) \\ \sigma(4) \\ \sigma(8) \end{bmatrix}
\quad \text{where } \sigma(x) = \frac{1}{1+e^{-x}}
$$

Python
>>> torch.sigmoid(a)
tensor([0.7311, 0.8808, 0.9820, 0.9997])
>>> torch.sigmoid(torch.tensor(239.0))
tensor(1.)
>>> torch.sigmoid(torch.tensor(0.0))
tensor(0.5000)
>>> torch.sigmoid(torch.tensor(-0.34))
tensor(0.4158)

The sigmoid function takes any value of $x$ and squashes it into the range $(0, 1)$. Note that the endpoints are only reached in the limit: $\sigma(-\infty) = 0$ and $\sigma(+\infty) = 1$. This is useful when you have arbitrarily large values and want to condense them into the range between 0 and 1. It's sometimes useful to interpret the output of sigmoid as a probability.
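If you want to convince yourself the formula and torch.sigmoid agree, here's a sketch that implements $\sigma(x)$ by hand (sigmoid_manual is my name for it, not a PyTorch function):

```python
import torch

def sigmoid_manual(x: torch.Tensor) -> torch.Tensor:
    # sigma(x) = 1 / (1 + e^{-x}), applied element-wise
    return 1.0 / (1.0 + torch.exp(-x))

a = torch.tensor([1.0, 2.0, 4.0, 8.0])

# The hand-written version matches PyTorch's built-in sigmoid
print(torch.allclose(sigmoid_manual(a), torch.sigmoid(a)))  # True
```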

ReLU (Rectified Linear Unit)

$$
\text{ReLU} \left( \begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \end{bmatrix} \right) = \begin{bmatrix} f(1) \\ f(2) \\ f(4) \\ f(8) \end{bmatrix}
\quad \text{where } f(x) = \max(x, 0)
$$

Python
>>> c = torch.tensor([4, -4, 0, 2])
>>> torch.relu(c)
tensor([4, 0, 0, 2])

The ReLU function acts as a filter: any positive input passes through unchanged, while any negative input becomes zero. It might seem strange that such a simple function deserves a name, but ReLU helps neural networks learn to recognize objects in images, and ReLU-style activations appear inside the large language models behind ChatGPT and other sophisticated chatbots.
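Since ReLU is just $\max(x, 0)$ applied element-wise, we can sketch it by hand and compare against torch.relu (relu_manual is an illustrative name, not part of PyTorch):

```python
import torch

def relu_manual(x: torch.Tensor) -> torch.Tensor:
    # max(x, 0) element-wise: negatives become zero, positives pass through
    return torch.maximum(x, torch.zeros_like(x))

c = torch.tensor([4.0, -4.0, 0.0, 2.0])
print(relu_manual(c))                              # tensor([4., 0., 0., 2.])
print(torch.equal(relu_manual(c), torch.relu(c)))  # True
```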

Tensors

Did you notice that we create vectors and matrices with the PyTorch function torch.tensor(...)? Why is it not called torch.vector(...) or torch.matrix(...)? PyTorch tensors are more general. A vector has 1 dimension and a matrix has 2 dimensions, so what is a general term that covers 3 or more dimensions? Answer: a tensor. In fact, vectors and matrices are also tensors, because a tensor is any $N$-dimensional array of numbers. The tensor is the fundamental unit of data in PyTorch. You can learn more by visiting the official tensor tutorial from the PyTorch Foundation.

Python
>>> a = torch.rand((3, 4, 2))
>>> a
tensor([[[0.8856, 0.9232],
         [0.0250, 0.2977],
         [0.4745, 0.2243],
         [0.3107, 0.9159]],

        [[0.3654, 0.3746],
         [0.4026, 0.4557],
         [0.9426, 0.0865],
         [0.3805, 0.5034]],

        [[0.3843, 0.9903],
         [0.6279, 0.2222],
         [0.0693, 0.0140],
         [0.6222, 0.3590]]])
>>> a.shape
torch.Size([3, 4, 2])

In addition to element-wise operations, there are other operations that operate on the entire tensor. We'll cover those operations and apply them to neural networks and other machine learning concepts in the next part of this Linear Algebra 101 series. Stay tuned!

Quiz

Take the quiz below to see if you've mastered the concepts above. Don't worry if you can't answer them right away. Each question contains multiple concepts, so review the article if you're stuck.

Question 1: How many positive values are in the vector that results from the two operations below? You can do this without a calculator or PyTorch.

$$
\text{ReLU} \left( \sigma \left( \begin{bmatrix} -1 \\ 10 \\ 0 \\ -10 \end{bmatrix} \right) \right)
$$

Answer: Four. The sigmoid function $\sigma(x)$ squashes all values into the range $(0, 1)$. Because only $\sigma(-\infty) = 0$, every finite input to sigmoid produces a positive output, so sigmoid gives us a vector of four positive numbers. Since ReLU only zeroes out negative numbers, the resulting vector still has four positive values.
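Although this question is meant to be done in your head, you can verify the answer with a few lines of PyTorch if you like:

```python
import torch

v = torch.tensor([-1.0, 10.0, 0.0, -10.0])
out = torch.relu(torch.sigmoid(v))

# sigmoid maps every finite input into (0, 1), and ReLU leaves
# positive values untouched, so all four entries stay positive
num_positive = int((out > 0).sum())
print(num_positive)  # 4
```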

Question 2: $M \in \mathbb{R}^{10 \times 10}, N \in \mathbb{R}^{10 \times 10}$. How many elements does $\sigma(M + N)$ have?

Answer: 100. $M \in \mathbb{R}^{10 \times 10}$ and $N \in \mathbb{R}^{10 \times 10}$ each have 10 rows and 10 columns, so each has 100 elements. Addition is an element-wise operation, so $M + N$ also has 100 elements. Applying the sigmoid function, another element-wise operation, doesn't change the number of elements.
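This one can also be verified in PyTorch; torch.rand stands in for arbitrary real-valued matrices, and .numel() counts a tensor's elements:

```python
import torch

# Any 10x10 matrices will do for checking the element count
M = torch.rand(10, 10)
N = torch.rand(10, 10)

# Element-wise addition and sigmoid both preserve the shape
result = torch.sigmoid(M + N)
print(result.shape)    # torch.Size([10, 10])
print(result.numel())  # 100
```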