Perceptron

What Is a Perceptron?

A perceptron is a simple binary classification model. It receives multiple input signals and produces one output signal. The output of a perceptron can only be one of two values: 0 or 1.

A perceptron with two input signals can be represented as:

$x_1, x_2 \rightarrow y$

Where:

$x_1, x_2$ are input signals.
$w_1, w_2$ are weights.
$y$ is the output signal.
Each circle, or node, in a diagram of the perceptron is called a neuron.

Each input signal is multiplied by its corresponding weight before being sent to the neuron. The neuron then calculates the weighted sum of the input signals. If the weighted sum exceeds a certain threshold, the neuron outputs 1. Otherwise, it outputs 0.

When the neuron outputs 1, it is said to be activated. The threshold is usually represented by $\theta$.

The perceptron can be written as:

$y = \begin{cases} 0, & w_1x_1 + w_2x_2 \leq \theta \\ 1, & w_1x_1 + w_2x_2 > \theta \end{cases}$

The weights control the importance of each input signal. A larger weight means the corresponding input has a stronger influence on the final output.
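
The rule above can be sketched as a small Python function. The name `perceptron` and its argument order are illustrative choices made here, not something defined elsewhere in this text.

```python
def perceptron(x1, x2, w1, w2, theta):
    """Two-input threshold perceptron (hypothetical helper for illustration)."""
    # Weighted sum of the input signals
    weighted_sum = w1 * x1 + w2 * x2
    # Output 1 only when the sum strictly exceeds the threshold
    return 1 if weighted_sum > theta else 0


# Equal weights: both inputs matter equally.
print(perceptron(1, 1, 0.5, 0.5, 0.7))  # 1
# With w2 = 0 the second input has no influence on the output.
print(perceptron(0, 1, 1.0, 0.0, 0.5))  # 0
print(perceptron(1, 0, 1.0, 0.0, 0.5))  # 1
```

The last two calls illustrate the role of the weights: the input with the larger weight decides the output on its own.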

Simple Logic Gates

Perceptrons can be used to implement simple logic gates.
Common examples include:

AND gate
NAND gate
OR gate

These gates are basic building blocks of digital circuits.

AND Gate

The AND gate has two inputs and one output. It outputs 1 only when both inputs are 1. Otherwise, it outputs 0.
The truth table is:

$x_1$	$x_2$	Output
0	0	0
1	0	0
0	1	0
1	1	1

To represent an AND gate using a perceptron, we need to find suitable values for $w_1$, $w_2$, and $\theta$.

There are many possible parameter combinations.
For example:

$(w_1, w_2, \theta) = (0.5, 0.5, 0.7)$

Or:

$(w_1, w_2, \theta) = (1.0, 1.0, 1.0)$
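
To get a feel for how many parameter combinations work, the sketch below tries every combination from a small candidate grid and keeps those that reproduce the AND truth table. The grid values and the helper name `fires` are assumptions chosen for illustration; multiples of 0.25 are used because they are exact in binary floating point.

```python
from itertools import product

# Truth table of the AND gate: (x1, x2) -> expected output
AND_TABLE = {(0, 0): 0, (1, 0): 0, (0, 1): 0, (1, 1): 1}


def fires(x1, x2, w1, w2, theta):
    """Threshold rule: output 1 when the weighted sum exceeds theta."""
    return 1 if w1 * x1 + w2 * x2 > theta else 0


# Hypothetical candidate values for w1, w2, and theta
grid = [0.25, 0.5, 0.75, 1.0]

valid = [
    (w1, w2, theta)
    for w1, w2, theta in product(grid, repeat=3)
    if all(fires(x1, x2, w1, w2, theta) == y for (x1, x2), y in AND_TABLE.items())
]

print(len(valid))                # 20 of the 64 grid points implement AND
print((1.0, 1.0, 1.0) in valid)  # True
```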

NAND Gate

The NAND gate is the opposite of the AND gate. It outputs 0 only when both inputs are 1.
In all other cases, it outputs 1.
The truth table is:

$x_1$	$x_2$	Output
0	0	1
1	0	1
0	1	1
1	1	0

A possible parameter setting is:

$(w_1, w_2, \theta) = (-0.5, -0.5, -0.7)$

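This is the AND parameter set $(0.5, 0.5, 0.7)$ with every sign reversed. Reversing the signs turns an AND gate into a NAND gate as long as no input produces a weighted sum exactly equal to the threshold. The AND parameters $(1.0, 1.0, 1.0)$ do not qualify, because the input $(1, 0)$ lands exactly on the threshold.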
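
The sign reversal can be checked directly. This is a minimal sketch; the helper `gate` is a hypothetical name introduced here.

```python
def gate(x1, x2, w1, w2, theta):
    """Threshold perceptron: 1 when the weighted sum exceeds theta."""
    return 1 if w1 * x1 + w2 * x2 > theta else 0


and_params = (0.5, 0.5, 0.7)
nand_params = tuple(-p for p in and_params)  # (-0.5, -0.5, -0.7)

# Print x1, x2, AND output, NAND output for every input pair
for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, gate(x1, x2, *and_params), gate(x1, x2, *nand_params))

# Boundary case: negating (1.0, 1.0, 1.0) does NOT give NAND,
# because the input (1, 0) sits exactly on the threshold.
print(gate(1, 0, -1.0, -1.0, -1.0))  # 0, but NAND(1, 0) should be 1
```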

OR Gate

The OR gate outputs 1 if at least one input is 1. It outputs 0 only when both inputs are 0.
The truth table is:

$x_1$	$x_2$	Output
0	0	0
1	0	1
0	1	1
1	1	1

A possible parameter setting is:

$(w_1, w_2, \theta) = (0.5, 0.5, 0)$
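
A quick check of this setting, as a sketch with a hypothetical helper `implements_or`. With $\theta = 0$ the input $(0, 0)$ sits exactly on the threshold, so a slightly larger threshold such as 0.2 is an equally valid choice that leaves some margin.

```python
# Truth table of the OR gate: (x1, x2) -> expected output
OR_TABLE = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 1}


def implements_or(w1, w2, theta):
    """Return True if the threshold perceptron reproduces the OR truth table."""
    return all(
        (1 if w1 * x1 + w2 * x2 > theta else 0) == y
        for (x1, x2), y in OR_TABLE.items()
    )


print(implements_or(0.5, 0.5, 0))    # True
print(implements_or(0.5, 0.5, 0.2))  # True
print(implements_or(0.5, 0.5, 0.7))  # False: these are the AND parameters
```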

Simple Implementation of the AND Gate

We can implement the AND gate using a simple Python function.
The function receives $x_1$ and $x_2$, calculates the weighted sum, and compares it with the threshold.

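def AND(x1, x2):
    # Parameters that make the perceptron behave as an AND gate
    w1, w2, theta = 0.5, 0.5, 0.7
    # Weighted sum of the input signals
    res = x1 * w1 + x2 * w2

    # Output 1 only when the weighted sum exceeds the threshold
    if res <= theta:
        return 0
    else:
        return 1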

print(AND(0, 0))  # 0
print(AND(1, 0))  # 0
print(AND(0, 1))  # 0
print(AND(1, 1))  # 1

This implementation works, but we can rewrite it in a more common form using weights and bias.

Introducing Weights and Bias

Instead of using the threshold $\theta$, we can rewrite the perceptron using a bias term $b$.

Moving $\theta$ to the left-hand side of each inequality and setting:

$b = -\theta$

the perceptron can be rewritten as:

$y = \begin{cases} 0, & b + w_1x_1 + w_2x_2 \leq 0 \\ 1, & b + w_1x_1 + w_2x_2 > 0 \end{cases}$

Where:

$w_1, w_2$ are weights.
$b$ is the bias.

The weights control the importance of input signals. The bias controls how easily the neuron is activated.
A larger bias makes the neuron easier to activate.
A smaller bias makes the neuron harder to activate.
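
The effect of the bias can be seen by keeping the weights fixed and changing only $b$. This is a minimal sketch; `perceptron_bias` is a hypothetical helper name, and the two bias values are chosen for illustration.

```python
def perceptron_bias(x1, x2, w1, w2, b):
    """Bias-form perceptron: 1 when b + w1*x1 + w2*x2 is positive."""
    return 1 if b + w1 * x1 + w2 * x2 > 0 else 0


inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Same weights, two different biases
for b in (-0.7, -0.2):
    outputs = [perceptron_bias(x1, x2, 0.5, 0.5, b) for x1, x2 in inputs]
    print(b, outputs)
# -0.7 [0, 0, 0, 1]   (behaves like AND)
# -0.2 [0, 1, 1, 1]   (behaves like OR)
```

Raising the bias from -0.7 to -0.2 lets the neuron fire with only one active input, which is what "easier to activate" means here.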

AND Gate with NumPy

import numpy as np

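def AND(x1, x2):
    x = np.array([x1, x2])    # input signals
    w = np.array([0.5, 0.5])  # weights
    b = -0.7                  # bias (equal to -theta)

    # Weighted sum of the inputs plus the bias
    tmp = np.sum(w * x) + b

    # Output 1 only when the result is positive
    if tmp <= 0:
        return 0
    else:
        return 1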

print(AND(0, 0))  # 0
print(AND(1, 0))  # 0
print(AND(0, 1))  # 0
print(AND(1, 1))  # 1

Here:

x stores the input signals.
w stores the weights.
b is the bias.
np.sum(w * x) + b calculates the weighted sum plus bias.
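
As a small aside, `w * x` multiplies the arrays element by element, so `np.sum(w * x)` is the dot product of the two vectors. The sketch below shows that `np.dot` gives the same value; the input values are just one example.

```python
import numpy as np

x = np.array([1, 1])      # example input signals
w = np.array([0.5, 0.5])  # weights
b = -0.7                  # bias

elementwise = w * x               # element-wise products: [0.5, 0.5]
via_sum = np.sum(w * x) + b       # form used in the text
via_dot = np.dot(w, x) + b        # equivalent dot-product form

print(elementwise)
print(np.isclose(via_sum, via_dot))  # True
```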

NAND Gate with NumPy

import numpy as np

def NAND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])
    b = 0.7

    tmp = np.sum(w * x) + b

    if tmp <= 0:
        return 0
    else:
        return 1

print(NAND(0, 0))  # 1
print(NAND(1, 0))  # 1
print(NAND(0, 1))  # 1
print(NAND(1, 1))  # 0

Compared with the AND gate, the NAND gate reverses the signs of both the weights and the bias.

OR Gate with NumPy

import numpy as np

def OR(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.2

    tmp = np.sum(w * x) + b

    if tmp <= 0:
        return 0
    else:
        return 1

print(OR(0, 0))  # 0
print(OR(1, 0))  # 1
print(OR(0, 1))  # 1
print(OR(1, 1))  # 1

AND, NAND, and OR gates all have the same perceptron structure.
The only difference is the choice of weights and bias.
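
Because the structure is identical, the three gates can be generated from one template. The following is a sketch of that idea; `make_gate` is a hypothetical helper, not part of NumPy.

```python
import numpy as np


def make_gate(w, b):
    """Build a two-input perceptron gate from weights w and bias b."""
    w = np.array(w)

    def gate(x1, x2):
        x = np.array([x1, x2])
        return 1 if np.sum(w * x) + b > 0 else 0

    return gate


# Same structure, different parameters
AND = make_gate([0.5, 0.5], -0.7)
NAND = make_gate([-0.5, -0.5], 0.7)
OR = make_gate([0.5, 0.5], -0.2)

inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]
for name, gate in [("AND", AND), ("NAND", NAND), ("OR", OR)]:
    print(name, [gate(x1, x2) for x1, x2 in inputs])
# AND [0, 0, 0, 1]
# NAND [1, 1, 1, 0]
# OR [0, 1, 1, 1]
```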

Limitation of a Single-Layer Perceptron

Now let us consider the XOR gate. The XOR gate outputs 1 only when exactly one of the two inputs is 1.
The truth table is:

$x_1$	$x_2$	Output
0	0	0
1	0	1
0	1	1
1	1	0

A single-layer perceptron cannot represent the XOR gate. The reason is that a single perceptron can only separate data using a straight line.
For example, the OR gate can be represented as:

$y = \begin{cases} 0, & -0.5 + x_1 + x_2 \leq 0 \\ 1, & -0.5 + x_1 + x_2 > 0 \end{cases}$

This means the perceptron separates the input space using the straight line:

$-0.5 + x_1 + x_2 = 0$

One side of the line outputs 0, and the other side outputs 1. However, XOR cannot be separated by a single straight line.
This is called a non-linearly separable problem. A single-layer perceptron can only solve linearly separable problems.
This is the main limitation of the basic perceptron.
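
A short argument, sketched here in the bias form of the perceptron, shows why no choice of parameters works. For a single perceptron to reproduce the XOR truth table, the four inputs $(0, 0)$, $(1, 0)$, $(0, 1)$, and $(1, 1)$ would require, in order:

$b \leq 0$
$w_1 + b > 0$
$w_2 + b > 0$
$w_1 + w_2 + b \leq 0$

Adding the second and third conditions gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b$. The first condition makes $-b \geq 0$, which implies $w_1 + w_2 + b > 0$. This contradicts the fourth condition, so no such $w_1$, $w_2$, and $b$ exist.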

Linear Space and Nonlinear Space

A space divided by a straight line is called a linear space, and a space that can only be divided by a curve is called a nonlinear space. AND, NAND, and OR are linearly separable: for each of them, a single straight line puts the inputs that output 0 on one side and the inputs that output 1 on the other. XOR is not linearly separable, so it cannot be solved by a single perceptron.

To solve XOR, we need to combine multiple perceptrons. This leads to the idea of a multi-layer perceptron.
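
The difference can also be observed numerically. The sketch below tries every bias-form perceptron on a coarse parameter grid and counts how many reproduce each truth table. A grid search is not a proof, and the grid range and step are arbitrary assumptions, but the result matches the claim: OR has solutions and XOR has none.

```python
from itertools import product

INPUTS = [(0, 0), (1, 0), (0, 1), (1, 1)]
OR_OUT = [0, 1, 1, 1]
XOR_OUT = [0, 1, 1, 0]


def outputs(w1, w2, b):
    """Outputs of a bias-form perceptron on all four input pairs."""
    return [1 if b + w1 * x1 + w2 * x2 > 0 else 0 for x1, x2 in INPUTS]


# Hypothetical search grid: -2.0 to 2.0 in steps of 0.25
grid = [i / 4 for i in range(-8, 9)]


def count_solutions(target):
    """Count grid points (w1, w2, b) whose outputs equal the target table."""
    return sum(outputs(w1, w2, b) == target for w1, w2, b in product(grid, repeat=3))


print(count_solutions(OR_OUT) > 0)  # True
print(count_solutions(XOR_OUT))     # 0
```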

Multi-Layer Perceptron

Although a single perceptron cannot implement XOR, we can build XOR by combining AND, NAND, and OR gates.
The logic is:

$s_1 = NAND(x_1, x_2)$
$s_2 = OR(x_1, x_2)$
$y = AND(s_1, s_2)$

In other words:

$XOR(x_1, x_2) = AND(NAND(x_1, x_2), OR(x_1, x_2))$

This structure contains multiple layers.

Layer 0 receives the input signals.
Layer 1 calculates the intermediate outputs $s_1$ and $s_2$.
Layer 2 produces the XOR result.

By stacking layers, perceptrons can represent more complex functions.
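
The gate identity itself can be checked independently of any perceptron, using Python's bitwise operators on 0 and 1. This is only a sanity check of the logic; `xor_from_gates` is a hypothetical name.

```python
def xor_from_gates(a, b):
    """XOR built as AND(NAND(a, b), OR(a, b)) using bitwise operators."""
    s1 = 1 - (a & b)  # NAND
    s2 = a | b        # OR
    return s1 & s2    # AND


# Print x1, x2, s1, s2, y for every input pair
for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(a, b, 1 - (a & b), a | b, xor_from_gates(a, b))
# 0 0 1 0 0
# 1 0 1 1 1
# 0 1 1 1 1
# 1 1 0 1 0
```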

XOR Implementation

import numpy as np

def AND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.7

    tmp = np.sum(w * x) + b

    if tmp <= 0:
        return 0
    else:
        return 1

def NAND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])
    b = 0.7

    tmp = np.sum(w * x) + b

    if tmp <= 0:
        return 0
    else:
        return 1

def OR(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.2

    tmp = np.sum(w * x) + b

    if tmp <= 0:
        return 0
    else:
        return 1

def XOR(x1, x2):
    s1 = NAND(x1, x2)
    s2 = OR(x1, x2)
    y = AND(s1, s2)
    return y

print(XOR(0, 0))  # 0
print(XOR(1, 0))  # 1
print(XOR(0, 1))  # 1
print(XOR(1, 1))  # 0

Understanding the XOR Structure

The XOR implementation can be understood as a small neural network.
Layer 0 receives the original inputs:

$x_1, x_2$

Layer 1 calculates two intermediate signals:

$s_1 = NAND(x_1, x_2)$
$s_2 = OR(x_1, x_2)$

Layer 2 calculates the final output:

$y = AND(s_1, s_2)$

This is a multi-layer perceptron structure.
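
The same structure can be written in matrix form, which is closer to how neural network layers are usually expressed. The sketch below is one possible arrangement, assuming the NAND, OR, and AND parameters used earlier; the names `W1`, `b1`, `W2`, `b2`, `step`, and `xor_network` are chosen here for illustration.

```python
import numpy as np


def step(z):
    """Threshold activation: 1 where z > 0, otherwise 0."""
    return np.where(z > 0, 1, 0)


# Layer 1: row 0 is the NAND neuron, row 1 is the OR neuron
W1 = np.array([[-0.5, -0.5],
               [0.5, 0.5]])
b1 = np.array([0.7, -0.2])

# Layer 2: a single AND neuron
W2 = np.array([0.5, 0.5])
b2 = -0.7


def xor_network(x1, x2):
    x = np.array([x1, x2])     # layer 0: inputs
    s = step(W1 @ x + b1)      # layer 1: intermediate signals (s1, s2)
    y = step(W2 @ s + b2)      # layer 2: final output
    return int(y)


print([xor_network(x1, x2) for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]])
# [0, 1, 1, 0]
```

Each row of `W1` holds the weights of one layer-1 neuron, so a whole layer is computed with a single matrix-vector product.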

The key idea is:

A single perceptron can only represent simple linear decision boundaries, but multiple perceptrons can represent more complex nonlinear functions.

Summary

A perceptron is a basic binary classification model. It receives input signals, multiplies them by weights, adds a bias, and produces an output of either 0 or 1.
AND, NAND, and OR gates can be implemented using a single perceptron. A single-layer perceptron can only solve linearly separable problems. XOR is not linearly separable, so it cannot be solved by a single perceptron. By combining multiple perceptrons, we can implement XOR.
This is the basic idea behind multi-layer perceptrons.
The perceptron is a very simple algorithm, but it is an important foundation for understanding neural networks.
