Deep Learning in Medical Image Analysis


Figure 1
Architectures of two feed-forward neural networks: (a) a single-layer neural network and (b) a multilayer neural network composed of an input layer, a hidden layer, and an output layer.

In Section 3, we review studies that use deep models for different applications in medical imaging, including image registration, anatomy localization, lesion segmentation, detection of objects and cells, tissue segmentation, and computer-aided detection (CADe) and computer-aided diagnosis (CADx). Finally, in Section 4, we conclude by summarizing research trends and suggesting directions for further improvement.




  1. DEEP LEARNING


In this section, we explain the fundamental concepts of feed-forward neural networks and the basic deep models in the literature, focusing on learning hierarchical feature representations from data. We also discuss how to efficiently learn the parameters of deep architectures while reducing overfitting.


    1. Feed-Forward Neural Networks


In machine learning, artificial neural networks are a family of models that mimic the structural elegance of the neural system and learn patterns inherent in observations. The perceptron (64) is the earliest trainable neural network with a single-layer architecture,1 composed of an input layer and an output layer. A perceptron, or a modified perceptron with multiple output units (Figure 1a), is regarded as a linear model, prohibiting its application in tasks involving complicated data patterns, despite the use of nonlinear activation functions in the output layer.
This limitation can be overcome by introducing a so-called hidden layer between the input layer and the output layer. Note that in neural networks the units of the neighboring layers are fully connected to one another, but there are no connections among units in the same layer. For a two-layer neural network (Figure 1b), also known as a multilayer perceptron, given an input
vector v = [v_i] ∈ ℝ^D, we can write the estimation function of an output unit y_k as a composition function as follows:

y_k(\mathbf{v}; \Theta) = f^{(2)}\left( \sum_{j=1}^{M} W_{kj}^{(2)} \, f^{(1)}\left( \sum_{i=1}^{D} W_{ji}^{(1)} v_i + b_j^{(1)} \right) + b_k^{(2)} \right),   (1)




1 In general, the input layer is not counted.

where the superscript denotes a layer index, f^(1)(·) and f^(2)(·) denote the nonlinear activation functions of units at the specified layers, M is the number of hidden units, and Θ = {W^(1), W^(2), b^(1), b^(2)} is the parameter set.2 Conventionally, the hidden units' activation function f^(1)(·) is defined as a sigmoidal function, such as the logistic sigmoid or the hyperbolic tangent, whereas the output units' activation function f^(2)(·) depends on the target task. Because the estimation proceeds in a forward direction, this type of network is also referred to as a feed-forward neural network.
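As an illustration, the following minimal NumPy sketch computes Equation 1 for a two-layer network. The layer sizes, the tanh hidden activation, the softmax output, and all function and variable names are assumptions made for this example, not choices taken from the text.

```python
import numpy as np

def forward(v, W1, b1, W2, b2):
    """Forward pass of a two-layer feed-forward network (Equation 1).

    v  : input vector of length D
    W1 : (M, D) hidden-layer weights, b1 : (M,) hidden biases
    W2 : (K, M) output-layer weights, b2 : (K,) output biases
    """
    h = np.tanh(W1 @ v + b1)          # f^(1): hidden activations
    a = W2 @ h + b2                   # linear combination at the output layer
    # f^(2): a softmax here, one common choice for classification-type outputs
    e = np.exp(a - a.max())
    return e / e.sum()

# Toy dimensions (D inputs, M hidden units, K outputs), chosen only for illustration
D, M, K = 4, 8, 3
rng = np.random.default_rng(0)
v = rng.normal(size=D)
y = forward(v, rng.normal(size=(M, D)), np.zeros(M),
            rng.normal(size=(K, M)), np.zeros(K))
print(y, y.sum())  # K output values summing to 1
```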
When the hidden layer in Equation 1 is regarded as a feature extractor φ(v) = [φ_j(v)] ∈ ℝ^M from an input v, the output layer is only a simple linear model,

y_k(\mathbf{v}; \Theta) = f^{(2)}\left( \sum_{j=1}^{M} W_{kj}^{(2)} \, \phi_j(\mathbf{v}) + b_k^{(2)} \right),   (2)

where φ_j(v) ≡ f^(1)(Σ_{i=1}^{D} W_{ji}^(1) v_i + b_j^(1)). The same interpretation holds when there is a higher number of hidden layers. Thus, it is intuitive that the role of hidden layers is to find features that are informative for the target task.
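Continuing the sketch above, the same computation can be read as first extracting features φ(v) with the hidden layer and then applying a simple linear model on top of them (Equation 2); the function names reuse the earlier hypothetical example rather than anything defined in the text.

```python
import numpy as np

def phi(v, W1, b1):
    """Hidden layer viewed as a feature extractor: phi(v) in R^M."""
    return np.tanh(W1 @ v + b1)

def readout(features, W2, b2):
    """Equation 2: a simple linear model (followed by f^(2)) on top of phi(v)."""
    a = W2 @ features + b2
    e = np.exp(a - a.max())
    return e / e.sum()

# forward(v, W1, b1, W2, b2) from the previous sketch is equivalent to
# readout(phi(v, W1, b1), W2, b2)
```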
The practical use of neural networks requires that the model parameters be learned from data. The problem of parameter learning can be formulated as the minimization of an error function. From an optimization perspective, the error function E for neural networks is highly nonlinear and nonconvex. Thus, there is no analytic solution for the parameter set Θ. Instead, one can use a gradient descent algorithm, updating the parameters iteratively. In order to utilize a gradient descent algorithm, there must be a way to compute the gradient ∇E(Θ) evaluated at the parameter set Θ.
For a feed-forward neural network, the gradient can be efficiently evaluated by means of error back-propagation (65). Once the gradient vector of all the layers is known, the parameters Θ = {W^(1), W^(2), b^(1), b^(2)} can be updated as follows:

\Theta^{(\tau+1)} = \Theta^{(\tau)} - \eta \, \nabla E\!\left(\Theta^{(\tau)}\right),   (3)

where η is a learning rate and τ denotes an iteration index. The update process is repeated until convergence or until a predefined number of iterations is reached. For the parameter update in Equation 3, stochastic gradient descent with a small subset of training samples, termed a minibatch, is commonly used in the literature (66).
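The sketch below illustrates Equation 3 with one minibatch stochastic-gradient-descent update, where the gradients are obtained by error back-propagation through the two layers of the earlier hypothetical network. The cross-entropy error, tanh hidden units, softmax outputs, and the learning rate are assumptions chosen for the example, not values prescribed by the text.

```python
import numpy as np

def sgd_step(V, T, W1, b1, W2, b2, eta=0.1):
    """One minibatch update Theta <- Theta - eta * grad E(Theta) (Equation 3).

    V : (N, D) minibatch of inputs, T : (N, K) one-hot targets.
    Assumes tanh hidden units, softmax outputs, and a cross-entropy error E.
    """
    # Forward pass over the minibatch
    H = np.tanh(V @ W1.T + b1)                  # hidden activations, shape (N, M)
    A = H @ W2.T + b2                           # output pre-activations, shape (N, K)
    Y = np.exp(A - A.max(axis=1, keepdims=True))
    Y /= Y.sum(axis=1, keepdims=True)           # softmax outputs

    # Error back-propagation
    d_out = (Y - T) / len(V)                    # delta at the output layer
    d_hid = (d_out @ W2) * (1.0 - H**2)         # delta at the hidden layer (tanh derivative)

    # Gradient descent update of all parameters (Equation 3)
    W2 -= eta * d_out.T @ H
    b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * d_hid.T @ V
    b1 -= eta * d_hid.sum(axis=0)
    return W1, b1, W2, b2
```

Repeating this update over randomly drawn minibatches until the error stops decreasing, or until a predefined number of iterations is reached, corresponds to the iterative procedure described above.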


