Deep Learning in Medical Image Analysis


Figure 1
Architectures of two feed-forward neural networks: (a) a single-layer neural network and (b) a multilayer neural network composed of an input layer, a hidden layer, and an output layer.

In Section 3, we review studies that use deep models for different applications in medical imaging, including image registration, anatomy localization, lesion segmentation, detection of objects and cells, tissue segmentation, and computer-aided detection (CADe) and computer-aided diagnosis (CADx). Finally, in Section 4, we conclude by summarizing research trends and suggesting directions for further improvement.




  1. DEEP LEARNING


In this section, we explain the fundamental concepts of feed-forward neural networks and the basic deep models in the literature, focusing on learning hierarchical feature representations from data. We also discuss how to efficiently learn the parameters of deep architectures while reducing overfitting.


    1. Feed-Forward Neural Networks


In machine learning, artificial neural networks are a family of models that mimic the structural elegance of the neural system and learn patterns inherent in observations. The perceptron (64) is the earliest trainable neural network with a single-layer architecture,1 composed of an input layer and an output layer. A perceptron, or a modified perceptron with multiple output units (Figure 1a), is regarded as a linear model, prohibiting its application in tasks involving complicated data patterns, despite the use of nonlinear activation functions in the output layer.
This limitation can be overcome by introducing a so-called hidden layer between the input layer and the output layer. Note that in neural networks the units of the neighboring layers are fully connected to one another, but there are no connections among units in the same layer. For a two-layer neural network (Figure 1b), also known as a multilayer perceptron, given an input
vector v = [v_i] ∈ ℝ^D, we can write the estimation function of an output unit y_k as a composition function as follows:

y_k(\mathbf{v}; \Theta) = f^{(2)}\left( \sum_{j=1}^{M} W_{kj}^{(2)} \, f^{(1)}\left( \sum_{i=1}^{D} W_{ji}^{(1)} v_i + b_j^{(1)} \right) + b_k^{(2)} \right),   (1)




1 In general, the input layer is not counted.

where the superscript denotes a layer index, f^(1)(·) and f^(2)(·) denote the nonlinear activation functions of units at the specified layers, M is the number of hidden units, and Θ = {W^(1), W^(2), b^(1), b^(2)} is the parameter set.2 Conventionally, the hidden units' activation function f^(1)(·) is defined as a sigmoidal function, such as the logistic sigmoid or the hyperbolic tangent, whereas the output units' activation function f^(2)(·) depends on the target task. Because the estimation proceeds in a forward direction, this type of network is also referred to as a feed-forward neural network.
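As an illustration, the following minimal NumPy sketch computes Equation 1 for a two-layer network. The layer sizes, the tanh hidden activation, the softmax output, and all function and variable names are assumptions made for this example, not choices taken from the text.

```python
import numpy as np

def forward(v, W1, b1, W2, b2):
    """Forward pass of a two-layer feed-forward network (Equation 1).

    v  : input vector of length D
    W1 : (M, D) hidden-layer weights, b1 : (M,) hidden biases
    W2 : (K, M) output-layer weights, b2 : (K,) output biases
    """
    h = np.tanh(W1 @ v + b1)          # f^(1): hidden activations
    a = W2 @ h + b2                   # linear combination at the output layer
    # f^(2): a softmax here, one common choice for classification-type outputs
    e = np.exp(a - a.max())
    return e / e.sum()

# Toy dimensions (D inputs, M hidden units, K outputs), chosen only for illustration
D, M, K = 4, 8, 3
rng = np.random.default_rng(0)
v = rng.normal(size=D)
y = forward(v, rng.normal(size=(M, D)), np.zeros(M),
            rng.normal(size=(K, M)), np.zeros(K))
print(y, y.sum())  # K output values summing to 1
```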
When the hidden layer in Equation 1 is regarded as a feature extractor φ(v) = [φ_j(v)] ∈ ℝ^M from an input v, the output layer is only a simple linear model,

y_k(\mathbf{v}; \Theta) = f^{(2)}\left( \sum_{j=1}^{M} W_{kj}^{(2)} \, \phi_j(\mathbf{v}) + b_k^{(2)} \right),   (2)

where φ_j(v) ≡ f^(1)(Σ_{i=1}^{D} W_{ji}^(1) v_i + b_j^(1)). The same interpretation holds when there is a higher number of hidden layers. Thus, it is intuitive that the role of hidden layers is to find features that are informative for the target task.
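Continuing the sketch above, the same computation can be read as first extracting features φ(v) with the hidden layer and then applying a simple linear model on top of them (Equation 2); the function names reuse the earlier hypothetical example rather than anything defined in the text.

```python
import numpy as np

def phi(v, W1, b1):
    """Hidden layer viewed as a feature extractor: phi(v) in R^M."""
    return np.tanh(W1 @ v + b1)

def readout(features, W2, b2):
    """Equation 2: a simple linear model (followed by f^(2)) on top of phi(v)."""
    a = W2 @ features + b2
    e = np.exp(a - a.max())
    return e / e.sum()

# forward(v, W1, b1, W2, b2) from the previous sketch is equivalent to
# readout(phi(v, W1, b1), W2, b2)
```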
The practical use of neural networks requires that the model parameters be learned from data. The problem of parameter learning can be formulated as the minimization of an error function. From an optimization perspective, the error function E for neural networks is highly nonlinear and nonconvex. Thus, there is no analytic solution for the parameter set Θ. Instead, one can use a gradient descent algorithm, updating the parameters iteratively. In order to utilize a gradient descent algorithm, there must be a way to compute the gradient ∇E(Θ) evaluated at the parameter set Θ.
For a feed-forward neural network, the gradient can be efficiently evaluated by means of error back-propagation (65). Once the gradient vector of all the layers is known, the parameters Θ = {W^(1), W^(2), b^(1), b^(2)} can be updated as follows:

\Theta^{(\tau+1)} = \Theta^{(\tau)} - \eta \, \nabla E\!\left(\Theta^{(\tau)}\right),   (3)

where η is a learning rate and τ denotes an iteration index. The update process is repeated until convergence or until a predefined number of iterations is reached. For the parameter update in Equation 3, stochastic gradient descent with a small subset of training samples, termed a minibatch, is commonly used in the literature (66).
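The sketch below illustrates Equation 3 with one minibatch stochastic-gradient-descent update, where the gradients are obtained by error back-propagation through the two layers of the earlier hypothetical network. The cross-entropy error, tanh hidden units, softmax outputs, and the learning rate are assumptions chosen for the example, not values prescribed by the text.

```python
import numpy as np

def sgd_step(V, T, W1, b1, W2, b2, eta=0.1):
    """One minibatch update Theta <- Theta - eta * grad E(Theta) (Equation 3).

    V : (N, D) minibatch of inputs, T : (N, K) one-hot targets.
    Assumes tanh hidden units, softmax outputs, and a cross-entropy error E.
    """
    # Forward pass over the minibatch
    H = np.tanh(V @ W1.T + b1)                  # hidden activations, shape (N, M)
    A = H @ W2.T + b2                           # output pre-activations, shape (N, K)
    Y = np.exp(A - A.max(axis=1, keepdims=True))
    Y /= Y.sum(axis=1, keepdims=True)           # softmax outputs

    # Error back-propagation
    d_out = (Y - T) / len(V)                    # delta at the output layer
    d_hid = (d_out @ W2) * (1.0 - H**2)         # delta at the hidden layer (tanh derivative)

    # Gradient descent update of all parameters (Equation 3)
    W2 -= eta * d_out.T @ H
    b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * d_hid.T @ V
    b1 -= eta * d_hid.sum(axis=0)
    return W1, b1, W2, b2
```

Repeating this update over randomly drawn minibatches until the error stops decreasing, or until a predefined number of iterations is reached, corresponds to the iterative procedure described above.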


