Deep Learning in Medical Image Analysis


Unsupervised Feature Representation Learning





Compared with shallow architectures that require a good feature extractor designed mostly by hand on the basis of expert knowledge, deep models are useful for discovering informative features from data in a hierarchical manner (i.e., from fine to abstract). Here, we introduce three deep models that are widely used in different applications for unsupervised feature representation learning.



      1. Stacked auto-encoder. An auto-encoder, or auto-associator (69), is a special type of two-layer neural network that learns a latent or compressed representation of the input by minimizing the reconstruction error between the input and the output of the network, that is, the error in reconstructing the input from the learned representation. Because of its simple, shallow structure, a single-layer auto-encoder's representational power is very limited. But when multiple auto-encoders are stacked (Figure 2a) in a configuration called an SAE, the representational power improves significantly, because the activation values of the hidden units of one auto-encoder serve as the input to the next, higher auto-encoder (70). One of the most important characteristics of SAEs is their ability to learn or discover highly nonlinear and complicated patterns, such as relations among input values. When an input vector is presented to an SAE, the different layers of the network represent different levels of information: the lower the layer, the simpler the patterns it captures; the higher the layer, the more complicated or abstract the patterns inherent in the input vector.
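To make the mechanics concrete, the following is a minimal sketch of a single auto-encoder with one hidden layer, trained by back-propagating the squared reconstruction error. The class name, the use of untied encoder/decoder weights, and all hyperparameters are illustrative assumptions, not the implementation used in the cited works.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AutoEncoder:
    """Illustrative single auto-encoder: encode, decode, and one
    gradient step on the squared reconstruction error."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # encoder weights
        self.b1 = np.zeros(n_hidden)                             # encoder bias
        self.W2 = rng.normal(0.0, 0.01, (n_hidden, n_visible))   # decoder weights
        self.b2 = np.zeros(n_visible)                            # decoder bias

    def encode(self, x):
        return sigmoid(x @ self.W1 + self.b1)   # hidden (latent) representation

    def decode(self, h):
        return sigmoid(h @ self.W2 + self.b2)   # reconstruction of the input

    def train_step(self, x, lr=0.1):
        """One back-propagation step on 0.5 * ||decode(encode(x)) - x||^2."""
        h = self.encode(x)
        x_hat = self.decode(h)
        # deltas for sigmoid output and hidden units
        delta_out = (x_hat - x) * x_hat * (1.0 - x_hat)
        delta_hid = (delta_out @ self.W2.T) * h * (1.0 - h)
        # gradient-descent updates
        self.W2 -= lr * np.outer(h, delta_out)
        self.b2 -= lr * delta_out
        self.W1 -= lr * np.outer(x, delta_hid)
        self.b1 -= lr * delta_hid
        return 0.5 * np.sum((x_hat - x) ** 2)    # reconstruction error
```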

With regard to training the parameters (the weight matrices and biases) of an SAE, a straightforward approach is to treat the SAE as a conventional feed-forward neural network and apply back-propagation with a gradient-based optimization technique, starting from a random initialization. Unfortunately, deep networks trained in this manner perform worse than networks with a shallow architecture, as they tend to fall into a poor local optimum (71). To circumvent this problem, one should consider greedy layer-wise learning (10, 72). The key idea of greedy layer-wise learning is to pretrain one layer at a time: the parameters of the first hidden layer are trained with the training data as input, then the parameters of the second hidden layer are trained with the output of the first hidden layer as input, and so on. In other words, the representation of the l-th hidden layer is used as input for the (l + 1)-th hidden layer. An important advantage of this pretraining technique is that it is conducted in an unsupervised manner with a standard back-propagation algorithm, which allows the training set to be enlarged by exploiting unlabeled samples.
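As an illustration of the layer-wise scheme, the sketch below pretrains a stack of auto-encoders one at a time, feeding the hidden activations of each trained layer to the next. It assumes the hypothetical AutoEncoder class from the previous sketch; the layer sizes, epoch count, and learning rate are arbitrary stand-ins.

```python
import numpy as np

def pretrain_sae(X, layer_sizes, epochs=10, lr=0.1, seed=0):
    """Greedy layer-wise pretraining of an SAE.

    X           : array of shape (n_samples, n_features), unlabeled data
    layer_sizes : hidden-layer sizes, e.g. [256, 128, 64]
    Returns the list of trained AutoEncoder layers (see the sketch above).
    """
    layers, inputs = [], X
    for n_hidden in layer_sizes:
        ae = AutoEncoder(n_visible=inputs.shape[1], n_hidden=n_hidden, seed=seed)
        for _ in range(epochs):
            for x in inputs:                  # plain stochastic gradient descent
                ae.train_step(x, lr=lr)
        # the hidden activations become the input of the next auto-encoder
        inputs = np.array([ae.encode(x) for x in inputs])
        layers.append(ae)
    return layers

# usage on random stand-in data (no labels are needed)
X = np.random.default_rng(0).random((100, 64))
sae = pretrain_sae(X, layer_sizes=[32, 16])
```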




      2. Deep belief network. A restricted Boltzmann machine (RBM) (73) is a single-layer undirected graphical model with a visible layer and a hidden layer. It assumes symmetric connections between the visible and hidden layers but no connections among units within the same layer. Because of this symmetry, an RBM can generate input observations from hidden representations; it therefore naturally acts as an auto-encoder (10, 73), and its parameters are usually trained with a contrastive divergence algorithm (74) so as to maximize the log likelihood of the observations. As with SAEs, RBMs can be stacked to construct a deep architecture, resulting in a single probabilistic model called a DBN. A DBN has one visible layer v and a series of hidden layers h(1), ..., h(L) (Figure 2b). Note that when multiple RBMs are stacked hierarchically, the top two layers still form an undirected generative model (i.e., an RBM), whereas the lower layers form directed generative models. Thus, the joint distribution of the observed units v and the L hidden layers h(l) (l = 1, ..., L) in a DBN is

P\bigl(v, h^{(1)}, \ldots, h^{(L)}\bigr) = \left( \prod_{l=0}^{L-2} P\bigl(h^{(l)} \mid h^{(l+1)}\bigr) \right) P\bigl(h^{(L-1)}, h^{(L)}\bigr), \qquad (4)



where P(h(l) | h(l+1)) corresponds to the conditional distribution of the units of layer l given the units of layer l + 1, and P(h(L−1), h(L)) denotes the joint distribution of the units in layers L − 1 and L.
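For the contrastive divergence training mentioned above, the following is a minimal sketch of a Bernoulli–Bernoulli RBM updated with a single Gibbs step (CD-1). The class layout and hyperparameters are illustrative assumptions rather than the implementation used in the cited works.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Illustrative Bernoulli-Bernoulli RBM trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.W = self.rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible bias
        self.b_h = np.zeros(n_hidden)    # hidden bias

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)      # P(h = 1 | v)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)    # P(v = 1 | h)

    def cd1_step(self, v0, lr=0.05):
        """One contrastive divergence (CD-1) update for a single sample v0."""
        # positive phase: hidden probabilities and a binary sample
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # negative phase: one Gibbs step v0 -> h0 -> v1 -> h1
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # approximate gradient of the log likelihood of the observation
        self.W   += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
        self.b_v += lr * (v0 - pv1)
        self.b_h += lr * (ph0 - ph1)
```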
Regarding the learning of the parameters, the greedy layer-wise pretraining scheme (10) can be applied in the following steps (see the sketch after this list).

        1. Train the first layer as an RBM with v = h(0).

        2. Use the first hidden layer to obtain the representation of the inputs, with either the mean activations of P(h(1) = 1 | h(0)) or samples drawn according to P(h(1) | h(0)), which will be used as observations for the second hidden layer.

        3. Train the second hidden layer as an RBM, taking the transformed data (mean activations or samples) as training examples (for the visible layer of the RBM).

        4. Iterate steps 2 and 3 for the desired number of layers, each time propagating upward either the mean activations P(h(l+1) = 1 | h(l)) or samples drawn according to the conditional probability P(h(l+1) | h(l)).
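Put together, steps 1–4 amount to a simple loop over layers. The sketch below reuses the hypothetical RBM class from the previous example and propagates mean activations upward; the layer sizes and training settings are arbitrary stand-ins.

```python
import numpy as np

def pretrain_dbn(V, hidden_sizes, epochs=10, lr=0.05, seed=0):
    """Greedy layer-wise pretraining of a DBN.

    V            : array of shape (n_samples, n_visible), observations (v = h(0))
    hidden_sizes : sizes of h(1), ..., h(L)
    Returns the list of trained RBM layers (see the RBM sketch above).
    """
    rbms, data = [], V
    for n_hidden in hidden_sizes:
        rbm = RBM(n_visible=data.shape[1], n_hidden=n_hidden,
                  rng=np.random.default_rng(seed))
        for _ in range(epochs):
            for v in data:               # steps 1 and 3: train this layer as an RBM
                rbm.cd1_step(v, lr=lr)
        # steps 2 and 4: propagate mean activations P(h(l+1) = 1 | h(l)) upward
        data = np.array([rbm.hidden_probs(v) for v in data])
        rbms.append(rbm)
    return rbms

# usage on random binary stand-in data
V = (np.random.default_rng(0).random((100, 64)) > 0.5).astype(float)
dbn = pretrain_dbn(V, hidden_sizes=[32, 16])
```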

After the greedy layer-wise training procedure is complete, one can apply the wake–sleep algorithm (75) to further increase the log likelihood of the observations. In practice, however, usually no further procedure is conducted to train the whole DBN jointly.



