machine learning - Trouble understanding Convolutional Neural Network
I have been reading about convolutional neural networks here, and I have started playing with Torch7. I am confused about the convolutional layer of a CNN.
From the tutorial:
1. The neurons in a layer will only be connected to a small region of the layer before it, instead of all of the neurons in a fully-connected manner.
2. For example, suppose that the input volume has size [32x32x3] (e.g. an RGB CIFAR-10 image). If the receptive field is of size 5x5, then each neuron in the conv layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights.
3. If the input layer is [32x32x3], the conv layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and the region they are connected to in the input volume. This may result in a volume such as [32x32x12].
I started playing with what a conv layer might do to an image. I did that in Torch7. Here is my implementation:
    require 'image'
    require 'nn'

    i = image.lena()
    model = nn.Sequential()
    model:add(nn.SpatialConvolutionMM(3, 10, 5, 5)) -- depth = 3, #output layers = 10, filter = 5x5
    res = model:forward(i)
    itorch.image(res)
    print(#i)
    print(#res)
     3
     512
     512
    [torch.LongStorage of size 3]

     10
     508
     508
    [torch.LongStorage of size 3]
Now let's look at the structure of a CNN.
So, my questions are:
Question 1
Is the convolution done like this: let's say we take an image of 32x32x3, and there is a 5x5 filter. Then the 5x5 filter will pass through the whole 32x32 image and produce the convolved images? Okay, so sliding one 5x5 filter across the whole image gives one image; if there are 10 output layers, we get 10 images (as you can see from the output). How do we get these? (See the image for clarification if required.)
Question 2
What is the number of neurons in the conv layer? Is it the number of output layers? In the code I've written above, model:add(nn.SpatialConvolutionMM(3, 10, 5, 5)), is it 10 (the number of output layers)?
If so, point number 2 does not make sense to me. According to it, if the receptive field is of size 5x5, then each neuron in the conv layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights.
So what is the weight here? I am very confused about this. In the model defined in Torch, there is no weight. So how is the weight playing a role here?
Can someone explain what is going on?
Is the convolution done like this: let's say we take an image of 32x32x3, and there is a 5x5 filter. Then the 5x5 filter will pass through the whole 32x32 image and produce the convolved images?
For a 32x32x3 input image, a 5x5 filter would iterate over every single pixel and, for each pixel, look at its 5x5 neighborhood. That neighborhood contains 5*5*3=75 values. Below is an example image for a 3x3 filter on one single input channel, i.e. one with a neighborhood of 3*3*1 values (source).
For each individual neighbor the filter has one parameter (aka weight), so 75 parameters in total. To calculate one single output value (the value at pixel x, y), it reads those neighbor values, multiplies each one by its respective parameter/weight and adds the products up at the end (see discrete convolution). The optimal weights have to be learned during training.
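The weighted sum for one output value can be sketched in plain Python. This is only an illustration, not how Torch implements it: the function name `output_value` and the toy patch and weights are made up for the example, and the bias term that conv layers usually add is left out. The sketch also does not flip the filter, so strictly speaking it is a cross-correlation; for learned weights that distinction does not change the idea.

```python
def output_value(patch, weights):
    """Weighted sum of one neighborhood.

    patch and weights are nested lists indexed [channel][row][col]
    and must have the same shape (e.g. 3 x 5 x 5 = 75 values each).
    """
    total = 0.0
    for patch_channel, weight_channel in zip(patch, weights):
        for patch_row, weight_row in zip(patch_channel, weight_channel):
            for value, weight in zip(patch_row, weight_row):
                total += value * weight
    return total


# Toy data (hypothetical): a 3-channel 5x5 neighborhood of ones
# and a filter whose 75 weights are all 0.5.
patch = [[[1.0] * 5 for _ in range(5)] for _ in range(3)]
weights = [[[0.5] * 5 for _ in range(5)] for _ in range(3)]

n_weights = sum(len(row) for channel in weights for row in channel)
result = output_value(patch, weights)
print(n_weights)  # 75
print(result)     # 37.5
```

Every output pixel of one plane is computed with these same 75 weights; only the neighborhood that is read changes.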
So one filter iterates over the image and generates a new output, pixel by pixel. If you have multiple filters (i.e. the second parameter in SpatialConvolutionMM is >1), you get multiple outputs ("planes" in Torch).
Okay, so sliding one 5x5 filter across the whole image gives one image; if there are 10 output layers, we get 10 images (as you can see from the output). How do we get these? (See the image for clarification if required.)
Each output plane gets generated by its own filter. Each filter has its own parameters (5*5*3 parameters in your example). The process for multiple filters is exactly the same as for one.
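A rough pure-Python sketch of sliding the filters may make this concrete. It assumes a stride of 1 and no padding, which is consistent with the 512 to 508 size change in the question; the names `output_size`, `apply_filter` and `apply_filters` and the toy data are hypothetical, and biases are ignored.

```python
def output_size(input_size, filter_size):
    """Output width/height for stride 1 and no padding (assumed here)."""
    return input_size - filter_size + 1


def apply_filter(image, weights):
    """Slide one filter over the image and return one output plane.

    image and weights are nested lists indexed [channel][row][col].
    """
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    k_h, k_w = len(weights[0]), len(weights[0][0])
    plane = []
    for y in range(output_size(height, k_h)):
        row = []
        for x in range(output_size(width, k_w)):
            total = 0.0
            for c in range(channels):
                for dy in range(k_h):
                    for dx in range(k_w):
                        total += image[c][y + dy][x + dx] * weights[c][dy][dx]
            row.append(total)
        plane.append(row)
    return plane


def apply_filters(image, filters):
    """Produce one output plane per filter."""
    return [apply_filter(image, weights) for weights in filters]


# Toy input (hypothetical): 3 channels of 8x8 ones and 10 filters of
# shape 3x5x5, where filter n has all weights equal to n.
image = [[[1.0] * 8 for _ in range(8)] for _ in range(3)]
filters = [[[[float(n)] * 5 for _ in range(5)] for _ in range(3)]
           for n in range(10)]

planes = apply_filters(image, filters)
shape = (len(planes), len(planes[0]), len(planes[0][0]))
print(shape)                # (10, 4, 4)
print(output_size(512, 5))  # 508
```

Ten filters give ten planes, and under these assumptions a 5x5 filter shrinks each side by 4, matching the 10 x 508 x 508 result printed for the 512 x 512 input.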
What is the number of neurons in the conv layer? Is it the number of output layers? In the code I've written above, model:add(nn.SpatialConvolutionMM(3, 10, 5, 5)), is it 10 (the number of output layers)?
You should call them weights or parameters; "neurons" doesn't really fit convolutional layers. The number of parameters is, as described, 5*5*3=75 per filter in your example. As you have 10 filters ("output planes"), you have 750 parameters in total. If you add a second layer to your network with model:add(nn.SpatialConvolutionMM(10, 10, 5, 5)), you would have an additional 5*5*10=250 parameters per filter and 250*10=2500 in total. Notice how quickly this number can grow (512 filters/output planes in one layer operating on 256 input planes is nothing uncommon).
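The arithmetic can be checked with a small helper. This is a sketch: `conv_weight_count` is a made-up name, its arguments follow the order used in the SpatialConvolutionMM calls above, and it counts filter weights only, ignoring the one bias per output plane that conv layers typically also hold. The 5x5 filter size in the last case is an assumption, since only the plane counts are given.

```python
def conv_weight_count(n_input_planes, n_output_planes, k_w, k_h):
    """Return (weights per filter, weights in the whole layer), biases excluded."""
    per_filter = k_w * k_h * n_input_planes
    return per_filter, per_filter * n_output_planes


first = conv_weight_count(3, 10, 5, 5)      # layer from the question
second = conv_weight_count(10, 10, 5, 5)    # hypothetical second layer
large = conv_weight_count(256, 512, 5, 5)   # 5x5 filter size assumed
print(first)   # (75, 750)
print(second)  # (250, 2500)
print(large)   # (6400, 3276800)
```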
For further reading you should look at http://neuralnetworksanddeeplearning.com/chap6.html. Scroll down to the chapter "Introducing convolutional networks". Under "Local receptive fields" there are visualizations that should help you understand what a filter does (one was shown above).