深度学习回传递简介.pptx (Introduction to Back-propagation for Deep Learning)

Lecture 3: CNN: Back-propagation
boris.ginzburg@intel.com

Agenda
- Introduction to gradient-based learning for convolutional NN
- Back-propagation for basic layers: softmax, fully connected layer, pooling, ReLU, convolutional layer
- Implementation of back-propagation for the convolutional layer
- CIFAR-10 training

Good Links
- http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
- http://www.iro.umontreal.ca/~pift6266/H10/notes/gradient.html#flowgraph

Gradient-based training
A convolutional NN is just a cascade of functions: $f(x_0, w) \to y$, where $x_0$ is the image [28, 28], $w$ are the network parameters (weights, biases), and $y$ is the softmax output: the probability that $x_0$ belongs to one of the 10 classes 0..9.

Gradient-based training
We want to find parameters $w$ that minimize the error $E(f(x_0, w), y_0) = -\log f_{y_0}(x_0, w)$, the negative log-probability the network assigns to the correct class $y_0$. For this we run iterative gradient descent:
$w(t) = w(t-1) - \lambda \, \frac{\partial E}{\partial w}(t)$
How do we compute the gradient of $E$ with respect to the weights? The loss $E$ is a cascade of functions, so we go layer by layer, from the last layer back, and apply the chain rule for composite functions:
$\frac{\partial E}{\partial y_{l-1}} = \frac{\partial E}{\partial y_l} \cdot \frac{\partial y_l(w, y_{l-1})}{\partial y_{l-1}}, \qquad \frac{\partial E}{\partial w_l} = \frac{\partial E}{\partial y_l} \cdot \frac{\partial y_l(w, y_{l-1})}{\partial w_l}$

LeNet topology
The forward pass runs through the cascade below from data to loss; the backward pass runs in the reverse direction:
Data layer → Convolutional layer [5x5] → Pooling [2x2, stride 2] → Convolutional layer [5x5] → Pooling [2x2, stride 2] → Inner Product → ReLU → Inner Product → SoftMax + LogLoss

Layer::Backward()

class Layer {
  Setup(bottom, top);     // initialize layer
  Forward(bottom, top);   // compute y_l = f(w_l, y_{l-1})
  Backward(top, bottom);  // compute gradients
};

Backward: we start from the gradient $\frac{\partial E}{\partial y_l}$ coming from the layer above and
1) propagate the gradient back: $\frac{\partial E}{\partial y_l} \to \frac{\partial E}{\partial y_{l-1}}$;
2) compute the gradient of $E$ with respect to the weights $w_l$: $\frac{\partial E}{\partial w_l}$.

Softmax with LogLoss Layer
Consider the last layer (softmax with log-loss), where $k_0$ is the index of the right answer:
$E = -\log p_{k_0} = -\log \frac{e^{y_{k_0}}}{\sum_{k=0}^{9} e^{y_k}} = -y_{k_0} + \log \sum_{k=0}^{9} e^{y_k}$
For all $k = 0..9$ except $k_0$ we want to decrease $p_k$:
$\frac{\partial E}{\partial y_k} = \frac{e^{y_k}}{\sum_{j=0}^{9} e^{y_j}} = p_k$
For $k = k_0$ (the right answer) we want to increase $p_{k_0}$:
$\frac{\partial E}{\partial y_{k_0}} = -1 + p_{k_0}$
See http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression

Inner Product (Fully Connected) Layer
A fully connected layer is just a matrix-vector multiplication: $y_l = W_l \, y_{l-1}$, so
$\frac{\partial E}{\partial y_{l-1}} = W_l^T \, \frac{\partial E}{\partial y_l}$ and $\frac{\partial E}{\partial W_l} = \frac{\partial E}{\partial y_l} \, y_{l-1}^T$.
Notice that we need $y_{l-1}$, so we have to keep these values from the forward pass.

ReLU Layer
Rectified Linear Unit: $y_l = \max(0, y_{l-1})$, so
$\frac{\partial E}{\partial y_{l-1}} = 0$ if $y_{l-1} \le 0$ (i.e. $y_l = 0$), and $\frac{\partial E}{\partial y_{l-1}} = \frac{\partial E}{\partial y_l}$ otherwise.

Max-Pooling Layer
Forward:

for (p = 0; p < k; p++)
  for (q = 0; q < k; q++)
    yn(x, y) = max(yn(x, y), yn-1(x+p, y+q));

Backward:
$\frac{\partial E}{\partial y_{n-1}(x+p, y+q)} = 0$ if $y_n(x, y) \ne y_{n-1}(x+p, y+q)$, and $\frac{\partial E}{\partial y_{n-1}(x+p, y+q)} = \frac{\partial E}{\partial y_n(x, y)}$ otherwise.
Quiz: What is the gradient for sum-pooling? What is the gradient if pooling areas overlap (e.g. stride = 1)?
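The backward rules above for softmax with log-loss, the fully connected layer, and ReLU can be written out directly. Below is a minimal, self-contained sketch in plain C++ (not Caffe code); the function names, the std::vector layout, and the tiny example in main are illustrative assumptions, not part of the lecture.

// Minimal sketch (not Caffe code) of the backward rules derived above,
// for softmax + log-loss, a fully connected layer, and ReLU.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Softmax with log-loss: given scores y_k and the index k0 of the right answer,
// dE/dy_k = p_k for k != k0 and dE/dy_k0 = p_k0 - 1.
std::vector<double> softmax_logloss_backward(const std::vector<double>& y, int k0) {
    double m = *std::max_element(y.begin(), y.end());  // subtract max for numerical stability
    std::vector<double> grad(y.size());
    double sum = 0.0;
    for (size_t k = 0; k < y.size(); ++k) { grad[k] = std::exp(y[k] - m); sum += grad[k]; }
    for (size_t k = 0; k < y.size(); ++k) grad[k] /= sum;  // grad[k] = p_k
    grad[k0] -= 1.0;                                       // right answer: p_k0 - 1
    return grad;
}

// Fully connected layer y_l = W * y_{l-1}, W stored row-major (rows x cols):
// dE/dy_{l-1} = W^T * dE/dy_l and dE/dW[i][j] = dE/dy_l[i] * y_{l-1}[j].
void fc_backward(const std::vector<double>& W, int rows, int cols,
                 const std::vector<double>& y_prev,     // saved on the forward pass
                 const std::vector<double>& top_diff,   // dE/dy_l
                 std::vector<double>& bottom_diff,      // dE/dy_{l-1}
                 std::vector<double>& weight_diff) {    // dE/dW
    bottom_diff.assign(cols, 0.0);
    weight_diff.assign(static_cast<size_t>(rows) * cols, 0.0);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j) {
            bottom_diff[j]            += W[i * cols + j] * top_diff[i];
            weight_diff[i * cols + j]  = top_diff[i] * y_prev[j];
        }
}

// ReLU y_l = max(0, y_{l-1}): pass the gradient through only where the input was positive.
void relu_backward(const std::vector<double>& y_prev,
                   const std::vector<double>& top_diff,
                   std::vector<double>& bottom_diff) {
    bottom_diff.resize(y_prev.size());
    for (size_t i = 0; i < y_prev.size(); ++i)
        bottom_diff[i] = (y_prev[i] > 0.0) ? top_diff[i] : 0.0;
}

int main() {
    // Tiny usage example: 3 classes, class 1 is the right answer.
    std::vector<double> scores = {1.0, 2.0, 0.5};
    std::vector<double> g = softmax_logloss_backward(scores, 1);
    std::printf("dE/dy = %.3f %.3f %.3f\n", g[0], g[1], g[2]);
    return 0;
}

Note how fc_backward takes y_prev: this is the point made above that $y_{l-1}$ must be kept from the forward pass in order to compute the weight gradient.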
Convolutional Layer :: Backward
Let's use the chain rule for the convolutional layer:
$\frac{\partial E}{\partial y_{l-1}} = \frac{\partial E}{\partial y_l} \cdot \frac{\partial y_l(w, y_{l-1})}{\partial y_{l-1}}; \qquad \frac{\partial E}{\partial w_l} = \frac{\partial E}{\partial y_l} \cdot \frac{\partial y_l(w, y_{l-1})}{\partial w_l}$
3D convolution (forward):

for (n = 0; n < N; n++)
  for (m = 0; m < M; m++)
    for (y = 0; y < Y; y++)
      for (x = 0; x < X; x++)
        for (p = 0; p < K; p++)
          for (q = 0; q < K; q++)
            yL(n; x, y) += yL-1(m; x+p, y+q) * w(n, m; p, q);

Convolutional Layer :: Backward
Example: M = 1, N = 2, K = 2. Take one pixel in level (l-1). Which pixels in the next level are influenced by it?
(Figure: a 2x2 block of positions in each of the two output feature maps.)

Convolutional Layer :: Backward
Let's apply the chain rule to the convolutional layer. The gradient $\frac{\partial E}{\partial y_{l-1}}$ is a sum of correlations of the gradients $\frac{\partial E}{\partial y_l}$ with the weights over all feature maps of the "upper" layer:
$\frac{\partial E}{\partial y_{l-1}} = \frac{\partial E}{\partial y_l} \cdot \frac{\partial y_l(w, y_{l-1})}{\partial y_{l-1}} = \sum_{n=1}^{N} \mathrm{back\_corr}\!\left(W, \frac{\partial E}{\partial y_l}\right)$
The gradient of $E$ with respect to $w$ is a sum over all "pixels" (x, y) in the input map:
$\frac{\partial E}{\partial w_l} = \frac{\partial E}{\partial y_l} \cdot \frac{\partial y_l(w, y_{l-1})}{\partial w_l} = \sum_{0 \le x \le X} \; \sum_{0 \le y \le Y} \frac{\partial E}{\partial y_l}(x, y) \circ y_{l-1}(x, y)$

Convolutional Layer :: Backward
How this is implemented in Caffe:

// im2col: unroll bottom data into col_data
im2col_cpu(bottom_data, CHANNELS_, HEIGHT_, WIDTH_, KSIZE_, PAD_, STRIDE_, col_data);
// gradient w.r.t. weight:
caffe_cpu_gemm(CblasNoTrans, CblasTrans, M_, K_, N_, 1., top_diff, col_data, 1., weight_diff);
// gradient w.r.t. bottom data:
caffe_cpu_gemm(CblasTrans, CblasNoTrans, K_, N_, M_, 1., weight, top_diff, 0., col_diff);
// col2im: fold col_diff back into bottom_diff
col2im_cpu(col_diff, CHANNELS_, HEIGHT_, WIDTH_, KSIZE_, PAD_, STRIDE_, bottom_diff);

Convolutional Layer: im2col
The implementation is based on reducing the convolution layer to a matrix-matrix multiply (see Chellapilla et al., "High Performance Convolutional Neural Networks for Document Processing"). A small stand-alone sketch of this reduction is given after the Exercises slide below.

CIFAR-10 Training
- http://www.cs.toronto.edu/~kriz/cifar.html
- https://www.kaggle.com/c/cifar-10
60000 32x32 colour images in 10 classes, with 6000 images per class: 50000 training images and 10000 test images.

Exercises
- Look at the definition of Backward for the following layers: sigmoid, tanh.
- Implement a new layer: softplus, $y_l = \log(1 + e^{y_{l-1}})$.
- Train CIFAR-10 with different topologies.
- Port CIFAR-100 to Caffe.
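As a companion to the im2col slides, here is a minimal sketch of the reduction of convolution to a matrix-matrix multiply, assuming a single input channel, stride 1, and no padding. It is plain C++ with hand-written loops standing in for a BLAS gemm call; the names (im2col, conv_as_gemm, H, W, K, N) are illustrative only and are not the Caffe API.

// Minimal im2col sketch: one input channel, stride 1, no padding.
#include <cstdio>
#include <vector>

// Unroll every KxK patch of an HxW image into one column of a
// (K*K) x (outH*outW) matrix, where outH = H-K+1 and outW = W-K+1.
std::vector<double> im2col(const std::vector<double>& img, int H, int W, int K) {
    int outH = H - K + 1, outW = W - K + 1;
    std::vector<double> col(static_cast<size_t>(K) * K * outH * outW);
    for (int p = 0; p < K; ++p)
        for (int q = 0; q < K; ++q)
            for (int y = 0; y < outH; ++y)
                for (int x = 0; x < outW; ++x)
                    col[(p * K + q) * outH * outW + y * outW + x] = img[(y + p) * W + (x + q)];
    return col;
}

// Convolution with N filters (each K*K, row-major) becomes an
// (N x K*K) * (K*K x outHW) matrix product; a BLAS gemm would do this in one call.
std::vector<double> conv_as_gemm(const std::vector<double>& filters, int N, int K,
                                 const std::vector<double>& col, int outHW) {
    std::vector<double> out(static_cast<size_t>(N) * outHW, 0.0);
    for (int n = 0; n < N; ++n)
        for (int k = 0; k < K * K; ++k)
            for (int i = 0; i < outHW; ++i)
                out[n * outHW + i] += filters[n * K * K + k] * col[k * outHW + i];
    return out;
}

int main() {
    // 3x3 image, one 2x2 averaging filter -> 2x2 output map.
    int H = 3, W = 3, K = 2, N = 1;
    std::vector<double> img  = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    std::vector<double> filt = {0.25, 0.25, 0.25, 0.25};
    int outH = H - K + 1, outW = W - K + 1;
    std::vector<double> col = im2col(img, H, W, K);
    std::vector<double> out = conv_as_gemm(filt, N, K, col, outH * outW);
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) std::printf("%5.2f ", out[y * outW + x]);
        std::printf("\n");
    }
    return 0;  // expected output: 3.00 4.00 / 6.00 7.00
}

The backward pass in the Caffe snippet quoted above uses the same trick in reverse: the two gemm calls compute the weight and data gradients against col_data, and col2im folds the per-patch gradients back into the image layout.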