When 0:00:00 - 0:00:02
I see these scenes: "A photo of a sign that reads 'Setting up your optimization problem'"
I found this content: "black circle"
I detected these tags: "black | logo | circle | spiral | sign | swirl | text."
I recognized this text: "Setting up your optimization problem Numerical approximation of gradients deeplearning.ai"
I heard someone say: "When you implement backpropagation,"


When 0:00:02 - 0:00:07
I see these scenes: "A man in a shirt and tie sitting in front of a computer"
I found this content: "a man wearing a white button-down shirt, a purple decorative toy in a boy's hand, a framed photo on the desk"
I detected these tags: "holding/grasping | circle | computer | computer monitor | computer screen | dress shirt | person | figure | man | photo | screen | sitting/placed."
I recognized this text: "Setting up your optimization problem Numerical approximation of gradients deeplearning.ai"
I heard someone say: "When you implement backpropagation, you find that there's a test called gradient checking, that can really help you make sure that"


When 0:00:07 - 0:00:10
I see these scenes: "A person sitting in front of a computer screen and monitor"
I found this content: "a man in a white shirt, a picture frame on the desk, the logo is white"
I detected these tags: "circle | computer | computer monitor | computer screen | hand | figure | man | photo | screen | sitting/placed | speaker."
I recognized this text: "Setting up your optimization problem Numerical approximation of gradients deeplearning.ai"
I heard someone say: "that can really help you make sure that your implementation of backprop is correct."


When 0:00:10 - 0:00:41
I see these scenes: "A man in a shirt and tie sitting in front of a computer"
I found this content: "a man wearing a white button-down shirt"
I detected these tags: "circle | computer | computer screen | dress shirt | figure | man | screen | speaker | standing | tie."
I recognized this text: "Setting up your optimization problem Numerical approximation of gradients deeplearning.ai"
I heard someone say: "your implementation of backprop is correct. Because sometimes you write all these equations and you're just not 100 percent sure if you got all the details right in implementing backpropagation. So, in order to build up to gradient checking, let's first talk about how to numerically approximate computations of gradients. And in the next video, we'll talk about how you can implement gradient checking to make sure that your implementation of backprop is correct. So, let's take the function f and we plot it here. And remember, this is f of theta equals theta cubed. And let's again start off with some value of theta."


When 0:00:41 - 0:01:12
I see these scenes: "A line graph drawn with a slope and a Y axis"
I found this content: "tune"
I detected these tags: "chart | curve | number | line | plot | pointing | slope."
I recognized this text: "Checking your derivative computation   Andrew Ng"
I heard someone say: "And let's again start off with some value of theta. Let's say theta equals 1. Now, instead of just nudging theta to the right, to get theta plus epsilon, we're gonna nudge it to the right and nudge it to the left to get theta minus epsilon as well as theta plus epsilon. So, this is 1, this is 1.01, this is 0.99, where again epsilon is the same as before, 0.01. It turns out that rather than taking this little triangle and computing the height over the width,"


When 0:01:12 - 0:01:57
I see these scenes: "A line graph drawn with a slope and a Y axis"
I found this content: "tune"
I detected these tags: "angle | chart | curve | number | line | pointing | slope | triangle."
I recognized this text: "f(θ) = θ³   0.99   1   1.01   ε = 0.01   Andrew Ng"
I heard someone say: "computing the height over the width, you can get a much better estimate of the gradient if you take this point, f at theta minus epsilon, and this point, and you instead compute the height over width of this bigger triangle. So, for technical reasons, um, which I won't go into, uh, the height over width of this bigger green triangle gives you a much better approximation to the derivative at theta. And, you know, instead of taking just this little triangle on the upper right, it's as if you have two triangles, right? This one on the upper right and this one on the lower left, and you're kind of taking both of them into account by, uh, using this bigger green triangle. So, rather than a one-sided difference, you're taking a two-sided difference."


When 0:01:57 - 0:02:07
I see these scenes: "A graph with straight lines and curves"
I found this content: "tune"
I detected these tags: "angle | chart | curve | number | line | pointing | slope | triangle."
I recognized this text: "f(θ) = θ³   0.99   1.01   ε = 0.01   Andrew Ng"
I heard someone say: "you're taking a two-sided difference. So, let's work on the math. This point here is f of theta plus epsilon. This point here is f of theta minus epsilon."


When 0:02:07 - 0:02:33
I see these scenes: "A line chart marked with green and blue lines"
I found this content: "tune"
I detected these tags: "chart | curve | number | line | pointing | slope."
I recognized this text: "f(θ+ε)   f(θ+ε) − f(θ−ε)   f(θ−ε)   0.99   1   1.01   ε = 0.01   Andrew Ng"
I heard someone say: "This point here is f of theta minus epsilon. So, the height of this big green triangle is f of theta plus epsilon minus f of theta minus epsilon. And then the width, you know, this is one epsilon, this is two epsilon. So, the width of this green triangle is two epsilon. So, the height over width is gonna be first the height, so that's f of theta plus epsilon minus f of theta minus epsilon, divided by the width."
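The height-over-width computation described in this segment can be sketched as a small Python helper (a sketch; the names `f`, `theta`, and `epsilon` are illustrative, not from the lecture):

```python
def two_sided_difference(f, theta, epsilon):
    """Approximate f'(theta) as the height over width of the big triangle."""
    # Height of the big green triangle: f(theta + epsilon) - f(theta - epsilon)
    height = f(theta + epsilon) - f(theta - epsilon)
    # Width: the triangle spans from theta - epsilon to theta + epsilon
    width = 2 * epsilon
    return height / width
```

For f(θ) = θ³ at θ = 1 with ε = 0.01, this returns roughly 3.0001.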


When 0:02:33 - 0:03:52
I see these scenes: "A whiteboard with a graph and some math"
I found this content: "a green street sign, graffiti on a white wall"
I detected these tags: "cone | number | line | pointing | slope | triangle."
I recognized this text: "Checking your derivative computation   [f(θ+ε) − f(θ−ε)] / 2ε ≈ g(θ)   [(1.01)³ − (0.99)³] / 2(0.01) = 3.0001   g(θ) = 3θ²   0.99   1   1.01   ε = 0.01   Andrew Ng"
I heard someone say: "so that's f of theta plus epsilon minus f of theta minus epsilon, divided by the width. So, that was two epsilon, which we worked out down here. Um, and this should hopefully be close to g of theta. So, applying the values, remember f of theta is theta cubed. So, theta plus epsilon is 1.01. So, I'm gonna take the cube of that, minus now 0.99, take the cube of that, divide it by 2 times epsilon. So, that's 2 times 0.01. Um, feel free to pause the video and plug this in the calculator. You should get that this is 3.0001. Whereas from the previous slide, we saw that g of, um, theta, this was 3 theta squared. So, when theta equals 1, this is 3. So, these two values are actually very close to each other. The approximation error is, um, now 0.0001. Whereas on the previous slide, we were taking the one-sided difference, just theta and theta plus epsilon, we had gotten 3.0301. And so, the approximation error was, uh, 0.03, right, rather than, um, 0.0001. But so, with this two-sided difference way of approximating the derivative, you find that this is extremely close to 3. And so, this gives you, you know, much greater confidence that g of theta is a,"
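Plugging in the values worked through in this segment, a quick Python check (a sketch; the variable names are mine, not from the lecture) reproduces the 3.0001 figure:

```python
f = lambda theta: theta ** 3       # the function from the lecture, f(theta) = theta cubed
g = lambda theta: 3 * theta ** 2   # its analytic derivative, g(theta) = 3 theta squared

theta, epsilon = 1.0, 0.01
# Two-sided difference: (f(1.01) - f(0.99)) / (2 * 0.01)
approx = (f(theta + epsilon) - f(theta - epsilon)) / (2 * epsilon)

print(approx)      # approximately 3.0001
print(g(theta))    # 3.0
```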


When 0:03:52 - 0:04:37
I see these scenes: "A whiteboard with a bunch of equations and a graph"
I found this content: "the wall is white, a green street sign"
I detected these tags: "cone | number | line | pointing | slope | triangle."
I recognized this text: "Checking your derivative computation   f(θ) = θ³   [f(θ+ε) − f(θ−ε)] / 2ε ≈ g(θ)   [(1.01)³ − (0.99)³] / 2(0.01) = 3.0001 ≈ 3   g(θ) = 3θ² = 3   approx error 0.0001   one-sided: 3.0301, error 0.03   0.99   1.01   ε = 0.01   Andrew Ng"
I heard someone say: "much greater confidence that g of theta is a, uh, probably a correct implementation of the derivative of f. When you use this method for gradient checking and backpropagation, this turns out to run twice as slow as using a one-sided difference. It turns out that in practice, I think it's worth it to use this other method, because it's just much more accurate. Um, little bit of optional theory, for those of you that are a little bit more familiar with calculus. It turns out that, um, and it's okay if you don't get what I'm about to say here, but it turns out that for very small values, um, values of epsilon, the derivative is f of theta plus epsilon minus f of theta minus epsilon over 2 epsilon. And the formal definition of the derivative is the limit of exactly that formula on the right,"


When 0:04:37 - 0:06:35
I see these scenes: "A math problem on a whiteboard with green arrows"
I found this content: "a white wall behind a long board, a green street sign"
I detected these tags: "number | line | plot | pointing | slope."
I recognized this text: "Checking your derivative computation   f′(θ) = lim_{ε→0} [f(θ+ε) − f(θ−ε)] / 2ε   error O(ε²)   one-sided: [f(θ+ε) − f(θ)] / ε   error O(ε)   0.01   0.0001   Andrew Ng"
I heard someone say: "And the formal definition of the derivative is the limit of exactly that formula on the right, as epsilon goes to 0. And the definition of a limit is something that you learn if you, um, take a calculus class, but I won't go into that here. And it turns out that for a non-zero value of epsilon, you can show that the error of this approximation is on the order of epsilon squared. And remember, epsilon is a very small number. So if epsilon is, you know, 0.01, which it is here, then epsilon squared is 0.0001. Um, the big O notation means the error is actually some constant times this, but this is actually exactly our approximation error. Uh, so the big O constant happened to be 1. Whereas in contrast, if we were to use this formula, the other one, then the error is on the order of epsilon. And again, when epsilon is a number less than 1, then epsilon is actually much bigger than epsilon squared, which is why this formula here is actually a much less accurate approximation than, um, this formula on the left, which is why when doing gradient checking, we'd rather use this two-sided difference, where you compute f of theta plus epsilon minus f of theta minus epsilon, um, and then divide by 2 epsilon, rather than this one-sided difference, which is less accurate. If you didn't understand my last few comments, all of these things down here, don't worry about it. Uh, that's really more for those of you that are a bit more familiar with calculus and with numerical approximations. But the takeaway is that this two-sided difference formula is much more accurate. And so that's what we're gonna use when we do gradient checking in the next video. So you've seen how, by taking a two-sided difference, you can numerically verify whether or not a function g, g of theta, that someone else gives you is a correct implementation of the derivative of a function f. Let's now see how we can use this to verify whether or not your backpropagation implementation is correct or if, you know, there might be a bug in there that you need to go in and tease out."
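The one-sided versus two-sided error comparison described in this segment can be checked numerically; this sketch (variable names are mine) reproduces the 0.03 and 0.0001 errors at theta = 1, epsilon = 0.01:

```python
f = lambda theta: theta ** 3   # f(theta) = theta cubed, so the true derivative at 1 is 3
true_derivative = 3.0

theta, epsilon = 1.0, 0.01
one_sided = (f(theta + epsilon) - f(theta)) / epsilon
two_sided = (f(theta + epsilon) - f(theta - epsilon)) / (2 * epsilon)

# One-sided error is on the order of epsilon; two-sided, on the order of epsilon squared
print(abs(one_sided - true_derivative))   # approximately 0.0301
print(abs(two_sided - true_derivative))   # approximately 0.0001
```

Halving epsilon roughly halves the one-sided error but quarters the two-sided one, matching the O(ε) versus O(ε²) claim.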