PReLU
Rectified Linear Unit (ReLU) is one of several keys to the recent success of deep networks, yet [other researchers] have rarely focused on the properties of the rectifiers.
First, [Kaiming et al] propose a new generalization of ReLU, which they call the Parametric Rectified Linear Unit (PReLU). This activation function adaptively learns the parameters of the rectifiers and improves accuracy at negligible extra computational cost.
[LT] It's interesting that simply swapping ReLUs for PReLUs gives an immediate 1.2% gain over the ReLU baseline. This raises the question: is there any advantage to also parameterizing the coefficient of the positive part?
Then we [Kaiming et al] train the same architecture from scratch, with all ReLUs replaced by PReLUs (Table 2). The top-1 error is reduced to 32.64%. This is a 1.2% gain over the ReLU baseline.
The update formulations of {a_i} are simply derived from the chain rule.
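Concretely (a sketch in LaTeX, writing E for the training objective and using the PReLU definition f(y_i) quoted just below), the chain rule gives:

\frac{\partial E}{\partial a_i} = \sum_{y_i} \frac{\partial E}{\partial f(y_i)} \cdot \frac{\partial f(y_i)}{\partial a_i},
\qquad
\frac{\partial f(y_i)}{\partial a_i} =
\begin{cases}
0, & y_i > 0 \\
y_i, & y_i \le 0
\end{cases}

So a_i only accumulates gradient from the negative pre-activations, with the sum running over all positions that share the same a_i.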
f(y_i) = max(0, y_i) + a_i min(0, y_i)
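A minimal NumPy sketch of the forward pass and the a_i gradient, assuming a single shared coefficient a for brevity (function names here are illustrative, not from the paper):

import numpy as np

def prelu_forward(y, a):
    # f(y_i) = max(0, y_i) + a * min(0, y_i); `a` is the learned coefficient
    return np.maximum(0.0, y) + a * np.minimum(0.0, y)

def prelu_grad_a(y, grad_output):
    # Chain rule: d f(y_i)/d a = 0 for y_i > 0 and y_i for y_i <= 0,
    # so the gradient w.r.t. a sums grad_output * y_i over the negative inputs only.
    return np.sum(grad_output * np.minimum(0.0, y))

In the paper, a_i is learned either per channel (channel-wise) or shared across a layer (channel-shared), alongside the other weights.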