Logistic Loss
LogisticLoss is a common objective function used for binary classification.
Operation | Computational methods
--- | ---
Computing | dense_batch
Mathematical formulation
Computing
The algorithm takes as input a dataset \(X = \{ x_1, \ldots, x_n \}\) with \(n\) feature vectors of dimension \(p\), a vector of class labels \(y = \{ y_1, \ldots, y_n \}\), and a coefficient vector \(w = \{ w_0, \ldots, w_p \}\) of size \(p + 1\). It then computes the logistic loss value, its gradient, or its Hessian using the following formulas.
Value
\(L(X, w, y) = \sum_{i = 1}^{n} -y_i \log(prob_i) - (1 - y_i) \log(1 - prob_i)\), where \(prob_i = \sigma(w_0 + \sum_{j=1}^{p} w_j x_{i, j})\) are the predicted probabilities and \(\sigma(x) = \frac{1}{1 + \exp(-x)}\) is the sigmoid function. Note that the probabilities are clipped to the interval \([\epsilon, 1 - \epsilon]\) to avoid numerical problems when computing the logarithm (\(\epsilon = 10^{-7}\) if the float type is used and \(10^{-15}\) otherwise).
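The value formula can be sketched in NumPy as follows. This is an illustrative reference implementation, not the oneDAL API; the function name `logistic_loss` and the default `eps` are assumptions chosen to mirror the clipping described above.

```python
import numpy as np

def logistic_loss(X, w, y, eps=1e-7):
    """Logistic loss for data X (n x p), labels y in {0, 1}, coefficients w (p+1,).

    w[0] is the intercept term w_0; probabilities are clipped to
    [eps, 1 - eps] before taking logarithms, as described above.
    """
    z = w[0] + X @ w[1:]               # linear term w_0 + sum_j w_j x_{i,j}
    prob = 1.0 / (1.0 + np.exp(-z))    # sigmoid function
    prob = np.clip(prob, eps, 1.0 - eps)
    return np.sum(-y * np.log(prob) - (1.0 - y) * np.log(1.0 - prob))
```

For a single sample with a zero coefficient vector, the predicted probability is 0.5 and the loss reduces to \(\log 2\), which is a convenient sanity check.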
Gradient
\(\overline{grad} = \frac{\partial L}{\partial w}\), where \(\overline{grad}_0 = \sum_{i=1}^{n} (prob_i - y_i)\) and \(\overline{grad}_j = \sum_{i=1}^n x_{i, j} (prob_i - y_i) + L1 \cdot \mathrm{sign}(w_j) + 2 \cdot L2 \cdot w_j\) for \(1 \leq j \leq p\), with \(L1\) and \(L2\) the regularization coefficients
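The gradient formula above can be sketched the same way. Again this is a NumPy illustration under the same assumptions (intercept in `w[0]`, labels in {0, 1}), not the library's implementation; the `L1` and `L2` parameters stand for the regularization coefficients in the formula.

```python
import numpy as np

def logistic_loss_gradient(X, w, y, L1=0.0, L2=0.0, eps=1e-7):
    """Gradient of the logistic loss with respect to w = (w_0, ..., w_p)."""
    z = w[0] + X @ w[1:]
    prob = np.clip(1.0 / (1.0 + np.exp(-z)), eps, 1.0 - eps)
    r = prob - y                     # residuals prob_i - y_i
    grad = np.empty_like(w)
    grad[0] = np.sum(r)              # intercept component grad_0
    grad[1:] = X.T @ r + L1 * np.sign(w[1:]) + 2.0 * L2 * w[1:]
    return grad
```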
Hessian
\(H = (h_{ij}) = \frac{\partial^2 L}{\partial w \partial w^T}\), where \(h_{0,0} = \sum_{k=1}^n prob_k (1 - prob_k)\), \(h_{i,0} = h_{0,i} = \sum_{k=1}^n x_{k,i} \cdot prob_k (1 - prob_k)\), and \(h_{i,j} = \sum_{k=1}^n x_{k,i} x_{k,j} \cdot prob_k (1 - prob_k) + [i = j] \cdot 2 \cdot L2\) for \(1 \leq i, j \leq p\)
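The Hessian entries can be assembled in one pass over the per-sample weights \(prob_k (1 - prob_k)\). As before, this is a hedged NumPy sketch rather than the library's code; `L2` is the regularization coefficient that appears only on the diagonal of the feature block.

```python
import numpy as np

def logistic_loss_hessian(X, w, y, L2=0.0, eps=1e-7):
    """(p+1) x (p+1) Hessian of the logistic loss; row/column 0 is the intercept."""
    n, p = X.shape
    z = w[0] + X @ w[1:]
    prob = np.clip(1.0 / (1.0 + np.exp(-z)), eps, 1.0 - eps)
    d = prob * (1.0 - prob)          # per-sample weights prob_k (1 - prob_k)
    H = np.empty((p + 1, p + 1))
    H[0, 0] = np.sum(d)              # h_{0,0}
    H[0, 1:] = H[1:, 0] = X.T @ d    # h_{i,0} = h_{0,i}
    # h_{i,j}: weighted Gram matrix plus the L2 term on the diagonal
    H[1:, 1:] = (X * d[:, None]).T @ X + 2.0 * L2 * np.eye(p)
    return H
```

Note that the labels \(y\) do not appear in the Hessian; they enter only through the value and gradient.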
Computation method: dense_batch
The method computes the value of the objective function, its gradient, or its Hessian for dense data. This is the default and the only supported method.
Programming Interface
Refer to API Reference: LogisticLoss.
Distributed mode
Currently, the algorithm does not support distributed execution in SPMD mode.