Adaptive Subgradient Method

Intel® oneAPI Data Analytics Library Developer Guide and Reference

ID 772611

Date 3/22/2024

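The adaptive subgradient method (AdaGrad) [Duchi2011] follows the algorithmic framework of an iterative solver with the algorithm-specific transformation $T$, the set of intrinsic parameters $S_t$ defined for the learning rate $\eta$, and the algorithm-specific vector $U$ and power $d$ of Lebesgue space defined as follows:

$S_t = \{G_t\}$, where $G_t = (G_{t,i})_{i=1,\ldots,p}$ and $G_0 \equiv 0$

$T(\theta_{t-1}, g(\theta_{t-1}), S_{t-1})$:

1. $G_{t,i} = G_{t-1,i} + g_i^2(\theta_{t-1})$, where $g_i(\theta_{t-1})$ is the i-th coordinate of the gradient $g(\theta_{t-1})$
2. $\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{G_t + \varepsilon}}\, g(\theta_{t-1})$, where $\frac{\eta}{\sqrt{G_t + \varepsilon}}\, g(\theta_{t-1}) = \left\{ \frac{\eta}{\sqrt{G_{t,1} + \varepsilon}}\, g_1(\theta_{t-1}), \ldots, \frac{\eta}{\sqrt{G_{t,p} + \varepsilon}}\, g_p(\theta_{t-1}) \right\}$

Convergence check: $U = g(\theta_{t-1})$, $d = 2$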
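
The update can be illustrated with a minimal pure-Python sketch. It is not the library implementation: the names `adagrad_step` and `minimize` are hypothetical, the arguments `learning_rate` and `eps` stand in for the learning rate and the square-root safeguard, and the stopping rule is assumed to be a plain threshold on the Euclidean norm of the gradient (the exact criterion belongs to the iterative solver framework).

```python
import math


def adagrad_step(theta, grad, grad_sq_sum, learning_rate=0.01, eps=1e-8):
    """Apply one AdaGrad update; returns (new_theta, new_grad_sq_sum)."""
    # Step 1: accumulate the squared gradient per coordinate.
    new_sum = [G + g * g for G, g in zip(grad_sq_sum, grad)]
    # Step 2: scale each coordinate's step by 1 / sqrt(accumulated sum + eps).
    new_theta = [
        th - learning_rate / math.sqrt(G + eps) * g
        for th, g, G in zip(theta, grad, new_sum)
    ]
    return new_theta, new_sum


def minimize(grad_fn, theta0, n_iterations=100, accuracy_threshold=1e-5,
             learning_rate=0.01, eps=1e-8):
    """Iterate until the gradient 2-norm drops below the threshold (assumed rule)."""
    theta = list(theta0)
    grad_sq_sum = [0.0] * len(theta)  # accumulated sum starts at zero
    for t in range(n_iterations):
        grad = grad_fn(theta)
        # Convergence check on the gradient with the Euclidean (d = 2) norm.
        if math.sqrt(sum(g * g for g in grad)) < accuracy_threshold:
            return theta, t
        theta, grad_sq_sum = adagrad_step(
            theta, grad, grad_sq_sum, learning_rate, eps
        )
    return theta, n_iterations
```

Because `eps` is added under the square root, a coordinate whose gradient has been zero so far produces a zero step instead of a division by zero.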

Computation

The adaptive subgradient (AdaGrad) method is a special case of an iterative solver. For parameters, input, and output of iterative solvers, see Computation for Iterative Solver.

Algorithm Input

In addition to the input of the iterative solver, the AdaGrad method accepts the following optional input:

Algorithm Input for Adaptive Subgradient Method Computation
OptionalDataID	Input
`gradientSquareSum`	A numeric table of size $p \times 1$ with the values of $G_t$. Each value is the accumulated sum of squares of the corresponding gradient coordinate.

Algorithm Parameters

In addition to parameters of the iterative solver, the AdaGrad method has the following parameters:

Algorithm Parameters for Adaptive Subgradient Method Computation
Parameter	Default Value	Description
`algorithmFPType`	`float`	The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method`	`defaultDense`	Default performance-oriented computation method.
`batchIndices`	`NULL`	A numeric table of size $nIterations \times batchSize$ for the `defaultDense` method that represents 32-bit integer indices of terms in the objective function. If no indices are provided, the algorithm generates random indices.
`batchSize`	128	The number of batch indices to compute the stochastic gradient. If `batchSize` equals the number of terms in the objective function, no random sampling is performed, and all terms are used to calculate the gradient. The algorithm ignores this parameter if the `batchIndices` parameter is provided.
`learningRate`	A numeric table of size $1 \times 1$ that contains the default step length equal to 0.01.	A numeric table of size $1 \times 1$ that contains the value of the learning rate $\eta$. NOTE: This parameter can be an object of any class derived from `NumericTable`, except for `PackedTriangularMatrix`, `PackedSymmetricMatrix`, and `CSRNumericTable`.
`degenerateCasesThreshold`	1e-08	Value $\varepsilon$ needed to avoid degenerate cases when computing square roots.
`engine`	`SharedPtr<engines::mt19937::Batch>()`	Pointer to the random number generator engine that is used internally to generate 32-bit integer indices of terms in the objective function.
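
The interaction between `batchIndices` and `batchSize` described in the table can be sketched as follows. This is an illustration only: `select_batch` and its arguments are hypothetical names, sampling without replacement is an assumption, and Python's `random.Random` (itself a Mersenne Twister generator) stands in for the `engine` parameter without reproducing the library's random stream.

```python
import random


def select_batch(iteration, n_terms, batch_size=128, batch_indices=None, rng=None):
    """Return the objective-function term indices used at one iteration."""
    if batch_indices is not None:
        # Explicit indices take priority: batch_size is ignored and the row
        # that corresponds to the current iteration is used as is.
        return list(batch_indices[iteration])
    if batch_size == n_terms:
        # Full batch: no random sampling, every term contributes.
        return list(range(n_terms))
    # Otherwise draw batch_size distinct indices at random (assumption:
    # sampling without replacement).
    rng = rng or random.Random()
    return rng.sample(range(n_terms), batch_size)
```

Passing a seeded generator, for example `rng=random.Random(777)`, makes the sketch reproducible across runs.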

Algorithm Output

In addition to the output of the iterative solver, the AdaGrad method calculates the following optional result:

Algorithm Output for Adaptive Subgradient Method Computation
OptionalDataID	Output
`gradientSquareSum`	A numeric table of size $p \times 1$ with the values of $G_t$. Each value is the accumulated sum of squares of the corresponding gradient coordinate.
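
Since `gradientSquareSum` appears both as an optional result and as an optional input, the accumulated sums from one run can seed the next. The sketch below illustrates why that matters. It is pure Python rather than library code: `run_adagrad` and `grad` are hypothetical names, and the objective $(\theta - 3)^2$ is chosen only for the demonstration.

```python
import math


def run_adagrad(grad_fn, theta, n_iterations, grad_sq_sum=None,
                learning_rate=0.01, eps=1e-8):
    """Run a fixed number of AdaGrad iterations; returns (theta, grad_sq_sum)."""
    theta = list(theta)
    # Optional input: continue from a saved accumulated sum, else start at zero.
    G = list(grad_sq_sum) if grad_sq_sum is not None else [0.0] * len(theta)
    for _ in range(n_iterations):
        g = grad_fn(theta)
        G = [Gi + gi * gi for Gi, gi in zip(G, g)]
        theta = [ti - learning_rate / math.sqrt(Gi + eps) * gi
                 for ti, gi, Gi in zip(theta, g, G)]
    return theta, G


def grad(theta):
    # Gradient of the illustrative objective (theta - 3)^2.
    return [2.0 * (theta[0] - 3.0)]


# One uninterrupted run of 20 iterations.
full_theta, full_G = run_adagrad(grad, [0.0], 20)

# The same work split in two, passing the saved sum into the second run.
half_theta, half_G = run_adagrad(grad, [0.0], 10)
resumed_theta, resumed_G = run_adagrad(grad, half_theta, 10, grad_sq_sum=half_G)

# Restarting without the saved sum resets the step scaling and takes
# larger steps than the uninterrupted run would.
cold_theta, _ = run_adagrad(grad, half_theta, 10)
```

In this sketch the resumed run reproduces the uninterrupted one, while the cold restart ends at a different point because its per-coordinate step sizes start over.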

Examples

C++ (CPU)

Python*

https://github.com/intel/scikit-learn-intelex/tree/main/examples/daal4py/adagrad_mse_batch.py
