Batch Processing

Intel® oneAPI Data Analytics Library Developer Guide and Reference

ID 772611

Date 3/22/2024

Algorithm Input

The K-Means clustering algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm.

Algorithm Input for K-Means Computation (Batch Processing)
Input ID	Input
`data`	Pointer to the numeric table with the data to be clustered.
`inputCentroids`	Pointer to the numeric table with the initial centroids.

NOTE:

The input for `data` and `inputCentroids` can be an object of any class derived from `NumericTable`.

Algorithm Parameters

The K-Means clustering algorithm has the following parameters:

Algorithm Parameters for K-Means Computation (Batch Processing)
Parameter	Default Value	Description
`algorithmFPType`	`float`	The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method`	`defaultDense`	The computation method for K-Means clustering. For CPU: `defaultDense` (implementation of Lloyd’s algorithm) or `lloydCSR` (implementation of Lloyd’s algorithm for CSR numeric tables). For GPU: `defaultDense` (implementation of Lloyd’s algorithm).
`nClusters`	Not applicable	The number of clusters. Required to initialize the algorithm.
`maxIterations`	Not applicable	The maximum number of iterations. Required to initialize the algorithm.
`accuracyThreshold`	0.0	The threshold for termination of the algorithm.
`gamma`	1.0	The weight to be used in distance calculation for binary categorical features.
`distanceType`	`euclidean`	The measure of closeness between points (observations) being clustered. The only distance type supported so far is the Euclidean distance.
`assignFlag` (deprecated; use `resultsToEvaluate` instead)	`true`	A flag that enables computation of assignments, that is, assigning cluster indices to respective observations.
`resultsToEvaluate`	`computeCentroids` \| `computeAssignments` \| `computeExactObjectiveFunction`	The 64-bit integer flag that specifies which extra characteristics of the K-Means algorithm to compute. Provide one of the following values to request a single characteristic, or combine values with bitwise OR to request several: `computeCentroids` for computation of centroids; `computeAssignments` for computation of assignments, that is, assigning cluster indices to respective observations; `computeExactObjectiveFunction` for computation of the exact objective function.
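
The bitwise OR combination described for `resultsToEvaluate` can be illustrated with a small standalone sketch. The enumerator values, the `ResultFlag` type, and the helper functions below are hypothetical and chosen only for illustration; the library defines its own constants, and this is not its API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values, for illustration only; the library defines its own.
enum ResultFlag : std::uint64_t {
    computeCentroids              = 1ULL << 0,
    computeAssignments            = 1ULL << 1,
    computeExactObjectiveFunction = 1ULL << 2
};

// Returns true when `flag` is set in the combined 64-bit mask.
inline bool isRequested(std::uint64_t mask, ResultFlag flag) {
    return (mask & flag) != 0;
}

// Builds the combination that the parameter table lists as the default value.
inline std::uint64_t defaultResultsToEvaluate() {
    return computeCentroids | computeAssignments | computeExactObjectiveFunction;
}

// Small self-check: a mask holding a single flag requests only that flag.
inline void selfCheck() {
    const std::uint64_t onlyAssignments = computeAssignments;
    assert(isRequested(onlyAssignments, computeAssignments));
    assert(!isRequested(onlyAssignments, computeCentroids));
}
```

Because each value occupies its own bit, any subset of the characteristics can be requested in a single integer, and a single value on its own requests exactly one characteristic.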

Algorithm Output

The K-Means clustering algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm.

Algorithm Output for K-Means Computation (Batch Processing)
Result ID	Result
`centroids`	Pointer to the numeric table with the cluster centroids, computed when the `computeCentroids` option is enabled. NOTE: By default, this result is an object of the `HomogenNumericTable` class, but you can define the result as an object of any class derived from `NumericTable` except for `PackedTriangularMatrix`, `PackedSymmetricMatrix`, and `CSRNumericTable`.
`assignments`	Pointer to the numeric table with assignments of cluster indices to feature vectors in the input data, computed when the `computeAssignments` option is enabled. NOTE: By default, this result is an object of the `HomogenNumericTable` class, but you can define the result as an object of any class derived from `NumericTable` except for `PackedTriangularMatrix`, `PackedSymmetricMatrix`, and `CSRNumericTable`.
`objectiveFunction`	Pointer to the numeric table with the minimum value of the objective function obtained at the last iteration of the algorithm. This value might be inexact unless the `computeExactObjectiveFunction` option is enabled, in which case the exact objective function is computed. NOTE: By default, this result is an object of the `HomogenNumericTable` class, but you can define the result as an object of any class derived from `NumericTable` except for `PackedTriangularMatrix`, `PackedSymmetricMatrix`, and `CSRNumericTable`.
`nIterations`	Pointer to the numeric table with the actual number of iterations done by the algorithm. NOTE: By default, this result is an object of the `HomogenNumericTable` class, but you can define the result as an object of any class derived from `NumericTable` except for `PackedTriangularMatrix`, `PackedSymmetricMatrix`, and `CSRNumericTable`.
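
To make the relationship between the parameters and the four results more concrete, here is a minimal standalone sketch of Lloyd's algorithm on dense data. It does not use the library and is not its implementation. The `KMeansResult` struct, the function names, and the stopping rule (stop once the objective improves by no more than `accuracyThreshold`) are assumptions of this sketch; the table above only states that `accuracyThreshold` is a termination threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Hypothetical result bundle; field names mirror the Result IDs in the table.
struct KMeansResult {
    std::vector<Point> centroids;
    std::vector<std::size_t> assignments;
    double objectiveFunction = 0.0;
    std::size_t nIterations = 0;
};

// Squared Euclidean distance between two points of equal dimension.
inline double squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double diff = a[j] - b[j];
        sum += diff * diff;
    }
    return sum;
}

// Assignment step: label each point with its nearest centroid and
// return the sum of squared distances (the objective function).
inline double assignPoints(const std::vector<Point>& data,
                           const std::vector<Point>& centroids,
                           std::vector<std::size_t>& assignments) {
    double objective = 0.0;
    assignments.assign(data.size(), 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        double best = squaredDistance(data[i], centroids[0]);
        for (std::size_t c = 1; c < centroids.size(); ++c) {
            const double d = squaredDistance(data[i], centroids[c]);
            if (d < best) {
                best = d;
                assignments[i] = c;
            }
        }
        objective += best;
    }
    return objective;
}

// Lloyd's algorithm: alternate assignment and centroid update steps.
inline KMeansResult lloyd(const std::vector<Point>& data,
                          std::vector<Point> centroids,
                          std::size_t maxIterations,
                          double accuracyThreshold) {
    assert(!centroids.empty());
    KMeansResult result;
    double previous = std::numeric_limits<double>::max();
    while (result.nIterations < maxIterations) {
        const double objective = assignPoints(data, centroids, result.assignments);

        // Update step: move each non-empty cluster's centroid to the mean of its points.
        const std::size_t dim = centroids[0].size();
        std::vector<Point> sums(centroids.size(), Point(dim, 0.0));
        std::vector<std::size_t> counts(centroids.size(), 0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            const std::size_t c = result.assignments[i];
            for (std::size_t j = 0; j < dim; ++j) sums[c][j] += data[i][j];
            ++counts[c];
        }
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            if (counts[c] == 0) continue;  // keep an empty cluster's centroid unchanged
            for (std::size_t j = 0; j < dim; ++j) centroids[c][j] = sums[c][j] / counts[c];
        }

        ++result.nIterations;
        // Assumed stopping rule: the objective improved by no more than the threshold.
        if (previous - objective <= accuracyThreshold) break;
        previous = objective;
    }
    // Final pass: assignments and objective evaluated against the final centroids.
    result.objectiveFunction = assignPoints(data, centroids, result.assignments);
    result.centroids = centroids;
    return result;
}
```

With the one-dimensional points 0, 2, 10, and 12 and initial centroids 0 and 10, this sketch settles on centroids 1 and 11 with an objective value of 4. Inside the loop the objective is measured before the centroids move, which is one plausible reading of why a last-iteration value can be inexact; the final pass recomputes it against the final centroids.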

NOTE:

You can skip the update of `centroids` and `objectiveFunction` in the result and compute assignments using the original `inputCentroids`. To do this, set the `resultsToEvaluate` flag to `computeAssignments` only and set `maxIterations` to zero.
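
Conceptually, the assignment-only mode in this note reduces to a nearest-centroid lookup against fixed centroids. The sketch below shows that idea in standalone C++. It is not the library's API: the function name `assignToFixedCentroids` and the `Row` alias are hypothetical, and plain nested vectors stand in for numeric tables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Squared Euclidean distance between two rows of equal length.
inline double squaredDistance(const Row& a, const Row& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double diff = a[j] - b[j];
        sum += diff * diff;
    }
    return sum;
}

// Returns, for each observation, the index of its nearest centroid.
// The centroids are only read, never updated, mirroring a run with zero iterations.
inline std::vector<std::size_t> assignToFixedCentroids(const std::vector<Row>& data,
                                                       const std::vector<Row>& centroids) {
    assert(!centroids.empty());
    std::vector<std::size_t> assignments(data.size(), 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        double best = squaredDistance(data[i], centroids[0]);
        for (std::size_t c = 1; c < centroids.size(); ++c) {
            const double d = squaredDistance(data[i], centroids[c]);
            if (d < best) {
                best = d;
                assignments[i] = c;
            }
        }
    }
    return assignments;
}
```

For example, with centroids (1, 1) and (8, 8), the observations (0, 0), (5, 5), and (9, 9) receive the cluster indices 0, 1, and 1.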
