Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 12/16/2022
Public

p?gemm

Computes a scalar-matrix-matrix product and adds the result to a scalar-matrix product for distributed matrices.

Syntax

void psgemm (const char *transa , const char *transb , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , const float *alpha , const float *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const float *b , const MKL_INT *ib , const MKL_INT *jb , const MKL_INT *descb , const float *beta , float *c , const MKL_INT *ic , const MKL_INT *jc , const MKL_INT *descc );

void pdgemm (const char *transa , const char *transb , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , const double *alpha , const double *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const double *b , const MKL_INT *ib , const MKL_INT *jb , const MKL_INT *descb , const double *beta , double *c , const MKL_INT *ic , const MKL_INT *jc , const MKL_INT *descc );

void pcgemm (const char *transa , const char *transb , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , const MKL_Complex8 *alpha , const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const MKL_Complex8 *b , const MKL_INT *ib , const MKL_INT *jb , const MKL_INT *descb , const MKL_Complex8 *beta , MKL_Complex8 *c , const MKL_INT *ic , const MKL_INT *jc , const MKL_INT *descc );

void pzgemm (const char *transa , const char *transb , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , const MKL_Complex16 *alpha , const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const MKL_Complex16 *b , const MKL_INT *ib , const MKL_INT *jb , const MKL_INT *descb , const MKL_Complex16 *beta , MKL_Complex16 *c , const MKL_INT *ic , const MKL_INT *jc , const MKL_INT *descc );

Include Files
  • mkl_pblas.h
Description

The p?gemm routines perform a matrix-matrix operation with general distributed matrices. The operation is defined as

sub(C) := alpha*op(sub(A))*op(sub(B)) + beta*sub(C),

where:

op(x) is one of op(x) = x, or op(x) = x',

alpha and beta are scalars,

sub(A)=A(ia:ia+m-1, ja:ja+k-1), sub(B)=B(ib:ib+k-1, jb:jb+n-1), and sub(C)=C(ic:ic+m-1, jc:jc+n-1), are distributed matrices.
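As a minimal sketch of the operation defined above, the following serial C function applies the same update to matrices stored whole, in column-major order, on a single process. It is not part of oneMKL: ref_gemm_sub and elem are hypothetical names, only real double-precision data is handled, and the leading dimensions lda, ldb, and ldc are assumptions that stand in for the layout information p?gemm reads from the array descriptors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element (i, j), 1-based, of a column-major
   matrix with leading dimension ld. */
static double elem(const double *x, int ld, int i, int j)
{
    return x[(size_t)(j - 1) * ld + (i - 1)];
}

/* Serial reference for
       sub(C) := alpha*op(sub(A))*op(sub(B)) + beta*sub(C).
   Not an MKL routine; real data only, so 'T' and 'C' both mean transpose. */
void ref_gemm_sub(char transa, char transb, int m, int n, int k,
                  double alpha, const double *a, int ia, int ja, int lda,
                  const double *b, int ib, int jb, int ldb,
                  double beta, double *c, int ic, int jc, int ldc)
{
    int ta = (transa != 'N' && transa != 'n');
    int tb = (transb != 'N' && transb != 'n');
    assert(m >= 0 && n >= 0 && k >= 0);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            if (alpha != 0.0) {          /* a and b are not read when alpha == 0 */
                for (int l = 0; l < k; ++l) {
                    /* op(sub(A))(i,l) and op(sub(B))(l,j) in global indices */
                    double av = ta ? elem(a, lda, ia + l, ja + i)
                                   : elem(a, lda, ia + i, ja + l);
                    double bv = tb ? elem(b, ldb, ib + j, jb + l)
                                   : elem(b, ldb, ib + l, jb + j);
                    sum += av * bv;
                }
            }
            double *cij = &c[(size_t)(jc - 1 + j) * ldc + (ic - 1 + i)];
            /* sub(C) is not read when beta == 0 */
            *cij = alpha * sum + (beta == 0.0 ? 0.0 : beta * *cij);
        }
    }
}
```

A reference of this kind can be useful for checking small distributed results after gathering them onto one process; it makes no attempt to match the performance or the rounding behavior of the library routine.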

Input Parameters
transa

(global) Specifies the form of op(sub(A)) used in the matrix multiplication:

if transa = 'N' or 'n', then op(sub(A)) = sub(A);

if transa = 'T' or 't', then op(sub(A)) = sub(A)';

if transa = 'C' or 'c', then op(sub(A)) = sub(A)' for the real flavors and conjg(sub(A)') (the conjugate transpose) for the complex flavors.

transb

(global) Specifies the form of op(sub(B)) used in the matrix multiplication:

if transb = 'N' or 'n', then op(sub(B)) = sub(B);

if transb = 'T' or 't', then op(sub(B)) = sub(B)';

if transb = 'C' or 'c', then op(sub(B)) = sub(B)' for the real flavors and conjg(sub(B)') (the conjugate transpose) for the complex flavors.

m

(global) Specifies the number of rows of the distributed matrices op(sub(A)) and sub(C), m ≥ 0.

n

(global) Specifies the number of columns of the distributed matrices op(sub(B)) and sub(C), n ≥ 0.

The value of n must be at least zero.

k

(global) Specifies the number of columns of the distributed matrix op(sub(A)) and the number of rows of the distributed matrix op(sub(B)).

The value of k must be greater than or equal to 0.

alpha

(global)

Specifies the scalar alpha.

When alpha is equal to zero, the local entries of the arrays a and b that correspond to the entries of the submatrices sub(A) and sub(B), respectively, need not be set on input.

a

(local)

Array, size lld_a by kla, where kla is LOCq(ja+k-1) when transa = 'N' or 'n', and is LOCq(ja+m-1) otherwise. Before entry, this array must contain the local pieces of the distributed matrix sub(A).

ia, ja

(global) The row and column indices in the distributed matrix A indicating the first row and the first column of the submatrix sub(A), respectively.

desca

(global and local) Array of dimension 9. The array descriptor of the distributed matrix A.
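To illustrate what a 9-element array descriptor holds, here is a minimal sketch that fills one by hand, assuming the usual ScaLAPACK layout for a dense block-cyclic matrix (the layout that the ScaLAPACK routine descinit produces). The names fill_desc, idx_t, and DESC_LEN are hypothetical; idx_t is a stand-in for MKL_INT, which is a wider type when the ILP64 interface is used.

```c
#include <assert.h>

typedef int idx_t;            /* assumption: stand-in for MKL_INT (LP64) */

enum { DESC_LEN = 9 };

/* Hypothetical helper: stores the descriptor fields in the order assumed
   here (DTYPE, CTXT, M, N, MB, NB, RSRC, CSRC, LLD), using 0-based C
   indices for the 1-based positions in the ScaLAPACK documentation. */
void fill_desc(idx_t desc[DESC_LEN], idx_t m, idx_t n, idx_t mb, idx_t nb,
               idx_t rsrc, idx_t csrc, idx_t ctxt, idx_t lld)
{
    assert(m >= 0 && n >= 0 && mb >= 1 && nb >= 1 && lld >= 1);
    desc[0] = 1;     /* descriptor type: 1 denotes a dense matrix        */
    desc[1] = ctxt;  /* BLACS context handle                             */
    desc[2] = m;     /* global number of rows                            */
    desc[3] = n;     /* global number of columns                         */
    desc[4] = mb;    /* row blocking factor                              */
    desc[5] = nb;    /* column blocking factor                           */
    desc[6] = rsrc;  /* process row holding the first row of the matrix  */
    desc[7] = csrc;  /* process column holding the first column          */
    desc[8] = lld;   /* leading dimension of the local array             */
}
```

In a real program the descriptor would normally be set by descinit, which also validates the arguments against the process grid; the sketch only shows which value ends up in which slot.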

b

(local)

Array, size lld_b by klb, where klb is LOCq(jb+n-1) when transb = 'N' or 'n', and is LOCq(jb+k-1) otherwise. Before entry, this array must contain the local pieces of the distributed matrix sub(B).
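The local column counts in the array sizes above (the LOCq(...) quantities) can be sketched as follows, assuming LOCq(x) follows the usual ScaLAPACK convention of the number of columns, out of the first x global columns, that the calling process column owns under a block-cyclic distribution. local_count re-implements the logic of the ScaLAPACK tool routine NUMROC for illustration, and local_cols_a is a hypothetical wrapper; nb_a, mycol, csrc_a, and npcol are assumed to come from the descriptor and the process grid.

```c
#include <assert.h>

/* Hypothetical re-implementation of the NUMROC logic: how many of n items,
   dealt out in blocks of nb over nprocs processes starting at process
   isrcproc, land on process iproc. */
int local_count(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb >= 1 && nprocs >= 1);
    assert(iproc >= 0 && iproc < nprocs);
    assert(isrcproc >= 0 && isrcproc < nprocs);

    int mydist  = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                               /* number of full blocks */
    int count   = (nblocks / nprocs) * nb;              /* evenly shared blocks  */
    int extra   = nblocks % nprocs;                     /* leftover full blocks  */

    if (mydist < extra)
        count += nb;        /* this process gets one extra full block */
    else if (mydist == extra)
        count += n % nb;    /* this process gets the trailing partial block */
    return count;
}

/* Hypothetical wrapper: kla for the local array a, following the size rule
   in the description of a. */
int local_cols_a(char transa, int ja, int m, int k,
                 int nb_a, int mycol, int csrc_a, int npcol)
{
    int last = (transa == 'N' || transa == 'n') ? ja + k - 1 : ja + m - 1;
    return local_count(last, nb_a, mycol, csrc_a, npcol);
}
```

The same helper applies to klb and to the column count of the local array c by substituting the corresponding index and blocking factor. Production code would call NUMROC itself rather than a copy of its logic.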

ib, jb

(global) The row and column indices in the distributed matrix B indicating the first row and the first column of the submatrix sub(B), respectively.

descb

(global and local) Array of dimension 9. The array descriptor of the distributed matrix B.

beta

(global)

Specifies the scalar beta.

When beta is equal to zero, sub(C) need not be set on input.

c

(local)

Array, size lld_c by LOCq(jc+n-1). Before entry, this array must contain the local pieces of the distributed matrix sub(C).

ic, jc

(global) The row and column indices in the distributed matrix C indicating the first row and the first column of the submatrix sub(C), respectively.

descc

(global and local) Array of dimension 9. The array descriptor of the distributed matrix C.

Output Parameters
c

Overwritten by the m-by-n distributed matrix alpha*op(sub(A))*op(sub(B)) + beta*sub(C).