Developer Guide and Reference

ID 767251
Date 10/31/2024
Public
TOKENIZE

Pure Intrinsic Subroutine (Generic): Parses a string into tokens.

CALL TOKENIZE (string, set, tokens [, separator])

-or-

CALL TOKENIZE (string, set, first, last)

string

(Input) Must be a scalar of type character.

set

(Input) Must be a scalar of type character with the same kind type parameter as string.

tokens

(Output) Must be an allocatable, rank-one array of type character with deferred length and the same kind type parameter as string. It cannot be a coarray or a coindexed object.

separator

(Output; optional) Must be an allocatable, rank-one array of type character with deferred length and the same kind type parameter as string. It cannot be a coarray or a coindexed object.

first

(Output) Must be an allocatable, rank-one array of type integer. It cannot be a coarray or a coindexed object.

last

(Output) Must be an allocatable, rank-one array of type integer. It cannot be a coarray or a coindexed object.

The characters in set are token delimiters. A token is any sequence of zero or more characters in string delimited by the beginning or end of string, or by any token delimiter in set. Two consecutive token delimiters in string, or a token delimiter in the first or last character position of string, indicate a zero-length token.

Upon completion of a call to TOKENIZE (string, set, tokens [, separator]), tokens is allocated with a lower bound of one, a size equal to the number of tokens in string, and a character length equal to the length of the longest token in string. If separator is present, it is allocated with a lower bound of one and a size equal to one less than the number of tokens in string; the character length of its elements is one. separator(i) is equal to the ith token delimiter in string. No element of separator represents the beginning or end of string.

Upon completion of a call to TOKENIZE (string, set, first, last), first and last are each allocated with a lower bound of one and a size equal to the number of tokens in string. first(i) contains the starting position in string of the ith token found. Similarly, last(i) contains the position in string of the last character of the ith token found. If a token has zero length, its starting position is one if it is the first token in string; otherwise, it is one greater than the position of the previous delimiter. The ending position of a zero-length token is one less than its starting position.
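The position rules above can be sketched in portable Fortran without relying on the intrinsic itself. The module and subroutine names below (tokenize_sketch, tokenize_positions) are hypothetical and are not part of any compiler or library; this is only an illustration of the described semantics under the assumption that every delimiter character separates exactly two tokens, not a description of how TOKENIZE is implemented.

```fortran
module tokenize_sketch
  implicit none
contains
  ! Hypothetical helper (not a library routine): mimics the
  ! TOKENIZE (string, set, first, last) form using the SCAN intrinsic.
  pure subroutine tokenize_positions(string, set, first, last)
    character(len=*), intent(in) :: string, set
    integer, allocatable, intent(out) :: first(:), last(:)
    integer :: i, n, start

    ! Each delimiter ends one token and begins the next, so the
    ! number of tokens is the number of delimiters plus one.
    n = 1
    do i = 1, len(string)
      if (scan(string(i:i), set) > 0) n = n + 1
    end do
    allocate(first(n), last(n))

    n = 1
    start = 1
    do i = 1, len(string)
      if (scan(string(i:i), set) > 0) then
        first(n) = start
        last(n) = i - 1       ! equals start - 1 for a zero-length token
        n = n + 1
        start = i + 1         ! next token begins after this delimiter
      end if
    end do

    ! The final token runs to the end of string.
    first(n) = start
    last(n) = len(string)
  end subroutine tokenize_positions
end module tokenize_sketch
```

Calling tokenize_positions('a,,b', ',', first, last) from a program that uses this module gives first = [1, 3, 4] and last = [1, 2, 4]; the second token has zero length, so its ending position is one less than its starting position.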

Example

Consider the following:

CHARACTER(LEN=:),ALLOCATABLE,DIMENSION(:) :: tokens, separators
CHARACTER(LEN=2) :: delims = ',&'
CHARACTER(LEN=:),ALLOCATABLE :: herbs
INTEGER,ALLOCATABLE,DIMENSION(:)  :: begins, ends
herbs  = 'parsley,sage,rosemary,&thyme'
CALL TOKENIZE (herbs, delims, tokens, separators)
CALL TOKENIZE (herbs, delims, begins, ends)

After the first call to TOKENIZE is executed, tokens has the value:

        ['parsley ', 'sage    ', 'rosemary', '        ', 'thyme   ']

and separators has the value:

        [',', ',', ',', '&']

After the second call to TOKENIZE, begins has the value:

        [1, 9, 14, 23, 24]

and ends has the value:

        [7, 12, 21, 22, 28]
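The two forms describe the same tokens, so the tokens and separators values shown above can be recovered from begins and ends. The following is a minimal sketch of that relationship; the module and subroutine names (token_arrays_sketch, build_tokens) are hypothetical, and the code assumes first and last were produced for the same string and contain at least one token.

```fortran
module token_arrays_sketch
  implicit none
contains
  ! Hypothetical helper (not a library routine): rebuilds the tokens
  ! and separator arrays from token start and end positions.
  pure subroutine build_tokens(string, first, last, tokens, separator)
    character(len=*), intent(in) :: string
    integer, intent(in) :: first(:), last(:)
    character(len=:), allocatable, intent(out) :: tokens(:), separator(:)
    integer :: i, n

    n = size(first)
    ! Element length is the length of the longest token; shorter
    ! tokens are blank padded on assignment.
    allocate(character(len=maxval(last - first + 1)) :: tokens(n))
    ! One separator between each pair of adjacent tokens.
    allocate(character(len=1) :: separator(n - 1))

    do i = 1, n
      tokens(i) = string(first(i):last(i))  ! zero-length when last < first
      ! The delimiter sits immediately after the end of the ith token.
      if (i < n) separator(i) = string(last(i) + 1:last(i) + 1)
    end do
  end subroutine build_tokens
end module token_arrays_sketch
```

Passing herbs, begins, and ends from the example to build_tokens yields five tokens of character length 8 and four separators, matching the values listed above.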
