TOKENIZE

Developer Guide and Reference

ID 767251

Date 10/31/2024

Pure Intrinsic Subroutine (Generic): Parses a string into tokens.

CALL TOKENIZE (string, set, tokens [, separator])

-or-

CALL TOKENIZE (string, set, first, last)

string	(Input) Must be a scalar of type character.
set	(Input) Must be a scalar of type character with the same kind type parameter as `string`.
tokens	(Output) Must be an allocatable, rank-one array of type character with deferred length and the same kind type parameter as `string`. It cannot be a coarray or a coindexed object.
separator	(Output; optional) Must be an allocatable, rank-one array of type character with deferred length and the same kind type parameter as `string`. It cannot be a coarray or a coindexed object.
first	(Output) Must be an allocatable, rank-one array of type integer. It cannot be a coarray or a coindexed object.
last	(Output) Must be an allocatable, rank-one array of type integer. It cannot be a coarray or a coindexed object.

The characters in set are token delimiters. A token is any sequence of zero or more characters in string delimited by the beginning or end of string, or by any token delimiter in set. Two consecutive token delimiters in string, or a token delimiter in the first or last character position of string, indicate a zero-length token.

Upon completion of a call to TOKENIZE (string, set, tokens [, separator]), tokens is allocated with a lower bound of one, a size equal to the number of tokens in string, and a character length equal to the length of the longest token in string. Shorter tokens are padded on the right with blanks. If separator is present, it is allocated with a lower bound of one and a size equal to one less than the number of tokens in string; the character length of its elements is one. separator(i) is equal to the ith token delimiter in string. No element of separator marks the beginning or end of string.
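The allocation rules above can be checked with a small program. This is a minimal sketch, assuming a compiler that implements the Fortran 2023 TOKENIZE intrinsic; the program name, the input string, and the delimiter set are made up for illustration.

```fortran
PROGRAM tokenize_tokens_sketch
  IMPLICIT NONE
  ! Deferred-length, allocatable, rank-one arrays, as the argument rules require
  CHARACTER(LEN=:), ALLOCATABLE :: tokens(:), separator(:)
  INTEGER :: i

  ! Illustrative input: '=' and ';' are both token delimiters
  CALL TOKENIZE ('x=10;y=2', '=;', tokens, separator)

  ! Four tokens (x, 10, y, 2); the longest has length 2; three delimiters
  PRINT '(A,I0)', 'number of tokens:     ', SIZE(tokens)
  PRINT '(A,I0)', 'length of each token: ', LEN(tokens)
  PRINT '(A,I0)', 'number of separators: ', SIZE(separator)

  DO i = 1, SIZE(tokens)
    ! TRIM removes the blank padding added to shorter tokens
    PRINT '(3A)', '[', TRIM(tokens(i)), ']'
  END DO
END PROGRAM tokenize_tokens_sketch
```

Note that TRIM also removes trailing blanks that were part of the original token, so this form is not suitable when trailing blanks are significant.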

Upon completion of a call to TOKENIZE (string, set, first, last), first and last are each allocated with a lower bound of one and a size equal to the number of tokens in string. first(i) contains the starting position in string of the ith token found. Similarly, last(i) contains the position in string of the last character of the ith token found. If a token has zero length, its starting position is one if it is the first token in string; otherwise, it is one greater than the position of the previous delimiter. The ending position of a zero-length token is one less than its starting position.
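The positions returned by this form can be used to take substrings of string directly, which avoids blank padding and handles zero-length tokens naturally. The following is a minimal sketch, assuming a compiler that implements the Fortran 2023 TOKENIZE intrinsic; the program name and the sample input are made up for illustration.

```fortran
PROGRAM tokenize_positions_sketch
  IMPLICIT NONE
  CHARACTER(LEN=:), ALLOCATABLE :: line
  INTEGER, ALLOCATABLE :: first(:), last(:)
  INTEGER :: i

  ! Illustrative input: two adjacent delimiters and a trailing delimiter
  line = 'a,,bc;'
  CALL TOKENIZE (line, ',;', first, last)

  DO i = 1, SIZE(first)
    ! For a zero-length token last(i) == first(i) - 1,
    ! so line(first(i):last(i)) is a zero-length substring
    PRINT '(I0,": [",A,"] first=",I0," last=",I0)', &
          i, line(first(i):last(i)), first(i), last(i)
  END DO
END PROGRAM tokenize_positions_sketch
```

Following the rules above, this input has four tokens: 'a' at positions 1 to 1, a zero-length token with first = 3 and last = 2, 'bc' at positions 4 to 5, and a final zero-length token with first = 7 and last = 6.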

Example

Consider the following:

CHARACTER(LEN=:),ALLOCATABLE,DIMENSION(:) :: tokens, separators
CHARACTER(LEN=2) :: delims = ',&'
CHARACTER(LEN=:),ALLOCATABLE :: herbs
INTEGER,ALLOCATABLE,DIMENSION(:) :: begins, ends
herbs = 'parsley,sage,rosemary,&thyme'
! First form: return the tokens and the delimiters that separate them
CALL TOKENIZE (herbs, delims, tokens, separators)
! Second form: return the start and end position of each token
CALL TOKENIZE (herbs, delims, begins, ends)

After the first call to TOKENIZE is executed, tokens has the value:

        ['parsley ', 'sage    ', 'rosemary', '        ', 'thyme   ']

and separators has the value:

        [',', ',', ',', '&']

After the second call to TOKENIZE, begins has the value:

        [1, 9, 14, 23, 24]

and ends has the value:

        [7, 12, 21, 22, 28]
