Parse a string using multiple delimiters


#1

Hello and greetings,

I’m new here and have started migrating some Python code to Julia, but I’ve hit a dead end. Sorry for posting Python in this forum, but I’m unsure about the best (fastest) way to do the same thing in Julia.

What the functions do:
parsertoken("_My input.string", " _,.", 2) returns "input" (positions are 1-based).
parsercount("Julia=-rocks!", " =-") returns 2.

def parsertoken(istring, idelimiters, iposition):
    """
    Return a specific token of a given input string,
    considering its position and the provided delimiters

    :param istring: raw input string
    :param idelimiters: delimiters to split the tokens
    :param iposition: 1-based position of the token
    :return: token
    """
    # Replace every delimiter with a space, then split on whitespace
    vlist = ''.join([s if s not in idelimiters else ' ' for s in istring]).split()
    # Positions are 1-based, so position 2 is vlist[1]
    return vlist[iposition - 1]

def parsercount(istring, idelimiters):
    """
    Return the number of tokens in the input string,
    considering the delimiters provided

    :param istring: raw input string
    :param idelimiters: delimiters to split the tokens
    :return: number of tokens found
    """
    # Replace every delimiter with a space, then split on whitespace
    vlist = ''.join([s if s not in idelimiters else ' ' for s in istring]).split()
    return len(vlist)

Since I really care about speed, I’m thinking of changing this API in my Julia implementation, mainly because getting multiple tokens from one string currently means splitting the string again for every token.
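
To make the idea concrete, here is a minimal sketch of a split-once API, written in Python so it can be compared with the functions above. The name `tokenize` is hypothetical and not part of my existing code; it keeps the same approach (map each delimiter to a space, then split on whitespace) but returns the whole token list so the caller can index it as often as needed.

```python
def tokenize(istring, idelimiters):
    """Split istring once and return all tokens (hypothetical helper)."""
    # Map every delimiter character to a space in a single pass
    table = str.maketrans({d: " " for d in idelimiters})
    # split() with no argument splits on whitespace and drops empty tokens
    return istring.translate(table).split()


tokens = tokenize("_My input.string", " _,.")
second = tokens[1]    # 0-based index 1 is the second token
count = len(tokens)   # number of tokens, no second split needed
print(second, count)
```

With this shape, fetching several tokens costs one split instead of one split per token.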

Cheers


#2

Why not call

tokens = split("_My input.string", (' ','_',',','.'))

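and then pick whichever tokens you want from the resulting array? Note that the leading "_" produces an empty first token; on Julia 0.7 and later, passing keepempty=false to split drops it.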

If you need only a single token, e.g. the 3rd token, you could write a more efficient function to do that. It wouldn’t be too hard to adapt the Base.split function to pull out a specific token or set of tokens, since Base.split is only 30 lines of code.
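
As a rough illustration of the single-token idea, here is a sketch in Python, matching the code in post #1; it is not an adaptation of Base.split itself. The name `nth_token` is hypothetical, positions are assumed to be 1-based as in the example from #1, and only the characters in `idelimiters` are treated as separators. It scans the string once and returns as soon as the requested token ends, without building the full token list.

```python
def nth_token(istring, idelimiters, iposition):
    """Return the iposition-th token (1-based) without building a token list."""
    delims = set(idelimiters)
    count = 0     # number of tokens started so far
    start = None  # start index of the token being scanned, or None
    for i, ch in enumerate(istring):
        if ch in delims:
            if start is not None:
                # A token just ended; return it if it is the one requested
                if count == iposition:
                    return istring[start:i]
                start = None
        elif start is None:
            # First character of a new token
            start = i
            count += 1
    # The requested token may run to the end of the string
    if start is not None and count == iposition:
        return istring[start:]
    raise IndexError("token position out of range")


print(nth_token("_My input.string", " _,.", 2))
```

The same loop structure should carry over to Julia, where returning a SubString would avoid copying the token.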