Julia Row-wise operation of DataFrames

    function substring_(x::String,i::Int32)
        if i==1
            return string(x[1])
        elseif i==2
            return x[2:end]
        else
            return "NA"
        end
    end

I would like to apply the function above to a column of a DataFrame,
i.e. substring_.(train[:col],2).
But it returns an array of single characters, e.g. ['a','b','c',…], instead of ["apo","bksk","cssa",…].

In Python, there is an apply + lambda idiom that does what I want. How can this be done in Julia? Thanks.

Make sure you are actually calling the function you pasted (you might have changed the definition of substring_ and not reloaded it), because:

  • the method you show never returns Char values: string(x[1]) and x[2:end] both produce strings;
  • the method will not be called by substring_.(train[:col],2) unless you use 32-bit Julia, since on 64-bit Julia the literal 2 is an Int64 and does not match i::Int32.
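The second point can be checked directly. This is a minimal sketch, assuming a 64-bit Julia session (where integer literals such as 2 are Int64); takes_int32 and throws_method_error are hypothetical helper names used only for illustration:

```julia
# Hypothetical stand-in with the same signature shape as the original substring_
takes_int32(x::String, i::Int32) = i

# Hypothetical helper: returns true when calling f(args...) raises a MethodError
function throws_method_error(f, args...)
    try
        f(args...)
        return false
    catch err
        return err isa MethodError
    end
end

# On 64-bit Julia the literal 2 is an Int64, so no method matches
literal_fails = throws_method_error(takes_int32, "abc", 2)

# Converting explicitly to Int32 makes the call match
explicit_ok = takes_int32("abc", Int32(2)) == 2
```

On 32-bit Julia the literal 2 is already an Int32, so the first call would succeed there.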

As for the function itself, I would recommend rewriting it as:

    function substring_(x::String, i::Int) # Int instead of Int32: use the default integer type on your machine
        if i == 1
            return first(x, 1) # a string containing the first character of x
        elseif i == 2
            # a safe way to remove the first character from x;
            # your code would fail on non-ASCII characters
            return chop(x, head=1, tail=0)
        else
            return "NA"
        end
    end
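To illustrate the non-ASCII remark, here is a small sketch using a made-up sample string whose first character takes two bytes in UTF-8. It compares byte-based slicing with first and chop:

```julia
s = "αbc"  # 'α' occupies two bytes in UTF-8, so byte index 2 is not a valid character start

# Byte-based slicing from index 2 throws a StringIndexError
slicing_failed = try
    s[2:end]
    false
catch err
    err isa StringIndexError
end

first_char = first(s, 1)         # a String holding the first character
rest = chop(s, head=1, tail=0)   # everything after the first character
```

Note that chop returns a SubString rather than a String; wrap the call in String(...) if a plain String is needed.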

Then:

    substring_.(train.col, 2) # note that in DataFrames.jl you can access columns by name as a property

will work.
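As a rough sketch of how this relates to Python's apply + lambda, the example below uses a plain vector named col as a stand-in for train.col (an assumption, so that it runs without DataFrames.jl) and compares broadcasting with map over an anonymous function:

```julia
# Same definition as recommended above, repeated so the example runs on its own
function substring_(x::String, i::Int)
    if i == 1
        return first(x, 1)
    elseif i == 2
        return chop(x, head=1, tail=0)
    else
        return "NA"
    end
end

# Stand-in for a DataFrame column of strings (assumed sample data)
col = ["apo", "bksk", "cssa"]

# Broadcasting: the dot applies substring_ to each element, reusing 2 for every call
broadcasted = substring_.(col, 2)

# The closest analogue of apply + lambda: map with an anonymous function
mapped = map(x -> substring_(x, 2), col)
```

Both forms produce one result per element; broadcasting is usually the shorter way to write it.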

Also, the standard way to indicate a missing value in Julia is missing, so you might consider returning it instead of "NA". "NA" might be fine for your purposes, so I did not change it in the code. Note, however, that substring_("aNA", 2) also returns "NA", and then you cannot distinguish the two cases.
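A possible variant that returns missing could look like the sketch below; substring_or_missing is a hypothetical name chosen here to keep it separate from the function above:

```julia
# Hypothetical variant of substring_ that signals "no result" with missing
function substring_or_missing(x::String, i::Int)
    if i == 1
        return first(x, 1)
    elseif i == 2
        return chop(x, head=1, tail=0)
    else
        return missing
    end
end

from_data = substring_or_missing("aNA", 2)  # real data that happens to spell "NA"
no_result = substring_or_missing("aNA", 3)  # unsupported i, so missing is returned

# ismissing tells the two cases apart, which a plain "NA" string cannot
is_data = !ismissing(from_data)
is_gap = ismissing(no_result)
```

When broadcast over a column, the missing values can later be dropped with skipmissing or tested with ismissing.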

If any further explanation of the above would be helpful, please let me know.

Thanks for your useful advice. I think my function was not reloaded properly. Your suggestions are helping me improve my Julia programming skills.