Safe base64decode

I’d like to be able to base64decode a string without any risk of an ArgumentError("malformed base64 sequence") (see here).

To that end, I can think of two main options:

  1. Make a safe version of the base64decode function
  2. Make a function that tests the validity of a string as a base64-encoded string

I know I can just wrap it in a try-catch block, but I’m trying to avoid that. I want to be able to safely decode the string or simply test its validity before decoding it.

How is throwing an exception “unsafe”? What’s wrong with a try block?

That being said, a non-allocating Base64.isvalid function seems like it could be a nice addition.

There is a StackOverflow answer giving some regex solutions. This seems like the easiest approach.
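For illustration, here is a minimal sketch of what such a regex check could look like in Julia. The name `isbase64_regex` and the constant `BASE64_RE` are hypothetical, and the pattern is the commonly cited one for canonical RFC 4648 base64 (standard alphabet, mandatory padding), not necessarily the one from the linked answer. Julia's decoder may accept or reject inputs differently from this pattern (for example, input containing line breaks), so it is a check for canonical form rather than a guarantee about whether base64decode throws.

```julia
# Hypothetical helper: matches canonical base64 only
# (groups of 4 alphabet characters, optionally ending in "xx==" or "xxx=").
const BASE64_RE = r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?\z"

# `\z` anchors at the very end of the string, so a trailing newline is rejected.
isbase64_regex(s::AbstractString) = occursin(BASE64_RE, s)

isbase64_regex("aGVsbG8=")  # true: padded encoding of "hello"
isbase64_regex("aGVsbG8")   # false: padding is missing
```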

In analogy to tryparse, a trybase64decode function that returns nothing on invalid input would be good.

Would the following work for you?

using Base64

# Returns the decoded bytes, or `nothing` if decoding throws.
function trydecode(args...)
    try
        base64decode(args...)
    catch
        nothing
    end
end

Nothing fundamentally wrong, or “unsafe”. But in code where failure is suboptimal (servers), I’d hope there would be a deterministic way to ascertain if something is base64-decodable. Perhaps there isn’t a way to answer that other than trying to decode it and catching any eventual errors?

Will this reasonably cover all the corner-cases? If so, then that is indeed easiest.

This seems ideal to me.

That is what I’m doing now, more or less.

I’m skeptical.

The reason for tryparse, in my view, is that parsing basic objects like Float64 is a relatively cheap operation that often occurs in performance-critical inner loops, so the overhead of a try … catch block is potentially significant.
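As a small sketch of the kind of inner loop meant here, the following uses Base's `tryparse`, which returns `nothing` on bad input instead of throwing. The function name `sum_valid` is made up for the example.

```julia
# Sum the tokens that parse as Float64, skipping the ones that do not.
function sum_valid(tokens)
    total = 0.0
    for t in tokens
        x = tryparse(Float64, t)   # `nothing` on failure, no exception involved
        x === nothing && continue
        total += x
    end
    return total
end

sum_valid(["1.5", "oops", "2.5"])  # 4.0
```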

In contrast, base64 decoding is normally a relatively expensive operation done on large-ish blocks of binary data, so the overhead of a try … catch block should be negligible for typical usage.

Nor does trybase64decode avoid memory allocation for the result, which in my mind is the main reason for an isvalidbase64 function.
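To make the allocation point concrete, here is a rough sketch of a check that only scans the input bytes and never builds a decoded buffer. Both `isvalidbase64` and `isbase64byte` are hypothetical names, not part of the Base64 standard library, and the rules are an assumption: canonical RFC 4648 form with the standard alphabet and mandatory padding, which may not match exactly what base64decode accepts.

```julia
# Hypothetical helper: true for bytes in the standard base64 alphabet.
isbase64byte(b::UInt8) =
    (UInt8('A') <= b <= UInt8('Z')) || (UInt8('a') <= b <= UInt8('z')) ||
    (UInt8('0') <= b <= UInt8('9')) || b == UInt8('+') || b == UInt8('/')

# Hypothetical helper: scans the string in place, without decoding it.
function isvalidbase64(s::AbstractString)
    units = codeunits(s)
    n = length(units)
    n % 4 == 0 || return false      # canonical form comes in groups of 4
    n == 0 && return true
    # Count trailing '=' padding (at most two characters).
    npad = 0
    while npad < 2 && units[n - npad] == UInt8('=')
        npad += 1
    end
    # Everything before the padding must be in the alphabet.
    for i in 1:(n - npad)
        isbase64byte(units[i]) || return false
    end
    return true
end

isvalidbase64("aGVsbG8=")  # true
isvalidbase64("aGVsbG8")   # false: length is not a multiple of 4
```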

s/would/could/ 🙂

The overhead of a try-catch block is small in the normal flow, but large when an exception does get thrown, though I forget how large… Maybe it’s still negligible compared to decoding.

Thank you all for the awesome feedback. I’ll sum this up since I feel this has come to a good conclusion:

There is no reason to create a validity check function that probably covers only 99.99% of the cases (@Aaron_Denney shared an excellent article about why one shouldn’t validate but parse instead: Parse, don’t validate). But there might be some value in a trybase64decode function with a return type of Union{Nothing, Vector{UInt8}}. The argument against that is that the added try-catch overhead is tiny compared to the time it takes to decode a string, and therefore there is no point in investing in the development of this trybase64decode function.
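For reference, a minimal sketch of such a function and of how a caller might consume its result. Nothing here exists in the Base64 standard library: `trybase64decode` is only the name proposed in this thread, and `decode_or_default` is a made-up caller. The sketch assumes that malformed input surfaces as an ArgumentError, as in the error quoted at the top, and lets any other exception propagate.

```julia
using Base64

# Hypothetical helper: decoded bytes on success, `nothing` on malformed input.
function trybase64decode(s::AbstractString)::Union{Nothing, Vector{UInt8}}
    try
        return base64decode(s)
    catch err
        # Swallow only the decoder's ArgumentError; rethrow anything else.
        err isa ArgumentError || rethrow()
        return nothing
    end
end

# Hypothetical caller: branches on `nothing` instead of using try/catch itself.
function decode_or_default(s::AbstractString, default::AbstractString = "")
    bytes = trybase64decode(s)
    return bytes === nothing ? String(default) : String(bytes)
end

decode_or_default("aGVsbG8=")        # "hello"
decode_or_default("a", "<invalid>")  # falls back when the input cannot be decoded
```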

As a side note, this highlights the difference between functions that succeed or throw an error and functions that succeed or return nothing. Is there some consensus on this programming-paradigm-question?

BTW, the GitHub discussion “Can we have result value convention for fast error handling?” (JuliaLang/julia, Discussion #43773) pushes for this paradigm (and also lists some packages that help with working with that paradigm). Not sure there’s really consensus that it’s the right approach for Base though.
