This thread has devolved a little bit, but here’s some technical feedback about IsURL.jl @anon37204545.
IsURL.jl currently uses the following regexes to check for “valid” URLs:
windowsregex = r"^[a-zA-Z]:[\\]"
urlregex = r"^[a-zA-Z][a-zA-Z\d+\-.]*:"
Neither of these is correct. Let’s take a look at why.
windowsregex is meant to check for valid “Windows paths”. Setting aside the fact that Julia already has ispath support by default (which works in a platform-agnostic way and actually checks filesystem properties), the given regex accepts a lot of strings that can never be paths. Take any string that starts with a drive letter and a backslash and contains a ? further on. It matches the regex but fails to meet the Microsoft specification on paths: ? is a forbidden character, and thus such a string cannot be a path. Moreover, on Windows there are a bunch of forbidden names (NUL, to name one) as well as other forbidden characters, all of which the given regex would allow.
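To make this concrete, here is a small sketch that runs the quoted windowsregex against a few strings that Windows would reject. The example strings are hypothetical and chosen only for illustration; they are not taken from IsURL.jl or its tests.

```julia
# Regex copied verbatim from the snippet quoted above.
windowsregex = r"^[a-zA-Z]:[\\]"

# Hypothetical inputs: none of these can be a real Windows path.
candidates = [
    "C:\\what?.txt",  # '?' is a forbidden character
    "C:\\NUL",        # NUL is a reserved device name
    "C:\\a<b>|c",     # '<', '>' and '|' are forbidden characters
]

for s in candidates
    # The regex only looks at the first three characters, so all of these match.
    println(rpad(s, 16), " matches: ", occursin(windowsregex, s))
end
```

Since the regex is anchored at the start and stops after the first backslash, nothing past the drive prefix is ever inspected.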
OK, let’s pretend we don’t care about (proper) Windows path support; people will only plug in correct paths anyway, right? Well, maybe, but drive letters aren’t the only way a volume can be identified. In fact, there are at least three ways, and none of the others match the regex.
“Normal” Windows paths have a length limit of 260 characters, so a path longer than that starting directly with C:\ is not allowed. To get around this limit (up to about 32 thousand characters), the path can be written in the Unicode-supporting path format, as in \\?\C:\, but this again doesn’t match the regex.
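As a rough illustration of the last two points, the sketch below checks the quoted windowsregex against a drive-letter path, an extended-length path, and a UNC share path. The concrete paths (user name, server, share) are made up for the example.

```julia
# Regex copied verbatim from the snippet quoted above.
windowsregex = r"^[a-zA-Z]:[\\]"

# raw"..." keeps the backslashes literal. All paths here are hypothetical.
paths = [
    raw"C:\Users\me\file.txt",      # drive letter
    raw"\\?\C:\Users\me\file.txt",  # extended-length prefix
    raw"\\server\share\file.txt",   # UNC share
]

for p in paths
    println(rpad(p, 30), " matches: ", occursin(windowsregex, p))
end
```

Only the first entry begins with a letter followed by `:\`, so only that one matches; the other two start with a backslash and are rejected even though they are legitimate ways to name a location on Windows.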
Enough about file paths, let’s look at urlregex. URLs are a long-standing accumulation of cruft and legacy, but they at least have a proper format and an RFC we can check against. In my day-to-day work, I often connect to servers directly via their IP, so allowing URLs to start with numbers is critically important to me. Your regex doesn’t allow that, but even the (admittedly not perfect) approach from HTTP.jl is more correct:
julia> URL("18.104.22.168/index.html")
HTTP.URI("22.214.171.124/index.html")
Yay, I can connect to my servers! You might like HTTP.jl, it’s a very nice package.
It looks like your “urlregex” doesn’t check for proper URLs at all, but simply matches the scheme part of a URL (and a trailing :, though that alone doesn’t match the RFC for paths). I wouldn’t rely on this to validate my URLs.
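A quick way to see this is to print what the quoted urlregex actually captures. The inputs below are invented for illustration, apart from the IP-based string, which is the one used earlier in this post.

```julia
# Regex copied verbatim from the snippet quoted above.
urlregex = r"^[a-zA-Z][a-zA-Z\d+\-.]*:"

inputs = [
    "http://example.com/index.html",  # a real URL
    "nonsense:",                      # scheme-like prefix and nothing else
    "a+b.c-d:::: not a url",          # garbage after a scheme-like prefix
    "18.104.22.168/index.html",       # starts with a digit
]

for s in inputs
    m = match(urlregex, s)
    # Show only the portion the regex consumed, or note that it did not match.
    captured = m === nothing ? "no match" : m.match
    println(rpad(s, 32), " => ", captured)
end
```

The first three inputs match, and in each case the captured text ends at the first colon, so everything after the scheme is never examined. The IP-based string does not match at all, because the regex requires a leading letter.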
All in all, please don’t be discouraged from creating new packages! But please also be aware that there might be existing packages that already handle these kinds of seemingly simple tasks very well. There’s no shame in asking here or in the JuliaLang Slack about specific functionality and modules; there are a lot of folks there willing to help new people out.