Transliterate.jl, IsURL.jl – useful tools

That proverb is also known as “re-inventing the wheel”.

Julia is not only used for complex simulations, where it is logical to use a package that provides you with plenty of tools. It can also be used in a REPL or scripting environment, which is supposed to be fast. Compiling a package with 1000 functions to obtain 1 of them is not my definition of fast.

2 Likes

Probably not the same needs yet the packages seem pretty cool :wink:

1 Like

Personally, I think these packages are too small. Why? Because it is unusual to want to use only one of them. Maybe isURL could be useful in HTTP.jl or another package (such as Gumbo or similar).

By the way, I recommend adding the examples from @anon94023334 to the tests in IsURL and documenting both packages (in general, I encourage documenting any new package).

3 Likes

This conversation makes me think it may be time for someone to create something like R Packages: a somewhat opinionated guide to package development, but for Julia.

7 Likes

In general, I am opposed to the idea of tiny single-function packages. I admit that I’ve published some packages like that myself, but only because they really didn’t fit into anything else. URL checker features seem like they would fit into HTTP. I would prefer that we don’t go down the path of NPM and huge numbers of tiny packages in general, as has been discussed many times before. Making a tiny package should only happen out of necessity, when there is no existing package it can fit into. Just my 2 cents.

Also, I wouldn’t describe the Julia ecosystem as young, it’s already existed for quite a few years and is nearly at v1.5, which is well past the beginning phase.

10 Likes

I’m a little surprised by the reactions here and the sentiment against small packages that only do one or two things in a couple dozen lines.

This is not at all the reaction I received when I announced any of my packages which were of a comparable size and scope:

I would have liked to be told these packages don’t belong on the general registry if that’s how people felt…

6 Likes

FWIW, I personally don’t think a targeted effort to port Node packages of this type is a very useful thing to work on, especially as someone new to Julia. In fact, I would argue (again, personal opinion) that a future where you add a package and get a large number of dependencies like IsUrl, Transcode.jl and packages like that would be a pretty bad thing for the Julia ecosystem.

4 Likes

Well, for me it is more about whether the functionality is going to be used in isolation, or whether people are usually going to use it together with another package (as with HTTP.jl and IsUrl.jl). I like ToggleableAsserts.jl, and for me a dedicated package makes a lot of sense there, because it is not related to any other package. Anyway, this is a personal opinion, because sometimes the line is not clear. For instance, for a service-type package I needed to extract POST parameters. I did not want a package as complete as Genie.jl, so I had to do it “by hand”. I am still considering either creating a package for that (an HTTParams.jl) or opening a PR to include it in HTTP.jl. The criterion is not clear, but while creating an HTTParams.jl could be nice, I think it makes more sense to include it in HTTP.jl (if it does not increase the complexity of its dependencies). What do you think?
EDIT: I have removed the code to be more concise.

Thank you for your work! I wouldn’t presume that Julians have any fewer use cases than users of e.g. Node.js. Not everyone does scientific/technical computing with Julia; I’m using e.g. Plotly/Dash.jl. I do see Transliterate being helpful, e.g. for NLP (ok, maybe “scientific” is a good term for that). I checked it and it is ok for my native Icelandic, and I’m not sure people would “reproduce in a few lines of code” something that makes sure all languages work.

I like how e.g. Icelandic Þ and Ð are supported, but I’m conflicted about whether it “works” for Turkish https://en.wikipedia.org/wiki/Dotted_and_dotless_I (surely the story on Wikipedia is not made up?):

Many cellphones available in Turkey (as of 2008) lacked a proper localization, which led to replacing ı by i in SMS, sometimes severely distorting the sense of a text. In one instance, a miscommunication played a role in the deaths of Emine and Ramazan Çalçoban in 2008.[5][6]

It’s not certain that people would use the other package together with HTTP.jl (your program could be a filter, for example; otherwise it’s not unlikely). That said, there is URIParser.jl, which seems to overlap with yours or make it redundant.

The “are you sure this is right?” comment shows that people will get such parsing wrong, which is an excellent argument that a package (which can be improved later), or at least a function, is needed somewhere.

Why?

I’m not taking a stand yet, except I wouldn’t want people to keep their code to themselves.

Did you object because of package startup overhead? If that’s the worry, then I think we need to handle it the way “slower” but more interactive languages such as Python/Perl/JavaScript do.

julia> @time using Transliterate
  0.006106 seconds (14.76 k allocations: 886.013 KiB)

julia> @time using IsURL
  0.002868 seconds (6.80 k allocations: 413.055 KiB)

julia> @time using HTTP
  0.977093 seconds (1.55 M allocations: 83.811 MiB, 2.00% gc time)

with lowered opt.:
  0.291570 seconds (306.50 k allocations: 18.365 MiB)

julia> @time using Gumbo: parsehtml  # doesn't seem faster choosing just one function
  0.354805 seconds (304.67 k allocations: 17.564 MiB)
2 Likes

I think the argument against lots of small packages is that having similar functionality under the same roof improves findability, consistency, and the likelihood of continued maintenance, and brings more users who find and report bugs. It also means that the import statements become a good summary of what the file is doing. If you use 30 single-function packages, you end up in the Java situation where the dependencies are a massive block that everyone ignores because there is too much noise relative to the useful information.

3 Likes

I really feel that the first package announcement post of a new community member is not the best place to have this conversation.

As it stands, we don’t have any rules or publicly visible conventions around what would be complicated enough to warrant a registered package. Doing this in the comments on @anon37204545’s announcement post, especially when this wasn’t done in anyone else’s simple function announcement posts, feels like it sends an unwelcoming message to a new contributor.

This should be hashed out first, before we blow up a new user’s inbox and make them feel unwelcome.

I don’t think any particular post here is an issue, but the sheer volume of very similar negative posts doesn’t seem to help the situation and I think could feel quite alienating.


Edit: I have significantly edited this post since several people responded to it to clarify my position and soften some overly dramatic / unhelpful parts.

19 Likes

I think the reason people engaged in this discussion is that, in a related topic, the OP is proposing a change in the registry approval process to mimic the way JavaScript handles tiny packages. Clearly, this author has a much greater interest in tiny packages than most other authors. I don’t believe anyone wants to discourage these packages from being published; instead, we wish to engage more deeply in the discussion of how new packages are registered, especially since @anon37204545 started the discussion about how other languages handle this situation.

5 Likes

Speaking for myself, the reason I engaged was because there was a request for feedback. I think it’s really uncharitable to criticize anyone who responded as contributing to an unwelcoming environment.

2 Likes

Yep, I really didn’t word my post the way I wish I had in hindsight. I’ll make some edits when I’m back at my computer.

I don’t think any of the posts here are a problem in isolation, including yours. What makes me uncomfortable is the overall feel of this thread when many many voices chime in saying that the packages shouldn’t exist. I think such a situation can cause one to feel ganged up on. This is a pretty mild example of this dynamic, but there is a big history in this Discourse of situations where one person, often a newcomer, ends up arguing against half the online community and nobody (often including myself) can resist chiming in to echo the mainstream opinion.

I think that even though such pile-on posts are well intentioned and often informative, they can leave a very bad impression.

6 Likes

This thread has devolved a little bit, but here’s some technical feedback about IsURL.jl @anon37204545.

IsURL.jl currently uses the following regexes to check for “valid” URLs:

windowsregex = r"^[a-zA-Z]:[\\]"
urlregex = r"^[a-zA-Z][a-zA-Z\d+\-.]*:"

Neither of these is correct. Let’s take a look at why.

First, windowsregex, which checks for valid “Windows paths”. Setting aside the fact that Julia already has ispath support by default (which works in a platform-agnostic way and actually checks filesystem properties), the given regex accepts a lot of strings that can never be valid paths. Take the following:

C:\\???\a\folder

This matches the regex, but fails to meet the Microsoft specification on paths: ? is a forbidden character, and thus the given string cannot be a path. Moreover, on Windows there are a bunch of forbidden names (AUX, PRN, CON, NUL, to name a few) as well as forbidden characters, all of which would be allowed by the given regex.

Ok, let’s pretend we don’t care about (proper) Windows path support; people will only plug in correct paths anyway, right? Well, maybe, but drive letters aren’t the only way a volume can be identified. In fact, there are at least three ways, and none of the others match the regex.

“Normal” Windows paths have a length limitation of 260 characters, so a path longer than that starting directly with C:\ is not allowed. To overcome this limitation (up to about 32 thousand characters), the path can be written in the Unicode-supported path format, i.e. starting with \\?\C:\, but this again doesn’t match the regex.
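To make the points above concrete, here is a minimal sketch that runs the quoted windowsregex against a few strings. The regex is copied from the post; the test strings (other than the C:\\???\a\folder example) are made up for illustration, and the script only checks what the regex accepts, not what Windows itself would do.

```julia
# Regex copied verbatim from the post above.
windowsregex = r"^[a-zA-Z]:[\\]"

# Illustrative inputs; raw"..." keeps the backslashes literal.
candidates = [
    raw"C:\Users\me\file.txt",   # ordinary drive-letter path
    raw"C:\\???\a\folder",       # contains the forbidden character '?'
    raw"C:\AUX",                 # reserved device name
    raw"\\?\C:\a\folder",        # extended-length prefix
]

# occursin only tests whether the pattern matches somewhere in the string;
# the ^ anchor pins the match to the start.
matches = [occursin(windowsregex, s) for s in candidates]

for (s, m) in zip(candidates, matches)
    println(rpad(s, 24), " => ", m)
end
```

The regex accepts the first three strings, including the two that cannot be real paths, and rejects the extended-length form.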


Enough about filepaths, let’s look at the urlregex. URLs are a long-standing accumulation of cruft and legacy, but they at least have a proper format and RFC we can check against. In my day to day work, I often connect to servers directly via their IP, so allowing URLs to start with numbers is critically important to me. Your regex doesn’t allow that, but even the (admittedly not perfect) approach from HTTP.jl is more correct:

julia> URL("172.45.12.3/index.html")
HTTP.URI("172.45.12.3/index.html")

Yay, I can connect to my servers! You might like HTTP.jl, it’s a very nice package.

It looks like your “urlregex” doesn’t check for proper URLs at all, but simply matches the scheme part of a URL (and a trailing :, though that alone doesn’t match the RFC for paths). I wouldn’t rely on this to validate my URLs.
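As a small illustration of that last point, the sketch below applies the quoted urlregex to three strings. The regex is taken from the post; the inputs are illustrative, and the variable names are only for this example.

```julia
# Regex copied verbatim from the post above.
urlregex = r"^[a-zA-Z][a-zA-Z\d+\-.]*:"

# A full URL matches, but only because of its leading "https:".
full_url = occursin(urlregex, "https://julialang.org")

# The bare IP address form from the post has no scheme, so it is rejected.
bare_ip = occursin(urlregex, "172.45.12.3/index.html")

# Anything that looks like a scheme followed by ':' is accepted,
# regardless of what comes after (here: nothing at all).
scheme_only = occursin(urlregex, "not-a-url:")

println((full_url, bare_ip, scheme_only))
```

In other words, the pattern answers “does this string start with something shaped like a scheme?”, which is a much weaker question than “is this a valid URL?”.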


All in all, please don’t be discouraged from creating new packages! But please also be aware that there might be existing packages that already handle these kinds of seemingly simple tasks very well. There’s no shame in asking here or in the JuliaLang Slack about specific functionality and modules; there are a lot of folks there willing to help new people out.

11 Likes

The OP literally asked for feedback if this was a good idea to port a bunch of mini packages to Julia.

1 Like

No offense, and only at a 20% seriousness level, but this reminds me of https://www.npmjs.com/package/is-odd

1 Like

I am not sure why you think this.

In any case, existing Julia code in Base, the standard libraries, and numerous packages has a lot of examples that seem to contradict this.

2 Likes

Keyword arguments do not participate in dispatch. You may find the whole chapter

https://docs.julialang.org/en/v1/manual/methods

useful, but especially the design patterns.
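A minimal sketch of what “do not participate in dispatch” means in practice. The function names describe and scale are hypothetical and chosen only for this illustration.

```julia
# Positional arguments take part in dispatch: these two methods coexist.
describe(x::Integer) = "integer"
describe(x::AbstractString) = "string"

# Keyword arguments do not: both definitions below have the same positional
# signature (x), so the second one replaces the first instead of adding
# a new method.
scale(x; factor::Int = 2) = x * factor
scale(x; factor::Float64 = 2.0) = x * factor

n_describe = length(methods(describe))
n_scale = length(methods(scale))

println("describe has ", n_describe, " methods, scale has ", n_scale)
println(scale(3))   # uses the surviving Float64 default
```

If behaviour really needs to vary with the type of an option, one common approach is to make that option a positional argument, or to forward to an inner function that takes it positionally.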

1 Like

I would argue “a bunch” and “more” are within rounding error of each other.

The waiting period is very useful. If a 3 day waiting period is bottlenecking you, it is likely that you are registering an excessive number of packages. (Note that you can use an unregistered package by just adding its URL, so there is no need to wait for it to be registered to start using it, unless you want to register another package with that as a dependency.)
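For reference, adding a package by URL can be sketched as follows. The helper name add_from_url and the repository URL in the comment are placeholders invented for this example; Pkg.add with the url keyword and Pkg.activate with temp = true are part of the Pkg standard library. The call is left commented out because it needs network access.

```julia
import Pkg

# Hypothetical helper: install a package straight from its repository URL
# into a throwaway environment, without the package being registered.
function add_from_url(url::AbstractString)
    Pkg.activate(temp = true)   # temporary environment, discarded on exit
    Pkg.add(url = url)
end

# Placeholder URL; replace with the repository you actually want to try.
# add_from_url("https://github.com/USER/MyPackage.jl")
```

The same thing works interactively from the Pkg REPL mode by typing add followed by the repository URL.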

7 Likes