Transliterate.jl, IsURL.jl – useful tools

I created two very simple packages that may be useful to anyone working with text or links. Both are ports of popular npm packages, with slight modifications. They were scheduled to be merged automatically into the Julia registry after 3 days (Transliterate) and 1 day (IsURL). Both are registered now, and you can add them with ]add Transliterate and ]add IsURL.

Those are:

https://github.com/zdroid/IsURL.jl

I am wondering whether these packages match the use cases of Julians, since this community doesn’t have the same needs as the Node.js community. If the answer is yes, I will port some more.

7 Likes

Hi, and welcome to the community!

I think in general Julia packages are a bit more complex than this. I guess you’re coming from Node, where one-line packages are pretty common. I can speak for my package here and say that we wouldn’t introduce another dependency just to get something we could reproduce in a few lines of code.

Also, just looking at your code:

function transliterate(str; languages=[], custom_replacements=Dict())
    if languages ≠ []
        if typeof(languages) == String
            languages = [languages]
        end

is not really idiomatic Julia. We don’t need to do introspection like this when we have multiple dispatch. That is, create a series of methods:

function transliterate(str)
function transliterate(str, languages::AbstractVector)

Then you can simply do

transliterate(str, language::AbstractString) = transliterate(str, [language])
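
For illustration, the dispatch-based layout described above could look roughly like the sketch below. It is a toy, not the package's code: `toy_transliterate` and `TOY_TABLE` are made-up names, and the `languages` argument is accepted but ignored.

```julia
# Toy replacement table; a stand-in for the package's real lookup data (assumption).
const TOY_TABLE = Dict('ä' => "ae", 'ö' => "oe", 'ü' => "ue", 'ß' => "ss")

# Core method: takes a vector of language codes.
# This toy ignores the codes and always applies TOY_TABLE.
function toy_transliterate(str::AbstractString, languages::AbstractVector)
    return join(get(TOY_TABLE, c, string(c)) for c in str)
end

# No languages given: forward to the core method with an empty list.
toy_transliterate(str::AbstractString) = toy_transliterate(str, String[])

# A single language given as a string: wrap it in a vector and forward.
toy_transliterate(str::AbstractString, language::AbstractString) =
    toy_transliterate(str, [language])
```

All three call forms, `toy_transliterate("Grüße")`, `toy_transliterate("Grüße", "de")` and `toy_transliterate("Grüße", ["de"])`, end up in the same core method, so no `typeof` check is needed.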

but again (and I’m not necessarily speaking for the general Julia community) - I’m not sure packages are the right place for one-liners like this.

Edited: Also, are you sure this is right?

julia> isurl("foo:bar:baz:zzy")
true

julia> isurl("a:10")
true

julia> isurl("a:::::")
true   

julia> isurl("a:::::::10")
true

julia> isurl("a: : : : : : :10")
true

julia> isurl("""
       a:
       multi
       lines
       """)
true
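
As a point of comparison, here is a minimal sketch of a stricter check, assuming the goal is to accept only strings of the form scheme://rest with no whitespace. `looks_like_url` and `STRICT_URL` are hypothetical names, not part of IsURL.jl, and the rule is deliberately narrower than RFC 3986, which also allows URIs without `//` (mailto: links, for example).

```julia
# Hypothetical helper, not part of IsURL.jl.
# \A and \z anchor the match to the whole string, so trailing newlines
# or extra lines cannot slip through.
# Pattern: a scheme (letter, then letters/digits/+/./-), then "://",
# then one or more non-whitespace characters.
const STRICT_URL = r"\A[A-Za-z][A-Za-z0-9+.-]*://\S+\z"

looks_like_url(s::AbstractString) = occursin(STRICT_URL, s)
```

With this pattern, all six inputs above are rejected, while `looks_like_url("https://julialang.org")` is accepted. Whether rejecting them is the right behaviour depends on what the package is meant to validate.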
14 Likes

One possible solution would be to make a single package with lots of these useful one-liners for writing websites. That way, anyone using it gets the benefit of a whole set of nice functions for web development.

6 Likes

or maybe, if they’re useful enough, incorporated into HTTP.jl - but of course they have their own parser/validator for URIs there.

4 Likes

From what I’ve seen, the Julia community tends more to follow the Go proverb “A little copying is better than a little dependency”.

8 Likes

I am not sure what you are referring to here.

In any case, I don’t think that is a relevant concern for the above two packages, with 50 and 28 lines of code (not counting the lookup table). Just to put this in perspective, 4-8K LOC is quite common for mature Julia packages. Just put things that belong together in a single package.

4 Likes

That proverb is also known as “re-inventing the wheel”.

Julia is not only used for complex simulations, for which it is logical to use a package that provides you with enough tools. It can also be used in a REPL or scripting environment, which is supposed to work fast. Compiling a package with 1000 functions to obtain 1 of them is not my definition of fast.

2 Likes

Probably not the same needs yet the packages seem pretty cool :wink:

1 Like

Personally, I think these packages are too small. Why? Because it is unusual to want only one of them on its own. Maybe IsURL could be useful inside HTTP.jl or another package (such as Gumbo or similar).

By the way, I recommend adding the examples from @anon94023334 to the tests in IsURL and documenting both packages (in general, I encourage documenting any new package).

3 Likes

This conversation makes me think it may be time for someone to create something like R Packages: a somewhat opinionated guide to package development, but for Julia.

7 Likes

In general, I am opposed to the idea of tiny single-function packages. I admit that I’ve published some packages like that myself, but only because they really didn’t fit into anything else. URL checker features seem like they would fit into HTTP. I would prefer that we not go down the path of NPM and huge numbers of tiny packages in general, as has been discussed many times before. Making a tiny package should only be done out of necessity, when there is no existing package it could fit into. Just my 2 cents.

Also, I wouldn’t describe the Julia ecosystem as young; it has already existed for quite a few years and is nearly at v1.5, which is well past the beginning phase.

10 Likes

I’m a little surprised by the reactions here and the sentiment against small packages that only do one or two things in a couple dozen lines.

This is not at all the reaction I received when I announced any of my packages which were of a comparable size and scope:

I would have liked to be told these packages don’t belong on the general registry if that’s how people felt…

6 Likes

FWIW, I personally don’t think a targeted effort to port Node packages of this type is a very useful thing to work on, especially as someone new to Julia. In fact, I would argue (again, personal opinion) that a future where you add a package and get a large number of dependencies like IsUrl, Transcode.jl and packages like that would be a pretty bad thing for the Julia ecosystem.

4 Likes

Well, for me it is more about whether the functionality is going to be used in isolation, or whether people will usually use it together with another package (as with HTTP.jl and IsUrl.jl). I like ToggleableAsserts.jl, and to me a standalone package makes a lot of sense there, because it is not tied to any other package. Anyway, this is a personal opinion, because sometimes it is not clear. For instance, for a service-type package I needed to extract POST parameters. I did not want a package as comprehensive as Genie.jl, so I had to do it “by hand”. I am still considering either creating a package for that (an HTTParams.jl) or opening a PR to HTTP.jl to include it. The criterion is not clear, but while creating an HTTParams.jl could be nice, I think it makes more sense to include it in HTTP.jl (if it does not increase the complexity of its dependencies). What do you think?
EDIT: I have removed the code to be more concise.

Thank you for your work! I wouldn’t presume to know the use cases of Julians any more than those of, e.g., Node.js users. Not everyone does scientific/technical computing with Julia; I’m using Plotly/Dash.jl, for example. I do see transliterate as helpful, e.g. for NLP (OK, maybe “scientific” is a good term for that). I checked it against my native Icelandic and I’m OK with the result, and I’m not sure people would “reproduce in a few lines of code” something that makes sure all languages work.

I like how, e.g., Icelandic Þ and Ð are supported, but I’m conflicted about it “working” for Turkish. See the Wikipedia article on the tittle (the story on WP is surely not made up?):

Many cellphones available in Turkey (as of 2008) lacked a proper localization, which led to replacing ı by i in SMS, sometimes severely distorting the sense of a text. In one instance, a miscommunication played a role in the deaths of Emine and Ramazan Çalçoban in 2008.[5][6]

It’s not certain that people would use the other package together with HTTP.jl (your program could be a filter; otherwise it’s not unlikely). That said, there is URIParser.jl, which seemingly overlaps with yours or makes it redundant.

The “are you sure this is right?” comment shows that people will get such parsing wrong, which is an excellent argument that a package (which can be improved later), or at least a function, is needed somewhere.

Why?

I’m not taking a stand yet, except I wouldn’t want people to keep their code to themselves.

Did you object because of package startup overhead? If that’s the worry, then I think we need to handle it, as “slower” but more interactive languages such as Python, Perl, and JavaScript do.

julia> @time using Transliterate
  0.006106 seconds (14.76 k allocations: 886.013 KiB)

julia> @time using IsURL
  0.002868 seconds (6.80 k allocations: 413.055 KiB)

julia> @time using HTTP
  0.977093 seconds (1.55 M allocations: 83.811 MiB, 2.00% gc time)

with lowered opt.:
  0.291570 seconds (306.50 k allocations: 18.365 MiB)

julia> @time using Gumbo: parsehtml  # doesn't seem faster choosing just one function
  0.354805 seconds (304.67 k allocations: 17.564 MiB)
2 Likes

I think the argument against lots of small packages is that having similar functionality under the same roof increases findability, consistency, likelihood of continued maintenance, and more users to find and report bugs. It also means that the import statements become a good summary of what the file is doing. If you use 30 single function packages, you end up in the Java situation where the dependencies are a massive block that everyone ignores because there’s too much noise for the useful value.

3 Likes

I really feel that the first package announcement post of a new community member is not the best place to have this conversation.

As it stands, we don’t have any rules or publicly visible conventions around what would be complicated enough to warrant a registered package. Doing this in the comments to @anon37204545’s announcement post, especially when this wasn’t done in anyone else’s simple function announcement posts, feels like it sends an unwelcoming message to a new contributor.

This should be hashed out first, before we blow up a new user’s inbox and make them feel unwelcome.

I don’t think any particular post here is an issue, but the sheer volume of very similar negative posts doesn’t seem to help the situation and I think could feel quite alienating.


Edit: I have significantly edited this post since several people responded to it to clarify my position and soften some overly dramatic / unhelpful parts.

19 Likes

I think the reason people engaged in this discussion is that, in a related topic, the OP is proposing a change in the registry approval process to mimic the way JavaScript handles tiny packages. Clearly, this author has a much greater interest in tiny packages than most other authors. I don’t believe anyone wants to discourage these packages from being published. Instead, we wish to engage more deeply in the discussion of how new packages are registered, especially since @anon37204545 started the discussion about how other languages handle this situation.

5 Likes

Speaking for myself, the reason I engaged was because there was a request for feedback. I think it’s really uncharitable to criticize anyone who responded as contributing to an unwelcoming environment.

2 Likes

Yep, I really didn’t word my post the way I wish I had in hindsight. I’ll make some edits when I’m back at my computer.

I don’t think any of the posts here are a problem in isolation, including yours. What makes me uncomfortable is the overall feel of this thread when many many voices chime in saying that the packages shouldn’t exist. I think such a situation can cause one to feel ganged up on. This is a pretty mild example of this dynamic, but there is a big history in this Discourse of situations where one person, often a newcomer, ends up arguing against half the online community and nobody (often including myself) can resist chiming in to echo the mainstream opinion.

I think that even though such pile-on posts are well intentioned and often informative, they can leave a very bad impression.

6 Likes