Splitting more things out of base

FWIW, I would actually prefer a smaller Base namespace, with e.g. linear algebra as modules (neither external packages nor Base), and a much larger library of modules provided. So, more Python-style in that respect. This is somewhat insubstantial, but would of course require a gargantuan code refactoring of everything (lots of using or import and qualified name use; I prefer the latter).

edit: larger library of modules provided: Pull in packages into julialang, as stdlib
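For readers unfamiliar with the two styles mentioned here, a minimal sketch follows. It assumes Julia 1.0 or later, where LinearAlgebra ships as a standard library; the vector is made up for illustration.

```julia
# Style 1: `using` brings the module's exported names into scope.
using LinearAlgebra
v = [3.0, 4.0]
n1 = norm(v)

# Style 2: `import` binds only the module name, so every call is qualified.
import LinearAlgebra
n2 = LinearAlgebra.norm(v)
```

The qualified form is more verbose, but it makes the origin of each function visible at the call site.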

Those are some of the biggest changes between Julia 0.6 and 0.7.
I don’t know if there are more modules overall now, but a lot of things – like LinearAlgebra and Random – are in standard libraries. You now have to write using LinearAlgebra before calling mul! or svd, and the same goes for more niche random functions like srand.
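A short sketch of what that looks like in practice, assuming Julia 1.0 or later (where the seeding function is spelled Random.seed! rather than srand); the matrices are arbitrary:

```julia
using LinearAlgebra   # provides mul! and svd
using Random          # provides Random.seed!

Random.seed!(1234)    # on Julia 1.0+ this replaces the older srand(1234)

A = rand(3, 3)
B = rand(3, 3)
C = similar(A)
mul!(C, A, B)         # writes the product A * B into the existing matrix C

F = svd(A)            # F.U, F.S and F.Vt hold the factors of A
```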


I know, and I really like these changes! I just think they don’t go far enough; I’d like Base to be really slim and StdLib to be fatter. For example, what is quantile doing in Base? Why are StaticArrays and DataStructures not in the standard library? More brutally, can’t we put regular expressions into StdLib?

Ideally, I’d like each module/namespace to be small enough to have a complete single HTML-page documentation that can be read cover-to-cover, just like the Python standard library. You don’t think that every beginning Julia programmer needs to learn how to write regular expressions? Then they do not belong in the Base namespace, imho.


I agree that a modular, rational and well-organized Base is good to have. However, I hope we don’t go too far in breaking things into pieces. Python takes things too far IMO; it doesn’t even include pi! Sometimes it takes me longer to find out which libraries contain the functions I need than to actually code a small quick script.


I’ve been asking that question myself (as well as moving BigInt and BigFloat into StdLib).
All three of those introduce large binary dependencies (PCRE2, GMP, MPFR) and are also not thread-safe
(not because of the libraries, but because of the bindings themselves).

In other languages, such as Python, Rust, and Go, regex support is provided via a standard package (or crate).


I agree. We should probably move stats functions like quantile out of Base, put them in StatsBase, and put that in the standard library.

Regexes could move out too, but they’re pretty widely used in Base itself. Tackling that would probably involve rewriting code to use weaker forms of string searching, moving out file/path-related code, moving out package-loading code, and more. Maybe doable, but a lot of work.

I looked at all of the uses in base, and it’s maybe not so extensive as you believe.
Also, all that’s really needed for v1.0 would be to make a Regex package in stdlib, that exports the macro r"…",
the actual replacement of the regexes used in base could wait until v1.1.

I found 37 regexes in 11 files, which seemed to involve fairly simple patterns that could be rewritten to not use a regex (and be more efficient, IMO)

./uuid.jl:33:        if !occursin(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$", s)
./path.jl:21:    const path_separator_re = r"/+"
./path.jl:22:    const path_directory_re = r"(?:^|/)\.{0,2}$"
./path.jl:23:    const path_dir_splitter = r"^(.*?)(/+)([^/]*)$"
./path.jl:24:    const path_ext_splitter = r"^((?:.*/)?(?:\.|[^/\.])[^/]*?)(\.[^/\.]*|)$"
./path.jl:29:    const path_separator_re = r"[/\\]+"
./path.jl:30:    const path_absolute_re  = r"^(?:\w+:)?[/\\]"
./path.jl:31:    const path_directory_re = r"(?:^|[/\\])\.{0,2}$"
./path.jl:32:    const path_dir_splitter = r"^(.*?)([/\\]+)([^/\\]*)$"
./path.jl:33:    const path_ext_splitter = r"^((?:.*[/\\])?(?:\.|[^/\\\.])[^/\\]*?)(\.[^/\\\.]*|)$"
./path.jl:36:        m = match(r"^([^\\]+:|\\\\[^\\]+\\[^\\]+|\\\\\?\\UNC\\[^\\]+\\[^\\]+|\\\\\?\\[^\\]+:|)(.*)$", path)
./irrationals.jl:160:    m = match(r"^(.*?)(=.*)$", sprint(show, x, context=ctx, sizehint=0))
./docs/basedocs.jl:927:```jldoctest; filter = r"Stacktrace:(\\n \\[[0-9]+\\].*)*"
./libc.jl:206:        if !occursin(r"([^%]|^)%(a|A|j|w|Ow)", fmt)
./loading.jl:371:const re_section            = r"^\s*\["
./loading.jl:372:const re_array_of_tables    = r"^\s*\[\s*\["
./loading.jl:373:const re_section_deps       = r"^\s*\[\s*\"?deps\"?\s*\]\s*(?:#|$)"
./loading.jl:374:const re_section_capture    = r"^\s*\[\s*\[\s*\"?(\w+)\"?\s*\]\s*\]\s*(?:#|$)"
./loading.jl:375:const re_subsection_deps    = r"^\s*\[\s*\"?(\w+)\"?\s*\.\s*\"?deps\"?\s*\]\s*(?:#|$)"
./loading.jl:376:const re_key_to_string      = r"^\s*(\w+)\s*=\s*\"(.*)\"\s*(?:#|$)"
./loading.jl:377:const re_uuid_to_string     = r"^\s*uuid\s*=\s*\"(.*)\"\s*(?:#|$)"
./loading.jl:378:const re_name_to_string     = r"^\s*name\s*=\s*\"(.*)\"\s*(?:#|$)"
./loading.jl:379:const re_path_to_string     = r"^\s*path\s*=\s*\"(.*)\"\s*(?:#|$)"
./loading.jl:380:const re_hash_to_string     = r"^\s*git-tree-sha1\s*=\s*\"(.*)\"\s*(?:#|$)"
./loading.jl:381:const re_manifest_to_string = r"^\s*manifest\s*=\s*\"(.*)\"\s*(?:#|$)"
./loading.jl:382:const re_deps_to_any        = r"^\s*deps\s*=\s*(.*?)\s*(?:#|$)"
./show.jl:1778:    m = match(r"^(.*?)((?:[\.eE].*)?)$", sprint(show, x, context=io, sizehint=0))
./show.jl:1784:    m = match(r"^(.*[^e][\+\-])(.*)$", sprint(show, x, context=io, sizehint=0))
./show.jl:1789:    m = match(r"^(.*?/)(/.*)$", sprint(show, x, context=io, sizehint=0))
./methodshow.jl:205:    line <= 0 || occursin(r"In\[[0-9]+\]", file) && return ""
./essentials.jl:418:```jldoctest; filter = r"Stacktrace:(\\n \\[[0-9]+\\].*)*"
./version.jl:24:                if !occursin(r"^(?:|[0-9a-z-]*[a-z-][0-9a-z-]*)$"i, ident) ||
./version.jl:34:                if !occursin(r"^(?:|[0-9a-z-]*[a-z-][0-9a-z-]*)$"i, ident) ||
./version.jl:72:const VERSION_REGEX = r"^
./env.jl:102:        m = match(r"^(=?[^=]+)=(.*)$"s, env)
./env.jl:118:        m = match(r"^(.*?)=(.*)$"s, env)
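
As an illustration of the claim that these patterns are simple enough to hand-code, the uuid.jl check could be written roughly as follows. This is only a sketch, not the code in Base: the function name looks_like_uuid is made up, and edge cases (for example a trailing newline) are not guaranteed to behave exactly like the regex.

```julia
# Hypothetical helper: accepts the same 8-4-4-4-12 lowercase-hex shape as the
# uuid.jl pattern above, without constructing a Regex.
function looks_like_uuid(s::AbstractString)
    ncodeunits(s) == 36 || return false
    for (i, c) in enumerate(s)
        if i == 9 || i == 14 || i == 19 || i == 24
            c == '-' || return false          # separators sit at fixed positions
        else
            ('0' <= c <= '9' || 'a' <= c <= 'f') || return false
        end
    end
    return true
end

sample = "123e4567-e89b-12d3-a456-426614174000"
same_answer = looks_like_uuid(sample) ==
    occursin(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$", sample)
```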

The issues I see are global rounding mode, precision, and a Ref for lgamma. Am I missing anything?

I’d have to go investigate it again (I don’t know if you recall, I’d brought this up with you in person during the 2016 JuliaCon).
The global rounding mode and precision are the ones that I remember that still need to be fixed, I know that the other issue I saw then, with convert to string (and printing), has finally been made thread-safe.

The other issue that makes me really want BigInt and BigFloat moved out of Base is that they don’t perform well at all. There is an impedance mismatch with the GMP/MPFR interfaces, which want you to always be dealing with references. That requires finalizers (16 bytes extra for every BigInt or BigFloat on 64-bit platforms, and the memory doesn’t get freed up immediately, which can cause a lot of memory pressure). There are also the issues with these types (like String) being semantically immutable in Julia, yet actually mutable.
Note that Dec128 is a lot faster than 128-bit BigFloats, precisely because of this impedance mismatch, even though you might expect binary floating point to be at least somewhat faster than decimal floating point at the same size.

If the problem is performance (and that is a problem), we should fix their performance. Moving them out of Base doesn’t make them faster.

Part of the problem (at least with BigFloats), is that their API limits performance,
i.e. depending on a settable precision, which needs to be constantly picked up,
and needs to deal with thread-safety, instead of having BigFloat parameterized by its precision,
so that it could be stored as a primitive isbits type.

IMO, a bad API has ended up getting “baked into” the language, but it doesn’t need to be.
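
To make the contrast concrete, here is a rough sketch assuming Julia 1.x. The first part uses the existing setprecision API; the FixedPrecFloat type is purely hypothetical, defines no arithmetic, and only shows that a layout with the precision in the type can be stored inline.

```julia
# Today: precision is global state that is picked up when a BigFloat is created.
p = setprecision(BigFloat, 128) do
    precision(BigFloat(1))
end

# BigFloat is a mutable wrapper around MPFR data, so it is not an isbits type.
big_is_bits = isbitstype(BigFloat)

# Hypothetical alternative: the precision P (in bits) is a type parameter and
# the significand is stored inline in N 64-bit limbs.
struct FixedPrecFloat{P,N}
    negative::Bool
    exponent::Int64
    limbs::NTuple{N,UInt64}
end

x = FixedPrecFloat{128,2}(false, 0, (UInt64(0), UInt64(1) << 63))
fixed_is_bits = isbitstype(FixedPrecFloat{128,2})
```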

Ok, you’re right, that’s silly. Super basic stuff, like e, pi, exp, and sin, wants to live in the Base namespace. But why gamma, beta, lgamma, etc.? These are important, and it is cool to have a good special-function library shipped in the default distribution, but do they need to be in the global namespace?

I have no problem with regexps being a dependency of Base, but having them in the namespace sucks. Isn’t there a sensible way of having two separate namespaces, even if functionality is mixed?

The same holds for BigInt, BigFloat, and file handling. Why not have an os module that contains all that stuff? Even if some base functionality depends on handling paths / files, it would be good hygiene to make it a separate namespace.

Are we talking about this? What is the problem with just allocating a new Ref{Cint} every time? We are allocating a BigFloat anyway, instead of providing an in-place version lgamma!(dst::BigFloat, src::BigFloat), so these handful of cycles shouldn’t make a big difference.

So even if a speed-up of 20% can be gotten by eliding this alloc, everyone who cares about speed must directly call into the C library anyway in order to skip the impedance mismatch.
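
The difference between a shared global Ref and a per-call Ref can be sketched as follows. Everything here is a stand-in: abs_with_sign! is a made-up pure-Julia function imitating a C routine that returns a sign through an out-parameter, and it is not the actual lgamma binding.

```julia
# Hypothetical stand-in for a C routine that writes a sign to an out-parameter.
function abs_with_sign!(sign_out::Ref{Cint}, x::Float64)
    sign_out[] = x < 0 ? -1 : 1
    return abs(x)
end

# Variant 1: one global Ref shared by every caller. Between the write and the
# read below, another thread could overwrite the value.
const SHARED_SIGN = Ref{Cint}(0)
function shared_version(x::Float64)
    y = abs_with_sign!(SHARED_SIGN, x)
    return y, SHARED_SIGN[]
end

# Variant 2: a fresh Ref per call. Nothing is shared, so there is no race.
function local_version(x::Float64)
    s = Ref{Cint}(0)
    y = abs_with_sign!(s, x)
    return y, s[]
end

result = local_version(-2.0)
```

Single-threaded, both variants return the same answer; only the second stays correct when called from several threads at once.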

I’d written a BigFloat package two years ago, before JuliaCon 2016, that pretty much solved the impedance mismatch, but I got stymied by the issues of making it thread-safe. I should have just pushed on and released it, because as it turns out, all of the threading problems still haven’t been fixed for Base BigFloats.


Have you put the current code-ruins of your effort in a public GitHub repo?

Yes, I did: BigFloat.jl

Ok, sure, I’m all in favor of separating modules and such, but I also don’t see why this is such a huge problem, or “sucks”. There are very few exported identifiers just for regexes, e.g. r_str and Regex. Why are these ruining your day?

Where did I say that would be a problem? What I actually did was file an issue saying that using a global ref for this is silly, and easy to fix.


Note: one thing I didn’t do back then, was make the desired rounding mode part of the type.
Since the rounding mode also causes problems by being picked up at run time from a global value, which needs to be made thread-safe, it seems like it would be a good thing to make it a type parameter, along with the precision.
I’d have to experiment with it, and of course, the math people would know better how to handle operations on two numbers with different precision and rounding modes, how to promote the values.
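
A small sketch of the idea, assuming Julia 1.x. The first part uses the existing setrounding API to show the global mode changing a result; FloatSpec, precision_of and rounding_of are hypothetical names, and no promotion rule is proposed since that question is left open above.

```julia
# Today: the same expression gives different results depending on global state.
lo = setrounding(BigFloat, RoundDown) do
    BigFloat(1) / BigFloat(3)
end
hi = setrounding(BigFloat, RoundUp) do
    BigFloat(1) / BigFloat(3)
end

# Hypothetical tag type carrying precision P and rounding mode R as type
# parameters, so both are known from the type instead of read from a global.
struct FloatSpec{P,R} end
precision_of(::Type{FloatSpec{P,R}}) where {P,R} = P
rounding_of(::Type{FloatSpec{P,R}}) where {P,R} = RoundingMode{R}()

spec = FloatSpec{128,:Down}
```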

They ruin my day, because they aren’t generic (so won’t work with my Strs package), aren’t thread-safe,
and can’t be overwritten without it being type-piracy and people getting warnings.

If they were moved to stdlib, then somebody could do using StrRegex instead of using Regex, and everything would be just fine.

I thought that was precisely one of the big reasons for moving things out of Base and into stdlib, so that they wouldn’t be locked down with types that aren’t so good, and can’t be changed without being a breaking change.

Note: it would probably take a very short time to simply create a stdlib Regex package, that just had @r_str and Regex exported, and add deprecations in v0.7 as has been done elsewhere if @r_str or Regex are used without a using Regex.
The harder work (which didn’t look that hard to me, maybe a couple days?) of removing the usages of regexes from Base could happen then at any time, either before or after the v1.0 release.
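
For a sense of how small such a package could start out, here is a sketch of a module that exports its own string macro. The module name RegexShim and the macro name re_str are made up for illustration (a different name is used because Base currently exports @r_str), and the body simply forwards to the existing Regex constructor.

```julia
# Hypothetical module standing in for a stdlib-style regex package.
module RegexShim

export @re_str

# String macro: re"..." and re"..."i both expand to a Regex object.
macro re_str(pattern, flags...)
    Regex(pattern, flags...)
end

end # module

using .RegexShim

matched = occursin(re"^[a-z]+$"i, "Julia")
```

An alternative package could export a macro of the same name that builds a different pattern type, and users would pick one with a single using line.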
