How is Julia Like Python?

@jasheldo one important thing to realize is that the code in Julia Base and many of the core packages is hyper-optimized, and that this is precisely the code that no one writes in Python. These packages shouldn’t be compared to Python, but rather to C libraries with Python bindings, as that is how you would write equivalent code in Python. If you want good examples of clean, simple Julia, look in smaller packages that were written quickly to get a job done, rather than to eke out every last drop of performance.

13 Likes

@jasheldo

  1. Things in Julia can be written in many different ways; I’ve learned to write them in whatever way is most convenient for me.
    Specific examples (MWEs) are the best way we can have a productive conversation.
    I bet if you had MWEs of a few scripts we can find a lot of different ways to code it.
    As others have explained, many things are optional:
    if you don’t like unicode, you don’t have to use it
    if you don’t like pipeline notation |>, no need to
  2. Give it time:
    I’ve come to appreciate & love some features of Julia syntax I wasn’t comfortable with at first (such as |>)
  3. My background is in Matlab where I had to avoid loops like the plague & learned to vectorize as much of my code as possible. Often this resulted in ugly, awkward code that was difficult to read/write/debug.
    See: Julia slower than Matlab & Python? No - #91 by Albert_Zevelev. See the codes in that post & decide for yourself.
    I was able to replace awkward vectorizations in numpy/Matlab w/ simple loops in Julia & got better performance.
    I concluded:
4 Likes

I’ve looked at the code in various Julia packages and base functions – and it’s nothing like the code I write in Julia. I conclude the people who programmed those are much better programmers than me and are working on a higher plane of existence. :wink: Remember, a Real Programmer can write FORTRAN in any language.

I recently did a little Python programming and was reminded of how ugly it looks. Here is a little snippet of ugly Python code:

    p=regtree.predict(x)
    print("size p = ",np.shape(p))
    ssm = np.column_stack((y,p))
    ssx=pd.DataFrame(ssm)
    ssx['Pred-Act']=ssx[1]-ssx[0]
    ssx.rename(columns={0:'Act', 1:'Pred'},inplace=True)
    ssx['sq error']=ssx['Pred-Act']**2
    sse=ssx['sq error'].sum()
    print("SSE = ",sse)
    ssx['Pred-Act'].hist()
    e_max=ssx['Pred-Act'].max()
    e_min=ssx['Pred-Act'].min()
    print(e_min,e_max)
    print(ssx['Pred-Act'].quantile(q=[0.05,0.10,0.25,0.50,0.75,0.90,0.95]))
    sc=regtree.score(x,y)
    print("Tree model Regression score = ",sc)
    print(ssx)
    sys.stdout.flush()

Python ends up with a lot of dots - pd. (pandas), np. (NumPy) - and their methods: .column_stack(), .sum(), .max(), .min(), .quantile(), .score(x,y), and so on. Special ire is reserved for the sys.stdout.flush() calls I have to sprinkle through the code to force the output to appear as the program progresses.

With Julia, there is no NumPy – it’s built in. Broadcasting gets rid of the need for lots of the fiddly little methods needed in Python.
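For comparison, here is a rough Julia sketch of the same bookkeeping the Python snippet above does (the data is made up; the point is just that broadcasting and plain functions replace the method chains):

using Statistics

# hypothetical stand-ins for the actuals and predictions above
y = rand(100)
p = y .+ 0.1 .* randn(100)

err = p .- y                          # elementwise difference via broadcasting
sse = sum(err .^ 2)                   # sum of squared errors
e_min, e_max = extrema(err)           # min and max in one call
qs = quantile(err, [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95])
println("SSE = ", sse, "  error range = (", e_min, ", ", e_max, ")")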

Here’s some of my Julia code:

            dx = sx.-x
            d = BigInt((minimum(dx)))
            di = findfirst(dx -> dx==d,dx)
            if di !== nothing
                # swap card with Bob
                push!(y,x[di])
                deleteat!(x,di)
                global sx=BigInt(sum(x)) # Alice
                global sy=BigInt(sum(y)) # Bob
            else
                # we failed
                #print("We failed di = nothing\n")
                global retcode = 0
                break
            end

Here you see the use of broadcasting (dx = sx.-x, where sx is a scalar and x is a vector – the result, dx, is a vector), the minimum() and sum() functions, and the nothing value.

Seems a lot cleaner to me. It also doesn’t have the superfluous colons Python requires after for loops and if-then-else statements.

Julia also now has the missing datatype. In Python, missing data is a problem – some people use the built-in None, others use np.nan (NumPy), and then there are math.nan and float('nan'). Packages, of course, won’t be consistent about which one they use.
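For example, a quick sketch of how a single missing value behaves (small made-up vector):

x = [1.0, missing, 3.0]

sum(x)                  # missing (it propagates automatically)
sum(skipmissing(x))     # 4.0 (skip missing values explicitly)
ismissing(x[2])         # true
coalesce.(x, 0.0)       # [1.0, 0.0, 3.0] (replace missings with a default)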

Also, for a dynamic language, one seems to stumble over variable types in Python fairly often. Package A will error out with, “I didn’t want that type, I want this other type,” so you have to do a type conversion before you call it. Julia has multiple dispatch, and that appears to clean up this problem quite a bit.
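A toy sketch of what I mean: one hypothetical function with a method per input type, so the right code is chosen automatically instead of erroring out:

describe_input(x::Integer)        = "an integer: $x"
describe_input(x::AbstractFloat)  = "a float: $x"
describe_input(x::AbstractString) = "a string: $x"

describe_input(3)      # "an integer: 3"
describe_input(3.0)    # "a float: 3.0"
describe_input("3")    # "a string: 3"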

The one thing that gets me a little batty is the scoping rules. Still haven’t quite mastered those.
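The case that trips me up most often (and the reason for the global keywords in my snippet above) is assigning to a global variable from a top-level loop. A minimal sketch, assuming it runs as a script (the REPL is more forgiving about this these days):

total = 0
for i in 1:10
    global total += i    # without `global`, `total` inside the loop is a new local
end
println(total)           # 55

# Inside a function there is no ambiguity and no keyword is needed:
function sum_to(n)
    total = 0
    for i in 1:n
        total += i
    end
    return total
end
println(sum_to(10))      # 55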

Finally, Python runs at glacial speed. The response to that is, “yes, but for stuff you need to run fast, there are packages compiled in C or C++ that are fast” – the two language problem has been institutionalized in their thinking.

8 Likes

I have ported quite a lot of Python code to Julia, and it very often consists of removing clutter, which directly makes the code much cleaner. Below is an example: the Python code is very hard to parse, while the Julia equivalent is very clean.

for t in range(T):
    sigma[t, :, :] = np.vstack([
        np.hstack([
            sigma[t, idx_x, idx_x],
            sigma[t, idx_x, idx_x].dot(traj_distr.K[t, :, :].T)
        ]),
        np.hstack([
            traj_distr.K[t, :, :].dot(sigma[t, idx_x, idx_x]),
            traj_distr.K[t, :, :].dot(sigma[t, idx_x, idx_x]).dot(
                traj_distr.K[t, :, :].T
            ) + traj_distr.pol_covar[t, :, :]
        ])
    ])
    mu[t, :] = np.hstack([
        mu[t, idx_x],
        traj_distr.K[t, :, :].dot(mu[t, idx_x]) + traj_distr.k[t, :]
    ])
    if t < T - 1:
        sigma[t+1, idx_x, idx_x] = \
                Fm[t, :, :].dot(sigma[t, :, :]).dot(Fm[t, :, :].T) + \
                dyn_covar[t, :, :]
        mu[t+1, idx_x] = Fm[t, :, :].dot(mu[t, :]) + fv[t, :]
return mu, sigma

And the Julia equivalent:

for i = 1:N-1
    K,Σ           = traj.K[:,:,i], traj.Σ[:,:,i]
    Σₙ[ix,ix,i+1] = fx[:,:,i]*Σₙ[ix,ix,i]*fx[:,:,i]' + R1 # Iterate dLyap forward
    Σₙ[iu,ix,i]   = K*Σₙ[ix,ix,i]
    Σₙ[ix,iu,i]   = Σₙ[ix,ix,i]*K'
    Σₙ[iu,iu,i]   = K*Σₙ[ix,ix,i]*K' + Σ
end

Most numerical processing code in Python hurts my eyes, to be honest; it doesn’t look anything at all like the math it’s implementing.

28 Likes

While I agree that a lot of code in Base has a bit of extra complexity, it is surprising how little is needed
to achieve great performance with generic code. Even the more convoluted parts have just 3–4 layers.

Perhaps newcomers to Julia should not start with these, but later on reading code in Base and the standard libraries is a great way to learn idiomatic Julia.
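For what it’s worth, the REPL makes it easy to jump from a call straight into that source:

@which sum([1, 2, 3])     # show which method a call dispatches to
@less  sum([1, 2, 3])     # open that method's source in the pager
@edit  sum([1, 2, 3])     # open the source in your editor
methods(unique)           # list all methods of a generic function

(These macros come from InteractiveUtils, which is loaded automatically in the REPL.)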

6 Likes

My impression too is that there is an emphasis on keeping Base code ‘clean’ and without ‘outlandish’ hyper-optimization tricks. Generic and elegant solutions are preferred, and those should then be made fast, rather than relying on ad-hoc performance hacks.

3 Likes

I guess that after one gets over superficial differences of syntax and functions names, the most unusual aspect of Julia may be the organization of code into

  1. small functions that do (mostly) one thing,
  2. with specialized methods for various situations as necessary,
  3. helper methods that just implement some logic that will be applied by multiple dispatch.

Seeing tiny functions that seemingly “do nothing”, just splitting and rearranging arguments or adding some cryptic object (e.g. a trait), may be confusing at first.
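For example, a made-up sketch of the pattern: the entry point seemingly does nothing except compute a trait and re-dispatch on it.

using SparseArrays

struct DenseStorage end
struct SparseStorage end

storagestyle(::AbstractMatrix)  = DenseStorage()     # default trait
storagestyle(::SparseMatrixCSC) = SparseStorage()    # specialized trait

totalsum(A) = totalsum(storagestyle(A), A)            # just rearranges arguments
totalsum(::DenseStorage,  A) = sum(A)
totalsum(::SparseStorage, A) = sum(nonzeros(A))       # visit only stored entries

totalsum(rand(3, 3))           # dense path
totalsum(sprand(3, 3, 0.5))    # sparse path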

7 Likes

Thank you to everyone who has chimed in to offer constructive feedback and thoughts. It’s clear there are a lot of passionate folks in the Julia community. It’s very heartwarming.

I appreciate the comments/thoughts/suggestions from everyone! Let me provide a more succinct snippet of what I do in Python that I love because of how clean the code is, at least to my eyes.

This first part is a bit of SQLAlchemy. I absolutely adore SQLAlchemy for many reasons but mostly because it saves me from having to write raw SQL.

start_d, end_d = '2019-01-01', '2019-12-31' # rolling 12 month incurred date
from_d, thru_d = start_d, '2020-12-31' # rolling 12 month paid-thru (runout) date

def td_sql(tin):
    """Pull the data from Teradata that will be used as a basis for the CATNIP process."""
    SCHEMA = 'SCHEMA'
    clm = Table('CLM', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    lne = Table('CLM_LINE', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    clm_max_rvsn = Table('CLM_MAX_RVSN', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    coa = Table('CLM_LINE_COA', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    patch = Table('MEDCR_PROV_ANLYTC_PATCH', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    xwalk = Table('MEDCR_ANLYTC_PADRS_XWALK', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)

    prod_id3 = 'ADY ADZ AET AEY ALW ALX ALY ALZ ANH ANI ANJ ANK AVC AVH AVT AVY TCA TCB TCC TCN '\
    'TCO TCP TCS TCT TCU TCV TCG AUS TCI AUC AUU AUT AXF AYE TCJ TCH AXA WCA WCB WCC '\
    'WCD WCE WCK WCL WCM WCN'.split()
    prod_id = 'HXBF HXDJ HXCH HXEE HXBG HXCR'.split()
    prod_id_bad = 'AVYW0009 AVYW0010'.split()

    ntwk = case([(or_(and_(func.substr(lne.c.PROD_ID, 1, 3).in_(prod_id3),
                           ~lne.c.PROD_ID.in_(prod_id_bad)),
                      lne.c.PROD_ID.in_(prod_id)), 'Blue Preferred'),
                 (coa.c.CMPNY_CF_CD == 'G0423', 'Blue Preferred'),
                 (lne.c.PROD_ID == 'MIMC', 'Medicaid PPO'),
                 (and_(lne.c.PROD_ID.like('%HX%'),
                       coa.c.MBU_CF_CD.like('IN%')), 'HIX'),
                 (or_(coa.c.PROD_CF_CD.like('%MA'),
                      coa.c.PROD_CF_CD.like('%MS')), 'Medicare Advantage PPO')],
                else_='Blue Access')

    sql = select([clm.c.CLM_NBR.label('CLMNBR'),
                  clm.c.CLM_SOR_CD,
                  clm.c.SRC_SBSCRBR_ID.label('SUBSCRIBER_ID'),
                  func.substr(clm.c.NTWK_ID, 1, 12).label('NETWORKID'),
                  ntwk.label('Network'),
                  case([(clm.c.CLM_SOR_CD == '896', lne.c.RNDRG_PROV_ID)],
                       else_=clm.c.SRC_BILLG_TAX_ID).label('TAXID'),
                  xwalk.c.PROV_ST_CD,
                  clm.c.SRC_PRCHSR_ORG_NM]).distinct()
    sql = sql.select_from(clm.join(lne, clm.c.CLM_ADJSTMNT_KEY == lne.c.CLM_ADJSTMNT_KEY)
                          .join(clm_max_rvsn, clm_max_rvsn.c.CLM_ADJSTMNT_KEY == clm.c.CLM_ADJSTMNT_KEY)
                          .outerjoin(coa, and_(lne.c.CLM_ADJSTMNT_KEY == coa.c.CLM_ADJSTMNT_KEY,
                                               lne.c.CLM_LINE_NBR == coa.c.CLM_LINE_NBR))
                          .outerjoin(patch, patch.c.CLM_ADJSTMNT_KEY == clm.c.CLM_ADJSTMNT_KEY)
                          .outerjoin(xwalk, xwalk.c.RPTG_MEDCR_ID == patch.c.RPTG_MEDCR_ID))
    sql = sql.where(and_(clm.c.SRVC_RNDRG_TYPE_CD.in_('FANCL HOSP'.split()),
                         clm.c.CLM_SOR_CD != '1104',
                         or_(clm.c.SRC_BILLG_TAX_ID.in_(tin),
                             func.substr(lne.c.RNDRG_PROV_ID, 1, 9).in_(tin)),
                         clm.c.CLM_ITS_HOST_CD != 'HOME',
                         lne.c.CLM_LINE_SRVC_STRT_DT.between(start_d, end_d),
                         lne.c.ADJDCTN_DT.between(from_d, thru_d),
                         clm.c.ADJDCTN_DT.between(from_d, thru_d)))
    sql = sql.order_by(sql.c.CLMNBR, clm.c.CLM_SOR_CD)
    return sql

ans = pd.DataFrame()
for tin in tin_lst:
    with td_engine.connect() as cnxn:
        tmp = pd.read_sql(td_sql(tin), cnxn)
    ans = pd.concat([ans, tmp], ignore_index=True, sort=False)

This next snippet is a bit where I take two dataframes (SQL pulls), then merge and clean them. To be honest, this is someone’s SAS code that I rewrote and validated. They saved a lot of intermediate tables; I instead chained everything together without refactoring.

# Combines the calculation for all subsequent steps.
cols = ['CLMNBR', 'SUBSCRIBER_ID', 'TAXID', 'NETWORKCODE', 'NETWORKS']
lst = []
for state in states:
    lst.append(f'{state} MA PPO ITS')
    lst.append(f'{state} MA PPO' )

(facility.loc[facility.PROV_ST_CD.str.strip().isin(states)]
 .merge(pricer_union, how='left', on=['CLMNBR'])
 .rename({'SUBSCRIBER_ID_y': 'SUBSCRIBER_ID'}, axis=1)
 .drop('SUBSCRIBER_ID_x', axis=1)
 .assign(NETWORKCODE=lambda x: np.where(x.CLM_SOR_CD.str.strip() == '823', x.NETWORKID, x.networkcode))
 .drop_duplicates()
 .drop('networkcode', axis=1)
 .merge(m, how='left', left_on='NETWORKCODE', right_on='NWNW_ID')
 .drop_duplicates()
 .drop('NWNW_ID', axis=1)
 .rename({'NETWORK': 'NETWORKNAME'}, axis=1)
 .assign(NETWORKS=lambda x: np.where(x.NETWORKNAME.isin([' ', np.nan]), x.Network, x.NETWORKNAME))
 .drop(['Network', 'NETWORKNAME'], axis=1)
 .merge(m, how='left', left_on='NETWORKS', right_on='NETWORK')
 .drop(['NETWORKCODE', 'NETWORK', 'CLM_SOR_CD', 'PROV_ST_CD', 'NETWORKID'], axis=1)
 .rename({'NWNW_ID': 'NETWORKCODE'}, axis=1)
 .assign(NETWORK=lambda x: np.where(x.SRC_PRCHSR_ORG_NM.isin(lst), 'Medicare Advantage PPO', x.NETWORKS))
 .drop('NETWORKS', axis=1)
 .rename({'NETWORK': 'NETWORKS'}, axis=1)
 .groupby('CLMNBR')['SUBSCRIBER_ID TAXID NETWORKCODE NETWORKS SRC_PRCHSR_ORG_NM'.split()].first().reset_index()
 .drop('SRC_PRCHSR_ORG_NM', axis=1))[cols]

I think both of these are beautiful and I can give this to virtually anyone and they’d know what’s going on.

What do these two things look like in Julia? From what I’ve seen, nowhere near as elegant. My hope is I’m missing the boat.

Thank you everyone for your thoughtfulness and expertise.

2 Likes

There is an SQLAlchemy.jl package. It hasn’t been updated in a while, but the syntax you can use isn’t too far from your Python code:

db(select([artists[:Name],
             func("count", albums[:Title]) |> label("# of albums")]) |>
     selectfrom(join(artists, albums)) |>
     groupby(albums[:ArtistId]) |>
     orderby(desc("# of albums"))) |> fetchall
1 Like

As someone who isn’t that used to working with tables, I have to say that these might as well have been written in hieroglyphics.

Ehm… I guess this is easier to grok if you know this sort of thing. lne.c.PROD_ID? func.substr? I thought Python avoided abbreviations.

It’s not just those identifiers, though, it’s the whole thing.

The second snippet chains operations together, I think, but I have no idea what the individual operations mean.

I suppose you mean to say that if you know what the code is supposed to do, and are highly familiar with SQL and table manipulation, you can follow along?

6 Likes

You seem to be referring mostly to aesthetic things, which are very subjective. I think most of the code you posted looks ugly. I think Python is too verbose and hard to read.

However, all of what that code does can also be done in Julia. See packages like DataFramesMeta.jl or Query.jl. Both of those allow you to easily chain general table operations.
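For instance, with a recent DataFramesMeta a chained pipeline looks roughly like this (the columns are made up; this is just to show the shape of the code):

using DataFrames, DataFramesMeta, Statistics

df = DataFrame(id = 1:6, group = ["a", "a", "b", "b", "c", "c"], x = rand(6))

result = @chain df begin
    @subset(:x .> 0.2)                  # filter rows
    @transform(:x2 = :x .^ 2)           # add a derived column
    groupby(:group)
    @combine(:mean_x2 = mean(:x2))      # aggregate per group
    sort(:mean_x2, rev = true)
end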

4 Likes

It’s great that you love Python, but I am not sure this is the best place to expand on this.

Please kindly keep in mind that this is a discussion forum for Julia: unless that topic has something to do with Julia, discussions about Python are just noise for most people here.

3 Likes

On the namespace issue, Julia offers a lot of flexibility. You have the choice between using and import.

  • using will bring all the exported function definitions into your current namespace while still allowing you to refer to exported and unexported elements by Module.func. There is no stigma attached to doing this, and it is encouraged.
  • import is more similar to Python syntax. You have to refer to functions using the module name: Module.func.

You can also add a colon to using or import to only bring certain functions into your current namespace.

You can also alias a whole module, if needed, by simple assignment: mo = Module. You can then refer to mo.func.
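A quick sketch of these options, using the Statistics standard library as the example module:

using Statistics                   # exported names (mean, std, …) come into scope
mean([1, 2, 3])                    # works directly
Statistics.median([1, 2, 3])       # qualified access still works

import Statistics                  # Python-style: only the module name is bound
Statistics.mean([1, 2, 3])         # must qualify

using Statistics: std              # bring in just one name
std([1, 2, 3])

const st = Statistics              # alias by simple assignment
st.var([1, 2, 3])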

Overall, if you really want to use Python-like namespace conventions, you can. If you want a flatter namespace, you can do that too. The system is extremely flexible.

For more details see:
https://docs.julialang.org/en/v1/manual/modules/index.html

3 Likes

I will just focus on the first part of this code to translate from Python to Julia:

Let’s start by just writing this in Julia in a very Python-like manner:

cols = ["CLMNBR", "SUBSCRIBER_ID", "TAXID", "NETWORKCODE", "NETWORKS"]
lst  = []
for state in states
    push!(lst,"$state MA PPO ITS")
    push!(lst,"$state MA PPO" )
end

The above looks pretty similar to Python and it works. One small issue is that lst is currently an Array{Any,1} but could be an Array{String,1}.

cols = ["CLMNBR", "SUBSCRIBER_ID", "TAXID", "NETWORKCODE", "NETWORKS"]
lst = String[]
for state in states
    push!(lst,"$state MA PPO ITS")
    push!(lst,"$state MA PPO" )
end

The above code is perfect Julia, but a Julian might end up using comprehensions:

cols = ["CLMNBR", "SUBSCRIBER_ID", "TAXID", "NETWORKCODE", "NETWORKS"]
lst  = [ "$state MA PPO$its" for its = [" ITS", ""], state = states ][:]

The difference here is a matter of style. You could actually do something very similar in Python as well. In Julia, however, I see more people use compact syntax.

EDIT. The following is also valid Python:

cols = ["CLMNBR", "SUBSCRIBER_ID", "TAXID", "NETWORKCODE", "NETWORKS"]
lst  = [ f"{state} MA PPO{its}" for state in states for its in [" ITS", ""] ]
3 Likes

…and keep in mind that extending this code is just a matter of adding another method with the same name, whereas in the case of Python/NumPy you have to modify the function and put another if in it :wink: The extensibility of Julia packages is, I think, a key feature that deserves more attention in comparisons.
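A tiny sketch (module and type names are made up) of what “extending is just adding another method” looks like:

module PkgA
    export area
    area(r::Real) = π * r^2          # the "package" only knows about circles
end

using .PkgA

struct Square
    side::Float64
end

# Extend the package's function for our own type, without touching its source:
PkgA.area(s::Square) = s.side^2

area(2.0)           # ≈ 12.566, the package's original method
area(Square(3.0))   # 9.0, our method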

Btw, in my opinion a comparison between Python and Julia also needs to take into account the C/C++/Fortran code base and glue code in the Python world, since that is effectively mandatory to get performance (which is key) similar to Julia’s. And the low-level NumPy code is really hard to read and, in my opinion, ugly as hell…

3 Likes

Thank you all for your additional thoughts and feedback. It sounds like the best decision for the team is to stay where we are while I work on learning more about Julia, understanding its syntax decisions and getting a better foundation for myself in the language. It doesn’t look like I can recommend it to the Data Science team without hesitation just yet.

Based on the comments, if the real comparison should be made to C/C++, then that effectively brings things to a full stop. The team I have direct influence over needs to work in a high-level language; C/C++, being lower level, is far too out of scope for these individuals. In my mind, the analogy I draw from all the (great) feedback is that I went shopping for a new car and found Julia, but Julia turned out to be a truck. I can’t use a truck, I need a car.

It is absolutely not my intention to start a subjective style contest. My goal here is to simply understand the stated parallel between Julia and Python and it sounds like that parallel isn’t where I was hoping it would be. I’ll continue exploring Julia for myself and maybe will find a turning point somewhere down the road.

Again, thank you all for your excellent thoughts, feedback and time. Please stay safe out there. All the best to you and your loved ones!

2 Likes

The point of the comparison with C/C++ is that you get those performance advantages while working in a high-level language. You appear to have taken away the opposite of the point that was made.

Changing language is of course a big decision, and your current plan seems sound. But Julia is not a ‘truck’, it’s a ‘car’ with the power and storage capacity of a ‘truck’. It’s a bit backwards to hold the fact that it can be compared to C against the language.

17 Likes

In the car versus truck analogy, Julia can be what you need. When you need a car, it can be a car. If you need a truck, it can be a truck. If you need a semi, a backhoe, or a minivan, it can also be that. Importantly, all these vehicles are meant to work well together. You can even build hybrids.

If you are looking around the Julia codebase and want everything to be a car, then obviously that is not going to happen because some people are using it as a truck or a minivan.

Let’s go back to Python for a minute. What is apparent to me is that there is not really a single Python language anymore. There are distinct dialects. Not only are there PyPy and CPython, but you also have Numba-based Python versus NumPy/SciPy-based Python. Even there, some people prefer not to use SciPy. You can mix these things together, but the result is often suboptimal. If you want the fastest code, sometimes you need to eschew NumPy and stick entirely with Numba-optimized code. We could continue by talking about PyTorch and TensorFlow. My point is that there are effectively different worlds within Python, and importantly it is difficult to bridge between them without compromising, which is what keeps these worlds separate. I don’t mean to denigrate Python here, because the organic evolution of the language has been fantastic. However, that also adds some legacy which limits Python. When you say you want a clean codebase like Python’s, I am honestly not sure which Python you are talking about. Much of what underlies Python is not written in Python. The way you describe Python seems very idealistic and does not really match the reality of its many dialects.

In part, Julia is a reaction to the sharding of Python. We want the full convenience of NumPy and the full speed of Numba without compromises. Julia is what you get when you see all these added-on features for numerical processing and then build a new language that supports them at its very core rather than as an add-on.

At the end of the day, if you want to write code that looks like a particular dialect of Python, even if it takes a few more lines to do it, there is absolutely nothing in Julia or its community stopping you from doing that. While some of those features are not necessarily supported in core Julia, there are often packages that provide them. In Julia, we support a diversity of coding styles.

There are also features that support writing R-like code, for example the pipe syntax used above. You don’t have to use it, however; from your original question, perhaps it is not to your taste. If you want a purely procedural syntax, Julia is flexible enough to allow that as well.
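Just to illustrate that the pipe really is optional sugar, these two lines do exactly the same thing:

xs = [3, 1, 2, 2]
sort(unique(xs))        # plain nested calls
xs |> unique |> sort    # the same thing, pipeline style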

We brought up C++ because another important feature of Julia is that you can do low level operations in Julia as well. To do the same thing in Python, you would end up having to use a low level language like C++. To be clear, you don’t have to use these features. The best way in most cases is to use a Julia package which uses these features inside the package but exposes a simple procedural or macro-based interface to allow you to take advantage of it.

The ability to write high-level Python-like syntax while also being able to access low-level operations means that we can write everything in a single interoperable programming environment. This has the advantage of composability in that we can often independently design new packages and then combine them in a synergistic manner to achieve new functionality.

Julia is designed to support many programming tasks in a single language from low level operations to high-level syntactical flourishes. You don’t have to use all of the language. You can write something in the middle of that range but still interoperate with both extremes of that range. There is a subset which is very much like the nicely formatted Python syntax you used. If you want to use that, this community will support you. That in and of itself is a really nice feature of Julia.

20 Likes

One aspect of Julia that never ceases to amaze me: you can fix someone else’s code without touching it. I realized once more how powerful this is when dealing with the following issue: Problem solving a simple SDP in ProximalAlgorithms.jl · Issue #36 · JuliaFirstOrder/ProximalAlgorithms.jl · GitHub

A method was missing that could handle a certain projection for Matrix objects; it existed only for Symmetric or Hermitian objects. Sure, I will fix it, it just needs some wrapping/unwrapping, but in the meantime? Well, in the meantime the user can patch it by adding the missing method in their own script/package, without needing to touch my package. Problem solved!
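A sketch of the kind of user-side patch I mean (SomePkg and project_psd are made-up names; the real details are in the linked issue):

using LinearAlgebra

module SomePkg
    using LinearAlgebra
    # the "package" only has a method for Symmetric matrices
    function project_psd(A::Symmetric)
        F = eigen(A)
        Symmetric(F.vectors * Diagonal(max.(F.values, 0)) * F.vectors')
    end
end

# User-side patch, in the user's own script: forward plain matrices to the
# existing method by wrapping them, without editing the package.
SomePkg.project_psd(A::Matrix) = SomePkg.project_psd(Symmetric(A))

SomePkg.project_psd([2.0 -1.0; -1.0 -3.0])   # now works on a plain Matrix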

5 Likes