How is Julia Like Python?

OP, I think that someone coming from Excel can still definitely learn Julia. Perhaps I am too far removed from learning to code, but I think the things you describe actually make Julia easier for beginners than Python.

  • Function bang, e.g. select!(df). This tells the user that the function modifies its argument. The user doesn’t have to experiment with the function, or read through documentation or source code, to find this out.
  • Broadcasting, e.g. y = x .* z. This is also explicit, as it tells the reader that the function is being applied to every element of an array. You don’t have to wonder whether a function is being applied to the whole array or element by element.
  • Piping notation. Python’s version of piping is method chaining, x.foo().bar(), so it’s hard to argue that it’s better than the alternatives. If you are referring to syntax like foo ∘ bar (function composition), note that this isn’t commonly used and you don’t have to use it. Plus, dplyr-style %>% piping in R is very common and seems to be popular with people coming from Excel.
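Python has no syntax for the bang convention, but its standard library follows a loosely similar, unenforced habit: in-place methods such as `list.sort()` mutate their object and return `None`, while `sorted()` returns a new list. A minimal sketch of the contrast, with the Julia counterparts `sort` and `sort!` noted in comments:

```python
data = [3, 1, 2]

# Like Julia's sort(data): returns a new list and leaves `data` untouched.
new_list = sorted(data)
print(data, new_list)   # [3, 1, 2] [1, 2, 3]

# Like Julia's sort!(data): modifies `data` in place. Nothing in the name
# says so; the only hint is that the method returns None.
result = data.sort()
print(data, result)     # [1, 2, 3] None
```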

Additionally, note that you don’t have to teach the entire ecosystem at once. If people are coming from Excel, you might just want to teach them the basics and how to work with dataframes. Then they can import Excel (or preferably .csv) files and stay entirely within the DataFrames ecosystem.

  • function bang - actually a pretty simple convention, but you are not obliged to use it
  • macros - there are some quite practical and easy-to-use macros like @show. Those who come from Excel don’t need to write macros of their own, just as they are unlikely to write decorators in Python
  • piping notation - actually easy to understand and to use, but you don’t necessarily need it
  • Unicode - isn’t π better than np.pi, math.pi, or scipy.pi?
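For readers coming from Python, the last two points have rough Python counterparts: identifiers may contain Unicode letters (PEP 3131), so `π` is a legal variable name, and the `=` specifier in f-strings (Python 3.8+) prints an expression alongside its value, somewhat like Julia’s `@show`. A small sketch; the printed format differs from `@show`’s `area = ...`:

```python
import math

# PEP 3131 allows Unicode letters in identifiers, so this is valid Python.
π = math.pi
r = 2.0
area = π * r**2

# f"{expr=}" (Python 3.8+) prints the expression text followed by its value,
# which is roughly what Julia's `@show area` does.
print(f"{area=}")
```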

For those who want to learn Julia as their first programming language, the book Think Julia could be a good starting point.


Enter Python. I fell in love immediately. It’s beautiful, succinct, has a very broad, extensible scope, and is easily readable to anyone who doesn’t know Python. They may not understand how to write the code, but generally speaking they can follow what it’s doing.

I disagree. My first programming language was JavaScript, and when I started searching for a language for my data analytics needs, I settled on Julia. As a result, I’ve never coded anything in Python. Recently a colleague sent me some Python code to review and validate, and I found that I couldn’t make sense of what was going on. I had to have him walk me through the code line by line before I could understand what it was doing.

IMHO, it’s really about what you’re used to. If your native spoken/written language is one that is based on a Latin alphabet, then looking at something written in Cyrillic script may seem very strange and foreign. The same thing happens with programming languages - if you are used to looking at Python code, Julia code may seem very foreign and strange (although in many cases I’ve found that it actually looks quite similar).


I’ll also make a stab at addressing these:

  • Function bang: not really syntax, just a convention, like using underscores in Python for private fields or special methods, or using CamelCase for type names and snake_case for functions. It’s just a nice and informative convention.
  • Broadcast dots, .+ etc.: This is maybe my favourite thing about Julia. A super-easy, explicit, universally general way to vectorize (in the Matlab sense) all functions and operators. You see a dot, you know it’s broadcasting over a container. If you see numpy.cos(x) in Python, can you tell what’s going on? Or do you have to wonder what happens when you throw an array at a function? In Julia, put a dot on it, and it works! This feature I will actually officially label ‘genius’.
  • Macros: you don’t need them, write all your code with no macros!
  • Piping: do you mean |>? I’ve almost never used it, so I have no opinion, but I don’t see anything particularly bad about it.
  • Unicode characters: You don’t have to use them at all. You can write all your code without them, and Base code doesn’t use them much. Most uses I’ve seen are things like π for pi, or cosθ instead of cos_theta. It is just fantastically nice for mathematical code, but you can ignore it completely if you prefer; you wouldn’t be the only one. There is absolutely nothing in the language that requires Unicode symbols.
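Python has no `|>` operator. A hypothetical helper such as `pipe` below (not part of the standard library) shows what Julia’s `x |> f |> g` means: each function is applied to the result of the previous one, left to right, i.e. `g(f(x))`.

```python
from functools import reduce


def pipe(value, *funcs):
    """Apply funcs left to right: pipe(x, f, g) == g(f(x)).

    Hypothetical helper mimicking Julia's `x |> f |> g`.
    """
    return reduce(lambda acc, f: f(acc), funcs, value)


words = ["b", "a", "c"]
result = pipe(words, sorted, ", ".join, str.upper)
print(result)   # A, B, C
```

In Julia the last expression would read roughly `words |> sort |> (v -> join(v, ", ")) |> uppercase`.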

@jasheldo, I came to Julia via R, but I didn’t decide against R. I choose which language to use based on the problem at hand.

How is Julia anything like Python?

I have never understood Julia to be intended as another R or Python, either. But in the end, based on Logan’s post, I hope that Julia will continue to be a programming language and community for everyone, which for me means that it will become even more user-friendly.


The beauty of Julia for me is that when I started, I had little to no understanding of what happens “under the hood”. Like you, I basically only used R and Python, and while I knew that some packages are fast, vectorization is cool, etc., I had no idea why some pieces of code ran faster than others. It was largely hit-and-miss, with accumulated wisdom and tricks from Stack Overflow and hours of experimenting.

With Julia, the guide is as clear as can be about why something works well and something doesn’t. Only about 10 months into Julia, I (largely) understand what happens under the hood (in fact, you can even see it with macros such as @code_lowered, @code_typed, @code_llvm and @code_native), as well as parametric types, multiple dispatch, etc. If you had asked me about these things with my R and Python background, I would have said “too advanced for me to know, I’ll ask my friend who did CS”. With Julia, I have felt able to cross those boundaries without much hesitation about what’s “too advanced” or not. It’s a really cool feeling to see the interaction between your code and what’s going on “under the hood”; Julia is a beautiful language that allows you to do that!
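Python’s closest standard-library counterpart to those Julia macros is the `dis` module, which disassembles a function into CPython bytecode. It stops at bytecode (there is no native-code view like `@code_native`), and opcode names vary between Python versions, so the sketch below prints them without promising specific output:

```python
import dis


def add(a, b):
    return a + b


# Human-readable bytecode listing, loosely comparable to @code_lowered.
dis.dis(add)

# The same instructions as data: collect the opcode names.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```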


@Gunter_Faes Just a small off-topic post to say that the community standards explicitly ask not to play on the name Julia to imply it is a person you could date - you might want to consider editing your post.

And on topic: clearly Julia is like Python because everyone’s favourite Python feature is comprehensions, and Julia’s comprehensions are pretty good, too!
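For anyone comparing the two, a quick sketch of Python comprehensions with the corresponding Julia written in comments. One difference worth knowing: Julia’s multi-iterator form `[i * j for i in 1:3, j in 1:3]` produces a 3×3 matrix directly, while plain Python needs a nested list comprehension:

```python
# Julia: [x^2 for x in 1:5 if isodd(x)]
odd_squares = [x**2 for x in range(1, 6) if x % 2 == 1]
print(odd_squares)   # [1, 9, 25]

# Julia: [i * j for i in 1:3, j in 1:3]  (a 3x3 Matrix)
# Plain Python: a list of rows built with a nested comprehension.
table = [[i * j for j in range(1, 4)] for i in range(1, 4)]
print(table)   # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```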


Julia is like Python in that it’s high-level, dynamic and easy to write simple scripts with. In terms of syntax and linear algebra, I always found it much closer to Matlab, which is the language I’d compare it with. But I guess Python is just more popular, so that’s the comparison that’s more likely to be made.


@nilshg, thanks for the hint, I have removed it!

And on topic: clearly Julia is like Python because everyone’s favourite Python feature is comprehensions, and Julia’s comprehensions are pretty good, too!

… Interesting statement! :thinking:


Clearly this is a matter of taste. Python syntax is adequate in most cases, but I can’t stand numpy and pandas. I much prefer doing vector and data frame manipulations in R (and Julia) than in Python. R may have a steeper learning curve than Python, but the effort required to learn the language pays off. Similarly, Julia probably has a steeper learning curve than Python, but the payoff is a language that is, in my opinion, both more powerful and more expressive than Python.


@jasheldo, one important thing to realize is that the code in Julia Base and many of the core packages is hyper-optimized, and that this is precisely the code that no one writes in Python. These packages shouldn’t be compared to Python, but rather to C libraries with Python bindings, as that is how you would write equivalent code in Python. If you want good examples of clean, simple Julia, look at smaller packages that were written quickly to get a job done, rather than to eke out every last drop of performance.


@jasheldo

  1. Things in Julia can be written in many different ways; I’ve learned to write them in whatever way is most convenient for me.
    Specific examples (MWEs) are the best way for us to have a productive conversation.
    I bet if you shared MWEs of a few scripts, we could find a lot of different ways to code them.
    As others have explained, many things are optional:
    if you don’t like unicode, you don’t have to use it
    if you don’t like pipeline notation |>, no need to
  2. Give it time:
    I’ve come to appreciate & love some features of Julia syntax I wasn’t comfortable with at first (such as |>)
  3. My background is in Matlab, where I had to avoid loops like the plague & learned to vectorize as much of my code as possible. Often this resulted in ugly, awkward code that was difficult to read/write/debug.
    See: Julia slower than Matlab & Python? No. See the code in that post & decide for yourself.
    I was able to replace awkward vectorizations in numpy/Matlab w/ simple loops in Julia & got better performance.
    I concluded:

I’ve looked at the code in various Julia packages and base functions – and it’s nothing like the code I write in Julia. I conclude the people who programmed those are much better programmers than me and are working on a higher plane of existence. :wink: Remember, a Real Programmer can write FORTRAN in any language.

I recently did a little Python programming and was reminded of how ugly it looks. Here is a little snippet of ugly Python code:

    p=regtree.predict(x)
    print("size p = ",np.shape(p))
    ssm = np.column_stack((y,p))
    ssx=pd.DataFrame(ssm)
    ssx['Pred-Act']=ssx[1]-ssx[0]
    ssx.rename(columns={0:'Act', 1:'Pred'},inplace=True)
    ssx['sq error']=ssx['Pred-Act']**2
    sse=ssx['sq error'].sum()
    print("SSE = ",sse)
    ssx['Pred-Act'].hist()
    e_max=ssx['Pred-Act'].max()
    e_min=ssx['Pred-Act'].min()
    print(e_min,e_max)
    print(ssx['Pred-Act'].quantile(q=[0.05,0.10,0.25,0.50,0.75,0.90,0.95]))
    sc=regtree.score(x,y)
    print("Tree model Regression score = ",sc)
    print(ssx)
    sys.stdout.flush()

Python ends up with a lot of dots: pd. (Pandas), np. (Numpy), and their functions and methods .column_stack, .sum(), .max(), .min(), .quantile(), .score(x,y) and so on. Special ire for the sys.stdout.flush() calls I have to sprinkle through the code to force the output to be written as the program progresses.
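On the `sys.stdout.flush()` point: since Python 3.3, `print` accepts `flush=True`, which flushes after that single call (running the script with `python -u` makes the output streams unbuffered altogether). A sketch of the error-summary part of the snippet above using only the standard library; the function name `report_errors` and the toy inputs are made up for illustration:

```python
def report_errors(actual, predicted):
    """Print simple prediction-error statistics and return the SSE.

    Hypothetical helper: flush=True pushes each line out immediately,
    so no separate sys.stdout.flush() call is needed.
    """
    errors = [p - a for a, p in zip(actual, predicted)]   # Pred - Act
    sse = sum(e**2 for e in errors)
    print("SSE =", sse, flush=True)
    print(min(errors), max(errors), flush=True)
    return sse


# Toy values, not data from the original post.
sse = report_errors([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 4.0])
```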

With Julia, there is no Numpy: arrays are built in. Broadcasting gets rid of the need for the lots of fiddly little methods that Python needs.
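Plain Python has no built-in equivalent of Julia’s dot syntax. The following hypothetical helper, `broadcast`, is restricted to flat lists and sketches only the core idea: scalar arguments are repeated to match the list arguments, and the function is applied element by element. It is neither NumPy’s nor Julia’s implementation, and it ignores multi-dimensional shapes and length-1 expansion.

```python
import operator


def broadcast(f, *args):
    """Apply f element by element, repeating scalar arguments as needed.

    Hypothetical 1-D sketch of Julia's dot syntax (f.(x, y), sx .- x).
    """
    lengths = {len(a) for a in args if isinstance(a, list)}
    if not lengths:                 # all scalars: just call f
        return f(*args)
    if len(lengths) > 1:
        raise ValueError("list arguments must have the same length")
    n = lengths.pop()
    columns = [a if isinstance(a, list) else [a] * n for a in args]
    return [f(*items) for items in zip(*columns)]


sx = 10
x = [3, 4, 5]
dx = broadcast(operator.sub, sx, x)   # Julia: dx = sx .- x
print(dx)   # [7, 6, 5]
```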

Here’s some of my Julia code:

            dx = sx .- x
            d = BigInt(minimum(dx))
            di = findfirst(==(d), dx)
            if di !== nothing
                # swap card with Bob
                push!(y, x[di])
                deleteat!(x, di)
                global sx = BigInt(sum(x)) # Alice
                global sy = BigInt(sum(y)) # Bob
            else
                # we failed
                #print("We failed di = nothing\n")
                global retcode = 0
                break
            end

Here you see the use of broadcasting (dx = sx.-x, where sx is a scalar and x is a vector; the result, dx, is a vector), the minimum() and sum() functions, and the nothing value.

Seems a lot cleaner to me. It doesn’t have superfluous colons after for loops and if-then-else statements.
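For comparison, here is a rough Python port of the Julia fragment above, packaged as a hypothetical function `swap_step`. The surrounding loop, `retcode` and the global state are not shown in the original, so they are replaced by a return value; returning `None` when Alice has no cards left merely stands in for the Julia `else` branch. Python integers are arbitrary precision, so no `BigInt` conversion is needed:

```python
def swap_step(x, y):
    """Move one card from Alice's hand x to Bob's hand y.

    Returns the new totals (sum(x), sum(y)), or None if x is empty
    (a stand-in for the Julia "we failed" branch).
    """
    sx = sum(x)                  # Alice's total
    dx = [sx - xi for xi in x]   # Julia: dx = sx .- x
    if not dx:                   # nothing to swap
        return None
    d = min(dx)
    di = dx.index(d)             # first position of the minimum, like findfirst
    y.append(x.pop(di))          # Julia: push!(y, x[di]); deleteat!(x, di)
    return sum(x), sum(y)


alice, bob = [5, 9, 2], [4]
print(swap_step(alice, bob))   # (7, 13)
print(alice, bob)              # [5, 2] [4, 9]
```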

Julia also now has the “missing” datatype. In Python, missing data is a problem: some people use the built-in None, while others use np.nan (Numpy), and then there’s math.nan and float('nan'). Packages, of course, aren’t consistent in which one they use.
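A small illustration of why mixed missing-value markers are awkward in Python: `None` has to be tested with `is`, while NaN (whether it comes from `float('nan')`, `math.nan` or `np.nan`, which are all Python floats) is not even equal to itself, so it needs `math.isnan`. The `is_missing` helper is hypothetical, one common convention rather than a standard; Julia instead has a single `missing` value with `ismissing` and `skipmissing`:

```python
import math

values = [1.0, None, float("nan"), math.nan, 4.0]

# NaN compares unequal to everything, itself included, so == cannot find it.
print(math.nan == math.nan)   # False


def is_missing(v):
    """Treat None and NaN as missing (a hypothetical convention)."""
    return v is None or (isinstance(v, float) and math.isnan(v))


present = [v for v in values if not is_missing(v)]
print(present)   # [1.0, 4.0]
```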

Also, for a dynamic language, one seems to stumble over variable types frequently. Package A will error out with, “I didn’t want that type, I want this other type”, so you have to do a type conversion before you call it. Julia has multiple dispatch, and that appears to clean up this problem quite a bit.
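Python’s standard library does offer a limited form of this: `functools.singledispatch` selects an implementation based on the type of the first argument only, whereas Julia’s multiple dispatch considers the types of all arguments. A minimal sketch; the function `describe` is made up for illustration:

```python
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback, used when no more specific type has been registered.
    return f"something else: {value!r}"


@describe.register
def _(value: int):
    return f"an integer: {value}"


@describe.register
def _(value: list):
    return f"a list of {len(value)} items"


print(describe(3))        # an integer: 3
print(describe([1, 2]))   # a list of 2 items
print(describe("hi"))     # something else: 'hi'
```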

The one thing that gets me a little batty is the scoping rules. Still haven’t quite mastered those.

Finally, Python runs at glacial speed. The response to that is, “yes, but for stuff you need to run fast, there are packages compiled in C or C++ that are fast” – the two language problem has been institutionalized in their thinking.


I have ported quite a lot of Python code to Julia, and it very often consists of removing clutter, which directly makes the code much cleaner. Below is an example: the Python code is very hard to parse, while the Julia equivalent is very clean.

for t in range(T):
    sigma[t, :, :] = np.vstack([
        np.hstack([
            sigma[t, idx_x, idx_x],
            sigma[t, idx_x, idx_x].dot(traj_distr.K[t, :, :].T)
        ]),
        np.hstack([
            traj_distr.K[t, :, :].dot(sigma[t, idx_x, idx_x]),
            traj_distr.K[t, :, :].dot(sigma[t, idx_x, idx_x]).dot(
                traj_distr.K[t, :, :].T
            ) + traj_distr.pol_covar[t, :, :]
        ])
    ])
    mu[t, :] = np.hstack([
        mu[t, idx_x],
        traj_distr.K[t, :, :].dot(mu[t, idx_x]) + traj_distr.k[t, :]
    ])
    if t < T - 1:
        sigma[t+1, idx_x, idx_x] = \
                Fm[t, :, :].dot(sigma[t, :, :]).dot(Fm[t, :, :].T) + \
                dyn_covar[t, :, :]
        mu[t+1, idx_x] = Fm[t, :, :].dot(mu[t, :]) + fv[t, :]
return mu, sigma

The Julia equivalent:

for i = 1:N-1
    K,Σ           = traj.K[:,:,i], traj.Σ[:,:,i]
    Σₙ[ix,ix,i+1] = fx[:,:,i]*Σₙ[ix,ix,i]*fx[:,:,i]' + R1 # Iterate dLyap forward
    Σₙ[iu,ix,i]   = K*Σₙ[ix,ix,i]
    Σₙ[ix,iu,i]   = Σₙ[ix,ix,i]*K'
    Σₙ[iu,iu,i]   = K*Σₙ[ix,ix,i]*K' + Σ
end

Most numerical processing code in Python hurts my eyes, to be honest; it doesn’t look anything at all like the math it’s implementing.
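Written out as math, the Julia loop above computes the following recursion. The symbols are chosen here to mirror the code: $F_i$ is `fx[:,:,i]`, $K_i$ is `traj.K[:,:,i]`, $\Sigma_i$ is `traj.Σ[:,:,i]`, and $\Sigma^{xx}_{n,i}$, $\Sigma^{ux}_{n,i}$, $\Sigma^{xu}_{n,i}$, $\Sigma^{uu}_{n,i}$ are the `[ix,ix,i]`, `[iu,ix,i]`, `[ix,iu,i]` and `[iu,iu,i]` blocks of `Σₙ` (Julia’s `'` is the adjoint, i.e. the transpose for real matrices):

```latex
\begin{aligned}
\Sigma^{xx}_{n,i+1} &= F_i\,\Sigma^{xx}_{n,i}\,F_i^{\top} + R_1,\\
\Sigma^{ux}_{n,i}   &= K_i\,\Sigma^{xx}_{n,i},\\
\Sigma^{xu}_{n,i}   &= \Sigma^{xx}_{n,i}\,K_i^{\top},\\
\Sigma^{uu}_{n,i}   &= K_i\,\Sigma^{xx}_{n,i}\,K_i^{\top} + \Sigma_i.
\end{aligned}
```

Each line of the loop body maps onto one line of this recursion.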


While I agree that a lot of code in Base has a bit of extra complexity, it is surprising how little is needed to achieve great performance with generic code. Even the more convoluted parts have just 3–4 layers.

Perhaps newcomers to Julia should not start with these, but later on reading code in Base and the standard libraries is a great way to learn idiomatic Julia.


My impression, too, is that there is an emphasis on keeping Base code ‘clean’ and free of ‘outlandish’ hyper-optimization tricks. Generic and elegant solutions are preferred, and those are then made fast, rather than relying on ad-hoc performance hacks.


I guess that after one gets over superficial differences of syntax and functions names, the most unusual aspect of Julia may be the organization of code into

  1. small functions that do (mostly) one thing,
  2. with specialized methods for various situations as necessary,
  3. helper methods that just implement some logic that will be applied by multiple dispatch.

Seeing tiny functions that seemingly “do nothing”, just split and rearrange arguments or add some cryptic object (e.g. a trait), may be confusing at first.
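To make the “tiny function that does nothing” pattern concrete, here is a hedged Python transliteration of Julia’s trait idiom (the style Base uses with `IndexStyle`, which returns `IndexLinear()` or `IndexCartesian()`). The names `access_style`, `last_item`, `Indexable` and `OnlyIterable` are hypothetical, and `functools.singledispatch` stands in for Julia’s dispatch on the trait argument:

```python
from functools import singledispatch


class Indexable:
    """Trait: the object supports obj[i]."""


class OnlyIterable:
    """Trait: the object can only be iterated."""


def access_style(obj):
    # A tiny helper that "does nothing" except classify its argument.
    return Indexable() if hasattr(obj, "__getitem__") else OnlyIterable()


def last_item(obj):
    # Public entry point: compute the trait, then forward to the right method.
    return _last_item(access_style(obj), obj)


@singledispatch
def _last_item(style, obj):
    raise TypeError(f"no method for {type(style).__name__}")


@_last_item.register
def _(style: Indexable, obj):
    return obj[-1]          # fast path: direct indexing


@_last_item.register
def _(style: OnlyIterable, obj):
    item = None
    for item in obj:        # fallback: walk the whole iterator
        pass
    return item


print(last_item([1, 2, 3]))         # 3
print(last_item(iter([4, 5, 6])))   # 6
```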


Thank you to everyone who’s chimed in to offer constructive feedback and thoughts. It’s clear there are a lot of passionate folks in the Julia community. It’s very heartwarming.

I appreciate the comments/thoughts/suggestions from everyone! Let me provide a more succinct snippet of what I do in Python that I love because of how clean the code is, at least to my eyes.

This first part is a bit of SQLAlchemy. I absolutely adore SQLAlchemy for many reasons but mostly because it saves me from having to write raw SQL.

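import pandas as pd
from sqlalchemy import Table, and_, case, func, or_, select

# td_engine, td_metadata and tin_lst are defined elsewhere in the script.
start_d, end_d = '2019-01-01', '2019-12-31' # rolling 12 month incurred date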
from_d, thru_d = start_d, '2020-12-31' # rolling 12 month paid-thru (runout) date

def td_sql(tin):
    """Pull the data from Teradata that will be used as a basis for the CATNIP process."""
    SCHEMA = 'SCHEMA'
    clm = Table('CLM', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    lne = Table('CLM_LINE', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    clm_max_rvsn = Table('CLM_MAX_RVSN', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    coa = Table('CLM_LINE_COA', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    patch = Table('MEDCR_PROV_ANLYTC_PATCH', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)
    xwalk = Table('MEDCR_ANLYTC_PADRS_XWALK', td_metadata, autoload=True, autoload_with=td_engine, schema=SCHEMA)

    prod_id3 = 'ADY ADZ AET AEY ALW ALX ALY ALZ ANH ANI ANJ ANK AVC AVH AVT AVY TCA TCB TCC TCN '\
    'TCO TCP TCS TCT TCU TCV TCG AUS TCI AUC AUU AUT AXF AYE TCJ TCH AXA WCA WCB WCC '\
    'WCD WCE WCK WCL WCM WCN'.split()
    prod_id = 'HXBF HXDJ HXCH HXEE HXBG HXCR'.split()
    prod_id_bad = 'AVYW0009 AVYW0010'.split()

    ntwk = case([(or_(and_(func.substr(lne.c.PROD_ID, 1, 3).in_(prod_id3),
                           ~lne.c.PROD_ID.in_(prod_id_bad)),
                      lne.c.PROD_ID.in_(prod_id)), 'Blue Preferred'),
                 (coa.c.CMPNY_CF_CD == 'G0423', 'Blue Preferred'),
                 (lne.c.PROD_ID == 'MIMC', 'Medicaid PPO'),
                 (and_(lne.c.PROD_ID.like('%HX%'),
                       coa.c.MBU_CF_CD.like('IN%')), 'HIX'),
                 (or_(coa.c.PROD_CF_CD.like('%MA'),
                      coa.c.PROD_CF_CD.like('%MS')), 'Medicare Advantage PPO')],
                else_='Blue Access')

    sql = select([clm.c.CLM_NBR.label('CLMNBR'),
                  clm.c.CLM_SOR_CD,
                  clm.c.SRC_SBSCRBR_ID.label('SUBSCRIBER_ID'),
                  func.substr(clm.c.NTWK_ID, 1, 12).label('NETWORKID'),
                  ntwk.label('Network'),
                  case([(clm.c.CLM_SOR_CD == '896', lne.c.RNDRG_PROV_ID)],
                       else_=clm.c.SRC_BILLG_TAX_ID).label('TAXID'),
                  xwalk.c.PROV_ST_CD,
                  clm.c.SRC_PRCHSR_ORG_NM]).distinct()
    sql = sql.select_from(clm.join(lne, clm.c.CLM_ADJSTMNT_KEY == lne.c.CLM_ADJSTMNT_KEY)
                          .join(clm_max_rvsn, clm_max_rvsn.c.CLM_ADJSTMNT_KEY == clm.c.CLM_ADJSTMNT_KEY)
                          .outerjoin(coa, and_(lne.c.CLM_ADJSTMNT_KEY == coa.c.CLM_ADJSTMNT_KEY,
                                               lne.c.CLM_LINE_NBR == coa.c.CLM_LINE_NBR))
                          .outerjoin(patch, patch.c.CLM_ADJSTMNT_KEY == clm.c.CLM_ADJSTMNT_KEY)
                          .outerjoin(xwalk, xwalk.c.RPTG_MEDCR_ID == patch.c.RPTG_MEDCR_ID))
    sql = sql.where(and_(clm.c.SRVC_RNDRG_TYPE_CD.in_('FANCL HOSP'.split()),
                         clm.c.CLM_SOR_CD != '1104',
                         or_(clm.c.SRC_BILLG_TAX_ID.in_(tin),
                             func.substr(lne.c.RNDRG_PROV_ID, 1, 9).in_(tin)),
                         clm.c.CLM_ITS_HOST_CD != 'HOME',
                         lne.c.CLM_LINE_SRVC_STRT_DT.between(start_d, end_d),
                         lne.c.ADJDCTN_DT.between(from_d, thru_d),
                         clm.c.ADJDCTN_DT.between(from_d, thru_d)))
    sql = sql.order_by(sql.c.CLMNBR, clm.c.CLM_SOR_CD)
    return sql

ans = pd.DataFrame()
for tin in tin_lst:
    with td_engine.connect() as cnxn:
        tmp = pd.read_sql(td_sql(tin), cnxn)
    ans = pd.concat([ans, tmp], ignore_index=True, sort=False)  # concat takes ignore_index, not reset_index

This next snippet is where I take two dataframes (SQL pulls), merge them, and clean them. To be honest, this is someone’s SAS code that I rewrote and validated. They saved a lot of subsequent tables; I instead chained everything together without refactoring.

# Combines the calculation for all subsequent steps.
cols = ['CLMNBR', 'SUBSCRIBER_ID', 'TAXID', 'NETWORKCODE', 'NETWORKS']
lst = []
for state in states:
    lst.append(f'{state} MA PPO ITS')
    lst.append(f'{state} MA PPO')

(facility.loc[facility.PROV_ST_CD.str.strip().isin(states)]
 .merge(pricer_union, how='left', on=['CLMNBR'])
 .rename({'SUBSCRIBER_ID_y': 'SUBSCRIBER_ID'}, axis=1)
 .drop('SUBSCRIBER_ID_x', axis=1)
 .assign(NETWORKCODE=lambda x: np.where(x.CLM_SOR_CD.str.strip() == '823', x.NETWORKID, x.networkcode))
 .drop_duplicates()
 .drop('networkcode', axis=1)
 .merge(m, how='left', left_on='NETWORKCODE', right_on='NWNW_ID')
 .drop_duplicates()
 .drop('NWNW_ID', axis=1)
 .rename({'NETWORK': 'NETWORKNAME'}, axis=1)
 .assign(NETWORKS=lambda x: np.where(x.NETWORKNAME.isin([' ', np.nan]), x.Network, x.NETWORKNAME))
 .drop(['Network', 'NETWORKNAME'], axis=1)
 .merge(m, how='left', left_on='NETWORKS', right_on='NETWORK')
 .drop(['NETWORKCODE', 'NETWORK', 'CLM_SOR_CD', 'PROV_ST_CD', 'NETWORKID'], axis=1)
 .rename({'NWNW_ID': 'NETWORKCODE'}, axis=1)
 .assign(NETWORK=lambda x: np.where(x.SRC_PRCHSR_ORG_NM.isin(lst), 'Medicare Advantage PPO', x.NETWORKS))
 .drop('NETWORKS', axis=1)
 .rename({'NETWORK': 'NETWORKS'}, axis=1)
 .groupby('CLMNBR')['SUBSCRIBER_ID TAXID NETWORKCODE NETWORKS SRC_PRCHSR_ORG_NM'.split()].first().reset_index()
 .drop('SRC_PRCHSR_ORG_NM', axis=1))[cols]

I think both of these are beautiful and I can give this to virtually anyone and they’d know what’s going on.

What do these two things look like in Julia? From what I’ve seen, nowhere near as elegant. My hope is I’m missing the boat.

Thank you everyone for your thoughtfulness and expertise.


There is an SQLAlchemy.jl package. It hasn’t been updated in a while, but the syntax you can use isn’t too far from your Python code:

db(select([artists[:Name],
             func("count", albums[:Title]) |> label("# of albums")]) |>
     selectfrom(join(artists, albums)) |>
     groupby(albums[:ArtistId]) |>
     orderby(desc("# of albums"))) |> fetchall