I’ve recently started a side project at work to see if I can use `julia` to replace `python` for the majority of my data processing, due to the possible gains in performance. Over the years the python code has become quite quick thanks to `numpy` and `numba`, so I’ve been quite curious as to what `julia` could bring to the party.

So I’ve done a very basic initial port of the code to get some of the basic processes operational. This is probably not very well written `julia` code, but it gives the correct results and already seems to be almost twice as quick as the python code. I’ll describe a few basic functions and hopefully someone can critique them and see if there are some simple tweaks that could make things quicker. Asking for guidance early on is usually a good way to ensure time is not wasted.

I’m reading my data from an HDF5 file, so I have a basic function that reads the required datasets:

```julia
function get_generic_property(filepath::String, property::String)
    fid = h5open(filepath, "r")
    timestep_key = keys(fid["Data"])[1]
    query = read(fid["Data"][timestep_key]["groupA"][property])
    close(fid)
    return query
end
```

So most of the functions make several calls like this to get the raw data, depending on what’s being processed; others make just one. I don’t think there’s any optimisation to be done here, as it’s just using `HDF5.jl`.
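One thing I did wonder about: `h5open` also has a do-block form that closes the file automatically even if a read throws. A sketch of the same function in that style (behaviour should otherwise be identical; `get_generic_property_do` is just a placeholder name):

```julia
using HDF5

# Do-block variant: the file handle is closed automatically when the block
# exits, including on error, so the explicit close(fid) is no longer needed.
function get_generic_property_do(filepath::String, property::String)
    h5open(filepath, "r") do fid
        timestep_key = keys(fid["Data"])[1]
        read(fid["Data"][timestep_key]["groupA"][property])
    end
end
```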

The first processing function is like this:

```julia
function normalize_by_row(arr)
    arr ./ sqrt.(sum(arr.^2, dims=2))
end

function getFabricTensor(filepath::String)
    cv1 = get_generic_property(filepath, "cvecs")
    norm_vec = normalize_by_row(cv1)
    tensor_prod = norm_vec' * norm_vec
    return tensor_prod / size(norm_vec)[1]
end
```

where `cv1` is an `nx3` array of `Float64` and `n` is typically in the region of 50-100K.
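One variant I have been meaning to try: `sum(abs2, arr, dims=2)` computes the same row sums without allocating the intermediate `arr.^2` matrix. A sketch operating on an already-loaded array so it’s self-contained (`fabric_tensor` is just a stand-in for the body of `getFabricTensor`, with the HDF5 read unchanged):

```julia
# Same row normalisation, but sum(abs2, ...) avoids allocating arr.^2.
normalize_by_row2(arr) = arr ./ sqrt.(sum(abs2, arr, dims=2))

# Fabric tensor from an already-loaded nx3 array; norm_vec' * norm_vec is
# only 3x3, so the cost is dominated by the normalisation and the product.
function fabric_tensor(cv1::AbstractMatrix)
    norm_vec = normalize_by_row2(cv1)
    (norm_vec' * norm_vec) ./ size(norm_vec, 1)
end
```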

The second function calculates some location indices and then looks up the correct result based on several datasets read from the HDF5 file:

```julia
function getlocIDs(filepath::String)
    # read data
    indxs = get_generic_property(filepath, "loc_indx")
    type_elements, type_IDs = _gettypeData(filepath)
    # all types will have at least 1 element so generate a vector of ones
    num_elems1 = ones(Int, size(indxs)[1])
    num_elems2 = ones(Int, size(indxs)[1])
    # update the number of elements by looping through the Dict.
    # I probably should include a check to see if the number of elements
    # for the type is >1 before replacing
    for type in type_elements
        num_elems1[indxs[:, 1] .== type.first] .= type.second
        num_elems2[indxs[:, 3] .== type.first] .= type.second
    end
    # calculate correct index
    indexNumber1 = ((indxs[:, 2] - (indxs[:, 2] .% num_elems1)) .÷ num_elems1) .+ 1
    indexNumber2 = ((indxs[:, 4] - (indxs[:, 4] .% num_elems2)) .÷ num_elems2) .+ 1
    # add 1 to convert to julia indexing and extract correct ID
    ids1 = getindex.(Ref(type_IDs), indexNumber1, indxs[:, 1] .+ 1)
    ids2 = getindex.(Ref(type_IDs), indexNumber2, indxs[:, 3] .+ 1)
    # concatenate the return into a matrix
    return hcat(ids1, ids2)
end
```
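One simplification I spotted while writing this up: for a non-negative `x` and positive `k`, `(x - x % k) ÷ k` is identical to `x ÷ k`, so the two `indexNumber` lines could collapse to a single broadcast integer division (assuming the relevant entries of `indxs` really are non-negative):

```julia
# Equivalent index calculation: for non-negative col and positive num_elems,
# (x - x % k) ÷ k == x ÷ k, so the subtraction and remainder are redundant.
index_number(col, num_elems) = (col .÷ num_elems) .+ 1
```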

`type_elements` is a `Dict()` where each key is a string name storing an `Int` value for the number of elements in that object type. `type_IDs` is an `mxn` `Matrix` of `Int`s, where `m` is the maximum number of elements for any of the types and `n` is the number of types in `type_elements`; this is typically 2-3 types. `type_IDs` may have parts of a column where the data is `NaN`, since not all types will have the same number of elements.

`indxs` is a `px4` array, where `p` is typically about 6x larger than `n`.

Both functions run fine and return results that are correct, but I’m trying to understand `julia` and how to write quick code, and I’m wondering about things like `@inbounds` checking, memory allocations and type stability.
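For reference, this is roughly the diagnostic workflow I’ve been using so far to check those things (a sketch: `@time`, `@allocated` and `@code_warntype` are built in, while `BenchmarkTools` is a separate package I’m assuming is the usual choice; `square_sum` is just a toy stand-in function):

```julia
# Toy stand-in for one of the functions above.
square_sum(arr) = sum(abs2, arr)

arr = rand(100_000, 3)
@time square_sum(arr)                # prints time plus allocation count
bytes = @allocated square_sum(arr)   # bytes allocated by this one call
# @code_warntype square_sum(arr)    # Any/red-highlighted types => instability
# using BenchmarkTools
# @btime square_sum($arr)           # interpolate globals with $ when timing
```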

I think all my functions are type stable. I may have a few more memory allocations than is necessary. What about function size - is it better to create lots of smaller functions? Should I break `getlocIDs()` into multiple smaller specialised functions, e.g. for calculating `num_elems1`/`num_elems2`?

Thanks in advance.