Using the byrow function to get the row-wise mean

I want to get the row-wise mean of two columns of data using the byrow function. The two columns are T_start(s) and T_stop(s).
How should I change the code given below?

using HTTP, CSV, DataFrames, InMemoryDatasets
begin
    for x in 33683
        url = "https://gcn.nasa.gov/circulars/$x"
        txt = String(HTTP.get(url))
        if occursin("report on behalf of the Swift/UVOT team", txt)
            hb, he = findfirst(r"^Filter"im, txt)
            lr, _ = findnext("\n\nThe", txt, he)
            cltxt = replace(txt[hb:lr], " +/- " => s"\t", r"  +(\w)" => s"\t\1", r"  +(>)" => s"\t", r"\+/?- ?" => s"\t")
            df = CSV.read(IOBuffer(cltxt), DataFrame, delim='\t')
            if "Column6" in names(df)
                rename!(df, :Column6 => :Mag_err)
            end
            byrow(df, sum, 2:3)
            @show df
        end
    end
end

I am getting the error shown below:


What other ways are there to get the row-wise mean of two columns of data?

df.T_mean = (df."T_start(s)" .+ df."T_stop(s)") ./ 2

I think this would be too slow compared to the byrow function. I run this code many times.

It strikes me as exceedingly unlikely that an element-wise addition of two vectors will be the bottleneck in your problem. Have you benchmarked or profiled this?
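For what it's worth, here is a rough way to check that claim in isolation. It is only a sketch: the two vectors are made-up stand-ins for the T_start(s) and T_stop(s) columns, row_mean is a hypothetical helper name, and the timing you see will depend on your machine.

```julia
# Made-up stand-ins for df."T_start(s)" and df."T_stop(s)"; not real UVOT data.
t_start = collect(1.0:1_000_000.0)
t_stop = t_start .+ 10.0

# Hypothetical helper: element-wise mean of two equally long vectors.
row_mean(a, b) = (a .+ b) ./ 2

row_mean(t_start, t_stop)                       # warm-up call, so compilation is not timed
elapsed = @elapsed row_mean(t_start, t_stop)    # seconds for one million rows
t_mean = row_mean(t_start, t_stop)

println("rows: ", length(t_mean), ", seconds: ", elapsed)
```

If the broadcast itself turns out to be fast, the time is more likely going into the HTTP request, the parsing, or first-call compilation, and those are the parts worth timing separately.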


No, I couldn’t use byrow because it gives an error. With your code it takes 1.2 seconds.

Have not worked with InMemoryDatasets, but skimming the docs it seems to use its own data type Dataset instead of DataFrames.DataFrame. Thus, you will probably need to convert your data frame in order to use byrow, i.e., the error is that no method is defined for a DataFrame.


The equivalent in DataFrames would be something like

transform!(df, AsTable(2:3) => ByRow(mean))
# or
map(mean, eachrow(select(df, 2:3)))

This is not working, as both packages export select.

Oh, it requires converting the DataFrame to a Dataset.

df=CSV.read(IOBuffer(cltxt), DataFrame, delim='\t')
g=Dataset(df)

But it does not show the mean column.

@time m=byrow(g ,mean ,2:3)
df.m
@show df

Also, byrow is slower than the df.T_mean = (df."T_start(s)" .+ df."T_stop(s)") ./ 2 code.

When this happens, you can qualify the call with the module name, i.e. DataFrames.select.
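A minimal sketch of what qualifying the names can look like, using DataFrames only. The rows are made up for illustration (only the column names come from the question), and T_mean2 is a hypothetical output column name. Loading DataFrames with import rather than using is one assumption-free way to keep its names from clashing with another package.

```julia
import DataFrames                 # import (not using): only the module name is brought into scope
using Statistics: mean

# Made-up rows for illustration; only the column names come from the question.
df = DataFrames.DataFrame(
    "Filter" => ["white", "v"],
    "T_start(s)" => [100.0, 200.0],
    "T_stop(s)" => [300.0, 500.0],
)

# Qualified select, then a row-wise mean over columns 2 and 3.
sub = DataFrames.select(df, 2:3)
df.T_mean = map(mean, eachrow(sub))

# Same result through transform!, with an explicit output column name.
DataFrames.transform!(df, DataFrames.AsTable(2:3) => DataFrames.ByRow(mean) => :T_mean2)
```

Note that the result has to be assigned to a column (or given an output name in transform!) before it shows up in df.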


You have to assign it yourself: g and df are different objects, and it doesn’t look like the InMemoryDatasets byrow is mutating in any case. Try, e.g., df.m = byrow(...