Get index corresponding to some number in list of outputs

What is this .\raw ?

On the same page as the URL there are links to the JSON view and the raw view.

This script could (*) handle a fairly general range of cases (at least tables that appear well formatted in the “raw” view, with columns separated by spaces) in which some variables are missing for some observations.
If the header spans two lines, you could fix it by hand afterwards.

(*) I have not tested other cases. Other complications could arise from non-ASCII characters, for which textwidth and ncodeunits return different values.
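To make the caveat above concrete, here is a small check on one of the MAG values from this very table. It is only an illustration (the variable name `s` is mine), but it shows why byte-based positions and display columns can drift apart once a character like ± shows up.

```julia
s = "20.6 ± 0.5"

nchars = length(s)       # characters: 10
nbytes = ncodeunits(s)   # UTF-8 bytes: 11, because ± takes two bytes
ncols  = textwidth(s)    # terminal columns: 10

# Byte index 6 is the start of ±, byte index 7 falls inside it,
# so a byte range cut at 7 would not start on a character boundary.
starts_char_6 = isvalid(s, 6)   # true
starts_char_7 = isvalid(s, 7)   # false
```

Regex matches and string indexing in Julia work on byte indices, so a line containing ± is one unit longer in ncodeunits than it looks on screen.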

using CSV, DataFrames, HTTP
url="https://gcn.nasa.gov/circulars/34049"
txt=String(HTTP.get(url).body)
# The m flag makes ^ and $ match at the start and end of each line rather than of the whole string.
hb,he=findfirst(r"^Filter"im,txt)
lr,_=findnext("\n\nThe",txt,he).-1

cltxt=replace(txt[hb:lr], "&gt;"=>">")
ls=split(cltxt,'\n')
lls=ncodeunits.(ls)
ml=maximum(lls[2:end])
adjls=rpad.(ls,ml,' ')

spl=[findall(r"\s\s+\S|\s\s+$"m, adjls[i]) for i in eachindex(ls)]

function splitrange(rng,n,m)
    s=[first(rng[n]):m-1,m:last(rng[n])]
    n==1 ? [s;rng[n+1:end]] : [rng[1:n-1];s;rng[n+1:end]]
end

function clrng(spl)
    mr=maximum(length,spl)
    for i in 1:mr-1
        m=minimum([first(e[i+1]) for e in filter(sp->length(sp)>=i+1,spl)])
        id=findall(>=(m),  [last(e[i]) for e in spl])
        [spl[idn]=splitrange(spl[idn],i,m) for idn in id]
    end 
end

clrng(spl)

pts=[intersect(vr...) for vr in zip(spl...)]

using IterTools
rr=Base.splat(:).(partition(sort([1;first.(pts); last.(pts); ml]),2))

res=join([join(strip.(getindex.([adjls[i]],rr)),'\t').*'\n' for i in eachindex(adjls)])
julia> df=CSV.read(IOBuffer(res), DataFrame, delim='\t')
7×5 DataFrame
 Row │ FILTER    EXP(s)   MAG         Significance of  Upper L  
     │ String3?  Int64?   String15?   String15?        String7?
─────┼──────────────────────────────────────────────────────────
   1 │ missing   missing  missing     Detection        missing
   2 │ v             157  missing     missing          >19.6
   3 │ b             157  missing     missing          >20.6
   4 │ u             157  20.6 ± 0.5  2.1 sigma        >20.1
   5 │ w1            315  20.4 ± 0.5  2.1 sigma        >20.0
   6 │ m2           1489  20.8 ± 0.3  3.9 sigma        missing
   7 │ w2            629  21.0 ± 0.4  2.5 sigma        >20.7


Actually, no manual fix is needed: it seems that the CSV.jl package can handle multiline headers, although the result still needs some fixing :slight_smile:

julia> df=CSV.read(IOBuffer(res), DataFrame, delim='\t', header=[1,2])
6×5 DataFrame
 Row │ FILTER_Column1  EXP(s)_Column2  MAG_Column3  Significance of_Detection  Upper L_Column5 
     │ String3         Int64           String15?    String15?                  String7?
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ v                          157  missing      missing                    >19.6
   2 │ b                          157  missing      missing                    >20.6
   3 │ u                          157  20.6 ± 0.5   2.1 sigma                  >20.1
   4 │ w1                         315  20.4 ± 0.5   2.1 sigma                  >20.0
   5 │ m2                        1489  20.8 ± 0.3   3.9 sigma                  missing
   6 │ w2                         629  21.0 ± 0.4   2.5 sigma                  >20.7

The basic idea is to homogenize the ranges of spaces in each line.
Then you slice the table vertically, staying inside the spaces that are common to all lines for each column.
Finally you strip the leading and trailing spaces and put it all back together, with a TAB between fields and a newline at the end of each row.
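The slice-and-rejoin part of that idea can be shown on a toy table. This is just a sketch: the `lines` and the column ranges `cols` are made up and chosen by hand here, whereas the script derives the ranges from the runs of spaces.

```julia
# Toy fixed-width table; the second row has an empty middle column.
lines = ["a    10   x",
         "bb        y",
         "c    7    zz"]

# Hypothetical column ranges, picked by hand for this toy input.
cols = [1:5, 6:10, 11:12]

# Pad every line to the same width so that all ranges are valid indices.
width  = maximum(ncodeunits, lines)
padded = rpad.(lines, width)

# Cut each line at the column ranges, strip the padding, join with TABs.
rows = [join(strip.(getindex.(Ref(l), cols)), '\t') for l in padded]
tsv  = join(rows, '\n')
```

The empty middle field of the second row survives as two consecutive TABs, which is what lets CSV.read turn it into missing later on.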


julia> spl=[findall(r"\s\s+\S|\s\s+$"m, adjls[i]) for i in eachindex(ls)]
8-element Vector{Vector{UnitRange{Int64}}}:
 [7:9, 15:17, 20:31, 46:49]
 [1:33, 42:55]
 [2:9, 12:50]
 [2:9, 12:50]
 [2:9, 12:17, 28:33, 42:51]
 [3:9, 12:17, 28:33, 42:51]
 [3:8, 12:17, 28:33, 42:56]
 [3:9, 12:17, 28:33, 42:51]

julia> clrng(spl)

julia> spl
8-element Vector{Vector{UnitRange{Int64}}}:
 [7:9, 15:17, 20:31, 46:49]
 [1:11, 12:19, 20:33, 42:55]
 [2:9, 12:19, 20:41, 42:50]
 [2:9, 12:19, 20:41, 42:50]
 [2:9, 12:17, 28:33, 42:51]
 [3:9, 12:17, 28:33, 42:51]
 [3:8, 12:17, 28:33, 42:56]
 [3:9, 12:17, 28:33, 42:51]
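To see how the second row goes from [1:33, 42:55] to four ranges, you can apply splitrange by hand. The cut points 12 and 20 are my reading of what clrng picks for this data (the smallest start of the next gap over all rows), so take this as a trace rather than a specification.

```julia
# Same definition as in the script: split the n-th range at position m.
function splitrange(rng, n, m)
    s = [first(rng[n]):m-1, m:last(rng[n])]
    n == 1 ? [s; rng[n+1:end]] : [rng[1:n-1]; s; rng[n+1:end]]
end

row = [1:33, 42:55]                  # second row before clrng

# The first gap runs past 12, where other rows start their second gap.
step1 = splitrange(row, 1, 12)       # [1:11, 12:33, 42:55]

# The new second gap runs past 20, where the first row starts its third gap.
step2 = splitrange(step1, 2, 20)     # [1:11, 12:19, 20:33, 42:55]
```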

julia> pts=[intersect(vr...) for vr in zip(spl...)]
4-element Vector{UnitRange{Int64}}:
 7:8
 15:17
 28:31
 46:49

julia> rr=Base.splat(:).(partition(sort([1;first.(pts); last.(pts); ml]),2))
5-element Vector{UnitRange{Int64}}:
 1:7
 8:15
 17:28
 31:46
 49:55
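If you prefer not to load IterTools for this last step, something like the following should give the same ranges using only Base. The numbers are the starts and ends of pts and the value of ml from the output above; `cuts` is just a name I introduce here.

```julia
# 1, the starts of pts, the ends of pts, and ml, all sorted together.
cuts = sort([1; [7, 15, 28, 46]; [8, 17, 31, 49]; 55])

# Take the sorted cut points two at a time and turn each pair into a range.
rr = [a:b for (a, b) in Iterators.partition(cuts, 2)]
```

Each pair runs from the end of one common gap to the start of the next, so every range covers exactly one column of the table.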

A slot in the Filter column can't be empty, so we can ignore the rows whose Filter slot is empty. :thinking: See the Swift/UVOT page below.
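A rough sketch of that filtering, applied to tab-separated rows before they are parsed. The sample `rows` and the helper name `hasfilter` are made up for illustration.

```julia
# Tab-separated rows; the second one is the continuation of a two-line
# header and has an empty first (Filter) field.
rows = ["FILTER\tEXP(s)\tSignificance of",
        "\t\tDetection",
        "v\t157\t",
        "u\t157\t2.1 sigma"]

# Keep a row only if its first field is non-empty after stripping spaces.
hasfilter(r) = !isempty(strip(first(split(r, '\t'))))
kept = filter(hasfilter, rows)
```

Note that this also throws away the "Detection" half of the two-line header, so the column name would have to be completed some other way.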