Scraping tables from the NASA GCN circulars website

I want to scrape tables from these pages, whose content is separated by |, so I need some way to split a string on |. The check if occursin("MASTER", txt) is used to pick only the MASTER reports.

The code that I have prepared so far is:

using HTTP, DataFrames, CSV

function doanalysis()
    dfg = nothing   # accumulated table; stays nothing until the first MASTER report is parsed
    for x in 34010:34038
        print("\r peeking at $x ")
        try
            url = "$x"
            resp = HTTP.get(url)
            status = resp.status
            print(" ", status, " ")
            if status == 404; println("status=", status); continue; end
            txt = String(resp.body)
            if occursin("MASTER", txt)
                println(" Master report")
                hb, he = findfirst(r"^Tmid-T0 "im, txt)
                lr, _ = findnext("\n\nThe", txt, he)
                cltxt = replace(txt[hb:lr], " +/- " => "+/-", r"  +(\w)" => s"\t\1", r"  +(>)" => s"\t>")
                cltxt = replace(cltxt, ">" => "\t>")
                # println("cltxt="); print(cltxt)
                df = CSV.read(IOBuffer(cltxt), DataFrame, delim='\t')
                df.x = [x for i in 1:nrow(df)]
                if isnothing(dfg) # x == 33037
                    dfg = df
                else
                    dfg = vcat(dfg, df)
                end # if x is first
            end # if occursin
        catch e
            println("error ")
        end # trycatch
    end # for loop
    if isnothing(dfg)
        @info "no dfg to write"
    end # isnothing
    return dfg
end # function doanalysis

I suggest adding some additional context/detail to your MWE.

Because in this state, those who want to help need to run the code and spend a great deal of time checking if the result is correct (by comparing the result with lots of tables from all those URLs).


  • try to indicate what goes wrong (vs. your expectations/goal)
  • are there variations in the tables, and you need help with parsing?
  • is the code producing a specific error that you need help fixing?

You have to specify the words that delimit the table; apart from that, these look like “cleaner” tables than the others.

using CSV, DataFrames, HTTP
# The m flag treats the ^ and $ tokens as matching the start and end of individual lines, as opposed to the whole string.


julia> CSV.read(IOBuffer(cltxt), DataFrame, delim='|', skipto=3)
9×8 DataFrame
 Row │ Tmid-T0          Date Time                  Site                      Coord (J2000)            Filt.     Expt.    Limit    Comment 
     │ Int64      String31               String31               String                                String7  Int64    Float64  String15
   1 │     22662   2023-06-19 09:49:13                MASTER-    (23h 42m 22.90s , +81d 39m 09.5s)       C         180     18.7
   2 │     22862   2023-06-19 09:52:33                MASTER-    (00h 35m 00.64s , +81d 38m 18.1s)       C         180     16.6
   3 │     23062   2023-06-19 09:55:53                MASTER-    (23h 45m 21.09s , +79d 46m 41.0s)       C         180     18.5
   4 │     23263   2023-06-19 09:59:13                MASTER-    (00h 28m 33.71s , +79d 44m 16.3s)       C         180     16.1
   5 │     23463   2023-06-19 10:02:34                MASTER-    (23h 41m 51.58s , +81d 41m 09.0s)       C         180     18.4
   6 │     23664   2023-06-19 10:05:55                MASTER-    (00h 34m 39.01s , +81d 38m 46.3s)       C         180     16.4
   7 │     23865   2023-06-19 10:09:15                MASTER-    (23h 44m 51.68s , +79d 45m 15.0s)       C         180     18.5
   8 │     23938   2023-06-19 10:10:29            MASTER-OAFA    (04h 21m 32.87s , -20d 16m 47.6s)       C         180     12.5
   9 │     24065   2023-06-19 10:12:36                MASTER-    (00h 28m 16.55s , +79d 43m 58.3s)       C         180     16.3
julia> CSV.read(IOBuffer(cltxt), DataFrame, delim='|', skipto=3, dateformat="yyyy-mm-dd HH:MM:SS")
9×8 DataFrame
 Row │ Tmid-T0          Date Time                  Site                      Coord (J2000)            Filt.     Expt.    Limit    Comment 
     │ Int64      Dates.DateTime         String31               String                                String7  Int64    Float64  String15
   1 │     22662  2023-06-19T09:49:13                 MASTER-    (23h 42m 22.90s , +81d 39m 09.5s)       C         180     18.7
   2 │     22862  2023-06-19T09:52:33                 MASTER-    (00h 35m 00.64s , +81d 38m 18.1s)       C         180     16.6
   3 │     23062  2023-06-19T09:55:53                 MASTER-    (23h 45m 21.09s , +79d 46m 41.0s)       C         180     18.5
   4 │     23263  2023-06-19T09:59:13                 MASTER-    (00h 28m 33.71s , +79d 44m 16.3s)       C         180     16.1
   5 │     23463  2023-06-19T10:02:34                 MASTER-    (23h 41m 51.58s , +81d 41m 09.0s)       C         180     18.4
   6 │     23664  2023-06-19T10:05:55                 MASTER-    (00h 34m 39.01s , +81d 38m 46.3s)       C         180     16.4
   7 │     23865  2023-06-19T10:09:15                 MASTER-    (23h 44m 51.68s , +79d 45m 15.0s)       C         180     18.5
   8 │     23938  2023-06-19T10:10:29             MASTER-OAFA    (04h 21m 32.87s , -20d 16m 47.6s)       C         180     12.5
   9 │     24065  2023-06-19T10:12:36                 MASTER-    (00h 28m 16.55s , +79d 43m 58.3s)       C         180     16.3
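
As a small illustration of the m flag and of splitting on |, here is a minimal sketch that uses only Base Julia. The sample text is made up in the style of a MASTER table (it is not taken from a real circular), and the variable names are only for illustration.

```julia
# Hypothetical excerpt in the style of a MASTER circular table.
txt = """
Some introductory sentence.
Tmid-T0 | Date Time | Site | Limit
22662 | 2023-06-19 09:49:13 | MASTER-OAFA | 18.7
"""

# Without the m flag, ^ anchors only at the very start of the string,
# so the table header on the second line is not found.
no_flag = findfirst(r"^Tmid-T0", txt)

# With the m flag, ^ also anchors at the start of every line.
with_flag = findfirst(r"^Tmid-T0"m, txt)

# Split one data row on '|' and trim the padding around each field.
row = split(txt, '\n')[3]
fields = strip.(split(row, '|'))

println(no_flag)
println(txt[with_flag])
println(fields)
```

CSV.read with delim='|' does the same splitting for the whole table at once; the manual split is mostly useful for checking what a single row looks like.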

Many earlier GCNs don’t use | as a delimiter. How can I make the code work for them as well? Can you send me some links about regex tokens in Julia (\n, \w, \s, etc.)? I am still having difficulty learning regular expressions, even after reading the Julia manual. My final code looks like this: :dizzy:

using HTTP,DataFrames,CSV
function doanalysis()
       for x in 34020:34038 
           print("\r peeking at GCN $x ")
               url = "$x/raw"
               resp = HTTP.get(url) 
               print(" ",status," "); 
               if status == 404 ; println("status=",status); continue; end          
               txt = String(resp.body)
               if occursin("V. Lipunov", txt)
                   println(" MASTER report")
         , DataFrame, delim='|', skipto=3)
                   df.x=[x for i in 1:nrow(df)]
                   if isnothing(dfg) 
                   end # if x is first
               end # if occursin
           catch e
               println("error ")
           end # trycatch
       end # for loop
       if !isnothing(dfg)
           @info "no dfg to write"
       end # !isnothing
       end # function doanalysis
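
On the regex tokens asked about above (\n, \w, \s): a minimal Base-only sketch of what each one matches. The sample string is made up for illustration and the variable names are hypothetical.

```julia
s = "J  = 17.66 +/- 0.02 mag,\nK  > 19.6\n"

# \d matches a digit, \w a "word" character (letter, digit or underscore),
# \s any whitespace (space, tab, newline); + means "one or more".
first_number = match(r"\d+\.\d+", s).match            # first decimal number
first_word   = match(r"\w+", s).match                 # first run of word characters
n_newlines   = length(collect(eachmatch(r"\n", s)))   # number of newline characters
squeezed     = replace(s, r"[ \t]+" => " ")           # collapse runs of spaces/tabs

println(first_number)
println(first_word)
println(n_newlines)
print(squeezed)
```

Julia's regexes are PCRE, so any PCRE syntax reference applies to the patterns written inside r"...".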

Can I use the readuntil function instead of the lr line above?

Although I’ve heard good things about it, I’ve never had a chance to use this function.
From what I understand, unlike findall (which gives the starting point of the part of the string you are interested in), it captures a string from the start of the stream up to the delimiter used (i.e. the part of the string implicitly dropped by findfirst).


It may be used after the first findfirst …

# lr=first(findnext("\nFilter",txt,he))-1
# cltxt=txt[he:lr]
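
To compare the two approaches, here is a minimal sketch under assumptions: the sample text and the delimiters "Tmid-T0" and "\n\nThe" are stand-ins, and only Base functions are used. readuntil works on a stream, so the string is wrapped in an IOBuffer and positioned with seek first.

```julia
txt = "header line\nTmid-T0 | Limit\n22662 | 18.7\n\nThe rest of the circular"

# Index-based approach: locate the start and the end, then slice.
hb = first(findfirst(r"^Tmid-T0"m, txt))
lr = first(findnext("\n\nThe", txt, hb)) - 1
by_index = txt[hb:lr]

# Stream-based approach: jump to the table start (seek takes a 0-based byte
# offset), then read up to, but not including, the closing delimiter.
io = IOBuffer(txt)
seek(io, hb - 1)
by_stream = readuntil(io, "\n\nThe")

println(by_index == by_stream)
```

Either way the start of the table still has to be found with findfirst; readuntil only replaces the findnext step.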

@rocco_sprmnt21 @rafael.guerra @algunion
Can you please tell me how to scrape tabular data from the GROND telescope? My code for GCN 21200 looks like this: :point_down:

using HTTP , CSV, DataFrames
function doanalysis()
                            for x in 21200
                          print("\r peeking at GCN $x ")
                             url = "$x/raw"
                             resp = HTTP.get(url) 
                             print(" ",status," "); 
                             if status == 404 ; println("status=",status); continue; end          
                             txt = String(resp.body)
                             if occursin(r"GRB ?\d{6}([A-G]|(\.\d{2}))?",txt)
				                  m=match(r"GRB ?\d{6}([A-G]|(\.\d{2}))?",txt)

                             if occursin("GROND observations", txt)
                                 println("GROND report")
                                 he=first(findfirst(r"^(g'|r'|i'|z'|J|H|K) ="im,txt))
                                lr=first(findnext(r"^(?:[\t ]*(?:\r?\n|\r))+"im,txt,he))
                       , DataFrame, delim='\t', skipto=3 ,header=0)
                                 df.GCN=[x for i in 1:nrow(df)]
                                 df.GRB=[m.match for i in 1:nrow(df)]
                                 if isnothing(dfg) 
				                      if isnothing(dfg) 
                                          @show dfg=vcat(dfg,df)
                                      end # if x is first
        end # if occursin
                         catch e
                             println("error ")
                         end # trycatch
                     end # for loop

But I want to generalise it to extract data from other GROND circulars. It is printing the table twice. :upside_down_face:

If I understand correctly what you are looking for, I think these are the right tools.
I’ve never used them so far.

Hey @raman_kumar, I am answering this because you mentioned me in your post.

I am eager to answer any of these types of questions:

  1. Julia-related questions (e.g., you don’t understand some behavior/concept, getting an error, etc.)
  2. Julia’s package-related questions (missing documentation, errors, package suggestions, etc.)

Usually, if not a conceptual question, you must include certain MWE that fails or produces some unexpected output.

However, in your scenario, it seems like your above request doesn’t fit any of the above scenarios: to answer this successfully, somebody needs to go and do the work of understanding the HTML structure and then think about a way to parse that (and finally either work to adapt your code or provide specific instructions concerning the code you should write).

Maybe others might find this a legitimate question and might want to invest the time and answer you - however, my feeling is that it is always better to help somebody by contributing to refining their fishing tools instead of doing the fishing (or even part of the fishing) for them.

My advice for you is to go deeper into the structure of the HTML you want to parse and then start working on adapting your existing code - if your attempt fails and you get either errors or unexpected results, we might take it from there and answer targeted issues with your work.

From what I was able to conclude from 1-2 HTML files inspections, the content is entirely as raw text under a single div - so HTML/CSS related tools will not help much past retrieving the big text chunk - so you might actually need to parse the text and extract the relevant data in the desired format.


I want to extract the text from the g’ row to the end of the K row in the image below.

Please tell me what changes I need to make in the code below.

 lr=first(findnext(r"^(?:[\t ]*(?:\r?\n|\r))+"im,txt,he))


You could try something like this, but I don’t know whether it’s general (yet specific) enough for all your cases.
You will have to go through it with a little patience.

using HTTP, CSV, DataFrames
function doanalysis()
    for x in 21200
    print("\r peeking at GCN $x ")
            url = "$x/raw"
            resp = HTTP.get(url) 
            print(" ",status," "); 
            if status == 404 ; println("status=",status); continue; end          
            txt = String(resp.body)
            if occursin(r"GRB ?\d{6}([A-G]|(\.\d{2}))?",txt)
				m=match(r"GRB ?\d{6}([A-G]|(\.\d{2}))?",txt)

            if occursin("GROND observations", txt)
                println(" GROND report")                
                lr=first(findnext(r"^(?:[\t ]*(?:\r?\n|\r))+"m,txt,he))
                cltxt=replace(txt[he:lr], r" ?(=|>)"=>"|" , "+/-"=>"|")
                df = CSV.read(IOBuffer(cltxt), DataFrame, delim="|", header=0)
                df.GCN=[x for i in 1:nrow(df)]
                df.GRB=[m.match for i in 1:nrow(df)]                  
				if isnothing(dfg) 
                    @show dfg=df
                    @show dfg=vcat(dfg,df)
                end # if x is first
            end # if occursin
        catch e
            println("error ")                    
        end # trycatch
    end # for loop

gives the output shown below :point_down: GCN 21200

The missing lines were due to the kwarg skipto=3; the duplication most likely comes from the if/else you wrote.
Write a script with no checks, test that it does what you want, then add the checks bit by bit and test them one by one.
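
One way the if branch could produce the doubled table is sketched below. This is an assumption about the cause, not a confirmed diagnosis. To keep it Base-only, a plain vector of rows stands in for the DataFrame, and rows_for, collect_buggy and collect_fixed are hypothetical names.

```julia
# Stand-in for the per-circular table: a vector of rows instead of a DataFrame.
rows_for(gcn) = ["g' = 18.71", "r' = 18.44", "K  = 17.14"]

# Suspected bug: on the first circular the table is assigned AND appended,
# so every row shows up twice.
function collect_buggy(gcns)
    acc = nothing
    for gcn in gcns
        rows = rows_for(gcn)
        if isnothing(acc)
            acc = rows
            acc = vcat(acc, rows)   # also runs for the first circular
        end
    end
    return acc
end

# Possible fix: assign on the first pass, append only on later passes.
function collect_fixed(gcns)
    acc = nothing
    for gcn in gcns
        rows = rows_for(gcn)
        acc = isnothing(acc) ? rows : vcat(acc, rows)
    end
    return acc
end

println(length(collect_buggy([30574])))
println(length(collect_fixed([30574])))
```

With DataFrames the same pattern would read dfg = isnothing(dfg) ? df : vcat(dfg, df).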

julia> function doanalysis()
                                   for x in 30574
                                 print("\r peeking at GCN $x ")
                                    url = "$x/raw"
                                    resp = HTTP.get(url)
                                    print(" ",status," ");
                                    if status == 404 ; println("status=",status); continue; end
                                    txt = String(resp.body)
                                    if occursin(r"GRB ?\d{6}([A-G]|(\.\d{2}))?",txt)
                                                  m=match(r"GRB ?\d{6}([A-G]|(\.\d{2}))?",txt)

                                    if occursin("GROND observations", txt)
                                        println("GROND report")


                              , DataFrame, delim='\t' ,header=0)
                                        df.GCN=[x for i in 1:nrow(df)]
                                        df.GRB=[m.match for i in 1:nrow(df)]
julia> doanalysis()
 peeking at GCN 30574  200 GRB 210731AGROND report
dfg = vcat(dfg, df) = 14×3 DataFrame
 Row │ Column1                       GCN    GRB
     │ String31                      Int64  SubStrin…
   1 │ g' = 18.71 +/- 0.01 mag,      30574  GRB 210731A
   2 │ r' = 18.44 +/- 0.01 mag,      30574  GRB 210731A
   3 │ i' = 18.19 +/- 0.01 mag,      30574  GRB 210731A
   4 │ z' = 18.01 +/- 0.01 mag,      30574  GRB 210731A
   5 │ J  = 17.66 +/- 0.02 mag,      30574  GRB 210731A
   6 │ H  = 17.38 +/- 0.02 mag, and  30574  GRB 210731A
   7 │ K  = 17.14 +/- 0.15 mag       30574  GRB 210731A
   8 │ g' = 18.71 +/- 0.01 mag,      30574  GRB 210731A
   9 │ r' = 18.44 +/- 0.01 mag,      30574  GRB 210731A
  10 │ i' = 18.19 +/- 0.01 mag,      30574  GRB 210731A
  11 │ z' = 18.01 +/- 0.01 mag,      30574  GRB 210731A
  12 │ J  = 17.66 +/- 0.02 mag,      30574  GRB 210731A
  13 │ H  = 17.38 +/- 0.02 mag, and  30574  GRB 210731A
  14 │ K  = 17.14 +/- 0.15 mag       30574  GRB 210731A

No, please see my final code in my last edited post. My code is now working fine. :laughing: Your code is still giving a doubled table: 14 rows instead of the actual 7 rows in the text.


Your code is not parsing GCN 21200 properly :rofl: :joy: :stuck_out_tongue_winking_eye: and my code is not working for GCN 32383.

using HTTP , CSV, DataFrames, JSON3

julia> function scrapjson(gcn)
           gresp = HTTP.get(grurl)
           he=first(findfirst(r"\n\n^( *\w')"im,txt))+2
 , DataFrame, header=0)
           if startswith(js1.subject,"GRB")
               df.GRB .= readuntil(IOBuffer(js1.subject), ',')
               grb=findfirst(r"GRB *\d+\w",js1.subject)
               df.GRB .= js1.subject[grb]
           df.GCN .= gcn
           "Column2" ∈ names(df) ? df[:,Not(:Column2)] : df
scrapjson (generic function with 1 method)

julia> df1=scrapjson(31522)
6×3 DataFrame
 Row │ Column1     GRB          GCN   
     │ String15    String       Int64
   1 │ g' > 23.2   GRB 220117A  31522
   2 │ r' > 23.4   GRB 220117A  31522
   3 │ i' > 23.0   GRB 220117A  31522
   4 │ J  > 21.4   GRB 220117A  31522
   5 │ H  > 21.1   GRB 220117A  31522
   6 │ K  > 19.3.  GRB 220117A  31522

julia> df2=scrapjson(32383)
7×3 DataFrame
 Row │ Column1                          GRB          GCN   
     │ String31                         String       Int64
   1 │   g' > 23.0                      GRB 220711B  32383
   2 │   r' > 23.5                      GRB 220711B  32383
   3 │   i' > 23.2                      GRB 220711B  32383
   4 │   z' > 19.8                      GRB 220711B  32383
   5 │   J  > 21.7                      GRB 220711B  32383
   6 │   H  > 21.1                      GRB 220711B  32383
   7 │   K  > 20.1  (AB mag; 3 sigma).  GRB 220711B  32383

julia> df3=scrapjson(21200)
7×3 DataFrame
 Row │ Column1              GRB         GCN   
     │ String31             String      Int64
   1 │ g' = 20.75 +/- 0.08  GRB170604A  21200
   2 │ r' = 20.46 +/- 0.05  GRB170604A  21200
   3 │ i' = 20.35 +/- 0.06  GRB170604A  21200
   4 │ z' = 20.21 +/- 0.08  GRB170604A  21200
   5 │ J = 19.6 +/- 0.1     GRB170604A  21200
   6 │ H = 19.6 +/- 0.3     GRB170604A  21200
   7 │ K > 19.6             GRB170604A  21200
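
The Column1 strings above still mix band, relation, value and error in one field. A minimal sketch, using only Base, of splitting such a line with a single regex; parse_mag is a hypothetical helper, and the pattern only covers the line shapes shown in these tables.

```julia
# Hypothetical helper: split one photometry line into band, relation,
# magnitude and (optional) error. Returns nothing if the line does not match.
function parse_mag(line)
    ptn = r"^\s*([grizJHK]s?)'?\s*([=<>])\s*(\d+(?:\.\d+)?)(?:\s*\+/-\s*(\d+(?:\.\d+)?))?"
    m = match(ptn, line)
    m === nothing && return nothing
    # The error group is absent for limits such as "K > 19.6".
    err = m.captures[4] === nothing ? missing : parse(Float64, m.captures[4])
    return (band = String(m.captures[1]),
            rel  = String(m.captures[2]),
            mag  = parse(Float64, m.captures[3]),
            err  = err)
end

println(parse_mag("g' = 20.75 +/- 0.08"))
println(parse_mag("K > 19.6"))
```

Trailing text such as "mag," or "(AB mag; 3 sigma)." is ignored because the pattern is not anchored at the end of the line.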

I tried to automate the table capture a little more, using these packages (maybe someone with experience in this kind of thing can step in with some pointers on how to do it “better”).
Since the data source is very “messy”, a little tinkering is still needed to handle the “recoverable” cases.

urlb="" # page=1&limit=100


using HTTP , CSV, DataFrames, JSON3

using Cascadia, Gumbo

gresp = HTTP.get(urlb)
h = parsehtml(String(gresp.body)) 

s1=sel"ol li"  # this one works well!

qs = eachmatch(s1,h.root)

res=Tuple{String, String}[]

for q in qs
    if contains(txt,"GROND") && contains(txt,"GRB")
    push!(res, (q.attributes["value"],txt))


function scrapjson(gcn)
    gresp = HTTP.get(grurl)
    he=first(findfirst(r"\n\n^( *g'*)"m,txt))+2
    cltxt=replace(txt[he:lr],' '=>"")
    df=CSV.read(IOBuffer(cltxt), DataFrame, header=0)
    ptn=r"(GRB *\d+\w)[:|,]*"
    df.GRB .= match(ptn,js1.subject).match
    df.GCN .= gcn
    "Column2" ∈ names(df) ? df[:,Not(:Column2)] : df


for (gcn, _) in res[1:25]
    try println(scrapjson(gcn)) catch e; println("\n"*gcn*"--->NOK\n") end

Some results
julia> for (gcn, _) in res[1:25]
           try println(scrapjson(gcn)) catch e; println("\n"*gcn*"--->NOK\n") end
7ร—3 DataFrame
 Row โ”‚ Column1                GRB          GCN    
     โ”‚ String31               SubStrinโ€ฆ    String
   1 โ”‚ g'>23.0                GRB 220711B  32383
   2 โ”‚ r'>23.5                GRB 220711B  32383
   3 โ”‚ i'>23.2                GRB 220711B  32383
   4 โ”‚ z'>19.8                GRB 220711B  32383
   5 โ”‚ J>21.7                 GRB 220711B  32383
   6 โ”‚ H>21.1                 GRB 220711B  32383
   7 โ”‚ K>20.1(ABmag;3sigma).  GRB 220711B  32383
6ร—3 DataFrame
 Row โ”‚ Column1  GRB          GCN    
     โ”‚ String7  SubStrinโ€ฆ    String
   1 โ”‚ g'>25.2  GRB 220706A  32339
   2 โ”‚ r'>24.9  GRB 220706A  32339
   3 โ”‚ i'>24.2  GRB 220706A  32339
   4 โ”‚ J>21.7   GRB 220706A  32339
   5 โ”‚ H>21.0   GRB 220706A  32339
   6 โ”‚ K>19.9.  GRB 220706A  32339
7ร—3 DataFrame
 Row โ”‚ Column1          GRB          GCN    
     โ”‚ String15         SubStrinโ€ฆ    String
   1 โ”‚ g'=23.31+/-0.11  GRB 220627A  32304
   2 โ”‚ r'=22.70+/-0.06  GRB 220627A  32304
   3 โ”‚ i'=22.50+/-0.12  GRB 220627A  32304
   4 โ”‚ z'=22.23+/-0.21  GRB 220627A  32304
   5 โ”‚ J>21.4           GRB 220627A  32304
   6 โ”‚ H>20.6           GRB 220627A  32304
   7 โ”‚ K>19.7.          GRB 220627A  32304
6ร—3 DataFrame
 Row โ”‚ Column1  GRB          GCN    
     โ”‚ String7  SubStrinโ€ฆ    String
   1 โ”‚ g'>23.2  GRB 220117A  31522
   2 โ”‚ r'>23.4  GRB 220117A  31522
   3 โ”‚ i'>23.0  GRB 220117A  31522
   4 โ”‚ J>21.4   GRB 220117A  31522
   5 โ”‚ H>21.1   GRB 220117A  31522
   6 โ”‚ K>19.3.  GRB 220117A  31522
7ร—3 DataFrame
 Row โ”‚ Column1  GRB          GCN    
     โ”‚ String7  SubStrinโ€ฆ    String
   1 โ”‚ g'>24.8  GRB 211106A  31069
   2 โ”‚ r'>25.0  GRB 211106A  31069
   3 โ”‚ i'>24.0  GRB 211106A  31069
   4 โ”‚ z'>22.0  GRB 211106A  31069
   5 โ”‚ J>21.7   GRB 211106A  31069
   6 โ”‚ H>21.1   GRB 211106A  31069
   7 โ”‚ K>19.8.  GRB 211106A  31069
7ร—3 DataFrame
 Row โ”‚ Column1        GRB          GCN    
     โ”‚ String15       SubStrinโ€ฆ    String
   1 โ”‚ g'>24.3        GRB 210905A  30781
   2 โ”‚ r'>24.3        GRB 210905A  30781
   3 โ”‚ i'>23.8        GRB 210905A  30781
   4 โ”‚ z'=21.6+/-0.2  GRB 210905A  30781
   5 โ”‚ J=20.2+/-0.2   GRB 210905A  30781
   6 โ”‚ H=20.1+/-0.2   GRB 210905A  30781
   7 โ”‚ K>18.2.        GRB 210905A  30781
6ร—3 DataFrame
 Row โ”‚ Column1  GRB          GCN    
     โ”‚ String7  SubStrinโ€ฆ    String
   1 โ”‚ g'>23.5  GRB 210901A  30755
   2 โ”‚ r'>23.6  GRB 210901A  30755
   3 โ”‚ i'>22.6  GRB 210901A  30755
   4 โ”‚ J>20.1   GRB 210901A  30755
   5 โ”‚ H>19.7   GRB 210901A  30755
   6 โ”‚ K>16.3.  GRB 210901A  30755
3ร—3 DataFrame
 Row โ”‚ Column1           GRB          GCN    
     โ”‚ String31          SubStrinโ€ฆ    String
   1 โ”‚ g'=20.37+/-0.09   GRB 210822A  30703
   2 โ”‚ r'=20.10+/-0.05   GRB 210822A  30703
   3 โ”‚ i'=19.92+/-0.05.  GRB 210822A  30703
1ร—3 DataFrame
 Row โ”‚ Column1  GRB          GCN    
     โ”‚ String7  SubStrinโ€ฆ    String
   1 โ”‚ g'>22.6  GRB 210820A  30695


7ร—3 DataFrame
 Row โ”‚ Column1             GRB          GCN    
     โ”‚ String31            SubStrinโ€ฆ    String
   1 โ”‚ g'=18.71+/-0.01mag  GRB 210731A  30574
   2 โ”‚ r'=18.44+/-0.01mag  GRB 210731A  30574
   3 โ”‚ i'=18.19+/-0.01mag  GRB 210731A  30574
   4 โ”‚ z'=18.01+/-0.01mag  GRB 210731A  30574
   5 โ”‚ J=17.66+/-0.02mag   GRB 210731A  30574
   6 โ”‚ H=17.38+/-0.02mag   GRB 210731A  30574
   7 โ”‚ K=17.14+/-0.15mag.  GRB 210731A  30574
4ร—3 DataFrame
 Row โ”‚ Column1   GRB          GCN    
     โ”‚ String15  SubStrinโ€ฆ    String
   1 โ”‚ g๏ฟฝ>23.3   GRB 191004A  26324
   2 โ”‚ r๏ฟฝ>23.7   GRB 191004A  26324
   3 โ”‚ i๏ฟฝ>23.1   GRB 191004A  26324
   4 โ”‚ z๏ฟฝ>22.7   GRB 191004A  26324
7ร—3 DataFrame
 Row โ”‚ Column1          GRB        GCN    
     โ”‚ String15         SubStrinโ€ฆ  String
   1 โ”‚ g'=16.81+/-0.03  GRB191016  26176
   2 โ”‚ r'=16.33+/-0.03  GRB191016  26176
   3 โ”‚ i'=15.84+/-0.04  GRB191016  26176
   4 โ”‚ z'=15.51+/-0.04  GRB191016  26176
   5 โ”‚ J=15.28+/-0.05   GRB191016  26176
   6 โ”‚ H=14.80+/-0.05   GRB191016  26176
   7 โ”‚ K=14.83+/-0.08   GRB191016  26176
7ร—3 DataFrame
 Row โ”‚ Column1     GRB          GCN    
     โ”‚ String15    SubStrinโ€ฆ    String
   1 โ”‚ g'>25.5mag  GRB 191024A  26066
   2 โ”‚ r'>25.6mag  GRB 191024A  26066
   3 โ”‚ i'>24.8mag  GRB 191024A  26066
   4 โ”‚ z'>23.4mag  GRB 191024A  26066
   5 โ”‚ J>21.9mag   GRB 191024A  26066
   6 โ”‚ H>21.4mag   GRB 191024A  26066
   7 โ”‚ K>20.2mag   GRB 191024A  26066




7ร—3 DataFrame
 Row โ”‚ Column1     GRB          GCN    
     โ”‚ String15    SubStrinโ€ฆ    String
   1 โ”‚ g'>23.7mag  GRB 191004A  25959
   2 โ”‚ r'>24.2mag  GRB 191004A  25959
   3 โ”‚ i'>23.5mag  GRB 191004A  25959
   4 โ”‚ z'>23.3mag  GRB 191004A  25959
   5 โ”‚ J>21.4mag   GRB 191004A  25959
   6 โ”‚ H>20.4mag   GRB 191004A  25959
   7 โ”‚ K>19.9mag   GRB 191004A  25959





7ร—3 DataFrame
 Row โ”‚ Column1            GRB          GCN    
     โ”‚ String31           SubStrinโ€ฆ    String
   1 โ”‚ g๏ฟฝ๏ฟฝ๏ฟฝ=20.30+/-0.03  GRB 190829A  25569
   2 โ”‚ r๏ฟฝ๏ฟฝ๏ฟฝ=19.34+/-0.03  GRB 190829A  25569
   3 โ”‚ i๏ฟฝ๏ฟฝ๏ฟฝ=18.77+/-0.03  GRB 190829A  25569
   4 โ”‚ z๏ฟฝ๏ฟฝ๏ฟฝ=18.21+/-0.03  GRB 190829A  25569
   5 โ”‚ J=17.34+/-0.06     GRB 190829A  25569
   6 โ”‚ H=16.68+/-0.06     GRB 190829A  25569
   7 โ”‚ Ks=16.40+/-0.08.   GRB 190829A  25569
7ร—3 DataFrame
 Row โ”‚ Column1             GRB          GCN    
     โ”‚ String31            SubStrinโ€ฆ    String
   1 โ”‚ g'=22.79+/-0.05mag  GRB 190613B  24831
   2 โ”‚ r'=22.05+/-0.04mag  GRB 190613B  24831
   3 โ”‚ i'=21.56+/-0.05mag  GRB 190613B  24831
   4 โ”‚ z'=21.15+/-0.07mag  GRB 190613B  24831
   5 โ”‚ J=20.5+/-0.1mag     GRB 190613B  24831
   6 โ”‚ H=20.2+/-0.2mag     GRB 190613B  24831
   7 โ”‚ K>19.8mag           GRB 190613B  24831
7ร—3 DataFrame
 Row โ”‚ Column1          GRB          GCN    
     โ”‚ String15         SubStrinโ€ฆ    String
   1 โ”‚ g<23.0mag        GRB 190129B  23814
   2 โ”‚ r=22.7+/-0.3mag  GRB 190129B  23814
   3 โ”‚ i=21.8+/-0.3mag  GRB 190129B  23814
   4 โ”‚ z=21.7+/-0.4mag  GRB 190129B  23814
   5 โ”‚ J=20.4+/-0.4mag  GRB 190129B  23814
   6 โ”‚ H=19.1+/-0.2mag  GRB 190129B  23814
   7 โ”‚ K=19.0+/-0.4mag  GRB 190129B  23814

You can, of course, adapt the scripts to dig into subsequent pages as well


I don’t understand anything about the contents of the tables, but I think the data is easier to read in this form.

julia> urlb="" # page=1&limit=100

julia> using HTTP , CSV, DataFrames, JSON3

julia> using Cascadia, Gumbo

julia> gresp = HTTP.get(urlb);

julia> h = parsehtml(String(gresp.body));

julia> s1=sel"ol li"  # this one works well!
Selector(Cascadia.var"#51#52"{Selector, Selector}(Selector(Cascadia.var"#5#6"{String}("ol")), Selector(Cascadia.var"#5#6"{String}("li"))))

julia> qs = eachmatch(s1,h.root);

julia> res=Tuple{String, String}[]
Tuple{String, String}[]

julia> for q in qs
           if contains(txt,"GROND") && contains(txt,"GRB")
           push!(res, (q.attributes["value"],txt))

julia> function scrapjson(gcn)
           gresp = HTTP.get(grurl)
           he=first(findfirst(r"\n\n^( *g'*)"m,txt))+2
           cltxt=replace(txt[he:lr],' '=>"")
           df=CSV.read(IOBuffer(cltxt), DataFrame, header=0)
           ptn=r"(GRB *\d+\w)[:|,]*"
           df.GRB .= match(ptn,js1.subject)[1]
           df.GCN .= gcn
           "Column2" ∈ names(df) ? df[:,Not(:Column2)] : df
scrapjson (generic function with 1 method)

julia> df=scrapjson(res[1][1]);

julia> dfnok=DataFrame(GCN=String[])
0×1 DataFrame
 Row │ GCN    
     │ String

julia> for (gcn, _) in res[2:25]
           try df=vcat(df,scrapjson(gcn), cols=:union) catch e; push!(dfnok,(GCN=gcn,)) end

julia> df.mag=replace.(df.Column1, "'"=>"",'๏ฟฝ'=>"");

julia> df=select(df,[:GCN, :GRB],:mag=>ByRow(x->[x[1],x[2:end]])=>[:tel,:mag1]);

julia> udf=unstack(df,[:GCN, :GRB],:tel,:mag1)
17×9 DataFrame
julia> vcat(udf,dfnok,cols=:union)
25×9 DataFrame
 Row │ GCN     GRB          g                 r                 i                 z                 J                 H           ⋯
     │ String  SubStrin…?   String?           String?           String?           String?           String?           String?     ⋯
   1 │ 32383   GRB 220711B  >23.0             >23.5             >23.2             >19.8             >21.7             >21.1       ⋯
   2 │ 32339   GRB 220706A  >25.2             >24.9             >24.2             missing           >21.7             >21.0        
   3 │ 32304   GRB 220627A  =23.31+/-0.11     =22.70+/-0.06     =22.50+/-0.12     =22.23+/-0.21     >21.4             >20.6        
   4 │ 31522   GRB 220117A  >23.2             >23.4             >23.0             missing           >21.4             >21.1        
   5 │ 31069   GRB 211106A  >24.8             >25.0             >24.0             >22.0             >21.7             >21.1       ⋯
   6 │ 30781   GRB 210905A  >24.3             >24.3             >23.8             =21.6+/-0.2       =20.2+/-0.2       =20.1+/-0.2  
   7 │ 30755   GRB 210901A  >23.5             >23.6             >22.6             missing           >20.1             >19.7        
   8 │ 30703   GRB 210822A  =20.37+/-0.09     =20.10+/-0.05     =19.92+/-0.05.    missing           missing           missing      
   9 │ 30695   GRB 210820A  >22.6             missing           missing           missing           missing           missing     ⋯
  10 │ 30574   GRB 210731A  =18.71+/-0.01mag  =18.44+/-0.01mag  =18.19+/-0.01mag  =18.01+/-0.01mag  =17.66+/-0.02mag  =17.38+/-0.  
  11 │ 26324   GRB 191004A  >23.3             >23.7             >23.1             >22.7             missing           missing      
  ⋮  │   ⋮          ⋮              ⋮                 ⋮                 ⋮                 ⋮                 ⋮                 ⋮    ⋱
  16 │ 24831   GRB 190613B  =22.79+/-0.05mag  =22.05+/-0.04mag  =21.56+/-0.05mag  =21.15+/-0.07mag  =20.5+/-0.1mag    =20.2+/-0.2  
  17 │ 23814   GRB 190129B  <23.0mag          =22.7+/-0.3mag    =21.8+/-0.3mag    =21.7+/-0.4mag    =20.4+/-0.4mag    =19.1+/-0.2 ⋯
  18 │ 30584   missing      missing           missing           missing           missing           missing           missing      
  19 │ 26042   missing      missing           missing           missing           missing           missing           missing      
  20 │ 25992   missing      missing           missing           missing           missing           missing           missing      
  21 │ 25960   missing      missing           missing           missing           missing           missing           missing     ⋯
  22 │ 25791   missing      missing           missing           missing           missing           missing           missing      
  23 │ 25789   missing      missing           missing           missing           missing           missing           missing      
  24 │ 25652   missing      missing           missing           missing           missing           missing           missing      
  25 │ 25651   missing      missing           missing           missing           missing           missing           missing     ⋯
                                                                                                       2 columns and 4 rows omitted

Why are you using JSON3? Does it have any particular advantage over the code in my 12th post in this discussion? :face_with_diagonal_mouth: :worried:

JSON3.Object{Base.CodeUnits{UInt8, String}, Vector{UInt64}} with 6 entries:
  :subject    => "GROND observations of GRB 180620B"
  :createdOn  => 1529579936000
  :submitter  => "Patricia Schady at MPE/Swift  <>"
  :circularId => 22819
  :email      => ""
  :body       => "Tassilo Schweyer and Patricia Schady (MPE Garching) report:\n\nWe observed the field of GRB 180620B (Swift trigg

It is not essential, but it seemed more convenient to me to access the “:subject” and “:body” fields (and you could easily add the :submitter information and other fields) to obtain the information to put in the tables.