Can you scrape GCN Circular 31493, "GRB 220118A: MITSuME Akeno optical observation"? In this case both the starting and ending positions are T0+.
You should format the resulting table well and post it. One cannot understand anything even in the RAW view.
For GCN 31418 the table should look like:
T0+[hours]  MID-UT               T-EXP[sec]  FILTER  5-sigma limits
22.0        2022-01-08 18:58:13  11820       g'      20.7
22.0        2022-01-08 18:58:13  11820       Rc      20.9
22.0        2022-01-08 18:58:13  11820       Ic      20.1
julia> urlmessy="https://gcn.nasa.gov/circulars/31493/raw"
"https://gcn.nasa.gov/circulars/31493/raw"
julia> txt=String((HTTP.get(urlmessy)))
"HTTP/1.1 200 OK\r\nContent-Type: text/plain;charset=UTF-8\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\nDate: Sat, 29 Jul 2023 17:08:47 GMT\r\nLink: <https://gcn.nasa.gov/circulars/31493>; rel=\"canonical\"\r\nApigw-Requestid: I1gg8imKoAMEJpg=\r\nContent-Encoding: gzip\r\nVary: Accept-Encoding\r\nX-Cache: Miss from cloudfront\r\nVia: 1.1 c651b6f427de520af17b746abf0c7ee6.cloudfront.net (CloudFront)\r\nX-Amz-Cf-Pop: MXP64-P2\r\nX-Amz-Cf-Id: QOaNnew_Vj-jdlhBHyq" ⋯ 2168 bytes ⋯ ">19.9 |\n-----------------------------------------------------------------------------------------------------------------\nT0+ : Elapsed time after the burst\nT-EXP: Total Exposure time\n\nWe used PS1 catalog for flux calibration.\nThe magnitudes are expressed in the AB system.\nThe images were processed in real-time through the MITSuME GPU\nreduction pipeline (Niwano et al. 2021, PASJ, Vol.73, Issue 1, Pages\n4-24; https://github.com/MNiwano/Eclaire)."
julia> he=first(findfirst(r"^\nT0+"im,txt))+1
1651
julia> lr=first(findnext("|\n---",txt,he))
2624
julia> cltxt=txt[he:lr]
"T0+[sec] MID-UT T-EXP[sec] magnitude(or 5-sigma limits) 5-sigma limits\n-----------------------------------------------------------------------------------------------------------------\n56 | 2022-01-18 18:21:34 | 10 |g'>15.9, Rc>16.8,\nIc>16.8 | g'>15.9, Rc>16.8, Ic>16.8 |\n84 | 2022-01-18 18:22:02 | 40 |g'=18.36+/-0.70, Rc=18.57+/-0.78,\nIc>17.5 | g'>17.3, Rc>17.9, Ic>17.5 |\n171 | 2022-01-18 18:23:29 | 50 |g'=17.45+/" ⋯ 75 bytes ⋯ " 2022-01-18 18:24:54 | 60 |g'=17.31+/-0.18, Rc=16.87+/-0.09,\nIc=16.64+/-0.11| g'>17.7, Rc>18.5, Ic>18.1 |\n325 | 2022-01-18 18:26:03 | 60 |g'=17.31+/-0.22, Rc=17.31+/-0.13,\nIc=17.08+/-0.15| g'>17.7, Rc>18.4, Ic>18.1 |\n535 | 2022-01-18 18:29:33 | 300 |g'=18.21+/-0.17, Rc=17.80+/-0.10,\nIc=17.47+/-0.10| g'>18.7, Rc>19.5, Ic>19.1 |\n1471 | 2022-01-18 18:45:09 | 1020 |g'=19.82+/-0.33, Rc=19.26+/-0.14,\nIc=18.81+/-0.15| g'>19.5, Rc>20.3, Ic>19.9 |"
julia> clcltxt=replace(cltxt, r"\n(Ic)"=>s"\1")
"T0+[sec] MID-UT T-EXP[sec] magnitude(or 5-sigma limits) 5-sigma limits\n-----------------------------------------------------------------------------------------------------------------\n56 | 2022-01-18 18:21:34 | 10 |g'>15.9, Rc>16.8,Ic>16.8 | g'>15.9, Rc>16.8, Ic>16.8 |\n84 | 2022-01-18 18:22:02 | 40 |g'=18.36+/-0.70, Rc=18.57+/-0.78,Ic>17.5 | g'>17.3, Rc>17.9, Ic>17.5 |\n171 | 2022-01-18 18:23:29 | 50 |g'=17.45+/-0" ⋯ 68 bytes ⋯ "6 | 2022-01-18 18:24:54 | 60 |g'=17.31+/-0.18, Rc=16.87+/-0.09,Ic=16.64+/-0.11| g'>17.7, Rc>18.5, Ic>18.1 |\n325 | 2022-01-18 18:26:03 | 60 |g'=17.31+/-0.22, Rc=17.31+/-0.13,Ic=17.08+/-0.15| g'>17.7, Rc>18.4, Ic>18.1 |\n535 | 2022-01-18 18:29:33 | 300 |g'=18.21+/-0.17, Rc=17.80+/-0.10,Ic=17.47+/-0.10| g'>18.7, Rc>19.5, Ic>19.1 |\n1471 | 2022-01-18 18:45:09 | 1020 |g'=19.82+/-0.33, Rc=19.26+/-0.14,Ic=18.81+/-0.15| g'>19.5, Rc>20.3, Ic>19.9 |"
julia> df=CSV.read(IOBuffer(clcltxt), DataFrame, delim='|',header=false, skipto=3, dateformat="yyyy-mm-dd HH:MM:SS")
7×6 DataFrame
 Row │ Column1  Column2              Column3  Column4                            Column5                    Column6
     │ Int64    DateTime             Int64    String                             String31                   Missing
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │      56  2022-01-18T18:21:34       10  g'>15.9, Rc>16.8,Ic>16.8…          g'>15.9, Rc>16.8, Ic>16.8  missing
   2 │      84  2022-01-18T18:22:02       40  g'=18.36+/-0.70, Rc=18.57+/-0.78…  g'>17.3, Rc>17.9, Ic>17.5  missing
   3 │     171  2022-01-18T18:23:29       50  g'=17.45+/-0.20, Rc=16.90+/-0.15…  g'>17.5, Rc>18.1, Ic>17.5  missing
   4 │     256  2022-01-18T18:24:54       60  g'=17.31+/-0.18, Rc=16.87+/-0.09…  g'>17.7, Rc>18.5, Ic>18.1  missing
   5 │     325  2022-01-18T18:26:03       60  g'=17.31+/-0.22, Rc=17.31+/-0.13…  g'>17.7, Rc>18.4, Ic>18.1  missing
   6 │     535  2022-01-18T18:29:33      300  g'=18.21+/-0.17, Rc=17.80+/-0.10…  g'>18.7, Rc>19.5, Ic>19.1  missing
   7 │    1471  2022-01-18T18:45:09     1020  g'=19.82+/-0.33, Rc=19.26+/-0.14…  g'>19.5, Rc>20.3, Ic>19.9  missing
julia> h=readuntil(IOBuffer(cltxt), '\n')
"T0+[sec] MID-UT T-EXP[sec] magnitude(or 5-sigma limits) 5-sigma limits"
julia> hd=["T0+[sec]", "MID-UT", "T-EXP[sec]", "magnitude(or 5-sigma limits)", "5-sigma limits","m"]
6-element Vector{String}:
"T0+[sec]"
"MID-UT"
"T-EXP[sec]"
"magnitude(or 5-sigma limits)"
"5-sigma limits"
"m"
julia> rename!(df,hd)
7×6 DataFrame
 Row │ T0+[sec]  MID-UT               T-EXP[sec]  magnitude(or 5-sigma limits)       5-sigma limits             m
     │ Int64     DateTime             Int64       String                             String31                   Missing
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │       56  2022-01-18T18:21:34          10  g'>15.9, Rc>16.8,Ic>16.8…          g'>15.9, Rc>16.8, Ic>16.8  missing
   2 │       84  2022-01-18T18:22:02          40  g'=18.36+/-0.70, Rc=18.57+/-0.78…  g'>17.3, Rc>17.9, Ic>17.5  missing
   3 │      171  2022-01-18T18:23:29          50  g'=17.45+/-0.20, Rc=16.90+/-0.15…  g'>17.5, Rc>18.1, Ic>17.5  missing
   4 │      256  2022-01-18T18:24:54          60  g'=17.31+/-0.18, Rc=16.87+/-0.09…  g'>17.7, Rc>18.5, Ic>18.1  missing
   5 │      325  2022-01-18T18:26:03          60  g'=17.31+/-0.22, Rc=17.31+/-0.13…  g'>17.7, Rc>18.4, Ic>18.1  missing
   6 │      535  2022-01-18T18:29:33         300  g'=18.21+/-0.17, Rc=17.80+/-0.10…  g'>18.7, Rc>19.5, Ic>19.1  missing
   7 │     1471  2022-01-18T18:45:09        1020  g'=19.82+/-0.33, Rc=19.26+/-0.14…  g'>19.5, Rc>20.3, Ic>19.9  missing
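To get from here to the one-row-per-filter layout shown at the top of the thread, each combined magnitude cell still has to be split per filter. A possible sketch: the function name `split_mags` and its output fields are my own invention, not part of any package, and the regex assumes MITSuME's g'/Rc/Ic naming:

```julia
# Split a MITSuME magnitude cell such as
#   "g'=18.36+/-0.70, Rc=18.57+/-0.78, Ic>17.5"
# into one (filter, kind, value, err) entry per filter.
function split_mags(cell::AbstractString)
    out = NamedTuple[]
    for piece in split(cell, ',')
        m = match(r"(g'|Rc|Ic)\s*(=|>)\s*([\d.]+)(?:\+/-([\d.]+))?", piece)
        m === nothing && continue
        push!(out, (filter = String(m[1]),
                    kind   = m[2] == ">" ? "limit" : "mag",   # ">" means 5-sigma limit
                    value  = parse(Float64, m[3]),
                    err    = m[4] === nothing ? missing : parse(Float64, m[4])))
    end
    return out
end

rows = split_mags("g'=18.36+/-0.70, Rc=18.57+/-0.78, Ic>17.5")
```

Applied to `df[!, "magnitude(or 5-sigma limits)"]` row by row, this would give the three-filters-per-epoch shape of the GCN 31418 example.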
In the regular expression r"^\nT0+"im you are already using ^ (which stands for the start of a line), so why do you need \n? In my code it is not working.
using HTTP, CSV, DataFrames

function doanalysis()
    dfg = nothing
    for x in 31493
        print("\r peeking at GCN $x ")
        try
            url = "https://gcn.nasa.gov/circulars/$x/raw"
            # status_exception=false: HTTP.get would otherwise throw on 404,
            # so the status check below would never be reached
            resp = HTTP.get(url; status_exception=false)
            status = resp.status
            print(" ", status, " ")
            if status == 404; println("status=", status); continue; end
            txt = String(resp.body)
            m = match(r"GRB ?\d{6}([A-G]|(\.\d{2}))?", txt)
            m !== nothing && print(m.match)
            if occursin("MITSuME", txt)
                println(" MITSuME report")
                he = first(findfirst(r"^\nT0+"im, txt)) + 1
                lr = first(findnext("|\n---", txt, he))
                cltxt = replace(txt[he:lr], "," => " ", r"(�+)" => " ", ">" => " ", "-" => " ")
                df = CSV.read(IOBuffer(cltxt), DataFrame, delim=" ")
                df.GCN = fill(x, nrow(df))
                df.GRB = fill(m.match, nrow(df))
                if isnothing(dfg)
                    @show dfg = df
                else
                    @show dfg = vcat(dfg, df)
                end # if x is first
            end # if occursin
        catch e
            println("error: ", e)
        end # try/catch
    end # for loop
    return dfg
end

doanalysis()
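As an aside on the `^` vs `\n` question above: with the `m` flag, `^` matches at the start of every line, and a `\n` placed right after it must then be that line's first character, i.e. the line must be empty. So `r"^\nT0"m` only matches a `T0` that sits immediately after a blank line, which is what separates the table from the surrounding prose. A small standalone check (the sample string is made up):

```julia
# ^T0   : "T0" at the start of any line (matches the header AND the footnote)
# ^\nT0 : a blank line, then "T0" on the next line (only the header qualifies)
# Note: in r"^\nT0+" the + quantifies the 0, so it still matches "T0";
# a literal plus sign would have to be written \+.
txt = "intro\n\nT0 header\nT0 : footnote\n"

@assert match(r"^T0"m, txt).offset == 8    # the header's "T0"
@assert match(r"^\nT0"m, txt).offset == 7  # the blank line before it
@assert length(collect(eachmatch(r"^T0"m, txt))) == 2   # footnote matched too
@assert length(collect(eachmatch(r"^\nT0"m, txt))) == 1 # footnote skipped
```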
@rocco_sprmnt21 I want to combine GCNs from different telescopes in a single file, so I don't want to switch to your code, which uses JSON3.
@rocco_sprmnt21
The following code is not able to scrape the table from the GCN because "J. Bolmer" occurs on a line above it (the header regex matches the line-initial J of the author's initials). See the code:
using HTTP, CSV, DataFrames

function doanalysis()
    dfg = nothing
    for x in 26176
        print("\r peeking at GCN $x ")
        try
            url = "https://gcn.nasa.gov/circulars/$x/raw"
            # status_exception=false so the 404 check below can actually run
            resp = HTTP.get(url; status_exception=false)
            status = resp.status
            print(" ", status, " ")
            if status == 404; println("status=", status); continue; end
            txt = String(resp.body)
            m = match(r"GRB ?\d{6}([A-G]|(\.\d{2}))?", txt)
            m !== nothing && print(m.match)
            if occursin("GROND", txt)
                println(" GROND report")
                # NOTE: the J alternative also matches the line-initial J of "J. Bolmer"
                he = first(findfirst(r"^(g'|r'|i'|z'|J|H|K)"m, txt))
                lr = first(findnext(r"^(((?:[\t ]*(?:\r?\n|\r))+)|(The)|(Given))"m, txt, he)) - 1
                cltxt = replace(txt[he:lr], "mag" => " ", "," => "|", r" ?(=|>)" => "|", "+/-" => "|", "�" => " ")
                df = CSV.read(IOBuffer(cltxt), DataFrame, delim="|", header=0)
                df.GCN = fill(x, nrow(df))
                df.GRB = fill(m.match, nrow(df))
                rename!(df, "Column1" => "Filter", "Column2" => "Mag", "Column3" => "Mag_err")
                if isnothing(dfg)
                    @show dfg = df
                else
                    @show dfg = vcat(dfg, df)
                end # if x is first
            end # if occursin
        catch e
            println("error: ", e)
        end # try/catch
    end # for loop
    return dfg
end

doanalysis()
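One way to keep `J` as a filter name without matching the "J. Bolmer" author line is a negative lookahead, which (unlike a character class such as `J[^\.]`) does not consume the character after the J. A small standalone check; the sample strings are made up:

```julia
# J(?!\.) matches a line-initial J only when it is NOT followed by a dot,
# so author initials like "J. Bolmer" are skipped while a
# "J 20.1 +/- 0.1" photometry row is still found.
pat = r"^(g'|r'|i'|z'|J(?!\.)|H|K)"m

author_line = "J. Bolmer (MPE Garching) reports:\n"
table_line  = "J 20.1 +/- 0.1\n"

@assert match(pat, author_line) === nothing
@assert match(pat, table_line)[1] == "J"
```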
The above code also does not work for GCN 20843.
using HTTP, CSV, DataFrames

function doanalysis()
    dfg = nothing
    for x in 26066
        print("\r peeking at GCN $x ")
        try
            url = "https://gcn.nasa.gov/circulars/$x/raw"
            # status_exception=false so the 404 check below can actually run
            resp = HTTP.get(url; status_exception=false)
            status = resp.status
            print(" ", status, " ")
            if status == 404; println("status=", status); continue; end
            txt = String(resp.body)
            m = match(r"GRB ?\d{6}([A-G]|(\.\d{2}))?", txt)
            m !== nothing && print(m.match)
            if occursin("GROND", txt)
                println(" GROND report")
                # J[^\.] skips "J. Bolmer" but consumes the character after the J
                he = first(findfirst(r"^(g'|r'|i'|z'|J[^\.]|H|K)"m, txt))
                lr = first(findnext(r"^(((?:[\t ]*(?:\r?\n|\r))+)|(The)|(Given))"m, txt, he)) - 1
                cltxt = replace(txt[he:lr], "mag" => " ", "," => "|", r" ?(=|>)" => "|", "+/-" => "|", "�" => " ")
                df = CSV.read(IOBuffer(cltxt), DataFrame, delim="|", header=0)
                df.GCN = fill(x, nrow(df))
                df.GRB = fill(m.match, nrow(df))
                rename!(df, "Column1" => "Filter", "Column2" => "Mag") # ,"Column3" => "Mag_err")
                if isnothing(dfg)
                    @show dfg = df
                else
                    @show dfg = vcat(dfg, df)
                end # if x is first
            end # if occursin
        catch e
            println("error: ", e)
        end # try/catch
    end # for loop
    return dfg
end

doanalysis()
About the other page: it returns "Page not found".
PS
Excuse me, could you explain the purpose of this work? Since retrieving this data is so messy, wouldn't it be better to do it by hand?
I have to scrape thousands of web pages. I want to compile the data so that it can be used in research.
Your code is not working. The other page is GCN 20843.
If you want help, it is not enough to say "the code is not working"; you at least have to show where it fails.
I had already advised you to remove all the checks while testing and keep only the main expressions that capture the text.
The page has disappeared!
https://gcn.nasa.gov/circulars/26066/raw
So the pages, besides having a variable format, occasionally hide themselves?
Please type in English. The GCN web site has removed the raw option; now they have placed a Text icon instead.