As ski resorts begin to open, they update data on their websites throughout the day, every day.
Two factors skiers care about are: (1) # of trails currently open, (2) current snow base.
Can I automatically pull this data from the links above using Julia?
(skicentral.com does this, but not particularly well & I’d like to learn how to do it in Julia if possible)
Here is my very raw, very naive attempt.
What’s amazing is I have zero experience “webscraping” & I still got it to work.
using HTTP;
l_pc ="https://www.parkcitymountain.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
l_v ="https://www.vail.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
l_w = "https://www.whistlerblackcomb.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
# Number of "runs" open in Park City
r = HTTP.get(l_pc); # get link
rs = String(r.body); # make into long string
a1="id=\"runs\">\r\n\r\n <div class=\"terrain_summary__circle\"\r\n data-open="
a2=findfirst(a1, rs) # findall
a3 = rs[( a2[end] +2 ) : ( a2[end] +4)]
#################
# Number of "runs" open in Vail
r = HTTP.get(l_v); # get link
rs = String(r.body); # make into long string
a1="id=\"runs\">\r\n\r\n <div class=\"terrain_summary__circle\"\r\n data-open="
a2=findfirst(a1, rs) # findall
a3 = rs[( a2[end] +2 ) : ( a2[end] +4)]
#################
# Number of "runs" open in Whistler
r = HTTP.get(l_w); # get link
rs = String(r.body); # make into long string
a1="id=\"runs\">\r\n\r\n <div class=\"terrain_summary__circle\"\r\n data-open="
a2=findfirst(a1, rs) # findall
a3 = rs[( a2[end] +2 ) : ( a2[end] +4)]
#################
All three of those are owned by Vail Resorts, so it’s not too surprising that the snow report html is similar. You’ll need something different for other resorts.
Yeah, I’m currently on Epic pass…
Here is a cleaner way to do it w/ a String. (still no good, & this morning I didn’t know what “webscraping” meant)
using HTTP;
l_pc ="https://www.parkcitymountain.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
l_v ="https://www.vail.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
l_w = "https://www.whistlerblackcomb.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
#location of string w/ # Runs open.
a1="id=\"runs\">\r\n\r\n <div class=\"terrain_summary__circle\"\r\n data-open="
RUNS = [];
for resort in [l_pc, l_v, l_w]
    r = HTTP.get(resort);   # fetch the page
    rs = String(r.body);    # response body as one long string
    a2 = findfirst(a1, rs)  # index range where a1 occurs in rs
    # take 3 characters after the opening quote (a 2-digit count also picks up the closing quote)
    num_runs_open = rs[( a2[end] +2 ) : ( a2[end] +4)]
    push!(RUNS, num_runs_open)
end
julia> RUNS
3-element Vector{Any}:
"120"
"86\""
"30\""
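The stray `\"` in the last two results comes from always slicing three characters after `data-open=`, so a two-digit count drags the closing quote along. One possible fix, sketched here on the assumption that the value is always written as `data-open="<digits>"` right inside the `id="runs"` block, is to capture the digits with a regex. `extract_open` and `sample` are made-up names for illustration, and the snippet is not the real page.

```julia
# Hypothetical helper: find the first data-open="..." that follows id="runs">
# and return its value as an Int. Assumes the value is plain digits in double quotes.
function extract_open(html::AbstractString)
    m = match(r"id=\"runs\">\s*<div[^>]*?data-open=\"(\d+)\"", html)
    m === nothing && return nothing   # pattern not found (layout changed?)
    return parse(Int, m.captures[1])
end

# Made-up snippet shaped like the string searched for above
sample = "<div id=\"runs\">\r\n\r\n <div class=\"terrain_summary__circle\"\r\n data-open=\"86\""

extract_open(sample)           # 86, no trailing quote
extract_open("no match here")  # nothing
```

Because the capture group takes however many digits are present, the same code handles "7", "86" and "120" without adjusting offsets.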
Here’s a slightly more robust solution using an HTML/XML parser. The heavy lifting is the "//div"... line, which is an XPath query (a small language for searching XML/HTML trees). In this case it finds a div whose data-terrain-status-id attribute equals "runs", then selects that div's child div.
using EzXML
using HTTP
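function get_open(url)
    r = HTTP.get(url)
    tree = EzXML.parsehtml(r.body)  # parse the response body as HTML
    # child div of the div whose data-terrain-status-id is "runs"
    node = findfirst("//div[@data-terrain-status-id=\"runs\"]/div", tree)
    val = node["data-open"]         # attribute holding the number of open runs
    return parse(Int, val)
end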
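To try the XPath without hitting the websites, the same query can be run on a string. The sketch below is a hypothetical variant (`open_runs` is not from the post above) that also returns `nothing` instead of throwing when the element or attribute is missing. The sample markup is invented to match the query and is far smaller than a real page.

```julia
using EzXML

# Hypothetical offline variant of get_open: takes HTML text rather than a URL.
function open_runs(html::AbstractString)
    tree = EzXML.parsehtml(html)
    node = findfirst("//div[@data-terrain-status-id=\"runs\"]/div", tree)
    node === nothing && return nothing           # no matching element
    haskey(node, "data-open") || return nothing  # element lacks the attribute
    return tryparse(Int, node["data-open"])      # nothing if the value is not an integer
end

# Made-up markup for illustration only
sample_page = """
<html><body>
  <div data-terrain-status-id="runs">
    <div class="terrain_summary__circle" data-open="86"></div>
  </div>
</body></html>
"""

open_runs(sample_page)                                         # 86
open_runs("<html><body><p>closed for now</p></body></html>")   # nothing
```

Returning `nothing` makes it easier to loop over several resorts and skip the ones whose page layout differs.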
Also it looks more complicated to scrape snowfall “BASE DEPTH” & “CURRENT SEASON”.
(not sure the approach used for RUNS above would work…)
A trick that can be helpful: if you select the element in Chrome (right click, Inspect), there is an option to copy an XPath query to that element. Sometimes you’ll want to clean up or modify that query, but it can be a helpful starting point.
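As a rough illustration of what "cleaning up" can mean: a copied query is often a positional path, which breaks as soon as the page gains or loses a sibling element, while an attribute-based query keeps working. The markup and both query strings below are invented for the example, and the positional path is only the kind of thing a browser might hand you, not an actual copy from these sites.

```julia
using EzXML

# Made-up page: a header div followed by the "runs" block
demo_page = """
<html><body>
  <div id="header">Menu</div>
  <div data-terrain-status-id="runs">
    <div class="terrain_summary__circle" data-open="86"></div>
  </div>
</body></html>
"""
doc = EzXML.parsehtml(demo_page)

copied  = "/html/body/div[2]/div"                        # positional, brittle
cleaned = "//div[@data-terrain-status-id=\"runs\"]/div"  # keyed on an attribute

# Both queries reach the same element on this page
findfirst(copied, doc)["data-open"]   # "86"
findfirst(cleaned, doc)["data-open"]  # "86"
```

If another div were inserted above the header, `div[2]` would point at the wrong element, whereas the attribute-based query would be unaffected.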