Pull data from websites in Julia

As ski resorts begin to open, they update data on their websites throughout the day, every day.
Two numbers skiers care about are: (1) the number of trails currently open, and (2) the current snow base.

Each resort website posts this info:
Park City trails: [screenshot]
Park City snow: [screenshot]

Vail trails: [screenshot]
Vail snow: [screenshot]

Whistler trails: [screenshot]
Whistler snow: [screenshot]

Can I automatically pull this data from the links above using Julia?
(skicentral.com does this, but not particularly well & I’d like to learn how to do it in Julia if possible)

Scraping with Julia: Scraping web pages with Julia HTTP & Gumbo: Tutorial
Someone’s scraping project for ski snow reports, but not in Julia: Web Scraping for custom API - DEV Community

You should be able to combine parts from both to get what you want.

Scraping is notoriously fussy and fragile.

You can get more results at once from Snow Report | Colorado Ski Country USA, for example.
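Since scraping is that fragile, it helps to guard every request from the start. A minimal sketch (fetch_page is a made-up helper name and the timeout value is arbitrary):

using HTTP

# Defensive fetch: resort sites change markup, throttle, and time out,
# so treat every request as something that can fail.
function fetch_page(url)
    try
        r = HTTP.get(url; readtimeout = 30)
        return r.status == 200 ? String(r.body) : nothing
    catch err
        @warn "request failed" url err
        return nothing
    end
end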


Here is my very raw, very naive attempt.
What’s amazing is that I have zero experience with “web scraping” & I still got it to work.

using HTTP
l_pc = "https://www.parkcitymountain.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
l_v  = "https://www.vail.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
l_w  = "https://www.whistlerblackcomb.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";

# Marker string that immediately precedes the number of runs open.
a1 = "id=\"runs\">\r\n\r\n                                            <div class=\"terrain_summary__circle\"\r\n                                                    data-open="

# Number of "runs" open in Park City
r  = HTTP.get(l_pc);  # fetch the page
rs = String(r.body);  # response body as one long string
a2 = findfirst(a1, rs)                        # index range of the marker string
a3 = rs[(a2[end] + 2):(a2[end] + 4)]          # 3 chars after the opening quote (assumes a 3-digit count)
#################

# Number of "runs" open in Vail
r  = HTTP.get(l_v);
rs = String(r.body);
a2 = findfirst(a1, rs)
a3 = rs[(a2[end] + 2):(a2[end] + 4)]
#################

# Number of "runs" open in Whistler
r  = HTTP.get(l_w);
rs = String(r.body);
a2 = findfirst(a1, rs)
a3 = rs[(a2[end] + 2):(a2[end] + 4)]
#################

All three of those are owned by Vail Resorts, so it’s not too surprising that the snow-report HTML is similar. You’ll need something different for other resorts.


Yeah, I’m currently on an Epic pass…
Here is a cleaner way to do it with a String. (Still no good, and this morning I didn’t know what “web scraping” meant.)

using HTTP
l_pc = "https://www.parkcitymountain.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
l_v  = "https://www.vail.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";
l_w  = "https://www.whistlerblackcomb.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx";

# Marker string that immediately precedes the number of runs open.
a1 = "id=\"runs\">\r\n\r\n                                            <div class=\"terrain_summary__circle\"\r\n                                                    data-open="
RUNS = [];

for resort in [l_pc, l_v, l_w]
    r  = HTTP.get(resort)                            # fetch the page
    rs = String(r.body)                              # response body as a string
    a2 = findfirst(a1, rs)                           # index range of the marker
    num_runs_open = rs[(a2[end] + 2):(a2[end] + 4)]  # 3 chars after the opening quote
    push!(RUNS, num_runs_open)
end

julia> RUNS
3-element Vector{Any}:
 "120"
 "86\""
 "30\""
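Note the stray \" on the two-digit counts: the fixed-width slice drags in the closing quote. A regex capture avoids that; a sketch reusing rs from the loop above, assuming the markup still matches the marker string:

# Capture only the digits that follow data-open=" after the id="runs" marker.
m = match(r"id=\"runs\"[\s\S]*?data-open=\"(\d+)\"", rs)
num_runs_open = m === nothing ? missing : parse(Int, m.captures[1])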

Here’s a little more robust solution using an HTML/XML parser. The heavy lifting is the "//div"... line, an XPath query over the parsed document: it finds the div whose data-terrain-status-id attribute equals "runs", then selects its child div.

using EzXML
using HTTP

function get_open(url)
    r = HTTP.get(url)
    tree = EzXML.parsehtml(r.body)
    node = findfirst("//div[@data-terrain-status-id=\"runs\"]/div", tree)
    val = node["data-open"]
    return parse(Int, val)
end
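Usage looks like this (the 120 is the Park City count scraped above, and it will of course change daily):

julia> get_open("https://www.parkcitymountain.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx")
120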

Try Gumbo.jl. It parses HTML and gives you structured access to the elements.

Here is an example from the time of the coronaplotting craze.
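Adapted to the pages above, a Gumbo version might look like this. A sketch, not tested against the live sites: it leans on Cascadia for CSS selectors, and the selector simply mirrors the XPath query from the EzXML post:

using HTTP, Gumbo, Cascadia

r   = HTTP.get("https://www.parkcitymountain.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx")
doc = parsehtml(String(r.body))        # Gumbo parse tree
# Find the runs summary <div> by its data attribute (assumes at least one match).
node = first(eachmatch(Selector("div[data-terrain-status-id=\"runs\"]"), doc.root))
# Its first element child is the circle <div> carrying the counts.
circle = first(c for c in node.children if c isa HTMLElement)
println(getattr(circle, "data-open", "?"))   # number of runs open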


@chris-b1 thanks!

using HTTP, EzXML;
s1 = "https://www.";
names   = ["parkcitymountain" "vail" "whistlerblackcomb" "beavercreek" "breckenridge" "northstarcalifornia"];
sruns = ".com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx"
ssnow = ".com/the-mountain/mountain-conditions/snow-and-weather-report.aspx"
RUNS = [];
for resort in names
    link_runs = s1 * resort * sruns 
    r  = HTTP.get(link_runs)
    tree = EzXML.parsehtml(r.body)
    #
    node_runs = findfirst("//div[@data-terrain-status-id=\"runs\"]/div", tree)
    num_runsopen  = node_runs["data-open"]   |> x -> parse(Int, x)
    num_runstotal = node_runs["data-total"]  |> x -> parse(Int, x)
    #
    push!(RUNS, [resort num_runsopen num_runstotal])
    #
    link_snow = s1 * resort * ssnow 
    r  = HTTP.get(link_snow)
    tree = EzXML.parsehtml(r.body)
    # How do we get: snowfall.Depth.Inches etc???
end 
pushfirst!(RUNS, ["Resort" "Runs Open" "Runs Total"])
RUNS = vcat(RUNS...)

Gives:
[screenshot: RUNS printed as a table of resort, runs open, and runs total]

I wonder if there is an easy way to scrape all the ski resort websites from https://www.epicpass.com/?

Also it looks more complicated to scrape snowfall “BASE DEPTH” & “CURRENT SEASON”.
(not sure the approach used for RUNS above would work…)


Try scraping for the individual links at Snow and Weather Reports | Snow.com
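Something like this could collect the individual report links from such an index page. A sketch only: the index URL here is a placeholder and the href filter is a guess you would adjust after inspecting the real page:

using HTTP, EzXML

index = "https://www.snow.com/snow-reports.aspx"   # hypothetical URL; substitute the real one
r     = HTTP.get(index)
tree  = EzXML.parsehtml(String(r.body))
# Keep every <a href> that looks like a resort snow report link.
links = [a["href"] for a in findall("//a[@href]", tree)
         if occursin("snow-and-weather-report", a["href"])]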


> Also it looks more complicated to scrape snowfall “BASE DEPTH” & “CURRENT SEASON”.
> (not sure the approach used for RUNS above would work…)

A trick which can be helpful: if you select the element in Chrome (right-click, Inspect) there is an option to copy an XPath query to that element. Sometimes you’ll want to clean up or modify that query, but it can be a helpful starting point.

//*[@id="snow_report_1"]/div[2]/ul/li[6]/div/h5/text()

[screenshot: Chrome Inspect panel showing the Copy XPath option]
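Plugged into the EzXML approach above, the copied query would look something like this (a sketch; the snow_report_1 id and the li[6] position come straight from Chrome and may differ per resort or after a site update):

using HTTP, EzXML

r    = HTTP.get("https://www.parkcitymountain.com/the-mountain/mountain-conditions/snow-and-weather-report.aspx")
tree = EzXML.parsehtml(String(r.body))
# Same XPath as above, minus the trailing /text(); nodecontent extracts the text.
node = findfirst("//*[@id=\"snow_report_1\"]/div[2]/ul/li[6]/div/h5", tree)
node === nothing || println(nodecontent(node))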


mehhh, no luck pulling the snow data (base depth & current season), but thanks anyway…