Extracting information from https://caps.fool.com/Ticker/MSFT.aspx

I am trying to extract information from CAPS, run by the Fool. However, after parsing the page with Gumbo and Cascadia, the HTML appears to contain no useful information (and certainly nothing that gives me what I am after directly). How can I get the information that is shown in my browser (for example, the CAPS rating of 4)?

Here is my code:

using Gumbo
using Cascadia

url = "https://caps.fool.com/Ticker/MSFT.aspx"

page = parsehtml(read(download(url), String))

Does the example in the Cascadia readme not work for you?

julia> using Cascadia, Gumbo, HTTP

julia> r = HTTP.get("https://caps.fool.com/Ticker/MSFT.aspx");

julia> h = parsehtml(String(r.body));

julia> qs = eachmatch(Selector("#tickerRating"), h.root)
1-element Vector{HTMLNode}:
 HTMLElement{:div}:<div class="subtle marginT" id="tickerRating">
  CAPS  Rating:
  <img alt="4 out of 5" class="capsStarRating" id="ctl00_ctl00_ctl00_ctl00_cphContent_cphContent_cphContent_cphCrossBar_TickerHeader_ctlLargeImageStars" src="https://g.foolcdn.com/art/ratings/stars/trans/4stars-trans-lg.png" title="4 Stars: Favorite"/>
</div>

A cursory inspection of the page source suggested that the star rating is in the tickerRating div. You can then split the result like this:

julia> first(split(split(string(qs[1]), "title=\"")[2], ":"))
"4 Stars"
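Splitting the serialized HTML works, but it breaks as soon as the attribute order or quoting changes. A slightly sturdier sketch selects the img element itself and reads its title and alt attributes. It assumes Gumbo's attrs accessor and Cascadia's descendant selectors, and it parses an offline copy of the markup shown above (trimmed) so it runs without a network request; on the live page you would pass h.root instead.

```julia
using Gumbo, Cascadia

# Offline, trimmed copy of the #tickerRating div from the output above.
html = """
<div class="subtle marginT" id="tickerRating">
  CAPS Rating:
  <img alt="4 out of 5" class="capsStarRating" src="https://g.foolcdn.com/art/ratings/stars/trans/4stars-trans-lg.png" title="4 Stars: Favorite"/>
</div>
"""

doc = parsehtml(html)

# Select the <img> inside the rating div instead of splitting string(qs[1]).
img = first(eachmatch(Selector("#tickerRating img.capsStarRating"), doc.root))

title = attrs(img)["title"]                           # "4 Stars: Favorite"
stars = strip(first(split(title, ":")))               # "4 Stars"
rating = parse(Int, first(split(attrs(img)["alt"])))  # numeric rating from "4 out of 5"
```

Reading the alt text gives the rating as an integer directly, which is usually what you want for further processing.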

What if I want to access information further down the page? For example, that of “All Players”, 14,714 think MSFT will “Outperform” while 2,044 think it will “Underperform”. I am stuck “digging into” the page beyond .root.

julia> eachmatch(Selector(".jointSentimentGroup"), h.root)[1]
HTMLElement{:div}:<div class="jointSentimentGroup">
  <div class="perform out">
    <span>
      14,716
    </span>
    <span class="legend">
      Outperform
    </span>
  </div>
  <div class="perform under">
    <span>
      2,044
    </span>
    <span class="legend">
      Underperform
    </span>
  </div>
  <div class="sentimentBar underperformBar">
    <div class="outperformBar" style="width:87.80429594272076372315035800%">
    </div>
...
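Going one step further, the two counts can be pulled out as integers by selecting the first span child of each .perform div. This is a sketch: textof and sentiment_count are hypothetical helpers written here (not part of Gumbo or Cascadia), the selector assumes the page structure shown above, and the HTML is an offline, trimmed copy of that output.

```julia
using Gumbo, Cascadia

# Hypothetical helper: concatenate the text of every text node under `n`.
textof(n::HTMLText) = n.text
textof(n::HTMLElement) = join(map(textof, n.children))
textof(n) = ""  # any other node type contributes no text

# Offline, trimmed copy of the .jointSentimentGroup markup shown above.
html = """
<div class="jointSentimentGroup">
  <div class="perform out"><span>14,716</span><span class="legend">Outperform</span></div>
  <div class="perform under"><span>2,044</span><span class="legend">Underperform</span></div>
</div>
"""
root = parsehtml(html).root

# Hypothetical helper: the count is the first <span> child of the .perform div
# (the second one is the legend); strip whitespace and thousands separators.
function sentiment_count(root, cls)
    spans = eachmatch(Selector(".jointSentimentGroup .perform.$cls > span"), root)
    parse(Int, replace(strip(textof(first(spans))), "," => ""))
end

outperform = sentiment_count(root, "out")
underperform = sentiment_count(root, "under")
```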

I don’t think there’s any magic here: just Ctrl+F for the information you’re after in the page source, then check whether there’s a CSS selector that matches an element containing what you need.
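The Ctrl+F step can also be done from Julia on the HTML string you actually downloaded, which is useful because that string can differ from what the browser shows after running scripts. A minimal sketch using only Base; context is a hypothetical helper name, and width (characters of surrounding text) is an arbitrary choice.

```julia
# Hypothetical helper: return a window of raw HTML around each occurrence of
# `needle`, the programmatic equivalent of Ctrl+F in the page source.
function context(html::AbstractString, needle::AbstractString; width::Int = 40)
    out = String[]
    for r in findall(needle, html)
        # Clamp the window to valid string indices at both ends.
        lo = max(firstindex(html), prevind(html, first(r), width))
        hi = min(lastindex(html), nextind(html, last(r), width))
        push!(out, html[lo:hi])
    end
    return out
end

snippet = "<div class=\"perform out\"><span>14,716</span></div>"
hits = context(snippet, "14,716"; width = 10)
misses = context(snippet, "SmartScore")
```

With the live page this would be called as, for example, context(String(r.body), "Outperform"). An empty result means the text is not in the HTML returned by the server, so no CSS selector applied to that HTML can find it.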


I see. Thank you!

On a very similar note, I am trying to extract the “SmartScore” for MSFT (it’s the 8) on the following URL:

This one is tougher, I think. I can find the value in the page source (see the diagram), but I can’t seem to extract it. Some help would again be appreciated.