[ANN] Harbest.jl - Simple web scraping with Julia

Hello people!

I’ve been building a package, mostly as a personal tool, and I’ve just published it.

It’s more or less a port of rvest from R, but I plan to extend it well beyond that with new functions.

Currently, I see it mostly as syntactic sugar, in the sense that it combines the functionality of HTTP, Cascadia and Gumbo under a different syntax.
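For comparison, here is a rough sketch of the usual pattern when using Gumbo and Cascadia directly. It is not Harbest's internal implementation. The markup and the `.count` class are made up for illustration, and in practice the string would come from something like `String(HTTP.get(url).body)`.

```julia
using Gumbo, Cascadia

# Hypothetical markup standing in for a downloaded page.
page = """
<html><body>
  <span class="count">38</span>
  <span class="count">12</span>
</body></html>
"""

doc = parsehtml(page)                            # Gumbo: parse the string into a tree
nodes = eachmatch(Selector(".count"), doc.root)  # Cascadia: match a CSS selector
counts = [nodeText(n) for n in nodes]            # Cascadia: text content of each match
```

That is three separate steps across two packages (three with HTTP), which is the part a wrapper can shorten.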

You can see the documentation here

You can install it using Pkg.add("Harbest")

I’ll be improving the documentation and adding new and better functions very soon!


Awesome. Is there any chance it will be able to download the images on a page automatically in the future?

Yes!
I’ll think about a nice way to implement that.
Thanks!
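In the meantime, one possible approach with Gumbo and Cascadia directly might look like the sketch below. `image_urls` and `download_images` are hypothetical helpers, not part of Harbest, and the sketch assumes absolute `src` URLs without query strings; relative paths would need to be resolved against the page URL first.

```julia
using Gumbo, Cascadia
using Downloads  # standard library

# Hypothetical helper (not part of Harbest): collect absolute image URLs.
function image_urls(page::AbstractString)
    doc = parsehtml(page)
    imgs = eachmatch(Selector("img"), doc.root)
    # attrs(img) is the element's attribute dictionary; missing src becomes "".
    srcs = [get(attrs(img), "src", "") for img in imgs]
    return filter(src -> startswith(src, "http"), srcs)
end

# Hypothetical helper: save each URL into `dir` under the URL's last path segment.
function download_images(urls, dir::AbstractString)
    mkpath(dir)
    return [Downloads.download(u, joinpath(dir, basename(u))) for u in urls]
end

# Made-up markup for illustration; the second image has no src and is skipped.
page = """<html><body><img src="https://example.com/a.png"><img alt="no source"></body></html>"""
urls = image_urls(page)
```

Calling `download_images(urls, "images")` would then fetch the files over the network.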

Can your package pull ski resort data?
I tried here:

Yes! Easily

For example, the following code gets the number of open lifts, terrain and trails (from the first link you sent):

using Harbest

html = read_html("https://www.parkcitymountain.com/the-mountain/mountain-conditions/terrain-and-lift-status.aspx")

data = html_elements(html, ".c118__number1--v1")

# data[1] is the number of open lifts; data[2] and data[3] hold the other two values
lifts_open = html_text3(data[1])  # "38"
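
The value comes back as a string, so to do arithmetic with it you would convert it first. A minimal sketch, with `lifts_text` as a stand-in for the scraped string (the variable names here are only for illustration):

```julia
# Stand-in for the string returned by the scraping call above.
lifts_text = "38"

# tryparse returns nothing instead of throwing when the text is not a number,
# which helps if the page shows something like "n/a" on a closed day.
lifts_count = tryparse(Int, strip(lifts_text))
```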

Nice. Let me know if you find any issues with Cascadia that block you. I’ve long thought that some syntactic sugar on top of Cascadia/Gumbo would be useful. I’ve considered extending Cascadia with it, but I’m happy to see it in a separate package.

Regards

Avik
