Harbest.jl v0.4
Harbest.jl is a Julia package for simple web scraping.
What's new?
- html_table function
- different and cleaner html_text3 output
- html_elements function now accepts a Vector{String} as an argument
- actual documentation
- improved docstrings
Example
using Harbest
starwars = read_html("https://rvest.tidyverse.org/articles/starwars.html")
titles = html_elements(starwars, ["section", "h2"]) |> html_text3
titles
# 7-element Vector{String}:
# "The Phantom Menace"
# "Attack of the Clones"
# "Revenge of the Sith"
# ⋮
# "Return of the Jedi"
# "The Force Awakens"
html = read_html("https://en.wikipedia.org/w/index.php?title=The_Lego_Movie&oldid=998422565")
table = html_elements(html, ".tracklist") |> html_table
table
# 28×4 DataFrame
#  Row │ No.     Title                              Performer(s)                       Length
#      │ String  String                             String                             String
# ─────┼─────────────────────────────────────────────────────────────────────────────────────
#    1 │ 1.      "Everything Is Awesome"            Tegan and Sara featuring The Lon…  2:43
#    2 │ 2.      "Prologue"                                                            2:28
#    3 │ 3.      "Emmett's Morning"                                                    2:00
#    4 │ 4.      "Emmett Falls in Love"                                                1:11
#    5 │ 5.      "Escape"                                                              3:26
#    ⋮ │ ⋮       ⋮                                  ⋮                                  ⋮
#   25 │ 25.     "Everything Is Awesome"            Jo Li (Joshua Bartholomew and Li…  1:26
#   26 │ 26.     "Everything Is Awesome (unplugge…  Shawn Patterson and Sammy Allen    1:24
#   27 │ 27.     "Untitled Self Portrait"           Will Arnett                        1:08
#   28 │ 28.     "Everything Is Awesome (instrume…                                     2:41
#                                                                             19 rows omitted