Extracting views, likes and dislikes by web scraping YouTube: JuliaCon 2019

xiaodai · August 2, 2019, 2:49am

I am writing a blog post about JuliaCon 2019 and was looking into how to scrape the title, views, likes, and dislikes for all 111 JuliaCon YouTube videos.

I couldn’t figure out how to do it in Julia, so I reached for R’s rvest and RSelenium instead. The code is below:

library(rvest)
library(RSelenium)

rs = RSelenium::rsDriver(browser = "chrome", port=4567L)

rsc = rs$client

rsc$navigate("https://www.youtube.com/playlist?list=PLP8iPy9hna6StY9tIJIUN3F_co9A0zh0H")

ht = rsc$getPageSource()

ok <- xml2::read_html(ht[[1]])

ok %>%
  html_nodes("h3.style-scope.ytd-playlist-video-renderer") %>%
  html_text() -> texts

length(texts)

titles = texts %>%
  strsplit("[|]") %>%
  purrr::map_chr(~ifelse(length(.x) == 1, .x, .x[2]) %>% trimws)


urls = ok %>% 
  html_nodes("a.yt-simple-endpoint.style-scope.ytd-playlist-video-renderer") %>%
  html_attr("href")



# go to the url 
get_views <- function(url) {
  rsc$navigate(paste0("https://www.youtube.com", url))
  
  ht2 = rsc$getPageSource()
  
  ok2 <- xml2::read_html(ht2[[1]])
  
  ok2 %>% 
    html_nodes("div.style-scope.ytd-menu-renderer a.yt-simple-endpoint.style-scope") %>%
    html_text %>%
    strsplit("\n") %>%
    purrr::map(~.x[1]) %>%
    unlist -> ok3
  
  like_dislike = as.integer(ok3[1:2])
  
  views = ok2 %>%
    html_node("span.view-count.style-scope.yt-view-count-renderer") %>%
    html_text %>%
    strsplit(" ")
  
  # str_remove_all strips every thousands separator; str_remove only drops the first
  views = stringr::str_remove_all(views[[1]][1], ",") %>% as.integer
  
  data.table::data.table(views = views, likes = like_dislike[1], dislikes = like_dislike[2])
}

library(data.table)
pt = proc.time()
the_data = purrr::map_dfr(urls, get_views)
print(timetaken(pt))
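
The number parsing inside get_views can be checked without a browser. Here is a minimal sketch in base R, assuming the view-count text looks like "1,234 views". The name parse_count is a hypothetical helper, not part of rvest or RSelenium:

```r
# Hypothetical helper (not in the scraping code above): turn text such as
# "1,234 views" into an integer.
parse_count <- function(text) {
  # Keep the first whitespace-delimited token, e.g. "1,234"
  first_token <- strsplit(trimws(text), "\\s+")[[1]][1]
  # Remove every thousands separator, not just the first one
  as.integer(gsub(",", "", first_token, fixed = TRUE))
}

parse_count("1,234 views")
parse_count("1,234,567 views")
```

Abbreviated counts such as "1.2K" are not handled by this sketch and would come back as NA.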

Ideally, I want to do the whole thing in Julia, so I would be happy for someone to chime in and show how it can be done with Julia’s Gumbo.jl etc. Otherwise, I will just run the R scraping code from Julia using RCall.jl.
