Web Scraping
Getting data off the Internet
Friday, March 5, 2021 · 5 - 7 PM
Tonight's meeting is about web scraping: using automated software ("bots") to pull data off a website.
There are a few rough categories of web scraping, ordered from more elegant and "legitimate" to more hacky and sneaky:
1) Hitting a website's public API, probably a JSON API. We'll use Python and the Requests library to do this.
2) Hitting a website's internal/undocumented API.
3) Making requests and extracting data from the HTML. Probably the most proper definition of web "scraping". We'll use Beautiful Soup to do this.
4) Simulating an actual user/browser interacting with the page. Selenium is the main tool for this; we won't cover it in the workshop, but we'll talk about it a bit.
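To give a feel for the difference between categories 1 and 3, here's a rough sketch that pulls the same list of titles out of a JSON body and out of an HTML page. Everything in it is made up for illustration: the sample data, the "events" key, the "event" class, and the function names are all assumptions, not a real site or API. It also sticks to Python's standard library (json and html.parser) so it runs with nothing installed and without touching the network; in the workshop, Requests would do the fetching and Beautiful Soup would take the place of the hand-written parser class.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample responses standing in for what a server might send back.
SAMPLE_JSON = '{"events": [{"title": "Web Scraping"}, {"title": "Intro to Git"}]}'
SAMPLE_HTML = """
<ul>
  <li class="event">Web Scraping</li>
  <li class="event">Intro to Git</li>
  <li class="note">Bring a laptop</li>
</ul>
"""


def titles_from_json(body):
    """Category 1: a JSON API hands back structured data, so just index into it."""
    return [event["title"] for event in json.loads(body)["events"]]


class EventParser(HTMLParser):
    """Category 3: walk the HTML and keep the text inside <li class="event">."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_event = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "li" and dict(attrs).get("class") == "event":
            self._in_event = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_event = False

    def handle_data(self, data):
        # Only keep non-blank text that appears inside a matching <li>.
        if self._in_event and data.strip():
            self.titles.append(data.strip())


def titles_from_html(body):
    parser = EventParser()
    parser.feed(body)
    parser.close()
    return parser.titles


print(titles_from_json(SAMPLE_JSON))
print(titles_from_html(SAMPLE_HTML))
```

The JSON version is one line because the structure is already there; the HTML version has to know what the page's markup looks like, which is why it tends to break when a site gets redesigned.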
Probably best to come with Python installed! VPN users, we'll have containers set up for you on Greenbank, but there's a limited number of them, so local Python is probably preferable.
Meeting link, as always: https://meet.jit.si/SADClubMeetingSpring2021