You can use vals to web scrape websites.
Either by fetching HTML and using a parsing library, or by making an API call to an external service that runs a headless browser for you.
- Locate the HTML element that contains the data you need
- To parse and search HTML, use the cheerio library
Locate the HTML element that contains the data you need
Right click on the section of a website that contains the data you want to fetch and then inspect the element. In Chrome, the option is called Inspect and it highlights the HTML element in the Chrome DevTools panel.
For example, to scrape the introduction paragraph of the OpenAI page on Wikipedia, inspect the first word of the first paragraph.
In the Elements tab, look for the the data you need and right click the parent element and choose Copy selector to get the CSS selector:
#mw-content-text > div.mw-parser-output > p:nth-child(7).
If you know a little CSS, it’s better to write your own selector by hand to make it more generic. This can make the selector less vulnerable to the website being updated and, e.g., class names changing. In this case, you can use
document.querySelector in the Console to check your CSS selector is correct.