Languages
For web scraping to be possible, we only need two types of tools: an HTTP client and an HTML parser. Most programming languages have libraries for both; however, some have better existing tools than others.
Which language is the best?
Web scraping is a data-oriented subject, so naturally, languages used in data programming are a great fit. Additionally, since the main scaling bottleneck is IO blocking (e.g. waiting for a request to complete), features like asynchronous support or easy threading are very valuable for scaling up web scrapers.
Python is the most popular language used for web scraping as it's a great data language with many built-in and community tools useful in web scraping. JavaScript is becoming quite popular too by virtue of being the language of the web.
That being said, almost any programming language can be used for web scraping with great success as long as HTTP client and HTML parser libraries are available.
HTTP Clients
For HTTP clients, we need 3 important features:
- HTTP/2+ support - most real-world browser traffic goes over HTTP/2 or HTTP/3, so scraping over HTTP/1.1 makes us stand out and easier to block.
- Asynchronous support - the biggest scaling problem in web scraping is IO blocking, so asynchronous programming or accessible threading is important for scaling up web scrapers.
- Stability - the web is huge and complex, and there are many things that can go wrong. Having a client that follows RFC standards and behaves as closely as possible to a real web browser will prevent the scraper from being blocked.
Based on these 3 virtues, here's an ordered list of HTTP clients in popular programming languages:
Language | Client | Highlights |
---|---|---|
Python | httpx | feature-rich, http2, async, http-proxy, socks-proxy |
Python | requests | ease of use, http-proxy, socks-proxy |
Go | req | feature-rich, http2, http3, http-proxy, socks-proxy |
Go | resty | feature-rich, http2, http-proxy |
Ruby | typhoeus | uses-curl, concurrency |
Ruby | faraday | ease of use, can adapt typhoeus |
PHP | guzzle | uses-curl, concurrency |
PHP | symfony-http | uses-curl, concurrency |
R | crul | uses-curl, concurrency |
R | httr | uses-curl, concurrency |
Nim | puppy | uses-curl, winhttp or appkit, http-proxy |
Rust | hurl | uses-curl |
NodeJS | axios | feature-rich, async, http-proxy, socks-proxy |
* uses-curl - libraries that use curl inherit its features, like HTTP/SOCKS proxy support.
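
For example, in Python the httpx client covers all three requirements listed above. Here's a minimal sketch (assuming httpx is installed with its http2 extra, i.e. `pip install httpx[http2]`; the URLs are placeholders) that fetches several pages concurrently over an async client:

```python
import asyncio
import httpx


async def main():
    # http2=True enables HTTP/2 negotiation (requires the httpx[http2] extra);
    # which protocol is actually used depends on what the server supports.
    async with httpx.AsyncClient(http2=True) as client:
        urls = ["https://httpbin.org/get" for _ in range(3)]
        # fire requests concurrently instead of waiting on each one (IO blocking)
        responses = await asyncio.gather(*[client.get(url) for url in urls])
        for response in responses:
            print(response.http_version, response.status_code)


asyncio.run(main())
```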
HTML Parsers
Not all web scrapers work with HTML, but generally we need some HTML parsing, and most programming languages have some sort of XML/HTML parser available. However, there are a few important features to look out for:
- CSS selectors - the most common way to parse HTML and XML documents. It's the same selector language used to apply CSS styles.
- XPath selectors - like CSS selectors but significantly more powerful. You'll want XPath when working with large, heavily nested HTML pages.
- Speed, stability and extras.
Based on these virtues, here's an ordered list of HTML parsing libraries in popular programming languages.
Language | XPath Libraries |
---|---|
Python | parsel, lxml |
Go | htmlquery, gokogiri |
PHP | dom-crawler, DiDom |
Rust | sxd-xpath |
Ruby | nokogiri |
R | rvest |
Language | CSS Selector Libraries |
---|---|
Python | parsel, beautifulsoup, lxml, pyquery |
Go | goquery, cascadia |
Rust | scraper, soup |
PHP | dom-crawler, DiDom |
Ruby | nokogiri |
R | rvest |
NodeJS | cheerio |
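
As an illustration, Python's parsel (which appears in both tables above) exposes CSS and XPath selectors through the same API. A minimal sketch on a made-up HTML snippet:

```python
from parsel import Selector

html = """
<div class="product">
  <a href="/item/1">First item</a>
  <span class="price">12.99</span>
</div>
"""
selector = Selector(text=html)

# CSS selectors: concise and familiar from stylesheets
print(selector.css(".product a::text").get())        # First item
print(selector.css(".product a::attr(href)").get())  # /item/1

# XPath selectors: more expressive, e.g. matching by attribute values
print(selector.xpath("//span[@class='price']/text()").get())  # 12.99
```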
JSON Parsers
Modern web scraping deals with JSON almost as often as HTML these days, and every language has native JSON-like data structures (hash tables, dictionaries, etc.), so parsing JSON is rarely a noteworthy challenge. However, there are a few powerful tools that should not be overlooked:
- JMESPath - a powerful path language (like XPath or CSS selectors) for JSON. Very popular, with implementations in most languages used in web scraping.
- JSONPath - an XPath-like path language for JSON with the key ability to select any descendant values (like XPath's // operator). This is a great tool for parsing big, heavily nested JSON datasets.
- jq - the most popular JSON query language and utility. It's a domain-specific language that can be difficult to learn but is very powerful. Unfortunately, there aren't many client library implementations - it's more of a standalone tool. See also jqt.
There are many more JSON parsing libraries and tools with various extra features like type validation, but these three are the most popular ones used in web scraping.
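
For example, the Python jmespath package implements JMESPath queries directly on already-parsed JSON data. A minimal sketch (assuming `pip install jmespath`) on a made-up API response:

```python
import jmespath

# made-up JSON document resembling a paginated product API response
data = {
    "results": [
        {"name": "laptop", "price": {"amount": 999, "currency": "USD"}},
        {"name": "mouse", "price": {"amount": 25, "currency": "USD"}},
    ],
    "paging": {"next": "/api/products?page=2"},
}

# select every product name and flatten nested price values
print(jmespath.search("results[].name", data))          # ['laptop', 'mouse']
print(jmespath.search("results[].price.amount", data))  # [999, 25]
print(jmespath.search("paging.next", data))             # /api/products?page=2
```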