Axios is a popular JavaScript library for making HTTP requests. It is a promise-based client that works in both the browser and Node.js.
It is similar to the Fetch API, but with a more powerful feature set and better browser compatibility.
One of the main benefits of using axios is that it automatically parses JSON response bodies into JavaScript objects,
making them easy to work with.
Axios is known for its user-friendly API and first-class async/await support, making it very accessible for web scraping.
Mechanize is a Ruby library for automating interaction with websites. It automatically
stores and sends cookies, follows redirects, and can submit forms — making it behave
like a web browser without needing an actual browser engine.
Key features include:
- Automatic cookie management: stores cookies received from servers and sends them back on subsequent requests, maintaining session state across multiple pages.
- Form handling: can find, fill in, and submit HTML forms programmatically. Supports text inputs, selects, checkboxes, radio buttons, and file uploads.
- Link following: navigate through pages by clicking links using their text content, CSS selectors, or href patterns.
- History and back/forward: maintains a browsing history, allowing you to go back and forward through visited pages.
- HTTP authentication: supports basic and digest HTTP authentication.
- Proxy support: can route requests through HTTP proxies.
- Redirect handling: automatically follows HTTP redirects (configurable).
Mechanize is one of the oldest and most established web interaction libraries in Ruby.
It is best suited for scraping traditional server-rendered websites with forms and
multi-page workflows. For JavaScript-heavy sites, a browser automation tool like
Selenium or Playwright is recommended instead.
```javascript
const axios = require('axios');
const querystring = require('querystring');

// axios can be used with promises:
axios.get('http://httpbin.org/json')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.log(error);
  });

// or with async/await syntax:
const resp = await axios.get('http://httpbin.org/json');
console.log(resp.data);

// to make requests concurrently, Promise.all can be used:
const results = await Promise.all([
  axios.get('http://httpbin.org/html'),
  axios.get('http://httpbin.org/html'),
  axios.get('http://httpbin.org/html'),
]);

// axios also supports other request types like POST,
// and automatically serializes plain objects to JSON:
await axios.post('http://httpbin.org/post', {query: 'hello world'});

// or form data (application/x-www-form-urlencoded):
const data = {name: 'John Doe', email: 'johndoe@example.com'};
await axios.post('https://jsonplaceholder.typicode.com/users',
  querystring.stringify(data),
  {
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded'
    }
  }
);

// default values like headers can be configured globally:
axios.defaults.headers.common['User-Agent'] = 'webscraping.fyi';

// or per session instance:
const instance = axios.create({
  headers: {'User-Agent': 'webscraping.fyi'},
});
```
```ruby
require 'mechanize'
agent = Mechanize.new
# Navigate to a page
page = agent.get('https://example.com')
puts page.title
# Find and click a link
page = page.link_with(text: 'Products').click
# Extract data from the page
page.search('.product').each do |product|
  name = product.at('.name').text
  price = product.at('.price').text
  puts "#{name}: #{price}"
end
# Fill in and submit a login form
login_page = agent.get('https://example.com/login')
form = login_page.form_with(action: '/login')
form['username'] = 'user@example.com'
form['password'] = 'password123'
dashboard = agent.submit(form)
# Cookies are maintained automatically
puts dashboard.title # "Dashboard"
# Download a file
agent.get('https://example.com/report.csv').save('report.csv')
```