Web automation and scraping are powerful tools for gathering and interacting with web data. Puppeteer, a Node.js library that provides a high-level API to control headless Chrome or Chromium, is ideal for these tasks. In this guide, we’ll walk through a detailed example of how to use Puppeteer to automate form submissions and capture screenshots of the results.
Introduction
In this tutorial, we will build a script that:
- Opens a web page.
- Enters a search query into a form field.
- Submits the form.
- Waits for the page to load the search results.
- Takes a screenshot of the results page.
Let’s dive into the code and see how this is achieved.
The Complete Code
Here’s the code we’ll be discussing:
const puppeteer = require("puppeteer");

async function enterFormData(url, searchQuery) {
  try {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto(url);
    await page.focus('input[name="p"]');
    await page.keyboard.type(searchQuery);
    await page.keyboard.press('Enter');
    await page.waitForNavigation({ waitUntil: 'networkidle2' });
    await page.screenshot({ path: 'query-results.png' });
    await browser.close();
    console.log("Form Data Submitted Successfully");
  } catch (error) {
    console.log(error);
  }
}

const url = "https://yahoo.com";
const query = "sunrise";
enterFormData(url, query);
Importing Puppeteer
const puppeteer = require("puppeteer");
The first line imports Puppeteer, which is necessary for launching and controlling the browser. Puppeteer is known for its ability to automate tasks in a web browser, perform web scraping, and more.
Defining the enterFormData Function
The enterFormData function performs the following tasks:
1. Launching the Browser:
const browser = await puppeteer.launch({headless: false});
The puppeteer.launch() method starts a new browser instance. The headless: false option keeps the browser window visible while the script runs, which is useful for debugging or watching the automation in real time. For production or unattended runs, you might set this to true to run the browser in headless mode, which operates in the background without a GUI.
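One way to switch between visible and headless runs without editing the script is to derive the launch options from an environment variable. The sketch below assumes a hypothetical variable named SHOW_BROWSER; both that name and the buildLaunchOptions helper are illustrative and not part of Puppeteer.

```javascript
// Hypothetical helper: pick launch options based on the environment.
// SHOW_BROWSER is an assumed variable name; Puppeteer does not read it itself.
function buildLaunchOptions(env = process.env) {
  const showBrowser = env.SHOW_BROWSER === "1";
  // Visible window only when explicitly requested, headless otherwise.
  return { headless: !showBrowser };
}

// Usage (requires Puppeteer to be installed):
//   const puppeteer = require("puppeteer");
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

Running the script with SHOW_BROWSER=1 set would then open a visible window, while any other value falls back to headless mode.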
2. Opening a New Page:
const page = await browser.newPage();
This line creates a new page (or tab) in the browser, where we will perform our interactions.
3. Navigating to the URL:
await page.goto(url);
The page.goto(url) method navigates to the specified URL. In this example, it opens Yahoo’s homepage.
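page.goto() also accepts an options object and resolves with the main response, which makes it possible to stop early when the page did not load properly. The following is a minimal sketch, assuming a helper named openPage (hypothetical) and an arbitrary 30-second timeout; adjust both to your situation.

```javascript
// Sketch: navigate and fail early on an unsuccessful HTTP status.
// `page` is expected to be a Puppeteer Page; the timeout value is an assumption.
async function openPage(page, url) {
  const response = await page.goto(url, {
    waitUntil: "domcontentloaded", // resolve once the DOM has been parsed
    timeout: 30000,                // give up after 30 seconds
  });
  // goto() can resolve with null, so guard before inspecting the response.
  if (response && !response.ok()) {
    throw new Error(`Navigation to ${url} returned HTTP ${response.status()}`);
  }
  return response;
}

// Usage inside an async function:
//   const page = await browser.newPage();
//   await openPage(page, "https://yahoo.com");
```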
4. Focusing on the Input Field:
await page.focus('input[name="p"]');
This command focuses on the input field with the name attribute p. This is the field where we will enter our search query.
5. Typing the Search Query:
await page.keyboard.type(searchQuery);
The page.keyboard.type(searchQuery) method types the provided search query into the focused input field. This simulates user input into the field.
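The focus and typing steps can also be combined, since page.type() takes a selector and types into the matching element. The sketch below additionally waits for the field to appear first, which may help on pages that render their form late. The fillField name is a hypothetical helper, not part of the original script.

```javascript
// Sketch: wait for a field to be visible, then type into it.
// `page` is expected to be a Puppeteer Page; fillField is an illustrative name.
async function fillField(page, selector, text) {
  // Resolves once an element matching the selector is present and visible.
  await page.waitForSelector(selector, { visible: true });
  // Focuses the element and sends one key event per character.
  await page.type(selector, text);
}

// Usage inside an async function:
//   await fillField(page, 'input[name="p"]', "sunrise");
```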
6. Submitting the Form:
await page.keyboard.press('Enter');
This line simulates pressing the ‘Enter’ key, which submits the form on the webpage. This action triggers the search or form submission process.
7. Waiting for Navigation:
await page.waitForNavigation({waitUntil: 'networkidle2'});
After submitting the form, the page navigates to the results page. The page.waitForNavigation({waitUntil: 'networkidle2'}) method waits for that navigation to finish before the script continues. With the networkidle2 option, navigation is considered finished once there have been no more than 2 network connections for at least 500 ms.
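One caveat with pressing Enter first and waiting second: if the navigation completes before waitForNavigation() is called, the wait may never see it and can time out. A common pattern is to start waiting before triggering the navigation and to await both together. The sketch below assumes a hypothetical helper named submitAndWait.

```javascript
// Sketch: begin waiting for navigation before the key press that triggers it.
// `page` is expected to be a Puppeteer Page; submitAndWait is an illustrative name.
async function submitAndWait(page) {
  await Promise.all([
    // Registered first, so the navigation cannot slip past unnoticed.
    page.waitForNavigation({ waitUntil: "networkidle2" }),
    // Pressing Enter submits the form and starts the navigation.
    page.keyboard.press("Enter"),
  ]);
}

// Usage inside an async function, after the query has been typed:
//   await submitAndWait(page);
```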
8. Capturing a Screenshot:
await page.screenshot({path: 'query-results.png'});
The page.screenshot() method takes a screenshot of the current page. The path option specifies the filename where the screenshot will be saved. In this case, it saves the screenshot as query-results.png.
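By default the screenshot covers only the visible viewport. If the whole results page is wanted, the fullPage option can be set. The following is a small sketch, assuming a hypothetical helper named captureResults that defaults to the same filename used in this guide.

```javascript
// Sketch: capture the entire scrollable page rather than just the viewport.
// `page` is expected to be a Puppeteer Page; captureResults is an illustrative name.
async function captureResults(page, path = "query-results.png") {
  await page.screenshot({ path, fullPage: true });
  return path; // returned so callers can log or reuse the file location
}

// Usage inside an async function:
//   const file = await captureResults(page);
//   console.log(`Saved screenshot to ${file}`);
```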
9. Closing the Browser:
await browser.close();
This line closes the browser instance, freeing up system resources.
10. Handling Errors:
catch(error){ console.log(error); }
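The catch block catches any errors that occur during the execution of the function and logs them to the console. Note that because browser.close() sits inside the try block, an error thrown after the browser has launched skips that call and leaves the browser running.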
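To make sure the browser is closed even when a step fails, the cleanup can move into a finally block. The sketch below wraps that idea in a hypothetical helper named withBrowser, which takes a launch function and a task; neither name comes from Puppeteer or the original script.

```javascript
// Sketch: always close the browser, whether the task succeeds or throws.
// `launch` is any function returning a promise of a browser-like object
// with a close() method; withBrowser is an illustrative name.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Runs on success and on error, so no browser process is left behind.
    await browser.close();
  }
}

// Usage (requires Puppeteer to be installed):
//   const puppeteer = require("puppeteer");
//   await withBrowser(
//     () => puppeteer.launch({ headless: false }),
//     async (browser) => {
//       const page = await browser.newPage();
//       await page.goto("https://yahoo.com");
//     }
//   );
```

Errors still propagate to the caller, so a surrounding try/catch (or a .catch() on the returned promise) can log them as before.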
Running the Function
const url = "https://yahoo.com";
const query = "sunrise";
enterFormData(url, query);
Here, we define the URL and search query, and then call the enterFormData function with these parameters.
The function performs the tasks described above, resulting in the submission of the search query and the capture of the results page screenshot.
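Instead of hard-coding the URL and query, they could be read from the command line. The sketch below assumes a hypothetical helper named readCliArgs that falls back to the values used in this guide when no arguments are given.

```javascript
// Sketch: read the URL and search query from command-line arguments.
// readCliArgs is an illustrative name; the defaults mirror this guide's values.
function readCliArgs(argv = process.argv) {
  // argv[0] is the node binary and argv[1] is the script path, so skip both.
  const [url = "https://yahoo.com", query = "sunrise"] = argv.slice(2);
  return { url, query };
}

// Usage:
//   const { url, query } = readCliArgs();
//   enterFormData(url, query);
```

With this in place, the script could be invoked as, for example, node script.js https://yahoo.com sunset, where script.js stands for whatever filename you saved it under.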
Conclusion
This script demonstrates how to use Puppeteer to automate interactions with a webpage, such as filling out and submitting a form, and capturing the results.
Puppeteer is a powerful tool for web automation and scraping, enabling you to programmatically control a browser and perform a wide range of tasks.
Whether you’re looking to automate repetitive tasks, gather data, or test web applications, Puppeteer offers a flexible and robust solution. By modifying the script to suit your needs, you can automate virtually any web-based task with ease.