Web automation has become an essential tool for developers, testers, and businesses.

Automating repetitive browser tasks saves time and reduces the mistakes that come with doing them by hand.

One of the most popular tools for web automation is Puppeteer, a Node.js library developed by the Chrome team.

In this article, we will delve into a practical example of using Puppeteer to automate web interactions, capture screenshots, and generate PDFs.

The complete code is shared at the end of the article.

1. Introduction to Puppeteer

Puppeteer is a powerful library that provides a high-level API to control headless or full browsers over the DevTools Protocol.

It is particularly useful for tasks such as web scraping, automating UI tests, taking screenshots, and generating PDFs.

2. Setting Up Puppeteer

Before we dive into the code, make sure you have Node.js installed on your system.

You can install Puppeteer by running the following command:

npm install puppeteer

3. Running a Browser Instance

In this example, we’ll launch a browser with a visible window (headless: false), a custom viewport size, DevTools open, and a one-second slowMo delay between operations so that each step is easy to follow during development.

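const puppeteer = require("puppeteer");

async function run() {
    const browser = await puppeteer.launch({
        headless: false,                               // show the browser window
        defaultViewport: { width: 1440, height: 600 }, // page size in pixels
        devtools: true,                                // open DevTools for each tab
        slowMo: 1000,                                  // delay each operation by 1000 ms
    });

    const page = await browser.newPage();
    // Rest of the code...
}

// Report any failure instead of leaving an unhandled promise rejection
run().catch(console.error);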
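
The settings above are convenient while developing but slow for unattended runs. As a minimal sketch, the launch options could be chosen from a flag. The helper name buildLaunchOptions and the DEMO_DEBUG environment variable are assumptions made for this illustration and are not part of Puppeteer.

```javascript
// Hypothetical helper (not part of Puppeteer): pick launch options
// depending on whether we are debugging interactively.
function buildLaunchOptions(debug) {
    const options = {
        headless: !debug, // visible window only while debugging
        defaultViewport: { width: 1440, height: 600 },
    };
    if (debug) {
        options.devtools = true; // open DevTools for each tab
        options.slowMo = 1000;   // slow each operation by 1000 ms
    }
    return options;
}

async function run() {
    // Loaded here so the helper above can be used without Puppeteer installed
    const puppeteer = require("puppeteer");
    // DEMO_DEBUG is an assumed variable name for this sketch
    const debug = process.env.DEMO_DEBUG === "1";
    const browser = await puppeteer.launch(buildLaunchOptions(debug));
    try {
        const page = await browser.newPage();
        console.log("Viewport:", page.viewport());
    } finally {
        await browser.close(); // always release the browser process
    }
}

// Uncomment to launch the browser:
// run().catch(console.error);
```

Keeping the options in a plain function makes it easy to check them without starting a browser, and the try/finally block ensures the browser is closed even when a step fails.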

4. Navigating to a Webpage

Once the browser instance is set up, we’ll navigate to a webpage. In this case, we’ll navigate to “https://yahoo.com”.

await page.goto("https://yahoo.com");
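
Navigation can be slow or can fail, and page.goto() accepts options to control how long it waits. The following is a rough sketch under stated assumptions: normalizeUrl and visit are hypothetical helper names, and the 30-second timeout is an arbitrary choice for illustration.

```javascript
// Hypothetical helper: accept "yahoo.com" as well as a full URL.
function normalizeUrl(input) {
    const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
    return new URL(hasScheme ? input : `https://${input}`).href;
}

// Hypothetical helper: navigate and report whether the main request succeeded.
// `page` is a Puppeteer Page, for example from browser.newPage().
async function visit(page, input) {
    const response = await page.goto(normalizeUrl(input), {
        waitUntil: "domcontentloaded", // resolve once the DOM is parsed
        timeout: 30000,                // give up after 30 seconds
    });
    // goto() can resolve with null, e.g. for a same-page anchor navigation
    return response !== null && response.ok();
}
```

Inside run(), this could be used as: const ok = await visit(page, "yahoo.com");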

5. Retrieving Page Title and Content

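Puppeteer’s API makes it easy to read information from the page. page.title() returns the document title, and page.$eval() runs a function against the first element that matches a selector, here reading the text of the first h1. Note that $eval throws an error if no element matches the selector.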

const title = await page.title();
const heading = await page.$eval('h1', (element) => element.textContent);
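
Because $eval throws when the selector matches nothing, a more forgiving lookup can be useful on pages whose structure you do not control. Here is one possible sketch; getText and cleanText are hypothetical helper names introduced for this example, built on page.$() and the element handle's evaluate() method.

```javascript
// Collapse runs of whitespace so the text prints on one line.
function cleanText(text) {
    return text.replace(/\s+/g, " ").trim();
}

// Hypothetical helper: like $eval for text, but returns null instead of
// throwing when the selector matches nothing.
async function getText(page, selector) {
    const handle = await page.$(selector); // null when there is no match
    if (handle === null) {
        return null;
    }
    try {
        const raw = await handle.evaluate((element) => element.textContent);
        return cleanText(raw ?? "");
    } finally {
        await handle.dispose(); // release the element handle
    }
}
```

Inside run(), this could be used as: const heading = await getText(page, "h1");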

6. Capturing Screenshots

Capturing screenshots can be invaluable for visual verification and debugging.

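Puppeteer allows you to take screenshots of the current page. By default, page.screenshot() captures only the visible viewport and writes the image to the given path.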

await page.screenshot({ path: 'episode3.png' });
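
When a script runs repeatedly, a fixed file name is overwritten on every run, and a viewport-only capture may miss content further down the page. The sketch below addresses both, assuming a timestamped naming scheme; screenshotName and captureFullPage are hypothetical helpers, while fullPage is an option of page.screenshot().

```javascript
// Hypothetical helper: build a file name from a label and a Date,
// replacing characters that are awkward in file names.
function screenshotName(label, date = new Date()) {
    const stamp = date.toISOString().replace(/[:.]/g, "-");
    return `${label}-${stamp}.png`;
}

// Capture the whole scrollable page rather than only the viewport.
// `page` is a Puppeteer Page, for example from browser.newPage().
async function captureFullPage(page, label) {
    const path = screenshotName(label);
    await page.screenshot({ path, fullPage: true });
    return path; // return the file name so the caller can log it
}
```

Inside run(), this could be used as: const file = await captureFullPage(page, "episode3");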

7. Generating PDFs

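Puppeteer can also generate PDFs from web pages, which is useful for creating reports or archiving content. Older Puppeteer releases supported PDF generation only in headless mode, so if this call fails with headless: false, try launching the browser in headless mode instead.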

await page.pdf({ path: 'example.pdf', format: 'A4' });
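
page.pdf() accepts further options beyond path and format, such as printBackground, margin, and landscape. As an illustrative sketch only, the defaults could be collected in one place; buildPdfOptions and savePdf are hypothetical helpers, and the 1 cm margins are an arbitrary choice.

```javascript
// Hypothetical helper: default PDF settings that callers can override.
function buildPdfOptions(path, overrides = {}) {
    return {
        path,
        format: "A4",
        printBackground: true, // include CSS backgrounds, which are off by default
        margin: { top: "1cm", right: "1cm", bottom: "1cm", left: "1cm" },
        ...overrides, // caller-supplied values win over the defaults
    };
}

// `page` is a Puppeteer Page, for example from browser.newPage().
async function savePdf(page, path, overrides) {
    await page.pdf(buildPdfOptions(path, overrides));
    return path;
}
```

Inside run(), this could be used as: await savePdf(page, "example.pdf", { landscape: true });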

8. Completed Code and Video Tutorial

Here’s the final code and video tutorial to help you with it.

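const puppeteer = require("puppeteer");

async function run() {
    // Launch a visible browser instance, slowed down so each step can be observed
    const browser = await puppeteer.launch({
        headless: false,
        defaultViewport: { width: 1440, height: 600 },
        devtools: true,
        slowMo: 1000,
    });

    try {
        const page = await browser.newPage();

        // Navigate to the target page
        await page.goto("https://yahoo.com");

        // Read the document title and the text of the first <h1>
        const title = await page.title();
        const heading = await page.$eval("h1", (element) => element.textContent);
        console.log({ title, heading });

        // Save a screenshot and a PDF of the page
        await page.screenshot({ path: "episode3.png" });
        await page.pdf({ path: "example.pdf", format: "A4" });
    } finally {
        // Close the browser even if one of the steps above fails
        await browser.close();
    }
}

// Report any failure instead of leaving an unhandled promise rejection
run().catch(console.error);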

Puppeteer empowers developers to automate browser interactions, capture visual data, and generate PDFs effortlessly.

From web scraping to UI testing, its versatile capabilities make it a valuable tool in the modern web development landscape.

In this article, we explored a practical example of using Puppeteer to automate web interactions, retrieve page information, capture screenshots, and generate PDFs.

As you delve deeper into Puppeteer, you’ll discover even more ways to streamline your web automation tasks and enhance your development workflow. Happy coding!

By soorya