Puppeteer Tutorial #7 - Get the Source Code Of Any Webpage

Reading the source code of a webpage is a useful skill for debugging, analyzing, and learning from existing websites.

Puppeteer, a powerful Node.js library developed by the Chrome team, offers a simple and effective method to retrieve the source code of a webpage.

In this tutorial, we’ll explore how to use Puppeteer to extract and save the HTML source code of a webpage for further analysis.

Doing this by hand in the browser works for a single page, but it does not scale. With Puppeteer, a short script can load a page and hand back its HTML, which is handy for developers, designers, and anyone curious about how websites are constructed.

The final completed source code is given at the end of the article.

1. Setting Up Puppeteer

Before diving into the code, ensure you have Node.js installed on your system.

You can install Puppeteer with the following command. The install also downloads a compatible browser build, so it may take a moment:

npm install puppeteer

2. Retrieving HTML Source Code

Let’s start by creating a function that takes a URL and an output file name as parameters.

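This function will launch a browser instance, navigate to the specified URL, and read the page’s HTML with page.content(). Note that page.content() returns the markup of the page as it is currently rendered, including any changes made by scripts, rather than the raw response sent by the server. We’ll close the browser in the next step.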

const puppeteer = require("puppeteer");
const fs = require("fs");

async function getSourceCode(url, outputData) {
    try {
        const browser = await puppeteer.launch({ headless: false });
        const page = await browser.newPage();

        await page.goto(url);

        const sourceCode = await page.content();

        // Rest of the code...
    } catch (error) {
        // Include the underlying error message so failures can be diagnosed
        console.error("Error getting source code of the URL:", error.message);
    }
}

const url = "https://example.com";
const outputData = "source_code.html";

getSourceCode(url, outputData);
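
If you need the markup exactly as the server sent it, rather than the rendered DOM that page.content() returns, the response object resolved by page.goto() exposes it through text(). The sketch below returns both so you can compare them. The function name getRawAndRenderedSource and the injectable launch parameter are assumptions made for illustration; they are not part of the tutorial’s code.

```javascript
// Hypothetical helper: returns both the raw server response and the rendered DOM.
// `launch` is injectable (an illustration-only assumption) so the function can be
// tried with a stub; by default it launches Puppeteer.
async function getRawAndRenderedSource(url, launch = () => require("puppeteer").launch()) {
    const browser = await launch();
    try {
        const page = await browser.newPage();

        // goto() resolves with the main resource response; it can be null
        // in some cases, so guard before reading the body.
        const response = await page.goto(url);

        const raw = response ? await response.text() : null; // HTML as sent by the server
        const rendered = await page.content();               // HTML serialized from the live DOM
        return { raw, rendered };
    } finally {
        await browser.close();
    }
}

// Example usage:
// getRawAndRenderedSource("https://example.com").then(({ raw, rendered }) => {
//     console.log(raw === rendered ? "identical" : "scripts changed the page");
// });
```

On static pages the two strings are usually close; on pages built with JavaScript they can differ a lot, so pick whichever matches what you want to analyze.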

3. Saving Source Code to File

Continuing from where we left off, we’ll extend our function to save the retrieved HTML source code to a file.

This step ensures that you have a permanent copy for further analysis and reference.

// ...
fs.writeFileSync(outputData, sourceCode, "utf-8");

await browser.close();
console.log("Successfully saved the source code of the URL");
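
One thing to watch for: fs.writeFileSync throws if the output path points into a folder that does not exist yet. A minimal sketch of a helper that creates missing parent folders first is shown here; the name saveSourceCode and the example path are hypothetical and only for illustration.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: writes the HTML to `outputFile`, creating any missing
// parent directories first. Returns the absolute path that was written.
function saveSourceCode(outputFile, sourceCode) {
    const absolutePath = path.resolve(outputFile);

    // `recursive: true` creates the whole folder chain and does not fail
    // if the folders already exist.
    fs.mkdirSync(path.dirname(absolutePath), { recursive: true });

    fs.writeFileSync(absolutePath, sourceCode, "utf-8");
    return absolutePath;
}

// Example usage with a nested output path (the path is illustrative):
// saveSourceCode("output/pages/source_code.html", sourceCode);
```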

4. Final Complete Source Code

The final complete source code is shared below for your reference.

const puppeteer = require("puppeteer");
const fs = require("fs");

async function getSourceCode(url, outputData){
    try {
        const browser = await puppeteer.launch({headless: false});
        
        const page = await browser.newPage();

        await page.goto(url);

        const sourceCode = await page.content();

        fs.writeFileSync(outputData, sourceCode, "utf-8");
        
        await browser.close();

        console.log("Successfully saved the source code of the URL");

    } catch(error){
        // Include the underlying error message so failures can be diagnosed
        console.error("Error getting source code of the URL:", error.message);
    }
}

const url = "https://example.com";
const outputData = "source_code.html";

getSourceCode(url, outputData);
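
One gap in the version above: if page.goto() or the file write throws, browser.close() is never reached and the browser stays open. Here is a minimal sketch of one way to handle that by moving the cleanup into a finally block. The function name getSourceCodeSafely, the injectable launch parameter, and the networkidle2 wait condition are choices made here for illustration, not part of the original code.

```javascript
const fs = require("fs");

// Hypothetical variant of getSourceCode that always closes the browser.
// `launch` is injectable (an illustration-only assumption) so the function
// can be exercised with a stub; by default it launches Puppeteer.
async function getSourceCodeSafely(url, outputData, launch = () => require("puppeteer").launch()) {
    let browser;
    try {
        browser = await launch();
        const page = await browser.newPage();

        // Wait until network activity has mostly settled, which helps on
        // pages that build their markup with JavaScript.
        await page.goto(url, { waitUntil: "networkidle2" });

        const sourceCode = await page.content();
        fs.writeFileSync(outputData, sourceCode, "utf-8");
        return true;
    } catch (error) {
        console.error(`Error getting source code of ${url}: ${error.message}`);
        return false;
    } finally {
        // Runs on success and on failure, so the browser never lingers.
        if (browser) {
            await browser.close();
        }
    }
}

// Example usage:
// getSourceCodeSafely("https://example.com", "source_code.html");
```

The function returns true or false instead of only logging, so a calling script can decide whether to retry or move on to the next URL.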

Puppeteer empowers developers to automate tasks that were once time-consuming, such as retrieving web source code.

In this tutorial, we learned how to set up Puppeteer, extract the HTML source code of a webpage, and save it to a file for future use.

Whether you’re debugging a website, analyzing the structure of a page, or learning from existing sites, Puppeteer’s capabilities can significantly enhance your development workflow.

As you continue to explore Puppeteer, you’ll find many more automation tasks it can take over in your web development workflow.

Happy coding and exploring!
