Get the Source Code of Any Webpage Using Puppeteer

Reading the source code of a webpage is useful for debugging, analyzing, and learning from existing websites.

Puppeteer, a Node.js library developed by the Chrome team, controls a Chrome or Chromium browser from code and offers a simple way to retrieve the source code of a webpage.

In this tutorial, we’ll explore how to use Puppeteer to extract and save the HTML source code of a webpage for further analysis.

Because Puppeteer drives a real browser, it can automate this retrieval for any page the browser can load, including pages that build their content with JavaScript.

The complete source code is given at the end of the article.

1. Setting Up Puppeteer

Before diving into the code, ensure you have Node.js installed on your system.

You can install Puppeteer with the following command, which by default also downloads a compatible browser build:

npm install puppeteer

2. Retrieving HTML Source Code

Let’s start by creating a function that takes a URL and an output file name as parameters.

This function will launch a browser instance, navigate to the specified URL, retrieve the HTML source code, and then close the browser.

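const puppeteer = require("puppeteer");
const fs = require("fs"); // used in the next step to write the file

async function getSourceCode(url, outputData) {
    let browser;
    try {
        // headless: false opens a visible browser window
        browser = await puppeteer.launch({ headless: false });
        const page = await browser.newPage();

        await page.goto(url);

        // Serialized HTML of the page as currently rendered
        const sourceCode = await page.content();

        // Rest of the code...
    } catch (error) {
        console.error("Error getting source code of the URL:", error);
    } finally {
        // Close the browser even if an earlier step failed
        if (browser) await browser.close();
    }
}

const url = "https://example.com";
const outputData = "source_code.html"; // output file name

getSourceCode(url, outputData);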
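
Note that `page.content()` returns the HTML of the page as the browser currently sees it, after scripts have run, which can differ from the HTML the server originally sent. If you need both, the following is a minimal sketch rather than a definitive implementation. The function name `getRawAndRendered` is hypothetical, and `page` is assumed to be a Puppeteer page created with `browser.newPage()`.

```javascript
// Sketch: fetch both the raw server response and the rendered DOM.
// `page` is assumed to be a Puppeteer Page created with browser.newPage().
async function getRawAndRendered(page, url) {
    // page.goto resolves with the main resource's response
    // (it can be null in some cases, such as same-page anchor navigation)
    const response = await page.goto(url);

    // HTML as the server sent it, before any scripts ran
    const raw = response ? await response.text() : null;

    // HTML serialized from the current DOM, after scripts have run
    const rendered = await page.content();

    return { raw, rendered };
}
```

Inside `getSourceCode`, this could be called as `const { raw, rendered } = await getRawAndRendered(page, url);` in place of the separate `page.goto` and `page.content` calls.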

3. Saving Source Code to File

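Continuing from where we left off, we’ll extend our function to save the retrieved HTML source code to a file. The "Rest of the code..." comment is replaced with a call to fs.writeFileSync, which writes the sourceCode string to the file named by outputData.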
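
The save step on its own might look like the sketch below. The helper name `saveSourceCode` is hypothetical (the article's final code calls `fs.writeFileSync` inline instead), and creating the parent folder and returning a byte count are additions made here for illustration.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: write the HTML string to outputData (a file path).
function saveSourceCode(outputData, sourceCode) {
    // Create the parent folder if it does not exist yet
    fs.mkdirSync(path.dirname(outputData), { recursive: true });

    // Write the HTML as UTF-8 text, replacing any existing file
    fs.writeFileSync(outputData, sourceCode, "utf-8");

    // Return the size in bytes as a quick sanity check
    return Buffer.byteLength(sourceCode, "utf-8");
}
```

Because `fs.writeFileSync` overwrites an existing file, running the script twice with the same output name keeps only the most recent copy.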

4. Final Complete Source Code

The final complete source code is shared below for your reference.

const puppeteer = require("puppeteer");
const fs = require("fs");

async function getSourceCode(url, outputData) {
    let browser;
    try {
        // headless: false opens a visible browser window
        browser = await puppeteer.launch({ headless: false });
        const page = await browser.newPage();

        await page.goto(url);

        // Serialized HTML of the page as currently rendered
        const sourceCode = await page.content();

        // Write the HTML to the output file
        fs.writeFileSync(outputData, sourceCode, "utf-8");

        console.log("Successfully saved the source code of the URL");
    } catch (error) {
        console.error("Error getting source code of the URL:", error);
    } finally {
        // Close the browser even if an earlier step failed
        if (browser) await browser.close();
    }
}

const url = "https://example.com";
const outputData = "source_code.html"; // output file name

getSourceCode(url, outputData);
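
On pages that load much of their content with JavaScript, `page.content()` may run before the page has finished rendering. One possible variation, sketched here under assumptions, is to pass a `waitUntil` option to `page.goto`. The function name `getRenderedSource` and the `timeoutMs` parameter are hypothetical, and `page` is assumed to be a Puppeteer page created with `browser.newPage()`.

```javascript
// Sketch: wait for network activity to settle before serializing the page.
// `page` is assumed to be a Puppeteer Page; the default timeout is illustrative.
async function getRenderedSource(page, url, timeoutMs = 30000) {
    await page.goto(url, {
        // Treat navigation as finished once there have been no more than
        // 2 network connections for at least 500 ms
        waitUntil: "networkidle2",
        // Reject with an error if navigation takes longer than this
        timeout: timeoutMs,
    });

    // Serialized HTML of the page after the wait
    return page.content();
}
```

Whether `networkidle2` is the right condition depends on the site; pages that keep connections open for a long time may need a different wait strategy.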

Puppeteer empowers developers to automate tasks that were once time-consuming, such as retrieving web source code.

In this tutorial, we learned how to set up Puppeteer, extract the HTML source code of a webpage, and save it to a file for future use.

Whether you’re debugging a website, analyzing the structure of a page, or learning from existing sites, Puppeteer’s capabilities can significantly enhance your development workflow.

As you continue to explore Puppeteer’s functionalities, you’ll uncover a world of automation possibilities that can streamline your web development tasks and enhance your understanding of the web ecosystem.

Happy coding and exploring!

By soorya
