Converting DOCX to HTML in LWC Using Mammoth.js

When working with Lightning Web Components (LWC) in Salesforce, handling DOCX files can be tricky. If your use case involves converting DOCX files to HTML, Mammoth.js is an excellent solution. It’s a lightweight JavaScript library that converts DOCX files to clean, semantic HTML, working from the document’s styles rather than trying to reproduce its exact visual formatting. In this post, I’ll walk you through the process of using Mammoth.js in an LWC to achieve this.

Why Use Mammoth.js?

Mammoth.js has several advantages:

  • Clean HTML output: It generates simple, clean HTML without unnecessary elements, making it easier to style and integrate into your LWC.
  • Customizable: You can control how specific DOCX elements like headings, lists, and tables are converted.
  • JavaScript-based: Perfect for client-side rendering in LWC, with no need for server-side conversions.

Prerequisites

Before we dive into coding, ensure you have the following:

  • Basic understanding of Salesforce LWC.
  • Experience working with JavaScript libraries in an LWC environment.
  • Familiarity with Mammoth.js is helpful but not required. The library is available through npm or a CDN.

Step-by-Step Guide

1. Install Mammoth.js

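LWC templates cannot contain <script> tags, so download the browser build of Mammoth.js (mammoth.browser.min.js) and upload it to your org as a static resource. On a regular web page, the same file is included from a CDN like this: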

<script src="https://cdnjs.cloudflare.com/ajax/libs/mammoth/1.4.2/mammoth.browser.min.js"></script>

 

2. Create LWC Component

Let’s create a simple LWC component where users can upload DOCX files, and the component will display the converted HTML.

  • HTML File (convertDocx.html)
<template>
    <lightning-card title="DOCX to HTML Converter">
        <div class="slds-m-around_medium">
            <input type="file" accept=".docx" onchange={handleFileChange} />
        </div>
        <h2 lwc:if={convertedHtml}>Converted HTML:</h2>
        <!-- Always rendered, so the component can find it as soon as the conversion finishes -->
        <div lwc:dom="manual" class="converted-content"></div>
    </lightning-card>
</template>

 

In this HTML file, we provide an input to upload .docx files and a container to display the converted HTML.

  • JavaScript File (convertDocx.js)
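import { LightningElement } from 'lwc';
import { loadScript } from 'lightning/platformResourceLoader';
// Resolves to the URL of the static resource, not to the library itself
import MAMMOTH_URL from '@salesforce/resourceUrl/mammoth';

const DOCX_MIME = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

export default class ConvertDocx extends LightningElement {
    convertedHtml;
    mammothReady; // promise, so the script is only requested once

    // Assumes the static resource "mammoth" is the single file mammoth.browser.min.js
    loadMammoth() {
        if (!this.mammothReady) {
            this.mammothReady = loadScript(this, MAMMOTH_URL);
        }
        return this.mammothReady;
    }

    handleFileChange(event) {
        const file = event.target.files[0];
        if (file && file.type === DOCX_MIME) {
            this.convertDocxToHtml(file);
        } else {
            alert('Please upload a valid DOCX file.');
        }
    }

    async convertDocxToHtml(file) {
        try {
            await this.loadMammoth();
            const arrayBuffer = await file.arrayBuffer();
            // The browser build exposes the library as the global `mammoth`
            const result = await window.mammoth.convertToHtml({ arrayBuffer });
            // Updating the field re-renders the template; renderedCallback injects the markup
            this.convertedHtml = result.value;
        } catch (error) {
            console.error('Error converting DOCX:', error);
        }
    }

    renderedCallback() {
        // The container may not exist until the template has re-rendered
        const contentDiv = this.template.querySelector('.converted-content');
        if (contentDiv && this.convertedHtml) {
            contentDiv.innerHTML = this.convertedHtml;
        }
    }
}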

 

Here’s what’s happening in the JavaScript:

  1. handleFileChange: This method fires when the user selects a file. It checks that the file's MIME type is the DOCX type and then calls convertDocxToHtml. Anything else triggers an alert.
  2. convertDocxToHtml: This function reads the file as an ArrayBuffer (the input Mammoth.js expects in the browser) and passes it to mammoth.convertToHtml. The converted markup comes back in result.value, which the component stores and writes into the .converted-content container.
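
The MIME check in handleFileChange can reject legitimate files, because some browsers report an empty type when the operating system has no association for .docx. Here is a minimal sketch of a more tolerant check. The helper name isDocxFile is my own and not part of LWC or Mammoth.js:

```javascript
// MIME type browsers normally report for .docx files
const DOCX_MIME =
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

// Hypothetical helper: accepts the DOCX MIME type, or falls back to the
// file extension only when the browser reports an empty type.
function isDocxFile(file) {
    if (!file) {
        return false;
    }
    if (file.type === DOCX_MIME) {
        return true;
    }
    return file.type === '' && /\.docx$/i.test(file.name || '');
}
```

In the component, the condition in handleFileChange could then become if (isDocxFile(file)). Keep in mind that neither check proves the file really is a DOCX, so the catch block around the conversion is still needed.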

3. Include Static Resources

Because an LWC cannot load scripts through a script tag, Mammoth.js has to be uploaded to your org as a static resource (for example, under the name mammoth). Once uploaded:

  1. Import the resource into your JavaScript file using import MAMMOTH_URL from '@salesforce/resourceUrl/mammoth';. This import resolves to the resource's URL, not to the library itself.
  2. Load the script with loadScript from lightning/platformResourceLoader before calling Mammoth.js. The browser build then exposes the library as the global mammoth object.
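
If users can upload several files in a row, it helps to make sure the library is only requested once. The following is a rough sketch of that idea, written as a plain function so it can be tried outside Salesforce. createLibraryLoader is a hypothetical name, and the loading function is passed in rather than imported:

```javascript
// Hypothetical factory: wraps any "load this URL" function so the script
// is requested at most once, even if several uploads happen back to back.
function createLibraryLoader(loadFn, url) {
    let pending = null;
    return function load() {
        if (!pending) {
            pending = loadFn(url).catch((error) => {
                pending = null; // allow a retry after a failed load
                throw error;
            });
        }
        return pending;
    };
}

// Inside a component this might be wired up as follows (assumption, not
// runnable outside Salesforce):
//   const loadMammoth = createLibraryLoader((u) => loadScript(this, u), MAMMOTH_URL);
//   await loadMammoth();
//   const result = await window.mammoth.convertToHtml({ arrayBuffer });
```

Every caller gets the same promise back, so a second upload that starts while the script is still loading simply waits for the first request.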

4. Styling and Additional Options

Mammoth.js offers options to control the output HTML. For example, you can pass additional options to convertToHtml to handle custom styles, images, or lists:

mammoth.convertToHtml({ arrayBuffer }, {
    styleMap: [
        "p[style-name='Heading 1'] => h1:fresh",
        "p[style-name='Heading 2'] => h2:fresh"
    ],
    convertImage: mammoth.images.imgElement(function(image) {
        return image.read("base64").then(function(imageBuffer) {
            return {
                src: "data:" + image.contentType + ";base64," + imageBuffer
            };
        });
    })
}).then((result) => {
    this.convertedHtml = result.value;
});

 

This example customizes the output by mapping paragraphs that use the Word styles Heading 1 and Heading 2 to HTML <h1> and <h2> tags, and by embedding DOCX images as img elements with base64-encoded data URIs.
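
When a document uses many custom styles, writing each style-map string by hand gets repetitive. As a small sketch, the entries can be generated from a lookup object. buildParagraphStyleMap and the style names below are made up for illustration; only the string format comes from Mammoth.js:

```javascript
// Hypothetical helper: turns { 'Word style name': 'html-tag' } into
// Mammoth style-map strings for paragraph styles.
// Assumes style names contain no single quotes.
function buildParagraphStyleMap(mappings) {
    return Object.entries(mappings).map(
        ([styleName, tag]) => `p[style-name='${styleName}'] => ${tag}:fresh`
    );
}

// Example style names; replace them with the ones used in your documents
const styleMap = buildParagraphStyleMap({
    'Section Title': 'h1',
    'Subsection Title': 'h2'
});
```

The resulting array can be passed as the styleMap option in the second argument to convertToHtml.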

Final Thoughts

With Mammoth.js, converting DOCX to HTML in a Lightning Web Component becomes a straightforward task. The simplicity and flexibility of Mammoth.js make it a great choice when working with DOCX files on the front end. By following the steps outlined above, you can quickly add DOCX-to-HTML conversion functionality to your LWC projects.

If your use case involves more advanced customization, Mammoth.js also provides hooks for deeper control over how each DOCX element is converted, giving you the flexibility to fine-tune the output according to your needs.

Let me know how it works for you or if you encounter any challenges while implementing this!
