html-encoding-sniffer

Sniff the encoding from a HTML byte stream

Version: 6.0.0 (latest, npm)
Weekly downloads: 60M
Maintainers: 6

Determine the Encoding of a HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.

const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");

const htmlBytes = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBytes);

Pass the bytes as a Uint8Array; a Node.js Buffer, which is a subclass of Uint8Array, also works, as shown above.

The returned value will be a canonical encoding name (not a label). You might then combine this with the @exodus/bytes package to decode the bytes into a string:

const { TextDecoder } = require("@exodus/bytes");
const htmlString = (new TextDecoder(sniffedEncoding)).decode(htmlBytes);
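To give a rough feel for the pre-scan described earlier, here is a deliberately simplified sketch. It is not the package's implementation: the real algorithm in the HTML Standard is a byte-level state machine that handles comments, attribute parsing, and other edge cases, whereas this version just runs a regular expression over the first 1024 bytes. The function name roughMetaCharsetPrescan is hypothetical, and it returns the raw label in lowercase rather than a canonical encoding name.

```javascript
// Simplified illustration only; NOT the algorithm used by html-encoding-sniffer.
// roughMetaCharsetPrescan is a hypothetical helper name.
function roughMetaCharsetPrescan(bytes) {
  // Only the first 1024 bytes are examined, as in the real pre-scan.
  const head = bytes.subarray(0, 1024);

  // Map each byte to one character so a regular expression can be applied.
  let text = "";
  for (const byte of head) {
    text += String.fromCharCode(byte);
  }

  // Look for charset=value inside a <meta ...> tag, with optional quotes.
  const match = /<meta[^>]*?charset\s*=\s*["']?([^\s"'>\/;]+)/i.exec(text);

  // Return the raw label (lowercased), or null if nothing was found.
  return match ? match[1].toLowerCase() : null;
}

const sampleBytes = new TextEncoder().encode(
  '<!doctype html><meta charset="ISO-8859-2"><title>x</title>'
);
console.log(roughMetaCharsetPrescan(sampleBytes)); // "iso-8859-2"
```

A declaration that starts after the first 1024 bytes is not seen by this sketch, which mirrors the byte limit mentioned above.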

Options

You can pass the following options to htmlEncodingSniffer:

const sniffedEncoding = htmlEncodingSniffer(htmlBytes, {
  xml,
  transportLayerEncodingLabel,
  defaultEncoding,
});

The xml option is a boolean, defaulting to false. If set to true, then we bypass the HTML encoding sniffing algorithm and compute the encoding based on the presence of a BOM, or the other options provided. (In the future, we may perform sniffing of the <?xml?> declaration, but for now that is not implemented.)
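As an illustration of the BOM check mentioned above, the following is a minimal sketch, assuming the three byte order marks recognized by the Encoding Standard (UTF-8, UTF-16BE, UTF-16LE). The helper name detectBom is hypothetical and is not part of this package's API.

```javascript
// Hypothetical helper, for illustration only; not exported by html-encoding-sniffer.
// Returns the encoding implied by a leading byte order mark, or null if none.
function detectBom(bytes) {
  // UTF-8 BOM: EF BB BF
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  // UTF-16 big-endian BOM: FE FF
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  // UTF-16 little-endian BOM: FF FE
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  return null;
}

console.log(detectBom(new Uint8Array([0xef, 0xbb, 0xbf, 0x3c]))); // "UTF-8"
console.log(detectBom(new Uint8Array([0x3c, 0x3f, 0x78]))); // null
```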

The transportLayerEncodingLabel option is an encoding label obtained from the "transport layer" (most likely an HTTP Content-Type header). It overrides everything but a BOM.
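If the label comes from an HTTP Content-Type header, it first has to be extracted from the header value. The sketch below is one simple way to do that with a regular expression. The helper name charsetFromContentType is hypothetical, and the parsing is intentionally loose rather than a full MIME type parser.

```javascript
// Hypothetical helper, for illustration only.
// Pulls the charset parameter out of a Content-Type header value,
// e.g. 'text/html; charset=Shift_JIS' yields 'Shift_JIS'.
function charsetFromContentType(headerValue) {
  if (typeof headerValue !== "string") {
    return undefined;
  }
  // Match ;charset=value, where the value may be wrapped in double quotes.
  const match = /;\s*charset\s*=\s*("?)([^";\s]+)\1/i.exec(headerValue);
  return match ? match[2] : undefined;
}

console.log(charsetFromContentType("text/html; charset=Shift_JIS")); // "Shift_JIS"
console.log(charsetFromContentType("text/html")); // undefined
```

The result could then be passed as the transportLayerEncodingLabel option, assuming an undefined value is treated the same as omitting the option.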

The defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. For HTML, it defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale). For XML, it defaults to "UTF-8".
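Taken together, the options describe a precedence order: a BOM wins, then the transport layer, then (for HTML only) whatever was sniffed from the bytes, then the default. The sketch below models that order under the assumption that each input has already been resolved to a canonical encoding name or is undefined. The function pickEncoding and its parameter names are hypothetical and only mirror the description above, not the package's internals.

```javascript
// Hypothetical model of the precedence described above; not the package's code.
// Each *Encoding input is assumed to be a canonical encoding name or undefined.
function pickEncoding({
  bomEncoding,
  transportEncoding,
  sniffedEncoding,
  xml = false,
  defaultEncoding,
} = {}) {
  // Documented defaults: "windows-1252" for HTML, "UTF-8" for XML.
  const fallback = defaultEncoding ?? (xml ? "UTF-8" : "windows-1252");

  if (bomEncoding) return bomEncoding; // a BOM overrides everything
  if (transportEncoding) return transportEncoding; // then the transport layer
  if (!xml && sniffedEncoding) return sniffedEncoding; // HTML sniffing is skipped for XML
  return fallback; // ultimate fallback
}

console.log(pickEncoding({ transportEncoding: "Shift_JIS", sniffedEncoding: "UTF-8" })); // "Shift_JIS"
console.log(pickEncoding({ xml: true })); // "UTF-8"
```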

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.


Package last updated on 26 Dec 2025
