
html-encoding-sniffer
This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.
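To give a rough feel for what the pre-scan described above looks for, here is a deliberately naive sketch. It is not the package's implementation and not the HTML Standard's algorithm, which is a byte-level state machine that also handles comments, attribute parsing, and `http-equiv`/`content` pairs. The function name `naiveMetaCharsetPrescan` is hypothetical, and the regular expression is an assumption made purely for illustration.

```javascript
// Hypothetical, simplified illustration only: look at the first 1024 bytes
// and try to find a charset label inside a <meta> tag.
function naiveMetaCharsetPrescan(bytes) {
  // Map each byte to one character so the regex can scan the raw bytes.
  const head = String.fromCharCode(...bytes.subarray(0, 1024));

  // Matches both <meta charset="..."> and
  // <meta http-equiv="Content-Type" content="text/html; charset=...">.
  const match = /<meta[^>]*?charset\s*=\s*["']?([^"'\s\/>;]+)/i.exec(head);

  // Returns an encoding *label* as written in the document, or null.
  return match ? match[1].toLowerCase() : null;
}

const sample = new TextEncoder().encode('<!doctype html><meta charset="UTF-8">');
console.log(naiveMetaCharsetPrescan(sample));
```

Unlike this sketch, the real package returns a canonical encoding name and applies the standard's full set of rules, so prefer it for anything beyond experimentation.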
const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");
const htmlBytes = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBytes);
The passed bytes are given as a Uint8Array; the Node.js Buffer subclass of Uint8Array will also work, as shown above.
The returned value will be a canonical encoding name (not a label). You might then combine this with the @exodus/bytes package to decode the result:
const { TextDecoder } = require("@exodus/bytes");
const htmlString = (new TextDecoder(sniffedEncoding)).decode(htmlBytes);
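As an alternative sketch, the runtime's built-in `TextDecoder` global can do the same decoding step without an extra dependency. This assumes the runtime's `TextDecoder` supports the encoding in question (Node.js builds without full ICU data support only a small set); the helper name `decodeWith` is hypothetical.

```javascript
// Hypothetical helper: decode bytes using an encoding name such as the one
// returned by the sniffer. Throws a RangeError if the runtime does not
// support the requested encoding.
function decodeWith(encodingName, bytes) {
  return new TextDecoder(encodingName).decode(bytes);
}

// "café" encoded as UTF-8.
const utf8Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);
console.log(decodeWith("UTF-8", utf8Bytes));
```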
You can pass the following options to htmlEncodingSniffer:
const sniffedEncoding = htmlEncodingSniffer(htmlBytes, {
  xml,
  transportLayerEncodingLabel,
  defaultEncoding,
});
The xml option is a boolean, defaulting to false. If set to true, then we bypass the HTML encoding sniffing algorithm and compute the encoding based on the presence of a BOM, or the other options provided. (In the future, we may perform sniffing of the <?xml?> declaration, but for now that is not implemented.)
The transportLayerEncodingLabel option is an encoding label obtained from the "transport layer" (probably an HTTP Content-Type header). It overrides everything but a BOM.
The defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. For HTML, it defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale). For XML, it defaults to "UTF-8".
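The precedence among these options can be sketched as follows. This is an illustration of the ordering described above, not the package's source: `detectBOM` and `pickEncoding` are hypothetical names, `sniffed` stands in for whatever the HTML pre-scan found, and `transportLayerEncoding` is assumed to be an already-resolved encoding name rather than a raw label.

```javascript
// Hypothetical helper: recognize the three byte order marks.
function detectBOM(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  return null;
}

// Hypothetical helper: BOM first, then transport layer, then the sniffed
// result (HTML only), then the default for the document type.
function pickEncoding(bytes, options = {}) {
  const { xml = false, transportLayerEncoding = null, sniffed = null, defaultEncoding } = options;
  const fallback = defaultEncoding ?? (xml ? "UTF-8" : "windows-1252");
  return detectBOM(bytes) ?? transportLayerEncoding ?? (xml ? null : sniffed) ?? fallback;
}

// A UTF-8 BOM wins even when the transport layer says otherwise.
const withBOM = new Uint8Array([0xef, 0xbb, 0xbf, 0x3c]);
console.log(pickEncoding(withBOM, { transportLayerEncoding: "windows-1252" }));
```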
This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.
iconv-lite is a package that provides encoding and decoding of text in various character sets. Unlike html-encoding-sniffer, which is specifically designed for sniffing HTML document encodings, iconv-lite supports a broader range of encodings and can be used for general text conversion purposes.
jschardet is a character encoding detector, similar to the functionality provided by html-encoding-sniffer. However, jschardet is based on the universalchardet library and can be used to detect the encoding of any text, not just HTML documents. It offers a more general approach to encoding detection.
FAQs
html-encoding-sniffer sniffs the encoding from an HTML byte stream.
The npm package html-encoding-sniffer receives a total of 56,226,838 weekly downloads. On that basis, its popularity is classified as popular.
We found that html-encoding-sniffer demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 6 open source maintainers collaborating on the project.