Security & Compliance
Technical documentation for security teams evaluating PII Scrambler for enterprise use. Every claim on this page is independently verifiable.
No APIs, no uploads, no external calls. Data never leaves the browser.
Names from 106 countries — the largest client-side name dictionary of any PII tool.
Static HTML/JS/CSS export. No Node.js, no backend, no database.
Open source. Inspect the network tab yourself to verify.
Architecture Overview
PII Scrambler is built with Next.js using output: "export" — this produces a fully static site. The build output is plain HTML, JavaScript, and CSS files. There is no server-side code, no API routes, no database, and no runtime environment.
File processing runs entirely in a Web Worker — a separate browser thread that handles parsing, PII detection, and file rebuilding. The Web Worker communicates with the main thread exclusively through postMessage with transferable ArrayBuffers.
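The single-owner hand-off can be demonstrated in a few lines. This is an illustrative sketch, not the app's code: `structuredClone` with a transfer list exercises the same move semantics as `worker.postMessage(data, [buffer])`.

```typescript
// A transferred ArrayBuffer is moved, not copied: after the hand-off the
// sender's reference is detached and reports a byteLength of 0.
const buffer = new ArrayBuffer(1024);

// In the app this is worker.postMessage({ buffer }, [buffer]); structuredClone
// with a transfer list has the same single-owner semantics:
const received = structuredClone(buffer, { transfer: [buffer] });

console.log(received.byteLength); // 1024: the receiver now owns the bytes
console.log(buffer.byteLength);   // 0: the sender's copy is detached
```

This is why the main thread "no longer holds this data" after the transfer: the bytes have one owner at a time.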
The deployed application consists of:
- Static HTML, JS bundles, and CSS
- Two name dictionary files (3.2 MB + 6.3 MB, served from the same origin)
- A PDF.js worker script (for PDF text extraction)
No environment variables are used. No secrets or API keys exist anywhere in the codebase.
Data Flow
File selection
User selects a file via drag-drop or file picker. The file is read into an ArrayBuffer in browser memory.
Worker transfer
The ArrayBuffer is transferred (zero-copy) to a Web Worker thread. The main thread no longer holds this data.
Text extraction
The appropriate file processor extracts text content. PDF uses pdfjs-dist, DOCX uses jszip, XLSX uses the xlsx library. All libraries run in-browser.
PII detection
14 regex patterns scan for structured PII (emails, credit cards across all major networks, SSNs, NI numbers, dates of birth, IP addresses, phone numbers, postcodes, and VINs). Three-tier name detection — including a 1,229,656-name dictionary covering 106 countries — runs in parallel. All data is loaded from static files bundled with the app.
File rebuild
The processor rebuilds the file in its original format with PII replaced by labels like [EMAIL], [NAME], [PHONE_NUMBER]. For PDFs, pages containing PII are rendered to images and rebuilt with the original text content stream removed — only non-PII text is re-added as a selectable layer. This ensures redacted text cannot be recovered via copy-paste or text extraction.
Download
The cleaned file ArrayBuffer is transferred back to the main thread. A download is triggered via URL.createObjectURL. The object URL is immediately revoked and all references released for garbage collection.
At no point in this flow does data leave the browser's memory boundary. There is no network transmission of file content.
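The detection-and-replace step can be sketched with simplified patterns. These two regexes (email, US SSN) are illustrative stand-ins, not the production expressions:

```typescript
// Simplified stand-ins for two of the 14 production patterns.
const patterns: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]"], // email (simplified)
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],     // US Social Security number
];

// Apply each pattern in turn, replacing matches with redaction labels.
function scrub(text: string): string {
  return patterns.reduce((t, [re, label]) => t.replace(re, label), text);
}

console.log(scrub("Contact jane.doe@example.com, SSN 123-45-6789."));
// → "Contact [EMAIL], SSN [SSN]."
```

The real pipeline runs this scan inside the Web Worker, on text the processors extracted from the in-memory ArrayBuffer.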
Network Activity
What you will see in the Network tab
On page load: HTML document, JS bundles, CSS, font files (DM Sans from Google Fonts, PX Grotesk from the same origin), and two name dictionary files: /names-first.txt (3.2 MB) and /names-last.txt (6.3 MB), both from the same origin. These are preloaded static assets bundled with the app — no different from loading CSS or JS.
On first PDF process: /pdf.worker.min.mjs is loaded from same origin (Mozilla's PDF.js worker).
On file process: Nothing. All resources are already loaded and cached.
What you will NOT see
✕ XHR/fetch to external domains
✕ WebSocket connections
✕ Tracking pixels or beacons
✕ Google Analytics or any analytics
✕ Mixpanel, Segment, Amplitude
✕ Sentry, Bugsnag, or error reporting
✕ CDN calls for PII processing
✕ Any outbound POST requests
Technical Evidence
Static export configuration
```ts
// next.config.ts
const nextConfig = {
  output: "export", // ← Fully static, no server
  webpack: (config) => {
    config.resolve.fallback = {
      fs: false,     // No filesystem access
      path: false,   // No path module
      crypto: false, // No crypto module
      // ... all Node.js modules disabled
    };
    return config;
  },
};
```

No API routes
The src/app/ directory contains only page routes and layout files. There is no api/ subdirectory. No server-side request handlers exist anywhere in the codebase.
Worker isolation
The Web Worker runs in a separate thread within the same browser security context. Data is transferred via postMessage using Transferable ArrayBuffers (zero-copy, single-owner semantics). The worker terminates after each file is processed.
Dependency Audit
| Package | Purpose | Network |
|---|---|---|
| next | Static site generation framework | None |
| react / react-dom | UI rendering | None |
| pdfjs-dist | PDF text extraction (Mozilla PDF.js) | None |
| pdf-lib | PDF modification (content stream replacement, image embedding, text layer rebuild) | None |
| jszip | ZIP manipulation (DOCX files are ZIP archives) | None |
| xlsx | Excel spreadsheet parsing | None |
| papaparse | CSV parsing | None |
| compromise | NLP library for named entity recognition | None |
Every dependency operates exclusively on in-memory data. None make network requests, transmit telemetry, or access external resources.
Name Detection & Data Provenance
The largest client-side name dictionary of any PII tool. 437k first names and 793k surnames — precision-filtered at build time to remove ~4,000 non-name words (countries, cities, common English) while preserving every legitimate name. Combined with patronymic suffix and prefix pattern recognition for naming conventions that dictionary lookup alone would miss.
Three-tier detection
Contextual heuristics
Detects names near honorifics (Mr., Dr.), salutations (Dear), form labels (Name:), signature blocks, and email-derived patterns.
NLP
Named entity recognition via compromise.js — a client-side NLP library. No external API calls.
Dictionary + suffix patterns
1,229,656-name dictionary (106 countries), precision-filtered at build time to exclude non-name words, plus cultural suffix/prefix pattern matching. Multi-tier agreement boosts confidence; common English words and blocklisted terms are penalised.
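A hypothetical sketch of how multi-tier agreement could be combined into a single decision. The weights and threshold here are illustrative, not the app's real values:

```typescript
// Each tier casts a vote; agreement across tiers raises the score, while
// common-word / blocklist status lowers it. Weights are illustrative.
interface TierVotes {
  context: boolean;    // tier 1: honorifics, salutations, form labels
  nlp: boolean;        // tier 2: compromise.js named-entity tag
  dictionary: boolean; // tier 3: dictionary or suffix-pattern hit
}

function nameScore(votes: TierVotes, isCommonWord: boolean): number {
  let score = 0;
  if (votes.context) score += 2;
  if (votes.nlp) score += 1;
  if (votes.dictionary) score += 2;
  if (isCommonWord) score -= 3; // penalise common English / blocklisted words
  return score;
}

const isLikelyName = (v: TierVotes, common: boolean): boolean =>
  nameScore(v, common) >= 3;

// Two tiers agreeing clears the bar; a lone dictionary hit on a common word does not:
console.log(isLikelyName({ context: true, nlp: false, dictionary: true }, false)); // true
console.log(isLikelyName({ context: false, nlp: false, dictionary: true }, true)); // false
```

The design intent is the same either way: no single tier can flag a token on its own if it looks like ordinary English.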
Cultural naming convention coverage
Beyond the dictionary, PII Scrambler recognises culturally specific suffix and prefix patterns for surnames that wouldn't appear in standard English name lists:
| Convention | Patterns |
|---|---|
| Slavic | -ović, -ski, -enko, -chuk |
| Arabic | Al-, El-, Bin-, Abu-, Bint- |
| Turkish | -oğlu |
| Persian | -zadeh, -pour, -nejad |
| Georgian | -dze, -shvili |
| Armenian | -ian, -yan |
| Greek | -opoulos, -idis, -akis |
| Scandinavian | -sson, -ström |
| Romanian | -escu, -eanu |
| Portuguese | -eiro, -eira |
Full Unicode diacritics support including Turkish dotless-i, Polish stroke-l, Scandinavian stroke-o, and Latin Extended characters.
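As an illustration, a handful of these suffixes can be folded into one Unicode-aware pattern. This subset is for demonstration only; the app's matching is broader:

```typescript
// The `u` flag plus \p{L} keeps diacritics (ğ, ł, ö, ...) inside the match.
// This covers only a subset of the suffixes listed above.
const surnameSuffix =
  /\p{L}+(?:ović|ski|enko|oğlu|zadeh|dze|shvili|ian|yan|opoulos|sson|escu)$/u;

for (const name of ["Petrović", "Yılmazoğlu", "Papadopoulos", "Smith"]) {
  console.log(`${name}: ${surnameSuffix.test(name)}`);
}
// Petrović, Yılmazoğlu, and Papadopoulos match; Smith does not.
```

Without the `u` flag, `\w` and character classes would split multi-byte letters and miss names like Petrović entirely.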
Data sources (public-domain or openly licensed)
philipperemy/name-dataset — 730k first names and 983k last names sourced from 106 countries. Precision-filtered at build time to remove ~4,000 common English words, country names, and city names. Open-source, MIT licensed.
US Census Bureau 2010 — ~162,000 surnames, frequency-ranked. Public domain.
NameDatabases (GitHub) — ~20,000 first names and ~85,000 surnames. Open-source.
International supplement — 500+ hand-curated names covering South Asian, East Asian, Middle Eastern, European, and African naming conventions.
Name data is compiled at build time via npm run build:names and stored as static text files. No runtime fetching from external sources occurs.
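The runtime lookup could be sketched as follows. `parseDictionary` is a hypothetical helper, and the commented fetch line mirrors the same-origin static-asset load described above:

```typescript
// Parse a newline-delimited name file into a Set for O(1) lookups.
function parseDictionary(text: string): Set<string> {
  return new Set(
    text
      .split("\n")
      .map((line) => line.trim().toLowerCase())
      .filter((line) => line.length > 0)
  );
}

// In the browser, the dictionaries are same-origin static files:
//   const firstNames = parseDictionary(await (await fetch("/names-first.txt")).text());

const names = parseDictionary("Amélie\nKenji\nOlufemi\n");
console.log(names.has("amélie"));  // true
console.log(names.has("invoice")); // false
```

Lowercasing at parse time keeps the lookup case-insensitive without a second pass over 1.2 million entries.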
Deployment Options
Hosted
Deploy as static files on any CDN or static hosting platform (Vercel, Netlify, S3 + CloudFront, GitHub Pages).
Self-hosted
Run npm run build and deploy the out/ directory to any internal static file server. Full control over infrastructure.
Air-gapped
Build once, copy the output to an isolated network. No internet connection required after the initial build. All assets including name dictionaries are bundled.
Note: The app loads DM Sans from Google Fonts on page load. For air-gapped or fully isolated deployments, this font can be self-hosted. The display font (PX Grotesk) is already bundled locally.
Verification Steps
Every claim on this page can be independently verified. Here's how:
Open browser DevTools → Network tab
Clear the log, then process a file. At most you will see same-origin requests for the name dictionary files, if they are not already cached. Zero external domain requests.
Inspect next.config.ts
Confirm output: "export" is set. This guarantees static-only output with no server runtime.
Check the src/app/ directory
Confirm there is no api/ subdirectory. No server-side request handlers exist.
Review package.json
Confirm no analytics, telemetry, or tracking dependencies are listed.
Search for fetch() calls
The only fetch calls in the entire codebase load /names-first.txt and /names-last.txt — static files from the same origin.
For maximum assurance
Clone the repository, build locally, and deploy on an air-gapped network. Process a sensitive document and observe: zero network traffic beyond the static assets.
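The package.json review above can also be scripted. `TRACKING_PACKAGES` and `findTrackers` are hypothetical names, and the blocklist is illustrative; extend it with whatever your policy forbids:

```typescript
// Flag any dependency whose name matches a known tracking / telemetry vendor.
const TRACKING_PACKAGES = [
  "@sentry/", "mixpanel", "segment", "amplitude", "bugsnag", "analytics",
];

function findTrackers(deps: Record<string, string>): string[] {
  return Object.keys(deps).filter((name) =>
    TRACKING_PACKAGES.some((needle) => name.includes(needle))
  );
}

// Against the dependencies from the audit table, nothing is flagged:
const deps = { next: "*", react: "*", "pdfjs-dist": "*", jszip: "*" };
console.log(findTrackers(deps)); // []
```

Running a check like this in CI turns the "no analytics dependencies" claim into an enforced invariant rather than a one-time review.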