Taraflex/zhlob

Zhlob (Жлоб)

Zhlob is a specialized MITM (Man-in-the-Middle) proxy server designed to breathe life back into the modern, bloated web when browsing over extremely narrow or high-latency data channels. It surgically strips down and re-packs web content to minimize every possible byte while maintaining core site functionality.

Core Philosophy: Reviving the Web for Slow Connections

Modern websites are weighed down by megabytes of tracking scripts, heavy images, and metadata. Zhlob acts as a transformative filter between your browser and the internet, turning a bandwidth-heavy page into a lightweight version suitable for dial-up, satellite, or congested mobile networks.


Technical Mechanisms & Optimization Strategy

1. HTML Reconstruction & "Rechunking"

Zhlob alters how HTML is delivered to improve perceived performance and reduce payload:

  • Surgical Cleaning: Using lol_html, it removes aria-* attributes, itemprop, itemscope, itemtype, and role. It also strips HTML comments, noscript tags, and unnecessary attributes such as decoding on images and loading='eager' on iframes.
  • Asynchronous Styles: If html_rechunk_size is active and the page CSP allows inline JS, Zhlob transforms standard <link rel="stylesheet"> into an asynchronous loader (rel="preload" + as="style" + onload="this.rel='stylesheet'"). This prevents CSS from blocking the initial render, allowing content to appear instantly.
  • URL & Link Cleaning: Automatically strips tracking parameters from <a> tags' href attributes: utm_*, fbclid, gclid, yclid, ysclid, _ga, _gl, _openstat, rb_clickid. It also simplifies rel attributes to keep only security-related values (like noopener).
  • The Rechunking Trick: By breaking HTML into small, fixed-size chunks (default: 1360 bytes), Zhlob forces the browser to render the page as packets arrive. This is critical for high-latency links where waiting for a large buffer would create a visible delay.
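
The link-cleaning step above can be illustrated with a minimal Python sketch. This is not Zhlob's actual code; the function name strip_tracking is hypothetical, and only the parameter names come from the list above (utm_* is treated as a name prefix).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names taken from the list above; "utm_*" is treated as a prefix.
TRACKING_EXACT = {
    "fbclid", "gclid", "yclid", "ysclid",
    "_ga", "_gl", "_openstat", "rb_clickid",
}
TRACKING_PREFIXES = ("utm_",)


def strip_tracking(href: str) -> str:
    """Return href with known tracking query parameters removed (hypothetical helper)."""
    parts = urlsplit(href)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_EXACT and not name.startswith(TRACKING_PREFIXES)
    ]
    # Note: urlencode re-encodes the remaining values, so their escaping
    # may differ cosmetically from the original href.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/a?id=7&utm_source=x&fbclid=abc#top"))
```

Parameters that are not on the list, the path, and the fragment are left in place; a URL whose query consists only of tracking parameters loses its query string entirely.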

2. High-Speed Ad/Tracker Blocking (DAC & PSL)

Zhlob uses a Double-Array Aho-Corasick (DAC) engine, which checks all patterns in a single pass over the input, in O(n) time for an input of length n.

  • DAC Engine: Standard Adblock rules are compiled into a single binary state machine. It scans HTML script tags (both src and inline code) for blocked patterns.
  • Strictly Script-Targeted: Zhlob's DAC matching is currently applied only to scripts (their src URLs and inline code), not to other kinds of content.
  • Public Suffix List (PSL): Crucial for telling public suffixes (like .com or .co.uk) apart from the registrable domains beneath them. This ensures that "Third-Party" rules are applied correctly. Zhlob includes a built-in PSL, but using an external one via --psl is recommended for up-to-date accuracy.
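
To show why the PSL matters for third-party rules, here is a small Python sketch of a suffix-aware comparison. It is an illustration under stated assumptions, not Zhlob's implementation: the SUFFIXES set is a toy stand-in for a real list, and the function names are hypothetical.

```python
# Toy suffix set for illustration only. A real Public Suffix List has
# thousands of entries plus wildcard and exception rules ignored here.
SUFFIXES = {"com", "org", "uk", "co.uk"}


def registrable_domain(host: str) -> str:
    """Return the public suffix plus one label (hypothetical helper)."""
    labels = host.lower().rstrip(".").split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    # Unknown suffix: fall back to the last two labels.
    return ".".join(labels[-2:])


def is_third_party(page_host: str, resource_host: str) -> bool:
    """A resource is third-party when its registrable domain differs from the page's."""
    return registrable_domain(page_host) != registrable_domain(resource_host)


print(is_third_party("news.bbc.co.uk", "static.bbc.co.uk"))
print(is_third_party("shop.example.co.uk", "tracker.other.co.uk"))
```

A naive "last two labels" comparison would reduce both hosts in the second call to co.uk and wrongly treat them as the same party, which is the failure mode an accurate suffix list avoids.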

3. Aggressive Image Downcycling

Instead of serving original images, Zhlob applies the following steps:

  • Dynamic Scaling: Multiplies image dimensions by a scale factor, then clamps the result so that the shorter side stays within a set range (default: 96-384px).
  • WebP Transformation: Converts images to low-quality, grayscale-optimized WebP, which is significantly more compact than JPEG or PNG at the sizes and quality levels Zhlob targets.
  • Metadata Removal: Strips all EXIF data, ICC profiles, and alternative sources (srcset, sizes).
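
The scaling arithmetic can be sketched in a few lines of Python. This is one reading of the description, not Zhlob's actual code: the function name target_size is hypothetical, and the clamp is applied literally, so an image whose scaled shorter side falls below the minimum is brought up to it (how Zhlob treats such images is not stated here).

```python
def target_size(width, height, scale=0.5, limit=(96, 384)):
    """Compute output dimensions from a scale factor and a shorter-side range.

    Hypothetical helper: defaults mirror the documented 0.5 scale and
    96..384 range; aspect ratio is preserved.
    """
    if scale == 0.0 or width <= 0 or height <= 0:
        # A scale of 0.0 is documented as disabling image processing.
        return width, height
    lo, hi = limit
    short, long_side = min(width, height), max(width, height)
    # Scale the shorter side, then clamp it into [lo, hi].
    new_short = round(min(max(short * scale, lo), hi))
    # Derive the longer side from the same ratio.
    new_long = max(1, round(long_side * new_short / short))
    return (new_long, new_short) if width >= height else (new_short, new_long)


print(target_size(1600, 1200))  # shorter side 600 exceeds 384, so it is clamped
print(target_size(400, 300))    # shorter side 150 is already inside the range
```

With the defaults, a 1600x1200 image does not end up at 800x600: its shorter side would be 600, which the clamp reduces to 384, giving 512x384.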

Configuration & Runtime Options (CLI & Environment)

Flags can be set on the command line or through environment variables (prefixed with ZHLOB_).

  • --listen / -L <ADDR> (Default: 127.0.0.1:5151) The address where the proxy is reachable. Supports IP:PORT (e.g., 0.0.0.0:8080), :PORT, or http://....
  • --psl <PSL> Path to a Public Suffix List file (use - for stdin). Critical for accurate third-party ad-rule evaluation. If omitted, Zhlob uses an internal fallback list. This flag is used both in dacgen and the main proxy mode.
  • --dac <PATH> Path to the binary DAC file generated by the dacgen command. Without this, no content-based ad blocking will occur.
  • --cache-max-age <DURATION> (Default: 2h) Overrides the max-age directive in the Cache-Control header for all transformed responses. This forces the browser to keep optimized content in its local cache for longer, reducing repeated requests over narrow channels.
  • --fast-304 (Default: true) When enabled, Zhlob will attempt to return 304 Not Modified without contacting the upstream server if it detects that the browser already has a usable version:
    • Works for resources with a zhlob~ prefix in the ETag (previously transformed).
    • Works for any image/*, video/*, or audio/* resource if the browser sends If-Modified-Since or If-None-Match. To save data, Zhlob assumes the content has not changed.
  • --skip-aux-resources (Default: true) Aggressively skips requests for "auxiliary" content that hasn't been cached yet.
    • Blocked types: video/*, audio/*, font/*, application/font-*, application/x-font-*.
    • Favicons: Any path starting with /favicon and ending in .ico, .png, or .gif.
    • Response: Returns 204 No Content for these requests to stop the browser from waiting.
  • --image-scale <FLOAT> (Default: 0.5) The scale factor for image dimensions.
    • 1.0 means keep original size (but still re-compress).
    • 0.5 means halve the width and height.
    • Set to 0.0 to completely disable all image processing.
  • --image-scale-limit <MIN..MAX> (Default: 96..384) Clamps the dimension calculated by image-scale. The shorter side of the image will never be smaller than MIN or larger than MAX pixels.
  • --html-clean (Default: true) The master switch for HTML transformation. If disabled, Zhlob will not strip metadata, comments, or scripts, and will not clean link attributes.
  • --html-rechunk-size <SIZE> (Default: 1360) Sets the target size for network chunks.
    • Large chunks are split to this size.
    • If set to 0, rechunking and asynchronous style loading are disabled.
  • --transform-limit <SIZE> (Default: 5m) Safety threshold. Any resource with a Content-Length larger than this (e.g., 5MB) will be passed through as-is. This prevents the proxy from exhausting memory or CPU when encountering massive files.
  • --log-level <LEVEL> (Default: info) Log verbosity: off, error, warn, info, debug, trace.
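
The combined effect of --fast-304 and --skip-aux-resources can be sketched as a single decision function. This is a rough Python illustration under several assumptions, none confirmed by the text above: the function name short_circuit is hypothetical, the resource's media type is taken as an input (how Zhlob determines it before contacting upstream is not stated), "not cached yet" is approximated as "the request carries no conditional headers", and the order of the two checks is a guess.

```python
import re

AUX_PREFIXES = ("video/", "audio/", "font/", "application/font-", "application/x-font-")
MEDIA_PREFIXES = ("image/", "video/", "audio/")
# Paths starting with /favicon and ending in .ico, .png, or .gif.
FAVICON_RE = re.compile(r"^/favicon.*\.(ico|png|gif)$")


def short_circuit(path, media_type, request_headers, fast_304=True, skip_aux=True):
    """Return 304, 204, or None (None means: forward the request upstream)."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    if_none_match = headers.get("if-none-match", "").strip()
    conditional = bool(if_none_match) or "if-modified-since" in headers

    if fast_304 and conditional:
        etag = if_none_match
        if etag.startswith("W/"):
            etag = etag[2:]  # drop the weak-validator marker
        if etag.strip('"').startswith("zhlob~"):
            return 304  # previously transformed by the proxy
        if media_type.startswith(MEDIA_PREFIXES):
            return 304  # media is assumed unchanged

    # Assumption: no conditional headers means the browser has no cached copy.
    if skip_aux and not conditional:
        if media_type.startswith(AUX_PREFIXES) or FAVICON_RE.match(path):
            return 204

    return None


print(short_circuit("/font.woff2", "font/woff2", {}))
print(short_circuit("/app.js", "text/javascript", {"If-None-Match": '"zhlob~abc"'}))
```

In both short-circuit cases the request never reaches the upstream server, which is where the bandwidth saving comes from.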

Adblock Rule Support & DAC Generation

Zhlob's dacgen command compiles standard text-based filter lists into a high-performance binary DAC state machine. Because it aims for maximum speed, only a subset of Adblock syntax is supported.

Supported Rule Patterns:

  • Simple Substrings: tracker.js — matches the presence of the string anywhere in a script URL or inline code.
  • Domain Anchors: ||example.com — matches the domain and all subdomains.
  • Start Anchors: |http:// — matches patterns strictly at the start of a URL.
  • Third-Party Restriction: Rules containing $third-party or $3p (requires PSL to function correctly).
  • Type Filtering: Rules containing $script or $all.

Unsupported (Strictly Skipped):

  • Regex: Patterns enclosed in slashes /.../.
  • Element Hiding: All CSS-based rules (##, #@#, etc.).
  • End Anchors: Patterns ending in | (e.g., index.js|).
  • Wildcards: Patterns containing * in the middle are skipped.
  • Non-Script Content Types: Rules restricted to content types that have no effect on scripts.
  • Advanced Options: domain=, rewrite=, csp=, redirect=, etc. are ignored. If a rule depends on such an option to be safe, the rule itself is typically skipped.
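
The supported and unsupported lists above can be read as a filter over rule lines. The Python sketch below is an approximation for illustration, not dacgen's real parser: the function name classify_rule is hypothetical, the OTHER_TYPES set is an assumed sample of Adblock content-type options, rules with advanced options are skipped outright (the conservative reading), and regexes that themselves contain "$" are not handled.

```python
ADVANCED = ("domain=", "rewrite=", "csp=", "redirect=")
# Assumed sample of content-type options that do not cover scripts.
OTHER_TYPES = {"image", "stylesheet", "font", "media", "xmlhttprequest", "subdocument"}
SCRIPT_TYPES = {"script", "all"}


def classify_rule(line: str) -> str:
    """Label a filter line as kept or skipped (hypothetical helper)."""
    rule = line.strip()
    if not rule or rule.startswith("!"):
        return "skip: blank or comment"
    if "##" in rule or "#@#" in rule:
        return "skip: element hiding"
    pattern, _, options = rule.partition("$")
    if len(pattern) > 1 and pattern.startswith("/") and pattern.endswith("/"):
        return "skip: regex"
    if pattern.endswith("|"):
        return "skip: end anchor"
    if "*" in pattern.lstrip("|").strip("*"):
        return "skip: wildcard"
    opts = [opt for opt in options.split(",") if opt]
    if any(opt.startswith(ADVANCED) for opt in opts):
        return "skip: advanced option"
    types = {opt for opt in opts if opt in OTHER_TYPES or opt in SCRIPT_TYPES}
    if types and not types & SCRIPT_TYPES:
        return "skip: non-script type"
    if pattern.startswith("||"):
        kind = "domain anchor"
    elif pattern.startswith("|"):
        kind = "start anchor"
    else:
        kind = "substring"
    third_party = any(opt in ("third-party", "3p") for opt in opts)
    return f"keep: {kind}" + (", third-party" if third_party else "")


for sample in ("tracker.js", "||example.com$third-party", "index.js|", "example.com##.ad"):
    print(sample, "->", classify_rule(sample))
```

Running a filter list through a check like this before generation gives a rough idea of how many of its rules can contribute to the DAC file.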

Building a DAC file:

  1. Download filters: You can find sources in download_ad_filters.sh. It is recommended to use:
  2. Generate:
    ./zhlob dacgen --dac blocklist.dac --psl public_suffix_list.dat filter1.txt filter2.txt
    Providing --psl during generation is optional but recommended as it helps the generator optimize certain domain-based rules.

Setup: HTTPS Interception

  1. Run Zhlob once. It generates CA certificates in ~/.zhlob/.
  2. Install zhlob-ca-cert.cer (Windows/Android) or zhlob-ca-cert.pem (Other) as a Trusted Root CA in your system or browser.
  3. Visit http://mitm.it through the proxy for detailed instructions.

About

Reviving the web for dial-up and satellite links. A radical content-optimizing proxy that removes bloat, shrinks images, and blocks ads at the wire level.

License

Apache-2.0 (LICENSE-APACHE), MIT (LICENSE-MIT)
