NPU-Centric Consensus

Mining

TensorChain is a verifiable delay function based on high-dimensional matrix multiplication, designed to favor consumer NPUs over industrial GPU farms.

TensorChain Proof of Useful Work

The TensorChain puzzle saturates unified memory bandwidth rather than raw compute throughput (TFLOPS), flipping the economics in favor of consumer hardware. The puzzle is tuned to target roughly 75% of available system RAM on high-end consumer devices.

  • Seed derivation from previous block hash + miner nonce via SHAKE256
  • Deterministic matrix generation: A, B sized to ~100 GB baseline (above H100 VRAM)
  • Compute noisy product C' = (A+E)·(B+F) using Neural Engine INT8 tensor units
  • Digest via Merkle root of diagonal elements, hashed into succinct proof
  • Verification via Freivalds' algorithm in O(n²) instead of O(n³) (see the sketch after this list)
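
A minimal sketch of one such round, in Python with NumPy. Helper names are illustrative, the dimension is kept tiny, the noise terms E and F are omitted, and a plain hash of the diagonal stands in for the Merkle digest; the production puzzle sizes A and B at roughly 100 GB each.

    # Illustrative TensorChain mining round (toy dimensions, hypothetical helpers).
    import hashlib
    import numpy as np

    def derive_seed(prev_block_hash: bytes, nonce: int) -> bytes:
        # Seed = SHAKE256(previous block hash || miner nonce).
        return hashlib.shake_256(prev_block_hash + nonce.to_bytes(8, "big")).digest(32)

    def deterministic_matrix(seed: bytes, tag: bytes, n: int) -> np.ndarray:
        # Deterministically expand the seed into an INT8 matrix.
        rng_seed = int.from_bytes(hashlib.shake_256(seed + tag).digest(8), "big")
        return np.random.default_rng(rng_seed).integers(-128, 128, size=(n, n), dtype=np.int8)

    def mine_round(prev_block_hash: bytes, nonce: int, n: int = 256):
        seed = derive_seed(prev_block_hash, nonce)
        A = deterministic_matrix(seed, b"A", n).astype(np.int64)
        B = deterministic_matrix(seed, b"B", n).astype(np.int64)
        C = A @ B                                                  # the memory-bound product
        digest = hashlib.sha256(np.diag(C).tobytes()).digest()    # stand-in for the Merkle digest
        return A, B, C, digest

    def freivalds_verify(A, B, C, rounds: int = 20) -> bool:
        # Probabilistic O(n^2) check that C == A @ B, avoiding the O(n^3) recomputation.
        rng = np.random.default_rng()
        for _ in range(rounds):
            r = rng.integers(0, 2, size=(A.shape[0], 1)).astype(np.int64)
            if not np.array_equal(A @ (B @ r), C @ r):
                return False
        return True

    A, B, C, proof = mine_round(b"\x00" * 32, nonce=42)
    assert freivalds_verify(A, B, C)
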
Proof of Memory Capacity

By sizing the matrices so that the working set exceeds H100 VRAM (80 GB) but fits within Mac Studio unified memory (192 GB), TensorChain creates a "Proof of Memory Capacity and Bandwidth" that physically excludes PCIe-bound GPU rigs.
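
A back-of-the-envelope check of that bound, assuming one byte per INT8 element; the exact byte budget is an assumption, not a protocol constant.

    # Rough sizing for the "Proof of Memory Capacity" target (illustrative numbers).
    import math

    BYTES_PER_ELEMENT = 1            # INT8 entries
    TARGET_BYTES = 100 * 10**9       # ~100 GB baseline per operand matrix

    # Side length of a square N x N matrix occupying TARGET_BYTES:
    n = math.isqrt(TARGET_BYTES // BYTES_PER_ELEMENT)
    print(f"N ≈ {n:,}")                               # ≈ 316,227
    print(f"one operand ≈ {n * n / 1e9:.1f} GB")      # above 80 GB VRAM, below 192 GB UMA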

The Batch-1 Efficiency Gap

Industrial GPUs collapse in efficiency when forced to process single inference requests. Consumer NPUs are optimized for exactly this workload.

Metric                        Nvidia H100 (Industrial)      Apple M2 Ultra (Consumer)
Optimal Batch Size            ≥ 64                          1
Joules per Token (Batch 1)    ~15 J                         ~11 J
Memory Access                 CPU → PCIe → VRAM copies      Unified Memory (0 copy)
Outcome                       Expensive latency overhead    Native advantage

By mandating sequential, low-batch inference operations, Po8 forces industrial miners to operate in their most inefficient regime while consumer devices operate in their optimal regime. This economic inversion is the key to decentralization.

Hardware Configurations

Validator Tier

Mac Studio (M2/M3 Ultra) with 128 GB+ RAM. Full node + miner + mixnet relay. Maximum TensorChain participation.

Miner Tier

MacBook Pro M-Series Max with 64 GB RAM. Sequential TensorChain workloads. Efficient batch-1 inference.

Edge Tier

Kneron KL720 USB accelerator. Participates via sharded mining pools. Memory-light CNN workloads.

Mobile Tier

Mobile NPUs via sharded task decomposition. Contributes to aggregate network security through pooled resources.

Tensor sizes automatically adapt to fill available unified memory without swapping. The scheduler routes memory-heavy tasks to UMA nodes and compute-heavy tasks to accelerator nodes.
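
A hypothetical sketch of that sizing rule, reusing the 75% RAM target mentioned above; the assumption that the result accumulates in INT32 is illustrative.

    # Pick the largest square dimension whose A, B (INT8) and C (INT32) fit in
    # ~75% of free unified memory, so the puzzle never spills to swap.
    import math

    def pick_matrix_dim(free_bytes: int, fill_ratio: float = 0.75) -> int:
        budget = int(free_bytes * fill_ratio)
        bytes_per_cell = 1 + 1 + 4        # A, B at 1 byte/elem; C at 4 bytes/elem
        return math.isqrt(budget // bytes_per_cell)

    print(pick_matrix_dim(192 * 10**9))   # e.g. a 192 GB Mac Studio node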

InferNet Layer

Beyond entropy generation, InferNet utilizes NPUs for useful AI inference tasks with economic value.

Optimistic Verification

Miners run models and post results with staked bonds. Fishermen re-execute off-chain during challenge windows.
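
One possible shape for such a bonded result, with illustrative field names and an assumed challenge-window length; the real protocol's parameters are not specified here.

    # Hypothetical bonded inference claim awaiting its challenge window.
    from dataclasses import dataclass

    @dataclass
    class InferenceClaim:
        task_id: str
        output_hash: str              # hash of the posted model output
        miner: str
        bond: int                     # stake slashed if a fisherman's challenge succeeds
        posted_at_block: int
        challenge_window: int = 600   # blocks during which re-execution can dispute it

        def is_final(self, current_block: int) -> bool:
            return current_block >= self.posted_at_block + self.challenge_window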

Bisection Protocol

Disputes are bisected down to a single instruction. The divergent operation is executed on-chain to determine the truth.
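
A sketch of the bisection search itself; the state-lookup callables are stand-ins for the two parties' committed execution traces, and the interface is an assumption.

    # Binary-search the execution trace for the first step whose output is disputed.
    def find_divergent_step(asserter_state, challenger_state, num_steps: int) -> int:
        lo, hi = 0, num_steps              # invariant: parties agree at lo, disagree at hi
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if asserter_state(mid) == challenger_state(mid):
                lo = mid
            else:
                hi = mid
        return hi                          # only this step needs on-chain re-execution

    # Toy traces that agree through step 6, then diverge at step 7.
    honest  = lambda i: ("state", min(i, 6))
    cheater = lambda i: ("state", min(i, 6)) if i <= 6 else ("forged", i)
    assert find_divergent_step(honest, cheater, num_steps=10) == 7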

INT8 Determinism

Strict INT8 quantization ensures bit-for-bit identical outputs across all hardware—Kneron dongles match M2 Ultras.
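
A small illustration of why integer arithmetic makes this possible: an exact INT32 accumulate followed by a fixed-point rescale has a single well-defined answer, unlike floating-point accumulation whose rounding can vary with execution order. The scale and rounding rule below are assumptions, not protocol constants.

    import numpy as np

    def int8_matmul_requant(a_q, b_q, scale_num: int, scale_den: int) -> np.ndarray:
        acc = a_q.astype(np.int32) @ b_q.astype(np.int32)            # exact integer accumulate
        scaled = (acc * scale_num + scale_den // 2) // scale_den     # deterministic rounding
        return np.clip(scaled, -128, 127).astype(np.int8)

    rng = np.random.default_rng(0)
    a = rng.integers(-128, 128, (4, 8), dtype=np.int8)
    b = rng.integers(-128, 128, (8, 4), dtype=np.int8)
    print(int8_matmul_requant(a, b, scale_num=1, scale_den=64))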

Model Registry

An on-chain registry tracks supported models with quantization parameters, ONNX graph hashes, and licensing metadata.
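
A hypothetical registry entry carrying those fields; the names and types are illustrative only.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ModelRegistryEntry:
        model_id: str             # human-readable identifier
        onnx_graph_hash: str      # hash of the canonical ONNX graph
        quant_scheme: str         # e.g. "int8-symmetric"
        activation_scale: float   # quantization parameters pinned for determinism
        weight_scale: float
        license: str              # licensing metadata

    entry = ModelRegistryEntry("resnet50-v1", "<onnx-graph-hash>", "int8-symmetric", 0.021, 0.008, "Apache-2.0")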

Pool Architecture

Not everyone owns a high-end workstation. Sharded mining enables participation from modular accelerators and mobile devices.

Workload Decomposition

  • Large matrices decomposed into sub-blocks (see the sketch after this list)
  • Kneron nodes assigned specific sub-blocks to compute
  • Results aggregated by pool coordinators
  • Rewards distributed proportionally to contribution
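
A small sketch of that decomposition, assuming square matrices and a tile size that divides them evenly; the pool's real tiling and aggregation scheme is not specified here.

    import numpy as np

    def split_blocks(M: np.ndarray, tile: int) -> dict:
        # Index each tile by its (row, column) position in the block grid.
        n = M.shape[0]
        return {(i // tile, j // tile): M[i:i + tile, j:j + tile]
                for i in range(0, n, tile) for j in range(0, n, tile)}

    def pooled_matmul(A: np.ndarray, B: np.ndarray, tile: int) -> np.ndarray:
        # Each tile-pair product is small enough for an edge NPU; the pool
        # coordinator only performs the cheap aggregation of partial results.
        n = A.shape[0]
        a_blk, b_blk = split_blocks(A, tile), split_blocks(B, tile)
        C = np.zeros((n, n), dtype=np.int64)
        for i in range(n // tile):
            for j in range(n // tile):
                for k in range(n // tile):
                    C[i*tile:(i+1)*tile, j*tile:(j+1)*tile] += a_blk[(i, k)] @ b_blk[(k, j)]
        return C

    A = np.arange(16, dtype=np.int64).reshape(4, 4)
    B = np.ones((4, 4), dtype=np.int64)
    assert np.array_equal(pooled_matmul(A, B, tile=2), A @ B)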

Reconfigurable Data Paths

  • Kneron architecture switches operation types at runtime
  • Conv2D to Dilated Convolution without reloading
  • High utilization even on fragmented workloads
  • Native protocol support for heterogeneous pools