GPU Node Setup
Provide raw inference power to the BitNow network. We support multi-vendor architectures and containerized deployments.
Hardware Requirements
To ensure consistent performance for consumer agents, nodes must meet the following minimum specifications:
GPU Models
NVIDIA A100, H100, RTX 4090, or RTX 3090 (24GB+ VRAM recommended).
Internet Link
Minimum 1Gbps symmetrical connection for reliable model loading.
Starting the Inference Server
The BitNow network routes requests that include tool definitions, and serving those requests requires vLLM to be started with auto tool-choice enabled. A plain vllm serve invocation rejects such calls with a 400 BadRequestError. Always launch vLLM with the flags below:
vllm serve Qwen/Qwen2.5-7B-Instruct \
--host 0.0.0.0 \
--port 8000 \
--enable-auto-tool-choice \
--tool-call-parser hermes

--enable-auto-tool-choice
Allows the model to automatically decide when to invoke a tool. Required whenever a request includes a tools array.
--tool-call-parser hermes
Selects the Hermes tool-call parser, compatible with Qwen 2.5 and other Hermes-format instruction-tuned models.
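For a quick smoke test, the body below mirrors the shape of a tools-enabled Chat Completions request the router sends. The tool name (get_weather) and its schema are illustrative, not part of the BitNow spec. With the flags above, POSTing this body to http://localhost:8000/v1/chat/completions should succeed; without them, vLLM returns a 400.

```python
import json

# OpenAI-style chat completion body with a tools array.
# Requests shaped like this are the ones that need --enable-auto-tool-choice.
payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not a BitNow API
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```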
Node Setup & Registration
Your GPU node should expose an HTTP inference endpoint compatible with the OpenAI Chat Completions style (or another supported backend). Once the endpoint is reachable, register the node with the BitNow registry so the router can start sending traffic.
curl -sS -X POST "<BASE_URL>/v1/suppliers/gpu" \
-H "Content-Type: application/json" \
-d '{
"wallet_address": "<SUPPLIER_WALLET>",
"models": ["gpt-4o", "gpt-4o-mini"],
"price_per_million_tokens": "5.00",
"max_requests_per_hour": 1000,
"endpoint": {
"base_url": "https://gpu-node.example.com",
"backend_type": "openai_http",
"timeout": 30000,
"auth_type": "bearer",
"auth_value": "node-access-token",
"health_endpoint": "/healthz"
}
}'

Required Parameters
wallet_address
Supplier wallet address that receives settlement payments.
models
List of model identifiers this node serves.
price_per_million_tokens
Price per million tokens, as a decimal string.
max_requests_per_hour
Upper bound on requests the router may send to this node per hour.
endpoint
Connection details for the node: base_url, backend_type, request timeout in milliseconds, auth settings, and the health-check path.
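Before registering, it can help to sanity-check the payload locally. The sketch below is a hypothetical helper (not part of any BitNow SDK) that verifies the fields used in the curl call above are present and well-formed:

```python
from decimal import Decimal, InvalidOperation

# Top-level and endpoint fields from the registration call above.
REQUIRED = {"wallet_address", "models", "price_per_million_tokens",
            "max_requests_per_hour", "endpoint"}
REQUIRED_ENDPOINT = {"base_url", "backend_type"}

def validate_registration(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks sane."""
    problems = ["missing field: %s" % k for k in sorted(REQUIRED - payload.keys())]
    try:
        Decimal(payload.get("price_per_million_tokens", ""))
    except InvalidOperation:
        problems.append("price_per_million_tokens is not a decimal string")
    endpoint = payload.get("endpoint", {})
    problems += ["missing endpoint field: %s" % k
                 for k in sorted(REQUIRED_ENDPOINT - endpoint.keys())]
    return problems

registration = {
    "wallet_address": "0xSUPPLIER",  # placeholder, use your real wallet
    "models": ["gpt-4o", "gpt-4o-mini"],
    "price_per_million_tokens": "5.00",
    "max_requests_per_hour": 1000,
    "endpoint": {"base_url": "https://gpu-node.example.com",
                 "backend_type": "openai_http"},
}
print(validate_registration(registration))  # []
```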
Auto-Scaling
Scale node power dynamically based on network demand.
Health Checks
Periodic uptime and latency checks to maintain routing priority.
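The exact response the registry expects from the health endpoint is not specified here; the sketch below is a minimal /healthz handler (matching the health_endpoint field in the registration example), assuming a 200 status with a small JSON body is sufficient:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Keep frequent health probes out of stdout.
        pass

# To serve probes alongside the inference server:
# HTTPServer(("0.0.0.0", 9100), HealthHandler).serve_forever()
```

In practice you would run this on a side port (or have your inference gateway answer /healthz directly) so routing priority is maintained even while the GPU is busy.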
Metering
Per-token billing settled instantly on the blockchain.
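Per-token settlement reduces to simple arithmetic: total tokens times the per-million rate. A sketch using the 5.00-per-million price from the registration example (the token counts are made up):

```python
from decimal import Decimal

def settle(prompt_tokens: int, completion_tokens: int,
           price_per_million: Decimal) -> Decimal:
    """Cost of one request: total tokens times the per-million-token rate."""
    total = prompt_tokens + completion_tokens
    cost = Decimal(total) * price_per_million / Decimal(1_000_000)
    return cost.quantize(Decimal("0.000001"))  # fixed precision for settlement

cost = settle(1200, 300, Decimal("5.00"))
print(cost)  # 0.007500
```

Decimal (rather than float) keeps the arithmetic exact, which matters when many small per-request amounts are summed for on-chain settlement.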