GPU Node Setup
Provide raw inference power to the BitNow network. We support multi-vendor architectures and containerized deployments.
Hardware Requirements
To ensure consistent performance for consumer agents, nodes must meet the following minimum specifications:
GPU Models
NVIDIA A100, H100, RTX 4090, or RTX 3090 (24GB+ VRAM recommended).
Internet Link
Minimum 1Gbps symmetrical connection for reliable model loading.
Starting the Inference Server
The BitNow network routes requests that include tool definitions, and serving those requests requires vLLM to be started with auto tool-choice enabled. A plain vllm serve invocation rejects such calls with a 400 BadRequestError. Always launch vLLM with the flags below:
vllm serve Qwen/Qwen2.5-7B-Instruct \
--host 0.0.0.0 \
--port 8000 \
--enable-auto-tool-choice \
--tool-call-parser hermes

--enable-auto-tool-choice
Allows the model to automatically decide when to invoke a tool. Required whenever a request includes a tools array.
--tool-call-parser hermes
Selects the Hermes tool-call parser, compatible with Qwen 2.5 and other Hermes-format instruction-tuned models.
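For a quick smoke test, the body below mirrors the shape of a tools-enabled Chat Completions request the router sends. The tool name (get_weather) and its schema are illustrative, not part of the BitNow spec. With the flags above, POSTing this body to http://localhost:8000/v1/chat/completions should succeed; without them, vLLM returns a 400.

```python
import json

# OpenAI-style chat completion body with a tools array.
# Requests shaped like this are the ones that need --enable-auto-tool-choice.
payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not a BitNow API
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```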
Node Setup & Registration
Your GPU node should expose an HTTP inference endpoint compatible with the OpenAI Chat Completions style (or another supported backend). Once the endpoint is reachable, register the node with the BitNow registry so the router can start sending traffic.
curl -sS -X POST "<BASE_URL>/v1/suppliers/gpu" \
-H "Content-Type: application/json" \
-d '{
"wallet_address": "<SUPPLIER_WALLET>",
"models": ["gpt-4o", "gpt-4o-mini"],
"price_per_million_tokens": "5.00",
"max_requests_per_hour": 1000,
"endpoint": {
"base_url": "https://gpu-node.example.com",
"backend_type": "openai_http",
"timeout": 30000,
"auth_type": "bearer",
"auth_value": "node-access-token",
"health_endpoint": "/healthz"
}
}'

Required Parameters
wallet_address
Supplier wallet address that receives settlement payments.
models
List of model identifiers this node serves.
price_per_million_tokens
Price per million tokens, as a decimal string.
max_requests_per_hour
Upper bound on requests the router may send to this node per hour.
endpoint
Connection details for the node: base_url, backend_type, request timeout in milliseconds, auth settings, and the health-check path.
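Before registering, it can help to sanity-check the payload locally. The sketch below is a hypothetical helper (not part of any BitNow SDK) that verifies the fields used in the curl call above are present and well-formed:

```python
from decimal import Decimal, InvalidOperation

# Top-level and endpoint fields from the registration call above.
REQUIRED = {"wallet_address", "models", "price_per_million_tokens",
            "max_requests_per_hour", "endpoint"}
REQUIRED_ENDPOINT = {"base_url", "backend_type"}

def validate_registration(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks sane."""
    problems = ["missing field: %s" % k for k in sorted(REQUIRED - payload.keys())]
    try:
        Decimal(payload.get("price_per_million_tokens", ""))
    except InvalidOperation:
        problems.append("price_per_million_tokens is not a decimal string")
    endpoint = payload.get("endpoint", {})
    problems += ["missing endpoint field: %s" % k
                 for k in sorted(REQUIRED_ENDPOINT - endpoint.keys())]
    return problems

registration = {
    "wallet_address": "0xSUPPLIER",  # placeholder, use your real wallet
    "models": ["gpt-4o", "gpt-4o-mini"],
    "price_per_million_tokens": "5.00",
    "max_requests_per_hour": 1000,
    "endpoint": {"base_url": "https://gpu-node.example.com",
                 "backend_type": "openai_http"},
}
print(validate_registration(registration))  # []
```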
Auto-Scaling
Scale node power dynamically based on network demand.
Health Checks
Periodic uptime and latency checks to maintain routing priority.
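The exact response the registry expects from the health endpoint is not specified here; the sketch below is a minimal /healthz handler (matching the health_endpoint field in the registration example), assuming a 200 status with a small JSON body is sufficient:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Keep frequent health probes out of stdout.
        pass

# To serve probes alongside the inference server:
# HTTPServer(("0.0.0.0", 9100), HealthHandler).serve_forever()
```

In practice you would run this on a side port (or have your inference gateway answer /healthz directly) so routing priority is maintained even while the GPU is busy.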
Metering
Per-token billing settled instantly on the blockchain.
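Per-token settlement reduces to simple arithmetic: total tokens times the per-million rate. A sketch using the 5.00-per-million price from the registration example (the token counts are made up):

```python
from decimal import Decimal

def settle(prompt_tokens: int, completion_tokens: int,
           price_per_million: Decimal) -> Decimal:
    """Cost of one request: total tokens times the per-million-token rate."""
    total = prompt_tokens + completion_tokens
    cost = Decimal(total) * price_per_million / Decimal(1_000_000)
    return cost.quantize(Decimal("0.000001"))  # fixed precision for settlement

cost = settle(1200, 300, Decimal("5.00"))
print(cost)  # 0.007500
```

Decimal (rather than float) keeps the arithmetic exact, which matters when many small per-request amounts are summed for on-chain settlement.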