Deploying RAG API Server with Railway in Just 5 Minutes Without Docker
A guide to deploying a RAG system for quant research on Railway: it deploys automatically on every GitHub push, and costs far less than Heroku.
Why Is Deployment a Hassle?

I built a RAG system locally. Combining LangChain + Qdrant + the Claude API, it performs well on financial document search. But to share it with my team, I need to host it on a server.
Typical options are:
- SSH into EC2 and set up manually — cumbersome and hard to manage
- Docker + ECS/Kubernetes — overly complex
- Heroku — expensive (no free plan anymore)
- Railway — I chose this
Why Railway Is Convenient
Railway connects to your GitHub repo and automatically builds and deploys on main branch pushes. If you have a Dockerfile, it uses that; otherwise, it detects the language and builds automatically.
For a Python FastAPI project:
- Push code to GitHub
- In Railway, select “Deploy from GitHub”
- Choose the repository
- Done
Environment variables are set via the Railway dashboard, so there is no need to hardcode sensitive values like ANTHROPIC_API_KEY or QDRANT_URL.
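A forgotten environment variable is the most common cause of a failed first deploy. One pattern is to have the app fail fast at startup when required variables are missing; a minimal sketch, with the variable names taken from this guide's example:

```python
import os

REQUIRED_VARS = ["ANTHROPIC_API_KEY", "QDRANT_URL"]

def missing_env(env=None) -> list:
    # Return the names of required variables that are unset or empty
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    # Raising at startup makes a misconfigured deploy fail loudly in Railway's logs
    problems = missing_env()
    if problems:
        raise RuntimeError(f"Missing environment variables: {problems}")
```

Calling this before creating the API clients turns a vague runtime KeyError into an obvious deploy-time error message.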
Example FastAPI RAG Server
# main.py
import os

from anthropic import Anthropic
from fastapi import FastAPI
from qdrant_client import QdrantClient

app = FastAPI()
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
qdrant = QdrantClient(url=os.environ["QDRANT_URL"])

@app.post("/query")
async def query(question: str) -> dict:
    # 1. Search relevant documents in Qdrant
    # embed() is a user-supplied function that turns the question into a query vector
    results = qdrant.search(
        collection_name="research_docs",
        query_vector=embed(question),
        limit=5,
    )
    context = "\n\n".join(r.payload["text"] for r in results)

    # 2. Generate an answer with Claude, grounded in the retrieved context
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return {"answer": response.content[0].text}
As long as requirements.txt is included, Railway automatically sets up the Python environment.
Cost Calculation
Railway Pricing (as of April 2026):
- Hobby Plan: $5/month — 512MB RAM, 1 vCPU, 1GB disk
- Pro Plan: $20/month — 8GB RAM, 8 vCPU, 100GB disk
- Charges based only on actual usage (running time × resources)
A lightweight internal RAG API (for quant team use) is well suited to the Hobby plan at $5/month. FastAPI doesn't use much memory, and Qdrant can be hosted as a separate instance (Qdrant Cloud free tier).
Comparison with Heroku:
- Heroku Basic: $7/month (512MB RAM, single server)
- Railway Hobby: $5/month plus usage-based billing, with multiple services per project
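The "running time × resources" billing model can be sketched with a few lines of arithmetic. The rates below are placeholders I made up for illustration, not Railway's actual prices; check Railway's pricing page for real numbers:

```python
# Illustrative only: both rates are assumptions, not Railway's published prices
RAM_RATE_PER_GB_HOUR = 0.005   # assumed $/GB-hour
CPU_RATE_PER_VCPU_HOUR = 0.003 # assumed $/vCPU-hour

def monthly_usage_cost(ram_gb: float, vcpus: float, hours_running: float) -> float:
    # Usage-based cost = running time x resources consumed
    ram_cost = ram_gb * RAM_RATE_PER_GB_HOUR * hours_running
    cpu_cost = vcpus * CPU_RATE_PER_VCPU_HOUR * hours_running
    return round(ram_cost + cpu_cost, 2)

# A 512MB / 1 vCPU service running 24/7 for a 30-day month (720 hours)
print(monthly_usage_cost(0.5, 1, 720))
```

The point of the model is that a service which sleeps most of the day costs proportionally less, unlike Heroku's flat per-dyno pricing.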
Step-by-Step Setup
1. Code Structure
my-rag-api/
├── main.py
├── requirements.txt
└── (Optional: Dockerfile)
requirements.txt
fastapi==0.115.0
uvicorn==0.32.0
anthropic==0.40.0
qdrant-client==1.12.0
2. Deploy on Railway
- Visit railway.app and log in with GitHub
- New Project → Deploy from GitHub repo
- Select your repository
- Add environment variables: ANTHROPIC_API_KEY, QDRANT_URL
- Automatic deployment completes
Railway detects main.py running FastAPI and automatically runs uvicorn main:app --host 0.0.0.0 --port $PORT.
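If you'd rather pin the start command explicitly than rely on auto-detection, Railway supports config-as-code via a railway.json file in the repo root. A minimal sketch (field names per Railway's config-as-code docs; verify against the current schema):

```json
{
  "$schema": "https://railway.app/railway.schema.json",
  "deploy": {
    "startCommand": "uvicorn main:app --host 0.0.0.0 --port $PORT"
  }
}
```

This also keeps the start command under version control, so it deploys identically across environments.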
3. Custom Domain (Optional)
Default domain is xxx.railway.app. Custom domains are linked via CNAME records in the Railway dashboard.
Useful Pattern: Slack Webhook Integration
For internal quant RAG bots, linking with Slack slash commands is convenient.
# Additions to main.py
import asyncio

from fastapi import Form

@app.post("/slack/command")
async def slack_command(
    text: str = Form(...),
    response_url: str = Form(...),
) -> dict:
    # Slack expects a reply within 3 seconds, so do the heavy work in the
    # background and POST the final answer to response_url when it's ready
    asyncio.create_task(process_and_respond(text, response_url))
    return {"text": "Analyzing... (response will be sent shortly)"}
Typing /rag today BTC on-chain summary runs a RAG search over the stored documents and posts the answer back to Slack.
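The endpoint above delegates to a process_and_respond coroutine. Here is one way to sketch it with only the standard library; run_rag_pipeline is a hypothetical stand-in for the /query logic (Qdrant search + Claude), not a real function from this project:

```python
import asyncio
import json
import urllib.request

async def run_rag_pipeline(text: str) -> str:
    # Hypothetical stand-in for the real RAG query (Qdrant search + Claude call)
    return f"Answer for: {text}"

def build_slack_payload(answer: str) -> dict:
    # Slack's response_url expects a JSON body; "in_channel" makes the reply
    # visible to the whole channel instead of only the command's author
    return {"response_type": "in_channel", "text": answer}

async def process_and_respond(text: str, response_url: str) -> None:
    answer = await run_rag_pipeline(text)
    body = json.dumps(build_slack_payload(answer)).encode()
    req = urllib.request.Request(
        response_url, data=body, headers={"Content-Type": "application/json"}
    )
    # urlopen blocks, so run it in a worker thread to keep the event loop free
    await asyncio.to_thread(urllib.request.urlopen, req)
```

In production you'd likely swap urllib for an async HTTP client such as httpx, but the delayed-response flow is the same.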
Conclusion
Railway is a great service for deploying Python servers without complex DevOps. It automates CI/CD via GitHub integration and manages environment variables cleanly.
For internal research tools or team shared APIs, Hobby plan at $5/month is enough. As traffic grows, upgrade to Pro or increase instances.
Sign up for Railway (referral link) to get $20 in credit.