JD — AI Solution Architect, Agentic Systems (12–18 years)
Role Overview
Lead the design and delivery of enterprise agentic AI systems (Observe → Decide → Act → Learn). You’ll own the architecture, guide a squad of full-stack AI engineers, and partner with product/domain leads to hit measurable outcomes (quality, latency, cost per task, safety). This role combines hands-on technical leadership, people mentorship, and stakeholder management.
We do not require a specific GPA. We value real production work, design rigor, and measurable results.
- Non-traditional paths (bootcamps, self-taught, OSS track record) are welcome if you demonstrate competence via portfolios and interviews.
- Publications, patents, or conference talks are pluses, not requirements.
Key Outcomes (first 6 months)
- One production-ready agentic workflow live with guardrails, HITL approvals, rollback, and agreed SLOs (success %, p95 latency, $/task).
- A reusable reference architecture (RAG patterns, model routing, tool adapters, eval harness) adopted across squads.
- Team operating model in place: coding standards, PR/ADR discipline, on-call/runbooks, and weekly quality/cost reviews.
Leadership & People Responsibilities
- Lead a squad (3–5 engineers + data/platform) through discovery → design → build → launch; drive iteration cadence and unblock delivery.
- Mentor senior/staff engineers (pairing, design reviews, career feedback); raise the bar on code quality and architectural rigor.
- Define and enforce engineering practices: ADRs, SLAs/SLOs, incident response, IaC, secure-by-default patterns.
- Coordinate with product/domain SMEs for roadmap, priority trade-offs, and value tracking; run design reviews with InfoSec.
- Build a team RACI matrix and ensure crisp handoffs between the retrieval, reasoning, and action layers.
Architecture & Delivery Responsibilities
- Own end-to-end architecture for agentic workloads: channels (web/Teams/Slack), edge/SSO, trust layer (redaction/policy), RAG, model routing, tool adapters, orchestration/HITL, audit and cost guardrails.
- Specify confidence bands & fallbacks; define approval matrices and allow/deny actions; implement idempotency & compensation patterns.
- Design backend/microservices (REST/gRPC), event-driven flows (queues/schedulers), and observability (requests/success/latency/$ per task).
- Shape data pipelines: ingestion, chunking, embeddings, metadata, freshness/TTL, vector stores; ensure citation integrity.
- Guide cloud & platform posture: Docker/K8s, CI/CD, infra as code, secrets/SSO, runtime tuning, and FinOps (budgets, rate limits, autoscaling).
Required Experience
- 12–18 years in software/solutions architecture; 4+ years delivering ML/NLP or conversational systems in production.
- Proven team leadership of cross-functional squads (planning, estimation, delivery, coaching).
- Built agentic or tool-using assistants and RAG pipelines at scale; strong grasp of retrieval quality vs. latency trade-offs.
- Deep backend experience (Python plus one of Node.js/Java), microservices, messaging (Kafka/RabbitMQ/SQS), resiliency (retries, circuit breakers).
- AI/ML: PyTorch/TensorFlow, Hugging Face, embeddings/vector search; evaluation and prompting techniques.
- Data/Stores: SQL + NoSQL (Postgres/MySQL, MongoDB/Elasticsearch/DynamoDB).
- Cloud/DevOps: AWS/Azure/GCP, Docker/K8s, CI/CD, observability (Prometheus/Grafana/OpenTelemetry), MLflow (or similar).
- Security/compliance literacy (PII handling, RBAC/least privilege, audit evidence).
Nice to Have
- LLM fine-tuning/LoRA, retrieval optimization; multi-agent patterns.
- Enterprise app ecosystems (SAP, Salesforce, ServiceNow) and/or RPA.
- Familiarity with tool/connector standards (e.g., MCP-style patterns).
Education & Qualifications
Minimum (one of the following):
- Bachelor’s in Computer Science, Electrical/Computer Engineering, or related; or
- Bachelor’s in another engineering/science field plus substantial architecture leadership; or
- Equivalent experience (12–18+ years) leading architecture/delivery of ML/NLP or large distributed systems.
Preferred:
- Master’s in CS/AI/ML/Software Engineering (or MBA with strong technical undergrad).
- Formal training in Software Architecture, Distributed Systems, Security & Compliance, Data Management, Applied ML/NLP.
Evidence we look for (can substitute for advanced degrees):
- End-to-end reference architectures for agentic/LLM systems (Observe→Decide→Act→Learn) used in production.
- Track record leading squads (3–8 engineers) to ship secure, observable, cost-controlled AI workloads.
- Design docs/ADRs, runbooks, SLOs, and measurable outcomes (success %, p95 latency, $ per task).
- Stakeholder leadership: InfoSec reviews, compliance sign-offs, vendor assessments.
Relevant certifications (nice-to-have, not mandatory):
- Cloud Architect: AWS Certified Solutions Architect - Professional, Microsoft Azure Solutions Architect Expert, Google Professional Cloud Architect.
- Security/Governance: ISO/IEC 27001 Lead Implementer (or equivalent awareness), CISSP, SOC 2 familiarity.
- Data/Platform: Kubernetes (CKA), Terraform, Databricks/Snowflake (architect level).
- AI/ML: Google Professional Machine Learning Engineer / Microsoft Azure AI Engineer Associate / vendor LLM badges.