Course Outline
Foundations of Agentic Systems in Production
- Agentic architectures: loops, tools, memory, and orchestration layers
- Lifecycle of agents: development, deployment, and continuous operation
- Challenges of production-scale agent management
Infrastructure and Deployment Models
- Deploying agents in containerized and cloud environments
- Scaling patterns: horizontal vs vertical scaling, concurrency, and throttling
- Multi-agent orchestration and workload balancing
Monitoring and Observability
- Key metrics: latency, success rate, memory usage, and agent call depth
- Tracing agent activity and call graphs
- Instrumenting observability using Prometheus, OpenTelemetry, and Grafana
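As an illustration of the metrics listed above, the following is a minimal in-process sketch, not a Prometheus or OpenTelemetry integration. The names `AgentMetrics` and `timed_call` are hypothetical and chosen only for this example; in practice the same values would typically be exported through a metrics library.

```python
import time
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AgentMetrics:
    """Hypothetical in-memory collector for one agent's run metrics."""
    latencies: list = field(default_factory=list)
    successes: int = 0
    failures: int = 0
    max_call_depth: int = 0

    def record(self, latency_s: float, ok: bool, call_depth: int) -> None:
        # Store one observation: latency, outcome, and nesting depth of the call.
        self.latencies.append(latency_s)
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        self.max_call_depth = max(self.max_call_depth, call_depth)

    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.0

    def mean_latency(self) -> float:
        return mean(self.latencies) if self.latencies else 0.0


def timed_call(metrics, fn, *args, call_depth=1):
    """Run fn, record its wall-clock latency and outcome, and re-raise errors."""
    start = time.perf_counter()
    try:
        result = fn(*args)
    except Exception:
        metrics.record(time.perf_counter() - start, ok=False, call_depth=call_depth)
        raise
    metrics.record(time.perf_counter() - start, ok=True, call_depth=call_depth)
    return result
```

Wrapping each tool or sub-agent invocation in `timed_call` yields the latency, success rate, and call depth figures that a dashboard would then chart.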
Logging, Auditing, and Compliance
- Centralized logging and structured event collection
- Compliance and auditability in agentic workflows
- Designing audit trails and replay mechanisms for debugging
Performance Tuning and Resource Optimization
- Reducing inference overhead and optimizing agent orchestration cycles
- Model caching and lightweight embeddings for faster retrieval
- Load testing and stress scenarios for AI pipelines
Cost Control and Governance
- Understanding agent cost drivers: API calls, memory, compute, and external integrations
- Tracking agent-level costs and implementing chargeback models
- Automation policies to prevent agent sprawl and idle resource consumption
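The chargeback idea above can be sketched as a simple aggregation of usage records per team. The unit prices, the record layout, and the function names below are assumptions made for illustration; real cost drivers and rates depend on the provider and the deployment.

```python
from collections import defaultdict

# Hypothetical unit prices in USD; actual prices vary by provider.
PRICE_PER_1K_TOKENS = 0.002
PRICE_PER_CPU_SECOND = 0.00005


def usage_cost(tokens: int, cpu_seconds: float) -> float:
    """Cost of one usage record: API tokens plus compute time."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS + cpu_seconds * PRICE_PER_CPU_SECOND


def chargeback(records):
    """Sum cost per team from (team, agent, tokens, cpu_seconds) records."""
    totals = defaultdict(float)
    for team, _agent, tokens, cpu_seconds in records:
        totals[team] += usage_cost(tokens, cpu_seconds)
    return dict(totals)
```

Keeping the agent identifier in each record makes it possible to break the same data down per agent as well as per team.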
CI/CD and Rollout Strategies for Agents
- Integrating agent pipelines into CI/CD systems
- Testing, versioning, and rollback strategies for iterative agent updates
- Progressive rollouts and safe deployment mechanisms
Failure Recovery and Reliability Engineering
- Designing for fault tolerance and graceful degradation
- Retry, timeout, and circuit breaker patterns for agent reliability
- Incident response and post-mortem frameworks for AI operations
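The retry and circuit breaker patterns named above can be sketched as follows. This is one minimal variant under stated assumptions: the class and function names, the thresholds, and the injectable `clock` and `sleep` parameters are all hypothetical, and timeouts are left out for brevity.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # time the circuit opened, or None when closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def retry(fn, attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Call fn up to `attempts` times with exponential backoff between tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying against an open circuit would be pointless
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

A typical combination wraps the downstream call in the breaker and the breaker call in `retry`, for example `retry(lambda: breaker.call(call_model, prompt))`, where `call_model` stands in for whatever tool or model call the agent makes.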
Capstone Project
- Build and deploy an agentic AI system with full monitoring and cost tracking
- Simulate load, measure performance, and optimize resource usage
- Present final architecture and monitoring dashboard to peers
Summary and Next Steps
Prerequisites
- Strong understanding of MLOps and production machine learning systems
- Experience with containerized deployments (Docker/Kubernetes)
- Familiarity with cloud cost optimization and observability tools
Audience
- MLOps engineers
- Site Reliability Engineers (SREs)
- Engineering managers overseeing AI infrastructure
Testimonials (3)
Good mix of knowledge and practice
Ion Mironescu - Facultatea S.A.I.A.P.M.
Course - Agentic AI for Enterprise Applications
The mix of theory and practice, and of high-level and low-level perspectives
Ion Mironescu - Facultatea S.A.I.A.P.M.
Course - Autonomous Decision-Making with Agentic AI
Practical exercises
Daniel - Facultatea S.A.I.A.P.M.
Course - Agentic AI in Multi-Agent Systems