AI Application Architect
OKX
Software Engineering, IT, Data Science
Singapore
Posted on Jun 2, 2026
OKX will be prioritising applicants who have a current right to work in Singapore, and do not require OKX's sponsorship of a visa.
Who We Are
At OKX, we believe that the future will be reshaped by crypto, and that crypto will ultimately contribute to every individual's freedom. OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps). OKX is also trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves. Across our multiple offices globally, we are united by our core principles: We Before Me, Do the Right Thing, and Get Things Done. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er. OKX is part of OKG, a group that brings the value of blockchain to users around the world through our leading products OKX, OKX Wallet, OKLink and more.
About The Team
The SRE team is dedicated to deeply integrating large language models (LLMs), AI Agents, and engineering platform capabilities to build an intelligent application system for R&D, operations, stability, and business scenarios. By creating an AI application architecture that is observable, evaluable, governable, and continuously evolving, the team is driving the company's shift from "tool-assisted" to "intelligent collaboration," improving R&D efficiency, system stability, fault diagnosis efficiency, and the quality of business decisions.
What You’ll Be Doing
- Design and build AI Harness capabilities for SRE / DevOps scenarios, including fault detection, change analysis, capacity risk identification, automated inspection, drill evaluation, and recovery recommendations.
- Drive the development of an automated RCA (Root Cause Analysis) system, combining logs, metrics, distributed tracing, events, changes, topology, and other data to achieve root cause analysis, impact scope assessment, and post-incident review support.
- Build AIOps platform capabilities, including intelligent alert noise reduction, anomaly detection, event correlation, trend prediction, fault attribution, and automated closed-loop remediation.
- Collaborate with R&D, SRE, platform, data, and business teams to embed AI capabilities into Code Review, CI/CD, GitOps, DevOps, incident response, and stability governance processes.
What We Look For In You
- Bachelor's degree or above in Computer Science or a related field, with 8+ years of experience in R&D, architecture, or platform engineering; experience building AI applications, SRE, AIOps, or DevOps platforms is preferred.
- Strong software architecture skills, familiar with microservices architecture, distributed systems, high-availability design, service governance, observability, and platform engineering.
- Familiar with LLM application development, with an understanding of core technologies such as LLMs, RAG, embeddings, vector databases, Agents, Function Calling / Tool Calling, and Prompt Engineering. Understanding of the production challenges of AI applications, including hallucination control, result evaluation, permission boundaries, data security, cost control, observability, and failure fallback mechanisms.
- Experience delivering AI Agent or intelligent assistant products, able to design complex task decomposition, multi-tool invocation, multi-turn reasoning, context management, and human-machine collaboration workflows.
- Familiar with RCA or AIOps capability development, including log analysis, metric anomaly detection, distributed tracing, event correlation, alert noise reduction, topology analysis, and root cause localization.
- Proficient in at least one mainstream development language, such as Java, Python, Go, or TypeScript, with strong engineering implementation and system design skills.
- Familiar with cloud-native technology stacks and common middleware, such as Kubernetes, Docker, Kafka, Redis, MySQL, Elasticsearch, Prometheus, Grafana, OpenTelemetry, etc.
- Strong complex problem analysis skills and holistic architectural thinking, able to drive problem-solving from business, platform, process, and organizational collaboration perspectives.
- Ability to communicate in both Chinese and English is preferred, as the role requires collaborating with cross-region stakeholders.
Perks & Benefits
- Competitive total compensation package
- L&D programs and Education subsidy for employees' growth and development
- Various team building programs and company events
- Wellness and meal allowances
- Comprehensive healthcare schemes for employees and dependents
- More that we would love to tell you about along the way!
Notice:
All official OKX vacancies are published on this website. While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. If in doubt, please apply directly through our official careers website.
Information collected and processed as part of the recruitment process of any job application you choose to submit is subject to OKX's Candidate Privacy Notice.