Open to senior platform & SRE roles

I'm Faisal Bin Basha, AI Platform, DevSecOps, and DevOps Engineer

Principal-level DevOps and AI Infrastructure Engineer designing, scaling, and securing mission-critical distributed systems across cloud and on-prem environments. I build resilient platforms on Kubernetes and AWS EKS, with deep expertise in observability, CI/CD automation, and DevSecOps.

Faisal Bin Basha · AI Platform Engineer
16+ Years in Tech
18 Certifications
8 Companies
4 Cloud Platforms

The toolkit behind resilient platforms

A deep technical stack assembled across 16+ years — spanning cloud architecture, container orchestration, observability, automation, security, and machine learning infrastructure.

Cloud & Infrastructure

Designing scalable, resilient, cost-efficient cloud architectures across hyperscalers and on-prem.

AWS EKS · AWS RDS · AWS S3 · AWS Lambda · Azure AKS · Oracle Cloud · Kubernetes · VMware

DevOps & CI/CD

Automation-first pipelines that make deployments rapid, reliable, and repeatable.

Jenkins · Ansible · Docker · JFrog Artifactory · SonarQube · GitLab · Bitbucket · Groovy

Observability & Data

Real-time visibility into distributed systems — metrics, logs, and databases at scale.

Prometheus · Grafana · cAdvisor · Elasticsearch · Logstash · Kibana · MySQL 8.2 Cluster

AI / ML & Languages

Building the infrastructure that powers next-generation AI/ML workloads.

Python · TensorFlow · Deep Learning · Kubeflow · DVC · C++ · R · JavaScript

DevSecOps & Security

Secure-by-design principles applied across the full platform lifecycle.

Aqua Security · AWS Certificate Manager · SSL/TLS · CyberArk PAM · Vulnerability Scanning · CEH · eJPT

Leadership & Languages

Leading complex incident response, mentoring teams, and communicating across cultures.

English · Arabic · Hindi · Tamil · Malayalam

Certifications & Credentials

  • AWS
    Certified Security – Specialty · AWS · 2025
  • AWS
    Certified Machine Learning – Specialty · AWS · 2024
  • AWS
    Certified Solutions Architect – Associate · AWS · 2020
  • AWS
    Certified Cloud Practitioner · AWS
  • Azure
    AZ-400 DevOps Engineer · Microsoft · 2021
  • Azure
    Azure Solutions Architect Expert · Microsoft · 2021
  • Azure
    Exam 70-535: Architecting Microsoft Azure Solutions · Microsoft · 2018
  • Kubernetes
    Certified Kubernetes Administrator · CNCF · Linux Foundation
  • Linux
    Linux Foundation Certified SysAdmin · LFCS · 2024
  • Oracle
    OCI Observability Professional · Oracle · 2025
  • Oracle
    OCI Data Science Professional · Oracle · 2025
  • Oracle
    OCI Generative AI Professional · Oracle · 2024
  • Oracle
    OCI AI Foundations · Oracle · 2025
  • Oracle
    OCI Foundations Associate · Oracle · 2025
  • Coursera
    Deep Learning Specialization · Coursera · Andrew Ng
  • Coursera
    Data Science Specialization · Johns Hopkins · Coursera
  • EC-Council
    Certified Ethical Hacker · EC-Council
  • INE
    eJPT Penetration Tester · INE
  • Scrum
    Scrum Fundamentals Certified · SCRUM

Where I studied

Graduate study in Computer Science, Artificial Intelligence and Cyber Law — across institutions in the US, UK and India.

09/2025 · Atlanta, USA
MS in Computer Science
Georgia Institute of Technology

01/2023 – 06/2024 · London, UK
MSc in Artificial Intelligence
University of West London

2021 – 2022 · Bangalore, India
Post Graduate Degree in Cyber Law & Forensic Law
National Law School of India University

2010 – 2014 · Chennai, India
Bachelor of Computer Applications
University of Madras

1994 – 2000 · Dubai, UAE
High School
Our Own English High School · 80% aggregate · Maths & Physics Olympiad distinctions

How I can help

From greenfield platform design to hardening what you already run in production — here are the engagements I take on.

01

Cloud & Platform Architecture

Multi-cluster Kubernetes on AWS EKS, Azure AKS, OCI or on-prem. Designed for availability, resilience and cost efficiency.

  • EKS / AKS cluster design
  • Hybrid & on-prem Kubernetes
  • VPC, networking & IAM
  • Capacity planning & cost review

02

Site Reliability & Observability

End-to-end telemetry stacks that surface the right signal — alerting that respects SLIs/SLOs instead of paging on noise.

  • Prometheus, Grafana, cAdvisor
  • ELK / OpenSearch logging
  • Alertmanager & SLO design
  • Incident response & RCA

03

CI/CD & Automation

Jenkins pipelines, Ansible playbooks and infrastructure-as-code that make shipping and recovery boring in the best way.

  • Jenkins pipeline engineering
  • Ansible configuration management
  • Docker & container registries
  • Release & rollback automation

04

DevSecOps & Hardening

Security as a first-class citizen: SSL/TLS certificate lifecycle, container scanning, secrets management and continuous vulnerability management.

  • Aqua Security & image scanning
  • AWS Certificate Manager
  • CyberArk PAM
  • Penetration testing (eJPT, CEH)

05

AI / ML Platform Engineering

Scalable, secure infrastructure for AI/ML workloads — from training pipelines to model serving and data versioning.

  • Kubeflow on EKS/AKS
  • DVC data versioning
  • GPU scheduling & autoscaling
  • Recommendation engines

06

Database Reliability

High-throughput MySQL with Group Replication — tuning, replication health and recovery from transaction storms.

  • MySQL 8.x clustered environments
  • Replication conflict resolution
  • AWS RDS design & operation
  • Performance tuning

A new book on machine learning

My first book — Machine Learning with Statistics — is a quantitative guide for engineers who want to understand the probability and inference machinery underneath every model they ship. No black boxes, no hand-waving. Available on Amazon in Kindle, paperback, and hardcover.

7 chapters · 105 pages · Kindle · Paperback · Hardcover
Statistics isn't an afterthought in machine learning — it is the foundation.
  • Probability foundations & distributions
  • Statistical inference & hypothesis testing
  • Bayesian reasoning under uncertainty
  • Linear models & their generalisations
  • Bias–variance & model selection
  • Bootstrapping & cross-validation
  • Probabilistic deep learning

Work I'm proud of

A selection of the most impactful platforms and systems I've built, scaled and hardened.

Kubernetes · AWS EKS · ANUVU

Multi-Cluster Kubernetes Platform

Architected and ran containerized infrastructure across AWS EKS and on-prem clusters for mission-critical connectivity systems in aviation and maritime: the platform that in-flight Wi-Fi and in-flight entertainment (IFE) run on.

Prometheus · Grafana · ELK

End-to-End Observability Stack

Designed and operationalised a full observability platform — Prometheus for metrics, Grafana for dashboards, cAdvisor for container telemetry, ELK for logs. Alerting that mapped to real user impact, not noise.

MySQL 8.2 · Group Replication · HA

MySQL 8.2 Clustered Environment

Administered a production MySQL 8.2 cluster with Group Replication. Resolved replication conflicts, rolled-back transactions and storage bottlenecks while keeping write throughput high and consistency intact.

Jenkins · Ansible · CI/CD

Jenkins CI/CD Pipeline Ecosystem

End-to-end Jenkins pipelines with Artifactory, SonarQube and Docker integrations across Python, C/C++, Node.js and Vue codebases. Scheduled automation for health checks, log rotation and disk management. Ansible at the config layer.

DevSecOps · Aqua Security · SSL/TLS

DevSecOps Hardening Programme

Rolled out secure-by-design principles across the platform: SSL/TLS lifecycle via AWS Certificate Manager, container vulnerability scanning with Aqua Security, and Ansible-automated certificate delivery to NGINX.

Machine Learning · Python · Azure

ML Recommendation Engine

Built a recommendation engine that drove product visibility using purchase history, cart activity, brand preference and browsing behaviour, surfaced through on-site suggestions and email campaigns. Boosted data mining and automation by 45%.

Let's build something reliable together

Open to Senior SRE, Platform Engineering, DevOps/DevSecOps leadership and AI Infrastructure roles — remote, hybrid, or on-site in the UAE. Also available for consulting engagements. I typically reply within 24 hours.