Based · Singapore
NUS · MSc, School of Computing (Business Analytics), Specialised in Statistics

Portfolio

Maksim Silchenko

I’m an MSc student at NUS School of Computing (Business Analytics), specialising in Statistics. BlueDot AI Safety Grant recipient for chain-of-thought interpretability (Jun ’26), Jane Street Quantitative Reasoning Competition winner (Feb & Jun ’26), and National Mathematics Olympiad winner (top 0.015% nationally).

My work sits at the intersection of causal inference and production machine learning systems, translating complex models into strategic decisions. Past experience includes quantitative research at Orbuc, a London-based crypto startup, where I built statistical-arbitrage strategies and an LLM news agent, and data & analytics consulting at The TCM Group, where I introduced statistical methods for evaluating training outcomes and built an LLM agent to score and automate that evaluation. Two recent projects: an A/B-test guardrail that decides whether an experiment is safe to act on, and a hierarchical Bayesian difference-in-differences analysis of Olist’s Brazilian e-commerce data that reversed a free-shipping policy’s apparent effect once confounding was removed.

Funded AI-Safety Research (ongoing) · BlueDot Impact Rapid Grant

Does a language model’s chain-of-thought actually cause its answer?

A Bayesian causal-mediation test of whether an LLM’s stated reasoning truly drives its answer or is just post-hoc, with calibrated uncertainty for scalable oversight.

Explore the funded AI-safety research →

Selected distinctions.

BlueDot Impact Rapid Grant · AI Safety

Awarded a BlueDot Impact Rapid Grant (Jun ’26) to fund Bayesian Causal Faithfulness for LLM Chain-of-Thought: calibrated uncertainty for whether a model’s reasoning is faithful or post-hoc.

MIPT National Mathematics Olympiad

Winner of the Russian National Mathematics Olympiad (Moscow Institute of Physics & Technology), top 0.015% nationally.

Jane Street Quantitative-Reasoning Puzzle

Winner of the February 2026 and June 2026 rounds of the monthly quantitative-reasoning and probability competition run by Jane Street’s research team.

Extracurricular Coursework

Stanford CS229 / CS230 · MIT RES.6-012 Probability · Imperial Mathematics for ML · IBM Data Science Specialization · McElreath’s Statistical Rethinking (2026).

Top of Cohort · Bayes Business School

Top of cohort in Quantitative Methods & Analytics, AI & Big Data, Capstone Project, and ESADE Mergers & Acquisitions.

Full Scholarships

Full scholarships for exchanges at the National University of Singapore (Spring 2025) and ESADE Business School (Autumn 2024), plus a fully funded scholarship for the Beijing University of Posts & Telecommunications Agentic AI Bootcamp (Jun–Jul 2026).

§ 01 · Internships

Feb 2026 – Apr 2026 Internship

Orbuc Research · Quantitative Research Intern

Statistical Modeling & Time-Series Research · London, UK

Developed a time-series anomaly-detection model that uses Hidden Markov Models to classify latent states and flag unstable, low-predictability conditions; it held up in 4 of 5 out-of-sample validation windows.
Built an LLM agent that automatically classifies events from live news streams, grounding each call with retrieval-augmented generation (RAG) over an embedded knowledge base, enforcing schema-constrained output, and aggregating results into a single risk score with a probabilistic layer.
Hardened the surrounding Python pipeline for reliability with SQL (SQLite) persistence of every event and classification, batched API calls, rate-limit handling, and an automated test suite.
Engineered the statistical analysis behind the firm’s March 2026 research report, characterising 9 historical anomaly periods and identifying 4 statistically significant shifts in correlation structure.

Hidden Markov Models · Anomaly Detection · LLM Agent · RAG · SQLite · Time-Series

Catalyst Agent Snapshot ↗

Jan 2026 – Apr 2026 Internship

The TCM Group · Data & Analytics Consultant

Bayes Business School Capstone · Highest grade in cohort · London, UK

Designed a causal-inference framework (RCT-style wait-list control, paired hypothesis tests) to measure leadership-training outcomes, presented it to TCM’s senior leadership, and saw it adopted as the standard for ongoing programme evaluation.
Replaced multiple-choice surveys that correlated poorly with on-the-job behaviour by engineering a serverless LLM scoring engine (TypeScript, Vercel, OpenAI) that grades open-ended scenario responses 0 to 10 against rubrics, deployed with Zod schema validation and automatic retry so malformed model output is never returned.
Built the hypothesis-testing methodology for evaluation: a decision rule that picks the appropriate paired test by sample size and distribution (paired t-test, Wilcoxon signed-rank, etc.), reports effect sizes with confidence intervals, and tests against a meaningful-change threshold rather than zero.
Built an automated evaluation harness that benchmarks the scorer against a hand-labelled gold dataset (0.90 Spearman correlation with human raters), then halved its scoring error with a cross-validated calibration that corrects the model’s systematic bias.

Causal Inference · RCT Design · Hypothesis Testing · LLM Rubric Scoring · LLM-as-Judge Eval · Calibration

Website + LLM Prototype ↗ GitHub repo ↗

Oct 2025 – Jan 2026 Internship

Oxford Comma Advisory · Data Analyst Intern

Education & Admissions Consultancy · London, UK

Cut university-shortlisting time from 2 hours to 3 minutes per student with a two-tower neural network trained on sparse implicit feedback with confidence-weighted negative sampling; consultants approved 73% of its top-5 recommendations.
Found that organic leads convert about 60% faster than paid-channel leads and that early consultations carry a 2.3x hazard ratio, using Kaplan-Meier curves and a Cox proportional-hazards model on 800+ inquiries (non-converters right-censored).
Raised booking rates from 18% to 27% by training an XGBoost model (0.79 AUC) to predict which leads were most likely to book, then A/B-testing tailored follow-ups against the standard process.

Recommendation Systems · Two-Tower NN · Survival Analysis · XGBoost · A/B Testing
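
The Kaplan-Meier step in the survival analysis above can be sketched in a few lines. This is a minimal, standard-library illustration rather than the project code: the function name `kaplan_meier` and the toy durations are hypothetical, and "survival" here means the share of leads that have not yet converted, with non-converters treated as right-censored.

```python
from collections import Counter


def kaplan_meier(durations, converted):
    """Return [(time, survival)] at each time where a conversion occurs.

    durations: days from inquiry to conversion or to end of observation.
    converted: True if the lead converted, False if right-censored.
    """
    events = Counter(t for t, c in zip(durations, converted) if c)
    exits = Counter(durations)  # conversions and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk leads that did not convert at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: six leads, two still unconverted when observation ended.
toy_curve = kaplan_meier(
    [5, 8, 8, 12, 20, 20],
    [True, True, False, True, False, True],
)
```

Censored leads lower the at-risk count without counting as conversions, which is what keeps the curve from overstating how quickly leads convert.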

§ 02 · Projects

2026 Project

Olist Hierarchical Bayesian A/B Testing

Brazilian e-commerce panel · 97k orders · hierarchical Bayesian causal inference in PyMC 5

Reframed a flat marketplace A/B test as a causal-identification problem on a 97k-order panel, using a hierarchical Bayesian difference-in-differences in PyMC that corrected the policy effect from a naive −2pp to a DiD-identified +1.5pp.
Built the full analytics pipeline behind it (DuckDB medallion SQL, a NetworkX causal DAG, three Bayesian models, falsification tests) and translated the posterior into a costed −R$452K net envelope, turning an apparent per-customer win into a no-launch call.
Architected the analysis as a clean, importable Python package (ETL, feature builders, causal DAG, separate model factories per likelihood) with a DuckDB feature stack that rebuilds end-to-end in roughly five seconds.
Validated the models with approximate leave-one-out cross-validation (PSIS-LOO), posterior-predictive checks, and prior-sensitivity sweeps. The headline result shifted by just 0.0004 on the logit scale across three hyperprior families.

PyMC 5 · Hierarchical Bayes · Difference-in-Differences · DuckDB · PSIS-LOO

Read case study → GitHub repo ↗
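
To show why a difference-in-differences estimate can flip the sign of a naive comparison, here is the basic 2x2 estimator on group means. It is a simplified sketch, not the hierarchical PyMC model described above; the function name `did_estimate` is hypothetical and the rates are invented purely for the arithmetic (they are not Olist figures).

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences on group means."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Made-up rates (NOT Olist results), chosen only to illustrate a sign flip.
treated_pre, treated_post = 0.20, 0.24
control_pre, control_post = 0.27, 0.28

# Naive view: compare the two groups after the policy, ignoring where they started.
naive = treated_post - control_post

# DiD view: compare each group's change, netting out the pre-existing gap.
did = did_estimate(treated_pre, treated_post, control_pre, control_post)

print(f"naive: {naive:+.3f}  did: {did:+.3f}")
```

In this toy case the treated group starts lower, so the post-period gap looks negative even though the treated group improved by more than the control group did.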

2026 Project

A/B-Test Experimentation Guardrail

Experiment-safety auditing tool · SRM detection & causal inference · optional Claude tool-use agent

Built a command-line A/B-test guardrail that checks whether an experiment is safe to interpret, using a chi-square Sample Ratio Mismatch test that flagged a compromised 61/39 allocation that a standard metric t-test runs straight past.
Separated genuine treatment effects from confounding with propensity score matching and Rosenbaum sensitivity bounds to return a clear launch verdict; validated on a 300k-row stratified sample of Criteo’s real 13.98M-row Uplift experiment.
Built an optional Anthropic Claude tool-use agent mode: a bounded multi-turn loop that chooses which of 3 schema-defined tools to call next, with the LLM held off the numerical path so every figure comes from scipy and scikit-learn.
Designed the default pipeline to route deterministically and call the LLM once, at the end, to narrate a code-decided verdict as a plain-English summary: one model call per audit, with the statistical core covered by a 36-case pytest suite in CI (Python 3.10 to 3.12).
Engineered the tool as a production-shaped Python package (typed dataclasses, a custom exception hierarchy, an installable CLI, ruff-linted CI) with a data loader hardened against messy real-world exports; validating against the 14M-row Criteo dataset surfaced three production bugs, each fixed and locked down with a regression test.

SRM Detection · Propensity Score Matching · Claude Tool-Use Agent · Criteo Uplift · pytest + CI

Read case study → GitHub repo ↗
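
The Sample Ratio Mismatch check at the heart of the guardrail can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test. This is an illustrative, standard-library version, not the tool itself (which uses scipy): the function name `srm_check`, the 50/50 intended split, the arm counts, and the 0.001 alert level are all assumptions made for the example.

```python
import math


def srm_check(n_control, n_treatment, expected_control_share=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) on arm counts.

    Returns (chi2, p_value, mismatch_flag).
    """
    total = n_control + n_treatment
    exp_c = total * expected_control_share
    exp_t = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, p_value < alpha


# Hypothetical counts reproducing a 61/39 split where 50/50 was intended.
chi2, p_value, mismatch = srm_check(6100, 3900)
print(f"chi2={chi2:.1f}, p={p_value:.3g}, mismatch={mismatch}")
```

The check looks only at how many users landed in each arm, so it can flag a broken allocation even when the outcome metric itself looks unremarkable.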

2025 Project

Cost-Sensitive Churn Pipeline

Energy-utility SME churn · 14,606 customers · cost-sensitive, decision-aware modelling

Built a SMOTE-balanced Random Forest on 14,606 SME utility customers, tuning the decision threshold against a GBP cost matrix to cut expected misclassification cost by ~£15.9M on a sealed test fold.
Pressure-tested the headline savings by sweeping the three cost assumptions behind it (customer lifetime value, campaign cost, retention rate), reporting the £15.9M with the conditions it depends on instead of as a standalone claim.
Stress-tested the model for time decay by splitting customers on contract activation date and re-scoring the newest cohort, exposing a discrimination drop (test AUC ~0.67 to ~0.62) that established periodic retraining as a deployment precondition rather than an afterthought.
Turned the churn score into a retention action list by running permutation feature importance on the frozen test pipeline, isolating profit-margin variables as the dominant drivers so the business knew which accounts to prioritise and why.
Added a Random Survival Forest for time-to-churn (held-out concordance 0.71, vs 0.56 for the Cox model it replaced), validated it on a temporal holdout, and shipped the pipeline as a typed Python library with a 48-test CI suite.

SMOTE · Cost-Sensitive Thresholding · Random Survival Forest · Permutation Importance · CI Suite

Read case study → GitHub repo ↗
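
Tuning a decision threshold against a cost matrix, as in the first bullet above, amounts to sweeping candidate thresholds and keeping the cheapest. The sketch below is a simplified stand-in for the project pipeline: the helper names, the toy labels and scores, and the two cost figures are hypothetical, and any classifier's churn scores could be plugged in.

```python
def total_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Cost of targeting every customer whose churn score is >= threshold."""
    cost = 0.0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if y == 1 and not flagged:
            cost += cost_fn  # missed churner
        elif y == 0 and flagged:
            cost += cost_fp  # retention offer spent on a customer who would stay
    return cost


def best_threshold(y_true, scores, cost_fn, cost_fp, grid=None):
    """Return (threshold, cost) with the lowest total cost over a grid."""
    grid = grid or [i / 100 for i in range(1, 100)]
    candidates = ((t, total_cost(y_true, scores, t, cost_fn, cost_fp)) for t in grid)
    return min(candidates, key=lambda pair: pair[1])


# Hypothetical toy data: 1 = churned, scores are model probabilities.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.35, 0.10, 0.80, 0.40, 0.20, 0.30]

# Assumed costs: a missed churner is ten times as costly as a wasted offer.
default_cost = total_cost(y_true, scores, 0.5, cost_fn=500, cost_fp=50)
tuned_threshold, tuned_cost = best_threshold(y_true, scores, cost_fn=500, cost_fp=50)
```

When missing a churner costs far more than a wasted offer, the cheapest threshold usually sits well below 0.5. For that reason the tuning should be done on data separate from the fold used to report the savings.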

2025 Project

Rare-Event Prediction & Reinforcement Learning for Sequential Allocation

Sovereign-default prediction · 34-year cross-country macro panel · 5-model benchmark + PPO from scratch

Implemented PPO with Generalised Advantage Estimation from scratch in TensorFlow/Keras over a 117-dimensional continuous action space; the learned policy beat a uniform-allocation baseline by ~13% in the deterministic setting and 5–9% under stochastic and contagion variants.
Built a rare-event classification pipeline on a 34-year cross-country macro panel: 117 entities, 21 features, a ~2.4% positive rate, with 88 hand-curated default events labelled against multiple authoritative sources.
Benchmarked five models on a temporal 1990–2014 / 2015–2023 split: logistic regression, random forest, gradient boosting, XGBoost, and a Two-Tower neural network (a dual-embedding architecture from large-scale recommender systems). Random forest gave the best discrimination (AUC 0.83); the Two-Tower underperformed (AUC 0.68), which I report and diagnose rather than hide.
Built a perturbation-based sensitivity harness that exposed a policy under-responsiveness in the trained agent, then implemented Welford running z-score normalisation over the 1,878-dimensional continuous state as the candidate fix.
Packaged the work to production engineering standards: a typed src/ package, a 47-test pytest suite, and a GitHub Actions CI pipeline running a Python 3.10/3.11 test matrix and notebook validation.

Rare-Event Prediction · Model Benchmark · Two-Tower NN · PPO + GAE · TensorFlow / Keras

Read case study → GitHub repo ↗
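
The Welford running z-score normalisation mentioned above can be illustrated for a single scalar feature. This is a minimal sketch, not the project implementation: the class name `RunningNormaliser` and the `eps` guard are assumptions, and a real agent would keep one such accumulator per state dimension (or a vectorised equivalent).

```python
import math


class RunningNormaliser:
    """Welford's online mean and variance, used to z-score a stream of values."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean
        self.eps = eps  # assumed guard against division by zero

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the deviation from both the old and the updated mean.
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        if self.count < 2:
            return 1.0  # no spread estimate yet, so leave the value unscaled
        return math.sqrt(self.m2 / (self.count - 1))

    def normalise(self, x):
        return (x - self.mean) / (self.std + self.eps)


norm = RunningNormaliser()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    norm.update(value)
```

The update needs only the count, the mean, and one accumulator, so statistics can be refreshed after every environment step without storing past observations.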

2025 Project

Credit Risk Classifier & MLOps Pipeline

Retail credit-risk modelling · probability-of-default under an asymmetric cost matrix · Q-learning

Built a probability-of-default model under an asymmetric cost matrix, lifting held-out profit by $363K over the default 0.5 threshold.
Architected the analytics codebase as an importable Python package with single-responsibility modules (data loading, threshold evaluation, calibration, RL environment), backed by an 82-test suite and a multi-version CI matrix.
Implemented tabular Q-learning for a finite-horizon inventory-optimisation MDP (deterministic seed, linear epsilon decay, illegal-action handling), and built the chronological 24/24-month evaluation that exposed its in-sample edge as overfitting.
Implemented covariate-shift mitigations (importance weighting, cohort-adaptive thresholding) and a diagnostic that identified the cohort failure as a base-rate problem rather than covariate shift, recommending a 45% break-even repricing instead of a model retune.

Probability of Default · Cost-Sensitive Classification · Q-Learning · Covariate Shift · pytest + CI

Read case study → GitHub repo ↗
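
The tabular Q-learning ingredients listed above (a seeded generator, linear epsilon decay, and handling of illegal actions) can be sketched as three small functions. This is an illustration under stated assumptions, not the project code: the function names, the default hyperparameters, and the choice to handle illegal actions by masking them out of both exploration and the greedy step are all hypothetical.

```python
import random


def linear_epsilon(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Linearly anneal the exploration rate from eps_start to eps_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def choose_action(q_row, legal_actions, epsilon, rng):
    """Epsilon-greedy choice restricted to the legal actions in this state."""
    if rng.random() < epsilon:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, next_legal, alpha=0.1, gamma=0.95):
    """One tabular Q-learning backup; pass next_legal=[] at the end of the horizon."""
    best_next = max((q[next_state][a] for a in next_legal), default=0.0)
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


rng = random.Random(0)  # fixed seed so runs are reproducible
q_table = [[0.0, 0.0], [1.0, 2.0]]  # toy table: 2 states x 2 actions
q_update(q_table, state=0, action=1, reward=1.0, next_state=1, next_legal=[0, 1])
```

Masking keeps the agent from ever bootstrapping from an action it could not take, such as ordering beyond capacity in an inventory problem; penalising illegal actions with a negative reward is a common alternative.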

§ 03 · Blog

2026 · ~16 min read · Essay

Lamps in the Dark

A geometric reading of Hidden Markov Models & the EM algorithm

When I was learning hidden Markov models, I couldn’t find an explanation that really showed how they work. This piece builds a geometric, intuitive picture of hidden Markov models, along with the EM steps, forward-backward, Baum-Welch, and the Viterbi algorithm, and even how they link to PCA. Interactive diagrams and animations throughout aim to make the picture stick in your head.

Hidden Markov Models · EM Algorithm · Baum-Welch · Forward-Backward · Viterbi

Read the essay →
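
As a taste of the algorithms the essay covers, here is a compact Viterbi decoder for a discrete hidden Markov model, written in log space. It is a generic sketch rather than code from the essay: the function name, the dictionary-based parameter layout, and the two-state textbook-style toy model are assumptions made for illustration.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely hidden-state path for a discrete HMM, computed in log space."""
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this step.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def to_log(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}


# Toy model with made-up probabilities.
states = ["Healthy", "Fever"]
start = to_log({"Healthy": 0.6, "Fever": 0.4})
trans = {
    "Healthy": to_log({"Healthy": 0.7, "Fever": 0.3}),
    "Fever": to_log({"Healthy": 0.4, "Fever": 0.6}),
}
emit = {
    "Healthy": to_log({"normal": 0.5, "cold": 0.4, "dizzy": 0.1}),
    "Fever": to_log({"normal": 0.1, "cold": 0.3, "dizzy": 0.6}),
}

decoded = viterbi(["normal", "cold", "dizzy"], states, start, trans, emit)
print(decoded)  # ['Healthy', 'Healthy', 'Fever']
```

Working with log-probabilities turns the products into sums, which avoids numerical underflow on long observation sequences.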

§ 04 · Education

Jul 2026 – 2027 (Expected)
Singapore

National University of Singapore (NUS)

MSc, School of Computing (Business Analytics), Specialised in Statistics

Jun – Jul 2026
Beijing, China

Beijing University of Posts & Telecommunications (BUPT)

Agentic AI Bootcamp · Fully-Funded Scholarship

Sep 2022 – Jun 2026
London, UK

Bayes Business School (City, University of London)

BSc International Business (Hons), Specialised in AI and Quantitative Methods; First-Class Honours (Highest Distinction)

Top of cohort in Quantitative Methods & Analytics, AI & Big Data, Capstone Project, and ESADE Mergers & Acquisitions.

NUS exchange, Singapore (Jan–Jun 2025) · Full Scholarship
ESADE Business School exchange, Barcelona, Spain (Aug–Dec 2024) · Full Scholarship
UT Austin exchange, Austin, USA (Aug–Dec 2023)

Extracurricular Quantitative Coursework: Stanford CS229 Machine Learning, Stanford CS230 Deep Learning, MIT RES.6-012 Introduction to Probability, Stanford EE178 Probabilistic Systems Analysis, Imperial College Mathematics for Machine Learning, IBM Applied Data Science Specialization (Databases & SQL, Visualization, Python), Statistical Rethinking 2026 (R. McElreath, Max Planck).

Sep 2021 – Jun 2022
London, UK

UCL Economics & Mathematics Foundation Programme

Final grade A* · Mathematics 87%, Highest Distinction

§ 05 · Skills

Statistical Methods

Causal Inference (DiD, RCT design, causal DAGs, wait-list controlled trials, synthetic controls, propensity-score matching, Rosenbaum sensitivity bounds, uplift modelling), A/B Testing & Experimentation (variant design, power analysis, SRM detection, CUPED variance reduction, multiple-testing correction), Bayesian Statistics (PyMC 5, NUTS, hierarchical models, posterior-predictive checks, PSIS-LOO), Survival Analysis (Cox PH, Kaplan-Meier, Random Survival Forest), Hypothesis Testing (t-test, Wilcoxon signed-rank, Mann-Whitney U, chi-square, Fisher z), Stochastic Processes & Sequential Modelling (Hidden Markov Models, time-series, state-space methods), Model Interpretation (SHAP, permutation importance), Calibration (Brier score, isotonic and LOOCV calibration), Confidence Intervals (bootstrap, Hodges-Lehmann), Effect-Size Estimation (Cohen’s d, rank-biserial).

Machine Learning, Deep Learning & GenAI

Supervised/Unsupervised Learning, Gradient Boosting (XGBoost, LightGBM), Anomaly & Rare-Event Detection, Neural Networks (CNNs, RNNs, LSTMs, Transformers, Two-Tower), NLP, Computer Vision, Recommendation Systems (implicit feedback, negative sampling, embeddings & vector search), Reinforcement Learning (Q-learning, DQN, PPO + GAE, Actor-Critic), Probabilistic Graphical Models, LLMs, Prompt Engineering, RAG, OpenAI API, Anthropic Claude tool-use, Zod structured outputs, LLM-as-judge evaluation, LLM Fine-Tuning, RLHF, Agentic AI (LangChain, LlamaIndex), Generative AI Applications.

Programming & Databases

Python (NumPy, pandas, scikit-learn, TensorFlow, PyTorch, statsmodels, SciPy, XGBoost, LightGBM, PyMC), TypeScript, C++, R, SQL, BigQuery, Spark/PySpark, DuckDB, React 19, Vite, Vercel Serverless, Supabase (Postgres).

MLOps, Cloud & Visualisation

Google Cloud Platform (BigQuery, Vertex AI), Docker, Kubernetes, MLOps (CI/CD, Model Deployment, Feature Pipelines, ETL/Airflow), GitHub Actions, pytest, Vitest, ruff, Git/GitHub, Jupyter, LaTeX, Tableau, Power BI, matplotlib, seaborn, Plotly.

Spoken Languages

English (Fluent), Russian (Native), Ukrainian (Native), Belarusian (Fluent), Spanish (Professional Working).

Extracurricular Coursework

Stanford CS229 Machine Learning, Stanford CS230 Deep Learning, MIT RES.6-012 Introduction to Probability, Stanford EE178 Probabilistic Systems Analysis, Imperial College Mathematics for Machine Learning, IBM Applied Data Science Specialization, Statistical Rethinking 2026 (R. McElreath, Max Planck).