Based in Singapore · NUS · MSc, School of Computing (Business Analytics), specialising in Statistics

Maksim Silchenko

I’m an MSc student at the NUS School of Computing (Business Analytics), specialising in Statistics, and a winner of the Jane Street Quantitative Reasoning Competition and the National Mathematics Olympiad (top 0.015% nationally).

My work sits at the intersection of causal inference, experimental design (A/B testing), and robust machine learning systems, translating complex models into executive-level strategic decisions. At The TCM Group I built a randomised-controlled-trial framework now adopted as the firm’s standard for measuring training impact; on a 97k-order marketplace dataset, a hierarchical Bayesian model surfaced category-level effects a pooled A/B test had missed.

Selected distinctions.

01

MIPT National Mathematics Olympiad

Winner of the Russian National Mathematics Olympiad (Moscow Institute of Physics & Technology), top 0.015% nationally.

02

Jane Street Quantitative-Reasoning Puzzle

Winner, February 2026 monthly quantitative-reasoning and probability competition run by Jane Street’s research team.

03

Extracurricular Coursework

Stanford CS229 / CS230 · MIT RES.6-012 Probability · Imperial Mathematics for ML · IBM Data Science Specialization · McElreath’s Statistical Rethinking (2026).

04

Top of Cohort · Bayes Business School

Top of cohort in Quantitative Methods & Analytics, AI & Big Data, Capstone Project, and ESADE Mergers & Acquisitions.

05

Full Scholarships

Full scholarships for exchange semesters at the National University of Singapore (Spring 2025) and ESADE Business School (Autumn 2024).

06

BUPT Agentic AI Scholarship

Fully funded scholarship for the Agentic AI Bootcamp at the Beijing University of Posts & Telecommunications (Jun–Jul 2026).

§ 01 · Internships
Feb 2026 – Apr 2026 Internship

Orbuc Research · Quantitative Research Intern

Statistical Modelling & Time-Series Research · London, UK

  • Developed a time-series anomaly-detection model that uses Hidden Markov Models to classify latent states and flag unstable, low-predictability conditions; it held up in 4 of 5 out-of-sample validation windows.
  • Built an LLM agent that automatically classifies events from live news streams, grounding each call with retrieval-augmented generation (RAG) over an embedded knowledge base, enforcing schema-constrained output, and aggregating results into a single risk score with a probabilistic layer.
  • Hardened the surrounding Python pipeline for reliability with SQL (SQLite) persistence of every event and classification, batched API calls, rate-limit handling, and an automated test suite.
  • Engineered the statistical analysis behind the firm’s March 2026 research report, characterising 9 historical anomaly periods and identifying 4 statistically significant shifts in correlation structure.
Hidden Markov Models · Anomaly Detection · LLM Agent · RAG · SQLite · Time-Series
Jan 2026 – Apr 2026 Internship

The TCM Group · Data & Analytics Consultant

Bayes Business School Capstone · Highest grade in cohort · London, UK

  • Designed a causal-inference framework (RCT-style wait-list control, paired hypothesis tests) to measure leadership-training outcomes, presented it to TCM’s senior leadership, and saw it adopted as the standard for ongoing programme evaluation.
  • Replaced multiple-choice surveys that correlated poorly with on-the-job behaviour by engineering a serverless LLM scoring engine (TypeScript, Vercel, OpenAI) that grades open-ended scenario responses 0 to 10 against rubrics, deployed with Zod schema validation and automatic retry so malformed model output is never returned.
  • Built the hypothesis-testing methodology for evaluation: a decision rule that selects the appropriate paired test by sample size and distribution (Wilcoxon signed-rank, paired t-test, etc.), reports effect sizes with confidence intervals, and tests against a meaningful-change threshold rather than zero.
  • Built an automated evaluation harness that benchmarks the scorer against a hand-labelled gold dataset (0.90 Spearman correlation with human raters), then halved its scoring error with a cross-validated calibration that corrects the model’s systematic bias.
Causal Inference · RCT Design · Hypothesis Testing · LLM Rubric Scoring · LLM-as-Judge Eval · Calibration
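The agreement metric quoted above (Spearman correlation between the scorer and human raters) can be illustrated with a minimal standard-library sketch. It is not the project's evaluation harness: the function names and the example scores are hypothetical, and ties are handled with average ranks.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical gold labels and model scores on a 0 to 10 rubric scale.
human_scores = [2, 5, 7, 7, 9]
model_scores = [3, 4, 8, 6, 10]
print(round(spearman(human_scores, model_scores), 3))
```

Because the statistic depends only on ranks, a scorer with a systematic offset can still correlate highly with human raters, which is why a separate calibration step is a natural companion to this metric.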
Oct 2025 – Jan 2026 Internship

Oxford Comma Advisory · Data Analyst Intern

Education & Admissions Consultancy · London, UK

  • Cut university-shortlisting time from 2 hours to 3 minutes per student with a two-tower neural network trained on sparse implicit feedback with confidence-weighted negative sampling; consultants approved 73% of its top-5 recommendations.
  • Found organic leads convert about 60% faster than paid channels and that early consultations carry a 2.3x hazard ratio, using Kaplan-Meier curves and a Cox proportional-hazards model on 800+ inquiries (non-converters right-censored).
  • Raised booking rates from 18% to 27% by training an XGBoost model (0.79 AUC) to predict which leads were most likely to book, then A/B-testing tailored follow-ups against the standard process.
Recommendation Systems · Two-Tower NN · Survival Analysis · XGBoost · A/B Testing
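The Kaplan-Meier estimate with right-censored non-converters, as used in the lead-conversion analysis above, can be sketched in a few lines of standard-library Python. This is an illustrative implementation rather than the project code; the function name and the toy durations are assumptions. Here "survival" means the probability that a lead has not yet converted.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier curve as [(time, survival_probability)].

    events[i] is 1 if the lead converted at durations[i] and 0 if the
    lead was still unconverted when observation stopped (right-censored).
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        converted = 0
        removed = 0
        # Process every lead whose duration equals the current time t.
        while i < len(data) and data[i][0] == t:
            converted += data[i][1]
            removed += 1
            i += 1
        if converted > 0:
            survival *= 1.0 - converted / at_risk
            curve.append((t, survival))
        # Converted and censored leads both leave the risk set after t.
        at_risk -= removed
    return curve

# Hypothetical days-to-conversion; 0 marks a censored (non-converting) lead.
days = [3, 5, 5, 8, 10, 12]
converted_flags = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(days, converted_flags):
    print(t, round(s, 4))
```

Censored leads still count in the risk set up to their last observed day, which is what keeps the estimate from being biased towards fast converters.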
§ 02 · Projects
2026 Project

Olist Hierarchical Bayesian A/B Testing

Brazilian e-commerce panel · 97k orders · hierarchical Bayesian causal inference in PyMC 5

  • Reframed a flat marketplace A/B test as a causal-identification problem on a 97k-order panel, using a hierarchical Bayesian difference-in-differences in PyMC that corrected the policy effect from a naive −2pp to a DiD-identified +1.5pp.
  • Built the full analytics pipeline behind it (DuckDB medallion SQL, a NetworkX causal DAG, three Bayesian models, falsification tests) and translated the posterior into a costed −R$452K net envelope, turning an apparent per-customer win into a no-launch call.
  • Architected the analysis as a clean, importable Python package (ETL, feature builders, causal DAG, separate model factories per likelihood) with a DuckDB feature stack that rebuilds end-to-end in roughly five seconds.
  • Validated the models with approximate leave-one-out cross-validation (PSIS-LOO), posterior-predictive checks, and prior-sensitivity sweeps. The headline result shifted by just 0.0004 on the logit scale across three hyperprior families.
PyMC 5 · Hierarchical Bayes · Difference-in-Differences · DuckDB · PSIS-LOO
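Why a naive comparison and a difference-in-differences estimate can disagree in sign is easiest to see in the plain 2x2 case. The sketch below is that simplified estimator, not the hierarchical Bayesian model described above, and the four rates are made-up illustrative numbers rather than Olist results.

```python
def naive_difference(treated_post, control_post):
    """Post-period gap between groups; ignores any pre-existing gap."""
    return treated_post - control_post

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: change in treated minus change in control."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical outcome rates: the treated group starts from a lower baseline.
treated_pre, treated_post = 0.20, 0.24
control_pre, control_post = 0.25, 0.27

naive = naive_difference(treated_post, control_post)
did = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"naive: {naive * 100:+.1f}pp, DiD: {did * 100:+.1f}pp")
```

The naive gap is negative only because the treated group began lower; once each group is compared with its own baseline, the estimated effect is positive. The estimate is causal only under the parallel-trends assumption.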
2026 Project

A/B-Test Experimentation Guardrail

Experiment-safety auditing tool · SRM detection & causal inference · optional Claude tool-use agent

  • Built a command-line A/B-test guardrail that checks whether an experiment is safe to interpret, using a chi-square Sample Ratio Mismatch test that flagged a compromised 61/39 allocation which a standard metric t-test would run straight past.
  • Separated genuine treatment effects from confounding with propensity score matching and Rosenbaum sensitivity bounds to return a clear launch verdict; validated on a 300k-row stratified sample of Criteo’s real 13.98M-row Uplift experiment.
  • Built an optional Anthropic Claude tool-use agent mode: a bounded multi-turn loop that chooses which of 3 schema-defined tools to call next, with the LLM held off the numerical path so every figure comes from scipy and scikit-learn.
  • Designed the default pipeline to route deterministically and call the LLM once, at the end, narrating a code-decided verdict as a plain-English summary; one model call per audit, the statistical core covered by a 36-case pytest suite in CI (Python 3.10 to 3.12).
  • Engineered the tool as a production-shaped Python package (typed dataclasses, a custom exception hierarchy, an installable CLI, ruff-linted CI) with a data loader hardened against messy real-world exports; validating against the 14M-row Criteo dataset surfaced three production bugs, each fixed and locked down with a regression test.
SRM Detection · Propensity Score Matching · Claude Tool-Use Agent · Criteo Uplift · pytest + CI
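The Sample Ratio Mismatch check can be illustrated with a minimal sketch. It assumes a two-arm experiment with an intended 50/50 split and uses the closed-form chi-square tail for one degree of freedom from the standard library, whereas the tool itself is described as relying on scipy. The function name `srm_check`, the counts, and the 0.001 alert threshold are illustrative assumptions.

```python
import math

def srm_check(n_control, n_treatment, expected_ratio=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test for Sample Ratio Mismatch (two arms).

    Returns (chi2_statistic, p_value, mismatch_flag).
    """
    total = n_control + n_treatment
    expected_control = total * expected_ratio
    expected_treatment = total * (1.0 - expected_ratio)
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, p_value < alpha

# Hypothetical 61/39 allocation on 10,000 users against an intended 50/50.
chi2, p_value, mismatch = srm_check(6100, 3900)
print(f"chi2={chi2:.1f}, p={p_value:.3g}, SRM flagged={mismatch}")
```

A metric t-test compares outcomes between arms and never looks at how many users landed in each arm, so a broken allocation like this passes it unnoticed. For more than two arms the same check generalises to a goodness-of-fit test such as `scipy.stats.chisquare`.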
2025 Project

Cost-Sensitive Churn Pipeline

Energy-utility SME churn · 14,606 customers · cost-sensitive, decision-aware modelling

  • Built a SMOTE-balanced Random Forest on 14,606 SME utility customers, tuning the decision threshold against a GBP cost matrix to cut expected misclassification cost by ~£15.9M on a sealed test fold.
  • Pressure-tested the headline savings by sweeping the three cost assumptions behind it (customer lifetime value, campaign cost, retention rate), reporting the £15.9M with the conditions it depends on instead of as a standalone claim.
  • Stress-tested the model for time decay by splitting customers on contract activation date and re-scoring the newest cohort, exposing a discrimination drop (test AUC ~0.67 to ~0.62) that established periodic retraining as a deployment precondition rather than an afterthought.
  • Turned the churn score into a retention action list by running permutation feature importance on the frozen test pipeline, isolating profit-margin variables as the dominant drivers so the business knew which accounts to prioritise and why.
  • Added a Random Survival Forest for time-to-churn (held-out concordance 0.71, vs 0.56 for the Cox model it replaced), validated it on a temporal holdout, and shipped the pipeline as a typed Python library with a 48-test CI suite.
SMOTE · Cost-Sensitive Thresholding · Random Survival Forest · Permutation Importance · CI Suite
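Tuning a decision threshold against a cost matrix, rather than using the default 0.5, can be sketched as a simple grid search. This is an illustrative standard-library version, not the project pipeline: the function names, the toy labels and scores, and the two costs (a missed churner versus a wasted retention offer) are assumptions.

```python
def expected_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Total cost when every account with score >= threshold is targeted."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if label == 1 and not flagged:
            cost += cost_fn  # churner we failed to target
        elif label == 0 and flagged:
            cost += cost_fp  # retention offer spent on a non-churner
    return cost

def tune_threshold(y_true, scores, cost_fn, cost_fp, grid=None):
    """Return the grid threshold with the lowest total cost (first on ties)."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda t: expected_cost(y_true, scores, t, cost_fn, cost_fp))

# Hypothetical validation labels (1 = churned) and model scores.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.3, 0.1, 0.45]

best = tune_threshold(labels, scores, cost_fn=1000.0, cost_fp=50.0)
print(best, expected_cost(labels, scores, best, 1000.0, 50.0))
print(0.5, expected_cost(labels, scores, 0.5, 1000.0, 50.0))
```

When a missed churner costs far more than a wasted offer, the cost-minimising threshold falls well below 0.5. The threshold should be chosen on validation data and the saving reported on a separate sealed fold, since the result depends directly on the assumed costs.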
2025 Project

Rare-Event Prediction & Reinforcement Learning for Sequential Allocation

Sovereign-default prediction · 34-year cross-country macro panel · 5-model benchmark + PPO from scratch

  • Implemented PPO with Generalised Advantage Estimation from scratch in TensorFlow/Keras over a 117-dimensional continuous action space; the learned policy beat a uniform-allocation baseline by ~13% in the deterministic setting and 5–9% under stochastic and contagion variants.
  • Built a rare-event classification pipeline on a 34-year cross-country macro panel: 117 entities, 21 features, a ~2.4% positive rate, with 88 hand-curated default events labelled against multiple authoritative sources.
  • Benchmarked five models on a temporal 1990–2014 / 2015–2023 split: logistic regression, random forest, gradient boosting, XGBoost, and a Two-Tower neural network (a dual-embedding architecture from large-scale recommender systems). Random forest gave the best discrimination (AUC 0.83); the Two-Tower underperformed (AUC 0.68), which I report and diagnose rather than hide.
  • Built a perturbation-based sensitivity harness that exposed policy under-responsiveness in the trained agent, then implemented Welford running z-score normalisation over the 1,878-dimensional continuous state as the candidate fix.
  • Packaged the work to production engineering standards: a typed src/ package, a 47-test pytest suite, and a GitHub Actions CI pipeline running a Python 3.10/3.11 test matrix and notebook validation.
Rare-Event Prediction · Model Benchmark · Two-Tower NN · PPO + GAE · TensorFlow / Keras
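Welford's algorithm keeps a running mean and variance in a single pass, which is what makes it suitable for normalising states as they stream in during training. The sketch below is a minimal standard-library version with a 2-dimensional toy state; the class name, the `eps` guard, and the choice of population variance are assumptions, not details from the project.

```python
import math

class RunningZScore:
    """Per-dimension running mean/variance via Welford's online algorithm."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps         # guards against division by zero

    def update(self, x):
        """Fold one observation into the running statistics."""
        self.count += 1
        for i, value in enumerate(x):
            delta = value - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (value - self.mean[i])

    def normalize(self, x):
        """Return the z-scored observation (unchanged until 2+ samples seen)."""
        if self.count < 2:
            return list(x)
        out = []
        for i, value in enumerate(x):
            std = math.sqrt(self.m2[i] / self.count)  # population std
            out.append((value - self.mean[i]) / (std + self.eps))
        return out

# Hypothetical 2-dimensional states on very different scales.
normalizer = RunningZScore(dim=2)
for state in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
    normalizer.update(state)
print(normalizer.mean, normalizer.normalize([3.0, 30.0]))
```

After normalisation both dimensions sit on a comparable scale, so a perturbation in a small-magnitude feature is no longer drowned out by large-magnitude ones.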
2025 Project

Credit Risk Classifier & MLOps Pipeline

Retail credit-risk modelling · probability-of-default under an asymmetric cost matrix · Q-learning

  • Built a probability-of-default model under an asymmetric cost matrix, lifting held-out profit by $363K over the default 0.5 threshold.
  • Architected the analytics codebase as an importable Python package with single-responsibility modules (data loading, threshold evaluation, calibration, RL environment), backed by an 82-test suite and a multi-version CI matrix.
  • Implemented tabular Q-learning for a finite-horizon inventory-optimisation MDP (deterministic seed, linear epsilon decay, illegal-action handling), and built the chronological 24/24-month evaluation that revealed its in-sample edge to be overfitting.
  • Implemented covariate-shift mitigations (importance weighting, cohort-adaptive thresholding) and a diagnostic that identified the cohort failure as a base-rate problem rather than covariate shift, recommending a 45% break-even repricing instead of a model retune.
Probability of Default · Cost-Sensitive Classification · Q-Learning · Covariate Shift · pytest + CI
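The three Q-learning ingredients named above (a deterministic seed, linear epsilon decay, and illegal-action handling) can be shown on a toy inventory problem. Everything specific here is hypothetical: the capacity, order sizes, demand distribution, and reward coefficients are invented for illustration, and the sketch uses a time-independent Q-table as a simplification of a true finite-horizon formulation.

```python
import random

CAPACITY = 3         # hypothetical maximum stock level
ACTIONS = [0, 1, 2]  # hypothetical order quantities

def legal_actions(stock):
    """Illegal-action handling: mask orders that would exceed capacity."""
    return [a for a in ACTIONS if stock + a <= CAPACITY]

def linear_epsilon(episode, n_episodes, start=1.0, end=0.05):
    """Exploration rate decaying linearly from start to end over training."""
    frac = min(episode / max(n_episodes - 1, 1), 1.0)
    return start + frac * (end - start)

def step(stock, order, rng):
    """Toy dynamics: random demand, sales revenue, ordering and holding costs."""
    demand = rng.choice([0, 1, 2])
    available = stock + order
    sold = min(available, demand)
    reward = 5.0 * sold - 1.0 * order - 0.5 * (available - sold)
    return available - sold, reward

def train(n_episodes=500, horizon=12, alpha=0.1, gamma=0.95, seed=0):
    rng = random.Random(seed)  # dedicated seeded generator for reproducibility
    q = {(s, a): 0.0 for s in range(CAPACITY + 1) for a in legal_actions(s)}
    for episode in range(n_episodes):
        eps = linear_epsilon(episode, n_episodes)
        stock = 0
        for t in range(horizon):
            legal = legal_actions(stock)
            if rng.random() < eps:
                action = rng.choice(legal)  # explore
            else:
                action = max(legal, key=lambda a: q[(stock, a)])  # exploit
            next_stock, reward = step(stock, action, rng)
            if t == horizon - 1:
                target = reward  # no bootstrapping past the final period
            else:
                target = reward + gamma * max(
                    q[(next_stock, a)] for a in legal_actions(next_stock))
            q[(stock, action)] += alpha * (target - q[(stock, action)])
            stock = next_stock
    return q

q_table = train()
greedy = {s: max(legal_actions(s), key=lambda a: q_table[(s, a)])
          for s in range(CAPACITY + 1)}
print(greedy)
```

A policy learned this way is only evaluated on the data it trained on; a chronological split, training on earlier months and scoring on later ones, is the check that distinguishes a genuine edge from overfitting.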
§ 03 · Education
Jul 2026 – 2027 (Expected)
Singapore

National University of Singapore (NUS)

MSc, NUS School of Computing (Business Analytics), specialising in Statistics

Jun – Jul 2026
Beijing, China

Beijing University of Posts & Telecommunications (BUPT)

Agentic AI Bootcamp · Fully-Funded Scholarship

Sep 2022 – Jun 2026
London, UK

Bayes Business School (City, University of London)

BSc International Business (Hons), Specialised in AI and Quantitative Methods; First-Class Honours (Highest Distinction)

Top of cohort in Quantitative Methods & Analytics, AI & Big Data, Capstone Project, and ESADE Mergers & Acquisitions.

  • NUS exchange, Singapore (Jan–Jun 2025) · Full Scholarship
  • ESADE Business School exchange, Barcelona, Spain (Aug–Dec 2024) · Full Scholarship
  • UT Austin exchange, Austin, USA (Aug–Dec 2023)

Extracurricular Quantitative Coursework: Stanford CS229 Machine Learning, Stanford CS230 Deep Learning, MIT RES.6-012 Introduction to Probability, Stanford EE178 Probabilistic Systems Analysis, Imperial College Mathematics for Machine Learning, IBM Applied Data Science Specialization (Databases & SQL, Visualization, Python), Statistical Rethinking 2026 (R. McElreath, Max Planck).

Sep 2021 – Jun 2022
London, UK

UCL Economics & Mathematics Foundation Programme

Final grade A* · Mathematics 87%, Highest Distinction

§ 04 · Skills

Statistical Methods

Causal Inference (DiD, RCT design, causal DAGs, wait-list controlled trials, synthetic controls, propensity-score matching, Rosenbaum sensitivity bounds, uplift modelling), A/B Testing & Experimentation (variant design, power analysis, SRM detection, CUPED variance reduction, multiple-testing correction), Bayesian Statistics (PyMC 5, NUTS, hierarchical models, posterior-predictive checks, PSIS-LOO), Survival Analysis (Cox PH, Kaplan-Meier, Random Survival Forest), Hypothesis Testing (t-test, Wilcoxon signed-rank, Mann-Whitney U, chi-square, Fisher z), Stochastic Processes & Sequential Modelling (Hidden Markov Models, time-series, state-space methods), SHAP, permutation importance, Brier score, isotonic and LOOCV calibration, bootstrap & Hodges-Lehmann confidence intervals, effect-size estimation (Cohen’s d, rank-biserial).

Machine Learning, Deep Learning & GenAI

Supervised/Unsupervised Learning, Gradient Boosting (XGBoost, LightGBM), Anomaly & Rare-Event Detection, Neural Networks (CNNs, RNNs, LSTMs, Transformers, Two-Tower), NLP, Computer Vision, Recommendation Systems (implicit feedback, negative sampling, embeddings & vector search), Reinforcement Learning (Q-learning, DQN, PPO + GAE, Actor-Critic), Probabilistic Graphical Models, LLMs, Prompt Engineering, RAG, OpenAI API, Anthropic Claude tool-use, Zod structured outputs, LLM-as-judge evaluation, LLM Fine-Tuning, RLHF, Agentic AI (LangChain, LlamaIndex), Generative AI Applications.

Programming & Databases

Python (NumPy, pandas, scikit-learn, TensorFlow, PyTorch, statsmodels, SciPy, XGBoost, LightGBM, PyMC), TypeScript, C++, R, SQL, BigQuery, Spark/PySpark, DuckDB, React 19, Vite, Vercel Serverless, Supabase (Postgres).

MLOps, Cloud & Visualisation

Google Cloud Platform (BigQuery, Vertex AI), Docker, Kubernetes, MLOps (CI/CD, Model Deployment, Feature Pipelines, ETL/Airflow), GitHub Actions, pytest, Vitest, ruff, Git/GitHub, Jupyter, LaTeX, Tableau, Power BI, matplotlib, seaborn, Plotly.

Spoken Languages

English (Fluent), Russian (Native), Ukrainian (Native), Belarusian (Fluent), Spanish (Professional Working).
