About
Thanks for visiting my page!
I specialize in building and maintaining infrastructure for financial market interaction.
Professional experience:
- Developed Treasury Futures pricing engines and automated VaR ETL pipelines.
- Built deep generative frameworks (GANs) to model implied volatility surfaces.
- Optimized decision-tree forecasting models and refactored big-data SQL workflows.
- Managed critical databases, created API dashboards, and designed regression pricing models.
Educational background:
- Master’s in Financial Mathematics.
- Bachelor’s in Management Information Systems and Finance.
Please reach out to me if you would like to connect!
Overview
This project is a fault-tolerant, containerized trading engine that integrates real-time data ingestion, processing, and machine learning for alpha generation.
Key highlights:
- High-Throughput Streaming: Engineered a Kafka and ClickHouse pipeline that ingests 70K+ ticks/minute with sub-second latency.
- Automated MLOps Pipeline: Architected a pipeline for real-time feature engineering and continuous model retraining to adapt to current market conditions.
- Execution Engine: Executes trades from ML-generated signals, with integrated risk and portfolio management for alpha generation.
- Efficient Storage: Maintains a rolling 30-day window of high-frequency data while archiving older data to the cloud.
Additionally, I took away several lessons from building this:
- Designing for constant uptime: Long-running use surfaced unexpected failures, underscoring the need for proactive resource monitoring, alerting, and graceful restarts.
- Scaling from prototype to production: Early ad hoc decisions led to painful rework (such as migrating to Docker and consolidating services), so I now design with production in mind from the outset.
- Building for modular growth: A monolithic first pass was quick to ship but became a bottleneck as the system grew, reinforcing the value of clear module boundaries and interfaces early on.
Architecture
DigitalOcean hosts the server the project runs on: a $48/month droplet with 8GB RAM, 4 vCPUs, and 160GB of SSD storage.
Docker serves as the containerization platform. Besides the module containers, there are containers for Kafka, ClickHouse, Grafana, Prometheus, and other supporting services.
Grafana powers all dashboards for real-time monitoring, visualization, and alerting.
Binance serves as the source of cryptocurrency tick-level data, ingested from the exchange in real time via a Finnhub WebSocket.
Kafka captures all data from the WebSocket: a producer publishes incoming messages to a topic managed by a single broker, and various consumers read from it downstream.
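As a hedged illustration of that first hop, the sketch below flattens a Finnhub-style trade batch into per-tick records of the kind the producer would publish to the topic (field names follow Finnhub's documented trade schema; the topic name and producer wiring are omitted):

```python
import json

def flatten_trades(raw: str) -> list[dict]:
    """Split one Finnhub-style WebSocket message into per-tick records.
    In the real producer, each record would then be sent to a Kafka topic
    (e.g. via kafka-python's KafkaProducer); that wiring is elided here."""
    msg = json.loads(raw)
    if msg.get("type") != "trade":
        return []  # ignore pings and subscription acks
    # Finnhub trade fields: s=symbol, p=price, v=volume, t=epoch millis
    return [
        {"symbol": t["s"], "price": t["p"], "qty": t["v"], "ts_ms": t["t"]}
        for t in msg.get("data", [])
    ]

# Example batch in Finnhub's documented shape
sample = '{"type":"trade","data":[{"s":"BINANCE:ETHUSDT","p":3050.1,"v":0.2,"t":1700000000000}]}'
ticks = flatten_trades(sample)
```

Flattening before publishing keeps each Kafka message small and schema-stable for the consumers downstream.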
ClickHouse is connected to the data stream via Kafka: new data is appended to the database in batches, and monitoring/diagnostic metrics are recorded periodically.
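The batch-append step can be sketched as a simple buffer that flushes once a row threshold is reached. This is an illustrative model, not the project's actual code: the 2-row/5,000-row thresholds are hypothetical, and the real `flush()` would issue a single ClickHouse INSERT (e.g. via a client library) rather than recording the batch locally.

```python
class TickBuffer:
    """Toy sketch of batched appends: rows accumulate in memory and are
    flushed as one batch once a size threshold is hit."""

    def __init__(self, max_rows: int = 5000):
        self.max_rows = max_rows
        self.rows: list[tuple] = []
        self.batches: list[list[tuple]] = []  # stands in for issued INSERTs

    def add(self, row: tuple) -> None:
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            # Real pipeline: one bulk INSERT into ClickHouse per batch
            self.batches.append(self.rows)
            self.rows = []

buf = TickBuffer(max_rows=2)
for row in [("ETHUSDT", 3050.1), ("ETHUSDT", 3050.2), ("ETHUSDT", 3050.3)]:
    buf.add(row)
```

Batching matters here because ClickHouse strongly favors few large inserts over many single-row ones.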
AWS serves as cold storage for old data: data older than 7 days is converted to Parquet, uploaded to an S3 bucket, and deleted from ClickHouse.
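A minimal sketch of planning that daily archival step is below. The table name, S3 key layout, and timestamp column are hypothetical; the real job would export the selected rows to Parquet, upload the file to S3 (e.g. via boto3), and only then run the delete.

```python
from datetime import date, timedelta

def archive_plan(today: date, hot_days: int = 7) -> dict:
    """Compute what one archival run would do for a given day.
    `ticks`, the S3 key layout, and the `ts` column are illustrative
    assumptions, not the project's actual names."""
    cutoff = today - timedelta(days=hot_days)
    return {
        "cutoff": cutoff.isoformat(),
        "s3_key": f"ticks/{cutoff:%Y/%m/%d}.parquet",  # archive object key
        # ClickHouse mutation to drop the archived rows after upload succeeds
        "delete_sql": f"ALTER TABLE ticks DELETE WHERE toDate(ts) <= '{cutoff}'",
    }

plan = archive_plan(date(2024, 1, 10))
```

Uploading before deleting keeps the step idempotent: if the job dies mid-run, no data is lost, and re-running it is safe.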
The machine learning module retrains five models at regular intervals on the most recent data, using Python libraries such as scikit-learn and TensorFlow.
The execution module loads these models to generate real-time signals and simulate trades, supported by risk and portfolio monitoring engines.
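As a toy illustration of the signal-to-trade step, the sketch below converts a model signal into a position size under a per-trade risk cap. The 2% cap and the linear scaling rule are illustrative assumptions, not the project's actual risk logic.

```python
def position_size(signal: float, price: float, equity: float,
                  max_risk_frac: float = 0.02) -> float:
    """Map an ML signal in [-1, 1] to a signed position quantity.
    The risk engine caps capital at risk per trade; the signal scales
    how much of that budget is used (all parameters hypothetical)."""
    budget = equity * max_risk_frac            # capital at risk per trade
    return round(signal * budget / price, 6)   # negative quantity = short
```

In the real pipeline, this kind of sizing step would sit between signal generation and the simulated order, with the portfolio monitor tracking the resulting exposure.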
Dashboards
Portfolio performance after this project ran for around 30 days.
Snapshot of trade executions, along with a demonstration of the risk engine setting trade limits.
List of the different ML models that are implemented in the project.
Time series of granular price ticks for ETH, one of the tickers (alongside BTC, SOL, ADA, and XRP) processed via Kafka and stored in ClickHouse.
Snapshot of running memory usage by Docker container, showing that different services have varying resource demands.
Prediction Market Aggregator (PrediDesk)
Overview
PrediDesk (predidesk.com) is a live prediction market aggregator that consolidates data from 5+ platforms into a single, unified interface. It solves the fragmentation problem in prediction markets by providing a centralized location for discovery and analysis.
The platform is built on modern data engineering principles, focusing on reliability, scalability, and automated intelligence.
Key technical highlights include:
- Cross-Exchange Ingestion: An automated pipeline continuously polls and synthesizes raw data from 5+ disparate prediction market APIs into a normalized PostgreSQL schema.
- LLM-Driven Normalization: Leverages a multi-step Gemini-powered pipeline with vector embeddings to deconstruct unstructured metadata, rewrite resolutions, and cluster heterogeneous contracts.
- Arbitrage Discovery: Identifies real-time synthetic arbitrage opportunities across multiple trading platforms by leveraging normalized pricing and clustered question-answer mappings.
- Full-Stack Implementation: Features a containerized backend serving structured REST API endpoints via FastAPI and a responsive frontend optimized for real-time market discovery.
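To make the clustering step above concrete, here is a minimal sketch of grouping heterogeneous contracts by embedding similarity. The greedy seed-based pass and the 0.9 cutoff are illustrative simplifications; the actual pipeline is Gemini-powered and multi-step, and the contract IDs below are made up.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cluster_contracts(embeddings: dict[str, list[float]],
                      threshold: float = 0.9) -> list[list[str]]:
    """Greedy single pass: each contract joins the first cluster whose
    seed embedding is within the threshold, else it starts a new one."""
    clusters: list[list[str]] = []
    seeds: list[list[float]] = []
    for cid, emb in embeddings.items():
        for i, seed in enumerate(seeds):
            if cosine(emb, seed) >= threshold:
                clusters[i].append(cid)
                break
        else:
            clusters.append([cid])
            seeds.append(emb)
    return clusters

# Two near-duplicate contracts from different platforms, plus one unrelated
groups = cluster_contracts({
    "kalshi:FED-CUT": [1.0, 0.0],
    "poly:fed-cut-march": [0.99, 0.1],
    "kalshi:ETH-4K": [0.0, 1.0],
})
```

Grouping equivalent questions across platforms is what makes cross-exchange price comparison (and the arbitrage scan below in spirit) possible.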
Platform Preview
The PrediDesk landing page, showcasing different features of the site.
Arbitrage & Opportunities
PrediDesk includes a dedicated Arbitrage Dashboard that scans for price discrepancies across multiple exchanges. By identifying inverse payout probabilities across different platforms, the system highlights spreads that lock in a profit before fees.
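The core inverse-probability check can be sketched as a simplified two-leg model (the actual scanner also relies on the clustered question-answer mappings and accounts for fees and resolution-mismatch risk, none of which appear here):

```python
def arbitrage_spread(yes_price_a: float, no_price_b: float) -> float:
    """Hypothetical two-leg check: buying YES on platform A and NO on
    platform B for the same clustered question pays out $1 regardless of
    the outcome, so any combined cost under $1 is a gross profit per
    contract (fees and slippage ignored)."""
    return round(1.0 - (yes_price_a + no_price_b), 4)  # positive => arbitrage
```

For example, YES at $0.42 on one platform and NO at $0.53 on another costs $0.95 for a guaranteed $1 payout, a $0.05 gross spread per contract.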
The Arbitrage view, identifying live spreads across multiple prediction market exchanges.
Writing
Check out my Substack!