About Me
Hi, I'm David O'Malley
I'm a passionate Data Engineer with 5+ years of experience designing and implementing data infrastructure, ETL pipelines, and analytics solutions. I specialize in transforming complex data challenges into scalable, efficient systems that drive business value.
With a background in Computer Science and a deep understanding of both software engineering and data science principles, I bridge the gap between raw data and actionable insights.
Education
MSc in Data Science
Stanford University, 2018
Location
San Francisco, California
Available for remote work
Interests
Big Data, Cloud Architecture
Machine Learning, Open Source
Technical Skills
A comprehensive set of technical skills built through years of hands-on work on data engineering and analytics projects.
Data Engineering
- ETL/ELT Pipelines
- Data Warehousing
- Data Modeling
- Data Governance
Databases
- PostgreSQL
- MongoDB
- Snowflake
- Redis
- Elasticsearch
Cloud Platforms
- AWS
- Google Cloud
- Azure
- Databricks
Programming
- Python
- SQL
- Scala
- Java
- Bash
Big Data
- Spark
- Hadoop
- Kafka
- Airflow
- dbt
Analytics
- Tableau
- Power BI
- Looker
- Data Visualization
DevOps
- Docker
- Kubernetes
- CI/CD
- Terraform
- Git
Data Processing
- Batch Processing
- Stream Processing
- Real-time Analytics
Machine Learning
- ML Pipelines
- Feature Engineering
- Model Deployment
Data Science
- Statistical Analysis
- Pandas
- NumPy
- Jupyter
Featured Projects
A selection of my most impactful data engineering projects, each centered on building scalable, production-grade data solutions.
Professional Experience
My professional journey in data engineering: a progression of roles with increasing responsibility and technical scope.
- Led a team of 5 data engineers in designing and implementing data pipelines processing 10TB+ daily
- Architected and deployed a cloud-based data lake on AWS using S3, Glue, and Athena, reducing data processing costs by 40%
- Implemented data quality monitoring framework using Great Expectations, reducing data incidents by 75%
- Collaborated with data science team to build ML feature pipelines that improved model performance by 30%
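To give a flavor of the data quality monitoring mentioned above, here is a minimal plain-Python sketch of expectation-style checks. It is not the Great Expectations API itself; the record fields and helper names are illustrative.

```python
# Illustrative records; "amount" has a null that a check should catch
records = [
    {"txn_id": 1, "amount": 10.0},
    {"txn_id": 2, "amount": None},
    {"txn_id": 3, "amount": 5.0},
]

def expect_not_null(rows, column):
    """Return an expectation-style result dict for a not-null check."""
    failures = [r for r in rows if r.get(column) is None]
    return {"column": column, "success": not failures, "failed": len(failures)}

# Run the check over each monitored column and collect incidents
results = [expect_not_null(records, c) for c in ("txn_id", "amount")]
incidents = [r for r in results if not r["success"]]
```

In a real pipeline these result dicts would feed alerting, so a failed expectation surfaces before bad data reaches downstream consumers.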
- Designed and implemented ETL pipelines using Apache Airflow and Spark for financial data processing
- Migrated on-premise data warehouse to Google BigQuery, improving query performance by 8x
- Built real-time data processing system using Kafka and Spark Streaming for fraud detection
- Developed data governance policies and implemented data lineage tracking
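The core idea behind the real-time fraud detection work above is windowed aggregation over an event stream. The sketch below approximates it in plain Python, flagging an account whose transaction count inside a rolling window exceeds a threshold; in production, Kafka and Spark Streaming would supply the events, and the window size, threshold, and field names here are assumptions for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60       # illustrative rolling-window length
MAX_TXNS_PER_WINDOW = 3   # illustrative per-account threshold

# account_id -> deque of recent event timestamps
windows = defaultdict(deque)

def process(event):
    """Return True if this event pushes the account over the threshold."""
    ts, account = event["ts"], event["account"]
    q = windows[account]
    q.append(ts)
    # Evict events older than the window before counting
    while q and ts - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > MAX_TXNS_PER_WINDOW

events = [{"ts": t, "account": "A1"} for t in (0, 10, 20, 30, 200)]
flags = [process(e) for e in events]  # only the 4th event trips the threshold
```

Keeping per-key state this way mirrors what a stream processor does with keyed windows, just without the fault tolerance and backpressure a real engine provides.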
- Performed data analysis and created dashboards using Tableau for business stakeholders
- Developed SQL queries and stored procedures for data extraction and transformation
- Automated reporting processes, saving 20+ hours of manual work weekly
- Collaborated with product teams to define KPIs and implement tracking
- Assisted in developing machine learning models for predictive analytics
- Performed data cleaning and feature engineering on large datasets
- Implemented data visualization tools to communicate findings to stakeholders
- Contributed to research papers on applied machine learning techniques
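As a small taste of the data cleaning and feature engineering mentioned above, here is a stdlib-only sketch of two common steps: mean imputation of missing values followed by z-score normalization. The sample values are made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical raw feature column with missing entries (None)
raw = [12.0, None, 15.5, 14.0, None, 13.2]

# Step 1: impute missing values with the mean of the observed values
observed = [v for v in raw if v is not None]
fill = mean(observed)
imputed = [v if v is not None else fill for v in raw]

# Step 2: z-score normalize so the feature has mean ~0 and unit variance
mu, sigma = mean(imputed), stdev(imputed)
z = [(v - mu) / sigma for v in imputed]
```

In practice this would be done with pandas or scikit-learn over millions of rows, but the per-column logic is the same.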
Get In Touch
Have a project in mind or an opportunity to discuss? Feel free to reach out. I'm always open to new projects, creative ideas, and collaborations.