This is a performance testing framework for Spark SQL in Apache Spark 2.2+. The framework contains twelve benchmarks that can be executed in local mode. They are organized into three classes and ...
An end-to-end data pipeline for e-commerce events using AWS S3 and Snowflake. It covers JSON ingestion, raw data storage with VARIANT, data quality checks in staging, separation of clean and reject ...
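The clean/reject separation described above can be illustrated with a small Python sketch. It is not the project's implementation. The field names (`event_id`, `event_type`, `event_ts`), the allowed event types, and the helpers `check_event` and `split_clean_reject` are hypothetical. In the actual pipeline, such checks would presumably run in Snowflake SQL over the VARIANT column rather than in Python.

```python
import json
from typing import Any

# Hypothetical schema: the description does not specify the event fields.
REQUIRED_FIELDS = ("event_id", "event_type", "event_ts")
# A tuple (not a set) so membership tests never fail on unhashable JSON values.
ALLOWED_EVENT_TYPES = ("view", "add_to_cart", "purchase")


def check_event(raw: str) -> list[str]:
    """Return the data-quality failures for one raw JSON event (empty list if clean)."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid_json"]
    if not isinstance(event, dict):
        return ["not_an_object"]
    # Flag every required field that is absent, null, or an empty string.
    errors = [f"missing_{name}" for name in REQUIRED_FIELDS if event.get(name) in (None, "")]
    # Only validate the event type when one is present, to avoid double-reporting.
    event_type = event.get("event_type")
    if event_type not in (None, "") and event_type not in ALLOWED_EVENT_TYPES:
        errors.append("unknown_event_type")
    return errors


def split_clean_reject(raw_events: list[str]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Route each raw event to the clean set, or to the reject set with its failure reasons."""
    clean: list[dict[str, Any]] = []
    reject: list[dict[str, Any]] = []
    for raw in raw_events:
        errors = check_event(raw)
        if errors:
            # Keep the original payload so rejected rows can be inspected or replayed.
            reject.append({"raw": raw, "errors": errors})
        else:
            clean.append(json.loads(raw))
    return clean, reject
```

Keeping the raw payload and the list of reasons on each rejected record mirrors the idea of a separate reject area: bad rows are preserved for inspection instead of being silently dropped.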