Foundational ETL and BI Capabilities

Enterprise Data Engineering with Google Dataform

GitHub Repository BI Dashboard

Executive Summary

This project solves the critical business problem of data silos and delayed insights by implementing a modern, cloud-native data pipeline that transforms raw e-commerce data into actionable business intelligence. Built with Google Dataform, BigQuery, Cloud Storage and Looker Studio, it delivers:

The Need Addressed

Before this implementation, the e-commerce company The Look E-commerce faced:

Technology Stack Rationale

Product Description Decision Factor
BigQuery Serverless data warehouse for storing and querying structured data at petabyte scale using SQL. Performative, Scalable, Cost-efficient, Seamless integration, Supports analytics
Dataform Data transformation and orchestration tool built for SQL-based pipelines in BigQuery. Git version-control, data lineage clarity, Data Quality Checks & unit testing, Works directly in BigQuery
Looker Studio Business intelligence (BI) and visualization tool that connects directly to BigQuery. Seamless BigQuery integration, real-time updates
Cloud Storage Secure, scalable object storage for raw and semi-processed data. Seamless integration with BigQuery, cost-efficient

Data Model

erDiagram
    trx_fct {
        INTEGER order_id PK
        INTEGER product_id PK
        INTEGER inventory_item_id PK
        INTEGER user_id
        STRING order_status
        DATE date
        STRING product_brand
        STRING product_category
        INTEGER user_age
        STRING user_gender
        STRING user_country
        STRING iso_abbreviation
        NUMERIC sale_price
        NUMERIC product_cost
        NUMERIC gross_profit
        TIMESTAMP processing_timestamp
    }
    inventory_dim {
        INTEGER inventory_item_id PK
        DATE date_created
        DATE date_sold
        TIMESTAMP processing_timestamp
    }
    events_fct {
        INTEGER event_id PK
        INTEGER user_id
        INTEGER sequence_number
        INTEGER session_id
        DATE date_created
        STRING traffic_source
        STRING uri
        STRING event_type
        TIMESTAMP processing_timestamp
    }
    orders_dim {
        INTEGER order_id PK
        STRING status
        DATE date_created
        DATE date_returned
        DATE date_shipped
        DATE date_delivered
        INTEGER num_of_item
        TIME order_processing_time
        TIME order_transit_time
        TIME order_lead_time
        TIMESTAMP processing_timestamp
    }
    products_dim {
        INTEGER product_id PK
        NUMERIC cost
        STRING category
        STRING brand
        NUMERIC retail_price
        NUMERIC markup
        STRING department
        TIMESTAMP processing_timestamp
    }
    users_dim {
        INTEGER user_id PK
        INTEGER age
        STRING gender
        STRING country
        STRING traffic_source
        DATE date_created
        STRING iso_abbreviation
        TIMESTAMP processing_timestamp
    }
    trx_fct ||--|| orders_dim : "order_id"
    trx_fct ||--|| products_dim : "product_id"
    trx_fct ||--|| inventory_dim : "inventory_item_id"
    trx_fct ||--|| users_dim : "user_id"
    events_fct ||--|| users_dim : "user_id"
  

Total Cost of Ownership (TCO)

Based on current consumption, this project costs an estimated $13 per month.