Enterprise Data Engineering with Google Dataform
This project solves the critical business problem of data silos and delayed insights by implementing a modern, cloud-native data pipeline that transforms raw e-commerce data into actionable business intelligence. Built with Google Dataform, BigQuery, Cloud Storage and Looker Studio, it delivers:
Before this implementation, the e-commerce company The Look E-commerce faced:
| Product | Description | Decision Factor |
|---|---|---|
| BigQuery | Serverless data warehouse for storing and querying structured data at petabyte scale using SQL. | Performative, Scalable, Cost-efficient, Seamless integration, Supports analytics |
| Dataform | Data transformation and orchestration tool built for SQL-based pipelines in BigQuery. | Git version-control, data lineage clarity, Data Quality Checks & unit testing, Works directly in BigQuery |
| Looker Studio | Business intelligence (BI) and visualization tool that connects directly to BigQuery. | Seamless BigQuery integration, real-time updates |
| Cloud Storage | Secure, scalable object storage for raw and semi-processed data. | Seamless integration with BigQuery, cost-efficient |
erDiagram
trx_fct {
INTEGER order_id PK
INTEGER product_id PK
INTEGER inventory_item_id PK
INTEGER user_id
STRING order_status
DATE date
STRING product_brand
STRING product_category
INTEGER user_age
STRING user_gender
STRING user_country
STRING iso_abbreviation
NUMERIC sale_price
NUMERIC product_cost
NUMERIC gross_profit
TIMESTAMP processing_timestamp
}
inventory_dim {
INTEGER inventory_item_id PK
DATE date_created
DATE date_sold
TIMESTAMP processing_timestamp
}
events_fct {
INTEGER event_id PK
INTEGER user_id
INTEGER sequence_number
INTEGER session_id
DATE date_created
STRING traffic_source
STRING uri
STRING event_type
TIMESTAMP processing_timestamp
}
orders_dim {
INTEGER order_id PK
STRING status
DATE date_created
DATE date_returned
DATE date_shipped
DATE date_delivered
INTEGER num_of_item
TIME order_processing_time
TIME order_transit_time
TIME order_lead_time
TIMESTAMP processing_timestamp
}
products_dim {
INTEGER product_id PK
NUMERIC cost
STRING category
STRING brand
NUMERIC retail_price
NUMERIC markup
STRING department
TIMESTAMP processing_timestamp
}
users_dim {
INTEGER user_id PK
INTEGER age
STRING gender
STRING country
STRING traffic_source
DATE date_created
STRING iso_abbreviation
TIMESTAMP processing_timestamp
}
trx_fct ||--|| orders_dim : "order_id"
trx_fct ||--|| products_dim : "product_id"
trx_fct ||--|| inventory_dim : "inventory_item_id"
trx_fct ||--|| users_dim : "user_id"
events_fct ||--|| users_dim : "user_id"
Based on current consumption, this project costs an estimated $13 per month.