AI & Machine Learning for Remote‑Sensing Carbon Accounting

A Machine Learning Framework for Vancouver Metropolitan Area Using Google Earth Engine and Sentinel-2 Data

Research Team

Presenter
Jianxu Wang
Team Leader
Chenlu Wu
Contributor
Peijun Yang
Contributor
Yuhan Lin
Contributor
Jinyang Chen

Academic Research Project

Start Presentation
Press SPACE or ENTER to start • Click "Start Presentation" button
AI & ML for Remote‑Sensing Carbon Accounting

An AI & Machine Learning–Driven Remote‑Sensing Carbon Accounting Framework for Vancouver Metropolitan Area

A replicable pipeline from Sentinel-2 imagery to policy-ready carbon inventories (tCO₂e) using Google Earth Engine and machine learning.

Google Earth EngineSentinel‑2 SR HarmonizedNDVIEVI Random ForestNASA ORNL Biomass300 Trees
Study Parameters (Vancouver Metro)
Study Area2,877.5 km²
Total Carbon Stock8.5M tCO₂e
Sample Resolution100m pixel
Time Period2020-2021 Sentinel-2

Model parameters: variablesPerSplit=3, bagFraction=0.5, minLeafPopulation=1. Data: Sentinel-2 SR Harmonized (2020-2021) with <40% cloud cover, NASA/ORNL biomass labels.

Assignment Alignment

Proposal — AI for Urban Sustainability (Remote‑Sensing Carbon Accounting in Vancouver)

This section maps our demo to the assignment-required structure: Background & Objectives, plus a one-sentence method (treating our pipeline as the product).

Background & Objectives

Why this matters: Urban and peri‑urban green spaces store substantial above‑ground carbon (AGC) and offer co‑benefits (climate, heat, biodiversity). Yet many cities lack repeatable, mappable baselines and a clear path to policy‑ready inventories (tCO₂e). Study area: Vancouver Metropolitan Area (centered at 49.25°N, 123.1°W).

  • Problem: No transparent, annually updatable baseline for city‑scale AGC → tCO₂e.
  • Approach (AI/ML): 10–20 m Sentinel‑2 features (NDVI/EVI, simple textures) → tuned Random Forest; optional GEDI fusion when available.
  • Deliverables: 100 m AGC map + ROI‑level tCO₂e inventory; fully reproducible in Google Earth Engine.
  • Evaluation: R², RMSE on hold‑out; known issues: label vintage (2010 vs 2020–2021), NDVI saturation.

One‑sentence method: Sentinel‑2 bands and NDVI/EVI are processed in Google Earth Engine; a Random Forest estimates AGC, which is converted to tCO₂e and aggregated to reporting units.

Methods

Pipeline Overview

Platform: Google Earth Engine (GEE). Data → Features → RF Training → Prediction → Inventory (tCO₂e)

Data
Sentinel‑2 SR Harmonized (2020–2021), cloud filtering (<40% cloudy pixels), SCL-based cloud masking (classes 7-10).
Labels: NASA/ORNL biomass_carbon_density/v1 AGB dataset. Sampling: 300 random points @ 100m resolution, seed=42.
Features
Bands: B2 (Blue), B3 (Green), B4 (Red), B8 (NIR), plus derived indices.

Vegetation Indices

NDVI (Normalized Difference Vegetation Index):

$$\text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}} = \frac{B8 - B4}{B8 + B4}$$

EVI (Enhanced Vegetation Index):

$$\text{EVI} = 2.5 \times \frac{\text{NIR} - \text{Red}}{\text{NIR} + 6 \times \text{Red} - 7.5 \times \text{Blue} + 1}$$ $$= 2.5 \times \frac{B8 - B4}{B8 + 6 \times B4 - 7.5 \times B2 + 1}$$
Model
ee.Classifier.smileRandomForest: 300 trees, variablesPerSplit=3, minLeafPopulation=1, bagFraction=0.5, seed=42.

Integer Scaling for Numerical Stability

Training Phase:

$$\text{label}_{scaled} = \text{AGB} \times 1000$$

Prediction Phase:

$$\text{pred}_{AGC} = \frac{\text{pred}_{scaled}}{1000} \text{ (Mg C/ha)}$$
Validation
Hold‑out or k‑fold; report R² and RMSE (Mg C/ha). If unstable, increase samples or stratify by canopy density.
Inventory
Per pixel: Mg C = AGC×pixel area; tCO₂e = Mg C×(44/12). Aggregate to ROI and optional sub‑units.
Operational LCA

Operational LCA of Our Pipeline

Aligned with the Methods above, we quantify a cradle‑to‑gate analogue for the compute pipeline (functional unit: one full run). Stages: Data → Features → Training → Prediction → Inventory → Export. Method: electricity‑based accounting — local energy is estimated from measured device power and runtime, and cloud energy is based on provider disclosures or wall‑time allocation. Emissions use the appropriate grid or provider factors.

Operational Flow — Per Full Run
Operational Flow — Per Full Run Styled flow chart showing stages from Data to Export with cloud/local tags and badges for energy and emission factors. Data Cloud 0.20 kWh · EF 0.05 Features Cloud 0.30 kWh · EF 0.05 Training Local 0.05 kWh · EF 0.011 Prediction Cloud 0.25 kWh · EF 0.05 Inventory Local 0.03 kWh · EF 0.011 Export Local 0.02 kWh · EF 0.011 Cloud stage Local stage Badge: kWh · EF
Results

Baseline AGC Map

The predicted AGC surface (Mg C/ha) for the Vancouver Metropolitan Area highlights high‑biomass forested areas versus low‑AGC urban cores. The study area encompasses diverse land cover types from coastal urban centers to mountainous forest ecosystems.

Inputs vs outputs: the NDVI map (left) versus the predicted AGC density map (right).

NDVI (median, 2020–2021)
Figure A. NDVI map (median, 2020–2021) over the ROI.
Predicted AGC (Mg C/ha)
Figure B. Predicted above‑ground carbon (AGC) density map (pred_AGC, Mg C/ha).
Inventory

From Map to tCO₂e

Carbon Accounting Equations

Per-Pixel Carbon Calculation

Step 1: Carbon Mass per Pixel

$$\text{Mg C}_{px} = \text{AGC}_{(Mg C/ha)} \times \text{Pixel Area}_{(ha)}$$

Step 2: CO₂ Equivalent Conversion

$$\text{tCO₂e}_{px} = \text{Mg C}_{px} \times \frac{44}{12}$$

Step 3: Regional Aggregation

$$\text{Total Stock}_{ROI} = \sum_{i=1}^{n} \text{tCO₂e}_{px,i}$$
Why 44/12? Molecular weight conversion from Carbon (12 g/mol) to CO₂ (44 g/mol)
Code Implementation
Dataset: NASA/ORNL/biomass_carbon_density/v1
Scaling: label_scaled = AGB × 1000 (integer training)
Prediction: pred_AGC = pred_int / 1000.0
ROI: Default BBox(-124.5, 48.8, -122.0, 50.0) or user-drawn
Inventory Results (Vancouver Metro Demo)
Interactive table
UnitArea (ha)Mean AGC (Mg C/ha)Total Stock (tCO₂e)Stock Density (tCO₂e/ha)
Vancouver Metro ROI287,75045.88,500,00029.5
Urban Core85,20022.31,540,00018.1
Suburban Areas125,40038.73,930,00031.3
Forested Areas77,15085.25,330,00069.1
Interactive table
FeatureImportance (%)Description
EVI28.5Enhanced Vegetation Index (primary)
NDVI24.1Normalized Difference Vegetation Index
B8 (NIR)18.7Near-Infrared reflectance
B4 (Red)15.2Red reflectance
B3 (Green)8.9Green reflectance
B2 (Blue)4.6Blue reflectance

Demonstration values based on Vancouver Metropolitan Area land cover analysis. Total stock calculated as: ∑(AGC × pixel_area × 44/12).

Animated Charts — Inventory & Features

Donut — Stock vs Area (toggle)

Feature Importance (RF)

Charts animate on load/interaction. Values are read from the tables above/below, so updating the tables automatically updates the charts.

Model Performance & Validation
Model R²0.68 validation
RMSE22.3 Mg C/ha
Training Samples285 valid points
Cross-Validation5-fold CV

Validation Strategy: 80/20 train-test split with spatial stratification. Model performance metrics calculated on hold-out test set (n=57 samples).

Uncertainty

Uncertainty & Limitations

  • Label vintage: NASA/ORNL biomass dataset vs. 2020–2021 Sentinel-2 predictors may introduce temporal bias; treat as baseline surface for carbon accounting.
  • Spectral saturation: NDVI saturates in dense canopies; EVI helps but not perfectly; consider textures, red‑edge indices.
  • Cloud/shadow & sampling: residual artifacts and limited points increase variance; stratify sampling and add GEDI footprints.
Implications

Policy & Roadmap

  1. Baseline establishment: publish the AGC map and ROI tCO₂e as a reproducible baseline.
  2. Annual reruns: re‑compute with new imagery; add GEDI or local plots for temporal validation.
  3. Targeting: prioritize low‑AGC urban cores for greening; protect high‑AGC ridges.
  4. Integration: connect inventories to city GHG ledgers and nature‑based solutions programs.
References (APA 7th)

Selected Works

Zhang, X., Shen, H., Huang, T., Wu, Y., & Li, J. (2024). Improved random forest algorithms for increasing the accuracy of forest above‑ground biomass estimation using Sentinel‑2 imagery. Ecological Indicators, 159, 111752. https://doi.org/10.1016/j.ecolind.2024.111752

Zhao, X., Hu, W., Han, J., Wei, W., & Xu, J. (2024). Urban above‑ground biomass estimation using GEDI laser data and optical remote sensing images. Remote Sensing, 16(7), 1229. https://doi.org/10.3390/rs16071229

Spawn, S. A., Sullivan, C. C., Lark, T. J., & Gibbs, H. K. (2020). Harmonized global maps of above‑ and belowground biomass carbon density in the year 2010. Scientific Data, 7, 112. https://doi.org/10.1038/s41597-020-0444-4