Inputs and Outputs¶
This page summarizes required inputs, expected units/CRS, and outputs by module.
Global assumptions¶
CRS: EPSG:4326 for all rasters and vector inputs unless noted.
Area units: number of DEM grid cells; the effective cell area depends on the DEM resolution.
Discharge units: m³/s (raw), area-normalized to mm/day for model inputs.
Observed streamflow CSV schema¶
Lookup CSV (station metadata)¶
Required columns (default names in parentheses):
- station id (id)
- latitude (latitude)
- longitude (longitude)
Notes: - Coordinates must be in EPSG:4326. - Station IDs are treated as strings.
Per-station CSV (time series)¶
Required columns (default names in parentheses):
- date (date)
- discharge (discharge)
Notes:
- Dates must be parseable by pandas (e.g., YYYY-MM-DD).
- Discharge is expected in m³/s.
- One CSV per station; filenames follow {id}.csv by default.
Predicted streamflow units¶
Model training is performed on linear targets. When area_normalize=True,
the target is area-normalized discharge depth (mm/day), and predictions are
converted back to volumetric discharge (m³/s) by reversing the area
normalization. The CSV outputs written to
{working_dir}/predicted_streamflow_data are in m³/s.
When loss_function="asym_laplace_nll", the model predicts 3 values per
sample (location + asymmetric scales). The runner/simulator uses the first
value (location term) as discharge prediction for plots and CSV outputs.
If area_normalize=False is used, the model trains and predicts in raw
m³/s, and no area-based conversion is applied at inference time.
Note: Prediction time series start after a one-year warmup period. The first 365 days of the simulation window are used as model context and are not written to the output CSVs.
Module reference¶
DEM (bakaano.dem.DEM)¶
Inputs: - study_area: basin shapefile (EPSG:4326) - local_data_path (optional): local DEM GeoTIFF (EPSG:4326)
Outputs:
- {working_dir}/elevation/dem_clipped.tif
- {working_dir}/elevation/slope_clipped.tif
Soil (bakaano.soil.Soil)¶
Inputs: - study_area: basin shapefile (EPSG:4326)
Outputs:
- {working_dir}/soil/clipped_AWCh3_M_sl6_1km_ll.tif
- {working_dir}/soil/clipped_WWP_M_sl6_1km_ll.tif
- {working_dir}/soil/clipped_AWCtS_M_sl6_1km_ll.tif
NDVI (bakaano.ndvi.NDVI)¶
Inputs: - start_date / end_date: YYYY-MM-DD - study_area: basin shapefile (EPSG:4326)
Outputs:
- {working_dir}/ndvi/daily_ndvi_climatology.pkl
- Intermediate NDVI GeoTIFFs in {working_dir}/ndvi/
Tree cover (bakaano.tree_cover.TreeCover)¶
Inputs: - start_date / end_date: YYYY-MM-DD - study_area: basin shapefile (EPSG:4326)
Outputs:
- {working_dir}/vcf/mean_tree_cover.tif
- {working_dir}/vcf/mean_herb_cover.tif
AlphaEarth (bakaano.alpha_earth.AlphaEarth)¶
Inputs: - start_date / end_date: YYYY-MM-DD - study_area: basin shapefile (EPSG:4326)
Outputs:
- {working_dir}/alpha_earth/band_A00.tif … band_A63.tif
Meteo (bakaano.meteo.Meteo)¶
Inputs: - start_date / end_date: YYYY-MM-DD - data_source: CHELSA, ERA5, or CHIRPS
Outputs:
NetCDFs in
{working_dir}/{data_source}/(pr, tasmax, tasmin, tas)For Earth Engine downloads (ERA5/CHIRPS), intermediate GeoTIFFs are stored in scratch folders before conversion to NetCDF.
VegET + routing (bakaano.veget.VegET)¶
Inputs: - DEM, soil, NDVI, tree cover, meteo - routing_method: mfd, d8, dinf - climate_data_source: CHELSA, ERA5, or CHIRPS - resume / checkpoint_days (optional): resume interrupted routing runs and
control checkpoint frequency
Outputs:
- Routed runoff in {working_dir}/runoff_output/wacc_sparse_arrays.pkl
- Resume state during interrupted runs:
{working_dir}/runoff_output/wacc_resume_state.pkl
Temporary checkpoint chunks during interrupted runs:
{working_dir}/runoff_output/wacc_resume_chunks/*.pklRiver grid in
{working_dir}/catchment/river_grid.tif(if generated)
Streamflow training (bakaano.streamflow_trainer)¶
Inputs: - GRDC NetCDF (or CSV lookup + per-station CSVs) - Routed runoff in {working_dir}/runoff_output - AlphaEarth bands in {working_dir}/alpha_earth
Outputs:
- Trained model: {working_dir}/models/bakaano_model.keras
- AlphaEarth scaler: {working_dir}/models/alpha_earth_scaler.pkl
Streamflow simulation (bakaano.streamflow_simulator)¶
Inputs: - Trained model - Routed runoff and AlphaEarth bands - GRDC NetCDF or station CSVs (optional)
Outputs:
- Predicted streamflow CSVs in {working_dir}/predicted_streamflow_data