bakaano.streamflow_trainer¶
Training pipeline for regional streamflow models.
Role: Build training datasets and train the TCN-based streamflow model.
- bakaano.streamflow_trainer.asym_laplace_plus_mse_sqrt(y_true, params, scale_min=0.0001, mse_weight=0.05)¶
- bakaano.streamflow_trainer.asym_laplace_nll(y_true, params, r_clip=5.0, scale_clip=(0.001, 5.0), peak_weight=0.3)¶
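The documentation does not spell out the parameterization behind these loss functions. A minimal NumPy sketch of a standard asymmetric Laplace negative log-likelihood, assuming `params` unpacks to a location `mu`, a scale, and an asymmetry parameter `tau` (all names here are assumptions, not the module's actual internals):

```python
import numpy as np

def ald_nll(y_true, mu, scale, tau, scale_clip=(1e-3, 5.0)):
    """Asymmetric Laplace negative log-likelihood (sketch).

    rho_tau(r) = r * (tau - 1{r < 0}) is the pinball (quantile) loss;
    the per-sample NLL is rho_tau(y - mu) / scale - log(tau * (1 - tau) / scale).
    """
    scale = np.clip(scale, *scale_clip)         # keep the scale in a safe range
    r = y_true - mu
    rho = r * (tau - (r < 0).astype(float))     # pinball / quantile loss
    return rho / scale - np.log(tau * (1.0 - tau) / scale)
```

With `tau > 0.5` the loss penalizes under-prediction more than over-prediction, which is one way to bias a streamflow model toward capturing peaks.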
- class bakaano.streamflow_trainer.DataPreprocessor(working_dir, study_area, grdc_streamflow_nc_file, train_start, train_end, routing_method, catchment_size_threshold)[source]¶
Bases: object
- _extract_station_rowcol(lat, lon)[source]¶
Extract the row and column indices for a given latitude and longitude from a given raster file.
- Parameters:
lat (float) – The latitude of the station.
lon (float) – The longitude of the station.
- Returns:
row (int) – The row index corresponding to the given latitude and longitude.
col (int) – The column index corresponding to the given latitude and longitude.
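For a north-up raster, this lookup reduces to applying the inverse of the raster's affine geotransform. A minimal sketch, assuming the raster origin and cell sizes are known (the function name and parameters here are illustrative, not the class's actual signature):

```python
def latlon_to_rowcol(lat, lon, x_origin, y_origin, pixel_width, pixel_height):
    """Convert geographic coordinates to raster row/col indices (sketch).

    Assumes a north-up raster: x increases eastward, y decreases southward,
    with (x_origin, y_origin) at the top-left corner and positive cell sizes.
    """
    col = int((lon - x_origin) / pixel_width)    # columns count eastward
    row = int((y_origin - lat) / pixel_height)   # rows count southward
    return row, col
```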
- _snap_coordinates(lat, lon)[source]¶
Snap the given latitude and longitude to the nearest river segment based on a river grid.
- Parameters:
lat (float) – The latitude to be snapped.
lon (float) – The longitude to be snapped.
- Returns:
snapped_lat (float) – The latitude of the nearest river segment.
snapped_lon (float) – The longitude of the nearest river segment.
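Snapping a gauge to the river network amounts to a nearest-neighbour search over the river cells. A minimal sketch in degree space (adequate for small neighbourhoods; the helper name and inputs are assumptions):

```python
import numpy as np

def snap_to_river(lat, lon, river_lats, river_lons):
    """Return the river-cell coordinates closest to (lat, lon) (sketch).

    river_lats / river_lons are parallel 1-D arrays of river-cell centres.
    Uses squared Euclidean distance in degrees, which suffices for snapping
    within a small local window.
    """
    d2 = (river_lats - lat) ** 2 + (river_lons - lon) ** 2
    i = int(np.argmin(d2))
    return float(river_lats[i]), float(river_lons[i])
```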
- load_observed_streamflow(grdc_streamflow_nc_file)[source]¶
Load and filter observed GRDC streamflow data in a schema-robust way. Works for single- and multi-station NetCDFs.
- Parameters:
grdc_streamflow_nc_file (str) – Path to GRDC NetCDF file.
- Returns:
Filtered GRDC subset for the study area.
- Return type:
xarray.Dataset
- _open_grdc_dataset(grdc_streamflow_nc_file)[source]¶
Open GRDC NetCDF with backend fallback for Colab/Drive compatibility.
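The fallback pattern itself is simple: try each backend in order and keep the first that succeeds. A generic sketch of that control flow (the exact engines the class tries are not documented here; this helper is illustrative):

```python
def open_with_fallback(path, openers):
    """Try each (name, opener) pair in order; return the first dataset
    that opens successfully (sketch).

    Mirrors a backend-fallback strategy where, e.g., one NetCDF engine
    fails on Drive-mounted files in Colab but another works.
    """
    errors = {}
    for name, opener in openers:
        try:
            return opener(path)
        except Exception as err:   # engine missing or file unreadable
            errors[name] = err
    raise RuntimeError(f"All backends failed for {path!r}: {errors}")
```

With xarray, the `openers` list would typically wrap `xr.open_dataset(path, engine=...)` for engines such as `"netcdf4"` and `"h5netcdf"`.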
- load_observed_streamflow_from_csv_dir(csv_dir, lookup_csv, id_col='id', lat_col='latitude', lon_col='longitude', date_col='date', discharge_col='discharge', file_pattern='{id}.csv')[source]¶
Load observed streamflow from per-station CSV files using a lookup table.
The lookup table must include station identifiers and coordinates. The method filters stations to the study area, then loads per-station CSVs by ID.
- Parameters:
csv_dir (str) – Directory containing per-station CSV files.
lookup_csv (str) – CSV file with station ids and coordinates.
id_col (str) – Station id column in lookup CSV.
lat_col (str) – Latitude column in lookup CSV.
lon_col (str) – Longitude column in lookup CSV.
date_col (str) – Date column in station CSVs.
discharge_col (str) – Discharge column in station CSVs.
file_pattern (str) – Pattern for station CSV filenames (e.g., "{id}.csv").
- Returns:
Mapping of station_id to observed discharge DataFrame.
- Return type:
dict
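The lookup-then-load flow can be sketched with pandas, assuming the lookup table is already filtered to the study area (names and defaults below mirror the documented parameters but the implementation is illustrative, including silently skipping stations whose CSV is missing):

```python
import os
import pandas as pd

def load_station_csvs(csv_dir, lookup, id_col="id", date_col="date",
                      discharge_col="discharge", file_pattern="{id}.csv"):
    """Load per-station discharge CSVs listed in a lookup DataFrame (sketch).

    Returns a dict mapping station id -> DataFrame indexed by date,
    skipping stations whose file does not exist.
    """
    out = {}
    for sid in lookup[id_col]:
        path = os.path.join(csv_dir, file_pattern.format(id=sid))
        if not os.path.exists(path):
            continue                                       # missing station file
        df = pd.read_csv(path, parse_dates=[date_col]).set_index(date_col)
        out[sid] = df[[discharge_col]]
    return out
```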
- get_data()[source]¶
Extract and preprocess predictor and response variables for each station based on its coordinates.
- Returns:
A list containing two elements:
- self.data_list: a list of tuples, each containing predictors (DataFrame) and response (DataFrame).
- self.catchment: a list of tuples, each containing catchment data (accumulation and slope values).
- Return type:
list
- class bakaano.streamflow_trainer.StreamflowModel(working_dir, batch_size, num_epochs, learning_rate=0.0001, loss_function='huber', train_start=None, train_end=None, seed=100, area_normalize=True, lr_schedule=None, warmup_epochs=3, min_learning_rate=1e-05)[source]¶
Bases: object
Role: Define and train the multi-scale TCN streamflow model.
Full-materialization training variant of the regional streamflow model.
Key characteristics (actual behavior):
- Prepares per-station scaled series using area normalization (optional).
- Materializes all valid 365-day sliding windows in memory.
- Trains directly with in-memory NumPy arrays.
- Enables XLA globally via tf.config.optimizer.set_jit(True).
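The units used by `area_normalize` are not documented here; a common convention in regional models is to convert discharge to specific discharge (mm/day), which makes stations with very different catchment sizes comparable. A sketch under that assumption:

```python
import numpy as np

def area_normalize(q_m3s, area_km2):
    """Convert discharge (m^3/s) to specific discharge (mm/day) (sketch).

    1 m^3/s over 1 km^2 = 86400 m^3/day over 1e6 m^2
                        = 0.0864 m/day = 86.4 mm/day.
    """
    return np.asarray(q_m3s) / area_km2 * 86.4
```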
- prepare_data(data_list)[source]¶
Prepare the data for training the streamflow prediction model.
This materializes all 365-day sliding windows, filters NaNs once, and concatenates across stations.
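The window-materialization step can be sketched with NumPy's `sliding_window_view`, assuming each station contributes a (time, features) predictor array and an aligned response series, with the target taken as the last day of each window (the pairing of window to target is an assumption; only the 365-day window length and the NaN filtering are documented):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(predictors, response, window=365):
    """Materialize all length-`window` sliding windows and drop any window
    whose inputs or target contain NaN (sketch).

    predictors: (T, F) float array; response: (T,) float array, aligned.
    Returns X of shape (N, window, F) and y of shape (N,), where y[i]
    is the response on the last day of window i.
    """
    X = sliding_window_view(predictors, window, axis=0)  # (T-window+1, F, window)
    X = np.transpose(X, (0, 2, 1))                       # -> (N, window, F)
    y = response[window - 1:]                            # target = last day of window
    ok = ~np.isnan(X).any(axis=(1, 2)) & ~np.isnan(y)    # filter NaN windows once
    return X[ok], y[ok]
```

Per-station outputs would then be concatenated along the first axis before training, at the memory cost noted above.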