bakaano.streamflow_trainer

Training pipeline for regional streamflow models.

Role: Build training datasets and train the TCN-based streamflow model.

bakaano.streamflow_trainer.asym_laplace_plus_mse_sqrt(y_true, params, scale_min=0.0001, mse_weight=0.05)
bakaano.streamflow_trainer.asym_laplace_nll(y_true, params, r_clip=5.0, scale_clip=(0.001, 5.0), peak_weight=0.3)
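The asymmetric Laplace negative log-likelihood underlying these loss functions is the standard quantile-regression likelihood. A minimal NumPy sketch of the core term (the library's exact parameterization of `params`, its residual/scale clipping, and its peak weighting are not reproduced here):

```python
import numpy as np

def asym_laplace_nll_sketch(y, mu, scale, tau, scale_min=1e-4):
    """Core asymmetric-Laplace NLL (no clipping or peak weighting --
    those are extras in the library functions above)."""
    b = np.maximum(scale, scale_min)            # floor the scale, as scale_min does
    r = (y - mu) / b                            # standardized residual
    rho = np.where(r >= 0, tau * r, (tau - 1.0) * r)   # pinball residual
    return rho + np.log(b) - np.log(tau * (1.0 - tau))

# At y == mu with unit scale and tau = 0.5 the NLL reduces to -log(1/4)
nll = asym_laplace_nll_sketch(np.array([2.0]), np.array([2.0]),
                              np.array([1.0]), tau=0.5)
```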
class bakaano.streamflow_trainer.DataPreprocessor(working_dir, study_area, grdc_streamflow_nc_file, train_start, train_end, routing_method, catchment_size_threshold)[source]

Bases: object

_extract_station_rowcol(lat, lon)[source]

Extract the row and column indices for a given latitude and longitude from a given raster file.

Parameters:
  • lat (float) – The latitude of the station.

  • lon (float) – The longitude of the station.

Returns:

  • row (int) – The row index corresponding to the given latitude and longitude.

  • col (int) – The column index corresponding to the given latitude and longitude.
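For a north-up raster, the lat/lon-to-index conversion is a simple affine transform. A minimal sketch, assuming a regular grid described by its top-left origin and cell size (the actual method reads these from the raster file; `latlon_to_rowcol` is an illustrative helper, not the library API):

```python
def latlon_to_rowcol(lat, lon, origin_lat, origin_lon, cell_size):
    """Map a (lat, lon) point to (row, col) on a north-up regular grid.

    origin_lat/origin_lon locate the grid's top-left corner; cell_size
    is the resolution in degrees.
    """
    row = int((origin_lat - lat) / cell_size)   # rows increase southward
    col = int((lon - origin_lon) / cell_size)   # cols increase eastward
    return row, col

# A 0.1-degree grid with its top-left corner at 10.0 N, 5.0 W
row, col = latlon_to_rowcol(9.55, -4.25, origin_lat=10.0,
                            origin_lon=-5.0, cell_size=0.1)
```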

_snap_coordinates(lat, lon)[source]

Snap the given latitude and longitude to the nearest river segment based on a river grid.

Parameters:
  • lat (float) – The latitude to be snapped.

  • lon (float) – The longitude to be snapped.

Returns:

  • snapped_lat (float) – The latitude of the nearest river segment.

  • snapped_lon (float) – The longitude of the nearest river segment.
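Snapping amounts to a nearest-neighbour search over grid cells flagged as river. A minimal NumPy sketch (grid names are illustrative; the actual method uses the preprocessor's river grid):

```python
import numpy as np

def snap_to_river(lat, lon, grid_lats, grid_lons, river_mask):
    """Return the coordinates of the river cell nearest to (lat, lon).

    grid_lats/grid_lons are 1-D axis coordinates; river_mask is a 2-D
    boolean array that is True where a river segment exists.
    """
    rows, cols = np.nonzero(river_mask)                 # candidate river cells
    cand_lats, cand_lons = grid_lats[rows], grid_lons[cols]
    d2 = (cand_lats - lat) ** 2 + (cand_lons - lon) ** 2
    i = int(np.argmin(d2))                              # closest candidate
    return float(cand_lats[i]), float(cand_lons[i])

# A gauge reported slightly off the channel gets snapped onto the river grid
grid_lats = np.array([2.0, 1.0, 0.0])
grid_lons = np.array([0.0, 1.0, 2.0])
mask = np.zeros((3, 3), dtype=bool)
mask[1, 2] = True          # river cell at (1.0 N, 2.0 E)
mask[2, 0] = True          # river cell at (0.0 N, 0.0 E)
snapped = snap_to_river(0.9, 1.9, grid_lats, grid_lons, mask)
```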

load_observed_streamflow(grdc_streamflow_nc_file)[source]

Load and filter observed GRDC streamflow data in a schema-robust way. Works for single- and multi-station NetCDFs.

Parameters:

grdc_streamflow_nc_file (str) – Path to GRDC NetCDF file.

Returns:

Filtered GRDC subset for the study area.

Return type:

xarray.Dataset

_open_grdc_dataset(grdc_streamflow_nc_file)[source]

Open GRDC NetCDF with backend fallback for Colab/Drive compatibility.

load_observed_streamflow_from_csv_dir(csv_dir, lookup_csv, id_col='id', lat_col='latitude', lon_col='longitude', date_col='date', discharge_col='discharge', file_pattern='{id}.csv')[source]

Load observed streamflow from per-station CSV files using a lookup table.

The lookup table must include station identifiers and coordinates. The method filters stations to the study area, then loads per-station CSVs by ID.

Parameters:
  • csv_dir (str) – Directory containing per-station CSV files.

  • lookup_csv (str) – CSV file with station ids and coordinates.

  • id_col (str) – Station id column in lookup CSV.

  • lat_col (str) – Latitude column in lookup CSV.

  • lon_col (str) – Longitude column in lookup CSV.

  • date_col (str) – Date column in station CSVs.

  • discharge_col (str) – Discharge column in station CSVs.

  • file_pattern (str) – Pattern for station CSV filenames (e.g., "{id}.csv").

Returns:

Mapping of station_id to observed discharge DataFrame.

Return type:

dict
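A hedged sketch of the lookup-then-load pattern this method implements. The bounding-box filter and the `load_station_csvs` helper are assumptions for illustration; the real method filters stations against the study area:

```python
import tempfile
from pathlib import Path
import pandas as pd

def load_station_csvs(csv_dir, lookup_csv, bbox,
                      id_col="id", lat_col="latitude", lon_col="longitude",
                      date_col="date", file_pattern="{id}.csv"):
    """Filter stations to a lat/lon bounding box, then load each CSV by id."""
    lookup = pd.read_csv(lookup_csv)
    lat_min, lat_max, lon_min, lon_max = bbox
    in_area = lookup[lookup[lat_col].between(lat_min, lat_max)
                     & lookup[lon_col].between(lon_min, lon_max)]
    flows = {}
    for sid in in_area[id_col]:
        path = Path(csv_dir) / file_pattern.format(id=sid)
        if path.exists():                      # skip stations without a file
            flows[sid] = pd.read_csv(path, parse_dates=[date_col])
    return flows

# Tiny demo: one in-area station with a CSV, one station outside the box
tmp = Path(tempfile.mkdtemp())
pd.DataFrame({"id": ["A1", "B2"],
              "latitude": [6.0, 40.0],
              "longitude": [-1.5, 3.0]}).to_csv(tmp / "lookup.csv", index=False)
pd.DataFrame({"date": ["2000-01-01", "2000-01-02"],
              "discharge": [12.3, 11.8]}).to_csv(tmp / "A1.csv", index=False)
flows = load_station_csvs(tmp, tmp / "lookup.csv", bbox=(4.0, 12.0, -4.0, 2.0))
```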

get_data()[source]

Extract and preprocess predictor and response variables for each station based on its coordinates.

Returns:

A list containing two elements:

  • self.data_list – A list of tuples, each containing predictors (DataFrame) and response (DataFrame).

  • self.catchment – A list of tuples, each containing catchment data (accumulation and slope values).

Return type:

list

class bakaano.streamflow_trainer.StreamflowModel(working_dir, batch_size, num_epochs, learning_rate=0.0001, loss_function='huber', train_start=None, train_end=None, seed=100, area_normalize=True, lr_schedule=None, warmup_epochs=3, min_learning_rate=1e-05)[source]

Bases: object

Role: Define and train the multi-scale TCN streamflow model.

Full-materialization training variant of the regional streamflow model.

Key characteristics (actual behavior):

  • Prepares per-station scaled series using area normalization (optional).

  • Materializes all valid 365-day sliding windows in memory.

  • Trains directly with in-memory NumPy arrays.

  • Enables XLA globally via tf.config.optimizer.set_jit(True).

_build_lr_callback()[source]

Create a learning-rate schedule callback with optional warmup.
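A minimal sketch of a warmup-then-decay schedule consistent with the constructor parameters above (learning_rate, warmup_epochs, min_learning_rate); the post-warmup decay style and `make_lr_schedule` name are assumptions:

```python
def make_lr_schedule(base_lr=1e-4, warmup_epochs=3, min_lr=1e-5, decay=0.95):
    """Return an epoch -> learning-rate function: linear warmup from
    min_lr to base_lr, then exponential decay floored at min_lr."""
    def schedule(epoch, lr=None):
        if epoch < warmup_epochs:
            # Linear warmup over the first warmup_epochs epochs
            return min_lr + (base_lr - min_lr) * (epoch + 1) / warmup_epochs
        # Exponential decay afterwards, never dropping below min_lr
        return max(base_lr * decay ** (epoch - warmup_epochs), min_lr)
    return schedule

# Usable as tf.keras.callbacks.LearningRateScheduler(schedule)
schedule = make_lr_schedule()
```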

prepare_data(data_list)[source]

Prepare the data for training the streamflow prediction model.

This materializes all valid 365-day sliding windows, filters out NaN-containing windows once, and concatenates the results across stations.
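The window-materialization step can be sketched with NumPy's sliding_window_view (function name and shapes here are illustrative, not the library's internals):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def materialize_windows(series, window=365):
    """Build every sliding window of length `window` over a 1-D series
    and drop windows that contain NaNs -- the filter-once strategy
    described above."""
    wins = sliding_window_view(series, window)   # shape: (n - window + 1, window)
    valid = ~np.isnan(wins).any(axis=1)          # NaN filter applied once
    return wins[valid]

x = np.arange(10.0)
x[4] = np.nan                                    # gap in the record
wins = materialize_windows(x, window=3)          # only windows avoiding the gap
```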

build_model()[source]

train_model()[source]
load_regional_model(path)[source]

Load a previously saved regional model from disk.

Parameters:

path (str) – Path to the saved model file.

Returns:

Loaded model instance.

Return type:

tensorflow.keras.Model

regional_summary()[source]

Print the Keras model summary.