bakaano.streamflow_simulator

Simulation and inference utilities for streamflow prediction.

Role: Prepare simulation inputs and run trained model inference.

bakaano.streamflow_simulator._open_dataset_with_fallback(nc_path)[source]

Open NetCDF with backend fallback for Colab/Drive compatibility.
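
The fallback idea can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name `open_with_fallback`, the engine order, and the injectable `opener` argument are assumptions made for the example. It relies only on `xarray.open_dataset` accepting an `engine` keyword.

```python
def open_with_fallback(nc_path, engines=("netcdf4", "h5netcdf", "scipy"), opener=None):
    """Try each backend in turn and return the first dataset that opens.

    Hypothetical sketch: the engine order and the `opener` hook are assumptions.
    """
    if opener is None:
        # Imported lazily so the sketch can be loaded without xarray installed.
        import xarray as xr
        opener = xr.open_dataset

    errors = []
    for engine in engines:
        try:
            return opener(nc_path, engine=engine)
        except Exception as exc:  # backends raise different error types
            errors.append(f"{engine}: {exc}")

    # Every backend failed: report all of the collected reasons at once.
    raise OSError(f"Could not open {nc_path!r} with any backend: " + "; ".join(errors))
```

Collecting the per-backend errors before raising keeps the final message useful when a file fails for a different reason under each engine.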

class bakaano.streamflow_simulator.PredictDataPreprocessor(working_dir, study_area, sim_start, sim_end, routing_method, grdc_streamflow_nc_file=None, catchment_size_threshold=None, runoff_output_dir=None)[source]

Bases: object

_extract_station_rowcol(lat, lon)[source]

Extract the row and column indices for a given latitude and longitude from a given raster file.

Parameters:
  • lat (float) – The latitude of the station.

  • lon (float) – The longitude of the station.

Returns:

  • row (int) – The row index corresponding to the given latitude and longitude.

  • col (int) – The column index corresponding to the given latitude and longitude.
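
The coordinate-to-index conversion can be illustrated with a small sketch. It assumes a north-up raster described by its top-left corner (`west`, `north`) and a pixel size in degrees. Those parameter names and the function `latlon_to_rowcol` are hypothetical, and the real method may read the transform from the raster file instead.

```python
import math


def latlon_to_rowcol(lat, lon, west, north, pixel_width, pixel_height):
    """Map a coordinate to (row, col) in a north-up raster.

    Hypothetical helper: assumes rows increase southward from `north`
    and columns increase eastward from `west`.
    """
    col = math.floor((lon - west) / pixel_width)
    row = math.floor((north - lat) / pixel_height)
    return row, col


# A raster whose top-left corner is at 20°N, 10°W with 0.5° pixels.
row, col = latlon_to_rowcol(18.9, -8.1, west=-10.0, north=20.0,
                            pixel_width=0.5, pixel_height=0.5)
```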

_snap_coordinates(lat, lon)[source]

Snap the given latitude and longitude to the nearest river segment based on a river grid.

Parameters:
  • lat (float) – The latitude to be snapped.

  • lon (float) – The longitude to be snapped.

Returns:

  • snapped_lat (float) – The latitude of the nearest river segment.

  • snapped_lon (float) – The longitude of the nearest river segment.
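
One way to picture the snapping step is a nearest-cell search over a boolean river grid. The sketch below is an assumption-laden illustration rather than the method's real logic: `snap_to_river`, the nested-list `river_mask`, and the grid parameters are all hypothetical, and distance is measured in squared degrees, which is only a rough proxy for ground distance.

```python
def snap_to_river(lat, lon, river_mask, west, north, pixel_size):
    """Return the centre (lat, lon) of the river cell closest to the point.

    Hypothetical helper: `river_mask` is a list of rows, truthy where a
    river is present; the grid is north-up with square pixels.
    """
    best = None
    for r, row_values in enumerate(river_mask):
        for c, is_river in enumerate(row_values):
            if not is_river:
                continue
            # Centre of cell (r, c) in geographic coordinates.
            cell_lat = north - (r + 0.5) * pixel_size
            cell_lon = west + (c + 0.5) * pixel_size
            dist_sq = (cell_lat - lat) ** 2 + (cell_lon - lon) ** 2
            if best is None or dist_sq < best[0]:
                best = (dist_sq, cell_lat, cell_lon)
    if best is None:
        raise ValueError("river_mask contains no river cells")
    return best[1], best[2]
```

For large grids a vectorised search would be more practical. The loop form is used here only to keep the logic explicit.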

_check_point_in_region(olat, olon)[source]

Check whether a single (olat, olon) point lies within a study-area shapefile.

  • If NOT inside: raise SystemExit with a formatted, user-facing message

  • If inside: print a confirmation and return without further action
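
The two outcomes above can be sketched with a plain ray-casting test. This is an illustrative stand-in: the real method works against the study-area shapefile, whereas the example takes a hypothetical `polygon` given as a list of (lon, lat) vertices, and the function names and messages are invented for the sketch.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lon1, lat1 = polygon[i]
        lon2, lat2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that line.
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside
    return inside


def check_point_in_region(lat, lon, polygon):
    """Exit with a readable message if the point is outside, else confirm."""
    if not point_in_polygon(lat, lon, polygon):
        raise SystemExit(f"Point ({lat}, {lon}) is outside the study area.")
    print(f"Point ({lat}, {lon}) is inside the study area.")
```

Raising `SystemExit` stops a script or notebook cell with the message alone rather than a full traceback, which suits a user-facing check.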

load_observed_streamflow(grdc_streamflow_nc_file)[source]

Load and filter observed GRDC streamflow data in a schema-robust way. Works for single- and multi-station NetCDFs.

Parameters:

grdc_streamflow_nc_file (str) – Path to GRDC NetCDF file.

Returns:

Filtered GRDC subset for the study area.

Return type:

xarray.Dataset

load_observed_streamflow_from_csv_dir(csv_dir, lookup_csv, id_col='id', lat_col='latitude', lon_col='longitude', date_col='date', discharge_col='discharge', file_pattern='{id}.csv')[source]

Load observed streamflow from per-station CSV files using a lookup table.

The lookup table must include station identifiers and coordinates. The method filters stations to the study area, then loads per-station CSVs by ID.

Parameters:
  • csv_dir (str) – Directory containing per-station CSV files.

  • lookup_csv (str) – CSV file with station ids and coordinates.

  • id_col (str) – Station id column in lookup CSV.

  • lat_col (str) – Latitude column in lookup CSV.

  • lon_col (str) – Longitude column in lookup CSV.

  • date_col (str) – Date column in station CSVs.

  • discharge_col (str) – Discharge column in station CSVs.

  • file_pattern (str) – Pattern for station CSV filenames (e.g., "{id}.csv").

Returns:

Mapping of station_id to observed discharge DataFrame.

Return type:

dict
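
As an illustration of how `file_pattern` relates a station id to a file name, the sketch below expands the pattern with `str.format`. The helper `station_csv_path` is hypothetical. The documented default `"{id}.csv"` suggests this kind of substitution, but the method's actual lookup logic is not shown here.

```python
from pathlib import Path


def station_csv_path(csv_dir, station_id, file_pattern="{id}.csv"):
    """Build the expected CSV path for one station (hypothetical helper)."""
    return Path(csv_dir) / file_pattern.format(id=station_id)


# Default pattern: one file per station id.
default_path = station_csv_path("observations", 1234)

# A custom pattern with a prefix and suffix around the id.
custom_path = station_csv_path("observations", 1234, file_pattern="Q_{id}_daily.csv")
```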

get_data()[source]

Extract and preprocess predictor and response variables for each station based on its coordinates.

Returns:

A list containing two elements:

  • self.data_list – A list of tuples, each containing predictors (DataFrame) and response (DataFrame).

  • self.catchment – A list of tuples, each containing catchment data (accumulation and slope values).

Return type:

list

get_data_latlng(latlist, lonlist)[source]

Prepare predictors for arbitrary latitude/longitude points.

Parameters:
  • latlist (list[float]) – Latitudes to simulate.

  • lonlist (list[float]) – Longitudes to simulate.

Returns:

[data_list, catchment, latlist, lonlist].

Return type:

list

class bakaano.streamflow_simulator.PredictStreamflow(working_dir, area_normalize=True)[source]

Bases: object

prepare_data(data_list)[source]

prepare_data_latlng(data_list)[source]

load_model(path)[source]

Load a trained regional model from disk.

Parameters:

path (str) – Path to the saved Keras model.

Returns:

Loaded model instance.

Return type:

tensorflow.keras.Model

summary()[source]

Print a summary of the loaded model.