ecoscope.analysis.trend_analysis#

Todo:
  • Outputs a dataframe representing the GAMM for a unique dataset as a benchmark

  • Extract forest cover as a task in Ecoscope. Workflow process -

Module Contents#

class ecoscope.analysis.trend_analysis.GAMRegressor(alpha=0.1, degree_of_freedom=20, degree=3, family='gaussian')#

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Generalized Additive Model (GAM) Regressor using B-Splines.

A scikit-learn compatible wrapper around statsmodels GLMGam that provides a user-friendly interface for fitting GAMs to time series data.

Parameters:
  • alpha (float, default=0.1) – Smoothing penalty strength. Higher values produce smoother curves that approach a straight line.

  • degree_of_freedom (int, default=20) – Degrees of freedom for the spline basis.

  • degree (int, default=3) – Degree of the B-spline basis (cubic splines by default).

  • family ({"gaussian", "poisson", "binomial"}, default="gaussian") – Distribution family for the GLM.

Examples

>>> from ecoscope.analysis.trend_analysis import GAMRegressor
>>> import numpy as np
>>> X = np.array([2000, 2001, 2002, 2003, 2004]).reshape(-1, 1)
>>> y = np.array([100, 95, 90, 85, 80])
>>> gam = GAMRegressor(alpha=0.1).fit(X, y)
>>> predictions = gam.predict(X)

alpha = 0.1#
degree_of_freedom = 20#
degree = 3#
fit(X, y, upper_bound=None, lower_bound=None)#

Fit the GAM model.

Parameters:
  • X (array-like of shape (n_samples, 1) or (n_samples,)) – Training data (typically time/date values).

  • y (array-like of shape (n_samples,)) – Target values.

  • upper_bound (float, optional) – Upper bound for spline knots. If None, uses max(X).

  • lower_bound (float, optional) – Lower bound for spline knots. If None, uses min(X).

Returns:

self – Returns self for method chaining.

Return type:

GAMRegressor

_check_is_fitted()#

Check if the model has been fitted.

Return type:

None

predict(X)#

Predict using the fitted model.

Parameters:

X (array-like of shape (n_samples, 1) or (n_samples,)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,)

Raises:

ValueError – If the model has not been fitted.

aic()#

Return Akaike Information Criterion.

Return type:

float

bic()#

Return Bayesian Information Criterion.

Return type:

float
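
As a point of reference, the textbook definitions of the two criteria can be sketched as below. This is a minimal illustration, not the class's implementation: the helper names and the log-likelihood and parameter counts are hypothetical, and the values returned by aic() and bic() come from the underlying fit, which may follow a slightly different convention.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Textbook AIC: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Textbook BIC: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical values for two candidate fits on the same 50 observations.
simple = {"log_likelihood": -120.0, "n_params": 4}
flexible = {"log_likelihood": -118.5, "n_params": 12}

aic_simple = aic(**simple)
aic_flexible = aic(**flexible)
bic_simple = bic(**simple, n_samples=50)
bic_flexible = bic(**flexible, n_samples=50)

# The flexible fit has a slightly higher likelihood, but the penalty on its
# extra parameters makes the simple fit preferable under both criteria.
print(aic_simple < aic_flexible, bic_simple < bic_flexible)
```

Both criteria trade goodness of fit against model complexity; BIC penalizes additional parameters more heavily once the sample has more than about seven observations, since ln(n) then exceeds 2.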

mse(X, y)#

Return Mean Squared Error on given data.

Parameters:
  • X (array-like) – Input data.

  • y (array-like) – True target values.

Returns:

Mean squared error.

Return type:

float

r_squared(X, y)#

Return R-squared (coefficient of determination) on given data.

Parameters:
  • X (array-like) – Input data.

  • y (array-like) – True target values.

Returns:

R-squared value. A value of 1.0 indicates a perfect fit; 0.0 indicates the model performs no better than always predicting the mean of y.

Return type:

float
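
The two error metrics above can be illustrated with their standard formulas. The sketch below uses standalone helper functions that take predictions directly (an assumption for illustration; the methods on the class take X and y and call the fitted model), and reuses the target values from the class example.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred) -> float:
    """1 - SS_res / SS_tot (coefficient of determination)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y = np.array([100.0, 95.0, 90.0, 85.0, 80.0])
perfect = y.copy()                      # predictions that match exactly
mean_only = np.full_like(y, y.mean())   # always predict the mean (90.0)

r2_perfect = r_squared(y, perfect)   # 1.0
r2_mean = r_squared(y, mean_only)    # 0.0
mse_mean = mse(y, mean_only)         # 50.0
```

R-squared can also be negative when predictions are worse than the mean baseline.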

predict_with_ci(X)#

Predict with confidence intervals.

Parameters:

X (array-like of shape (n_samples, 1) or (n_samples,)) – Samples to predict.

Returns:

  • mean (ndarray) – Predicted mean values.

  • ci_lower (ndarray) – Lower bound of confidence interval.

  • ci_upper (ndarray) – Upper bound of confidence interval.

Raises:

ValueError – If the model has not been fitted.

Return type:

Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

ecoscope.analysis.trend_analysis.choose_cross_validator(X)#

Choose appropriate cross-validator based on sample size.

Parameters:

X (ndarray) – Input data.

Returns:

Cross-validation strategy.

Return type:

BaseCrossValidator
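
The page does not state which strategies or sample-size thresholds are used. As a rough sketch of the idea only, the hypothetical helper below switches between leave-one-out splitting for small samples and contiguous k-fold splitting otherwise. The function name, the threshold of 10, and the 5 splits are all assumptions, and it returns plain index lists rather than a scikit-learn BaseCrossValidator.

```python
from typing import List, Tuple

Fold = Tuple[List[int], List[int]]


def make_folds(
    n_samples: int,
    small_sample_threshold: int = 10,  # assumed cutoff, not from the library
    n_splits: int = 5,                 # assumed number of folds
) -> List[Fold]:
    """Return (train_indices, test_indices) pairs for each fold."""
    indices = list(range(n_samples))
    if n_samples < small_sample_threshold:
        # Leave-one-out: every sample is the test set exactly once.
        return [(indices[:i] + indices[i + 1:], [i]) for i in indices]

    # Contiguous k-fold: the first `remainder` folds get one extra sample.
    folds: List[Fold] = []
    fold_size, remainder = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        stop = start + fold_size + (1 if k < remainder else 0)
        folds.append((indices[:start] + indices[stop:], indices[start:stop]))
        start = stop
    return folds


small = make_folds(5)    # 5 leave-one-out folds
large = make_folds(23)   # 5 folds with test sizes 5, 5, 5, 4, 4
```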

ecoscope.analysis.trend_analysis._fit_and_score_ic(alpha, X, y, metric, lower_bound, upper_bound, degree_of_freedom, degree, family)#

Fit a GAM and return alpha with its information criterion (AIC or BIC) score.

ecoscope.analysis.trend_analysis._fit_and_score_cv(alpha, fold_idx, train_index, test_index, X, y, metric, lower_bound, upper_bound, degree_of_freedom, degree, family)#

Fit a GAM on one train/test fold and return alpha, the fold index, and the score.

ecoscope.analysis.trend_analysis.optimize_gam_cv(X, y, alphas, cross_validator, metric='aic', lower_bound=None, upper_bound=None, degree_of_freedom=20, degree=3, family='gaussian')#

Optimize GAM smoothing parameter using cross-validation.

Parameters:
  • X (ndarray) – Training data.

  • y (ndarray) – Target values.

  • alphas (ndarray) – Array of alpha values to search.

  • cross_validator (BaseCrossValidator) – Cross-validation strategy.

  • metric ({"aic", "bic", "euclidean", "mse", "r_squared"}, default="aic") – Metric to optimize. AIC/BIC are computed on full data, others use cross-validation. Note: r_squared is maximized, others are minimized.

  • lower_bound (float, optional) – Lower bound for spline knots.

  • upper_bound (float, optional) – Upper bound for spline knots.

  • degree_of_freedom (int, default=20) – Degrees of freedom for the spline basis.

  • degree (int, default=3) – Degree of the B-spline basis (cubic splines by default).

  • family ({"gaussian", "poisson", "binomial"}, default="gaussian") – Distribution family for the GLM.

Returns:

  • best_alpha (float) – Optimal alpha value.

  • best_gam (GAMRegressor) – Fitted GAM with optimal alpha.

Return type:

Tuple[float, GAMRegressor]
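
The selection rule described for metric (r_squared is maximized, all other metrics are minimized) can be sketched as follows. The helper name and the scores are hypothetical, and averaging scores across folds is an assumption about how per-fold results are combined.

```python
from statistics import mean
from typing import Dict, List

# Per the metric description, only r_squared is a "larger is better" score.
MAXIMIZED_METRICS = {"r_squared"}


def select_best_alpha(fold_scores: Dict[float, List[float]], metric: str) -> float:
    """Average each alpha's per-fold scores, then pick the best alpha."""
    mean_scores = {alpha: mean(scores) for alpha, scores in fold_scores.items()}
    pick = max if metric in MAXIMIZED_METRICS else min
    return pick(mean_scores, key=mean_scores.get)


# Hypothetical per-fold scores for three candidate alphas.
mse_scores = {0.01: [4.0, 6.0, 5.0], 0.1: [2.0, 3.0, 2.5], 1.0: [3.0, 3.5, 4.0]}
r2_scores = {0.01: [0.70, 0.65, 0.72], 0.1: [0.80, 0.82, 0.79], 1.0: [0.90, 0.88, 0.91]}

best_for_mse = select_best_alpha(mse_scores, "mse")        # lowest mean MSE
best_for_r2 = select_best_alpha(r2_scores, "r_squared")    # highest mean R-squared
```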

ecoscope.analysis.trend_analysis.optimize_gam(X, y, cross_validator=None, alphas=None, lower_bound=None, upper_bound=None, bound_padding_ratio=0.1, metric='aic', degree_of_freedom=20, degree=3, family='gaussian')#

Optimize GAM smoothing parameter with automatic defaults.

Parameters:
  • X (ndarray) – Training data.

  • y (ndarray) – Target values.

  • cross_validator (BaseCrossValidator, optional) – Cross-validation strategy. If None, chosen automatically.

  • alphas (ndarray, optional) – Array of alpha values to search. Defaults to logspace(-4, 4, 100).

  • lower_bound (float, optional) – Lower bound for spline knots. If None, computed as min(X) minus padding based on bound_padding_ratio.

  • upper_bound (float, optional) – Upper bound for spline knots. If None, computed as max(X) plus padding based on bound_padding_ratio.

  • bound_padding_ratio (float, default=0.1) – Fraction of the data range to use as padding when computing default bounds. For example, 0.1 adds 10% of (max(X) - min(X)) as padding.

  • metric ({"aic", "bic", "euclidean", "mse", "r_squared"}, default="aic") – Metric to optimize. AIC/BIC are computed on full data, others use cross-validation. Note: r_squared is maximized, others are minimized.

  • degree_of_freedom (int, default=20) – Degrees of freedom for the spline basis.

  • degree (int, default=3) – Degree of the B-spline basis (cubic splines by default).

  • family ({"gaussian", "poisson", "binomial"}, default="gaussian") – Distribution family for the GLM.

Returns:

  • best_alpha (float) – Optimal alpha value.

  • best_gam (GAMRegressor) – Fitted GAM with optimal alpha.

Return type:

Tuple[float, GAMRegressor]
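
The documented defaults for the bounds and the alpha grid can be reproduced by hand. This is a sketch based only on the parameter descriptions above; the helper name default_bounds is hypothetical and the library's internal code may differ in detail.

```python
import numpy as np


def default_bounds(X, bound_padding_ratio: float = 0.1):
    """Pad min(X) and max(X) by a fraction of the data range."""
    X = np.asarray(X, dtype=float)
    padding = bound_padding_ratio * (X.max() - X.min())
    return float(X.min() - padding), float(X.max() + padding)


X = np.array([2000, 2001, 2002, 2003, 2004])

# Range is 4 years, so 10% padding adds 0.4 on each side:
# approximately (1999.6, 2004.4).
lower_bound, upper_bound = default_bounds(X)

# Default search grid: 100 values spaced evenly on a log scale from 1e-4 to 1e4.
alphas = np.logspace(-4, 4, 100)
```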

Examples

>>> from ecoscope.analysis.trend_analysis import optimize_gam
>>> import numpy as np
>>> X = np.array([2000, 2001, 2002, 2003, 2004])
>>> y = np.array([100, 95, 90, 85, 80])
>>> alpha, gam = optimize_gam(X, y, metric="aic")

Extract forest cover trends from a Google Earth Engine dataset.

Parameters:
  • aoi (gpd.GeoDataFrame) – Area of interest geometry (must have CRS set).

  • tree_cover_threshold (float, default=60.0) – Minimum tree cover percentage to consider as forest (0-100).

  • scale (int, default=30) – Pixel scale in meters for reduction.

  • max_pixels (int, default=1e9) – Maximum pixels for reduction.

Returns:

DataFrame with columns:

  • year – Year of observation.

  • loss_area – Forest loss area in acres for that year.

  • cumsum_loss_area – Cumulative loss area in acres.

  • survival_area – Remaining forest area in acres.

Return type:

pd.DataFrame
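
To show how the returned columns relate to one another, the sketch below derives the cumulative and remaining areas from yearly loss figures. All numbers are hypothetical, plain lists stand in for the DataFrame, and defining survival_area as the initial forest area minus cumulative loss is an assumption about how the column is computed.

```python
from itertools import accumulate

# Hypothetical inputs: initial forest area and yearly loss, both in acres.
initial_forest_area = 1000.0
years = [2001, 2002, 2003, 2004]
loss_area = [10.0, 25.0, 5.0, 40.0]

# Running total of loss up to and including each year.
cumsum_loss_area = list(accumulate(loss_area))

# Assumed definition: forest remaining after subtracting cumulative loss.
survival_area = [initial_forest_area - lost for lost in cumsum_loss_area]

# One record per year, using the documented column names.
rows = [
    {"year": y, "loss_area": l, "cumsum_loss_area": c, "survival_area": s}
    for y, l, c, s in zip(years, loss_area, cumsum_loss_area, survival_area)
]
```

A table shaped like this, with year as X and survival_area as y, is the kind of series the GAM fitting functions on this page accept.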