ecoscope.analysis.trend_analysis
Todo:
- Output a dataframe representing the GAMM for a unique dataset as a benchmark
- Extract forest cover as a task in Ecoscope
- Workflow process -
Module Contents
- class ecoscope.analysis.trend_analysis.GAMRegressor(alpha=0.1, degree_of_freedom=20, degree=3, family='gaussian')
Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin
Generalized Additive Model (GAM) Regressor using B-Splines.
A scikit-learn compatible wrapper around statsmodels GLMGam that provides a user-friendly interface for fitting GAMs to time series data.
- Parameters:
alpha (float, default=0.1) – Smoothing parameter. Higher values result in smoother curves (more linear).
degree_of_freedom (int, default=20) – Degrees of freedom for the spline basis.
degree (int, default=3) – Degree of the B-spline basis (cubic splines by default).
family ({"gaussian", "poisson", "binomial"}, default="gaussian") – Distribution family for the GLM.
Examples
>>> from ecoscope.analysis.trend_analysis import GAMRegressor
>>> import numpy as np
>>> X = np.array([2000, 2001, 2002, 2003, 2004]).reshape(-1, 1)
>>> y = np.array([100, 95, 90, 85, 80])
>>> gam = GAMRegressor(alpha=0.1).fit(X, y)
>>> predictions = gam.predict(X)
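To make the effect of alpha concrete, the following is a minimal sketch and not the GAMRegressor implementation. It swaps in a much simpler penalized least-squares smoother (a second-difference penalty on the fitted values) written in plain NumPy. The helper names `penalized_smooth` and `roughness` and the synthetic data are hypothetical; the only point is that a larger penalty weight yields a curve closer to a straight line.

```python
import numpy as np


def penalized_smooth(y, alpha):
    """Hypothetical helper: minimize ||y - z||^2 + alpha * ||D2 z||^2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator: each row computes z[i] - 2*z[i+1] + z[i+2].
    d2 = np.diff(np.eye(n), n=2, axis=0)
    # Closed-form solution of the penalized least-squares problem.
    return np.linalg.solve(np.eye(n) + alpha * d2.T @ d2, y)


def roughness(z):
    """Sum of squared second differences; zero for a straight line."""
    return float(np.sum(np.diff(z, n=2) ** 2))


# Synthetic (made-up) declining series with noise.
rng = np.random.default_rng(0)
y = np.linspace(100, 80, 25) + rng.normal(scale=3.0, size=25)

light = penalized_smooth(y, alpha=0.1)     # follows the noise closely
heavy = penalized_smooth(y, alpha=1000.0)  # close to a straight line
```

Comparing `roughness(light)` with `roughness(heavy)` shows the heavier penalty producing the smoother curve, which mirrors the documented behaviour of alpha.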
- alpha = 0.1
- degree_of_freedom = 20
- degree = 3
- fit(X, y, upper_bound=None, lower_bound=None)
Fit the GAM model.
- Parameters:
X (array-like of shape (n_samples, 1) or (n_samples,)) – Training data (typically time/date values).
y (array-like of shape (n_samples,)) – Target values.
upper_bound (float, optional) – Upper bound for spline knots. If None, uses max(X).
lower_bound (float, optional) – Lower bound for spline knots. If None, uses min(X).
- Returns:
self – Returns self for method chaining.
- Return type:
GAMRegressor
- _check_is_fitted()
Check if the model has been fitted.
- Return type:
None
- predict(X)
Predict using the fitted model.
- Parameters:
X (array-like of shape (n_samples, 1) or (n_samples,)) – Samples to predict.
- Returns:
y_pred – Predicted values.
- Return type:
ndarray of shape (n_samples,)
- Raises:
ValueError – If the model has not been fitted.
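Both fit and predict accept X as either (n_samples,) or (n_samples, 1). How the class handles this internally is not documented here, so the following is only a sketch of one way to normalize the two shapes; `as_column` is a hypothetical helper, not part of ecoscope.

```python
import numpy as np


def as_column(X):
    """Hypothetical helper: coerce (n_samples,) or (n_samples, 1) input to 2-D."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        return X.reshape(-1, 1)
    if X.ndim == 2 and X.shape[1] == 1:
        return X
    raise ValueError(f"expected shape (n_samples,) or (n_samples, 1), got {X.shape}")


flat = as_column([2000, 2001, 2002])
column = as_column(np.array([[2000], [2001], [2002]]))
```

Both calls produce the same (3, 1) array, so downstream code only has to deal with one layout.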
- aic()
Return Akaike Information Criterion.
- Return type:
float
- bic()
Return Bayesian Information Criterion.
- Return type:
float
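For orientation, the textbook definitions of the two criteria can be sketched as below. This is an illustration under assumptions: the values returned by aic() and bic() come from the underlying fitted model, which may count parameters differently (for example via effective degrees of freedom), so the numbers need not match this formula exactly. The log-likelihoods and parameter counts are made up.

```python
import math


def aic(log_likelihood, n_params):
    """Textbook AIC: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Textbook BIC: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Made-up numbers: the flexible model fits slightly better but uses more parameters.
simple = aic(log_likelihood=-120.0, n_params=3)
flexible = aic(log_likelihood=-118.5, n_params=8)
```

Here the simpler model has the lower AIC, because the small gain in likelihood does not offset the five extra parameters.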
- mse(X, y)
Return Mean Squared Error on given data.
- Parameters:
X (array-like) – Input data.
y (array-like) – True target values.
- Returns:
Mean squared error.
- Return type:
float
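The quantity reported is the usual mean of squared residuals. A minimal NumPy sketch of that formula, using a hypothetical standalone function and made-up values rather than the method itself:

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Residuals are 2, 0 and -3, so the squared errors are 4, 0 and 9.
error = mean_squared_error([100, 95, 90], [98, 95, 93])
```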
- r_squared(X, y)
Return R-squared (coefficient of determination) on given data.
- Parameters:
X (array-like) – Input data.
y (array-like) – True target values.
- Returns:
R-squared value. 1.0 indicates a perfect fit; 0.0 indicates the model performs the same as predicting the mean.
- Return type:
float
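The two reference points mentioned above follow directly from the standard definition, 1 - SS_res / SS_tot. The sketch below uses a hypothetical standalone function (not the method itself) to show them, along with the fact that the value can go negative when predictions are worse than the mean.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y = np.array([100.0, 95.0, 90.0, 85.0, 80.0])
perfect = r_squared(y, y)                           # predictions equal the targets
baseline = r_squared(y, np.full_like(y, y.mean()))  # always predict the mean
worse = r_squared(y, y[::-1])                       # trend predicted backwards
```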
- predict_with_ci(X)
Predict with confidence intervals.
- Parameters:
X (array-like of shape (n_samples, 1) or (n_samples,)) – Samples to predict.
- Returns:
mean (ndarray) – Predicted mean values.
ci_lower (ndarray) – Lower bound of confidence interval.
ci_upper (ndarray) – Upper bound of confidence interval.
- Raises:
ValueError – If the model has not been fitted.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
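The confidence level and the exact method behind the interval are not stated here. As an illustration of the shape of the result only, the sketch below builds a symmetric normal-approximation interval from a mean and a standard error. The helper `normal_interval`, the 95% multiplier z=1.96, and the input values are all assumptions, not the library's behaviour.

```python
import numpy as np


def normal_interval(mean, std_err, z=1.96):
    """Hypothetical helper: symmetric interval mean +/- z * std_err."""
    mean = np.asarray(mean, dtype=float)
    half_width = z * np.asarray(std_err, dtype=float)
    return mean, mean - half_width, mean + half_width


# Same three-part return layout as documented: mean, lower bound, upper bound.
mean, ci_lower, ci_upper = normal_interval([100.0, 90.0], [2.0, 5.0])
```

The three arrays share one shape, so they can be passed straight to a plotting call that shades the band between the lower and upper bounds.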
- ecoscope.analysis.trend_analysis.choose_cross_validator(X)
Choose appropriate cross-validator based on sample size.
- Parameters:
X (ndarray) – Input data.
- Returns:
Cross-validation strategy.
- Return type:
BaseCrossValidator
- ecoscope.analysis.trend_analysis._fit_and_score_ic(alpha, X, y, metric, lower_bound, upper_bound, degree_of_freedom, degree, family)
Fit a GAM and return alpha together with its information criterion (AIC or BIC) score.
- ecoscope.analysis.trend_analysis._fit_and_score_cv(alpha, fold_idx, train_index, test_index, X, y, metric, lower_bound, upper_bound, degree_of_freedom, degree, family)
Fit a GAM on the training portion of one cross-validation fold, score it on the test portion, and return alpha, the fold index, and the score.
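The per-fold pattern (fit on the training indices, score on the held-out indices, report the fold index) can be sketched without the library. In the sketch below a straight-line fit stands in for the GAM, and `kfold_indices` and `fit_and_score_fold` are hypothetical helpers; the real functions take additional arguments such as alpha and the spline settings.

```python
import numpy as np


def kfold_indices(n_samples, n_splits):
    """Hypothetical helper: yield (fold_idx, train_index, test_index)."""
    folds = np.array_split(np.arange(n_samples), n_splits)
    for fold_idx, test_index in enumerate(folds):
        train_index = np.concatenate(
            [fold for i, fold in enumerate(folds) if i != fold_idx]
        )
        yield fold_idx, train_index, test_index


def fit_and_score_fold(fold_idx, train_index, test_index, X, y):
    """Fit a stand-in straight line on the training rows; return test MSE."""
    slope, intercept = np.polyfit(X[train_index], y[train_index], deg=1)
    predictions = slope * X[test_index] + intercept
    return fold_idx, float(np.mean((y[test_index] - predictions) ** 2))


# Made-up, perfectly linear series, so every held-out error is essentially zero.
X = np.arange(2000, 2010, dtype=float)
y = 100.0 - 5.0 * (X - 2000)
scores = [fit_and_score_fold(*fold, X, y) for fold in kfold_indices(len(X), 5)]
```

Averaging the per-fold scores for each candidate alpha gives one number per alpha to compare.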
- ecoscope.analysis.trend_analysis.optimize_gam_cv(X, y, alphas, cross_validator, metric='aic', lower_bound=None, upper_bound=None, degree_of_freedom=20, degree=3, family='gaussian')
Optimize GAM smoothing parameter using cross-validation.
- Parameters:
X (ndarray) – Training data.
y (ndarray) – Target values.
alphas (ndarray) – Array of alpha values to search.
cross_validator (BaseCrossValidator) – Cross-validation strategy.
metric ({"aic", "bic", "euclidean", "mse", "r_squared"}, default="aic") – Metric to optimize. AIC/BIC are computed on full data, others use cross-validation. Note: r_squared is maximized, others are minimized.
lower_bound (float, optional) – Lower bound for spline knots.
upper_bound (float, optional) – Upper bound for spline knots.
degree_of_freedom (int, default=20) – Degrees of freedom for the spline basis.
degree (int, default=3) – Degree of the B-spline basis (cubic splines by default).
family ({"gaussian", "poisson", "binomial"}, default="gaussian") – Distribution family for the GLM.
- Returns:
best_alpha (float) – Optimal alpha value.
best_gam (GAMRegressor) – Fitted GAM with optimal alpha.
- Return type:
Tuple[float, GAMRegressor]
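The selection rule described for metric (maximize r_squared, minimize everything else) can be sketched as follows. `select_best_alpha` is a hypothetical helper and the scores are made-up numbers; how the library aggregates scores before this step is not shown here.

```python
def select_best_alpha(scores, metric):
    """Hypothetical helper: scores maps each candidate alpha to its metric value."""
    if metric == "r_squared":
        # Higher R-squared is better.
        return max(scores, key=scores.get)
    # AIC, BIC, MSE and distance-style metrics: lower is better.
    return min(scores, key=scores.get)


# Made-up scores for three candidate alphas.
aic_scores = {0.01: 412.3, 1.0: 398.7, 100.0: 405.1}
r2_scores = {0.01: 0.91, 1.0: 0.88, 100.0: 0.74}

best_by_aic = select_best_alpha(aic_scores, "aic")
best_by_r2 = select_best_alpha(r2_scores, "r_squared")
```

Different metrics can pick different alphas for the same data, so it is worth keeping the metric fixed when comparing runs.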
- ecoscope.analysis.trend_analysis.optimize_gam(X, y, cross_validator=None, alphas=None, lower_bound=None, upper_bound=None, bound_padding_ratio=0.1, metric='aic', degree_of_freedom=20, degree=3, family='gaussian')
Optimize GAM smoothing parameter with automatic defaults.
- Parameters:
X (ndarray) – Training data.
y (ndarray) – Target values.
cross_validator (BaseCrossValidator, optional) – Cross-validation strategy. If None, chosen automatically.
alphas (ndarray, optional) – Array of alpha values to search. Defaults to logspace(-4, 4, 100).
lower_bound (float, optional) – Lower bound for spline knots. If None, computed as min(X) minus padding based on bound_padding_ratio.
upper_bound (float, optional) – Upper bound for spline knots. If None, computed as max(X) plus padding based on bound_padding_ratio.
bound_padding_ratio (float, default=0.1) – Fraction of the data range to use as padding when computing default bounds. For example, 0.1 adds 10% of (max(X) - min(X)) as padding.
metric ({"aic", "bic", "euclidean", "mse", "r_squared"}, default="aic") – Metric to optimize. AIC/BIC are computed on full data, others use cross-validation. Note: r_squared is maximized, others are minimized.
degree_of_freedom (int, default=20) – Degrees of freedom for the spline basis.
degree (int, default=3) – Degree of the B-spline basis (cubic splines by default).
family ({"gaussian", "poisson", "binomial"}, default="gaussian") – Distribution family for the GLM.
- Returns:
best_alpha (float) – Optimal alpha value.
best_gam (GAMRegressor) – Fitted GAM with optimal alpha.
- Return type:
Tuple[float, GAMRegressor]
Examples
>>> from ecoscope.analysis.trend_analysis import optimize_gam
>>> import numpy as np
>>> X = np.array([2000, 2001, 2002, 2003, 2004])
>>> y = np.array([100, 95, 90, 85, 80])
>>> alpha, gam = optimize_gam(X, y, metric="aic")
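The documented defaults can be reproduced by hand to see what the function would search over. This sketch follows the parameter descriptions above (alphas as logspace(-4, 4, 100), bounds padded by bound_padding_ratio times the data range); it is a reading of the documentation, not a copy of the implementation.

```python
import numpy as np

X = np.array([2000, 2001, 2002, 2003, 2004], dtype=float)
bound_padding_ratio = 0.1

# Default search grid: 100 values spaced evenly on a log scale from 1e-4 to 1e4.
alphas = np.logspace(-4, 4, 100)

# Default knot bounds: pad each end by 10% of the data range (4 years -> 0.4).
padding = bound_padding_ratio * (X.max() - X.min())
lower_bound = X.min() - padding  # about 1999.6
upper_bound = X.max() + padding  # about 2004.4
```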
- ecoscope.analysis.trend_analysis.get_forest_cover_trends(aoi, tree_cover_threshold=60.0, scale=30, max_pixels=1000000000.0)
Extract forest cover trends from a Google Earth Engine dataset.
- Parameters:
aoi (gpd.GeoDataFrame) – Area of interest geometry (must have CRS set).
tree_cover_threshold (float, default=60.0) – Minimum tree cover percentage to consider as forest (0-100).
scale (int, default=30) – Pixel scale in meters for reduction.
max_pixels (int, default=1e9) – Maximum pixels for reduction.
- Returns:
DataFrame with columns:
- year: Year of observation
- loss_area: Forest loss area in acres for that year
- cumsum_loss_area: Cumulative loss area in acres
- survival_area: Remaining forest area in acres
- Return type:
pd.DataFrame
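How the derived columns relate to each other can be sketched with plain NumPy. The assumption here, which the documentation does not spell out, is that survival_area equals a baseline forest area minus the cumulative loss; the baseline value, the yearly losses, and the variable `baseline_forest_area` are made up for illustration.

```python
import numpy as np

# Made-up yearly loss values in acres; not real Earth Engine output.
years = np.array([2001, 2002, 2003, 2004])
loss_area = np.array([12.0, 8.5, 0.0, 20.0])
baseline_forest_area = 500.0  # hypothetical forest area before the first year

# Running total of loss, then what remains of the baseline after each year.
cumsum_loss_area = np.cumsum(loss_area)
survival_area = baseline_forest_area - cumsum_loss_area
```

Under that assumption, survival_area never increases from one year to the next, which makes it a natural input series for the trend-fitting functions above.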