ecoscope.analysis.trend_analysis
================================

.. py:module:: ecoscope.analysis.trend_analysis

.. autoapi-nested-parse::

   Todo:
       - Outputs a dataframe representing the GAMM for a unique dataset as a benchmark
       - Extract forest cover as a task in Ecoscope. Workflow process


Module Contents
---------------

.. py:class:: GAMRegressor(alpha = 0.1, degree_of_freedom = 20, degree = 3, family = 'gaussian')

   Bases: :py:obj:`sklearn.base.BaseEstimator`, :py:obj:`sklearn.base.RegressorMixin`

   Generalized Additive Model (GAM) regressor using B-splines.

   A scikit-learn-compatible wrapper around statsmodels ``GLMGam`` that provides a
   user-friendly interface for fitting GAMs to time series data.

   :param alpha: Smoothing parameter. Higher values produce smoother, more nearly linear curves.
   :type alpha: float, default=0.1
   :param degree_of_freedom: Degrees of freedom for the spline basis.
   :type degree_of_freedom: int, default=20
   :param degree: Degree of the B-spline basis (cubic splines by default).
   :type degree: int, default=3
   :param family: Distribution family for the GLM.
   :type family: {"gaussian", "poisson", "binomial"}, default="gaussian"

   .. rubric:: Examples

   >>> from ecoscope.analysis.trend_analysis import GAMRegressor
   >>> import numpy as np
   >>> X = np.array([2000, 2001, 2002, 2003, 2004]).reshape(-1, 1)
   >>> y = np.array([100, 95, 90, 85, 80])
   >>> gam = GAMRegressor(alpha=0.1).fit(X, y)
   >>> predictions = gam.predict(X)

   .. py:attribute:: alpha
      :value: 0.1

   .. py:attribute:: degree_of_freedom
      :value: 20

   .. py:attribute:: degree
      :value: 3

   .. py:method:: fit(X, y, upper_bound = None, lower_bound = None)

      Fit the GAM model.

      :param X: Training data (typically time/date values).
      :type X: array-like of shape (n_samples, 1) or (n_samples,)
      :param y: Target values.
      :type y: array-like of shape (n_samples,)
      :param upper_bound: Upper bound for spline knots. If None, uses max(X).
      :type upper_bound: float, optional
      :param lower_bound: Lower bound for spline knots. If None, uses min(X).
      :type lower_bound: float, optional

      :returns: **self** -- Returns self for method chaining.
      :rtype: GAMRegressor

   .. py:method:: _check_is_fitted()

      Check whether the model has been fitted.

   .. py:method:: predict(X)

      Predict using the fitted model.

      :param X: Samples to predict.
      :type X: array-like of shape (n_samples, 1) or (n_samples,)

      :returns: **y_pred** -- Predicted values.
      :rtype: ndarray of shape (n_samples,)

      :raises ValueError: If the model has not been fitted.

   .. py:method:: aic()

      Return the Akaike Information Criterion (AIC) of the fitted model.

   .. py:method:: bic()

      Return the Bayesian Information Criterion (BIC) of the fitted model.

   .. py:method:: mse(X, y)

      Return the mean squared error on the given data.

      :param X: Input data.
      :type X: array-like
      :param y: True target values.
      :type y: array-like

      :returns: Mean squared error.
      :rtype: float

   .. py:method:: r_squared(X, y)

      Return R-squared (coefficient of determination) on the given data.

      :param X: Input data.
      :type X: array-like
      :param y: True target values.
      :type y: array-like

      :returns: R-squared value. 1.0 indicates a perfect fit; 0.0 indicates the model performs no better than predicting the mean.
      :rtype: float

   .. py:method:: predict_with_ci(X)

      Predict with confidence intervals.

      :param X: Samples to predict.
      :type X: array-like of shape (n_samples, 1) or (n_samples,)

      :returns: * **mean** (*ndarray*) -- Predicted mean values.
                * **ci_lower** (*ndarray*) -- Lower bound of the confidence interval.
                * **ci_upper** (*ndarray*) -- Upper bound of the confidence interval.

      :raises ValueError: If the model has not been fitted.

.. py:function:: choose_cross_validator(X)

   Choose an appropriate cross-validator based on sample size.

   :param X: Input data.
   :type X: ndarray

   :returns: Cross-validation strategy.
   :rtype: BaseCrossValidator

.. py:function:: _fit_and_score_ic(alpha, X, y, metric, lower_bound, upper_bound, degree_of_freedom, degree, family)

   Fit a GAM and return ``alpha`` together with its information-criterion score (AIC or BIC).

.. py:function:: _fit_and_score_cv(alpha, fold_idx, train_index, test_index, X, y, metric, lower_bound, upper_bound, degree_of_freedom, degree, family)

   Fit a GAM on one train/test fold and return ``alpha``, the fold index, and the score.

.. py:function:: optimize_gam_cv(X, y, alphas, cross_validator, metric = 'aic', lower_bound = None, upper_bound = None, degree_of_freedom = 20, degree = 3, family = 'gaussian')

   Optimize the GAM smoothing parameter using cross-validation.

   :param X: Training data.
   :type X: ndarray
   :param y: Target values.
   :type y: ndarray
   :param alphas: Array of alpha values to search.
   :type alphas: ndarray
   :param cross_validator: Cross-validation strategy.
   :type cross_validator: BaseCrossValidator
   :param metric: Metric to optimize. AIC and BIC are computed on the full data; the other metrics use cross-validation. ``r_squared`` is maximized; all others are minimized.
   :type metric: {"aic", "bic", "euclidean", "mse", "r_squared"}, default="aic"
   :param lower_bound: Lower bound for spline knots.
   :type lower_bound: float, optional
   :param upper_bound: Upper bound for spline knots.
   :type upper_bound: float, optional
   :param degree_of_freedom: Degrees of freedom for the spline basis.
   :type degree_of_freedom: int, default=20
   :param degree: Degree of the B-spline basis (cubic splines by default).
   :type degree: int, default=3
   :param family: Distribution family for the GLM.
   :type family: {"gaussian", "poisson", "binomial"}, default="gaussian"

   :returns: * **best_alpha** (*float*) -- Optimal alpha value.
             * **best_gam** (*GAMRegressor*) -- Fitted GAM with the optimal alpha.

.. py:function:: optimize_gam(X, y, cross_validator = None, alphas = None, lower_bound = None, upper_bound = None, bound_padding_ratio = 0.1, metric = 'aic', degree_of_freedom = 20, degree = 3, family = 'gaussian')

   Optimize the GAM smoothing parameter with automatic defaults.

   :param X: Training data.
   :type X: ndarray
   :param y: Target values.
   :type y: ndarray
   :param cross_validator: Cross-validation strategy. If None, one is chosen automatically.
   :type cross_validator: BaseCrossValidator, optional
   :param alphas: Array of alpha values to search. Defaults to ``logspace(-4, 4, 100)``.
   :type alphas: ndarray, optional
   :param lower_bound: Lower bound for spline knots. If None, computed as min(X) minus a padding set by ``bound_padding_ratio``.
   :type lower_bound: float, optional
   :param upper_bound: Upper bound for spline knots. If None, computed as max(X) plus a padding set by ``bound_padding_ratio``.
   :type upper_bound: float, optional
   :param bound_padding_ratio: Fraction of the data range to use as padding when computing default bounds. For example, 0.1 adds 10% of (max(X) - min(X)) as padding.
   :type bound_padding_ratio: float, default=0.1
   :param metric: Metric to optimize. AIC and BIC are computed on the full data; the other metrics use cross-validation. ``r_squared`` is maximized; all others are minimized.
   :type metric: {"aic", "bic", "euclidean", "mse", "r_squared"}, default="aic"
   :param degree_of_freedom: Degrees of freedom for the spline basis.
   :type degree_of_freedom: int, default=20
   :param degree: Degree of the B-spline basis (cubic splines by default).
   :type degree: int, default=3
   :param family: Distribution family for the GLM.
   :type family: {"gaussian", "poisson", "binomial"}, default="gaussian"

   :returns: * **best_alpha** (*float*) -- Optimal alpha value.
             * **best_gam** (*GAMRegressor*) -- Fitted GAM with the optimal alpha.

   .. rubric:: Examples

   >>> from ecoscope.analysis.trend_analysis import optimize_gam
   >>> import numpy as np
   >>> X = np.array([2000, 2001, 2002, 2003, 2004])
   >>> y = np.array([100, 95, 90, 85, 80])
   >>> alpha, gam = optimize_gam(X, y, metric="aic")

.. py:function:: get_forest_cover_trends(aoi, tree_cover_threshold = 60.0, scale = 30, max_pixels = 1000000000.0)

   Extract forest cover trends from a Google Earth Engine dataset.

   :param aoi: Area of interest geometry (must have a CRS set).
   :type aoi: gpd.GeoDataFrame
   :param tree_cover_threshold: Minimum tree cover percentage to consider as forest (0-100).
   :type tree_cover_threshold: float, default=60.0
   :param scale: Pixel scale in meters for the reduction.
   :type scale: int, default=30
   :param max_pixels: Maximum number of pixels for the reduction.
   :type max_pixels: int, default=1e9

   :returns: DataFrame with columns:

             - ``year``: Year of observation.
             - ``loss_area``: Forest loss area in acres for that year.
             - ``cumsum_loss_area``: Cumulative loss area in acres.
             - ``survival_area``: Remaining forest area in acres.

   :rtype: pd.DataFrame
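
To illustrate how the documented output columns of :py:func:`get_forest_cover_trends` relate to one another, the sketch below builds a table of the same shape from yearly loss figures using pandas. It is not the function's implementation and does not query Earth Engine. It assumes, hypothetically, that yearly loss arrives in square meters and that ``survival_area`` is a caller-supplied baseline forest area minus ``cumsum_loss_area``; the helper ``add_cumulative_columns`` and its arguments are illustrative names only.

```python
import pandas as pd

# One international acre is exactly 4046.8564224 square meters.
SQ_METERS_PER_ACRE = 4046.8564224


def add_cumulative_columns(loss_m2: pd.Series, initial_area_acres: float) -> pd.DataFrame:
    """Hypothetical helper: build a table shaped like the documented output.

    ``loss_m2`` is indexed by year and holds yearly forest loss in square meters
    (an assumption). ``initial_area_acres`` is an assumed baseline forest area.
    """
    df = pd.DataFrame(
        {"year": loss_m2.index, "loss_area": loss_m2.to_numpy() / SQ_METERS_PER_ACRE}
    )
    # Sort chronologically so the running total accumulates in year order.
    df = df.sort_values("year").reset_index(drop=True)
    # Running total of loss up to and including each year.
    df["cumsum_loss_area"] = df["loss_area"].cumsum()
    # Assumption: remaining forest = baseline minus cumulative loss.
    df["survival_area"] = initial_area_acres - df["cumsum_loss_area"]
    return df


# Years deliberately out of order; losses of 0, 1 and 2 acres expressed in m^2.
loss = pd.Series(
    [0.0, SQ_METERS_PER_ACRE, 2 * SQ_METERS_PER_ACRE], index=[2003, 2001, 2002]
)
table = add_cumulative_columns(loss, initial_area_acres=100.0)
# survival_area -> 99.0, 97.0, 97.0 for years 2001, 2002, 2003
print(table)
```

Because rows are sorted by ``year`` before the running sum, the cumulative and survival columns stay consistent even when the input years arrive out of order.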