ecoscope.analysis.trend_analysis
================================

.. py:module:: ecoscope.analysis.trend_analysis

.. autoapi-nested-parse::

   Todo:

   - Output a dataframe representing the GAMM for a unique dataset as a benchmark.
   - Extract forest cover as a task in Ecoscope. Workflow process.


Module Contents
---------------

.. py:class:: GAMRegressor(alpha = 0.1, degree_of_freedom = 20, degree = 3, family = 'gaussian')

   Bases: :py:obj:`sklearn.base.BaseEstimator`, :py:obj:`sklearn.base.RegressorMixin`


   Generalized Additive Model (GAM) Regressor using B-Splines.

   A scikit-learn compatible wrapper around statsmodels GLMGam that provides
   a user-friendly interface for fitting GAMs to time series data.

   :param alpha: Smoothing parameter. Higher values produce smoother, more nearly linear curves.
   :type alpha: float, default=0.1
   :param degree_of_freedom: Degrees of freedom for the spline basis.
   :type degree_of_freedom: int, default=20
   :param degree: Degree of the B-spline basis (cubic splines by default).
   :type degree: int, default=3
   :param family: Distribution family for the GLM.
   :type family: {"gaussian", "poisson", "binomial"}, default="gaussian"

   .. rubric:: Examples

   >>> from ecoscope.analysis.trend_analysis import GAMRegressor
   >>> import numpy as np
   >>> X = np.array([2000, 2001, 2002, 2003, 2004]).reshape(-1, 1)
   >>> y = np.array([100, 95, 90, 85, 80])
   >>> gam = GAMRegressor(alpha=0.1).fit(X, y)
   >>> predictions = gam.predict(X)


   .. py:attribute:: alpha
      :value: 0.1


   .. py:attribute:: degree_of_freedom
      :value: 20


   .. py:attribute:: degree
      :value: 3


   .. py:method:: fit(X, y, upper_bound = None, lower_bound = None)

      Fit the GAM model.

      :param X: Training data (typically time/date values).
      :type X: array-like of shape (n_samples, 1) or (n_samples,)
      :param y: Target values.
      :type y: array-like of shape (n_samples,)
      :param upper_bound: Upper bound for spline knots. If None, uses max(X).
      :type upper_bound: float, optional
      :param lower_bound: Lower bound for spline knots. If None, uses min(X).
      :type lower_bound: float, optional

      :returns: **self** -- The fitted estimator, returned to allow method chaining.
      :rtype: GAMRegressor


   .. py:method:: _check_is_fitted()

      Check if the model has been fitted.


   .. py:method:: predict(X)

      Predict using the fitted model.

      :param X: Samples to predict.
      :type X: array-like of shape (n_samples, 1) or (n_samples,)

      :returns: **y_pred** -- Predicted values.
      :rtype: ndarray of shape (n_samples,)

      :raises ValueError: If the model has not been fitted.


   .. py:method:: aic()

      Return Akaike Information Criterion.


   .. py:method:: bic()

      Return Bayesian Information Criterion.


   .. py:method:: mse(X, y)

      Return Mean Squared Error on given data.

      :param X: Input data.
      :type X: array-like
      :param y: True target values.
      :type y: array-like

      :returns: Mean squared error.
      :rtype: float


   .. py:method:: r_squared(X, y)

      Return R-squared (coefficient of determination) on given data.

      :param X: Input data.
      :type X: array-like
      :param y: True target values.
      :type y: array-like

      :returns: R-squared value. 1.0 indicates a perfect fit; 0.0 indicates the
                model performs no better than predicting the mean.
      :rtype: float


   .. py:method:: predict_with_ci(X)

      Predict with confidence intervals.

      :param X: Samples to predict.
      :type X: array-like of shape (n_samples, 1) or (n_samples,)

      :returns: * **mean** (*ndarray*) -- Predicted mean values.
                * **ci_lower** (*ndarray*) -- Lower bound of confidence interval.
                * **ci_upper** (*ndarray*) -- Upper bound of confidence interval.

      :raises ValueError: If the model has not been fitted.
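
The ``mse`` and ``r_squared`` scores documented above can be checked by hand. The sketch below reproduces them with plain NumPy, assuming the standard textbook formulas. The helper names ``mean_squared_error`` and ``r_squared`` are illustrative only and are not taken from the ``GAMRegressor`` source.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals (standard MSE definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
    return float(1.0 - ss_res / ss_tot)


y = np.array([100, 95, 90, 85, 80])
perfect = y.copy()                        # predictions that match exactly
mean_only = np.full(y.shape, y.mean())    # always predict the mean of y
```

With these inputs, ``perfect`` scores an R-squared of 1.0 and ``mean_only`` scores 0.0, matching the interpretation given for ``r_squared``.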


.. py:function:: choose_cross_validator(X)

   Choose an appropriate cross-validator based on sample size.

   :param X: Input data.
   :type X: ndarray

   :returns: Cross-validation strategy.
   :rtype: BaseCrossValidator


.. py:function:: _fit_and_score_ic(alpha, X, y, metric, lower_bound, upper_bound, degree_of_freedom, degree, family)

   Fit a GAM and return alpha with its information criterion (AIC or BIC) score.


.. py:function:: _fit_and_score_cv(alpha, fold_idx, train_index, test_index, X, y, metric, lower_bound, upper_bound, degree_of_freedom, degree, family)

   Fit a GAM on one train/test fold and return alpha, the fold index, and the score.


.. py:function:: optimize_gam_cv(X, y, alphas, cross_validator, metric = 'aic', lower_bound = None, upper_bound = None, degree_of_freedom = 20, degree = 3, family = 'gaussian')

   Optimize GAM smoothing parameter using cross-validation.

   :param X: Training data.
   :type X: ndarray
   :param y: Target values.
   :type y: ndarray
   :param alphas: Array of alpha values to search.
   :type alphas: ndarray
   :param cross_validator: Cross-validation strategy.
   :type cross_validator: BaseCrossValidator
   :param metric: Metric to optimize. AIC/BIC are computed on full data, others use
                  cross-validation. Note: r_squared is maximized, others are minimized.
   :type metric: {"aic", "bic", "euclidean", "mse", "r_squared"}, default="aic"
   :param lower_bound: Lower bound for spline knots.
   :type lower_bound: float, optional
   :param upper_bound: Upper bound for spline knots.
   :type upper_bound: float, optional
   :param degree_of_freedom: Degrees of freedom for the spline basis.
   :type degree_of_freedom: int, default=20
   :param degree: Degree of the B-spline basis (cubic splines by default).
   :type degree: int, default=3
   :param family: Distribution family for the GLM.
   :type family: {"gaussian", "poisson", "binomial"}, default="gaussian"

   :returns: * **best_alpha** (*float*) -- Optimal alpha value.
             * **best_gam** (*GAMRegressor*) -- Fitted GAM with optimal alpha.
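
The note that ``r_squared`` is maximized while the other metrics are minimized can be illustrated with a small selection routine. This is a sketch, not the library's implementation: ``select_best_alpha`` is a hypothetical helper, the scores are made up, and averaging the per-fold scores is an assumption about how folds are combined.

```python
from statistics import mean

# Metrics where a larger score is better. Per the parameter notes above,
# only r_squared is maximized.
MAXIMIZED_METRICS = {"r_squared"}


def select_best_alpha(fold_scores, metric):
    """Pick the alpha with the best average score.

    fold_scores maps each candidate alpha to its list of per-fold scores.
    Averaging across folds is an assumption made for this sketch.
    """
    averaged = {alpha: mean(scores) for alpha, scores in fold_scores.items()}
    chooser = max if metric in MAXIMIZED_METRICS else min
    return chooser(averaged, key=averaged.get)


# Hypothetical per-fold scores for three candidate alphas. The same numbers
# are reused for both metrics purely to show the direction of optimization.
scores = {0.01: [4.0, 6.0], 1.0: [2.0, 3.0], 100.0: [9.0, 7.0]}
best_for_mse = select_best_alpha(scores, "mse")        # lowest mean score
best_for_r2 = select_best_alpha(scores, "r_squared")   # highest mean score
```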


.. py:function:: optimize_gam(X, y, cross_validator = None, alphas = None, lower_bound = None, upper_bound = None, bound_padding_ratio = 0.1, metric = 'aic', degree_of_freedom = 20, degree = 3, family = 'gaussian')

   Optimize GAM smoothing parameter with automatic defaults.

   :param X: Training data.
   :type X: ndarray
   :param y: Target values.
   :type y: ndarray
   :param cross_validator: Cross-validation strategy. If None, chosen automatically.
   :type cross_validator: BaseCrossValidator, optional
   :param alphas: Array of alpha values to search. Defaults to logspace(-4, 4, 100).
   :type alphas: ndarray, optional
   :param lower_bound: Lower bound for spline knots. If None, computed as min(X) minus
                       padding based on bound_padding_ratio.
   :type lower_bound: float, optional
   :param upper_bound: Upper bound for spline knots. If None, computed as max(X) plus
                       padding based on bound_padding_ratio.
   :type upper_bound: float, optional
   :param bound_padding_ratio: Fraction of the data range to use as padding when computing default
                               bounds. For example, 0.1 adds 10% of (max(X) - min(X)) as padding.
   :type bound_padding_ratio: float, default=0.1
   :param metric: Metric to optimize. AIC/BIC are computed on full data, others use
                  cross-validation. Note: r_squared is maximized, others are minimized.
   :type metric: {"aic", "bic", "euclidean", "mse", "r_squared"}, default="aic"
   :param degree_of_freedom: Degrees of freedom for the spline basis.
   :type degree_of_freedom: int, default=20
   :param degree: Degree of the B-spline basis (cubic splines by default).
   :type degree: int, default=3
   :param family: Distribution family for the GLM.
   :type family: {"gaussian", "poisson", "binomial"}, default="gaussian"

   :returns: * **best_alpha** (*float*) -- Optimal alpha value.
             * **best_gam** (*GAMRegressor*) -- Fitted GAM with optimal alpha.

   .. rubric:: Examples

   >>> from ecoscope.analysis.trend_analysis import optimize_gam
   >>> import numpy as np
   >>> X = np.array([2000, 2001, 2002, 2003, 2004])
   >>> y = np.array([100, 95, 90, 85, 80])
   >>> alpha, gam = optimize_gam(X, y, metric="aic")
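
The default bounds and the default alpha grid described above can be worked through numerically. The sketch below assumes the padding is applied symmetrically as ``bound_padding_ratio * (max(X) - min(X))`` and that ``logspace`` refers to ``numpy.logspace``. ``default_bounds`` is a hypothetical helper written for illustration, not a function of this module.

```python
import numpy as np


def default_bounds(X, bound_padding_ratio=0.1):
    """Pad the data range on both sides by a fraction of its width."""
    X = np.asarray(X, dtype=float)
    padding = bound_padding_ratio * (X.max() - X.min())
    return X.min() - padding, X.max() + padding


X = np.array([2000, 2001, 2002, 2003, 2004])
lower, upper = default_bounds(X)  # the range is 4, so the padding is 0.4

# Documented default search grid: 100 log-spaced values from 1e-4 to 1e4.
alphas = np.logspace(-4, 4, 100)
```

For the five yearly samples in the example, the knot bounds under these assumptions extend slightly beyond the observed years, to roughly 1999.6 and 2004.4.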


.. py:function:: get_forest_cover_trends(aoi, tree_cover_threshold = 60.0, scale = 30, max_pixels = 1000000000.0)

   Extract forest cover trends from a Google Earth Engine dataset.

   :param aoi: Area of interest geometry (must have CRS set).
   :type aoi: gpd.GeoDataFrame
   :param tree_cover_threshold: Minimum tree cover percentage to consider as forest (0-100).
   :type tree_cover_threshold: float, default=60.0
   :param scale: Pixel scale in meters for reduction.
   :type scale: int, default=30
   :param max_pixels: Maximum pixels for reduction.
   :type max_pixels: int, default=1e9

   :returns: DataFrame with columns:

             * **year** -- Year of observation.
             * **loss_area** -- Forest loss area in acres for that year.
             * **cumsum_loss_area** -- Cumulative loss area in acres.
             * **survival_area** -- Remaining forest area in acres.
   :rtype: pd.DataFrame
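
How the returned columns relate to one another can be sketched with a few lines of NumPy. The numbers are hypothetical and are not Earth Engine output. The sketch also assumes that ``survival_area`` equals a starting forest area minus the cumulative loss, which this page does not state explicitly.

```python
import numpy as np

# Hypothetical inputs in acres; not real Earth Engine output.
initial_forest_area = 1000.0
year = np.array([2001, 2002, 2003, 2004])
loss_area = np.array([10.0, 25.0, 5.0, 40.0])

# Running total of the yearly loss.
cumsum_loss_area = np.cumsum(loss_area)

# Assumption: remaining forest is the starting area minus cumulative loss.
survival_area = initial_forest_area - cumsum_loss_area
```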