ecoscope.analysis.classifier
============================

.. py:module:: ecoscope.analysis.classifier


Module Contents
---------------

.. py:type:: ColorValue
   :canonical: str | float


.. py:type:: HexColor
   :canonical: str


.. py:data:: classification_methods

.. py:function:: apply_classification(dataframe, input_column_name, output_column_name = None, labels = None, scheme = 'natural_breaks', label_prefix = '', label_suffix = '', label_ranges = False, label_decimals = 1, **kwargs)

   Classifies the data in a DataFrame column using the specified classification scheme.

   Args:
   dataframe (pd.DataFrame): The data.
   input_column_name (str): The dataframe column to classify.
   output_column_name (str): The dataframe column that will contain the classification.
       Defaults to "<input_column_name>_classified"
   labels (list[str]): Labels for the bins. If None, the bin edges are used as labels.
   scheme (str): Classification scheme to use [equal_interval, natural_breaks, quantile, std_mean, max_breaks,
   fisher_jenks]
   label_prefix (str): Prepends provided string to each label
   label_suffix (str): Appends provided string to each label
   label_ranges (bool): Applicable only when 'labels' is not set
                        If True, generated labels will be the range between bin edges,
                        rather than the bin edges themselves.
   label_decimals (int): Applicable only when 'labels' is not set
                         Specifies the number of decimal places in the label


   **kwargs:
       Additional keyword arguments specific to the classification scheme, passed to mapclassify.
       See below

   Applicable to equal_interval, natural_breaks, quantile, max_breaks & fisher_jenks:
   k (int): The number of classes required

   Applicable only to natural_breaks:
   initial (int): The number of initial solutions generated with different centroids.
       The best of the initial solutions is returned.

   Applicable only to max_breaks:
   mindiff (float): The minimum difference between class breaks.

   Applicable only to std_mean:
   multiples (numpy.array): The multiples of the standard deviation to add/subtract
       from the sample mean to define the bins.
   anchor (bool): Anchor upper bound of one class to the sample mean.

   For more information, see https://pysal.org/mapclassify/api.html

   Returns:
   The input dataframe with a classification column appended.
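
The labelling options described above can be illustrated with a small standalone sketch. It does not call ecoscope or mapclassify and is not their implementation: the helper names (``equal_interval_edges``, ``make_labels``, ``classify``) are hypothetical, and a simple equal-width binning stands in for the ``equal_interval`` scheme. It only shows how labels built from bin edges differ from labels built from ranges, and where ``label_decimals``, ``label_prefix`` and ``label_suffix`` would apply.

```python
from bisect import bisect_left


def equal_interval_edges(values, k):
    """Return the k upper bin edges that split [min, max] into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k)]


def make_labels(values, edges, label_ranges=False, label_decimals=1,
                label_prefix="", label_suffix=""):
    """Build one label per bin, either from the upper edge or from the range."""
    if label_ranges:
        # Each range runs from the previous edge (or the minimum) to this edge.
        lowers = [min(values)] + edges[:-1]
        labels = [
            f"{a:.{label_decimals}f} - {b:.{label_decimals}f}"
            for a, b in zip(lowers, edges)
        ]
    else:
        labels = [round(edge, label_decimals) for edge in edges]
    # This sketch always returns strings, even without a prefix or suffix.
    return [f"{label_prefix}{label}{label_suffix}" for label in labels]


def classify(values, edges, labels):
    """Assign each value the label of the first bin whose upper edge is >= value."""
    last = len(edges) - 1
    return [labels[min(bisect_left(edges, v), last)] for v in values]


values = [0.0, 2.5, 5.0, 7.5, 10.0]
edges = equal_interval_edges(values, k=2)
edge_labels = make_labels(values, edges)
range_labels = make_labels(values, edges, label_ranges=True, label_suffix=" km")
classified = classify(values, edges, range_labels)
```

In this sketch a value that sits exactly on a bin edge falls into the lower bin. Whether a given mapclassify scheme treats edge values the same way should be checked against its documentation.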


.. py:function:: apply_color_map(dataframe, input_column_name, cmap, output_column_name = None)

   Creates a new column on the provided dataframe with the given cmap applied over the specified input column.

   Args:
   dataframe (pd.DataFrame): The data.
   input_column_name (str): The dataframe column whose values will inform the cmap values.
   cmap (str, list, dict): Either a named mpl.colormap, a list of string hex values, or a dict mapping
       values to hex color strings. When a dict is provided, each key is a data value and each value is
       a hex color string (e.g. {"stop": "#FF0000", "go": "#00FF00"}). Data values not present in the
       dict are set to fully transparent.
   output_column_name (str): The dataframe column that will contain the mapped colors.
       Defaults to "<input_column_name>_colormap"

   Returns:
   The input dataframe with a color map column appended.
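
The dict form of ``cmap`` described above can be pictured as a lookup with a transparent fallback. The sketch below is standalone and hypothetical: ``hex_to_rgba`` and ``map_colors`` are not ecoscope functions, and representing colors as RGBA tuples of 0-255 integers is an assumption made for illustration, not a statement about the format ecoscope writes to the output column.

```python
TRANSPARENT = (0, 0, 0, 0)  # assumed representation of "fully transparent"


def hex_to_rgba(hex_color):
    """Convert '#RRGGBB' or '#RRGGBBAA' to an (r, g, b, a) tuple of 0-255 ints."""
    digits = hex_color.lstrip("#")
    if len(digits) == 6:
        digits += "FF"  # no alpha channel given: treat as fully opaque
    if len(digits) != 8:
        raise ValueError(f"Unsupported hex color: {hex_color!r}")
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 8, 2))


def map_colors(values, cmap):
    """Look up each value in cmap; values missing from cmap become transparent."""
    lookup = {key: hex_to_rgba(color) for key, color in cmap.items()}
    return [lookup.get(value, TRANSPARENT) for value in values]


colors = map_colors(
    ["stop", "go", "wait"],
    {"stop": "#FF0000", "go": "#00FF00"},
)
```

Here ``"wait"`` has no entry in the dict, so it receives the transparent fallback rather than raising an error.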


.. py:function:: classify_percentile(df, percentile_levels, input_column_name, output_column_name = 'percentile')

   Creates a new column on the provided dataframe with the percentile bin of the input column.
   Uses much the same methodology as `get_percentile_area` but applies
   generally to a numeric dataframe column instead of a raster grid

   Args:
   df (pd.DataFrame | gpd.GeoDataFrame): The data.
   percentile_levels (list[int]): list of k-th percentile scores.
   input_column_name (str): The column to apply classification to.
   output_column_name (str): The dataframe column that will contain the classification.
       Defaults to "percentile"

   Returns:
   The input dataframe with percentile classification appended.