ecoscope.analysis.classifier

Module Contents

type ecoscope.analysis.classifier.ColorValue = str | float
type ecoscope.analysis.classifier.HexColor = str
ecoscope.analysis.classifier.classification_methods
ecoscope.analysis.classifier.apply_classification(dataframe, input_column_name, output_column_name=None, labels=None, scheme='natural_breaks', label_prefix='', label_suffix='', label_ranges=False, label_decimals=1, **kwargs)

Classifies the data in a DataFrame column using the specified classification scheme.

Args:
  • dataframe (pd.DataFrame): The data.

  • input_column_name (str): The dataframe column to classify.

  • output_column_name (str): The dataframe column that will contain the classification. Defaults to “<input_column_name>_classified”.

  • labels (list[str]): Labels of the bins. If labels is None, the bin edges are used.

  • scheme (str): Classification scheme to use. One of equal_interval, natural_breaks, quantile, std_mean, max_breaks, fisher_jenks.

  • label_prefix (str): Prepends the provided string to each label.

  • label_suffix (str): Appends the provided string to each label.

  • label_ranges (bool): Applicable only when ‘labels’ is not set. If True, generated labels will be the range between bin edges, rather than the bin edges themselves.

  • label_decimals (int): Applicable only when ‘labels’ is not set. Specifies the number of decimal places in the label.

  • **kwargs: Additional keyword arguments specific to the classification scheme, passed to mapclassify:

    • Applicable to equal_interval, natural_breaks, quantile, max_breaks and fisher_jenks: k (int): The number of classes required.

    • Applicable only to natural_breaks: initial (int): The number of initial solutions generated with different centroids. The best of the initial results is returned.

    • Applicable only to max_breaks: mindiff (float): The minimum difference between class breaks.

    • Applicable only to std_mean: multiples (numpy.array): The multiples of the standard deviation to add to or subtract from the sample mean to define the bins. anchor (bool): Anchor the upper bound of one class to the sample mean.

    For more information, see https://pysal.org/mapclassify/api.html

Returns: The input dataframe with a classification column appended.

Parameters:
  • dataframe (pandas.DataFrame)

  • input_column_name (str)

  • output_column_name (str | None)

  • labels (list[str] | None)

  • scheme (Literal['equal_interval', 'natural_breaks', 'quantile', 'std_mean', 'max_breaks', 'fisher_jenks'])

  • label_prefix (str)

  • label_suffix (str)

  • label_ranges (bool)

  • label_decimals (int)

Return type:

pandas.DataFrame
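To make the interaction between bin edges and the label options concrete, here is a minimal pure-Python sketch of an equal-interval classification. It is not ecoscope's or mapclassify's implementation: the helper names (equal_interval_edges, make_labels, classify), the "lower - upper" range format, and the convention that each edge is an inclusive upper bound are all assumptions made for illustration.

```python
def equal_interval_edges(values, k):
    """Return k upper bin edges that split [min, max] into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k)]


def make_labels(lower, edges, prefix="", suffix="", ranges=False, decimals=1):
    """Build one label per bin. The exact text format is an assumption."""
    labels = []
    for upper in edges:
        if ranges:
            # Range label: spans from the previous edge to this one
            text = f"{lower:.{decimals}f} - {upper:.{decimals}f}"
        else:
            # Edge label: just the upper bin edge
            text = f"{upper:.{decimals}f}"
        labels.append(f"{prefix}{text}{suffix}")
        lower = upper
    return labels


def classify(values, edges, labels):
    """Give each value the label of the first bin whose upper edge covers it."""
    out = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v <= edge:
                out.append(label)
                break
        else:
            # Guard against floating-point overshoot past the last edge
            out.append(labels[-1])
    return out


speeds = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
edges = equal_interval_edges(speeds, k=3)
labels = make_labels(min(speeds), edges, suffix=" km/h", ranges=True)
classified = classify(speeds, edges, labels)
```

Going by the documented signature, the corresponding library call would presumably look like `apply_classification(df, "speed", scheme="equal_interval", k=3, label_ranges=True, label_suffix=" km/h")`, with the result written to a "speed_classified" column. The labels it actually produces may be formatted differently from this sketch.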

ecoscope.analysis.classifier.apply_color_map(dataframe, input_column_name, cmap, output_column_name=None)

Creates a new column on the provided dataframe with the given cmap applied over the specified input column.

Args:
  • dataframe (pd.DataFrame): The data.

  • input_column_name (str): The dataframe column whose values will inform the cmap values.

  • cmap (str, list, dict): Either a named mpl.colormap, a list of string hex values, or a dict mapping values to hex color strings. When a dict is provided, each key is a data value and each value is a hex color string (e.g. {“stop”: “#FF0000”, “go”: “#00FF00”}). Data values not present in the dict are set as fully transparent.

  • output_column_name (str): The dataframe column that will contain the color map values. Defaults to “<input_column_name>_colormap”.

Returns: The input dataframe with a color map column appended.

Parameters:
  • dataframe (pandas.DataFrame)

  • input_column_name (str)

  • cmap (str | list[HexColor] | dict[ColorValue, HexColor])

  • output_column_name (str | None)

Return type:

pandas.DataFrame
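The dict form of cmap can be illustrated with a short pure-Python sketch. This is not the library's code: the helper names (hex_to_rgba, apply_dict_cmap) are hypothetical, and representing colors as (r, g, b, a) tuples of 0-255 integers is an assumption about the output format. Only the rule that unmapped values become fully transparent comes from the description above.

```python
def hex_to_rgba(hex_color):
    """Convert '#RRGGBB' or '#RRGGBBAA' to an (r, g, b, a) tuple of 0-255 ints."""
    digits = hex_color.lstrip("#")
    if len(digits) == 6:
        digits += "FF"  # no alpha given: treat the color as fully opaque
    if len(digits) != 8:
        raise ValueError(f"unsupported hex color: {hex_color!r}")
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 8, 2))


# Assumed representation of "fully transparent": alpha channel set to 0
TRANSPARENT = (0, 0, 0, 0)


def apply_dict_cmap(values, cmap):
    """Map each data value to its color; values missing from cmap are transparent."""
    return [hex_to_rgba(cmap[v]) if v in cmap else TRANSPARENT for v in values]


signals = ["stop", "go", "wait"]
cmap = {"stop": "#FF0000", "go": "#00FF00"}
colors = apply_dict_cmap(signals, cmap)
```

Here "wait" has no entry in the dict, so it receives the transparent color instead of raising an error, which matches the documented behavior for unmapped data values.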

ecoscope.analysis.classifier.classify_percentile(df, percentile_levels, input_column_name, output_column_name='percentile')

Creates a new column on the provided dataframe with the percentile bin of the input column. Uses much the same methodology as get_percentile_area, but applies generally to a numeric dataframe column instead of a raster grid.

Args:
  • df (pd.DataFrame | gpd.GeoDataFrame): The data.

  • percentile_levels (list[int]): List of k-th percentile scores.

  • input_column_name (str): The column to apply classification to.

  • output_column_name (str): The dataframe column that will contain the classification. Defaults to “percentile”.

Returns: The input dataframe with a percentile classification column appended.

Parameters:
  • df (pandas.DataFrame | geopandas.GeoDataFrame)

  • percentile_levels (list[int])

  • input_column_name (str)

  • output_column_name (str)

Return type:

pandas.DataFrame | geopandas.GeoDataFrame
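As a rough illustration of what a "percentile bin" can mean, the sketch below labels each value with the smallest requested percentile level that covers it. The methodology shared with get_percentile_area is not spelled out here, so this substitutes the plain nearest-rank percentile definition as a stand-in; it should not be read as the library's algorithm. The helper names and the use of None for values above the highest level are assumptions.

```python
import math


def nearest_rank_percentile(sorted_values, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of an ascending list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def percentile_bins(values, levels):
    """Label each value with the smallest level whose percentile cutoff covers it."""
    ordered = sorted(values)
    cutoffs = [(level, nearest_rank_percentile(ordered, level))
               for level in sorted(levels)]
    result = []
    for v in values:
        # None marks values above the highest requested percentile (an assumption)
        result.append(next((level for level, cut in cutoffs if v <= cut), None))
    return result


densities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
bins = percentile_bins(densities, [50, 90])
```

With these ten values, the 50th percentile cutoff is 5 and the 90th is 9, so the first five values fall in the 50 bin, the next four in the 90 bin, and the largest value in neither.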