ecoscope.analysis.classifier
Module Contents
- type ecoscope.analysis.classifier.ColorValue = str | float
- type ecoscope.analysis.classifier.HexColor = str
- ecoscope.analysis.classifier.classification_methods
- ecoscope.analysis.classifier.apply_classification(dataframe, input_column_name, output_column_name=None, labels=None, scheme='natural_breaks', label_prefix='', label_suffix='', label_ranges=False, label_decimals=1, **kwargs)
Classifies the data in a DataFrame column using the specified classification scheme.
Args:
dataframe (pd.DataFrame): The data.
input_column_name (str): The dataframe column to classify.
output_column_name (str): The dataframe column that will contain the classification. Defaults to “<input_column_name>_classified”.
labels (list[str]): Labels of the bins. If None, the bin edges are used as labels.
scheme (str): Classification scheme to use. One of: equal_interval, natural_breaks, quantile, std_mean, max_breaks, fisher_jenks.
label_prefix (str): String prepended to each label.
label_suffix (str): String appended to each label.
label_ranges (bool): Applicable only when ‘labels’ is not set. If True, generated labels will be the range between bin edges, rather than the bin edges themselves.
label_decimals (int): Applicable only when ‘labels’ is not set. Specifies the number of decimal places in the label.
**kwargs: Additional keyword arguments specific to the classification scheme, passed to mapclassify. See below.
Applicable to equal_interval, natural_breaks, quantile, max_breaks and fisher_jenks: k (int): The number of classes required.
Applicable only to natural_breaks: initial (int): The number of initial solutions generated with different centroids. The best of the initial results is returned.
Applicable only to max_breaks: mindiff (float): The minimum difference between class breaks.
Applicable only to std_mean: multiples (numpy.array): The multiples of the standard deviation to add/subtract from the sample mean to define the bins.
Applicable only to std_mean: anchor (bool): Anchor upper bound of one class to the sample mean.
For more information, see https://pysal.org/mapclassify/api.html
Returns: The input dataframe with a classification column appended.
- Parameters:
dataframe (pandas.DataFrame)
input_column_name (str)
output_column_name (str | None)
labels (list[str] | None)
scheme (Literal['equal_interval', 'natural_breaks', 'quantile', 'std_mean', 'max_breaks', 'fisher_jenks'])
label_prefix (str)
label_suffix (str)
label_ranges (bool)
label_decimals (int)
- Return type:
pandas.DataFrame
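To make the labelling options above concrete, here is a minimal standard-library sketch of equal-interval binning. It is not ecoscope's or mapclassify's implementation: the helper names (`equal_interval_edges`, `make_labels`, `classify`), the choice of upper bin edges as the default labels, the "lower - upper" range format, and the edge-inclusion rule are all assumptions made for illustration. The library's actual label text may differ.

```python
from bisect import bisect_left


def equal_interval_edges(values, k):
    """Return the k upper bin edges that split [min, max] into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k)]


def make_labels(values, edges, label_ranges=False, label_decimals=1,
                label_prefix="", label_suffix=""):
    """Build one label per bin: the upper edge, or a 'lower - upper' range."""
    if label_ranges:
        lowers = [min(values)] + edges[:-1]
        texts = [
            f"{lower:.{label_decimals}f} - {upper:.{label_decimals}f}"
            for lower, upper in zip(lowers, edges)
        ]
    else:
        texts = [f"{edge:.{label_decimals}f}" for edge in edges]
    return [f"{label_prefix}{text}{label_suffix}" for text in texts]


def classify(values, edges, labels):
    """Give each value the label of the first bin whose upper edge is >= value."""
    last = len(edges) - 1
    # min(..., last) guards against float rounding pushing a value past the top edge.
    return [labels[min(bisect_left(edges, v), last)] for v in values]


speeds = [0.0, 2.0, 4.0, 5.0, 9.0, 10.0]  # illustrative data, not from the docs
edges = equal_interval_edges(speeds, k=2)
edge_labels = make_labels(speeds, edges, label_suffix=" km/h")
range_labels = make_labels(speeds, edges, label_ranges=True, label_suffix=" km/h")
classified = classify(speeds, edges, range_labels)
```

With the real function, the equivalent request would be expressed through the documented parameters, for example `scheme='equal_interval'`, `k=2`, `label_ranges=True` and `label_suffix=' km/h'`.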
- ecoscope.analysis.classifier.apply_color_map(dataframe, input_column_name, cmap, output_column_name=None)
Creates a new column on the provided dataframe with the given cmap applied over the specified input column.
Args:
dataframe (pd.DataFrame): The data.
input_column_name (str): The dataframe column whose values will inform the cmap values.
cmap (str, list, dict): Either a named matplotlib colormap, a list of hex color strings, or a dict mapping values to hex color strings. When a dict is provided, each key is a data value and each value is a hex color string (e.g. {“stop”: “#FF0000”, “go”: “#00FF00”}). Data values not present in the dict are set as fully transparent.
output_column_name (str): The dataframe column that will contain the mapped colors. Defaults to “<input_column_name>_colormap”.
Returns: The input dataframe with a color map column appended.
- Parameters:
dataframe (pandas.DataFrame)
input_column_name (str)
cmap (str | list[HexColor] | dict[ColorValue, HexColor])
output_column_name (str | None)
- Return type:
pandas.DataFrame
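The dict form of `cmap` described above can be pictured with a small standard-library sketch. This is an illustration only, not the library's code: the helper names `hex_to_rgba` and `apply_dict_cmap` are hypothetical, and representing colors as `(r, g, b, a)` tuples of 0-255 integers is an assumption about the output format.

```python
TRANSPARENT = (0, 0, 0, 0)  # assumed encoding of "fully transparent"


def hex_to_rgba(hex_color):
    """Convert '#RRGGBB' or '#RRGGBBAA' to an (r, g, b, a) tuple of 0-255 ints."""
    digits = hex_color.lstrip("#")
    if len(digits) == 6:
        digits += "FF"  # no alpha given: treat the color as fully opaque
    if len(digits) != 8:
        raise ValueError(f"not a hex color: {hex_color!r}")
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 8, 2))


def apply_dict_cmap(values, cmap):
    """Map each data value to its color; values missing from cmap become transparent."""
    return [hex_to_rgba(cmap[v]) if v in cmap else TRANSPARENT for v in values]


cmap = {"stop": "#FF0000", "go": "#00FF00"}
colors = apply_dict_cmap(["stop", "go", "wait"], cmap)
```

Here "wait" has no entry in the dict, so it receives the transparent color, which mirrors the documented fallback for unmapped data values.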
- ecoscope.analysis.classifier.classify_percentile(df, percentile_levels, input_column_name, output_column_name='percentile')
Creates a new column on the provided dataframe with the percentile bin of the input column. Uses much the same methodology as get_percentile_area, but applies generally to a numeric dataframe column instead of a raster grid.
Args:
df (pd.DataFrame | gpd.GeoDataFrame): The data.
percentile_levels (list[int]): List of k-th percentile scores.
input_column_name (str): The column to apply classification to.
output_column_name (str): The dataframe column that will contain the classification. Defaults to “percentile”.
Returns: The input dataframe with percentile classification appended.
- Parameters:
df (pandas.DataFrame | geopandas.GeoDataFrame)
percentile_levels (list[int])
input_column_name (str)
output_column_name (str)
- Return type:
pandas.DataFrame | geopandas.GeoDataFrame