cat2cat.dataclass

Classes

cat2cat_data

The dataclass to represent the data argument used in the cat2cat procedure

cat2cat_mappings

The dataclass to represent the mappings argument used in the cat2cat procedure

cat2cat_ml

The dataclass to represent the ml argument used in the cat2cat procedure

Module Contents

class cat2cat.dataclass.cat2cat_data

The dataclass to represent the data argument used in the cat2cat procedure

Parameters:
  • old (DataFrame) – data for the older time point in a panel; must contain the columns named in the other arguments.

  • new (DataFrame) – data for the newer time point in a panel; must contain the columns named in the other arguments.

  • cat_var_old (str) – name of the categorical variable in the older time point.

  • cat_var_new (str) – name of the categorical variable in the newer time point.

  • time_var (str) – name of the time variable.

  • id_var (Optional[str]) – name of the unique identifier variable; if specified, the direct mapping is applied to subjects observed in both periods.

  • multiplier_var (Optional[str]) – name of the multiplier variable, i.e. the number of replications needed to reproduce the population.
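To illustrate the id_var rule, the sketch below contrasts the direct mapping with the candidate-based mapping. It is a minimal stdlib sketch, not cat2cat's implementation. It assumes a backward mapping (older-period observations receive newer-period categories), represents each period as (id, category) pairs, and takes a prebuilt dict of transition-table candidates. `candidates_with_direct_mapping` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def candidates_with_direct_mapping(
    old_rows: Sequence[Tuple[Hashable, str]],
    new_rows: Sequence[Tuple[Hashable, str]],
    trans_candidates: Dict[str, List[str]],
) -> Dict[Hashable, List[str]]:
    """Candidate newer-period categories for each older-period subject.

    Hypothetical helper; assumes a backward mapping.
    """
    new_by_id = dict(new_rows)  # subject id -> category observed in the newer period
    result: Dict[Hashable, List[str]] = {}
    for subject_id, old_cat in old_rows:
        if subject_id in new_by_id:
            # Observed in both periods: direct mapping to the known category.
            result[subject_id] = [new_by_id[subject_id]]
        else:
            # Not matched by id: every transition-table candidate stays possible.
            result[subject_id] = list(trans_candidates.get(old_cat, []))
    return result


old_rows = [(1, "A"), (2, "A"), (3, "B")]
new_rows = [(1, "A2"), (4, "B1")]
trans_candidates = {"A": ["A1", "A2"], "B": ["B1"]}
candidates = candidates_with_direct_mapping(old_rows, new_rows, trans_candidates)
```

Subject 1 is observed in both periods, so it keeps only its observed newer category, while subject 2 remains spread over every candidate of category A.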

old: pandas.DataFrame
new: pandas.DataFrame
cat_var_old: str
cat_var_new: str
time_var: str
id_var: str | None = None
multiplier_var: str | None = None
__post_init__() -> None
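The requirement that old and new contain the columns named in the other arguments can be checked up front, which is presumably the kind of validation __post_init__ performs (its exact behaviour is not documented here). A minimal sketch, assuming each frame needs its own categorical column, the time column, and any optional id_var or multiplier_var column that was set. `missing_columns` is a hypothetical helper that accepts any iterable of column names, such as a pandas DataFrame's `.columns`.

```python
from typing import Iterable, List, Optional


def missing_columns(
    columns: Iterable[str],
    cat_var: str,
    time_var: str,
    id_var: Optional[str] = None,
    multiplier_var: Optional[str] = None,
) -> List[str]:
    """Return required column names that are absent from `columns`.

    Hypothetical helper: assumes each frame needs its own categorical
    column, the time column, and any optional column that was set.
    """
    present = set(columns)
    required = [cat_var, time_var]
    # Optional columns are only required when they were actually specified.
    required += [name for name in (id_var, multiplier_var) if name is not None]
    return [name for name in required if name not in present]


# With pandas, `columns` would typically be `old.columns` or `new.columns`.
old_columns = ["code", "year", "salary"]
print(missing_columns(old_columns, "code", "year", id_var="id"))
```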
class cat2cat.dataclass.cat2cat_mappings

The dataclass to represent the mappings argument used in the cat2cat procedure

Parameters:
  • trans (DataFrame) – mapping (transition) table with 2 columns, the old and the new encoding; all categories of the categorical variables in the old and new datasets must be included.

  • direction (str) – direction of the mapping, either “backward” or “forward”.

  • freqs (Optional[Dict[Any, int]]) – artificial counts for each level of the categorical variable in the base period. If not provided, they are assessed automatically. Although optional, it is often needed because it gives more control.
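A natural use of freqs is to split an observation with several candidate categories across them in proportion to base-period counts. A minimal sketch under that assumption, which also assumes that automatically assessed counts are plain category frequencies in the base period. `base_period_freqs` and `freq_weights` are hypothetical names, and the fallback to equal weights when every count is zero is a choice made for this sketch only.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def base_period_freqs(base_categories: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Count how often each category occurs in the base period."""
    return dict(Counter(base_categories))


def freq_weights(
    candidates: Sequence[Hashable], freqs: Dict[Hashable, int]
) -> Dict[Hashable, float]:
    """Split one observation over its candidates in proportion to `freqs`."""
    if not candidates:
        return {}
    counts = [freqs.get(c, 0) for c in candidates]
    total = sum(counts)
    if total == 0:
        # Sketch-only choice: fall back to equal weights if no candidate was seen.
        return {c: 1 / len(candidates) for c in candidates}
    return {c: n / total for c, n in zip(candidates, counts)}


freqs = base_period_freqs(["A1", "A1", "A1", "A2"])
weights = freq_weights(["A1", "A2"], freqs)
```

Passing hand-chosen counts instead of base_period_freqs output is what supplying freqs explicitly would correspond to in this sketch.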

Note

The mapping (transition) table should have a candidate for each category in the period targeted for an update. Observations from the period targeted for an update that have no matching category in the base period are removed.
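The note above can be made concrete with a lookup table built from trans. A minimal sketch, assuming trans is given as (old, new) pairs, that “backward” updates the older period with newer-period codes, and that “forward” does the reverse. `candidate_table` and `drop_unmatched` are hypothetical helpers, not part of cat2cat.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, Iterable, List, Tuple


def candidate_table(
    trans: Iterable[Tuple[Hashable, Hashable]], direction: str
) -> Dict[Hashable, List[Hashable]]:
    """Map each category of the period being updated to its candidate categories."""
    if direction not in ("backward", "forward"):
        raise ValueError('direction must be "backward" or "forward"')
    table: DefaultDict[Hashable, List[Hashable]] = defaultdict(list)
    for old_code, new_code in trans:
        # Assumed semantics: "backward" gives old codes new-code candidates.
        key, candidate = (old_code, new_code) if direction == "backward" else (new_code, old_code)
        if candidate not in table[key]:
            table[key].append(candidate)
    return dict(table)


def drop_unmatched(
    categories: Iterable[Hashable], table: Dict[Hashable, List[Hashable]]
) -> List[Hashable]:
    """Keep only observations whose category has at least one candidate."""
    return [c for c in categories if table.get(c)]


trans = [("A", "A1"), ("A", "A2"), ("B", "B1")]
backward = candidate_table(trans, "backward")
kept = drop_unmatched(["A", "B", "C", "A"], backward)
```

The observation with category C has no candidate in trans, so it is dropped, mirroring the removal rule in the note.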

trans: pandas.DataFrame
direction: str
freqs: Dict[Any, int] | None = None
__post_init__() -> None
class cat2cat.dataclass.cat2cat_ml

The dataclass to represent the ml argument used in the cat2cat procedure

Parameters:
  • data (DataFrame) – dataset with features and the cat_var.

  • cat_var (str) – the dependent variable name.

  • features (Sequence[str]) – list of feature names. Numeric/logical columns are used directly; categorical/object/string columns are one-hot encoded by the ML helpers.

  • models (Sequence[ClassifierMixin]) – scikit-learn classifier instances.

  • on_fail (str) – how failed ML weights are handled: “freq”, “naive”, “na”, or “error”.

  • fail_warn (bool) – whether to warn when failed ML weights are replaced or kept as missing.
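The four on_fail options can be read as a fallback policy applied whenever the classifiers cannot produce usable weights. A minimal sketch of that policy, assuming a failure means either no weights at all (None) or any NaN weight, and that frequency weights are available for the “freq” fallback. `resolve_failed_weights` is a hypothetical helper, and the exception type raised for “error” is an assumption.

```python
import math
import warnings
from typing import Dict, Hashable, Optional

ON_FAIL_OPTIONS = ("freq", "naive", "na", "error")


def resolve_failed_weights(
    ml_weights: Optional[Dict[Hashable, float]],
    freq_weights: Dict[Hashable, float],
    on_fail: str = "freq",
    fail_warn: bool = True,
) -> Dict[Hashable, float]:
    """Return ML weights, or a fallback chosen by `on_fail` if they failed.

    Hypothetical helper: a failure is assumed to be None or any NaN weight.
    `freq_weights` also supplies the list of candidate categories.
    """
    if on_fail not in ON_FAIL_OPTIONS:
        raise ValueError(f"on_fail must be one of {ON_FAIL_OPTIONS}")
    if ml_weights is not None and not any(math.isnan(w) for w in ml_weights.values()):
        return ml_weights  # ML weights are usable as-is
    if on_fail == "error":
        raise RuntimeError("ML weights could not be computed")
    candidates = list(freq_weights)
    if on_fail == "freq":
        result = dict(freq_weights)
    elif on_fail == "naive":
        # Equal weight for every candidate category.
        result = {c: 1 / len(candidates) for c in candidates}
    else:  # "na": keep the weights as missing
        result = {c: math.nan for c in candidates}
    if fail_warn:
        action = "kept as missing" if on_fail == "na" else f"replaced using {on_fail!r}"
        warnings.warn(f"ML weights failed and were {action}")
    return result


fw = {"A1": 0.75, "A2": 0.25}
naive = resolve_failed_weights(None, fw, on_fail="naive", fail_warn=False)
```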

data: pandas.DataFrame
cat_var: str
features: Sequence[str]
models: Sequence[sklearn.base.ClassifierMixin]
on_fail: str = 'freq'
fail_warn: bool = True
__post_init__() -> None
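The features description says numeric and logical columns are used directly, while categorical, object and string columns are one-hot encoded. A stdlib sketch of that rule, working on a list of row dicts rather than a DataFrame. `encode_features` is a hypothetical name and does not reproduce the cat2cat ML helpers; in pandas the same effect is usually obtained with pandas.get_dummies.

```python
from numbers import Number
from typing import Any, Dict, List, Mapping, Sequence


def encode_features(
    rows: Sequence[Mapping[str, Any]], features: Sequence[str]
) -> List[Dict[str, float]]:
    """Numeric/bool features pass through; others become 0/1 indicator columns."""
    # Levels of every non-numeric feature, kept in first-seen order.
    levels: Dict[str, List[Any]] = {}
    for feature in features:
        values = [row[feature] for row in rows]
        if not all(isinstance(v, Number) for v in values):
            levels[feature] = list(dict.fromkeys(values))

    encoded: List[Dict[str, float]] = []
    for row in rows:
        out: Dict[str, float] = {}
        for feature in features:
            if feature in levels:
                # One-hot: one indicator column per observed level.
                for level in levels[feature]:
                    out[f"{feature}_{level}"] = 1.0 if row[feature] == level else 0.0
            else:
                out[feature] = float(row[feature])
        encoded.append(out)
    return encoded


rows = [{"age": 30, "sex": "F"}, {"age": 45, "sex": "M"}]
encoded = encode_features(rows, ["age", "sex"])
```

A fitted classifier must see the same indicator columns at prediction time, so in practice the levels collected during training would be stored and reused rather than recomputed.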