cat2cat.dataclass
Classes
- cat2cat_data – The dataclass to represent the data argument used in the cat2cat procedure.
- cat2cat_mappings – The dataclass to represent the mappings argument used in the cat2cat procedure.
- cat2cat_ml – The dataclass to represent the ml argument used in the cat2cat procedure.
Module Contents
- class cat2cat.dataclass.cat2cat_data
The dataclass to represent the data argument used in the cat2cat procedure
- Parameters:
old (DataFrame) – older time point in a panel; must contain all columns named in the other arguments.
new (DataFrame) – newer time point in a panel; must contain all columns named in the other arguments.
cat_var_old (str) – name of the categorical variable in the older time point.
cat_var_new (str) – name of the categorical variable in the newer time point.
time_var (str) – name of the time variable.
id_var (Optional[str]) – name of the unique identifier variable; if specified, the direct mapping is applied to subjects observed in both periods.
multiplier_var (Optional[str]) – name of the multiplier variable, i.e. the number of replications needed to reproduce the population.
- old: pandas.DataFrame
- new: pandas.DataFrame
- cat_var_old: str
- cat_var_new: str
- time_var: str
- id_var: str | None = None
- multiplier_var: str | None = None
- __post_init__() → None
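The parameter descriptions say that `old` and `new` must contain the columns named in the other arguments. As a minimal sketch of that kind of check, assuming each period is a plain list of row dictionaries rather than a pandas DataFrame, the hypothetical `DataArgSketch` dataclass below validates column presence in `__post_init__`. It is not cat2cat's implementation; requiring each categorical column only in its own period, and `time_var`, `id_var` and `multiplier_var` in both, is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical stand-in: each period is a list of row dicts instead of a pandas DataFrame.
Rows = List[Dict[str, Any]]


def missing_columns(rows: Rows, required: List[str]) -> List[str]:
    """Return the required column names that appear in no row (hypothetical helper)."""
    present = set().union(*(row.keys() for row in rows))
    return [col for col in required if col not in present]


@dataclass
class DataArgSketch:
    """Hypothetical mirror of cat2cat_data, used only to illustrate validation."""

    old: Rows
    new: Rows
    cat_var_old: str
    cat_var_new: str
    time_var: str
    id_var: Optional[str] = None
    multiplier_var: Optional[str] = None

    def __post_init__(self) -> None:
        # Optional columns are required only when they are set.
        shared = [self.time_var] + [
            col for col in (self.id_var, self.multiplier_var) if col is not None
        ]
        # Assumption: each categorical column is required only in its own period.
        for label, rows, cat_var in (
            ("old", self.old, self.cat_var_old),
            ("new", self.new, self.cat_var_new),
        ):
            missing = missing_columns(rows, [cat_var] + shared)
            if missing:
                raise ValueError(f"{label} is missing columns: {missing}")


old_rows = [{"code": "1111", "year": 2008}]
new_rows = [{"code": "2111", "year": 2010}]
DataArgSketch(old_rows, new_rows, "code", "code", "year")  # passes validation
try:
    DataArgSketch(old_rows, new_rows, "code", "code", "year", id_var="person")
except ValueError as err:
    print(err)  # old is missing columns: ['person']
```

Failing early with a `ValueError` keeps a misspelled column name from surfacing later as a confusing lookup error inside the procedure.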
- class cat2cat.dataclass.cat2cat_mappings
The dataclass to represent the mappings argument used in the cat2cat procedure
- Parameters:
trans (DataFrame) – mapping (transition) table (with 2 columns, old and new encoding) - all categories for cat_var in old and new datasets have to be included.
direction (str) – “backward” or “forward”.
freqs (Optional[Dict[Any, int]]) – artificial counts for each variable level in the base period. Optional; if not provided, the counts are assessed automatically. It will nevertheless often be needed, as it gives more control.
Note
The mapping (transition) table should contain a candidate for each category in the period targeted for the update. Observations from the targeted period whose category has no match in the base period are removed.
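The note describes two behaviours: each category in the period being updated gets a list of candidate categories from the base period, and observations with no candidate are dropped. A minimal sketch of that logic, with the transition table reduced to a list of `(old_code, new_code)` pairs; `candidates` and `expand` are hypothetical helpers, and reading “backward” as re-encoding the older period into the newer encoding (and “forward” as the reverse) is an assumption, not something this page states.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for the trans DataFrame: (old_code, new_code) pairs.
Trans = List[Tuple[str, str]]


def candidates(trans: Trans, direction: str) -> Dict[str, List[str]]:
    """Map each category of the updated period to its base-period candidates.

    Assumption: "backward" re-encodes the older period into the newer encoding,
    "forward" re-encodes the newer period into the older encoding.
    """
    if direction not in ("backward", "forward"):
        raise ValueError('direction must be "backward" or "forward"')
    out: Dict[str, List[str]] = defaultdict(list)
    for old_code, new_code in trans:
        key, cand = (old_code, new_code) if direction == "backward" else (new_code, old_code)
        if cand not in out[key]:
            out[key].append(cand)
    return dict(out)


def expand(codes: List[str], cands: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Replicate each observation once per candidate; codes without candidates are dropped."""
    return [(code, cand) for code in codes for cand in cands.get(code, [])]


trans = [("A", "A1"), ("A", "A2"), ("B", "B1")]
cands = candidates(trans, "backward")
print(cands)  # {'A': ['A1', 'A2'], 'B': ['B1']}
print(expand(["A", "B", "C"], cands))  # [('A', 'A1'), ('A', 'A2'), ('B', 'B1')]
```

The observation coded "C" has no entry in the table, so it disappears from the expanded result, which is why the table should cover every category of the targeted period.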
- trans: pandas.DataFrame
- direction: str
- freqs: Dict[Any, int] | None = None
- __post_init__() → None
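When `freqs` is omitted the counts have to be derived from the data. One plausible reading is that each base-period category is counted and an observation's candidates are weighted in proportion to those counts. The sketch below shows that reading with hypothetical helpers `base_freqs` and `freq_weights`; the equal-weight fallback when every candidate count is zero is an assumption of this sketch, not documented cat2cat behaviour.

```python
from collections import Counter
from typing import Dict, Hashable, List, Mapping


def base_freqs(base_codes: List[Hashable]) -> Dict[Hashable, int]:
    """Count each category in the base period (one way freqs could be derived automatically)."""
    return dict(Counter(base_codes))


def freq_weights(cands: List[Hashable], freqs: Mapping[Hashable, int]) -> List[float]:
    """Weights proportional to base-period counts; equal weights if all counts are zero (assumption)."""
    counts = [freqs.get(c, 0) for c in cands]
    total = sum(counts)
    if total == 0:
        return [1 / len(cands)] * len(cands)
    return [n / total for n in counts]


freqs = base_freqs(["A1", "A1", "A1", "A2"])
print(freqs)  # {'A1': 3, 'A2': 1}
print(freq_weights(["A1", "A2"], freqs))  # [0.75, 0.25]
print(freq_weights(["A3", "A4"], freqs))  # [0.5, 0.5]
```

Passing an explicit dictionary as `freqs` replaces the automatically derived counts, which is where the extra control mentioned in the parameter description comes from.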
- class cat2cat.dataclass.cat2cat_ml
The dataclass to represent the ml argument used in the cat2cat procedure
- Parameters:
data (DataFrame) – dataset with features and the cat_var.
cat_var (str) – the dependent variable name.
features (Sequence[str]) – list of feature names. Numeric/logical columns are used directly; categorical/object/string columns are one-hot encoded by the ML helpers.
models (Sequence[ClassifierMixin]) – scikit-learn classifier instances.
on_fail (str) – how failed ML weights are handled: “freq”, “naive”, “na”, or “error”.
fail_warn (bool) – whether to warn when failed ML weights are replaced or retained as missing.
- data: pandas.DataFrame
- cat_var: str
- features: Sequence[str]
- models: Sequence[sklearn.base.ClassifierMixin]
- on_fail: str = 'freq'
- fail_warn: bool = True
- __post_init__() → None
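The `on_fail` and `fail_warn` fields control what happens when ML weights cannot be produced. A minimal sketch of that dispatch for one observation's candidates, assuming a failed model is signalled by `None` or by NaN weights; `resolve_failed` is a hypothetical helper, and the frequency weights it falls back on are passed in rather than computed here.

```python
import math
import warnings
from typing import List, Optional, Sequence


def resolve_failed(
    ml_weights: Optional[Sequence[float]],
    freq_w: Sequence[float],
    on_fail: str = "freq",
    fail_warn: bool = True,
) -> List[float]:
    """Return usable weights for one observation's candidates (hypothetical helper).

    Assumption: a failed ML step is signalled by None or by any NaN weight.
    """
    if on_fail not in ("freq", "naive", "na", "error"):
        raise ValueError('on_fail must be "freq", "naive", "na" or "error"')
    if ml_weights is not None and not any(math.isnan(w) for w in ml_weights):
        return list(ml_weights)  # ML succeeded: keep its weights
    if on_fail == "error":
        raise RuntimeError("ML weights could not be computed")
    n = len(freq_w)
    if on_fail == "freq":
        out = list(freq_w)  # fall back to frequency weights
    elif on_fail == "naive":
        out = [1 / n] * n  # equal weight per candidate
    else:  # "na": keep the failure visible as missing values
        out = [math.nan] * n
    if fail_warn:
        action = "kept as missing" if on_fail == "na" else f"replaced with {on_fail} weights"
        warnings.warn(f"failed ML weights {action}")
    return out


print(resolve_failed(None, [0.75, 0.25], on_fail="naive", fail_warn=False))  # [0.5, 0.5]
```

The default "freq" keeps every observation usable, while "error" suits pipelines where a silent fallback would hide a modelling problem.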