cat2cat.dataclass
=================

.. py:module:: cat2cat.dataclass


Classes
-------

.. autoapisummary::

   cat2cat.dataclass.cat2cat_data
   cat2cat.dataclass.cat2cat_mappings
   cat2cat.dataclass.cat2cat_ml


Module Contents
---------------

.. py:class:: cat2cat_data

   The dataclass representing the ``data`` argument used in the cat2cat procedure.

   :param old: older time point in a panel; it has to contain every column named in the remaining arguments.
   :type old: DataFrame
   :param new: newer time point in a panel; it has to contain every column named in the remaining arguments.
   :type new: DataFrame
   :param cat_var_old: name of the categorical variable in the older time point.
   :type cat_var_old: str
   :param cat_var_new: name of the categorical variable in the newer time point.
   :type cat_var_new: str
   :param time_var: name of the time variable.
   :type time_var: str
   :param id_var: name of the unique identifier variable. If specified, the direct mapping is applied to subjects observed in both periods.
   :type id_var: Optional[str]
   :param multiplier_var: name of the multiplier variable, i.e. the number of replications needed to reproduce the population.
   :type multiplier_var: Optional[str]


   .. py:attribute:: old
      :type: pandas.DataFrame


   .. py:attribute:: new
      :type: pandas.DataFrame


   .. py:attribute:: cat_var_old
      :type: str


   .. py:attribute:: cat_var_new
      :type: str


   .. py:attribute:: time_var
      :type: str


   .. py:attribute:: id_var
      :type: Optional[str]
      :value: None


   .. py:attribute:: multiplier_var
      :type: Optional[str]
      :value: None


   .. py:method:: __post_init__() -> None


.. py:class:: cat2cat_mappings

   The dataclass representing the ``mappings`` argument used in the cat2cat procedure.

   :param trans: mapping (transition) table with 2 columns, the old and the new encoding. All categories of ``cat_var`` in the old and new datasets have to be included.
   :type trans: DataFrame
   :param direction: ``"backward"`` or ``"forward"``.
   :type direction: str
   :param freqs: artificial counts for each variable level in the base period. Optional, and assessed automatically when not provided; nevertheless it is often needed, as it gives more control.
   :type freqs: Optional[Dict[Any, int]]

   .. note::

      The mapping (transition) table should have a candidate for each category in the period targeted for an update. An observation from that period without a matching category in the base period is removed.


   .. py:attribute:: trans
      :type: pandas.DataFrame


   .. py:attribute:: direction
      :type: str


   .. py:attribute:: freqs
      :type: Optional[Dict[Any, int]]
      :value: None


   .. py:method:: __post_init__() -> None


.. py:class:: cat2cat_ml

   The dataclass representing the ``ml`` argument used in the cat2cat procedure.

   :param data: dataset with the features and the ``cat_var``.
   :type data: DataFrame
   :param cat_var: name of the dependent variable.
   :type cat_var: str
   :param features: list of feature names. Numeric/logical columns are used directly; categorical/object/string columns are one-hot encoded by the ML helpers.
   :type features: Sequence[str]
   :param models: scikit-learn classifier instances.
   :type models: Sequence[ClassifierMixin]
   :param on_fail: how failed ML weights are handled: ``"freq"``, ``"naive"``, ``"na"``, or ``"error"``.
   :type on_fail: str
   :param fail_warn: whether to warn when failed ML weights are replaced or retained as missing.
   :type fail_warn: bool


   .. py:attribute:: data
      :type: pandas.DataFrame


   .. py:attribute:: cat_var
      :type: str


   .. py:attribute:: features
      :type: Sequence[str]


   .. py:attribute:: models
      :type: Sequence[sklearn.base.ClassifierMixin]


   .. py:attribute:: on_fail
      :type: str
      :value: 'freq'


   .. py:attribute:: fail_warn
      :type: bool
      :value: True


   .. py:method:: __post_init__() -> None
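
The input requirements of ``cat2cat_data`` and ``cat2cat_mappings`` can be checked with plain pandas before the dataclasses are built. The sketch below is not part of cat2cat: ``check_inputs`` and ``base_freqs`` are hypothetical helpers, and the toy columns ``code`` and ``year`` are invented. Reading "has to contain every column named in the remaining arguments" as "``old`` needs ``cat_var_old``, ``new`` needs ``cat_var_new``, and both need the shared columns" is an assumption. Counting category levels in the base period is only one plausible way to build ``freqs``; it is not a description of how the package assesses them automatically.

```python
from typing import Any, Dict, List, Optional

import pandas as pd

VALID_DIRECTIONS = ("backward", "forward")


def check_inputs(
    old: pd.DataFrame,
    new: pd.DataFrame,
    trans: pd.DataFrame,
    cat_var_old: str,
    cat_var_new: str,
    time_var: str,
    direction: str,
    id_var: Optional[str] = None,
    multiplier_var: Optional[str] = None,
) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems: List[str] = []
    # Assumption: each period needs its own categorical column plus the shared ones.
    shared = [c for c in (time_var, id_var, multiplier_var) if c is not None]
    for label, df, cat_var in (("old", old, cat_var_old), ("new", new, cat_var_new)):
        missing = [c for c in [cat_var, *shared] if c not in df.columns]
        if missing:
            problems.append(f"{label} is missing columns: {missing}")
    # The transition table has exactly two columns: old and new encoding.
    if trans.shape[1] != 2:
        problems.append("trans must have exactly 2 columns")
    else:
        # Every observed category must appear in the matching trans column.
        for label, df, cat_var, col in (
            ("old", old, cat_var_old, 0),
            ("new", new, cat_var_new, 1),
        ):
            if cat_var in df.columns:
                unmatched = set(df[cat_var]) - set(trans.iloc[:, col])
                if unmatched:
                    problems.append(
                        f"{label} categories not in trans: {sorted(unmatched, key=str)}"
                    )
    if direction not in VALID_DIRECTIONS:
        problems.append(f"direction must be one of {VALID_DIRECTIONS}")
    return problems


def base_freqs(base: pd.DataFrame, cat_var: str) -> Dict[Any, int]:
    """Count each category level in the base period (one plausible ``freqs``)."""
    return {k: int(v) for k, v in base[cat_var].value_counts().items()}


# Invented toy panel: category "A" was split into "A1" and "A2".
old = pd.DataFrame({"code": ["A", "B", "B"], "year": [2008] * 3})
new = pd.DataFrame({"code": ["A1", "A2", "B1"], "year": [2010] * 3})
trans = pd.DataFrame({"old": ["A", "A", "B"], "new": ["A1", "A2", "B1"]})

print(check_inputs(old, new, trans, "code", "code", "year", "backward"))  # []
# The newer period is treated as the base period here purely for illustration.
freqs = base_freqs(new, "code")
```

An empty list means the toy panel meets the documented requirements, and ``freqs`` has the ``Dict[Any, int]`` shape that the ``freqs`` field expects.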
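
The ``features`` description of ``cat2cat_ml`` (numeric/logical columns used directly, categorical/object/string columns one-hot encoded) can be mimicked with pandas. This is a minimal sketch under that reading, not the package's own ML helper: ``encode_features`` is a hypothetical name and the toy columns are invented.

```python
from typing import List, Sequence

import pandas as pd
from pandas.api.types import is_numeric_dtype


def encode_features(data: pd.DataFrame, features: Sequence[str]) -> pd.DataFrame:
    """Hypothetical helper: build a model matrix from the selected features.

    Numeric and boolean columns are passed through unchanged (pandas treats
    bool as numeric); every other column is one-hot encoded.
    """
    direct: List[str] = [f for f in features if is_numeric_dtype(data[f])]
    to_encode: List[str] = [f for f in features if f not in direct]
    parts = [data[direct]]
    if to_encode:
        # One 0/1 indicator column per level, named "<feature>_<level>".
        parts.append(pd.get_dummies(data[to_encode], columns=to_encode, dtype=int))
    return pd.concat(parts, axis=1)


# Invented toy data: "code" plays the role of cat_var and is not a feature.
df = pd.DataFrame(
    {
        "age": [30, 45, 52],
        "edu": ["low", "high", "low"],
        "urban": [True, False, True],
        "code": ["A1", "A2", "B1"],
    }
)
X = encode_features(df, ["age", "edu", "urban"])
print(list(X.columns))  # ['age', 'urban', 'edu_high', 'edu_low']
```

Because ``cat2cat_ml`` is a dataclass, its fields can be passed by keyword; with cat2cat and scikit-learn installed, an illustrative call built from the attributes listed above is ``cat2cat_ml(data=df, cat_var="code", features=["age", "edu", "urban"], models=[LogisticRegression()])``, with ``LogisticRegression`` from ``sklearn.linear_model`` and ``on_fail``/``fail_warn`` left at their defaults.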