cat2cat.dataclass
=================

.. py:module:: cat2cat.dataclass


Classes
-------

.. autoapisummary::

   cat2cat.dataclass.cat2cat_data
   cat2cat.dataclass.cat2cat_mappings
   cat2cat.dataclass.cat2cat_ml


Module Contents
---------------

.. py:class:: cat2cat_data

   The dataclass to represent the data argument used in the cat2cat procedure

   :param old: older time point in a panel; must contain every column named in the other arguments.
   :type old: DataFrame
   :param new: newer time point in a panel; must contain every column named in the other arguments.
   :type new: DataFrame
   :param cat_var_old: name of the categorical variable in the older time point.
   :type cat_var_old: str
   :param cat_var_new: name of the categorical variable in the newer time point.
   :type cat_var_new: str
   :param time_var: name of the time variable.
   :type time_var: str
   :param id_var: name of the unique identifier variable. If specified, the direct mapping is applied to subjects observed in both periods.
   :type id_var: Optional[str]
   :param multiplier_var: name of the multiplier variable, i.e. the number of replications needed to reproduce the population.
   :type multiplier_var: Optional[str]


   .. py:attribute:: old
      :type:  pandas.DataFrame


   .. py:attribute:: new
      :type:  pandas.DataFrame


   .. py:attribute:: cat_var_old
      :type:  str


   .. py:attribute:: cat_var_new
      :type:  str


   .. py:attribute:: time_var
      :type:  str


   .. py:attribute:: id_var
      :type:  Optional[str]
      :value: None


   .. py:attribute:: multiplier_var
      :type:  Optional[str]
      :value: None


   .. py:method:: __post_init__() -> None
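
The parameters above can be combined as in the following minimal sketch. The toy frames and the column names ``id``, ``code`` and ``year`` are assumptions made for illustration only; real panels will have their own columns.

.. code-block:: python

   import pandas as pd
   from cat2cat.dataclass import cat2cat_data

   # Toy two-period panel (illustrative values, not real data).
   old = pd.DataFrame(
       {"id": [1, 2, 3], "code": ["A", "A", "B"], "year": [2008, 2008, 2008]}
   )
   new = pd.DataFrame(
       {"id": [1, 2, 4], "code": ["A1", "A2", "B"], "year": [2010, 2010, 2010]}
   )

   data = cat2cat_data(
       old=old,
       new=new,
       cat_var_old="code",  # categorical variable in the older period
       cat_var_new="code",  # categorical variable in the newer period
       time_var="year",
       id_var="id",         # optional; multiplier_var is left at its default (None)
   )

Both frames carry every column named in the other arguments, which is what the parameter descriptions require.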


.. py:class:: cat2cat_mappings

   The dataclass to represent the mappings argument used in the cat2cat procedure

   :param trans: mapping (transition) table with 2 columns, the old and the new encoding. All categories of cat_var in the old and new datasets have to be included.
   :type trans: DataFrame
   :param direction: "backward" or "forward"
   :type direction: str
   :param freqs: artificial counts for each variable level in the base period.
                 If not provided, the counts are assessed automatically.
                 Optional, but often needed because it gives more control.
   :type freqs: Optional[Dict[Any, int]]

   .. note::

      The mapping (transition) table should have a candidate for each category from the period targeted for an update.
      Observations from the period targeted for an update that have no matched category in the base period are removed.


   .. py:attribute:: trans
      :type:  pandas.DataFrame


   .. py:attribute:: direction
      :type:  str


   .. py:attribute:: freqs
      :type:  Optional[Dict[Any, int]]
      :value: None


   .. py:method:: __post_init__() -> None
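
To make the role of ``trans`` and ``freqs`` more concrete, the plain-Python sketch below shows one way base-period counts could be turned into weights for the candidate categories in the "backward" direction. It is an illustration under stated assumptions: ``candidate_weights`` is a hypothetical helper, not part of cat2cat, and the library's own computation may differ.

.. code-block:: python

   from collections import defaultdict
   from typing import Any, Dict, List, Tuple


   def candidate_weights(
       trans: List[Tuple[Any, Any]], freqs: Dict[Any, int]
   ) -> Dict[Any, Dict[Any, float]]:
       """Weight each old category's candidate new categories by their counts."""
       candidates: Dict[Any, List[Any]] = defaultdict(list)
       for old_code, new_code in trans:
           if new_code not in candidates[old_code]:
               candidates[old_code].append(new_code)

       weights: Dict[Any, Dict[Any, float]] = {}
       for old_code, new_codes in candidates.items():
           counts = [freqs.get(code, 0) for code in new_codes]
           total = sum(counts)
           if total == 0:
               # No counts available: fall back to equal shares.
               weights[old_code] = {code: 1 / len(new_codes) for code in new_codes}
           else:
               weights[old_code] = {
                   code: count / total for code, count in zip(new_codes, counts)
               }
       return weights


   # Transition table as (old, new) pairs and counts from the base (new) period.
   trans = [("A", "A1"), ("A", "A2"), ("B", "B")]
   freqs = {"A1": 30, "A2": 10, "B": 5}
   weights = candidate_weights(trans, freqs)
   # weights["A"] is {"A1": 0.75, "A2": 0.25}; weights["B"] is {"B": 1.0}

Supplying ``freqs`` explicitly, rather than letting it be assessed automatically, is what gives control over these shares.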


.. py:class:: cat2cat_ml

   The dataclass to represent the ml argument used in the cat2cat procedure

   :param data: dataset with features and the `cat_var`.
   :type data: DataFrame
   :param cat_var: the dependent variable name.
   :type cat_var: str
   :param features: list of feature names. Numeric/logical columns are used directly;
                    categorical/object/string columns are one-hot encoded by the ML helpers.
   :type features: Sequence[str]
   :param models: scikit-learn classifier instances.
   :type models: Sequence[ClassifierMixin]
   :param on_fail: how failed ML weights are handled: "freq", "naive", "na", or "error".
   :type on_fail: str
   :param fail_warn: warn when failed ML weights are replaced or retained as missing.
   :type fail_warn: bool


   .. py:attribute:: data
      :type:  pandas.DataFrame


   .. py:attribute:: cat_var
      :type:  str


   .. py:attribute:: features
      :type:  Sequence[str]


   .. py:attribute:: models
      :type:  Sequence[sklearn.base.ClassifierMixin]


   .. py:attribute:: on_fail
      :type:  str
      :value: 'freq'


   .. py:attribute:: fail_warn
      :type:  bool
      :value: True


   .. py:method:: __post_init__() -> None
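
A minimal sketch of building the ``ml`` argument follows. The training frame, its column names and the choice of ``DecisionTreeClassifier`` are assumptions made for illustration; any scikit-learn classifier instance should fit the ``models`` parameter as described above.

.. code-block:: python

   import pandas as pd
   from sklearn.tree import DecisionTreeClassifier
   from cat2cat.dataclass import cat2cat_ml

   # Toy training data (illustrative values): the dependent variable plus
   # two numeric features.
   train = pd.DataFrame(
       {
           "code": ["A1", "A2", "A1", "B"],
           "age": [25, 40, 31, 52],
           "salary": [2000.0, 3500.0, 2400.0, 4100.0],
       }
   )

   ml = cat2cat_ml(
       data=train,
       cat_var="code",
       features=["age", "salary"],
       models=[DecisionTreeClassifier(random_state=0)],
       # on_fail and fail_warn are left at their documented defaults.
   )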