cat2cat.cat2cat_utils
=====================

.. py:module:: cat2cat.cat2cat_utils


Functions
---------

.. autoapisummary::

   cat2cat.cat2cat_utils.prune_c2c
   cat2cat.cat2cat_utils.dummy_c2c


Module Contents
---------------

.. py:function:: prune_c2c(df: pandas.DataFrame, prune_fun: Callable[[numpy.ndarray], numpy.ndarray], wei_var: str = 'wei_freq_c2c', index_var: str = 'index_c2c', inplace: bool = False) -> pandas.DataFrame

   Prune replicated observations, which can be useful after the mapping process.

   :param df: a specific period from the cat2cat function result.
   :type df: DataFrame
   :param prune_fun: a function that takes a 1D array of weights (float) and returns a 1D boolean array of the same length.
                     The remaining weights are rescaled automatically so that they still sum to one for each original observation.
   :type prune_fun: callable
   :param wei_var: name of the weights column. By default "wei_freq_c2c".
   :type wei_var: str
   :param index_var: name of the column that identifies the original observations. By default "index_c2c".
   :type index_var: str
   :param inplace: Whether to perform the operation in place. By default False.
   :type inplace: bool

   :returns: the df argument, possibly with a reduced number of rows.
   :rtype: DataFrame
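
   The pruning and reweighting described above can be sketched with plain pandas.
   This is a minimal illustration, not the library implementation: the helper
   ``prune_sketch`` and the toy data are hypothetical, and applying ``prune_fun``
   separately to each group of ``index_var`` is an assumption.

   .. code-block:: python

      import numpy as np
      import pandas as pd


      def prune_sketch(df, prune_fun, wei_var="wei_freq_c2c", index_var="index_c2c"):
          """Hypothetical stand-in for prune_c2c, for illustration only."""
          weights = df[wei_var].to_numpy()
          keep = np.zeros(len(df), dtype=bool)
          # Assumption: prune_fun sees the weights of one original observation at a time.
          for positions in df.groupby(index_var).indices.values():
              keep[positions] = prune_fun(weights[positions])
          out = df.loc[keep].copy()
          # Rescale the surviving weights so they sum to one per original observation.
          out[wei_var] = out[wei_var] / out.groupby(index_var)[wei_var].transform("sum")
          return out


      # Toy data: observation 0 is replicated three times, observation 1 twice.
      toy = pd.DataFrame(
          {
              "index_c2c": [0, 0, 0, 1, 1],
              "g_new_c2c": ["a", "b", "c", "d", "e"],
              "wei_freq_c2c": [0.5, 0.3, 0.2, 1.0, 0.0],
          }
      )
      pruned = prune_sketch(toy, lambda x: x >= 0.3)

   In this toy case three rows survive, and the two remaining weights of
   observation 0 are rescaled from 0.5 and 0.3 so that they again sum to one.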

   .. note::

      Example ``prune_fun`` values (``arange`` and ``argmax`` come from numpy):

      - non-zero: ``lambda x: x > 0``
      - highest1: ``lambda x: arange(len(x)) == argmax(x)``
      - highest: ``lambda x: x == max(x)``
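
   To see how the three pruning functions from the note differ, they can be applied
   to a small weight vector with a tie. The vector is made up for illustration.

   .. code-block:: python

      import numpy as np

      # Hypothetical weights of one original observation; the first two are tied.
      weights = np.array([0.4, 0.4, 0.2, 0.0])

      # non-zero: keep every candidate with a positive weight.
      non_zero = weights > 0
      # -> [True, True, True, False]

      # highest1: keep exactly one candidate, the first with the maximum weight.
      highest1 = np.arange(len(weights)) == np.argmax(weights)
      # -> [True, False, False, False]

      # highest: keep all candidates that share the maximum weight.
      highest = weights == np.max(weights)
      # -> [True, True, False, False]

   With ties, ``highest`` can keep several rows per observation, while
   ``highest1`` always keeps a single row.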

   >>> from cat2cat import cat2cat
   >>> from cat2cat.dataclass import cat2cat_data, cat2cat_mappings, cat2cat_ml
   >>> from sklearn.ensemble import RandomForestClassifier
   >>> from cat2cat.datasets import load_trans, load_occup
   >>> trans = load_trans()
   >>> occup = load_occup()
   >>> o_old = occup.loc[occup.year == 2008, :].copy()
   >>> o_new = occup.loc[occup.year == 2010, :].copy()
   >>> data_c2c = cat2cat_data(o_old, o_new, "code", "code", "year")
   >>> mappings_c2c = cat2cat_mappings(trans, "forward")
   >>> c2c = cat2cat(data_c2c, mappings_c2c)
   >>> #
   >>> # non-zero - lambda x: x > 0
   >>> # highest1 - lambda x: arange(len(x)) == argmax(x)
   >>> # highest - lambda x: x == max(x)
   >>> #
   >>> # non-zero
   >>> prune_c2c(c2c["old"], lambda x: x > 0)
             id        age    sex  edu        exp  ...  index_c2c  g_new_c2c  rep_c2c wei_naive_c2c  wei_freq_c2c
   ...


.. py:function:: dummy_c2c(df: pandas.DataFrame, cat_var: str, models: Optional[Sequence] = None, inplace: bool = False) -> pandas.DataFrame

   Add default cat2cat columns to a `DataFrame`

   The function is useful for achieving consistent columns across all panel periods,
   even for those to which the cat2cat procedure was not applied.

   :param df: a specific period from the cat2cat function result.
   :type df: DataFrame
   :param cat_var: name of the categorical variable.
   :type cat_var: str
   :param models: an optional list of str with the class names of the ML models applied.
                  By default None (turned off).
   :type models: Optional[Sequence]
   :param inplace: Whether to perform the operation in place. By default False.
   :type inplace: bool

   :returns: the df argument with additional columns related to the cat2cat procedure.
             The base columns, added if they do not already exist, are: index_c2c, g_new_c2c, rep_c2c, wei_naive_c2c, wei_freq_c2c.
             Additionally, ML-model columns such as wei_MLNAME_c2c are added.
   :rtype: DataFrame
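
   The columns listed above can be pictured with a short pandas sketch. It is a
   hypothetical illustration rather than the library code: the helper
   ``dummy_sketch``, the toy data, and the default values (a running index, a
   copy of the category column, and weights of one) are all assumptions.

   .. code-block:: python

      import pandas as pd


      def dummy_sketch(df, cat_var, models=None):
          """Hypothetical stand-in for dummy_c2c, for illustration only."""
          out = df.copy()
          # Assumed defaults: each row is its own observation with a weight of one.
          defaults = {
              "index_c2c": list(range(len(out))),
              "g_new_c2c": out[cat_var],
              "rep_c2c": 1,
              "wei_naive_c2c": 1.0,
              "wei_freq_c2c": 1.0,
          }
          for name, value in defaults.items():
              # Add a base column only if it is not already present.
              if name not in out.columns:
                  out[name] = value
          # One weight column per ML model class name, e.g. wei_MLNAME_c2c.
          for model in models or []:
              out[f"wei_{model}_c2c"] = 1.0
          return out


      # Toy period to which the cat2cat procedure was not applied.
      period = pd.DataFrame({"code": ["1111", "2222"], "salary": [10, 20]})
      padded = dummy_sketch(period, "code", models=["RandomForestClassifier"])

   The padded frame then has the same cat2cat columns as the mapped periods, so
   all periods can be stacked into one panel.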