cat2cat.mappings
================

.. py:module:: cat2cat.mappings


Functions
---------

.. autoapisummary::

   cat2cat.mappings.get_mappings
   cat2cat.mappings.get_freqs
   cat2cat.mappings.cat_apply_freq


Module Contents
---------------

.. py:function:: get_mappings(x: Table) -> Dict[str, Dict[Any, List[Any]]]

   Transform a transition (mapping) table into two associative lists.

   To rearrange one classification encoding into another, an associative list
   that maps keys to values is used. Here each associative list is a dict in
   which every element pairs a key with one or more values: unique category
   codes are the keys, and the matching categories from the next or previous
   time point are the values. A transition table is used to build such
   associative lists.

   :param x: transition table with 2 columns, where the first column is assumed
             to be the older encoding.
   :type x: pandas.DataFrame or numpy.ndarray

   :returns: dict with 2 internal dicts, `to_old` and `to_new`.
   :rtype: Dict[str, Dict[Any, List[Any]]]

   .. note::

      An effort was made to handle missing values properly, but please avoid
      using NaN or None. String or float types are recommended.
      Alternatively, represent missing values as a specific number (9999) or
      string ("Missing").

   >>> from cat2cat.mappings import get_mappings
   >>> from numpy import array, nan
   >>> trans = array([
   ...     [1111, 111101], [1111, 111102], [1123, 111405], [nan, 111405],
   ...     [1212, 112006], [1212, 112008], [1212, 112090], [1212, nan],
   ... ])
   >>> mappings = get_mappings(trans)
   >>> mappings["to_old"]
   {111101.0: [1111.0], 111102.0: [1111.0], 111405.0: [1123.0, nan], 112006.0: [1212.0], 112008.0: [1212.0], 112090.0: [1212.0], nan: [1212.0]}
   >>> mappings["to_new"]
   {1111.0: [111101.0, 111102.0], 1123.0: [111405.0], 1212.0: [112006.0, 112008.0, 112090.0, nan], nan: [111405.0]}


.. py:function:: get_freqs(x: Sequence[Any], multiplier: Optional[Sequence[int]] = None) -> Dict[Any, int]

   Get frequencies from a vector, with an optional multiplier.

   :param x: list-like categorical variable to summarize.
   :type x: Sequence[Any]
   :param multiplier: list-like of additional weights, giving how many times
                      to repeat each value. Must have the same length as the
                      x argument. Defaults to None.
   :type multiplier: Optional[Sequence[int]]

   :returns: dict with unique values as keys and their counts as values.
   :rtype: dict

   >>> from cat2cat.mappings import get_freqs
   >>> get_freqs([1,1,1,2,1,2,2,11])
   {1: 4, 2: 3, 11: 1}


.. py:function:: cat_apply_freq(to_x: Dict[Any, List[Any]], freqs: Dict[Any, int]) -> Dict[Any, List[float]]

   Apply frequencies to the object returned by the `get_mappings` function.

   :param to_x: object returned by the `get_mappings` function.
   :type to_x: Dict[Any, List[Any]]
   :param freqs: object like the one returned by the `get_freqs` function.
   :type freqs: Dict[Any, int]

   :returns: dict with the same shape as the to_x argument, but with
             probabilities as values.
   :rtype: Dict[Any, List[float]]

   .. note::

      The keys of the freqs argument and the values of the to_x argument must
      be of the same type.

   >>> from cat2cat.mappings import get_mappings, get_freqs, cat_apply_freq
   >>> from cat2cat.datasets import load_trans, load_occup
   >>> mappings = get_mappings(load_trans())
   >>> occup = load_occup()
   >>> codes_new = occup.code[occup.year == 2010].map(str).values
   >>> freqs = get_freqs(codes_new)
   >>> mapp_new_p = cat_apply_freq(mappings["to_new"], freqs)
   >>> mappings["to_new"]['3481']
   ['441401', '441402', '441403', '441490']
   >>> mapp_new_p['3481']
   [0.0, 0.6, 0.0, 0.4]
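
The `multiplier` argument and the step from counts to probabilities are easier to follow with a small standalone example. The sketch below does not use cat2cat and is not its implementation: `sketch_get_freqs` and `sketch_cat_apply_freq` are hypothetical helpers written only to illustrate the documented behaviour, and the toy codes and weights are made up. Treating `multiplier` as a per-element weight and falling back to equal probabilities when no candidate was observed are both assumptions.

```python
from collections import Counter
from typing import Any, Dict, Hashable, List, Optional, Sequence


def sketch_get_freqs(
    x: Sequence[Hashable], multiplier: Optional[Sequence[int]] = None
) -> Dict[Any, int]:
    """Count each value in x; with weights, value x[i] counts multiplier[i] times."""
    if multiplier is None:
        return dict(Counter(x))
    if len(multiplier) != len(x):
        raise ValueError("multiplier must have the same length as x")
    counts: Counter = Counter()
    for value, weight in zip(x, multiplier):
        counts[value] += weight
    return dict(counts)


def sketch_cat_apply_freq(
    to_x: Dict[Any, List[Any]], freqs: Dict[Any, int]
) -> Dict[Any, List[float]]:
    """Replace each candidate list with probabilities proportional to its counts."""
    result: Dict[Any, List[float]] = {}
    for key, candidates in to_x.items():
        # Candidates that never occur in freqs get a count of zero.
        counts = [freqs.get(candidate, 0) for candidate in candidates]
        total = sum(counts)
        if total > 0:
            result[key] = [count / total for count in counts]
        else:
            # Assumption: equal weights when none of the candidates was observed.
            n = len(candidates)
            result[key] = [1 / n] * n if n else []
    return result


# Hypothetical toy data, not taken from the cat2cat datasets.
to_new = {
    "3481": ["441401", "441402", "441403", "441490"],
    "1111": ["111101", "111102"],
}
observed = ["441402", "441490", "441402"]
weights = [2, 2, 1]

freqs = sketch_get_freqs(observed, weights)
probs = sketch_cat_apply_freq(to_new, freqs)
print(freqs)          # {'441402': 3, '441490': 2}
print(probs["3481"])  # [0.0, 0.6, 0.0, 0.4]
print(probs["1111"])  # [0.5, 0.5]
```

Note that the lookup `freqs.get(candidate, 0)` only finds a match when the keys of `freqs` and the values of `to_x` share a type, which is why the sketch uses strings on both sides.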