cat2cat.mappings

Functions

`get_mappings`(→ Dict[str, Dict[Any, List[Any]]])	Transforming a transition table into two associative lists
`get_freqs`(→ Dict[Any, int])	Getting frequencies from a vector with an optional multiplier
`cat_apply_freq`(→ Dict[Any, List[float]])	Applying frequencies to the object returned by the get_mappings function

Module Contents

cat2cat.mappings.get_mappings(x: Table) → Dict[str, Dict[Any, List[Any]]]

Transforming a transition table into two associative lists

To re-encode one classification into another, an associative list that maps keys to values is used. Each unique category code is a key, and the matching categories from the next or previous time point are its values. A transition table is used to build two such associative lists, returned as dicts: to_old maps each new code to its matching old codes, and to_new maps each old code to its matching new codes.

Parameters:

x (pandas.DataFrame or numpy.ndarray) – transition table with 2 columns, where the first column is assumed to be the older encoding.

Returns:

dict with 2 internal dicts, to_old and to_new.

Return type:

Dict[str, Dict[Any, List[Any]]]

Note

An effort was made to handle missing values properly, but please avoid using NaN or None. String or float types are recommended. Alternatively, missing values can be represented as a specific number (9999) or string (“Missing”).

>>> from cat2cat.mappings import get_mappings
>>> from numpy import array, nan
>>> trans = array([
...   [1111, 111101], [1111, 111102], [1123, 111405], [nan, 111405],
...   [1212, 112006], [1212, 112008], [1212, 112090], [1212, nan],
... ])
>>> mappings = get_mappings(trans)
>>> mappings["to_old"]
{111101.0: [1111.0], 111102.0: [1111.0], 111405.0: [1123.0, nan], 112006.0: [1212.0], 112008.0: [1212.0], 112090.0: [1212.0], nan: [1212.0]}
>>> mappings["to_new"]
{1111.0: [111101.0, 111102.0], 1123.0: [111405.0], 1212.0: [112006.0, 112008.0, 112090.0, nan], nan: [111405.0]}
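
The construction described for get_mappings can be illustrated with a minimal sketch in plain Python. The helper name build_mappings_sketch is hypothetical and is not part of cat2cat; it only shows the idea of turning (old, new) code pairs into two lookup dicts, and it uses string codes to sidestep missing values.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def build_mappings_sketch(
    pairs: Iterable[Tuple[Any, Any]],
) -> Dict[str, Dict[Any, List[Any]]]:
    """Hypothetical helper (not the cat2cat implementation).

    Builds two lookup dicts from (old_code, new_code) pairs.
    """
    to_new: Dict[Any, List[Any]] = defaultdict(list)
    to_old: Dict[Any, List[Any]] = defaultdict(list)
    for old, new in pairs:
        # Keep each candidate once, in order of first appearance.
        if new not in to_new[old]:
            to_new[old].append(new)
        if old not in to_old[new]:
            to_old[new].append(old)
    return {"to_old": dict(to_old), "to_new": dict(to_new)}


pairs = [
    ("1111", "111101"), ("1111", "111102"), ("1123", "111405"),
    ("1212", "112006"), ("1212", "112008"),
]
sketch = build_mappings_sketch(pairs)
print(sketch["to_new"]["1111"])    # ['111101', '111102']
print(sketch["to_old"]["111405"])  # ['1123']
```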

cat2cat.mappings.get_freqs(x: Sequence[Any], multiplier: Sequence[int] | None = None) → Dict[Any, int]

Getting frequencies from a vector with an optional multiplier

Parameters:

x (Sequence[Any]) – a list-like categorical variable to summarize.
multiplier (Optional[Sequence[int]]) – a list-like of additional weights giving how many times to repeat each value. Must have the same length as the x argument. Defaults to None.

Returns:

dict with unique values and their counts.

Return type:

dict

>>> from cat2cat.mappings import get_freqs
>>> get_freqs([1,1,1,2,1,2,2,11])
{1: 4, 2: 3, 11: 1}
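
The effect of the multiplier argument can be pictured with a small sketch. The helper weighted_counts_sketch below is hypothetical and is not the cat2cat implementation; it assumes the multiplier acts as a per-element repeat count, as the parameter description suggests.

```python
from collections import Counter
from typing import Any, Dict, Optional, Sequence


def weighted_counts_sketch(
    x: Sequence[Any], multiplier: Optional[Sequence[int]] = None
) -> Dict[Any, int]:
    """Hypothetical helper: count values, repeating each one `multiplier` times."""
    if multiplier is None:
        multiplier = [1] * len(x)
    if len(multiplier) != len(x):
        raise ValueError("multiplier must have the same length as x")
    counts: Counter = Counter()
    for value, weight in zip(x, multiplier):
        # A weight of k is treated as k occurrences of the value.
        counts[value] += weight
    return dict(counts)


print(weighted_counts_sketch([1, 1, 1, 2, 1, 2, 2, 11]))  # {1: 4, 2: 3, 11: 1}
print(weighted_counts_sketch(["a", "b", "a"], [2, 5, 1]))  # {'a': 3, 'b': 5}
```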

cat2cat.mappings.cat_apply_freq(to_x: Dict[Any, List[Any]], freqs: Dict[Any, int]) → Dict[Any, List[float]]

Applying frequencies to the object returned by the get_mappings function

Parameters:

to_x (Dict[Any, List[Any]]) – object returned by get_mappings function.
freqs (Dict[Any, int]) – object like the one returned by the get_freqs function.

Returns:

a dict of the same shape as the to_x argument, with the values replaced by probabilities.

Return type:

Dict[Any, List[float]]

Note

The keys of freqs and the values of to_x must be of the same type.

>>> from cat2cat.mappings import get_mappings, get_freqs, cat_apply_freq
>>> from cat2cat.datasets import load_trans, load_occup
>>> mappings = get_mappings(load_trans())
>>> occup = load_occup()
>>> codes_new = occup.code[occup.year == 2010].map(str).values
>>> freqs = get_freqs(codes_new)
>>> mapp_new_p = cat_apply_freq(mappings["to_new"], freqs)
>>> mappings["to_new"]['3481']
['441401', '441402', '441403', '441490']
>>> mapp_new_p['3481']
[0.0, 0.6, 0.0, 0.4]
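
How frequencies become probabilities can be sketched as follows. The helper apply_freqs_sketch is hypothetical and is not the cat2cat implementation; the counts are toy values chosen only to reproduce the proportions above, and the fallback to equal probabilities when no candidate was observed is an assumption of this sketch.

```python
from typing import Any, Dict, List


def apply_freqs_sketch(
    to_x: Dict[Any, List[Any]], freqs: Dict[Any, int]
) -> Dict[Any, List[float]]:
    """Hypothetical helper: replace candidate codes with their relative frequencies."""
    result: Dict[Any, List[float]] = {}
    for key, candidates in to_x.items():
        if not candidates:
            result[key] = []
            continue
        # Codes absent from freqs are treated as having a count of zero.
        counts = [freqs.get(code, 0) for code in candidates]
        total = sum(counts)
        if total > 0:
            result[key] = [count / total for count in counts]
        else:
            # Assumption: with no observed counts, split the mass equally.
            result[key] = [1 / len(candidates)] * len(candidates)
    return result


to_new = {
    "3481": ["441401", "441402", "441403", "441490"],
    "9999": ["a", "b"],  # made-up key whose candidates never occur in freqs
}
toy_freqs = {"441402": 3, "441490": 2}  # toy counts, not the real dataset
probs = apply_freqs_sketch(to_new, toy_freqs)
print(probs["3481"])  # [0.0, 0.6, 0.0, 0.4]
print(probs["9999"])  # [0.5, 0.5]
```

Each list of probabilities lines up position by position with the candidate codes in to_x, which is why the key and value types of the two inputs need to match.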