cat2cat.cat2cat_ml
Functions
cat2cat_ml_run(mappings, ml, **kwargs): Automatic mapping in a panel dataset - cat2cat procedure
Module Contents
- cat2cat.cat2cat_ml.cat2cat_ml_run(mappings: cat2cat.dataclass.cat2cat_mappings, ml: cat2cat.dataclass.cat2cat_ml, **kwargs: Any) -> cat2cat_ml_run_results
Automatic mapping in a panel dataset - cat2cat procedure
- Parameters:
mappings (cat2cat_mappings) – dataclass with mapping-related arguments. See cat2cat.dataclass.cat2cat_mappings for more information.
ml (cat2cat_ml) – dataclass with ML-related arguments. See cat2cat.dataclass.cat2cat_ml for more information.
**kwargs – additional arguments passed to the cat2cat_ml_run function:
min_match (float): minimum share of categories from the base period that have to be matched in the mapping table. Between 0 and 1. Default 0.8.
test_prop (float): share of the data used for testing. Between 0 and 1. Default 0.2.
split_seed (int): random seed for the train_test_split function. Default 42.
- Returns:
cat2cat_ml_run_results
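To make the three keyword arguments more concrete, here is a minimal standard-library sketch of what a "matched share" threshold and a seeded train/test split mean. It is an illustration only, not the cat2cat implementation: the helper names matched_share and seeded_split and the example codes are hypothetical, and the split uses random.Random instead of scikit-learn's train_test_split, so the exact rows held out and the rounding of the test size may differ from what the library does.

```python
import random
from typing import Hashable, Iterable, List, Sequence, Tuple


def matched_share(base_categories: Iterable[Hashable],
                  mapping_keys: Iterable[Hashable]) -> float:
    """Share of distinct base-period categories found in the mapping table.

    Hypothetical helper: illustrates the idea behind min_match only.
    """
    base = set(base_categories)
    if not base:
        raise ValueError("base_categories is empty")
    return len(base & set(mapping_keys)) / len(base)


def seeded_split(rows: Sequence, test_prop: float = 0.2,
                 split_seed: int = 42) -> Tuple[List, List]:
    """Shuffle rows reproducibly and hold out a test_prop share for testing.

    Hypothetical helper: a stand-in for a seeded train/test split,
    not scikit-learn's train_test_split.
    """
    if not 0 < test_prop < 1:
        raise ValueError("test_prop must be between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(split_seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_prop)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up category codes: one of five base-period codes has no mapping.
base_codes = ["1111", "1112", "1120", "1211", "1212"]
mapped_codes = ["1111", "1112", "1120", "1211"]
share = matched_share(base_codes, mapped_codes)
passes = share >= 0.8  # compare against the default min_match

# Same seed, same split: rerunning yields identical train and test rows.
train, test = seeded_split(list(range(10)), test_prop=0.2, split_seed=42)
```

With the defaults described above, a base period in which four of five categories are matched sits exactly on the 0.8 threshold, and fixing split_seed is what makes the held-out rows repeatable between runs.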
Note
See cat2cat.cat2cat.cat2cat for more information.
>>> from cat2cat import cat2cat
>>> from cat2cat.cat2cat_ml import cat2cat_ml_run
>>> from cat2cat.dataclass import cat2cat_data, cat2cat_mappings, cat2cat_ml
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> from sklearn.tree import DecisionTreeClassifier
>>> from cat2cat.datasets import load_trans, load_occup
>>> trans = load_trans()
>>> occup = load_occup()
>>> o_old = occup.loc[occup.year == 2008, :].copy()
>>> o_new = occup.loc[occup.year == 2010, :].copy()
>>> mappings = cat2cat_mappings(trans = trans, direction = "backward")
>>> ml = cat2cat_ml(
...     occup.loc[occup.year >= 2010, :].copy(),
...     "code",
...     ["salary", "age", "edu", "sex"],
...     [DecisionTreeClassifier(random_state=1234), LinearDiscriminantAnalysis()]
... )
>>> cat2cat_ml_run(mappings = mappings, ml = ml)
...