metabci.brainda.algorithms.utils.model_selection module

class metabci.brainda.algorithms.utils.model_selection.EnhancedLeaveOneGroupOut(return_validate: bool = True)[source]

Bases: LeaveOneGroupOut

Leave-one-group-out cross-validator. Performs leave-one-group-out cross-validation that can include a validation set.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:

return_validate (bool) – Whether a validation set is required, which defaults to True.

return_validate

Same as return_validate in Parameters.

Type:

bool

validate_spliter

Splitter that produces the validation set; used only if return_validate is True. See sklearn.model_selection.StratifiedShuffleSplit for details.

Type:

sklearn.model_selection.StratifiedShuffleSplit()

set_split_request(*, groups: bool | None | str = '$UNCHANGED$') EnhancedLeaveOneGroupOut

Request metadata passed to the split method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to split if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to split.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

groups (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for groups parameter in split.

Returns:

self – The updated object.

Return type:

object

split(X, y=None, groups=None)[source]

Yields the training, validation, and test set indices (when return_validate is True) or the training and test set indices (when return_validate is False).

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

X: array-like, shape(n_samples, n_features)

Training data. n_samples indicates the number of samples, and n_features indicates the number of features.

y: array-like, shape(n_samples,)

Class labels. When groups is not provided, they are passed to _generate_sequential_groups(y) to derive the groups.

groups: None

Group label of each sample, used when splitting the dataset into training, validation (when return_validate is True), and test sets. The number of distinct groups determines the number of splits, and the size of each group determines how many samples form the held-out “one” part of the leave-one-out scheme. For example, six samples with group labels [1,1,2,3,3,3] form three groups of 2, 1, and 3 samples. Each group is used in turn as the test set, and the remaining samples form the training set. groups can be passed in or computed internally from the class labels.

train: ndarray

Indices of the training set samples.

validate: ndarray

Indices of the validation set samples (only when return_validate is True).

test: ndarray

Indices of the test set samples.

get_n_splits: Returns the number of splitting iterations, that is, the number of groups.

_generate_sequential_groups: Generates the sample group labels “groups”.
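
The way groups drives the splits can be sketched in plain Python. This is a minimal illustration, not MetaBCI's implementation: the helper name leave_one_group_out is hypothetical, and the validation split is deliberately left out.

```python
def leave_one_group_out(groups):
    """Yield (train, test) index lists, holding out one group at a time."""
    for held_out in sorted(set(groups)):
        # Every sample of the held-out group goes to the test set.
        test = [i for i, g in enumerate(groups) if g == held_out]
        # All remaining samples form the training set.
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield train, test


groups = [1, 1, 2, 3, 3, 3]  # six samples in three groups of sizes 2, 1, 3
splits = list(leave_one_group_out(groups))
for train, test in splits:
    print("train:", train, "test:", test)
```

Three distinct group labels give three splits, and the test set size of each split equals the size of the held-out group.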

class metabci.brainda.algorithms.utils.model_selection.EnhancedStratifiedKFold(n_splits: int = 5, shuffle: bool = False, return_validate: bool = True, random_state: int | RandomState | None = None)[source]

Bases: StratifiedKFold

Enhanced Stratified KFold cross-validator.

If return_validate is True, split yields (train, validate, test) indices; otherwise it yields (train, test), as sklearn's StratifiedKFold does. The validation set has the same size as the test set.

Stratified K-fold cross-validation. When classes are imbalanced, the dataset is split so that each fold preserves the proportion of each class in the full dataset.

Performs stratified K-fold cross-validation that can include a validation set.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:
  • n_splits (int) – Number of cross-validation folds. Default is 5.

  • shuffle (bool) – Whether to shuffle the samples before splitting. Default is False.

  • return_validate (bool) – Whether a validation set is required, which defaults to True.

  • random_state (int or numpy.random.RandomState) – Random state. When shuffle is True, random_state affects the ordering of the samples, which controls the randomness of the samples selected for each class in each fold. See sklearn.model_selection.StratifiedKFold for details. Default is None.

return_validate

Same as return_validate in Parameters.

Type:

bool

validate_spliter

Splitter that produces the validation set; used only if return_validate is True. See sklearn.model_selection.StratifiedShuffleSplit for details.

Type:

sklearn.model_selection.StratifiedShuffleSplit()

split(X, y, groups=None)[source]

Yields the training, validation, and test set indices (when return_validate is True) or the training and test set indices (when return_validate is False).

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

X: array-like, shape(n_samples, n_features)

Training data. n_samples indicates the number of samples, and n_features indicates the number of features.

y: array-like, shape(n_samples,)

Class labels.

groups: None

Ignored; present only for API compatibility.

train: ndarray

Indices of the training set samples.

validate: ndarray

Indices of the validation set samples (only when return_validate is True).

test: ndarray

Indices of the test set samples.
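
As a rough illustration of stratified folds with a validation set as large as the test set, here is a pure-Python sketch. It is not the class's actual code: the function name is hypothetical, and it simply reuses the next fold as the validation set instead of drawing it with the StratifiedShuffleSplit-based validate_spliter.

```python
from collections import defaultdict


def stratified_kfold_with_validation(y, n_splits=3):
    """Yield (train, validate, test) index lists with per-class balance."""
    # Deal the indices of each class round-robin into n_splits folds.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_splits)]
    for class_indices in by_class.values():
        for pos, i in enumerate(class_indices):
            folds[pos % n_splits].append(i)

    for k in range(n_splits):
        # Assumption: the fold after the test fold serves as validation,
        # so both have the same size when the classes divide evenly.
        val_k = (k + 1) % n_splits
        test = sorted(folds[k])
        validate = sorted(folds[val_k])
        train = sorted(
            i
            for j, fold in enumerate(folds)
            if j not in (k, val_k)
            for i in fold
        )
        yield train, validate, test


y = ["a"] * 6 + ["b"] * 3  # imbalanced classes, ratio 2:1
splits = list(stratified_kfold_with_validation(y, n_splits=3))
for train, validate, test in splits:
    print(len(train), len(validate), len(test))
```

Each of the three subsets keeps the 2:1 class ratio of the full label list, which is the property stratification is meant to preserve.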

class metabci.brainda.algorithms.utils.model_selection.EnhancedStratifiedShuffleSplit(test_size: float, train_size: float, n_splits: int = 5, validate_size: float | None = None, return_validate: bool = True, random_state: int | RandomState | None = None)[source]

Bases: StratifiedShuffleSplit

Stratified shuffle-split cross-validator. When classes are imbalanced, the dataset is split so that each subset preserves the proportion of each class in the full dataset. The splits can include a validation set. The sample size of the validation set will be the same as that of the test set.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:
  • test_size (float) – Proportion of the test set (0-1).

  • train_size (float) – Proportion of the training set (0-1).

  • n_splits (int) – Number of re-shuffling and splitting iterations. Default is 5.

  • validate_size (float or None) – Proportion of the validation set (0-1), used when return_validate is True. Defaults to None.

  • return_validate (bool) – Whether a validation set is required, which defaults to True.

  • random_state (int or numpy.random.RandomState) – Random state. See sklearn.model_selection.StratifiedShuffleSplit for details. Default is None.

return_validate

Same as return_validate in Parameters.

Type:

bool

validate_spliter

Splitter that produces the validation set; used only if return_validate is True. See sklearn.model_selection.StratifiedShuffleSplit for details.

Type:

sklearn.model_selection.StratifiedShuffleSplit()

split(X, y, groups=None)[source]

Yields the training, validation, and test set indices (when return_validate is True) or the training and test set indices (when return_validate is False).

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

X: array-like, shape(n_samples, n_features)

Training data. n_samples indicates the number of samples, and n_features indicates the number of features.

y: array-like, shape(n_samples,)

Class labels.

groups: None

Ignored; present only for API compatibility.

train: ndarray

Indices of the training set samples.

validate: ndarray

Indices of the validation set samples (only when return_validate is True).

test: ndarray

Indices of the test set samples.
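
The idea of a stratified random split into three parts can be sketched with the standard library alone. The function below is hypothetical and only mirrors the parameter names documented here; it is an assumption-laden illustration, not how the class is implemented.

```python
import random
from collections import defaultdict


def stratified_shuffle_split(y, train_size, validate_size, test_size, seed=0):
    """Return one (train, validate, test) split, stratified by label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train, validate, test = [], [], []
    for class_indices in by_class.values():
        # Shuffle within the class, then cut it by the requested proportions.
        rng.shuffle(class_indices)
        n_test = round(len(class_indices) * test_size)
        n_val = round(len(class_indices) * validate_size)
        n_train = round(len(class_indices) * train_size)
        test += class_indices[:n_test]
        validate += class_indices[n_test:n_test + n_val]
        train += class_indices[n_test + n_val:n_test + n_val + n_train]
    return sorted(train), sorted(validate), sorted(test)


y = [0] * 20 + [1] * 10  # imbalanced labels, ratio 2:1
train, validate, test = stratified_shuffle_split(y, 0.8, 0.1, 0.1, seed=42)
print(len(train), len(validate), len(test))
```

Because the cut is made inside each class, the test and validation sets each receive two samples of class 0 and one of class 1, matching the 2:1 ratio of the full label list.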

metabci.brainda.algorithms.utils.model_selection.generate_char_indices(meta: DataFrame, kfold: int = 6, random_state: int | RandomState | None = None)[source]

Generate the trial indices of the train, validation, and test sets. This method operates directly on characters.

author: WuJieYu

Created on: 2023-03-17

update log:2023-12-26 by sunchang<18822197631@163.com>

Parameters:
  • meta (DataFrame) – Metadata of all trials.

  • kfold (int) – Number of folds for cross validation.

  • random_state (Optional[Union[int, RandomState]]) – Random state. Default is None.

Returns:

indices – Trial indices for the train, validation, and test sets, grouped in a tuple.

Return type:

list

metabci.brainda.algorithms.utils.model_selection.generate_kfold_indices(meta: DataFrame, kfold: int = 5, random_state: int | RandomState | None = None)[source]

Applies the EnhancedStratifiedKFold class to the meta table to generate K-fold cross-validation indices, grouped by subject and event class.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:
  • meta (pandas.DataFrame) – Trial metadata table in MetaBCI's format.

  • kfold (int) – Number of cross-validation folds. Default is 5.

  • random_state (int or numpy.random.RandomState) – Random state. Defaults to None.

Returns:

indices – A two-level nested dictionary of indices. The outer key is the subject id and its value, classes_indices, is a dict of the form {e_name: k_indices}. The inner key e_name is the event class name and its value k_indices holds the trial indices for K-fold cross-validation: a list whose elements are tuples (ix_train, ix_val, ix_test) of indices for the corresponding sets.

Return type:

dict, {‘subject id’: classes_indices}
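
To make the nesting concrete, here is a hand-written example of the shape such a dictionary could take. The subject id, event names, and index values are all invented for illustration, and plain lists stand in for the index arrays.

```python
# Hypothetical return value for one subject with two event classes and two
# folds. Subject id, event names and index values are invented.
indices = {
    "subject_1": {                   # outer key: subject id
        "left_hand": [               # inner key: event class name
            ([0, 1], [2], [3]),      # fold 0: (ix_train, ix_val, ix_test)
            ([2, 3], [0], [1]),      # fold 1
        ],
        "right_hand": [
            ([0, 2], [1], [3]),
            ([1, 3], [2], [0]),
        ],
    },
}

# Look up the split of the second fold for one subject and one event class.
ix_train, ix_val, ix_test = indices["subject_1"]["left_hand"][1]
print(ix_train, ix_val, ix_test)
```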

metabci.brainda.algorithms.utils.model_selection.generate_loo_indices(meta: DataFrame)[source]

Applies the EnhancedLeaveOneGroupOut class to the meta table to generate leave-one-out cross-validation indices, grouped by subject and event class.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:

meta (pandas.DataFrame) – Trial metadata table in MetaBCI's format.

Returns:

indices – A two-level nested dictionary of indices. The outer key is the subject id and its value, classes_indices, is a dict of the form {e_name: k_indices}. The inner key e_name is the event class name and its value k_indices holds the trial indices for leave-one-out cross-validation: a list whose elements are tuples (ix_train, ix_val, ix_test) of indices for the corresponding sets.

Return type:

dict, {‘subject id’: classes_indices}

metabci.brainda.algorithms.utils.model_selection.generate_shuffle_indices(meta: DataFrame, n_splits: int = 5, test_size: float = 0.1, validate_size: float = 0.1, train_size: float = 0.8, random_state: int | RandomState | None = None)[source]

Applies the EnhancedStratifiedShuffleSplit class to the meta table to generate stratified shuffle-split cross-validation indices, grouped by subject and event class.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:
  • meta (pandas.DataFrame) – Trial metadata table in MetaBCI's format.

  • n_splits (int) – Number of shuffle-split iterations. Default is 5.

  • test_size (float) – Proportion of the test set. Default is 0.1.

  • validate_size (float) – Proportion of the validation set. Default is 0.1, the same as the test set.

  • train_size (float) – Proportion of the training set. Default is 0.8 (the training, validation, and test proportions sum to 1).

  • random_state (int or numpy.random.RandomState) – Random state. Defaults to None.

Returns:

indices – A two-level nested dictionary of indices. The outer key is the subject id and its value, classes_indices, is a dict of the form {e_name: k_indices}. The inner key e_name is the event class name and its value k_indices holds the trial indices for shuffle-split cross-validation: a list whose elements are tuples (ix_train, ix_val, ix_test) of indices for the corresponding sets.

Return type:

dict, {‘subject id’: classes_indices}

metabci.brainda.algorithms.utils.model_selection.match_char_kfold_indices(k: int, meta: DataFrame, indices)[source]

Divide the trials into train, validation, and test sets. This method operates directly on characters.

author: WuJieYu

Created on: 2023-03-17

update log:2023-12-26 by sunchang<18822197631@163.com>

Parameters:
  • k (int) – Number of folds for cross validation.

  • meta (DataFrame) – Metadata of all trials.

  • indices (list) – Trial indices.

Returns:

train_ix, val_ix, test_ix – Trial indices for the train, validation, and test sets.

Return type:

list

metabci.brainda.algorithms.utils.model_selection.match_kfold_indices(k: int, meta: DataFrame, indices)[source]

Matches the stratified K-fold cross-validation grouping indices against meta to produce concrete trial indices. Combines meta with the output of generate_kfold_indices() to build the indices for fold k.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:
  • k (int) – Index of the cross-validation fold.

  • meta (pandas.DataFrame) – Trial metadata table in MetaBCI's format.

  • indices (dict, {‘subject id’: classes_indices}) – Index dictionary generated by generate_kfold_indices().

Returns:

  • train_ix (ndarray, ‘subject id’: classes_indices) – Training set trial indices for fold k, covering all classes of all subjects in meta.

  • val_ix (ndarray, ‘subject id’: classes_indices) – Validation set trial indices for fold k.

  • test_ix (ndarray, ‘subject id’: classes_indices) – Test set trial indices for fold k.
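
The matching step can be pictured as gathering the k-th tuple from every subject and event class and merging the pieces. The sketch below uses a hypothetical helper, collect_fold, and assumes the stored indices already refer to rows of meta; the real function also receives meta and may translate per-class positions into row indices, which this sketch does not attempt.

```python
def collect_fold(k, indices):
    """Merge the k-th (train, val, test) tuple of every subject and event."""
    train_ix, val_ix, test_ix = [], [], []
    for classes_indices in indices.values():        # loop over subjects
        for k_indices in classes_indices.values():  # loop over event classes
            ix_train, ix_val, ix_test = k_indices[k]
            train_ix.extend(ix_train)
            val_ix.extend(ix_val)
            test_ix.extend(ix_test)
    return sorted(train_ix), sorted(val_ix), sorted(test_ix)


# Invented data: two subjects, one fold each, indices chosen not to overlap.
indices = {
    "s1": {"a": [([0, 1], [2], [3])], "b": [([4, 5], [6], [7])]},
    "s2": {"a": [([8, 9], [10], [11])]},
}
train_ix, val_ix, test_ix = collect_fold(0, indices)
print(train_ix, val_ix, test_ix)
```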

metabci.brainda.algorithms.utils.model_selection.match_loo_indices(k: int, meta: DataFrame, indices)[source]

Matches the leave-one-out cross-validation grouping indices against meta to produce concrete trial indices. Combines meta with the output of generate_loo_indices() to build the indices for split k.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:
  • k (int) – Index of the cross-validation split.

  • meta (pandas.DataFrame) – Trial metadata table in MetaBCI's format.

  • indices (dict, {‘subject id’: classes_indices}) – Index dictionary generated by generate_loo_indices().

Returns:

  • train_ix (ndarray, ‘subject id’: classes_indices) – Training set trial indices for split k of the data described by meta.

  • val_ix (ndarray, ‘subject id’: classes_indices) – Validation set trial indices for split k.

  • test_ix (ndarray, ‘subject id’: classes_indices) – Test set trial indices for split k.

metabci.brainda.algorithms.utils.model_selection.match_loo_indices_dict(X: Dict, y: Dict, meta: DataFrame, indices, k: int)[source]

metabci.brainda.algorithms.utils.model_selection.match_shuffle_indices(k: int, meta: DataFrame, indices)[source]

Matches the shuffle-split cross-validation grouping indices against meta to produce concrete trial indices. Combines meta with the output of generate_shuffle_indices() to build the indices for split k.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:
  • k (int) – Index of the cross-validation split.

  • meta (pandas.DataFrame) – Trial metadata table in MetaBCI's format.

  • indices (dict, {‘subject id’: classes_indices}) – Index dictionary generated by generate_shuffle_indices().

Returns:

  • train_ix (ndarray, ‘subject id’: classes_indices) – Training set trial indices for split k of the data described by meta.

  • val_ix (ndarray, ‘subject id’: classes_indices) – Validation set trial indices for split k.

  • test_ix (ndarray, ‘subject id’: classes_indices) – Test set trial indices for split k.

metabci.brainda.algorithms.utils.model_selection.set_random_seeds(seed: int)[source]

Set seeds for Python's random module, numpy.random, and torch.

author:Swolf <swolfforever@gmail.com>

Created on:2021-11-29

update log:

2023-12-26 by sunchang<18822197631@163.com>

Parameters:

seed (int) – Random seed.
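
A seeding helper of this kind might look roughly like the sketch below. The name seed_everything is hypothetical, NumPy and PyTorch are treated as optional so the snippet runs even without them, and the real function may set further options that are not shown here.

```python
import random


def seed_everything(seed):
    """Seed Python's random module, plus NumPy and PyTorch when installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # seeds NumPy's legacy global generator
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds PyTorch's default generator
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draw.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```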