This example is derived from a Kaggle kernel. If you have never heard of Kaggle and are interested in tinkering with Machine Learning, I strongly recommend taking a look.
The dataset (cervical.csv) used for this part can be found here. It contains expression data for about 714 miRNAs across 58 samples; we know that the first 29 are Normal samples while the other 29 are Tumoral.
Link to the Jupyter notebook
- For an idea of what an SVM is, look here.
- The bagging part can be thought of as a way to reduce overfitting: it draws different random subsets of the training data, trains one classifier on each, and combines their predictions (see the docs).
- K-fold cross-validation is a necessary part of any good research work: the data is split into K folds, each fold is used once for testing while the remaining K-1 folds are used for training, and the final result is the mean of the K test scores obtained.
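
To make the three pieces above concrete, here is a minimal sketch of how they might fit together with scikit-learn. It is not the code from the notebook: the data is a synthetic stand-in with the same shape as cervical.csv (58 samples, 714 features, first 29 Normal and last 29 Tumoral), and the linear kernel, the 10 bagged estimators, the 5 folds and the use of stratified folds are all assumptions picked for illustration.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for cervical.csv (assumption, not the real data):
# 58 samples x 714 features, first 29 Normal (0), last 29 Tumoral (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(58, 714))
y = np.array([0] * 29 + [1] * 29)
X[y == 1, :20] += 1.0  # shift a few features so the two classes differ

# Bagging: each of the 10 SVMs is trained on a bootstrap sample
# of the training data, and their predictions are combined by voting.
model = BaggingClassifier(SVC(kernel="linear"), n_estimators=10, random_state=0)

# K-fold cross-validation with K = 5. The samples are ordered by class,
# so the folds are shuffled and stratified to keep both classes in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)  # one accuracy per fold

mean_score = scores.mean()  # the final result is the mean of the K scores
print("fold accuracies:", scores)
print("mean accuracy:", mean_score)
```

With the real dataset, `X` and `y` would instead be built from the CSV file (for example with pandas), taking care to check whether samples are stored as rows or as columns.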