Machine Learning Interview Questions
1. What Reasons Led to the Introduction of Machine Learning?
The simplest answer is that it makes our lives easier. In the early days of intelligent applications, many systems depended on hardcoded rules of “if” and “else” decisions to process data or adjust to user input. Imagine a spam filter whose job is to move the right incoming email messages to a spam folder.
With machine learning algorithms, the system is given ample information from which to learn and identify patterns in the data. One is not required to write new rules for each problem in machine learning.
2. What are the Different Types of Machine Learning Algorithms?
There are several machine learning algorithms. Broadly speaking, they are divided into supervised, unsupervised, and reinforcement learning.
3. What is Supervised Learning?
Supervised learning, simply put, is the machine learning task of deducing a function from labelled training data. Some of the supervised learning algorithms are listed below (a short sketch follows the list):
- Support Vector Machines
- Regression
- Naive Bayes
- Decision Trees
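As an illustration (not from the original article), here is a minimal supervised learning sketch, assuming Python with scikit-learn and a tiny made-up labelled dataset: the algorithm deduces a mapping from features to the given labels.

```python
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up labelled dataset: each row of X is a feature vector,
# each entry of y is the corresponding label.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 1, 0]

# The supervised learner deduces a function from the labelled examples.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

print(clf.predict([[0.9, 0.1]]))  # predict a label for an unseen point
```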
4. What is Unsupervised Learning?
Unsupervised learning is the second type of ML algorithm; it is used for finding patterns in a given set of data. Here there is no dependent variable or label to predict (a clustering sketch follows the list below).
Unsupervised learning algorithms include:
- Clustering
- Anomaly Detection
- Neural Networks and Latent Variable Models
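For instance, a minimal clustering sketch (illustrative only, with made-up data and scikit-learn assumed): no labels are supplied, and the algorithm finds the groupings on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled, made-up 2-D points: two loose groups around (0, 0) and (5, 5).
X = np.array([[0.1, 0.2], [0.0, -0.1], [0.2, 0.1],
              [5.1, 5.0], [4.9, 5.2], [5.0, 4.8]])

# KMeans discovers the two clusters without any labels being provided.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment for each point
print(km.cluster_centers_)  # learned cluster centres
```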
If you wish to gain more clarity, a machine learning coding bootcamp can offer the right guidance for successful career opportunities.
5. What is the ‘Naive’ Concept in Naive Bayes?
Naive Bayes is a supervised learning algorithm; it is called ‘naive’ because, when applying Bayes’ theorem, it assumes that all features are independent of each other. Consult a machine learning bootcamp to understand the technique and further tools for cracking the interview.
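A minimal Gaussian Naive Bayes sketch (toy data; scikit-learn assumed, not part of the original answer) showing the independence assumption in practice: each feature contributes to the class probability on its own.

```python
from sklearn.naive_bayes import GaussianNB

# Made-up training data: two features per sample, binary class labels.
X = [[1.0, 2.1], [1.2, 1.9], [3.0, 3.2], [3.1, 2.9]]
y = [0, 0, 1, 1]

# GaussianNB applies Bayes' theorem while treating the features as independent.
model = GaussianNB().fit(X, y)
print(model.predict([[1.1, 2.0]]))        # predicted class
print(model.predict_proba([[1.1, 2.0]]))  # class probabilities
```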
6. What is PCA? When do you use it?
Principal component analysis (PCA) is most commonly used for dimensionality reduction; it measures the variation along each variable (direction) in the data and, where there is little variation, throws that direction out.
Principal component analysis makes the dataset easier to visualize, and is used in finance, neuroscience, and pharmacology. It is further useful in the pre-processing stage, when linear correlations are present between features. Consider a coding bootcamp for learning these tools and techniques.
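As a rough sketch (made-up data; scikit-learn assumed), PCA projects correlated features onto a few components and reports how much variance each component explains, which is how low-variance directions end up being dropped:

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: the second feature is almost a linear copy of the first.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=100), rng.normal(size=100)])

pca = PCA(n_components=2)             # keep only the two strongest directions
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # variance captured by each kept component
```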
7. Explain SVM Algorithm.
An SVM, or Support Vector Machine, is a powerful and versatile supervised machine learning model, capable of performing linear or non-linear classification, regression, and outlier detection.
8. What are Support Vectors in SVM?
A Support Vector Machine (SVM) is an algorithm that fits a line (or hyperplane) between different classes so as to maximize the distance from that line to the points of each class. In this manner, it tries to find the most robust separation between the classes. The support vectors are the points at the edge of the margin, closest to the dividing hyperplane.
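A small sketch (toy data; scikit-learn's SVC assumed) that fits a linear SVM and inspects its support vectors, i.e. the edge points that define the margin:

```python
from sklearn.svm import SVC

# Two made-up, linearly separable classes.
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)  # the margin points that define the dividing hyperplane
```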
9. What are Different Kernels in SVM?
There are several types of kernels in SVM; the following four are widely used (see the sketch after this list):
- Linear kernel: used when the data is linearly separable.
- Polynomial kernel: used when one has discrete data with no natural notion of smoothness.
- Radial basis function (RBF) kernel: used to create a decision boundary that does a better job of separating two classes than the linear kernel.
- Sigmoid kernel: used as an activation function for neural networks.
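In practice the kernel is just a parameter of the classifier; a hedged sketch (toy XOR-like data; scikit-learn assumed) of trying each kernel on the same data:

```python
from sklearn.svm import SVC

# Made-up XOR-like data that a linear kernel cannot separate.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 0, 1, 1]

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel, gamma="scale").fit(X, y)
    print(kernel, clf.score(X, y))  # training accuracy with each kernel
```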
10. What is Cross-Validation?
Cross-validation is a method of splitting the data into parts for training, validation, and testing. The data is split into k subsets, and the model is trained on k-1 of them while the remaining subset is held out for testing; this is repeated for each of the k subsets. This is k-fold cross-validation. Lastly, the scores from all k folds are averaged to produce the final score.
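A minimal k-fold cross-validation sketch (scikit-learn and its bundled iris dataset assumed, purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: train on 4 folds, test on the held-out fold, repeat 5 times.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)         # one score per fold
print(scores.mean())  # final averaged score
```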
11. What is Bias in Machine Learning?
Bias in data indicates that there is an inconsistency in the data. The inconsistency may be caused by several reasons, which are not mutually exclusive.
12. What is the Difference Between Classification and Regression?
Classification is used for producing discrete results and for sorting data into definite categories, whereas regression is used for predicting continuous, numerical values.
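To make the contrast concrete, a brief sketch (toy numbers; scikit-learn assumed): a classifier returns a discrete category, while a regressor returns a continuous value.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]

# Classification: discrete labels (here encoded as 0/1).
clf = LogisticRegression().fit(X, [0, 0, 1, 1])
print(clf.predict([[2.5]]))  # a category

# Regression: continuous target values.
reg = LinearRegression().fit(X, [1.1, 1.9, 3.2, 3.9])
print(reg.predict([[2.5]]))  # a real number
```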
13. Define Precision and Recall?
Precision and recall are ways of measuring the effectiveness of a machine learning model, and they are often used together. Precision measures exactness: of the items the model predicts to be relevant, how many are truly relevant. Recall measures completeness: of the items that are truly relevant, how many the model actually finds. In terms of the confusion matrix, precision = TP / (TP + FP) and recall = TP / (TP + FN).
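A quick illustrative computation (made-up predictions; scikit-learn assumed):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # ground truth (1 = relevant)
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # model predictions

# Precision: of the items predicted relevant, how many truly are (TP / (TP + FP)).
print(precision_score(y_true, y_pred))  # 3 / 4 = 0.75
# Recall: of the truly relevant items, how many were found (TP / (TP + FN)).
print(recall_score(y_true, y_pred))     # 3 / 4 = 0.75
```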
15. How to Tackle Overfitting and Underfitting?
Overfitting means the model fits the training data too well; in this case, one needs to resample the data and estimate model accuracy using a technique like k-fold cross-validation. In the case of underfitting, the model is unable to capture the patterns in the data; here one needs to change the algorithm or feed more data points into the model to improve accuracy.
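One common way to spot overfitting (a sketch using scikit-learn's bundled iris data, assumed for illustration) is to compare training accuracy with cross-validated accuracy; a large gap suggests the model has memorized the training set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An unconstrained decision tree can fit the training data perfectly.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("train accuracy:", tree.score(X, y))                       # typically 1.0
print("cv accuracy:", cross_val_score(tree, X, y, cv=5).mean())  # typically lower
```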
16. What is a Neural Network?
A neural network, to put it in simple words, is a model of the human brain. Much like the brain, it has neurons that activate when they encounter something relatable. Different neurons are connected via connections that help information flow from one neuron to another.
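A tiny sketch (scikit-learn's MLPClassifier on made-up XOR-style data; illustrative only): layers of connected "neurons" pass information forward, and the connection weights are learned from examples.

```python
from sklearn.neural_network import MLPClassifier

# Made-up XOR-style problem, which needs a hidden layer to solve.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# One hidden layer of 8 neurons; the connection weights are learned during fit.
net = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs", random_state=1, max_iter=2000)
net.fit(X, y)
print(net.predict(X))  # predictions for the training points
```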
17. What is Ensemble learning?
Ensemble learning is a method that combines multiple machine learning models to create more powerful models.
There are numerous reasons for models to be different. Some are:
- Different hypotheses
- Different populations
- Different modelling techniques
When working with a model's training and testing data, one can experience error. This error might be bias, variance, or irreducible error. The model should strike a balance between bias and variance; this is called the bias-variance trade-off, and ensemble learning is one way to perform this trade-off. There are numerous ensemble techniques available, but when aggregating multiple models there are two general methods: bagging and boosting.
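A hedged sketch of the two aggregation approaches mentioned above (scikit-learn and its bundled iris data assumed, purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging: train many models on bootstrap samples and vote/average their outputs.
bagging = BaggingClassifier(n_estimators=50, random_state=0)
# Boosting: train models sequentially, each focusing on the previous one's errors.
boosting = GradientBoostingClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```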
18. How Does One Decide Which Machine Learning Algorithm to Use?
It depends largely on the dataset one has. If the target is discrete, one might use a classifier such as an SVM; if the target is continuous, one might use linear regression. So, while there is no fixed rule for knowing which ML algorithm to use, the choice depends heavily on exploratory data analysis (EDA).
19. How to Handle Outlier Values?
An outlier is an observation in the dataset that lies far away from the other observations. Tools used for discovering outliers include (a z-score sketch follows the list):
- Z-score
- Box plot
- Scatter plot
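A short z-score sketch (NumPy only, made-up numbers): points whose z-score exceeds a chosen threshold (commonly 2 or 3 standard deviations) are flagged as outliers.

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 95])  # 95 is the obvious outlier

z_scores = (data - data.mean()) / data.std()
outliers = data[np.abs(z_scores) > 2]  # threshold of 2 standard deviations here
print(outliers)                        # -> [95]
```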