Comprehensive Machine Learning Terminology Guide
Welcome to the Comprehensive Machine Learning Terminology Guide! Whether you're a newcomer to the field of machine learning or an experienced practitioner looking to brush up on your vocabulary, this guide is designed to be your go-to resource for understanding the key terms and concepts that form the foundation of ML.
Fundamental Concepts
- Machine Learning (ML): A subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data.
- Artificial Intelligence (AI): The broader field of building machines that can perform tasks normally associated with human intelligence, such as reasoning, perception, and decision-making.
- Deep Learning: A subset of machine learning based on artificial neural networks with multiple layers.
- Dataset: A collection of data used for training and testing machine learning models.
- Feature: An individual measurable property or characteristic of a phenomenon being observed.
- Label: The target variable that we're trying to predict in supervised learning.
- Model: A mathematical representation of a real-world process, learned from data.
- Algorithm: A step-by-step procedure or formula for solving a problem.
- Training: The process of fitting a model's parameters to data so that it can make predictions or decisions.
- Inference: Using a trained model to make predictions on new, unseen data.
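To make these terms concrete, here is a minimal sketch that ties features, labels, a model, training, and inference together. It assumes a one-feature linear model fitted by ordinary least squares; the dataset and the names `train` and `predict` are invented for illustration.

```python
# Toy dataset: one feature (hours studied) and one label (exam score).
# The numbers are made up for illustration only.
features = [1.0, 2.0, 3.0, 4.0]
labels = [52.0, 54.0, 56.0, 58.0]


def train(xs, ys):
    """Training: fit a slope and an intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the learned "model" is just these two numbers


def predict(model, x):
    """Inference: apply the trained model to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


model = train(features, labels)
prediction = predict(model, 5.0)
print(model, prediction)
```

Here the algorithm is least squares, the model is the pair (slope, intercept), and inference is a single call to `predict` on an input that was not in the dataset.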
Types of Machine Learning
- Supervised Learning: Learning from labeled data to predict outcomes for new, unseen data.
- Unsupervised Learning: Finding hidden patterns or intrinsic structures in input data without labeled responses.
- Semi-Supervised Learning: Learning from a combination of labeled and unlabeled data.
- Reinforcement Learning: Learning to make decisions by interacting with an environment.
- Transfer Learning: Applying knowledge gained from one task to a related task.
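The difference between supervised and unsupervised learning can be sketched on the same six numbers. This is only an illustration under assumptions: the guide does not name specific algorithms, so a nearest-centroid classifier stands in for supervised learning and a 1-D k-means stands in for unsupervised learning. All function names and data are hypothetical.

```python
def nearest_centroid_fit(values, labels):
    """Supervised: use the labels to compute one centroid per class."""
    by_class = {}
    for value, label in zip(values, labels):
        by_class.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in by_class.items()}


def nearest_centroid_predict(class_centroids, value):
    """Predict the class whose centroid is closest to the value."""
    return min(class_centroids, key=lambda label: abs(value - class_centroids[label]))


def k_means_1d(values, centroids, iterations=10):
    """Unsupervised: group unlabeled numbers around the nearest centroid."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids


values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["small", "small", "small", "large", "large", "large"]

# Supervised: labels are available during training.
class_centroids = nearest_centroid_fit(values, labels)
predicted = nearest_centroid_predict(class_centroids, 4.5)

# Unsupervised: only the raw values are used; the structure is discovered.
discovered = k_means_1d(values, centroids=[min(values), max(values)])
print(predicted, discovered)
```

Both approaches end up with centroids near 1 and 5, but only the supervised one knows what the groups are called.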
Model Evaluation and Metrics
- Accuracy: The proportion of correct predictions among the total number of cases examined.
- Precision: The proportion of true positive predictions among all positive predictions.
- Recall: The proportion of true positive predictions among all actual positive cases.
- F1 Score: The harmonic mean of precision and recall.
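- ROC Curve: A plot of the true positive rate against the false positive rate of a binary classifier as its decision threshold varies.
- AUC (Area Under the Curve): The area under the ROC curve, summarizing how well a classifier separates the classes; 1.0 means perfect separation and 0.5 is no better than random guessing.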
- Confusion Matrix: A table used to describe the performance of a classification model.
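- Cross-Validation: A resampling procedure that repeatedly splits a limited dataset into training and validation parts, so the model is always evaluated on data it was not trained on.
- Overfitting: When a model learns the training data too well, including its noise and random fluctuations, and therefore performs poorly on new data.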
- Underfitting: When a model is too simple to capture the underlying structure of the data.
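The classification metrics above all come from the same four confusion-matrix counts. Here is a small sketch for binary labels, assuming 1 is the positive class. The example labels are invented, and returning 0.0 when a denominator is zero is a convention chosen here, not something the definitions require.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts, metrics)
```

In this example there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so accuracy is 0.8 while precision, recall, and F1 are all 0.75.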
Neural Networks and Deep Learning
- Neuron: The basic unit of a neural network, loosely modeled on the biological neuron.
- Activation Function: A function that determines the output of a neuron given an input or set of inputs.
- Weights: Parameters within a neural network that determine the strength of the connection between neurons.
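- Bias: An additional parameter that is added to the weighted sum of a neuron's inputs, shifting its output independently of the input values.
- Backpropagation: An algorithm for training neural networks that propagates the prediction error backward through the layers to compute the gradient of the loss with respect to each weight; an optimizer such as gradient descent then uses these gradients to adjust the weights.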
- Gradient Descent: An optimization algorithm used to minimize the loss function by iteratively moving in the direction of steepest descent.
- Epoch: One complete pass through the entire training dataset.
- Batch: A subset of the training data used in one iteration of model training.
- Learning Rate: A hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated.
- Convolutional Neural Network (CNN): A type of neural network commonly used for image recognition and processing.
- Recurrent Neural Network (RNN): A type of neural network designed to recognize patterns in sequences of data.
- Long Short-Term Memory (LSTM): A type of RNN capable of learning long-term dependencies.
- Transformer: A model architecture that relies entirely on an attention mechanism to draw global dependencies between input and output.
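Several of the terms above (neuron, activation function, weights, bias, gradient descent, epoch, batch, learning rate) fit into one small training loop. The sketch below trains a single sigmoid neuron on the OR truth table. It rests on assumptions not taken from the guide: binary cross-entropy as the loss, the whole dataset as one batch per epoch, and hypothetical names `forward` and `train`.

```python
import math


def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, x):
    """One neuron: weighted sum of the inputs plus bias, then activation."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def train(inputs, targets, learning_rate=1.0, epochs=2000):
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    losses = []
    n = len(inputs)
    for _ in range(epochs):  # one epoch = one pass over the whole dataset
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(inputs, targets):  # here the batch is the full dataset
            a = forward(weights, bias, x)
            loss += -(y * math.log(a) + (1 - y) * math.log(1 - a))
            # For sigmoid + cross-entropy the chain rule simplifies to (a - y).
            error = a - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        # Gradient descent: step against the mean gradient, scaled by the learning rate.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
        losses.append(loss / n)
    return weights, bias, losses


inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
targets = [0.0, 1.0, 1.0, 1.0]  # logical OR

weights, bias, losses = train(inputs, targets)
predictions = [forward(weights, bias, x) for x in inputs]
print(losses[0], losses[-1], predictions)
```

A real network stacks many such neurons in layers, and backpropagation applies the same chain-rule idea layer by layer instead of to a single neuron.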
Feature Engineering and Selection
- Feature Engineering: The process of using domain knowledge to extract features from raw data.
- Feature Selection: The process of selecting a subset of relevant features for use in model construction.
- Dimensionality Reduction: Techniques for reducing the number of input variables in a dataset.
- Principal Component Analysis (PCA): A statistical procedure that uses an orthogonal transformation to convert possibly correlated variables into linearly uncorrelated variables called principal components, ordered by how much of the data's variance each one captures.
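As a rough illustration of dimensionality reduction with PCA, the sketch below finds the first principal component of a tiny 2-D dataset and projects the points onto it, going from two variables to one. It uses power iteration on the covariance matrix, which is an assumption made to keep the example dependency-free; library implementations typically use a singular value decomposition instead. The data and function names are hypothetical.

```python
import math


def first_principal_component(points, iterations=100):
    """Estimate the direction of greatest variance via power iteration."""
    n = len(points)
    dim = len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - means[j] for j in range(dim)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    # Assumed start vector; it must not be orthogonal to the top component.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return means, v


def project(points, means, direction):
    """Reduce each point to a single coordinate along the given direction."""
    return [
        sum((p[j] - means[j]) * direction[j] for j in range(len(direction)))
        for p in points
    ]


# Two perfectly correlated variables: the second is always twice the first.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

means, direction = first_principal_component(points)
projected = project(points, means, direction)
print(direction, projected)
```

Because the two variables are perfectly correlated here, a single principal component along the direction (1, 2) keeps all of the variance.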
Ensemble Methods
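- Ensemble Learning: Combining the predictions of multiple models to get better performance than any single model would achieve alone.
- Bagging: An ensemble method (short for bootstrap aggregating) that trains each model on a different random subset of the training data, sampled with replacement, and then averages or votes over their predictions.
- Boosting: An ensemble method that trains weak learners sequentially, each one focusing on the errors of the previous ones, and combines them into a strong learner.
- Random Forest: An ensemble learning method that constructs a multitude of decision trees and combines their predictions by voting or averaging.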
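Bagging can be sketched in a few lines: draw bootstrap samples, train one weak model per sample, and let the models vote. The example assumes a deliberately simple weak learner, a one-threshold "stump" on a single feature, and invented data; a random forest would use full decision trees and also randomize the features considered.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Weak learner: choose the threshold t that minimizes errors for the
    rule 'predict 1 if x >= t, else 0'. An infinite threshold means 'always 0'."""
    candidates = sorted({x for x, _ in sample}) + [float("inf")]

    def errors(t):
        return sum(1 for x, y in sample if (1 if x >= t else 0) != y)

    return min(candidates, key=errors)


def bagging_fit(data, n_models, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(1 if x >= t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]


# Invented 1-D dataset: the label is 1 when the feature is at least 5.
data = [(float(x), 1 if x >= 5 else 0) for x in range(10)]

ensemble = bagging_fit(data, n_models=25)
low = bagging_predict(ensemble, 0.0)
high = bagging_predict(ensemble, 9.0)
print(sorted(ensemble), low, high)
```

Each stump sees a slightly different sample, so the learned thresholds differ; voting smooths out those differences. Using an odd number of models avoids ties in the vote.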
Natural Language Processing (NLP)
- Tokenization: The process of breaking down text into individual words or subwords.
- Stemming: The process of reducing inflected words to their word stem or root form.
- Lemmatization: The process of reducing the different inflected forms of a word to its dictionary form (lemma), so they can be analyzed as a single item.
- Word Embedding: A learned representation for text where words with similar meaning have a similar representation.
- Named Entity Recognition (NER): The task of identifying and classifying named entities in text.
- Sentiment Analysis: The use of natural language processing to identify and extract subjective information from text.
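The difference between tokenization, stemming, and lemmatization is easiest to see side by side. The sketch below is a toy: the regular expression, the three-suffix stemmer, and the tiny lemma lookup table are all assumptions made for illustration, not how production NLP libraries work.

```python
import re


def tokenize(text):
    """Split lowercased text into word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_stem(token):
    """Crude stemming: chop a known suffix if at least 3 characters remain.
    The result need not be a real word."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Hypothetical lemma dictionary; real lemmatizers use vocabulary and grammar.
LEMMAS = {"were": "be", "running": "run", "cats": "cat"}


def toy_lemmatize(token):
    """Map a token to its dictionary form when it is in the lookup table."""
    return LEMMAS.get(token, token)


tokens = tokenize("The cats were running!")
stems = [toy_stem(t) for t in tokens]
lemmas = [toy_lemmatize(t) for t in tokens]
print(tokens, stems, lemmas)
```

Note how the stemmer turns "running" into "runn" and leaves "were" alone, while the lemmatizer maps them to the real words "run" and "be".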
Reinforcement Learning
- Agent: The learner or decision-maker in a reinforcement learning scenario.
- Environment: The world in which the agent operates and learns.
- State: The current situation or condition of the agent in the environment.
- Action: A move or decision made by the agent.
- Reward: The feedback signal, usually a number, that the environment returns to evaluate the action taken by the agent.
- Policy: A strategy used by the agent to determine the next action based on the current state.
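The six terms above describe one loop: the agent observes a state, its policy picks an action, and the environment answers with a new state and a reward. Here is a minimal sketch of that loop, assuming a made-up five-cell corridor environment and a fixed "always move right" policy. No learning happens here; a reinforcement learning algorithm would improve the policy from the rewards.

```python
class Corridor:
    """Environment: cells 0..length-1, with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return
        (new_state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.state, reward, done


def always_right_policy(state):
    """Policy: maps the current state to an action. This one ignores the state."""
    return 1


def run_episode(env, policy, max_steps=20):
    """Agent loop: observe the state, act, collect the reward, repeat."""
    state = env.reset()
    total_reward = 0.0
    trajectory = [state]
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        trajectory.append(state)
        if done:
            break
    return trajectory, total_reward


trajectory, total_reward = run_episode(Corridor(), always_right_policy)
print(trajectory, total_reward)
```

With this policy the agent visits states 0 through 4 in order and collects a total reward of 1.0.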
Advanced Concepts
- Generative Adversarial Network (GAN): A class of machine learning frameworks where two neural networks contest with each other.
- Attention Mechanism: A technique that mimics cognitive attention, enhancing the important parts of the input data and diminishing the irrelevant parts.
- Transfer Learning: Storing knowledge gained while solving one problem and applying it to a different but related problem, for example by fine-tuning a pretrained model on a new task.
- Few-Shot Learning: A type of machine learning where a model is trained to recognize new classes from only a few examples.
- Explainable AI (XAI): Artificial intelligence systems where the results can be understood by humans.
- Federated Learning: A machine learning technique that trains an algorithm across multiple decentralized devices or servers holding local data samples.
- AutoML: Automating the end-to-end process of applying machine learning to real-world problems.
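The attention mechanism mentioned above can be sketched with plain lists. This example assumes the scaled dot-product form used in Transformers, with a single query; the vectors are invented. Each key is scored against the query, the scores become weights through a softmax, and the output is the weighted average of the values.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]

weights, output = attention(query, keys, values)
print(weights, output)
```

The key that points in the same direction as the query gets the largest weight, so its value dominates the output; the key that points the opposite way is diminished.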
Conclusion
If you are reading this, thank you so much! I appreciate it a lot ❤️.
Follow me on Twitter @appyzdl5 for regular updates, insights, and engaging conversations about ML.
Check out my GitHub with projects like miniGit and ML algos from scratch: @appyzdl