What is machine learning?
 Is true AI possible?
 It is very hard to define a human mind with the mathematical rigor of a Turing machine (a programmable algorithm). Although we still do not have a working model of a mouse brain, we do have hardware capable of simulating one. A mouse has around 4 million neurons in the cerebral cortex; a human being has 80–120 billion neurons. You can imagine how much more research will need to be conducted to arrive at a working model of a human mind.
Is there any set of rules that can define the entire scope of human expression? You could argue that we only need a top-down approach and do not need to understand the individual workings of every neuron. In that case you might study non-monotonic logic, abductive reasoning, decision theory, and so on. But as new theories emerge, so do more exceptions and paradoxes. Besides, how could any simple logic encompass the evolving meaning that humans attribute to words and abstract ideas?
Although still in its early stages, machine learning attempts to address this dilemma. By enriching a computer with the ability to improve its output through positive and negative feedback loops, it takes another step closer towards true artificial intelligence. For now, though, these models are merely Pavlovian conditioning on a singularly focused skill set (in this case natural language). The general intelligence displayed by humans is a much broader test of intelligence – one that summarizes positive correlations among different cognitive tasks, reflecting the fact that an individual’s performance on one type of cognitive task tends to be comparable to that person’s performance on other kinds of cognitive tasks.
Douglas Hofstadter, in his books Gödel, Escher, Bach and I Am a Strange Loop, cites Gödel’s theorems as an example of what he calls a strange loop: a hierarchical, self-referential structure existing within an axiomatic formal system. He argues that this is the same kind of structure which gives rise to consciousness, the sense of “I”, in the human mind. While the self-reference in Gödel’s theorem comes from the Gödel sentence asserting its own unprovability (roughly, “This sentence is not provable.”), the self-reference in the human mind comes from the way in which the brain abstracts and categorises stimuli into “symbols” – groups of neurons which respond to concepts, in what is effectively also a formal system – eventually giving rise to symbols modelling the concept of the very entity doing the perceiving. Hofstadter argues that a strange loop in a sufficiently complex formal system can give rise to a “downward” or “upside-down” causality, a situation in which the normal hierarchy of cause and effect is flipped. In the case of Gödel’s theorem, this manifests, in short, as the following:
"Merely from knowing the formula's meaning, one can infer its truth or falsity without any effort to derive it in the old-fashioned way, which requires one to trudge methodically "upwards" from the axioms. This is not just peculiar; it is astonishing. Normally, one cannot merely look at what a mathematical conjecture says and simply appeal to the content of that statement on its own to deduce whether the statement is true or false."
For example, asked to compute Pi, a human can step outside the procedure and simply answer π, whereas a computer would only stop calculating once it had run out of memory and crashed. According to Gödel’s incompleteness theorem, a computer cannot escape the inherent limitations of a formal axiomatic ruleset. In the case of the mind, a far more complex formal system, this “downward causality” manifests, in Hofstadter’s view, as the ineffable human instinct that the causality of our minds lies at the high level of desires, concepts, personalities, thoughts and ideas, rather than at the low level of interactions between neurons or even fundamental particles, even though according to physics the latter seems to possess the causal power.
"There is thus a curious upside-downness to our normal human way of perceiving the world: we are built to perceive “big stuff” rather than “small stuff”, even though the domain of the tiny seems to be where the actual motors driving reality reside."
Thus, cognition is a function of how one’s own brain categorizes stimuli into a formal system. Free will becomes apparent when these higher-level abstractions disagree with the underlying stimuli that gave rise to them.
Looked at this way, Gödel's proof suggests – though by no means does it prove! – that there could be some high-level way of viewing the mind/brain, involving concepts which do not appear on lower levels, and that this level might have explanatory power that does not exist – not even in principle – on lower levels. It would mean that some facts could be explained on the high level quite easily, but not on lower levels at all. No matter how long and cumbersome a low-level statement were made, it would not explain the phenomena in question. What might such high-level concepts be? It has been proposed for eons, by various holistically or "soulistically" inclined scientists and humanists that consciousness is a phenomenon that escapes explanation in terms of brain components; so here is a candidate at least. There is also the ever-puzzling notion of free will. So perhaps these qualities could be "emergent" in the sense of requiring explanations which cannot be furnished by the physiology alone.
 Machine learning vs Statistics
 The basic premise of machine learning (ML) is to build algorithms that can receive input data and use statistical analysis to predict an output, updating those outputs as new data becomes available. Statistics focuses on quantifying uncertainty by formalizing the relationships between variables as mathematical equations. ML focuses on prediction and classification, using algorithms that learn from data instead of explicitly programmed instructions.
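To make the contrast concrete, here is a minimal sketch of the ML mindset: instead of hand-coding rules, a toy nearest-neighbour classifier derives its predictions entirely from labelled examples (the data and labels below are made up for illustration):

```python
def nearest_neighbour(train, query):
    """Predict the label of `query` as the label of the closest training point."""
    # No hand-written rules here: the behaviour comes entirely from the data.
    closest = min(train, key=lambda point: abs(point[0] - query))
    return closest[1]

# Toy 1-D dataset of (feature, label) pairs
data = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
print(nearest_neighbour(data, 1.5))  # -> small
print(nearest_neighbour(data, 8.5))  # -> large
```

Adding new labelled points changes the predictions without any change to the code, which is the "learning from data" half of the definition above.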
 Common Applications

 Image recognition
 Object detection and tracking
 Speech recognition & synthesis
 Algorithmic trading strategies
 Sentiment analysis
 Supervised vs Unsupervised
 Supervised – data is labeled and the algorithms learn to predict the output from the input data.
 Unsupervised – data is unlabeled and the algorithms learn to find structure in the input data.
 Training and Testing
 Separating data into training and testing sets is an important part of evaluating data mining models. Typically, when you separate a data set into a training set and testing set, most of the data is used for training, and a smaller portion of the data is used for testing.
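The split described above can be sketched in a few lines of pure Python (the 80/20 ratio below is a common convention, not a fixed rule):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a dataset and reserve a small portion of it for testing."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = data[:]          # copy, so the original order is untouched
    rng.shuffle(shuffled)
    split = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:split], shuffled[split:]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # -> 8 2
```

Shuffling before splitting matters: if the data is ordered (e.g. by class), a naive slice would give the model a biased training set.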
Building models
 Tensor
 A tensor is a type of data structure used in linear algebra: a container which can house data in N dimensions, along with its linear operations. A tensor processing unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning.
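The "container in N dimensions" idea can be pictured with nested Python lists (a real library such as TensorFlow or NumPy stores the same data contiguously and far more efficiently; this is just a mental model):

```python
def shape(tensor):
    """Return the dimensions of a nested-list 'tensor'."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]   # assumes a rectangular (non-ragged) structure
    return tuple(dims)

scalar = 5                           # 0-D tensor
vector = [1, 2, 3]                   # 1-D tensor
matrix = [[1, 2], [3, 4], [5, 6]]    # 2-D tensor
print(shape(scalar), shape(vector), shape(matrix))  # -> () (3,) (3, 2)
```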
 Tensorflow
 An open source library by Google for building machine learning models.
 Tensor math – use linear algebra on tensors
 Memory management – the process of controlling and coordinating computer memory and avoiding memory leaks
 Loss functions and optimizers – a loss function is a method of evaluating how well your algorithm fits the dataset. The optimizer algorithm finds the parameters (weights) that minimize the loss function.
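The loss/optimizer pairing above can be shown with the simplest possible model, y = w·x, a mean-squared-error loss, and plain gradient descent as the optimizer (toy data, made up to follow y = 2x):

```python
def mse_loss(weight, data):
    """Mean squared error of the model y = weight * x over the dataset."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)

def gradient_step(weight, data, learning_rate=0.1):
    """One optimizer update: move the weight against the loss gradient."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - learning_rate * grad

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w = 0.0
for _ in range(50):
    w = gradient_step(w, data)
print(round(w, 3))  # converges toward 2.0 as the loss shrinks
```

The loss function only scores the fit; it is the optimizer's repeated small updates that actually improve the weights.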
 Keras
 Keras is a minimalist Python library for deep learning that can run on top of TensorFlow. It was developed to make implementing deep learning models as fast and easy as possible for research and development. Keras allows models to be built quickly using built-in estimators. See below for a decision tree on choosing an estimator algorithm.
Neural networks
If a machine learning algorithm returns an inaccurate prediction, then an engineer needs to step in and make adjustments. Deep learning structures algorithms in layers to create a neural network that can learn and make intelligent decisions on its own.
 Feedforward
 Input layer – takes a representation of the data as a tensor.
 Hidden layer – there can be one or many hidden layers, each containing one or more neuron units. Every neuron has its own weight for each unit of the previous layer. After summing the weighted activations, the neuron passes the value through an activation function that squashes it, producing a new tensor. A bias shifts the activation function to the left or right.
 Output layer – the neurons of this layer also use weights and activation functions. If the shape of the input tensor differs from that of the output tensor, the weights can transform the shape.
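The three layers above can be sketched as a forward pass in pure Python, with a sigmoid as the squashing activation (the weights and biases below are arbitrary placeholders; in a trained network they come from the optimizer):

```python
import math

def sigmoid(z):
    """Activation function that squashes any value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# 2 inputs -> 3 hidden neurons -> 1 output
x = [0.5, -1.0]
hidden = dense_layer(x, weights=[[0.1, 0.4], [-0.3, 0.8], [0.7, 0.2]],
                        biases=[0.0, 0.1, -0.2])
output = dense_layer(hidden, weights=[[0.5, -0.6, 0.9]], biases=[0.05])
print(len(hidden), len(output))  # -> 3 1
```

Note how the 3x2 hidden weights and 1x3 output weights reshape a 2-element input tensor into a single output, which is the shape-transforming role of the weighting mentioned above.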
 Backpropagation
 Shorthand for “the backward propagation of errors,” since an error is computed at the output and distributed backwards through the network’s layers so that it can learn from its mistakes. The error is calculated, and a gradient descent optimization algorithm then follows the direction of steepest descent of the loss. To avoid getting stuck in local minima, techniques such as momentum or adaptive learning-rate schedules can be used.
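The "error distributed backwards" idea can be seen in the smallest possible deep network: one input, one hidden unit, one output, no activation, trained on a single made-up example. The chain rule carries the output error back to each weight:

```python
# Tiny network: x -> w1 -> h -> w2 -> prediction (toy example values)
x, target = 1.0, 0.5
w1, w2 = 0.8, 0.3
lr = 0.5  # learning rate

for _ in range(100):
    # Forward pass
    h = w1 * x
    pred = w2 * h
    error = pred - target        # error is computed at the output...
    # Backward pass: the chain rule distributes the error to each weight
    grad_w2 = error * h          # how the output weight contributed
    grad_w1 = error * w2 * x     # error flows back through w2 to w1
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2

print(round(w1 * w2 * x, 3))  # prediction converges to the target 0.5
```

With more layers, the same pattern repeats: each layer's gradient is the downstream error multiplied back through the weights in front of it.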
 Neural network analogy
 Input layer = Eyes
 Hidden layer = Brain
 Activation function = Neurons passing signals to other neurons in the brain
 Output layer = Consciousness (how the brain perceives the stimuli from the eyes)
 Optimization function = Learning by updating the weights/biases based on the accuracy of previous outputs
 CNN
 CNNs have applications in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing. The input features are processed in local patches by sliding small filters across the image. This lets the network learn the image in parts and perform operations such as converting from RGB to grayscale; changes in pixel values across a patch are what allow it to detect edges.
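The sliding-filter idea can be demonstrated with a hand-rolled 2-D convolution and a Sobel-style kernel; the filter responds only where neighbouring pixel values change, i.e. at edges (the 4x6 "image" below is made up):

```python
def convolve2d(image, kernel):
    """Slide a kernel over an image (valid padding), summing element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A vertical edge: dark (0) on the left, bright (1) on the right
image = [[0, 0, 0, 1, 1, 1]] * 4
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to horizontal change
print(convolve2d(image, sobel_x))  # -> [[0, 4, 4, 0], [0, 4, 4, 0]]
```

The response is zero over the flat regions and large exactly where the pixel values change, which is the edge-detection behaviour described above; a CNN learns such kernels instead of having them hand-designed.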
 RNN
 A class of artificial neural network in which connections between nodes form a directed graph along a sequence, allowing it to exhibit temporal dynamic behavior. Unlike feedforward neural networks, whose connections do not form a cycle, RNNs can use their internal state (memory) to process sequences of inputs, which makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
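A single recurrent unit makes the "internal state" point concrete: each step's output depends on the previous state, so the same input produces different outputs depending on history (the weights below are arbitrary illustration values):

```python
import math

def rnn_forward(inputs, w_in=0.5, w_rec=0.9):
    """Process a sequence one step at a time, carrying a hidden state (memory)."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)  # new state depends on the old state
        states.append(h)
    return states

# Identical inputs yield different states because the memory accumulates:
print(rnn_forward([1.0, 1.0, 1.0]))
```

A feedforward network given the same value three times would produce the same output three times; the recurrence is what lets the network model order and context.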
 Unsupervised neural networks
 Self-organizing maps – for clustering data and detecting features in higher dimensions
 Boltzmann machine – fits a model that assigns a probability to every possible binary vector
 Used for recommendation systems
 Autoencoder – learns to encode its input into a compressed representation and decode it back
 The hidden nodes form a bottleneck that extracts the most important features
Experimental design
 Bias–variance tradeoff

 Bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
 Variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
 Ensemble learning
 Use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Use confidence intervals and statistical significance tests for comparing ML algorithms.
 P-value for statistical significance
 Chi-squared test for categorical data
 Pearson and Spearman correlation
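One of the simplest ensemble combiners is majority voting over several models' predictions; a minimal sketch (the three classifier outputs below are hypothetical):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the outputs of several models by taking the most common label."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers disagree on one sample:
print(majority_vote(["cat", "dog", "cat"]))  # -> cat
```

Voting helps when the constituent models make independent errors: the ensemble is wrong only when a majority of models are wrong on the same sample.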