# ECE 59500 Machine Learning

Course Description: ECE 59500 Machine Learning

Textbook:

- Pattern Recognition and Machine Learning, Bishop
- Machine Learning: A Probabilistic Perspective, Murphy
- Pattern Recognition, 4th Edition, Theodoridis and Koutroumbas

Topics:

- Expectation-Maximization algorithm
- Gaussian mixture model: By maximizing the likelihood of the data with the EM algorithm, we can estimate the parameters of our Gaussian mixture model.
- Capsule networks and EM routing: By introducing the weight coefficients and rewriting the initial distribution formula, we can derive Hinton's capsules with EM routing.
- Hidden Markov model: By maximizing the likelihood of the observation sequence with the EM algorithm, we can estimate the parameters of our hidden Markov model. To prevent underflow of the estimated parameter values, we introduce intermediate scaling parameters.
- Backpropagation, codes: Derive the backpropagation rule in matrix form, then implement it in both MATLAB and Python. Furthermore, the backpropagation rule is integrated into the simple deep learning framework library (see codes).
- Quasi-Newton optimization methods, codes: Derive the quasi-Newton update formulas, such as DFP and BFGS. DFP may fail when the condition number of the Hessian (lambda_max / lambda_min) is very large; BFGS is more stable than DFP, although counterexamples exist where BFGS fails on non-convex problems. In addition, we implemented the limited-memory BFGS (L-BFGS) method.
- Wasserstein GAN: When the discriminator is well trained and there is a large gap between the real-image distribution and the fake-image distribution, the gradient of the generator loss vanishes, so the generator learns nothing from the optimization. To address this problem, Wasserstein GAN (WGAN) introduces the Wasserstein distance, whose gradient does not vanish even when the distribution of the real images and that of the fake images are far apart. The improved model WGAN-GP further adds a gradient penalty to the loss function. The experimental results demonstrate improved learning stability compared to DCGAN, while WGAN-GP converges faster than WGAN.
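The EM procedure described in the Gaussian-mixture topic above can be sketched as alternating E- and M-steps; this is a minimal NumPy illustration (function and variable names are our own, not from the course code):

```python
import numpy as np

def em_gmm(X, K, n_iter=50, seed=0):
    """Fit a K-component Gaussian mixture to X (n, d) by EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                  # mixing weights
    mu = X[rng.choice(n, K, replace=False)]   # means initialized from the data
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = p(z_i = k | x_i), in log space
        log_r = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            inv = np.linalg.inv(cov[k])
            log_det = np.linalg.slogdet(cov[k])[1]
            maha = np.einsum('ij,jk,ik->i', diff, inv, diff)
            log_r[:, k] = np.log(pi[k]) - 0.5 * (maha + log_det + d * np.log(2 * np.pi))
        log_r -= log_r.max(axis=1, keepdims=True)   # stabilize before exponentiating
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, covariances from responsibilities
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, cov
```

Each iteration is guaranteed not to decrease the data likelihood, which is the property the bullet above relies on.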
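The underflow issue mentioned in the hidden Markov model topic is commonly handled by scaling the forward variables at every time step; here is a sketch of that idea (the matrices and names are illustrative, not the course's notation):

```python
import numpy as np

def forward_scaled(A, B, pi, obs):
    """Scaled forward pass for an HMM; returns the log-likelihood of obs.

    A: (S, S) transition matrix, B: (S, M) emission matrix, pi: (S,) initial
    distribution. The scaling factors c[t] normalize alpha at each step, and
    the log-likelihood is recovered as sum(log c), so no long product of
    tiny probabilities ever underflows.
    """
    T = len(obs)
    alpha = pi * B[:, obs[0]]
    c = np.empty(T)
    c[0] = alpha.sum()
    alpha = alpha / c[0]
    for t in range(1, T):
        alpha = (alpha @ A) * B[:, obs[t]]   # forward recursion
        c[t] = alpha.sum()
        alpha = alpha / c[t]                 # rescale to sum to 1
    return np.log(c).sum()
```

For short sequences this agrees exactly with the unscaled forward algorithm, but it remains stable for sequences of thousands of steps.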
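The matrix form of backpropagation from the backpropagation topic can be sketched for a one-hidden-layer network; this is a simplified stand-in for the course code (tanh hidden layer, linear output, MSE loss are our assumptions), verifiable against a numerical gradient:

```python
import numpy as np

def forward_backward(X, Y, W1, W2):
    """Loss and weight gradients for a tanh hidden layer + linear output."""
    H = np.tanh(X @ W1)                      # hidden activations, (n, h)
    Yhat = H @ W2                            # network output, (n, o)
    E = Yhat - Y
    loss = 0.5 * np.mean(np.sum(E**2, axis=1))
    n = X.shape[0]
    dYhat = E / n                            # dL/dYhat
    dW2 = H.T @ dYhat                        # output-layer gradient, matrix form
    dH = dYhat @ W2.T                        # backprop through output weights
    dZ = dH * (1.0 - H**2)                   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dZ                           # hidden-layer gradient
    return loss, dW1, dW2
```

A finite-difference check on any single weight confirms the matrix derivation.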
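The BFGS update named in the quasi-Newton topic maintains an approximation H of the inverse Hessian from the step s = x_{k+1} - x_k and gradient change y = g_{k+1} - g_k. A minimal sketch, assuming a fixed unit step in place of a proper line search:

```python
import numpy as np

def bfgs(grad, x0, n_iter=50, tol=1e-8):
    """Minimize a function given its gradient via the BFGS update.

    Inverse-Hessian update (rho = 1 / (y^T s)):
        H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T
    A unit step replaces the line search here for brevity, so this sketch
    is only suitable for well-conditioned problems.
    """
    n = len(x0)
    I = np.eye(n)
    H = I.copy()
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - H @ g                    # quasi-Newton step
        g_new = grad(x_new)
        s = x_new - x
        y = g_new - g
        ys = y @ s
        if ys > 1e-12:                       # curvature condition keeps H positive definite
            rho = 1.0 / ys
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x
```

Skipping the update when y^T s is not sufficiently positive is the standard safeguard; it is exactly the situation where DFP (and BFGS on non-convex problems, as noted above) can go wrong.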
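The key property behind the Wasserstein GAN topic, that the Wasserstein distance keeps a useful gradient even when the two distributions do not overlap, can be seen in one dimension, where for equal-size samples the Wasserstein-1 distance reduces to the mean absolute difference of the sorted samples (this toy illustration is our own, not the WGAN training code):

```python
import numpy as np

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D empirical samples."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

# Shifting the "fake" samples farther from the "real" ones increases W1
# linearly with the gap, even though the supports never overlap, whereas
# the JS divergence would saturate at a constant and yield zero gradient.
```

This linear growth with the gap is what gives the WGAN generator a non-vanishing training signal.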

References: Convex Optimization, Boyd and Vandenberghe (pdf)