Deep Learning Seminar

meets TTH 12:30 p.m. - 2:00 p.m. in GDC 4.304 (Zoom mirror; see Canvas)

instructor Philipp Krähenbühl
email philkr (at) utexas.edu
office hours Th 2pm-2:30pm (Zoom, see Canvas)

TA Yue Zhao
email yzhao (at) cs.utexas.edu
TA hours T 2pm-2:30pm (Zoom, see Canvas)

Please use GitHub for all assignments. Zoom links and final grades are available on Canvas.

Unless disaster strikes, the course will be taught in person with an optional Zoom mirror. The Zoom mirror will be live; there will not be any recordings.

Prerequisites

Intro machine learning
  • 391L or equivalent
Discrete math for computer science
  • 311, 311H or equivalent
Proficiency in Python
  • All projects use Python with PyTorch
  • We recommend familiarizing yourself with additional libraries: numpy, scikit-learn, and matplotlib
Basic deep learning background
  • Familiarity with at least one deep learning package (PyTorch, Caffe, TensorFlow, Torch, MatConvNet, ...)
  • You should have trained at least one deep network

We won't enforce strict prerequisites (though we can only offer limited help with these topics). The sketch below illustrates the level of PyTorch familiarity we assume.
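
This is purely illustrative: a minimal PyTorch training loop (model, loss, optimizer, mini-batch updates). The data and architecture are toy placeholders, not course starter code.

    import torch
    import torch.nn as nn

    # Toy data: 1000 random 32x32 RGB "images" with labels from 10 classes.
    x = torch.randn(1000, 3, 32, 32)
    y = torch.randint(0, 10, (1000,))

    # A tiny convolutional classifier (illustrative only).
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(16, 10),
    )

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    # Standard mini-batch training loop: forward, loss, backward, update.
    for epoch in range(5):
        for i in range(0, len(x), 64):
            xb, yb = x[i:i + 64], y[i:i + 64]
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.3f}")

If this looks unfamiliar, the PyTorch tutorials are a good place to catch up before the semester starts.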

Class overview

We discuss up to 5 recent research papers per class
  • We will try to get through 100+ papers over the semester (in other words, roughly a tenth of a CVPR or NeurIPS)
  • No individual student will need to read 100 papers, but you'll need to read 100+ 5-minute paper summaries.
Before class: Groups of 4 students ([S1], [S2], [C], [R]) read each paper
  • [S1][S2] write a summary and review
  • [C] codes up the main idea (using starter code we provide)
  • [R] performs a round of peer review on both summaries and code
In class
  • [S1][S2] present the summaries and reviews for each paper (at most 5 min per paper), and [C] briefly presents an overview of the implementation (1-2 min)
  • We look at a comparison between papers
  • Discussion

Auditing is allowed if there is space (no coding or presentations, but in-class participation is required).

Schedule

Date | Topic | Papers
Aug 26 Course Introduction
Aug 31 Convolutional Neural Networks
[no code (yet)]
[S1] Gradient-based learning applied to document recognition, LeCun, Bottou, Bengio, Haffner; 1998
[S1] ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, Hinton; 2012
[S1] Network In Network, Lin, Chen, Yan; 2013
[S1] Going Deeper with Convolutions, Szegedy, Liu, Jia, Sermanet, Reed, Anguelov, Erhan, Vanhoucke, Rabinovich; 2014
[S1] Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan, Zisserman; 2014
Sep 02 Convolutional Neural Networks
[no code (yet)]
[S1] Deep Residual Learning for Image Recognition, He, Zhang, Ren, Sun; 2015
[S1] Densely Connected Convolutional Networks, Huang, Liu, Maaten, Weinberger; 2016
[S1] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, Howard, Zhu, Chen, Kalenichenko, Wang, Weyand, Andreetto, Adam; 2017
[S1] MobileNetV2: Inverted Residuals and Linear Bottlenecks, Sandler, Howard, Zhu, Zhmoginov, Chen; 2018
[S1] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Tan, Le; 2019
Sep 07 Non-linearities (and initialization)
[C]
[S1] [S2] Understanding the difficulty of training deep feedforward neural networks, Glorot, Bengio; 2010 | S1: Serdjan Rolovic S2: Elias Lampietti C: Matthew Kelleher R: Samantha Hay
[S1] [S2] Deep Sparse Rectifier Neural Networks, Glorot, Bordes, Bengio; 2011 | S1: Ishank Arora S2: Yeming Wen C: Ayush Chauhan R: Christopher Hahn
[S1] [S2] Maxout Networks, Goodfellow, Warde-Farley, Mirza, Courville, Bengio; 2013 | S1: Zayne Sprague S2: Zhou Fang C: Reid Ling Tong Li R: Jay Whang
[S1] Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, Saxe, McClelland, Ganguli; 2013 | S1: Liyan Chen S2: C: Marlan McInnes-Taylor R: Nilesh Gupta
[S1] [S2] Rectifier Nonlinearities Improve Neural Network Acoustic Models, Maas, Hannun, Ng; 2013 | S1: Kelsey Ball S2: Srinath Tankasala C: Hung-Ting Chen R: Sai Kiran Maddela
Sep 09 Non-linearities (and initialization)
[C]
[S1] [S2] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, He, Zhang, Ren, Sun; 2015 | S1: Tongrui Li S2: Jordi Ramos Chen C: R:
[S1] [S2] Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), Clevert, Unterthiner, Hochreiter; 2015 | S1: Joshua Papermaster S2: ABAYOMI ADEKANMBI C: Ian Trowbridge R: Tarannum Khan
[S2] Searching for Activation Functions, Ramachandran, Zoph, Le; 2017 | S1: S2: Jay Liao C: Ishan Shah R: Kiran Raja
[S1] [S2] Mish: A Self Regularized Non-Monotonic Activation Function, Misra; 2019 | S1: Shivi Agarwal S2: Atreya Dey C: Ojas Patel R: Daniel Almeraz
[S1] [S2] Gaussian Error Linear Units (GELUs), Hendrycks, Gimpel; 2016 | S1: Marco Bueso S2: Jose Chavez C: Cheng-Chun Hsu R: Shivang Singh
Sep 14 Optimizers
[C]
[S1] [S2] Large-Scale Machine Learning with Stochastic Gradient Descent, Bottou; 2010 | S1: Elias Lampietti S2: Zayne Sprague C: Samantha Hay R: Ojas Patel
[S1] [S2] On the importance of initialization and momentum in deep learning, Sutskever, Martens, Dahl, Hinton; 2013 | S1: Zhou Fang S2: Kelsey Ball C: Kiran Raja R: Reid Ling Tong Li
[S1] [S2] Cyclical Learning Rates for Training Neural Networks, Smith; 2015 | S1: Jay Liao S2: Liyan Chen C: Shivang Singh R: Ian Trowbridge
[S1] [S2] SGDR: Stochastic Gradient Descent with Warm Restarts, Loshchilov, Hutter; 2016 | S1: Atreya Dey S2: Tongrui Li C: Tarannum Khan R:
[S1] [S2] Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates, Smith, Topin; 2017 | S1: Jose Chavez S2: Marco Bueso C: R: Marlan McInnes-Taylor
Sep 16 Optimizers
[C]
[S1] Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, Duchi, Hazan, Singer; 2011 | S1: Yeming Wen S2: C: Daniel Almeraz R: Ayush Chauhan
[S1] [S2] ADADELTA: An Adaptive Learning Rate Method, Zeiler; 2012 | S1: Srinath Tankasala S2: Shivi Agarwal C: Nilesh Gupta R: Matthew Kelleher
[S1] [S2] Adam: A Method for Stochastic Optimization, Kingma, Ba; 2014 | S1: Jordi Ramos Chen S2: Ishank Arora C: Jay Whang R: Cheng-Chun Hsu
[S1] [S2] On the Convergence of Adam and Beyond, Reddi, Kale, Kumar; 2019 | S1: ABAYOMI ADEKANMBI S2: Joshua Papermaster C: Sai Kiran Maddela R: Hung-Ting Chen
[S2] Decoupled Weight Decay Regularization, Loshchilov, Hutter; 2017 | S1: S2: Serdjan Rolovic C: Christopher Hahn R: Ishan Shah
Sep 21 Normalizations
[C]
[S1] [S2] Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov; 2014 | S1: Ojas Patel S2: Tarannum Khan C: Marco Bueso R:
[S1] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe, Szegedy; 2015 | S1: Ishan Shah S2: C: Kelsey Ball R: Elias Lampietti
[S1] [S2] Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, Salimans, Kingma; 2016 | S1: Marlan McInnes-Taylor S2: Samantha Hay C: Zayne Sprague R: Jay Liao
[S1] [S2] Layer Normalization, Ba, Kiros, Hinton; 2016 | S1: Ayush Chauhan S2: Jay Whang C: Ishank Arora R: Atreya Dey
[S1] [S2] Instance Normalization: The Missing Ingredient for Fast Stylization, Ulyanov, Vedaldi, Lempitsky; 2016 | S1: Reid Ling Tong Li S2: Kiran Raja C: Liyan Chen R: ABAYOMI ADEKANMBI
Sep 23 Normalizations
[C]
[S1] [S2] Group Normalization, Wu, He; 2018 | S1: Ian Trowbridge S2: Shivang Singh C: Shivi Agarwal R: Srinath Tankasala
[S1] [S2] High-Performance Large-Scale Image Recognition Without Normalization, Brock, De, Smith, Simonyan; 2021 | S1: Matthew Kelleher S2: Christopher Hahn C: Serdjan Rolovic R: Jose Chavez
[S2] Micro-Batch Training with Batch-Channel Normalization and Weight Standardization, Qiao, Wang, Liu, Shen, Yuille; 2019 | S1: S2: Nilesh Gupta C: Joshua Papermaster R: Jordi Ramos Chen
[S1] [S2] Understanding Batch Normalization, Bjorck, Gomes, Selman, Weinberger; 2018 | S1: Hung-Ting Chen S2: Daniel Almeraz C: Tongrui Li R: Zhou Fang
[S1] [S2] Rethinking "Batch" in BatchNorm, Wu, Johnson; 2021 | S1: Cheng-Chun Hsu S2: Sai Kiran Maddela C: R: Yeming Wen
Sep 28 Sequence models
[C]
[S1] [S2] Sequence to Sequence Learning with Neural Networks, Sutskever, Vinyals, Le; 2014 | S1: Sai Kiran Maddela S2: Ian Trowbridge C: Srinath Tankasala R: Liyan Chen
[S1] [S2] Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, Chung, Gulcehre, Cho, Bengio; 2014 | S1: Christopher Hahn S2: Reid Ling Tong Li C: Jay Liao R: Tongrui Li
[S1] [S2] Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau, Cho, Bengio; 2014 | S1: Shivang Singh S2: Ayush Chauhan C: Elias Lampietti R: Serdjan Rolovic
[S1] [S2] Attention Is All You Need, Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin; 2017 | S1: Samantha Hay S2: Ojas Patel C: Zhou Fang R: Ishank Arora
[S1] [S2] End-To-End Memory Networks, Sukhbaatar, Szlam, Weston, Fergus; 2015 | S1: Tarannum Khan S2: Marlan McInnes-Taylor C: R: Zayne Sprague
Sep 30 Sequence models
[C]
[S1] [S2] Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth, Dong, Cordonnier, Loukas; 2021 | S1: Nilesh Gupta S2: Ishan Shah C: Jordi Ramos Chen R: Kelsey Ball
[S1] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin, Chang, Lee, Toutanova; 2018 | S1: Daniel Almeraz S2: C: Jose Chavez R: Joshua Papermaster
[S2] A Primer in BERTology: What we know about how BERT works, Rogers, Kovaleva, Rumshisky; 2020 | S1: S2: Hung-Ting Chen C: Atreya Dey R: Shivi Agarwal
[S1] [S2] Improving Language Understanding by Generative Pre-Training, Radford, Narasimhan, Salimans, Sutskever; 2018 | S1: Kiran Raja S2: Cheng-Chun Hsu C: Yeming Wen R: Marco Bueso
[S1] [S2] Language Models are Few-Shot Learners, Brown, Mann, Ryder, Subbiah, Kaplan, Dhariwal, Neelakantan, Shyam, Sastry, Askell, Agarwal, Herbert-Voss, Krueger, Henighan, Child, Ramesh, Ziegler, Wu, Winter, Hesse, Chen, Sigler, Litwin, Gray, Chess, Clark, Berner, McCandlish, Radford, Sutskever, Amodei; 2020 | S1: Jay Whang S2: Matthew Kelleher C: ABAYOMI ADEKANMBI R:
Oct 05 Efficient Transformers
[C]
[S1] [S2] Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, Dai, Yang, Yang, Carbonell, Le, Salakhutdinov; 2019 | S1: Serdjan Rolovic S2: Elias Lampietti C: Matthew Kelleher R: Samantha Hay
[S1] [S2] Generating Long Sequences with Sparse Transformers, Child, Gray, Radford, Sutskever; 2019 | S1: Ishank Arora S2: Yeming Wen C: Ayush Chauhan R: Christopher Hahn
[S1] [S2] Compressive Transformers for Long-Range Sequence Modelling, Rae, Potapenko, Jayakumar, Lillicrap; 2019 | S1: Zayne Sprague S2: Zhou Fang C: Reid Ling Tong Li R: Jay Whang
[S1] Reformer: The Efficient Transformer, Kitaev, Kaiser, Levskaya; 2020 | S1: Liyan Chen S2: C: Marlan McInnes-Taylor R: Nilesh Gupta
[S1] [S2] Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, Katharopoulos, Vyas, Pappas, Fleuret; 2020 | S1: Kelsey Ball S2: Srinath Tankasala C: Hung-Ting Chen R: Sai Kiran Maddela
Oct 07 Efficient Transformers
[C]
[S1] [S2] Linformer: Self-Attention with Linear Complexity, Wang, Li, Khabsa, Fang, Ma; 2020 | S1: Tongrui Li S2: Jordi Ramos Chen C: R:
[S1] [S2] Rethinking Attention with Performers, Choromanski, Likhosherstov, Dohan, Song, Gane, Sarlos, Hawkins, Davis, Mohiuddin, Kaiser, Belanger, Colwell, Weller; 2020 | S1: Joshua Papermaster S2: ABAYOMI ADEKANMBI C: Ian Trowbridge R: Tarannum Khan
[S2] Longformer: The Long-Document Transformer, Beltagy, Peters, Cohan; 2020 | S1: S2: Jay Liao C: Ishan Shah R: Kiran Raja
[S1] [S2] Big Bird: Transformers for Longer Sequences, Zaheer, Guruganesh, Dubey, Ainslie, Alberti, Ontanon, Pham, Ravula, Wang, Yang, Ahmed; 2020 | S1: Shivi Agarwal S2: Atreya Dey C: Ojas Patel R: Daniel Almeraz
[S1] [S2] LambdaNetworks: Modeling Long-Range Interactions Without Attention, Bello; 2021 | S1: Marco Bueso S2: Jose Chavez C: Cheng-Chun Hsu R: Shivang Singh
Oct 12 Vision Transformers
[C]
[S1] [S2] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly, Uszkoreit, Houlsby; 2020 | S1: Elias Lampietti S2: Zayne Sprague C: Samantha Hay R: Ojas Patel
[S1] [S2] Training data-efficient image transformers & distillation through attention, Touvron, Cord, Douze, Massa, Sablayrolles, Jégou; 2020 | S1: Zhou Fang S2: Kelsey Ball C: Kiran Raja R: Reid Ling Tong Li
[S1] [S2] BEiT: BERT Pre-Training of Image Transformers, Bao, Dong, Wei; 2021 | S1: Jay Liao S2: Liyan Chen C: Shivang Singh R: Ian Trowbridge
[S1] [S2] LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference, Graham, El-Nouby, Touvron, Stock, Joulin, Jégou, Douze; 2021 | S1: Atreya Dey S2: Tongrui Li C: Tarannum Khan R:
[S1] [S2] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, Wang, Xie, Li, Fan, Song, Liang, Lu, Luo, Shao; 2021 | S1: Jose Chavez S2: Marco Bueso C: R: Marlan McInnes-Taylor
Oct 14 Vision Transformers
[C]
[S2] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, Liu, Lin, Cao, Hu, Wei, Zhang, Lin, Guo; 2021 | S1: Yeming Wen S2: C: Daniel Almeraz R: Ayush Chauhan
[S1] [S2] Transformer in Transformer, Han, Xiao, Wu, Guo, Xu, Wang; 2021 | S1: Srinath Tankasala S2: Shivi Agarwal C: Nilesh Gupta R: Matthew Kelleher
[S1] [S2] Perceiver: General Perception with Iterative Attention, Jaegle, Gimeno, Brock, Zisserman, Vinyals, Carreira; 2021 | S1: Jordi Ramos Chen S2: Ishank Arora C: Jay Whang R: Cheng-Chun Hsu
[S1] [S2] Perceiver IO: A General Architecture for Structured Inputs & Outputs, Jaegle, Borgeaud, Alayrac, Doersch, Ionescu, Ding, Koppula, Zoran, Brock, Shelhamer, Hénaff, Botvinick, Zisserman, Vinyals, Carreira; 2021 | S1: ABAYOMI ADEKANMBI S2: Joshua Papermaster C: Sai Kiran Maddela R: Hung-Ting Chen
[S2] MLP-Mixer: An all-MLP Architecture for Vision, Tolstikhin, Houlsby, Kolesnikov, Beyer, Zhai, Unterthiner, Yung, Steiner, Keysers, Uszkoreit, Lucic, Dosovitskiy; 2021 | S1: S2: Serdjan Rolovic C: Christopher Hahn R: Ishan Shah
Oct 19 Implicit functions
[C]
[S1] [S2] DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation, Park, Florence, Straub, Newcombe, Lovegrove; 2019 | S1: Ojas Patel S2: Tarannum Khan C: Marco Bueso R:
[S1] Occupancy Networks: Learning 3D Reconstruction in Function Space, Mescheder, Oechsle, Niemeyer, Nowozin, Geiger; 2018 | S1: Ishan Shah S2: C: Kelsey Ball R: Elias Lampietti
[S1] [S2] Implicit Geometric Regularization for Learning Shapes, Gropp, Yariv, Haim, Atzmon, Lipman; 2020 | S1: Marlan McInnes-Taylor S2: Samantha Hay C: Zayne Sprague R: Jay Liao
[S1] [S2] Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains, Tancik, Srinivasan, Mildenhall, Fridovich-Keil, Raghavan, Singhal, Ramamoorthi, Barron, Ng; 2020 | S1: Ayush Chauhan S2: Jay Whang C: Ishank Arora R: Atreya Dey
[S1] [S2] Implicit Neural Representations with Periodic Activation Functions, Sitzmann, Martel, Bergman, Lindell, Wetzstein; 2020 | S1: Reid Ling Tong Li S2: Kiran Raja C: Liyan Chen R: ABAYOMI ADEKANMBI
Oct 21 Implicit functions
[C]
[S1] [S2] Learning Continuous Image Representation with Local Implicit Image Function, Chen, Liu, Wang; 2020 | S1: Ian Trowbridge S2: Shivang Singh C: Shivi Agarwal R: Srinath Tankasala
[S1] [S2] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, Ng; 2020 | S1: Matthew Kelleher S2: Christopher Hahn C: Serdjan Rolovic R: Jose Chavez
[S2] NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections, Martin-Brualla, Radwan, Sajjadi, Barron, Dosovitskiy, Duckworth; 2020 | S1: S2: Nilesh Gupta C: Joshua Papermaster R: Jordi Ramos Chen
[S1] [S2] Baking Neural Radiance Fields for Real-Time View Synthesis, Hedman, Srinivasan, Mildenhall, Barron, Debevec; 2021 | S1: Hung-Ting Chen S2: Daniel Almeraz C: Tongrui Li R: Zhou Fang
[S1] [S2] GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields, Niemeyer, Geiger; 2020 | S1: Cheng-Chun Hsu S2: Sai Kiran Maddela C: R: Yeming Wen
Oct 26 2D recognition
[C]
[S1] [S2] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren, He, Girshick, Sun; 2015 | S1: Sai Kiran Maddela S2: Ian Trowbridge C: Srinath Tankasala R: Liyan Chen
[S1] [S2] You Only Look Once: Unified, Real-Time Object Detection, Redmon, Divvala, Girshick, Farhadi; 2015 | S1: Christopher Hahn S2: Reid Ling Tong Li C: Jay Liao R: Tongrui Li
[S1] [S2] Focal Loss for Dense Object Detection, Lin, Goyal, Girshick, He, Dollár; 2017 | S1: Shivang Singh S2: Ayush Chauhan C: Elias Lampietti R: Serdjan Rolovic
[S1] [S2] Mask R-CNN, He, Gkioxari, Dollár, Girshick; 2017 | S1: Samantha Hay S2: Ojas Patel C: Zhou Fang R: Ishank Arora
[S1] [S2] Cascade R-CNN: Delving into High Quality Object Detection, Cai, Vasconcelos; 2017 | S1: Tarannum Khan S2: Marlan McInnes-Taylor C: R: Zayne Sprague
Oct 28 2D recognition
[C]
[S1] [S2] Deformable Convolutional Networks, Dai, Qi, Xiong, Li, Zhang, Hu, Wei; 2017 | S1: Nilesh Gupta S2: Ishan Shah C: Jordi Ramos Chen R: Kelsey Ball
[S1] CornerNet: Detecting Objects as Paired Keypoints, Law, Deng; 2018 | S1: Daniel Almeraz S2: C: Jose Chavez R: Joshua Papermaster
[S2] Objects as Points, Zhou, Wang, Krähenbühl; 2019 | S1: S2: Hung-Ting Chen C: Atreya Dey R: Shivi Agarwal
[S1] [S2] End-to-End Object Detection with Transformers, Carion, Massa, Synnaeve, Usunier, Kirillov, Zagoruyko; 2020 | S1: Kiran Raja S2: Cheng-Chun Hsu C: Yeming Wen R: Marco Bueso
[S1] [S2] Deformable DETR: Deformable Transformers for End-to-End Object Detection, Zhu, Su, Lu, Li, Wang, Dai; 2020 | S1: Jay Whang S2: Matthew Kelleher C: ABAYOMI ADEKANMBI R:
Nov 02 3D recognition
[C]
[S1] [S2] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Qi, Su, Mo, Guibas; 2016 | S1: Serdjan Rolovic S2: Elias Lampietti C: Matthew Kelleher R: Samantha Hay
[S1] [S2] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, Qi, Yi, Su, Guibas; 2017 | S1: Ishank Arora S2: Yeming Wen C: Ayush Chauhan R: Christopher Hahn
[S1] [S2] Dynamic Graph CNN for Learning on Point Clouds, Wang, Sun, Liu, Sarma, Bronstein, Solomon; 2018 | S1: Zayne Sprague S2: Zhou Fang C: Reid Ling Tong Li R: Jay Whang
[S1] PointCNN: Convolution On X-Transformed Points, Li, Bu, Sun, Wu, Di, Chen; 2018 | S1: Liyan Chen S2: C: Marlan McInnes-Taylor R: Nilesh Gupta
[S1] [S2] Point Transformer, Zhao, Jiang, Jia, Torr, Koltun; 2020 | S1: Kelsey Ball S2: Srinath Tankasala C: Hung-Ting Chen R: Sai Kiran Maddela
Nov 04 3D recognition
[C]
[S1] [S2] VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection, Zhou, Tuzel; 2017 | S1: Tongrui Li S2: Jordi Ramos Chen C: R:
[S1] [S2] PointPillars: Fast Encoders for Object Detection from Point Clouds, Lang, Vora, Caesar, Zhou, Yang, Beijbom; 2018 | S1: Joshua Papermaster S2: ABAYOMI ADEKANMBI C: Ian Trowbridge R: Tarannum Khan
[S2] PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud, Shi, Wang, Li; 2018 | S1: S2: Jay Liao C: Ishan Shah R: Kiran Raja
[S1] [S2] Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving, Wang, Chao, Garg, Hariharan, Campbell, Weinberger; 2018 | S1: Shivi Agarwal S2: Atreya Dey C: Ojas Patel R: Daniel Almeraz
[S1] [S2] Center-based 3D Object Detection and Tracking, Yin, Zhou, Krähenbühl; 2020 | S1: Marco Bueso S2: Jose Chavez C: Cheng-Chun Hsu R: Shivang Singh
Nov 09 Open world perception
[C]
[S1] [S2] Momentum Contrast for Unsupervised Visual Representation Learning, He, Fan, Wu, Xie, Girshick; 2019 | S1: Elias Lampietti S2: Zayne Sprague C: Samantha Hay R: Ojas Patel
[S1] [S2] A Simple Framework for Contrastive Learning of Visual Representations, Chen, Kornblith, Norouzi, Hinton; 2020 | S1: Zhou Fang S2: Kiran Raja C: Kelsey Ball R: Reid Ling Tong Li
[S1] VirTex: Learning Visual Representations from Textual Annotations, Desai, Johnson; 2020 | S1: Jay Liao S2: Liyan Chen C: Shivang Singh R: Ian Trowbridge
[S1] [S2] Contrastive Learning of Medical Visual Representations from Paired Images and Text, Zhang, Jiang, Miura, Manning, Langlotz; 2020 | S1: Atreya Dey S2: Tongrui Li C: Marco Bueso R:
[S1] [S2] Learning Transferable Visual Models From Natural Language Supervision, Radford, Kim, Hallacy, Ramesh, Goh, Agarwal, Sastry, Askell, Mishkin, Clark, Krueger, Sutskever; 2021 | S1: Jose Chavez S2: Tarannum Khan C: R: Marlan McInnes-Taylor
Nov 11 Open world perception
[C]
[S1] Towards Open Set Deep Networks, Bendale, Boult; 2015 | S1: Yeming Wen S2: C: Daniel Almeraz R: Ayush Chauhan
[S1] [S2] Large-Scale Long-Tailed Recognition in an Open World, Liu, Miao, Zhan, Wang, Gong, Yu; 2019 | S1: Srinath Tankasala S2: Shivi Agarwal C: Nilesh Gupta R: Matthew Kelleher
[S1] [S2] Class-Balanced Loss Based on Effective Number of Samples, Cui, Jia, Lin, Song, Belongie; 2019 | S1: Jordi Ramos Chen S2: Ishank Arora C: Jay Whang R: Cheng-Chun Hsu
[S1] [S2] Decoupling Representation and Classifier for Long-Tailed Recognition, Kang, Xie, Rohrbach, Yan, Gordo, Feng, Kalantidis; 2019 | S1: ABAYOMI ADEKANMBI S2: Joshua Papermaster C: Sai Kiran Maddela R: Hung-Ting Chen
[S2] Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax, Li, Wang, Kang, Tang, Wang, Li, Feng; 2020 | S1: S2: Serdjan Rolovic C: Christopher Hahn R: Ishan Shah
Nov 16 Temporal reasoning and Video
[no code (yet)]
[S1] Two-Stream Convolutional Networks for Action Recognition in Videos, Simonyan, Zisserman; 2014
[S1] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, Carreira, Zisserman; 2017
[S1] SlowFast Networks for Video Recognition, Feichtenhofer, Fan, Malik, He; 2018
[S1] Is Space-Time Attention All You Need for Video Understanding?, Bertasius, Wang, Torresani; 2021
[S1] Multiscale Vision Transformers, Fan, Xiong, Mangalam, Li, Yan, Malik, Feichtenhofer; 2021
Nov 18 Temporal reasoning and Video
[no code (yet)]
[S1] Online Model Distillation for Efficient Video Inference, Mullapudi, Chen, Zhang, Ramanan, Fatahalian; 2018
[S1] Long-Term Feature Banks for Detailed Video Understanding, Wu, Feichtenhofer, Fan, He, Krähenbühl, Girshick; 2018
[S1] Long Short-Term Transformer for Online Action Detection, Xu, Xiong, Chen, Li, Xia, Tu, Soatto; 2021
[S1] Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling, Lei, Li, Zhou, Gan, Berg, Bansal, Liu; 2021
[S1] CLEVRER: CoLlision Events for Video REpresentation and Reasoning, Yi, Gan, Li, Kohli, Wu, Torralba, Tenenbaum; 2019
Nov 23 Final Project Q/A
Nov 25 No class - Thanksgiving
Nov 30 Final Project Presentations
S1: Kelsey Ball S2: Zayne Sprague C: Marco Bueso R:
S1: Hung-Ting Chen S2: Jordi Ramos Chen C: Cheng-Chun Hsu R: Marlan McInnes-Taylor
S1: Atreya Dey S2: C: R:
S1: Christopher Hahn S2: C: R:
S1: Jay Liao S2: Elias Lampietti C: Tongrui Li R: Serdjan Rolovic
S1: ABAYOMI ADEKANMBI S2: Ishank Arora C: Ojas Patel R: Kiran Raja
S1: Srinath Tankasala S2: C: R:
S1: Jay Whang S2: C: R:
Dec 02 Final Project Presentations
S1: Nilesh Gupta S2: Ayush Chauhan C: Tarannum Khan R: Shivi Agarwal
S1: Jose Chavez S2: Joshua Papermaster C: Reid Ling Tong Li R: Samantha Hay
S1: Liyan Chen S2: C: R:
S1: Shivang Singh S2: Zhou Fang C: Daniel Almeraz R:
S1: Matthew Kelleher S2: C: R:
S1: Sai Kiran Maddela S2: Ian Trowbridge C: R:
S1: Ishan Shah S2: C: R:
S1: Yeming Wen S2: C: R:

Your role before class

Weeks 2-12:

  • [S1][S2] Write a summary and review of your assigned paper
  • [C] Implement the paper's main idea (using the starter code we provide)
  • [R] Peer-review the summaries and the code

Final week:

  • [F] Prepare your final project and its presentation

Your role in class

Weeks 2-12:

  • [S1][S2] Present your paper (<5min)
  • [C] Show and discuss your implementations (as a team, 15-30 min)
  • [all] Participate in discussion

Final week:

  • [F] Present your final project

Goals of the class

After this class you should be able to

  • Read and understand deep learning papers
  • Implement and execute a research project in deep learning

Grading

Expected workload

Estimated effort required to pass the class:

  • 1-2 h / week paper reading
  • 3 h / week class participation
  • 2-10 h / week coding, summary, or review
  • 20-40 h final project

General tips

  • Start the coding assignments and the final project early
    • most deep networks take about a day to train on a GPU
    • let us know early (in the first or second week) if you don't have GPU access; Colab or Google Cloud might be options (see the check below)
  • Read the assigned papers early, and write down questions and discussion topics
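
As a minimal sanity check (assuming PyTorch is installed, e.g. in a Colab notebook), you can verify that a GPU is visible before starting long training runs:

    import torch

    # True if PyTorch can see a CUDA GPU; if so, print which one.
    print(torch.cuda.is_available())
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))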

What should I do if I get sick?

  • DO NOT COME TO CLASS
  • We will send out a poll before every class to determine potential remote options
  • You may miss up to two classes without an excuse

Notes

Syllabus subject to change.