Deep Learning Paper
Paper
Image Classification
AlexNet:ImageNet Classification with Deep Convolutional Neural Networks (NIPS 2012)
ZFNet:Visualizing and Understanding Convolutional Networks (ECCV 2014)
GoogLeNet:Going Deeper with Convolutions (CVPR 2015)
Network In Network: $1\times1$ convolution
Provable Bounds for Learning Some Deep Representations: replaces the large, dense, bloated networks used previously with sparse, distributed ones
InceptionV2:Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ICML 2015)
InceptionV3:Rethinking the Inception Architecture for Computer Vision (CVPR 2016)
InceptionV4:Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning (AAAI 2017)
Xception:Xception: Deep Learning with Depthwise Separable Convolutions (CVPR 2017)
VGGNet:Very Deep Convolutional Networks for Large-Scale Image Recognition (ICLR 2015)
ResNet:Deep Residual Learning for Image Recognition (CVPR 2016)
ResNeXt:Aggregated Residual Transformations for Deep Neural Networks (CVPR 2017)
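The $1\times1$ convolution noted under Network In Network above can be read as a per-pixel linear map across channels, which is why it is commonly used to shrink or expand the channel dimension. The snippet below is a minimal pure-Python sketch of that idea, assuming a channels-first nested-list layout; the function name `conv1x1` and the sample values are illustrative and do not come from any of the listed papers.

```python
def conv1x1(feature_map, weight, bias=None):
    """Apply a 1x1 convolution.

    feature_map: nested list shaped [C_in][H][W]
    weight:      nested list shaped [C_out][C_in]
    bias:        optional list of length C_out
    Returns a nested list shaped [C_out][H][W].
    """
    c_in = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    out = []
    for o, row in enumerate(weight):
        b = bias[o] if bias is not None else 0.0
        # Each output pixel mixes the C_in values at the same spatial location.
        out.append([
            [b + sum(row[c] * feature_map[c][y][x] for c in range(c_in))
             for x in range(w)]
            for y in range(h)
        ])
    return out


# Illustrative input: 3 channels, 2x2 spatial size.
features = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
# One output channel that sums the three input channels (3 -> 1 reduction).
reduced = conv1x1(features, [[1, 1, 1]])
```

The spatial size is unchanged; only the number of channels changes, from `C_in` to `C_out`.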
Object Detection
Dense Prediction (one-stage)
anchor based
SSD:SSD: Single Shot MultiBox Detector (ECCV 2016)
YOLO:You Only Look Once: Unified, Real-Time Object Detection (CVPR 2016)
YOLOV2:YOLO9000: Better, Faster, Stronger (CVPR 2017)
YOLOV3:YOLOv3: An Incremental Improvement (2018)
YOLOV4:YOLOv4: Optimal Speed and Accuracy of Object Detection (2020)
Scaled-YOLOv4:Scaled-YOLOv4: Scaling Cross Stage Partial Network (CVPR 2021)
IoU Loss (2016) -> GIoU Loss (2019) -> DIoU Loss (2020) -> CIoU Loss (2020)
YOLOX:YOLOX: Exceeding YOLO Series in 2021
YOLOV5:
Alpha-IoU:Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression (NeurIPS 2021)
RetinaNet:Focal Loss for Dense Object Detection (ICCV 2017)
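The IoU loss progression listed above (IoU, GIoU, DIoU, CIoU) adds one penalty term at each step: enclosing-box area, then center distance, then aspect-ratio consistency. The following is a minimal pure-Python sketch of the four metrics as they are commonly formulated, assuming axis-aligned boxes in `(x1, y1, x2, y2)` format; the function name `iou_family` and the `eps` stabilizer are illustrative choices, not taken from any specific implementation. The corresponding loss in each case is `1 - metric`.

```python
import math


def iou_family(box_a, box_b, eps=1e-9):
    """Return (IoU, GIoU, DIoU, CIoU) for two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / (union + eps)

    # GIoU: penalize the empty part of the smallest enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c_area = cw * ch
    giou = iou - (c_area - union) / (c_area + eps)

    # DIoU: penalize squared center distance, normalized by the
    # squared diagonal of the enclosing box.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    c2 = cw ** 2 + ch ** 2
    diou = iou - rho2 / (c2 + eps)

    # CIoU: additionally penalize aspect-ratio mismatch.
    v = (4 / math.pi ** 2) * (
        math.atan((bx2 - bx1) / (by2 - by1 + eps))
        - math.atan((ax2 - ax1) / (ay2 - ay1 + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)
    ciou = diou - alpha * v

    return iou, giou, diou, ciou


# Two overlapping squares of equal shape.
iou, giou, diou, ciou = iou_family((0, 0, 2, 2), (1, 1, 3, 3))
```

For boxes that do not overlap, plain IoU is zero regardless of how far apart they are, while GIoU and DIoU still vary with the distance between the boxes; this is the usual motivation given for the later variants.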
anchor free
CornerNet:[CornerNet: Detecting Objects as Paired Keypoints](https://arxiv.org/abs/1808.01244) (ECCV 2018)
CornerNet-Lite: Efficient Keypoint Based Object Detection (BMVC 2020)
CenterNet:CenterNet: Keypoint Triplets for Object Detection (ICCV 2019)
MatrixNet:Matrix Nets: A New Deep Architecture for Object Detection (ICCV 2019)
FCOS:FCOS: Fully Convolutional One-Stage Object Detection (ICCV 2019)
Grounding DINO: Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (2023)
Sparse Prediction (two-stage)
anchor based
R-CNN:Rich feature hierarchies for accurate object detection and semantic segmentation (CVPR 2014)
Selective Search for Object Recognition (IJCV 2012)
[Path-aggregation blocks-FPN](#path-aggregation-blocks)
[Additional blocks-SPP](#additional-blocks)
Fast R-CNN:Fast R-CNN (ICCV 2015)
Faster R-CNN:Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (NIPS 2015)
R-FCN:R-FCN: Object Detection via Region-based Fully Convolutional Networks (NIPS 2016)
Mask R-CNN:Mask R-CNN (ICCV 2017)
Libra R-CNN: Libra R-CNN: Towards Balanced Learning for Object Detection (CVPR 2019)
Sparse R-CNN:Sparse R-CNN: End-to-End Object Detection with Learnable Proposals (CVPR 2021)
anchor free
RepPoints:RepPoints: Point Set Representation for Object Detection (ICCV 2019)
Neck
Additional blocks
SPP:Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (TPAMI 2015)
ASPP:DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (TPAMI 2017)
RFB:Receptive Field Block Net for Accurate and Fast Object Detection (ECCV 2018)
SAM:CBAM: Convolutional Block Attention Module (ECCV 2018)
Path-aggregation blocks
FPN:Feature Pyramid Networks for Object Detection (CVPR 2017)
PAN:Path Aggregation Network for Instance Segmentation (CVPR 2018)
NAS-FPN:NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection (CVPR 2019)
BiFPN:EfficientDet: Scalable and Efficient Object Detection (CVPR 2020)
ASFF:Learning Spatial Fusion for Single-Shot Object Detection (2019)
SFAM: M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network (AAAI 2019)
Lightweight CNNs
SqueezeNet:SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (2016)
MobileNet:MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (2017)
MobileNetV2:MobileNetV2: Inverted Residuals and Linear Bottlenecks (2018)
MobileNetV3:Searching for MobileNetV3 (2019)
MnasNet: Platform-Aware Neural Architecture Search for Mobile (CVPR 2019)
ShuffleNet:ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (2017)
ShuffleNetV2:ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design (2018)
PeleeNet:Pelee: A Real-Time Object Detection System on Mobile Devices (2018)
Shift-A:Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions (2018)
GhostNet: GhostNet: More Features from Cheap Operations (2020)
Generative Models
GAN:Generative Adversarial Networks (2014)
Diffusion-models:High-Resolution Image Synthesis with Latent Diffusion Models (CVPR 2022)
DiT:Scalable Diffusion Models with Transformers (ICCV 2023)
SDXL:SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (2023)
Flux:FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space (2025)
Wan: Wan: Open and Advanced Large-Scale Video Generative Models (2025 Alibaba)
Document Parsing
MinerU2.5:MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing (2025)
Recommender System
Wide&Deep:Wide & Deep Learning for Recommender Systems (2016)
Autonomous Driving
MultiPath:MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction
UniAD:Planning-oriented Autonomous Driving (CVPR 2023)
CMT:Cross Modal Transformer: Towards Fast and Robust 3D Object Detection (ICCV 2023)
GameFormer:GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving (ICCV 2023)
DriveDreamer:DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
FlashOcc:FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin
LLM Training/Inference Optimization
FSDP:PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (2023)
MARLIN: MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models (2024)
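Extensions
ComfyUI: an advanced node-based graphical interface for Stable Diffusion.
LightX2V: a lightweight inference framework for image and video generation.
mppp: a multiprecision numerical computing library (C++).
TFCC: a general-purpose server-side deep learning inference framework developed by Tencent's WeChat team.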
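How to Read a Paper
First pass: read the title and abstract, then the conclusion, then the figures and tables in the experiments and method sections, and decide whether the paper is relevant to you. This is the broad screening pass.
Second pass: read the whole paper once through, working out what each figure, table, and flowchart shows in detail, and mark the cited references worth following up. This is the shortlisting pass.
Third pass: understand what every sentence and every paragraph is saying, put yourself in the authors' position, and mentally reconstruct how the work was carried out. This is the close-reading pass.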