Deep Learning

Deep Learning Paper

Paper

Image Classification

AlexNet: ImageNet Classification with Deep Convolutional Neural Networks (NIPS 2012)

ZFNet: Visualizing and Understanding Convolutional Networks (ECCV 2014)

GoogLeNet: Going Deeper with Convolutions (CVPR 2015)

Network In Network: $1\times1$ convolution

Provable Bounds for Learning Some Deep Representations: replaces large, dense, bloated networks with sparse, distributed ones

InceptionV2: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ICML 2015)

InceptionV3: Rethinking the Inception Architecture for Computer Vision (CVPR 2016)

InceptionV4: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning (AAAI 2017)

Xception: Deep Learning with Depthwise Separable Convolutions (CVPR 2017)

VGGNet: Very Deep Convolutional Networks for Large-Scale Image Recognition (ICLR 2015)

ResNet: Deep Residual Learning for Image Recognition (CVPR 2016)

ResNeXt: Aggregated Residual Transformations for Deep Neural Networks (2017)

DenseNet: Densely Connected Convolutional Networks
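
The $1\times1$ convolution noted under Network In Network can be read as a per-pixel linear map across channels. The sketch below is a minimal pure-Python illustration, not code from any of the papers; the function name `conv1x1` and the nested-list layout `[C_in][H][W]` are assumptions made for this example.

```python
def conv1x1(feature_map, weights, bias=None):
    """Apply a 1x1 convolution.

    feature_map: nested lists shaped [C_in][H][W]
    weights:     nested lists shaped [C_out][C_in]
    bias:        optional list of length C_out
    Returns nested lists shaped [C_out][H][W].
    """
    c_in = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    c_out = len(weights)
    if bias is None:
        bias = [0.0] * c_out

    # Each output pixel is a weighted sum of the input channels at the
    # same spatial position; no neighbouring pixels are involved.
    return [
        [
            [
                bias[o] + sum(weights[o][i] * feature_map[i][y][x]
                              for i in range(c_in))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for o in range(c_out)
    ]


# Two input channels, one output channel that simply adds them.
x = [[[1, 2]], [[3, 4]]]          # shape [2][1][2]
y = conv1x1(x, weights=[[1, 1]])  # shape [1][1][2]
```

Because the spatial size is unchanged, choosing `C_out < C_in` reduces the channel count cheaply, which is how Inception-style modules use it before larger convolutions.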

Object Detection

Dense Prediction (one-stage)

anchor based

SSD: Single Shot MultiBox Detector (ECCV 2016)

YOLO: You Only Look Once: Unified, Real-Time Object Detection (CVPR 2016)

YOLOv2: YOLO9000: Better, Faster, Stronger (CVPR 2017)

YOLOv3: An Incremental Improvement (arXiv 2018)

YOLOv4: Optimal Speed and Accuracy of Object Detection (arXiv 2020)

Scaled-YOLOv4: Scaling Cross Stage Partial Network (CVPR 2021)

IoU loss (2016) -> GIoU loss (2019) -> DIoU loss (2020) -> CIoU loss (2020)

YOLOX: Exceeding YOLO Series in 2021

YOLOv5

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression (NeurIPS 2021)

RetinaNet: Focal Loss for Dense Object Detection (ICCV 2017)
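
The first step of the IoU-loss progression listed above (IoU to GIoU) can be sketched in a few lines. This is a minimal pure-Python illustration for two axis-aligned boxes, not an implementation from any of the cited papers; the function name `iou_and_giou` and the `(x1, y1, x2, y2)` box format are assumptions made for this example.

```python
def iou_and_giou(box_a, box_b):
    """Return (IoU, GIoU) for two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2, so both boxes have positive area.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    giou = iou - (enclose - union) / enclose
    return iou, giou


iou, giou = iou_and_giou((0, 0, 2, 2), (1, 1, 3, 3))
giou_loss = 1.0 - giou  # a common loss form; ranges over [0, 2]
```

For disjoint boxes IoU is 0 regardless of distance, while GIoU keeps decreasing as the boxes move apart, so it still gives a training signal. DIoU and CIoU add penalty terms for centre distance and aspect ratio, which this sketch does not cover.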

anchor free

CornerNet: Detecting Objects as Paired Keypoints (https://arxiv.org/abs/1808.01244) (ECCV 2018)

CornerNet-Lite: Efficient Keypoint Based Object Detection (BMVC 2020)

CenterNet: Keypoint Triplets for Object Detection (ICCV 2019)

MatrixNet: Matrix Nets: A New Deep Architecture for Object Detection (ICCV 2019)

FCOS: Fully Convolutional One-Stage Object Detection (ICCV 2019)

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (2023)

Sparse Prediction (two-stage)

anchor based

R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation (CVPR 2014)

Selective Search for Object Recognition (IJCV 2012)

FPN (see Path-aggregation blocks)

SPP (see Additional blocks)

Fast R-CNN (ICCV 2015)

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (NIPS 2015)

R-FCN: Object Detection via Region-based Fully Convolutional Networks (NIPS 2016)

Mask R-CNN (ICCV 2017)

Libra R-CNN: Towards Balanced Learning for Object Detection (CVPR 2019)

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals (CVPR 2021)

anchor free

RepPoints: Point Set Representation for Object Detection (ICCV 2019)

Neck

Additional blocks

SPP: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (TPAMI 2015)

ASPP: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (TPAMI 2017)

RFB: Receptive Field Block Net for Accurate and Fast Object Detection (ECCV 2018)

SAM: CBAM: Convolutional Block Attention Module (ECCV 2018)

Path-aggregation blocks

FPN: Feature Pyramid Networks for Object Detection (CVPR 2017)

PAN: Path Aggregation Network for Instance Segmentation (CVPR 2018)

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection (CVPR 2019)

BiFPN: EfficientDet: Scalable and Efficient Object Detection (CVPR 2020)

ASFF: Learning Spatial Fusion for Single-Shot Object Detection (2019)

SFAM: M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network (AAAI 2019)

Lightweight CNNs

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (2016)

SqueezeNext: Hardware-Aware Neural Network Design (2018)

MobileNet: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (2017)

MobileNetV2: Inverted Residuals and Linear Bottlenecks (2018)

MobileNetV3: Searching for MobileNetV3 (2019)

MnasNet: Platform-Aware Neural Architecture Search for Mobile (CVPR 2019)

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (2017)

ShuffleNetV2: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design (2018)

PeleeNet: Pelee: A Real-Time Object Detection System on Mobile Devices (2018)

Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions (2018)

GhostNet: More Features from Cheap Operations (2020)

Generative Models

GAN: Generative Adversarial Networks (2014)

Diffusion models: High-Resolution Image Synthesis with Latent Diffusion Models (CVPR 2022)

The Principles of Diffusion Models

DiT: Scalable Diffusion Models with Transformers (ICCV 2023)

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (2023)

Flux: FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space (2025)

Wan: Open and Advanced Large-Scale Video Generative Models (2025, Alibaba)

Document Parsing

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing (2025)

Recommender System

Wide & Deep: Wide & Deep Learning for Recommender Systems (2016)

Autonomous Driving

MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction

UniAD: Planning-oriented Autonomous Driving (CVPR 2023)

CMT: Cross Modal Transformer: Towards Fast and Robust 3D Object Detection (ICCV 2023)

GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving (ICCV 2023)

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin

DVGT: Driving Visual Geometry Transformer (2025, Xiaomi)

LLM Training/Inference Optimization

FSDP: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (2023)

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models (2024)

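Further Resources

ComfyUI: an advanced node-based graphical interface for Stable Diffusion workflows.

LightX2V: a lightweight inference framework for image and video generation.

mppp: a multiprecision numerical computation library (C++).

TFCC: a general-purpose server-side deep learning inference framework developed by the Tencent WeChat team.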

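How to Read a Paper

Mu Li (李沐)

First pass: read the title and abstract, then the conclusion; glance at the figures and tables in the experiments and method sections; decide whether the paper is worth your time. This is the broad screening pass.

Second pass: read the whole paper once, working through each figure, table, and flowchart in detail; mark the related work you need to follow up on. This is the shortlisting pass.

Third pass: understand what every sentence and every paragraph is saying; put yourself in the authors' position and mentally reconstruct how they arrived at each step. This is the close-study pass.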
