Hello everyone, I'm Lao Zhang from i學習.
It's the weekend, so let me recommend a new project: an AI engineer transition roadmap.
Tip: it pairs well with the tools I've recommended before, for even better results.
Project link: https://github.com/InterviewReady/ai-engineering-resources

Tokenization
Byte-pair Encoding
https://arxiv.org/pdf/1508.07909
Byte Latent Transformer: Patches Scale Better Than Tokens
https://arxiv.org/pdf/2412.09871
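The byte-pair encoding paper above is the basis of most modern tokenizers. To make the idea concrete, here is a minimal training-loop sketch on a toy corpus (a simplification of the paper's worked example, not a production tokenizer):

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge the chosen pair wherever it appears as two whole symbols."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: each word is a space-separated character sequence with an
# end-of-word marker, as in the paper's worked example.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(10):                      # number of merges sets the vocabulary budget
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)     # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print("merged:", best)
```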

Vector Embeddings
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
https://arxiv.org/pdf/1810.04805
IMAGEBIND: One Embedding Space To Bind Them All
https://arxiv.org/pdf/2305.05665
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
https://arxiv.org/pdf/2308.11466
The Faiss Library
https://arxiv.org/pdf/2401.08281
Facebook Large Concept Models
https://arxiv.org/pdf/2412.08821v2

Infrastructure
TensorFlow
https://arxiv.org/pdf/1605.08695
DeepSeek 3FS filesystem (design notes)
https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md
Milvus DB
https://www.cs.purdue.edu/homes/csjgwang/pubs/SIGMOD21_Milvus.pdf
Billion-Scale Similarity Search with GPUs (FAISS)
https://arxiv.org/pdf/1702.08734
Ray
https://arxiv.org/abs/1712.05889
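Both FAISS papers in this list are about one operation: nearest-neighbour search over dense vectors. A minimal usage sketch, assuming `faiss-cpu` and `numpy` are installed and using an exact flat index rather than the billion-scale IVF/PQ variants the papers focus on:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 128                                          # embedding dimensionality
rng = np.random.default_rng(0)
xb = rng.random((10_000, d), dtype=np.float32)   # database vectors
xq = rng.random((5, d), dtype=np.float32)        # query vectors

index = faiss.IndexFlatL2(d)        # exact L2 search; IVF/PQ indexes trade accuracy for scale
index.add(xb)                       # add the database vectors
distances, ids = index.search(xq, 4)             # 4 nearest neighbours per query
print(ids.shape)                                 # (5, 4)
```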

Attention
Attention is All You Need
https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf
FlashAttention
https://arxiv.org/pdf/2205.14135
Multi-Query Attention
https://arxiv.org/pdf/1911.02150
Grouped-Query Attention
https://arxiv.org/pdf/2305.13245
Google Titans outperform Transformers
https://arxiv.org/pdf/2501.00663
VideoRoPE: Rotary Position Embedding
https://arxiv.org/pdf/2502.05173
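The attention papers above differ mainly in how many key/value heads are kept. A small NumPy sketch of scaled dot-product attention with shared key/value heads, which covers standard multi-head, multi-query, and grouped-query attention in one function (illustrative only: no masking, projections, or RoPE):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (q_heads, seq, d); k, v: (kv_heads, seq, d) with kv_heads dividing q_heads.
    kv_heads == q_heads is standard multi-head attention, kv_heads == 1 is
    multi-query attention, anything in between is grouped-query attention."""
    q_heads, seq, d = q.shape
    group = q_heads // k.shape[0]
    k = np.repeat(k, group, axis=0)                  # each K/V head serves a group of Q heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # scaled dot-product, (q_heads, seq, seq)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16, 64))                 # 8 query heads
k = rng.normal(size=(2, 16, 64))                 # 2 shared key/value heads
v = rng.normal(size=(2, 16, 64))
print(grouped_query_attention(q, k, v).shape)    # (8, 16, 64)
```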

Mixture of Experts
Sparsely-Gated Mixture-of-Experts Layer
https://arxiv.org/pdf/1701.06538
GShard
https://arxiv.org/abs/2006.16668
Switch Transformers
https://arxiv.org/abs/2101.03961
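All three mixture-of-experts papers share the same core trick: a gating network routes each token to only a few experts, so parameter count grows without a matching growth in compute. A toy top-k routing sketch (dense toy experts, no load-balancing loss):

```python
import numpy as np

def top_k_moe(x, gate_w, experts, k=2):
    """x: (d,), gate_w: (num_experts, d), experts: list of callables d -> d.
    Route the token to its top-k experts and mix their outputs with
    softmax weights renormalized over the selected experts only."""
    logits = gate_w @ x
    top = np.argsort(logits)[-k:]                # indices of the k largest gate logits
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()
    return sum(w * experts[i](x) for w, i in zip(weights, top))

d, num_experts = 16, 8
rng = np.random.default_rng(0)
gate_w = rng.normal(size=(num_experts, d))
expert_ws = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda x, W=W: np.tanh(W @ x) for W in expert_ws]  # toy dense experts

x = rng.normal(size=d)
print(top_k_moe(x, gate_w, experts).shape)       # (16,) -- only 2 of 8 experts ran
```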

RLHF
Deep Reinforcement Learning from Human Preferences
https://arxiv.org/pdf/1706.03741
Fine-Tuning Language Models from Human Preferences
https://arxiv.org/pdf/1909.08593
Training Language Models to Follow Instructions with Human Feedback (InstructGPT)
https://arxiv.org/pdf/2203.02155
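The common thread in these three papers is learning a reward model from human preference pairs and then optimizing the policy against it. A minimal sketch of the pairwise reward-model loss only; the policy-optimization (PPO) stage is omitted:

```python
import numpy as np

def reward_pair_loss(r_chosen, r_rejected):
    """Pairwise (Bradley-Terry) loss: -log sigmoid(r_chosen - r_rejected).
    Minimizing it pushes the reward model to score the human-preferred
    response above the rejected one."""
    margin = r_chosen - r_rejected
    return np.logaddexp(0.0, -margin)            # log(1 + exp(-margin)), numerically stable

# Toy reward-model scores for three (chosen, rejected) response pairs.
chosen = np.array([1.2, 0.3, 2.0])
rejected = np.array([0.4, 0.9, -0.5])
print(reward_pair_loss(chosen, rejected).mean())
```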

Chain of Thought
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
https://arxiv.org/pdf/2201.11903
Chain of Thought
https://arxiv.org/pdf/2411.14405v1
Demystifying Long Chain-of-Thought Reasoning in LLMs
https://arxiv.org/pdf/2502.03373
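Mechanically, chain-of-thought prompting is just a prompt-format change: the few-shot exemplar shows its intermediate reasoning, so the model is nudged to reason before answering. A minimal prompt sketch along the lines of the examples in the first paper above:

```python
# Few-shot chain-of-thought prompt: the exemplar shows its reasoning, so the
# model is nudged to produce reasoning for the new question too.
prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11.

Q: A cafeteria had 23 apples. They used 20 for lunch and bought 6 more.
How many apples do they have?
A:"""
print(prompt)
```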

Reasoning and Test-Time Scaling
Transformer Reasoning Capabilities
https://arxiv.org/pdf/2405.18512
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
https://arxiv.org/pdf/2407.21787
Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters
https://arxiv.org/pdf/2408.03314
Training Large Language Models to Reason in a Continuous Latent Space
https://arxiv.org/pdf/2412.06769
DeepSeek R1
https://arxiv.org/pdf/2501.12948v1
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
https://arxiv.org/pdf/2502.01618
Latent Reasoning: A Recurrent Depth Approach
https://arxiv.org/pdf/2502.05171
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
https://arxiv.org/pdf/2504.13139
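Several of these papers (Large Language Monkeys, compute-optimal test-time scaling, the particle-based Monte Carlo approach) boil down to the same recipe: spend extra inference compute by sampling many candidates and selecting among them. A best-of-N sketch; `generate` and `score` are hypothetical stand-ins for a sampled LLM call and a verifier or reward model:

```python
import random

def generate(prompt: str) -> str:
    """Hypothetical stand-in for one sampled LLM completion."""
    return f"candidate-{random.randint(0, 999)}"

def score(prompt: str, candidate: str) -> float:
    """Hypothetical stand-in for a verifier / reward-model score."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    """Repeated sampling: draw n candidates, keep the one the verifier scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("Prove that the sum of two even numbers is even.", n=8))
```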

Inference Optimizations
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
https://arxiv.org/pdf/2402.17764
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
https://arxiv.org/pdf/2407.08608
ByteDance 1.58-bit
https://arxiv.org/pdf/2412.18653v1
Transformer Squared
https://arxiv.org/pdf/2501.06252
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
https://arxiv.org/pdf/2501.09732
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
https://arxiv.org/pdf/2502.06703
Speculative Decoding
https://arxiv.org/pdf/2211.17192
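Speculative decoding (the last entry above) has a cheap draft model propose several tokens that the large target model then verifies in one pass. A heavily simplified greedy-verification sketch with toy stand-in models; the paper's version uses rejection sampling so the target model's output distribution is preserved exactly:

```python
def draft_next(context):
    """Hypothetical cheap draft model: proposes the next token."""
    return (sum(context) + 1) % 50  # toy deterministic rule

def target_argmax(context):
    """Hypothetical large target model: the token it would have chosen."""
    return (sum(context) + 1) % 50 if sum(context) % 3 else (sum(context) + 2) % 50

def speculative_step(context, k=4):
    """Draft k tokens, then accept the longest prefix the target model agrees with,
    plus one token from the target (its correction, or a bonus token)."""
    draft, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    accepted, ctx = [], list(context)
    for t in draft:
        if target_argmax(ctx) == t:              # target agrees with the draft token
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(target_argmax(ctx))  # target's correction ends the step
            return accepted
    accepted.append(target_argmax(ctx))          # bonus token when all drafts accepted
    return accepted

print(speculative_step([1, 2, 3], k=4))
```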

Distillation
Distilling the Knowledge in a Neural Network
https://arxiv.org/pdf/1503.02531
BYOL (distilled architecture)
https://arxiv.org/pdf/2006.07733
DINO
https://arxiv.org/pdf/2104.14294
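Hinton's distillation paper trains a small student to match the teacher's temperature-softened output distribution. A minimal sketch of that soft-target loss (the usual extra hard-label cross-entropy term is omitted):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between temperature-softened teacher and student
    distributions, scaled by T^2 so gradient magnitudes stay comparable
    across temperatures, as in the paper."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -(T ** 2) * np.sum(p_teacher * np.log(p_student + 1e-12))

teacher = np.array([4.0, 1.0, 0.5, -2.0])
student = np.array([2.5, 1.5, 0.0, -1.0])
print(distillation_loss(student, teacher))
```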

State Space Models
RWKV: Reinventing RNNs for the Transformer Era
https://arxiv.org/pdf/2305.13048
Mamba
https://arxiv.org/pdf/2312.00752
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
https://arxiv.org/pdf/2405.21060
Distilling Transformers to SSMs
https://arxiv.org/pdf/2408.10189
LoLCATs: On Low-Rank Linearizing of Large Language Models
https://arxiv.org/pdf/2410.10254
Think Slow, Fast
https://arxiv.org/pdf/2502.20339
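What RWKV, Mamba, and the SSM-duality paper have in common is replacing attention with a recurrence whose state is fixed-size, so inference cost is linear in sequence length. A minimal sketch of the underlying linear state-space scan, ignoring discretization details and Mamba's input-dependent (selective) parameters:

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    The state h has constant size, so each step is O(1) regardless of how
    long the sequence already is."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                    # x: (seq_len,) scalar inputs for simplicity
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)                  # stable state transition
B = rng.normal(size=4)
C = rng.normal(size=4)
x = rng.normal(size=32)
print(ssm_scan(A, B, C, x).shape)    # (32,)
```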

Competition Results
Google Math Olympiad 2 (AlphaGeometry2)
https://arxiv.org/pdf/2502.03544
Competitive Programming with Large Reasoning Models
https://arxiv.org/pdf/2502.06807
Google Math Olympiad 1 (AlphaGeometry)
https://www.nature.com/articles/s41586-023-06747-5

Reasoning Studies
Can AI be made to think critically?
https://arxiv.org/pdf/2501.04682
Evolving Deeper LLM Thinking
https://arxiv.org/pdf/2501.09891
LLMs Can Easily Learn to Reason from Demonstrations: Structure, Not Content, Is What Matters
https://arxiv.org/pdf/2502.07374

Language vs. Intelligence
Separating communication from intelligence
https://arxiv.org/pdf/2301.06627
Language is not intelligence
https://gwern.net/doc/psychology/linguistics/2024-fedorenko.pdf

Image Models
An Image is Worth 16x16 Words
https://arxiv.org/pdf/2010.11929
CLIP
https://arxiv.org/pdf/2103.00020
DeepSeek image generation
https://arxiv.org/pdf/2501.17811
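The ViT paper's title says it all: an image becomes a sequence of flattened 16x16 patches that are linearly projected into tokens. A minimal NumPy sketch of that patchify-and-project step (random weights, no class token or position embeddings):

```python
import numpy as np

def patchify(image, patch=16):
    """Split an image (H, W, C) into non-overlapping patch x patch tiles and
    flatten each tile into one token vector."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    tiles = image.reshape(H // patch, patch, W // patch, patch, C)
    tiles = tiles.transpose(0, 2, 1, 3, 4)            # (H/p, W/p, p, p, C)
    return tiles.reshape(-1, patch * patch * C)       # (num_patches, p*p*C)

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
tokens = patchify(img)                                # 14 * 14 = 196 patch tokens
proj = rng.normal(size=(tokens.shape[1], 768))        # stand-in for ViT's learned projection
embeddings = tokens @ proj
print(tokens.shape, embeddings.shape)                 # (196, 768) (196, 768)
```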

Video Models
ViViT: A Video Vision Transformer
https://arxiv.org/pdf/2103.15691
Joint embedding abstractions with self-supervised video masks
https://arxiv.org/pdf/2404.08471
Facebook VideoJAM (AI video generation)
https://arxiv.org/pdf/2502.02492

Industry Case Studies
Automated Unit Test Improvement using Large Language Models at Meta
https://arxiv.org/pdf/2402.09171
Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
https://arxiv.org/pdf/2404.17723v1
OpenAI o1 System Card
https://arxiv.org/pdf/2412.16720
LLM-powered bug catchers
https://arxiv.org/pdf/2501.12862
Chain-of-Retrieval Augmented Generation
https://arxiv.org/pdf/2501.14342
Swiggy Search
https://bytes.swiggy.com/improving-search-relevance-in-hyperlocal-food-delivery-using-small-language-models-ecda2acc24e6
Swarm by OpenAI
https://github.com/openai/swarm
Netflix Foundation Models
https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39
Model Context Protocol
https://www.anthropic.com/news/model-context-protocol
Uber QueryGPT
https://www.uber.com/en-IN/blog/query-gpt/
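Several of these case studies (the knowledge-graph RAG paper, Chain-of-Retrieval, Swiggy's search work) are variations on retrieval-augmented generation: retrieve relevant documents, then answer with them in context. A minimal sketch of the pattern; `embed` and `llm` are hypothetical stand-ins for an embedding model and a chat model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding model; returns a unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

def llm(prompt: str) -> str:
    """Hypothetical chat-model call."""
    return f"(answer grounded in: {prompt[:60]}...)"

def rag_answer(question: str, documents: list[str], k: int = 2) -> str:
    """Retrieve the k documents most similar to the question, then let the
    model answer with those documents placed in its context."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: float(q @ embed(d)), reverse=True)
    context = "\n".join(ranked[:k])
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

docs = ["Refund requests are processed within 5 days.",
        "Delivery partners are assigned by proximity.",
        "Premium members get free delivery."]
print(rag_answer("How long do refunds take?", docs))
```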
Finally, let me recommend a large language model course I've been taking recently.