
Ilya's AI Paper List for John Carmack

In an interview about AI and AGI, John Carmack shared an interesting story:

“So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head.”

Here is the list:

(The higher-ranked items represent fundamental breakthroughs, critical architectures, or core theories that one must understand to grasp the evolution of AI.)

  1. Attention Is All You Need - Introduces the transformer architecture, foundational for most modern NLP models, including GPT. (A minimal sketch of the attention operation follows the list.)
  2. CS231n: Convolutional Neural Networks for Visual Recognition - Fundamental course and resource for understanding CNNs, critical in computer vision tasks.
  3. Deep Residual Learning for Image Recognition - Introduces ResNets, a groundbreaking architecture that makes very deep networks trainable by addressing vanishing gradients. (A minimal residual-block sketch follows the list.)
  4. ImageNet Classification with Deep Convolutional Neural Networks (2012.12) - A pioneering paper demonstrating the power of CNNs in large-scale image classification.
  5. Scaling Laws for Neural Language Models - Critical for understanding how neural networks scale and generalize as model sizes increase, including the behavior of large models like GPT. (The power-law form is written out after the list.)
  6. Understanding LSTM Networks - Explains LSTMs, which revolutionized sequence-based tasks before transformers took over.
  7. The Annotated Transformer: Attention is All You Need - A detailed walkthrough of the transformer model, essential for understanding modern NLP.
  8. GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism - Important for understanding model parallelism, especially for scaling large models.
  9. Neural Turing Machines - Proposes models with memory capabilities, stepping toward models that can simulate algorithmic reasoning.
  10. Recurrent Neural Network Regularization - Advances in regularizing RNNs, relevant to improving the robustness of sequence-based models.
  11. Variational Lossy Autoencoder - A critical work in unsupervised learning and latent variable models.
  12. Kolmogorov Complexity and Algorithmic Randomness - Important theoretical background for complexity theory in AI.
  13. Pointer Networks - Key for understanding models that output sequences of discrete elements, relevant to tasks such as combinatorial optimization.
  14. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin - Advances in speech recognition using end-to-end learning, essential for voice-based AI applications.
  15. Neural Message Passing for Quantum Chemistry - Important for applications of neural networks to graph structures and quantum chemistry.
  16. Multi-Scale Context Aggregation by Dilated Convolutions - Uses dilated convolutions to aggregate multi-scale context without losing resolution, particularly relevant for image segmentation tasks.
  17. Machine Super Intelligence (2008.6) - Philosophical and long-term view of superintelligence, important for understanding AGI.
  18. A Tutorial Introduction to the Minimum Description Length Principle (2004.6) - Theoretical foundation on model selection and complexity.
  19. The Unreasonable Effectiveness of Recurrent Neural Networks - A blog post that made RNNs more accessible and impactful.
  20. Relational recurrent neural networks - Focuses on extending RNNs with relational structures, useful in more complex AI reasoning tasks.
  21. Order Matters: Sequence to Sequence for Sets - Studies how input and output ordering affect sequence-to-sequence models and how to extend them to sets, relevant to many NLP applications.
  22. Identity Mappings in Deep Residual Networks - Builds on the ResNet paper, further improving deep learning models.
  23. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (1993.8) - Focuses on regularization in neural networks, relevant for generalization and simplicity.
  24. Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton - A more theoretical work focusing on complexity in closed systems, which can be related to AGI.
  25. The First Law of Complexodynamics (2011.9) - Explores concepts related to complexity in computation, with a broad theoretical impact.
  26. A simple neural network module for relational reasoning - Focuses on improving relational reasoning in neural networks.
  27. Neural Machine Translation by Jointly Learning to Align and Translate - Introduced the attention mechanism for machine translation, a direct precursor to the transformer; influential but now largely superseded by newer models.
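
A few of the items above are easier to internalize with a few lines of code. For item 1, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer. It omits multiple heads, masking, and the learned projections, and the function name and toy shapes are illustrative rather than taken from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of every query to every key
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # attention-weighted sum of values

# Toy usage: 4 tokens with 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```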
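
For item 3, the residual idea itself is tiny: a block learns a correction F(x) and adds the input back through an identity shortcut, which keeps gradients flowing through very deep stacks. This sketch uses fully connected layers for brevity; the paper uses convolutional blocks with batch normalization.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Return relu(F(x) + x), where F is a small two-layer transform."""
    f = relu(x @ W1) @ W2   # the learned residual F(x)
    return relu(f + x)      # identity shortcut: add the input back

# Toy usage: stack a few blocks on a 16-dimensional feature vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16))
for _ in range(3):
    W1, W2 = 0.1 * rng.normal(size=(16, 16)), 0.1 * rng.normal(size=(16, 16))
    x = residual_block(x, W1, W2)
print(x.shape)  # still (1, 16)
```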
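
And for item 5, the headline result can be stated as simple power laws: test loss falls predictably as model size N and dataset size D grow. Roughly, in the form the paper reports (the constants N_c, D_c and the exponents are fitted empirically, with caveats about compute and data limits):

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}
```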

I will add