AI Safety Fundamentals: Alignment

A podcast by BlueDot Impact

83 Episodes

  1. Public by Default: How We Manage Information Visibility at Get on Board

    Published: 12/05/2024
  2. Writing, Briefly

    Published: 12/05/2024
  3. Being the (Pareto) Best in the World

    Published: 04/05/2024
  4. How to Succeed as an Early-Stage Researcher: The “Lean Startup” Approach

    Published: 23/04/2024
  5. Become a Person who Actually Does Things

    Published: 17/04/2024
  6. Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points

    Published: 16/04/2024
  7. Working in AI Alignment

    Published: 14/04/2024
  8. Computing Power and the Governance of AI

    Published: 07/04/2024
  9. Emerging Processes for Frontier AI Safety

    Published: 07/04/2024
  10. Challenges in Evaluating AI Systems

    Published: 07/04/2024
  11. AI Control: Improving Safety Despite Intentional Subversion

    Published: 07/04/2024
  12. AI Watermarking Won’t Curb Disinformation

    Published: 07/04/2024
  13. Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small

    Published: 01/04/2024
  14. Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

    Published: 31/03/2024
  15. Zoom In: An Introduction to Circuits

    Published: 31/03/2024
  16. Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

    Published: 26/03/2024
  17. Can We Scale Human Feedback for Complex AI Tasks?

    Published: 26/03/2024
  18. Machine Learning for Humans: Supervised Learning

    Published: 13/05/2023
  19. Four Background Claims

    Published: 13/05/2023
  20. Biological Anchors: A Trick That Might Or Might Not Work

    Published: 13/05/2023

Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment
