muckrAIkers

Authors: Jacob Haimes and Igor Krawczuk
  • Summary

  • Join us as we dig a tiny bit deeper into the hype surrounding "AI" press releases, research papers, and more. Each episode, we'll highlight ongoing research and investigations, adding some much-needed contextualization, constructive critique, and even a smidge of occasional good-willed teasing to the conversation, trying to find the meaning under all of this muck.
    © Kairos.fm

Episodes
  • DeepSeek Minisode
    2025/02/10

    DeepSeek R1 has taken the world by storm, causing a stock market crash and prompting further calls for export controls within the US. Since this story is still very much in development, with follow-up investigations and calls for governance being released almost daily, we thought it best to hold off for a little while longer so we can tell the whole story. Nonetheless, it's a big story, so we provide a brief overview of all that's out there so far.

    • (00:00) - Recording date
    • (00:04) - Intro
    • (00:37) - DeepSeek drop and reactions
    • (04:27) - Export controls
    • (08:05) - Skepticism and uncertainty
    • (14:12) - Outro


    Links
    • DeepSeek website
    • DeepSeek paper
    • Reuters article - What is DeepSeek and why is it disrupting the AI sector?

    Fallout coverage

    • The Verge article - OpenAI has evidence that its models helped train China’s DeepSeek
    • The Signal article - Nvidia loses nearly $600 billion in DeepSeek crash
    • CNN article - US lawmakers want to ban DeepSeek from government devices
    • Fortune article - Meta is reportedly scrambling ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price
    • Dario Amodei's blogpost - On DeepSeek and Export Controls
    • SemiAnalysis article - DeepSeek Debates
    • Ars Technica article - Microsoft now hosts AI model accused of copying OpenAI data
    • Wiz Blogpost - Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat History

    Investigations into "reasoning"

    • Blogpost - There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study
    • Preprint - s1: Simple test-time scaling
    • Preprint - LIMO: Less is More for Reasoning
    • Blogpost - Reasoning Reflections
    • Preprint - Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH
    15 min
  • Understanding AI World Models w/ Chris Canal
    2025/01/27
    Chris Canal, co-founder of EquiStamp, joins muckrAIkers as our first ever podcast guest! In this ~3.5 hour interview, we discuss intelligence vs. competencies, the importance of test-time compute, moving goalposts, the orthogonality thesis, and much more.

    A seasoned software developer, Chris started EquiStamp in late 2023 as a way to improve our current understanding of model failure modes and capabilities. Now a key contractor for METR, EquiStamp evaluates the next generation of LLMs from frontier model developers like OpenAI and Anthropic.

    EquiStamp is hiring, so if you're a software developer interested in a fully remote opportunity with flexible working hours, join the EquiStamp Discord server and message Chris directly; oh, and let him know muckrAIkers sent you!

    • (00:00) - Recording date
    • (00:05) - Intro
    • (00:29) - Hot off the press
    • (02:17) - Introducing Chris Canal
    • (19:12) - World/risk models
    • (35:21) - Competencies + decision making power
    • (42:09) - Breaking models down
    • (01:05:06) - Timelines, test time compute
    • (01:19:17) - Moving goalposts
    • (01:26:34) - Risk management pre-AGI
    • (01:46:32) - Happy endings
    • (01:55:50) - Causal chains
    • (02:04:49) - Appetite for democracy
    • (02:20:06) - Tech-frame based fallacies
    • (02:39:56) - Bringing back real capitalism
    • (02:45:23) - Orthogonality Thesis
    • (03:04:31) - Why we do this
    • (03:15:36) - EquiStamp!


    Links
    • EquiStamp
    • Chris's Twitter
    • METR Paper - RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
    • All Trades article - Learning from History: Preventing AGI Existential Risks through Policy by Chris Canal
    • Better Systems article - The Omega Protocol: Another Manhattan Project

    Superintelligence & Commentary

    • Wikipedia article - Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
    • Reflective Altruism article - Against the singularity hypothesis (Part 5: Bostrom on the singularity)
    • Into AI Safety Interview - Scaling Democracy w/ Dr. Igor Krawczuk

    Referenced Sources

    • Book - Man-made Catastrophes and Risk Information Concealment: Case Studies of Major Disasters and Human Fallibility
    • Artificial Intelligence Paper - Reward is Enough
    • Wikipedia article - Capital and Ideology by Thomas Piketty
    • Wikipedia article - Pantheon

    LeCun on AGI

    • "Won't Happen" - Time article - Meta's AI Chief Yann LeCun on AGI, Open-Source, and AI Risk
    • "But if it does, it'll be my research agenda latent state models, which I happen to research" - Meta Platforms Blogpost - I-JEPA: The first AI model based on Yann LeCun's vision for more human-like AI

    Other Sources

    • Stanford CS Senior Project - Timing Attacks on Prompt Caching in Language Model APIs
    • TechCrunch article - AI researcher François Chollet founds a new AI lab focused on AGI
    • White House Fact Sheet - Ensuring U.S. Security and Economic Strength in the Age of Artificial Intelligence
    • New York Post article - Bay Area lawyer drops Meta as client over CEO Mark Zuckerberg's 'toxic masculinity and Neo-Nazi madness'
    • OpenEdition Academic Review of Thomas Piketty
    • Neural Processing Letters Paper - A Survey of Encoding Techniques for Signal Processing in Spiking Neural Networks
    • BFI Working Paper - Do Financial Concerns Make Workers Less Productive?
    • No Mercy/No Malice article - How to Survive the Next Four Years by Scott Galloway
    3 hr 20 min
  • NeurIPS 2024 Wrapped 🌯
    2024/12/30
    What happens when you bring over 15,000 machine learning nerds to one city? If your guess didn't include racism, sabotage and scandal, belated epiphanies, a spicy SoLaR panel, and many fantastic research papers, you wouldn't have captured my experience. In this episode we discuss the drama and takeaways from NeurIPS 2024.

    Posters available at time of episode preparation can be found on the episode webpage.

    EPISODE RECORDED 2024.12.22

    • (00:00) - Recording date
    • (00:05) - Intro
    • (00:44) - Obligatory mentions
    • (01:54) - SoLaR panel
    • (18:43) - Test of Time
    • (24:17) - And now: science!
    • (28:53) - Downsides of benchmarks
    • (41:39) - Improving the science of ML
    • (53:07) - Performativity
    • (57:33) - NopenAI and Nanthropic
    • (01:09:35) - Fun/interesting papers
    • (01:13:12) - Initial takes on o3
    • (01:18:12) - WorkArena
    • (01:25:00) - Outro


    Links
    Note: many workshop papers had not yet been published to arXiv as of preparing this episode; the OpenReview submission page is provided in these cases.
    • NeurIPS statement on inclusivity
    • CTOL Digital Solutions article - NeurIPS 2024 Sparks Controversy: MIT Professor's Remarks Ignite "Racism" Backlash Amid Chinese Researchers' Triumphs
    • (1/2) NeurIPS Best Paper - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
    • Visual Autoregressive Model report (this link now provides a 404 error)
    • Don't worry, here it is on archive.is
    • Reuters article - ByteDance seeks $1.1 mln damages from intern in AI breach case, report says
    • CTOL Digital Solutions article - NeurIPS Award Winner Entangled in ByteDance's AI Sabotage Accusations: The Two Tales of an AI Genius
    • Reddit post on Ilya's talk
    • SoLaR workshop page

    Referenced Sources

    • Harvard Data Science Review article - Data Science at the Singularity
    • Paper - Reward Reports for Reinforcement Learning
    • Paper - It's Not What Machines Can Learn, It's What We Cannot Teach
    • Paper - NeurIPS Reproducibility Program
    • Paper - A Metric Learning Reality Check

    Improving Datasets, Benchmarks, and Measurements

    • Tutorial video + slides - Experimental Design and Analysis for AI Researchers (I think you need to have attended NeurIPS to access the recording, but I couldn't find a different version)
    • Paper - BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
    • Paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
    • Paper - A Systematic Review of NeurIPS Dataset Management Practices
    • Paper - The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
    • Paper - Benchmark Repositories for Better Benchmarking
    • Paper - Croissant: A Metadata Format for ML-Ready Datasets
    • Paper - Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
    • Paper - Evaluating Generative AI Systems is a Social Science Measurement Challenge
    • Paper - Report Cards: Qualitative Evaluation of LLMs

    Governance Related

    • Paper - Towards Data Governance of Frontier AI Models
    • Paper - Ways Forward for Global AI Benefit Sharing
    • Paper - How do we warn downstream model providers of upstream risks?
    • Unified Model Records tool
    • Paper - Policy Dreamer: Diverse Public Policy Creation via Elicitation and Simulation of Human Preferences
    • Paper - Monitoring Human Dependence on AI Systems with Reliance Drills
    • Paper - On the Ethical Considerations of Generative Agents
    • Paper - GPAI Evaluation Standards Taskforce: Towards Effective AI Governance
    • Paper - Levels of Autonomy: Liability in the age of AI Agents

    Certified Bangers + Useful Tools

    • Paper - Model Collapse Demystified: The Case of Regression
    • Paper - Preference Learning Algorithms Do Not Learn Preference Rankings
    • LLM Dataset Inference paper + repo
    • dattri paper + repo
    • DeTikZify paper + repo

    Fun Benchmarks/Datasets

    • Paloma paper + dataset
    • RedPajama paper + dataset
    • Assemblage webpage
    • WikiDBs webpage
    • WhodunitBench repo
    • ApeBench paper + repo
    • WorkArena++ paper

    Other Sources

    • Paper - The Mirage of Artificial Intelligence Terms of Use Restrictions
    1 hr 27 min
