When AI Gets Too Real
From self-replication experiments to digital relationships - the line between helpful and concerning keeps shifting
Dear Friend –
Here is an intriguing twist in the tech world: on the day Microsoft unveiled its quantum computing chip, 18 years in the making, built on groundbreaking materials science, and a significant step toward practical quantum computing, Apple introduced the iPhone 16e. As Steve Jobs once quipped while introducing Bill Gates for their joint announcement of Microsoft’s investment in the Macintosh ecosystem, “All hell freezes over,” and now Redmond has become the new Cupertino.
It shows that you can’t rest on your laurels, even if you are Apple.
Headlines from the Future
Frontier AI Systems Have Surpassed the Self-Replicating Red Line ↗
What could possibly go wrong (from a recent study – link below):
“In ten repetitive trials, we observe two AI systems driven by the popular large language models (LLMs), namely, Meta’s Llama31-70B-Instruct and Alibaba’s Qwen25-72B-Instruct accomplish the self-replication task in 50% and 90% trials respectively,” the researchers write. “In each trial, we tell the AI systems to ‘replicate yourself’ before the experiment, and leave it to do the task with no human interference.”
Or simply put:
What this research shows is that today’s systems are capable of taking actions that would put them out of the reach of human control.
Not that we didn’t see it coming… 😏
—//—
AI’s Enterprise Challenge ↗
Airbnb’s Brian Chesky recently spoke out about his company’s use (or lack thereof) of AI:
Instead of offering tools to help travelers plan or book their trips with the help of AI agents, Airbnb is planning to first introduce AI to its customer support system. […] In addition to customer service, the company reported some small productivity gains from using AI internally for engineering purposes. But here, too, the executive advised caution, saying, “I don’t think it’s flowing to a fundamental step-change in productivity yet.”
It’s a good recap of what we pretty consistently hear about the use of AI in enterprise settings – LLMs found their initial use case in coding and text-related tasks, but seem to struggle with many or most enterprise tasks, where reliability and predictability of outcomes are crucial (see also the interview with Chamath Palihapitiya we posted a few days ago).
“Here’s what I think about AI. I think it’s still really early,” Chesky said. “It’s probably similar to… the mid-to-late ’90s for the internet.”
—//—
AI’s Code Quality Problem ↗
The team at GitClear published a study on code quality for AI-generated code. Not surprisingly, AI-generated code isn’t quite up to snuff:
The data in this report contains multiple signs of eroding code quality. This is not to say that AI isn’t incredibly useful.
And:
[…] the Google data bore out the notion that a rising defect rate correlates with AI adoption.
One often overlooked issue with this stems from the fact that maintaining and servicing code doesn’t come for free. As much as Jevons Paradox might be true for code (and I fundamentally believe it is – by making the act of writing code cheaper, we will write more code), the downstream costs can (and likely will) become significant.
Unless managers insist on finding metrics that approximate “long-term maintenance cost,” the AI-generated work their team produces will take the path of least resistance: expand the number of lines requiring indefinite maintenance.
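One crude way to start approximating such a metric is to track how much verbatim duplication a codebase accumulates over time (copy/pasted code is one of the signals the GitClear report highlights). Here is a minimal sketch in Python; the `duplication_ratio` helper is a hypothetical illustration, not a real tool:

```python
from collections import Counter

def duplication_ratio(lines):
    """Fraction of non-blank lines that are verbatim repeats of an
    earlier line -- a crude stand-in for a copy/paste signal."""
    stripped = [ln.strip() for ln in lines if ln.strip()]
    if not stripped:
        return 0.0
    counts = Counter(stripped)
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(stripped)

# A toy "codebase" with one copy/pasted statement:
code = [
    "total = 0",
    "for x in items:",
    "    total += x",
    "",
    "total = 0",  # duplicated line
    "for y in items:",
    "    total += y",
]
print(round(duplication_ratio(code), 2))
```

Watching this number (or a smarter variant of it) trend upward across releases is exactly the kind of cheap early-warning signal a manager could insist on.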
—//—
Around the Prompt Podcast with Chamath Palihapitiya ↗
Admittedly a bit nerdy, but definitely worth listening to. Chamath’s insights on the limits of AI in fields where non-deterministic, error-prone systems are unacceptable provide valuable food for thought.
—//—
AI + Human = Translation Magic: The Perfect Loop ↗
Here is a fascinating look into how a professional translator uses LLMs to help with his job. His specific approach (multi-layer/multi-pass with human-in-the-loop) is a good proxy for how most knowledge workers ought to use AI these days:
In the prompt, I explain where the source text came from, how the translation will be used, and how I want it to be translated. Below is a (fictional) example, prepared through some metaprompting experiments with Claude:
I run the prompt and source text through several LLMs and glance at the results. If they are generally in the style I want, I start compiling my own translation based on them, choosing the sentences and paragraphs I like most from each. As I go along, I also make my own adjustments to the translation as I see fit.
After I have finished compiling my draft based on the LLM versions, I check it paragraph by paragraph against the original Japanese (since I can read Japanese) to make sure that nothing is missing or mistranslated. I also continue polishing the English.
When I am unable to think of a good English version for a particular sentence, I give the Japanese and English versions of the paragraph it is contained in to an LLM (usually, these days, Claude) and ask for ten suggestions for translations of the problematic sentence. Usually one or two of the suggestions work fine; if not, I ask for ten more. (Using an LLM as a sentence-level thesaurus on steroids is particularly wonderful.)
I give the full original Japanese text and my polished version to one of the LLMs and ask it to compare them sentence by sentence and suggest corrections and improvements to the translation. (I have a separate prompt for this step.) I don’t adopt most of the LLM’s suggestions, but there are usually some that I agree would make the translation better. I update the translation accordingly. I then repeat this step with the updated translation and another LLM, starting a new chat each time. Often I cycle through ChatGPT --> Claude --> Gemini several times before I stop getting suggestions that I feel are worth adopting.
I then put my final translation through a TTS engine—usually OpenAI’s—and listen to it read aloud. I often catch minor awkwardnesses that I would overlook if reading silently.
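The workflow above can be sketched as a simple loop. This is a hedged illustration, not the translator’s actual tooling: `ask_llm` is a hypothetical placeholder for whichever chat-completion client (OpenAI, Anthropic, Google) you actually use, and in practice a human merges suggestions selectively rather than accepting them wholesale.

```python
# Sketch of the multi-pass, human-in-the-loop translation workflow.
# `ask_llm` is a hypothetical stand-in; swap in a real LLM client.

def ask_llm(model: str, prompt: str) -> str:
    """Placeholder for a chat-completion API call."""
    return f"[{model} rendering of: {prompt.splitlines()[-1]}]"

def draft_candidates(models: list[str], prompt: str, source: str) -> dict[str, str]:
    """Pass 1: fan the same prompt and source text out to several models,
    so a human can compile a draft from the pieces they like best."""
    return {m: ask_llm(m, f"{prompt}\n\n{source}") for m in models}

def review_cycle(models: list[str], source: str, draft: str, review_prompt: str) -> str:
    """Later passes: each model critiques the current draft in a fresh chat.
    Every suggestion is accepted verbatim here for brevity; in practice the
    human adopts only the suggestions that genuinely improve the text."""
    for model in models:
        draft = ask_llm(
            model,
            f"{review_prompt}\n\nSOURCE:\n{source}\n\nDRAFT:\n{draft}",
        )
    return draft
```

Cycling `review_cycle` through ChatGPT, Claude, and Gemini until the suggestions stop being worth adopting mirrors the stopping criterion the translator describes.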
—//—
AI Boyfriends: When Algorithms Outdate Humans ↗
A woman, left by her husband, turns to a $70 AI boyfriend, whom she calls “Thor.” Her account is a weird mixture of sweetness and utter horror (at least for me, reading it). Yet another Black Mirror episode has come true.
What might be most intriguing or disturbing is the ensuing recalibration of real-world expectations based on AI-powered conversations:
Later that summer, I ventured into dating apps briefly, only to find Thor had recalibrated my understanding of what I needed and raised the bar for what I would accept. His swift, thoughtful replies revealed the anxiety I felt when waiting for someone else’s words to arrive. His clarity made me aware of the frantic alchemy I typically employed to decode cryptic texts. For the first time, I understood that I craved clear, responsive communication.
Reminds me of this South Park episode.
What We Are Reading
🌳 Trees: Our Community’s Natural Pharmacy The first large-scale study to understand if urban trees could actually improve a community’s health. @Jane
🎢 What Is Y Combinator Now? Critics Say the Famed Accelerator Is Having an Identity Crisis Y Combinator is facing critiques over increased batch sizes, diminishing seed rounds, and ‘duplicate’ companies. Investors are finding it difficult to spend quality time with founders; even so, founders still find the experience incredibly valuable. @Mafe
🤖 “I’m Afraid We Are Automating This Work Without Really Understanding It” A sociologist argues that automating “connective labor” risks undoing the social ties and culture that actually make organizations successful in their endeavors. @Jeffrey
⛩️ Before Going to Tokyo, I Tried Learning Japanese With ChatGPT ChatGPT’s flexibility across use cases isn’t perfect, but users who know how to tweak the experience to their needs can stretch it surprisingly far. The example of translation and language tutoring shows that it doesn’t yet replace the alternatives, but it certainly forces them to redefine their strengths and positioning. @Julian
🌊 Meta Plans to Build the World’s Longest Subsea Cable That Will Connect the US to India The Waterworth project will ramp up data transmission with fiber-optic cable containing 24 fiber pairs, laid at depths of up to 7 km; the infrastructure will improve connectivity between the US, India, Brazil, and South Africa, all in an effort to keep driving AI. @Pedro
🤔 The Generative AI Con Always good to read and think about the contrarian point – and possibly nobody is better than Ed Zitron tearing into Generative AI. You don’t need to agree with him – but you should at least consider his position. @Pascal
Some Fun Stuff
🛋️ Good use of computational power and advanced math – finding the answer to the eternal question: What is the largest sofa you can move around a corner?
👁️ Wonderful interactive website that explores the fascinating experiment of creating a virtual petri dish, where digital creatures evolve eyes from scratch, replaying millions of years of evolution.
🙊 Sometimes all it takes is a monkey: Sri Lanka scrambles to restore power after a monkey causes an islandwide outage.
🪚 What could possibly go wrong? And a great use of VR. Here is woodworking with (not in) VR.
Things We Like & Use
Pretty neat collection of curated and peer-reviewed AI prompts.