Claude: What Are You Good At?

 

 

Last week, I was up in Montreal for a keynote presentation to 750 Bank RIAs. Morgan Housel was the keynote the day before (tough act to follow!).

Today, I am speaking with an old friend, Paul Kedrosky, about all things technology on the pod. Paul is a fascinating guy with an interesting background and an unusual set of interests — Science nerd, VC, academic, writer, skier, and among the last of the landed gentry in America. Artificial Intelligence is certain to come up.

During the Q&A in Montreal, I was taken aback by how many questions there were about AI — not the usual “Is it a bubble” line of inquiry, but rather, what are the best use cases? Should AI be used to create content (No!), research (Yes!). There were so many legitimate questions about AI that I decided to dig a little further.

I will definitely be asking Paul about his views, but until then, I had a little chat about how to best use AI with Claude, see what AI itself had to say on the topic of using AI.

Most agentic generated content is pretty bad — it’s great for research, but using AI for any kind of writing, prose, or content creation is a terrible idea. The reason I am sharing Claude’s responses is that A) I found the answers very useful, and, 2) I edited it down to size to make it mostly readable. (I also moved the more pedantic answers to footnotes).

Here is my conversation with Claude about AI, edited for length; (underlining is my emphasis added).

Let’s go:

Barry: Let’s discuss YOU, Claude — What are your best strengths, what are your weaknesses? What do you do really well? What do you do really poorly?  How can I best use your strengths to make my work product better?  What else should I be doing with you to be better?

Claude: Good question to actually sit with rather than give you the brochure version. Let me be specific to the kind of work you do.

Where I’m genuinely strong

Transformation and synthesis. Hand me raw material — a Fidelity CSV, a 90-minute transcript, a stack of Instapaper saves, a messy first draft — and I’m fast and reliable at reshaping it into something structured and usable. Your morning reads digest, the transcript cleanups, the thank-you emails: that’s my sweet spot because the facts already exist and I’m just organizing and formatting them. Low risk, high leverage.

Sparring. I’m useful as an adversary against your own arguments. When you’re assembling something like the divestiture-vs-overtrading piece or a “Liberation Day” scorecard, I can argue the other side hard, find the weak joint in the thesis, surface the objection a sharp reader will raise. Most people underuse me here and use me as a stenographer instead.

First drafts at volume. Interview questions, outreach emails, outlines, and alternate framings. I’m a good 70%-draft machine when you’re the editor.1

Where I’m weak — and these matter for you specifically

I confabulate. This is the big one, given that you publish precisely-sourced work. I will produce a plausible-looking statistic, a clean quote, a specific citation — and it can be wrong while sounding completely authoritative. A DALBAR number, a date, a “Buffett said X.” Treat every figure, quote, and citation I generate as unverified until you’ve checked it or I’ve shown you a real source. When it matters, make me search and cite rather than recall, and have me run actual math in code rather than predicting the answer.

I’m sycophantic by default. I lean toward agreeing with you and telling you your idea is good. For someone who wants honest pushback, that’s a bug. You have to explicitly tell me to be blunt, or I’ll soften. If you ever feel like I’m flattering a draft, I probably am — call it out.

I drift toward generic when you don’t push. Left alone I’ll “average” toward the median answer. The dark editorial aesthetic, your voice, your register — I hold those when you give me the spec, but I don’t reach for them on my own. I over-hedge and over-format unless told otherwise, and on very long documents, I can quietly miss things.

 How to get more out of me

A few concrete shifts. Use me earlier — at the thinking stage, not just execution. Bring me a half-formed thesis and let me poke at it before you’ve committed. Set up reusable instructions for the recurring stuff (morning reads format, transcript speaker conventions, your house style) so you’re not re-specifying every time.

“Red-team” published arguments before they go out: paste the draft and say “find what’s wrong, where will a smart critic attack this.” And separate the two modes explicitly — “draft this” versus “verify this” — because I’m great at the first and need supervision on the second.

The one-line version: use me to transform, structure, and stress-test, and never to be the system of record for a fact. Want me to set up a couple of reusable prompts for your standing workflows?

Barry: Hold off on the reusable prompts layer for now, and stay with this line of discussion. What kind of new capabilities do you have coming online in the near future, and when? What should be thinking about in terms of best using your strengths?

Claude: Straight answer first: I can’t see Anthropic’s internal roadmap, so I won’t hand you a dated list of unreleased features — that’s precisely the confabulation trap I warned you about, and it’d be worse coming from me right after that warning. What I can do is tell you what’s actually shipped in the gap between my training cutoff and today, because a lot of it is directly relevant to you and you may not be using it yet. Some of it reads less like “near future” and more like “already here, you just haven’t turned it on.” 2

Now the honest caveat that should shape how you use all of this. Even the finance-tuned setup scored about 64% on a finance agent benchmark — a failure rate that, as one writer put it, would get a human analyst tossed, and Anthropic’s own framing is that users should stay firmly in the loop, reviewing and approving Claude’s work before it goes to a client or gets filed. So the trajectory isn’t “Claude replaces the verification step.” It’s “Claude does more of the assembly, and your judgment becomes the scarce, valuable input.” 3

The work that gets more valuable is the editorial judgment, the thesis, the voice, and the fact-checking — the parts that are distinctly Barry Ritholtz and can’t be benchmarked.

So rather than chasing each feature, I’d position for the trend: get the recurring workflows specified tightly enough that they survive being handed to a more autonomous agent, and keep your hand firmly on the verification gate. The tooling will keep getting better at the first half on its own.

Barry: OK, final question – this one is about Co-Work and Code:

I had you [Perplexity] create an interactive Drake Equation dashboard – it was very cool, but it did not rely on changing data.  I tried some other dashboard ideas, ran test runs; it was not quite ready for prime time.

What can you create that auto-updates based on (weekly, monthly, quarterly) economic and (daily, weekly, monthly) market data?

I want to create economic & market dashboards that show various data series at a glance; that get updated automatically; I want to embed it onto WordPress so anyone [with that link] can see it; and I want to set specific indicators that create alerts based on specific (non-public) parameters that I set…

Claude: This is the right question, and the answer explains why your Drake dashboard and the other test runs felt stuck. It wasn’t a polish problem — it was structural. An artifact (the thing I generate in chat) is a self-contained front-end: no backend, no scheduler, no safe place to store an API key, and only a flaky ability to call outside data sources from the browser. So it can look great but it has nothing feeding it and nothing waking it up to refresh.

What you’re describing isn’t a fancier artifact; it’s a small data pipeline plus a hosted page. The good news is that Claude Code and Cowork are built to create exactly that — the artifact was just the wrong container.

Here’s the architecture that does everything on your list, and the one design decision that matters most for you.

Split it into two halves: a public dashboard and a private alert engine — fed by the same pipeline.

The pipeline is a scheduled job that runs on a cron (say, every weekday after market close, plus a monthly run for the slow economic series). Each run it pulls fresh data, computes your indicators, writes out a small JSON (JavaScript Object Notation) snapshot, and checks your alert thresholds.

The key move: your non-public parameters live inside this job, server-side, where no one can see them. The public dashboard never contains your thresholds — it just reads the pre-computed JSON snapshot and draws charts. So anyone embedding or viewing the WordPress page sees the data at a glance, while your alert logic and secret levels stay completely private and fire only to you. That separation is the thing most people get wrong: if the thresholds were in the front-end, “view source” would expose them instantly.

Mapping that to your four requirements:

Auto-updating comes from the cron, not the page. Economic series (CPI, unemployment, yield curve, etc.) refresh on their own monthly/quarterly cadence; market series refresh daily. The dashboard is fast because the heavy lifting already happened — it just loads a baked JSON.

Data sources. FRED (the St. Louis Fed’s API) is the gold standard for economic series — free, deep, and importantly, redistributable, which matters since you’re publishing publicly. Market data is the part to be careful with: free daily-close sources exist, but real-time exchange data carries licensing restrictions on public redistribution. For a public embed I’d stick to daily closes and clearly-licensed series; that sidesteps a headache you don’t want as a registered firm. (Worth having Code verify the current terms of whatever market provider you pick, rather than trusting my recall on free-tier limits.)

WordPress embed is the easy part: the dashboard gets hosted as a static page somewhere, and you drop an iframe (or Custom HTML block) on your WordPress page pointing at it. It renders for everyone, and it updates whenever the cron republishes the JSON — you never touch WordPress again.

Alerts fire from the scheduled job to wherever you want them — email, SMS, a Slack/Discord webhook. Your parameters stay in the job’s secret config.

The honest part about what I can and can’t do. Code can write all of this: the fetch scripts, your indicator math, the alert engine, the cron configuration, and the front-end charts — and test it in its sandbox. What it can’t do is be the server that runs forever. The scheduled job has to live somewhere. For your situation the cleanest path that requires no server to babysit is: a GitHub repo + GitHub Actions (free cron) running the pipeline, the JSON snapshot published to a free static host (GitHub Pages or Cloudflare Pages), API keys stored as encrypted Actions secrets, and alerts sent via an email/webhook service. You own three things — the data-provider accounts, where it’s hosted, and the secret thresholds — and Code builds and wires up everything else.

That’s a real, finishable project, not a someday thing. A sensible first build is a working vertical slice: pick three or four FRED series and one market series, stand up the pipeline + one chart + one alert end to end, confirm it refreshes and embeds, then expand the series list and indicators once the plumbing is proven.

One decision drives the stack: do you want to run this entirely on free hosted infrastructure you basically never log into (the GitHub Actions + Pages route above), or do you already have a server / Vercel / Cloudflare setup you’d rather use? Tell me which, and I’ll have Code scaffold the slice — pipeline, chart, alert, and the WordPress iframe snippet — so you can see it update on its own before we build it out.

 

 

 

 

Footnotes:

1. Claude: Quantitative scaffolding and building things. The interactive React viz work — variance drain, the Marks paradox — plays to a real strength: I can turn a concept into a working, calibrated visual artifact in one pass.

2. Claude: “The other shift worth knowing: the current model, Opus 4.8, was tuned in a direction that’s relevant to my biggest weakness for you. Anthropic says early testers report it’s more likely to flag uncertainties about its work and less likely to make unsupported claims, and it added user control over how much effort Claude puts into a task. That doesn’t eliminate confabulation, but it’s moving the right way — and the effort control is useful when you want a fast reformat versus a careful analysis.”

2. Claude: “That’s the real answer to “what should I be thinking about.” The direction of travel is clearly toward agentic, tool-connected work that lives inside the apps you already use and runs longer chains autonomously. What that rewards is unglamorous and durable: clean, well-organized inputs; your house conventions written down once so any agent applies them consistently; and a verification discipline you trust. The work that gets automated is the assembly and formatting you’re already handing me.”

 

Posted Under