Essays about game development, thinking and books

Prompt engineering: building prompts from business cases

Ponies are building prompts (c) ChatGPT

As you know, one of the features of my news reader is automatic tag generation using LLMs. That's why I periodically do prompt engineering — I want tags to be better and monthly bills to be lower.

So, I fine-tuned the prompts to the point where everything seemed to work, but there was still this nagging feeling that something was off: the correct tags were detected reliably, but alongside them the model generated many useless ones, and sometimes outright wrong ones.

There are a few options in such cases:

  1. Collect training data and fine-tune the model to generate only correct tags.
  2. Build a chain of actors, where one creates tags and the other filters out the unnecessary ones.
  3. Try to radically rework the prompt.

Options 1 and 2 were out of the question due to lack of time and money. Also, my current strategy is to rely exclusively on ready-made AI solutions since keeping up with the industry alone is impossible. So, I had no choice but to go with the third option.

Progress was slow, but after a recent post about generative knowledge bases, something clicked in my head, the problem turned inside out, and over the course of the morning, I drafted a new prompt that’s been performing significantly better so far.

So, let's look at the problem with the old prompt and how the new one fixed it.

Read more

AI notes 2024: Generative knowledge base

I continue my notes on AI at the end of 2024.

  1. Industry transparency
  2. Generative knowledge base

Today, I want to discuss the disruptive technology that underlies modern AI achievements. Or the concept, or the meta-technology — whichever is more convenient for you.

You’ve probably never come across the logic described below on the internet (except for the introduction about disruptive technologies). Engineers and mathematicians might get a bit annoyed by the oversimplifications and corner-cutting. But this is the lens through which I view the industry, assess what’s possible and what’s unlikely, and so on. My blog, my rules, my dictionary :-D

So, keep in mind, this is my personal view, not a generally accepted one.

Read more

AI notes 2024: Industry transparency

Nearly a year and a half ago, I published a major forecast on artificial intelligence [ru]. Read it if you haven't already; it still holds up well.

Recently, I decided to expand on that forecast, but a single comprehensive post wasn't coming together, so instead there will be a series of smaller notes.

  1. Industry transparency
  2. Generative knowledge base

I'll start with industry transparency: the current AI movement has several impressive aspects that I'd like to discuss.

Read more

In-depth analysis of Goodhart's Law

I found an excellent in-depth analysis of Goodhart's Law (when a measure becomes a target, it ceases to be a good measure).

Cedric Chin breaks the law down into its components and, drawing on Amazon's practice, shows how to work with each of them.

In brief: things are neither as straightforward nor as bad as they seem.

When people are under pressure from a target metric, they have three behavioral strategies:

  1. They can work to improve the system.
  2. They can sabotage/distort the system's work.
  3. They can sabotage/distort the data.

For example, suppose you run a factory that produces goods and has a production plan to meet.

Your employees' possible strategies are then:

  1. Improve technologies and processes to meet the plan.
  2. Hide excess production in one month to attribute it to another (sabotage of production).
  3. Take ready-made products from the warehouse to put them back on the conveyor (data falsification).

Therefore, the manager's goals:

  1. Create conditions that make improving the system possible, comfortable, and beneficial. For example, set realistic, objective deadlines and plans.
  2. Make it difficult to sabotage/distort the system.
  3. Make it difficult to falsify data.

The original post contains interesting examples of Amazon's adaptation to these principles.

For example, they switched from optimizing output metrics to optimizing input metrics, evolutionarily refining their heuristics about them: input metrics are harder to falsify, and their impact on the output can be evaluated empirically.

Oversimplifying: instead of optimizing the "number of sales" metric, it may be better to optimize "number of cold calls", "number of ads", etc., iteratively refining their formulations based on business data.
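The "empirically evaluated" part can be sketched in a few lines of Python. This is my own toy illustration, not from the original post; the data and metric names are invented:

```python
# Toy check of how strongly an input metric (cold calls) tracks the
# output metric it is meant to drive (sales). All numbers are made up.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Weekly counts: an input metric the team controls directly...
cold_calls = [120, 150, 170, 160, 200, 210]
# ...and the output metric the business actually cares about.
sales = [14, 18, 21, 19, 25, 26]

r = pearson(cold_calls, sales)
print(f"correlation between cold calls and sales: {r:.2f}")
```

If the correlation holds up over time, the team can safely push on the input metric; if it decays, the heuristic gets refined.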

As an example, here is the evolution of the metric for one of Amazon's teams:

  • number of detail pages, which we refined to
  • number of detail page views (you don’t get credit for a new detail page if customers don’t view it), which then became
  • the percentage of detail page views where the products were in stock (you don’t get credit if you add items but can’t keep them in stock), which was ultimately finalized as
  • the percentage of detail page views where the products were in stock and immediately ready for two-day shipping, which ended up being called Fast Track In Stock.
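To make the final metric in that evolution concrete, here is a minimal sketch of how it could be computed. The record fields are my invention, not Amazon's:

```python
# Toy records of detail page views; field names are invented for illustration.
views = [
    {"in_stock": True,  "two_day_ready": True},
    {"in_stock": True,  "two_day_ready": False},
    {"in_stock": False, "two_day_ready": False},
    {"in_stock": True,  "two_day_ready": True},
]

def fast_track_in_stock(views):
    """Share of detail page views where the product was in stock
    AND immediately ready for two-day shipping."""
    if not views:
        return 0.0
    hits = sum(1 for v in views if v["in_stock"] and v["two_day_ready"])
    return hits / len(views)

print(f"Fast Track In Stock: {fast_track_in_stock(views):.0%}")
```

Note how each refinement step simply tightened the filter in the numerator, making the metric progressively harder to game.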

For details, I recommend visiting the original post.

Notes on backend metrics in 2024

How metrics are collected in Feeds Fun. Loki is added to demonstrate the possible next step in infrastructure development.

Every 2-3 years, I start a new project and have to "relearn" how to collect and visualize metrics this time around. It is never the same piece of technology that changes, but something is guaranteed to have changed.

I sent metrics via UDP [ru] to Graphite (in 2024, a post from 2015 looks funny), used SaaS solutions like Datadog and New Relic, aggregated metrics in the application for Prometheus, and wrote metrics as logs for AWS CloudWatch.

And there were always nuances:

  • The features of the project's technologies and architecture impose unexpected restrictions.
  • Technical requirements for metrics completeness, correctness, and accuracy collide with business constraints on the cost of maintaining infrastructure.
  • Specialized databases for storing time series emerge, with which backend developers rarely deal directly.
  • Not to mention the ideology and personal preferences of colleagues.

Therefore, there is no single ideal way to collect metrics. Moreover, the variety of approaches, together with the rapid evolution of the entire field, has produced a vast number of open-source bricks from which you can assemble any Frankenstein's monster you like.

So, when the time came to implement metrics in Feeds Fun, I spent a few days updating my knowledge and organizing my thoughts.

In this essay, I will share some of my thoughts on metrics in general and the solution I have chosen for myself. Not as a tutorial, but as theses on topics I am passionate about.

Read more