Essays about game development, thinking and books

Notes on AI in 2024: industry transparency

Nearly a year and a half ago, I published a major forecast on artificial intelligence [ru]. Read it if you haven't already; it is still holding up well.

Recently, I decided to expand on the forecast, but a single comprehensive post isn't coming together, so there will be a series of smaller notes.

I'll start with industry transparency: the current AI movement has several impressive aspects that I'd like to discuss.


In-depth analysis of Goodhart's Law

I found an excellent in-depth analysis of Goodhart's Law (when a measure becomes a target, it ceases to be a good measure).

Cedric Chin breaks the law down into its components and shows, with examples from Amazon's practice, how to work with them.

In brief: things are neither as straightforward nor as bad as they seem.

When people are under pressure from a target metric, they have three behavioral strategies:

  1. They can work to improve the system.
  2. They can sabotage/distort the system's work.
  3. They can sabotage/distort the data.

For example, suppose you have a factory producing goods and a production plan for them.

Then the possible strategies for your employees are:

  1. Improve technologies and processes to meet the plan.
  2. Hide excess production in one month to attribute it to another (sabotage of production).
  3. Take ready-made products from the warehouse to put them back on the conveyor (data falsification).

Therefore, the manager's goals are:

  1. Create conditions that make improving the system possible, comfortable, and beneficial. For example, set realistic, objective deadlines and plans.
  2. Make it difficult to sabotage/distort the system.
  3. Make it difficult to falsify data.

The original post contains interesting examples of how Amazon applies these principles.

For example, they switched from optimizing output metrics to optimizing input metrics through evolutionary refinement of heuristics about them: input metrics are harder to falsify, and their impact on the output can be evaluated empirically.

To exaggerate: instead of optimizing a "number of sales" metric, it may be better to optimize "number of cold calls", "number of ads", etc., iteratively refining their definitions based on business data.

As an example, here is the evolution of the metric for one of Amazon's teams:

  • number of detail pages, which we refined to
  • number of detail page views (you don’t get credit for a new detail page if customers don’t view it), which then became
  • the percentage of detail page views where the products were in stock (you don’t get credit if you add items but can’t keep them in stock), which was ultimately finalized as
  • the percentage of detail page views where the products were in stock and immediately ready for two-day shipping, which ended up being called Fast Track In Stock.
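
To make the final formulation concrete, here is a minimal sketch of how such a metric could be computed; the event structure and field names are my invention, not Amazon's:

```python
from dataclasses import dataclass

# Hypothetical page-view event; the fields are invented for illustration.
@dataclass
class DetailPageView:
    product_id: str
    in_stock: bool
    fast_track: bool  # immediately ready for two-day shipping

def fast_track_in_stock(views: list[DetailPageView]) -> float:
    """Percentage of detail page views where the product was in stock
    and immediately ready for two-day shipping."""
    if not views:
        return 0.0
    hits = sum(1 for view in views if view.in_stock and view.fast_track)
    return 100.0 * hits / len(views)

views = [
    DetailPageView("a", in_stock=True, fast_track=True),
    DetailPageView("b", in_stock=True, fast_track=False),
    DetailPageView("c", in_stock=False, fast_track=False),
]
print(fast_track_in_stock(views))  # ~33.3
```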

For details, I recommend visiting the original post.

Notes on backend metrics in 2024

How metrics are collected in Feeds Fun. Loki is added to demonstrate the possible next step in infrastructure development.

Once every 2-3 years, I start a new project and have to "relearn" how to collect and visualize metrics this time around. It is never the same technology that changes, but something is guaranteed to have changed.

I have sent metrics via UDP [ru] to Graphite (in 2024, that post from 2015 looks funny), used SaaS solutions like Datadog and New Relic, aggregated metrics in the application for Prometheus, and written metrics as logs for AWS CloudWatch.
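
For a sense of scale, the oldest approach on that list fits in a dozen lines. A minimal sketch of Graphite's plaintext protocol over UDP; the host, port, and metric name are placeholders, and a real Carbon daemon needs its UDP listener enabled:

```python
import socket
import time

def send_metric(path: str, value: float,
                host: str = "localhost", port: int = 2003) -> None:
    # Graphite's plaintext protocol: "<path> <value> <timestamp>\n".
    payload = f"{path} {value} {int(time.time())}\n".encode()
    # Fire-and-forget: UDP gives no delivery guarantees, which is often
    # an acceptable trade-off for metrics.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

send_metric("backend.api.requests", 1)
```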

And there were always nuances:

  • The project's technologies and architecture impose unexpected restrictions.
  • Technical requirements for metrics completeness, correctness, and accuracy collide with business constraints on the cost of maintaining infrastructure.
  • Specialized time-series databases keep emerging, and backend developers rarely deal with them directly.
  • Not to mention the ideology and personal preferences of colleagues.

Therefore, there is no single ideal way to collect metrics. Moreover, the variety of approaches, together with the rapid evolution of the entire field, has produced a vast number of open-source bricks that can be used to build any Frankenstein.

So, when the time came to implement metrics in Feeds Fun, I spent a few days updating my knowledge and organizing my thoughts.

In this essay, I will share some of my thoughts on metrics as a whole and the solution I have chosen for myself: not as a tutorial, but as theses on topics I am passionate about.


Top LLM frameworks may not be as reliable as you think

Nearly a month ago, I decided to add Gemini support to Feeds Fun and did some research on top LLM frameworks: I didn't want to reinvent the wheel.

As a result, I found an embarrassing bug (embarrassing in my opinion, of course) in the Gemini integration in LLamaIndex. Judging by the code, it is also present in Haystack and in the plugin for LangChain. The root of the problem is in the Google SDK for Python.

When a new client for Gemini is initialized, the framework code overwrites the API keys in all previously created clients, because the API key is stored in a singleton by default.
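
Here is a minimal sketch of the failure mode. The wrapper class is hypothetical, but the shared-state behavior is the one described above:

```python
import google.generativeai as genai

class GeminiClient:
    """Illustrative per-tenant wrapper, written the way many frameworks do it."""

    def __init__(self, api_key: str):
        # Looks like per-instance configuration, but configure() writes the
        # key into module-level state shared by every client in the process.
        genai.configure(api_key=api_key)
        self.model = genai.GenerativeModel("gemini-pro")

client_a = GeminiClient("key-of-tenant-a")
client_b = GeminiClient("key-of-tenant-b")

# From here on, requests made through client_a are authenticated (and paid
# for) with tenant B's key: the second configure() call replaced the first.
```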

It is deadly if you have a multi-tenant application and unnoticeable in all other cases. Multi-tenant means that your application serves multiple users.

For example, in my case, in Feeds Fun, a user can enter their own API key to improve the quality of the service. Imagine the situation: a user enters an API key to process their news and ends up paying for the tokens spent on all of the service's users.

I reported this bug only to LLamaIndex, as a security issue, and there has been no reaction for 3 weeks. I'm too lazy to reproduce and report it for Haystack and LangChain, so this is your chance to report a bug to a top repository. All the info is in the full post; reproducing the bug is not difficult.

This error is notable for many reasons:

  1. The assessment of the bug's severity depends a lot on taste, experience, and context. For me, in the projects I have worked on, this is a critical security issue. However, it seems not critical at all for most current projects that use LLMs, which leads to some thoughts about mainstream LLM-adjacent development.
  2. This is a good indicator of a low level of code quality control: code reviews, tests, all processes. After all, this is an integration with one of the major API providers. The problem could have been found in many different ways, but none worked.
  3. This is a good illustration of the vicious "copy-paste from a tutorial and push to prod" approach to development. To make such a mistake, one has to ignore both the basic architecture of one's own project and the logic of the code being copied.

Ultimately, I gave up on these frameworks and implemented my own client over HTTP API.
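
The replacement does not have to be big. Here is a sketch of the idea (not my actual Feeds Fun code): a thin client over the public REST endpoint that keeps the key strictly per instance; error handling is omitted for brevity:

```python
import requests

class GeminiHTTPClient:
    BASE_URL = "https://generativelanguage.googleapis.com/v1beta"

    def __init__(self, api_key: str, model: str = "gemini-pro"):
        self.api_key = api_key  # stored on the instance, never in a global
        self.model = model

    def generate(self, prompt: str) -> str:
        response = requests.post(
            f"{self.BASE_URL}/models/{self.model}:generateContent",
            params={"key": self.api_key},
            json={"contents": [{"parts": [{"text": prompt}]}]},
            timeout=60,
        )
        response.raise_for_status()
        data = response.json()
        return data["candidates"][0]["content"]["parts"][0]["text"]
```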

My conclusion from this mess: you can't trust the code under the hood of modern LLM frameworks. You need to double-check and proofread it. Just because they state they are "production-ready" doesn't mean they really are.

Let me tell you more about the bug in the full post.

Unexpectedly participated in a class action lawsuit in the USA

Recently, I unexpectedly encountered the justice system of the USA.

  • In 2017-2018, when there was a crypto boom, I invested a little in a mining startup: I purchased their tokens and one hardware unit.
  • The startup took off and began building a mega farm, but it didn't work out: the fall of Bitcoin coincided with the peak of their spending, the money ran out, and the company went bankrupt. Funnily enough, a month or two after the bankruptcy filing, Bitcoin won back all of its losses. Sometimes you're just unlucky :-)
  • I had already written off the lost money, of course. I follow the rule of investing only money I don't mind losing, no more than 10% of my income.
  • Since everything was legally based in the USA, people there got together and filed a class action lawsuit.
  • I received a letter stating that I would automatically be among the plaintiffs unless I opted out. I did not opt out; when else would I get the chance to participate in a class action?
  • Everything calmed down until 2024.
  • In the spring, another letter came: "Confirm ownership of the tokens and indicate their quantity. We won and will divide what remains among all token holders proportionally, minus a hefty commission for the lawyers."
  • But how could I confirm it? More than five years had passed. The Belarusian bank account was closed, the company's admin panel was unavailable, and there was no direct transaction on the blockchain: I had paid in Bitcoin straight from an exchange (although doing so is not recommended).
  • I found an email from the company confirming that I had bought tokens (without the amount) and printed it as a PDF. I attached it to the application along with screenshots of the exchange transactions from the relevant period, gave the address of my current wallet, where these tokens lie as dead weight, and sent everything off.
  • Today, I received $700 in my bank account. Of course, this is not all the lost money: about 25% of it, maybe slightly more.

What conclusions can be drawn from this:

  • Sometimes, you just don't get lucky in your business.
  • Keep all emails. You never know what and when will come in handy.
  • Class action lawsuits work, and they work in an interesting way.
  • Justice in the USA works slowly but, apparently, inexorably, and it is unexpectedly (to me) friendly toward minor participants in a conflict. At least sometimes.