Life is a Multi-Armed Bandit
(5 Minute Read) Know What You're Optimizing For, Lower Costs of Exploration, and Design Your Day
👋🏼 I’m Alek, a repeat founder. I’ve built and sold one company so far. I share what I’ve learned from building companies in 5-minute reads.
I think of my life through the lens of a multi-armed bandit.
A multi-armed bandit (MAB) is a problem where you need to allocate limited resources across competing choices whose payoffs you can only learn by trying them. The term originates from slot machines.
Here's an example. Imagine you're presented with 2 slot machines (or "one-armed bandits"). Each machine has a different expected payout. One slot machine will, on average, lose money. Another will, on average, earn money. You don't know which is which. You need to play to learn. The features of a multi-armed bandit are:
Choices (Arms). These represent the different options or actions available. In my example, I presented you with 2 slot machines. But there can be 5, 10, or more! All of them with different expected payouts.
Rewards. Each arm provides a reward drawn from an unknown probability distribution.
Exploration vs. Exploitation. The fundamental trade-off in the MAB problem is deciding when to:
Explore. Try different arms to gather information about their reward distributions.
Exploit. Choose the arm with the best expected reward, based on your current information.
The goal of a MAB problem is to find the strategy that maximizes the total reward paid over time. MAB problems apply to many aspects of life and work:
Do you order your favorite item on the menu (exploit), or do you try something new (explore)?
Do you go with the tried-and-true marketing (exploit) or try new messaging (explore)?
Do you stay in your current job (exploit) or look for new job opportunities (explore)?
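If you like seeing ideas as code, here is a minimal sketch of the two-slot-machine example using one common strategy called epsilon-greedy: most of the time you exploit the machine that looks best so far, and a small fraction of the time you explore at random. The payouts, the 10% exploration rate, and the function name are all assumptions I made up for illustration. This is one of many possible strategies, not the definitive solution.

```python
import random


def epsilon_greedy(true_means, pulls=10_000, epsilon=0.1, seed=0):
    """Play a set of slot machines with an epsilon-greedy strategy.

    true_means holds each machine's expected payout per pull. The player
    never reads it directly; it is only used to generate noisy rewards.
    """
    rng = random.Random(seed)
    counts = [0] * len(true_means)       # pulls per arm so far
    estimates = [0.0] * len(true_means)  # running average reward per arm
    total = 0.0

    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: pick any arm at random to learn more about it.
            arm = rng.randrange(len(true_means))
        else:
            # Exploit: pick the arm that looks best based on current estimates.
            arm = max(range(len(true_means)), key=lambda i: estimates[i])

        reward = rng.gauss(true_means[arm], 1.0)  # noisy payout for this pull
        counts[arm] += 1
        # Incrementally update the running average for the arm we pulled.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return counts, estimates, total


# Hypothetical machines: one loses money on average, the other earns money.
counts, estimates, total = epsilon_greedy([-0.5, 0.5])
print(counts, [round(e, 2) for e in estimates], round(total, 1))
```

With enough pulls, most of the plays should end up on the machine that earns money, even though the player starts out knowing nothing about either one.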
Today on FLFM, I’ll explain how self-employment has allowed me to optimize my life in ways I never could before.
First, know what you are optimizing for
Solutions to MAB problems assume you know what you are optimizing. Traditional MAB problems assume you are optimizing for one thing. This is an oversimplification. At any point, we are optimizing for many different things. In our lives and careers we optimize for:
earning more money
learning
working on things that interest us
enjoying hobbies and activities outside of work
building & enjoying relationships
Each choice we make yields different balances of each of the above:
We can quit our jobs…
creating the time to enjoy hobbies outside of work and build relationships
sacrificing money, learning, and working on interesting things
We can work ourselves to the bone on projects we love…
making money and learning
sacrificing time for hobbies and loved ones
In the traditional MAB problem you’re “optimizing the amount of money you earn per pull of a slot machine arm.” Competing priorities add tradeoffs and complexity.
You need to know what you are optimizing for to even begin solving this problem. Once you know what you are solving for, you need to “weight” each priority to build a mental model to decide which options are “better.”
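One rough way to picture that "weighting" is as a weighted score. The sketch below uses the five priorities from the list above, but every weight and every 0-10 score is a made-up number for illustration, and your own numbers will differ. Treat it as a mental model written down, not a formula to live by.

```python
# Hypothetical weights for each priority; they sum to 1.
weights = {
    "money": 0.3,
    "learning": 0.3,
    "interest": 0.2,
    "hobbies": 0.1,
    "relationships": 0.1,
}

# Made-up 0-10 scores for how well each option serves each priority.
options = {
    "quit my job": {
        "money": 1, "learning": 3, "interest": 3, "hobbies": 9, "relationships": 9,
    },
    "work nonstop on a project I love": {
        "money": 8, "learning": 8, "interest": 9, "hobbies": 1, "relationships": 2,
    },
}


def weighted_score(scores, weights):
    """Combine per-priority scores into one number using the weights."""
    return sum(weights[p] * scores[p] for p in weights)


# Rank the options from best to worst under these particular weights.
ranked = sorted(
    options,
    key=lambda name: weighted_score(options[name], weights),
    reverse=True,
)
for name in ranked:
    print(name, round(weighted_score(options[name], weights), 2))
```

Change the weights and the ranking can flip, which is the whole point: the "better" option depends on what you are optimizing for.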
Minimize the cost of exploration
By default, MAB problems presume that you can switch between arms freely. This isn’t always the case. In life, there are switching costs. We can’t explore 1,000 jobs and pick the best one.
In our careers, full-time employment makes exploring hard. We commit ourselves to one job for a long time. Over the course of your career, you can’t do very much “exploring.” So the optimal strategy is to “exploit” the roles you know.
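Here is a toy sketch of why switching costs push you toward exploiting. It assumes three hypothetical jobs with fixed, known payouts per period and a flat fee every time you switch. Real careers are far noisier, so this is only meant to show the arithmetic.

```python
def net_reward(choices, payouts, switching_cost):
    """Total payout for a sequence of arm choices, minus a fee per switch."""
    total = 0.0
    previous = None
    for arm in choices:
        if previous is not None and arm != previous:
            total -= switching_cost  # pay the fee whenever the arm changes
        total += payouts[arm]
        previous = arm
    return total


# Hypothetical payout per period for three jobs (made-up numbers).
payouts = [1.0, 1.2, 0.9]

stay_put = [0, 0, 0, 0, 0, 0]            # never explore
sample_then_settle = [0, 1, 2, 1, 1, 1]  # try each job, then settle on the best

for cost in (0.0, 0.5):
    print(
        cost,
        round(net_reward(stay_put, payouts, cost), 2),
        round(net_reward(sample_then_settle, payouts, cost), 2),
    )
```

With free switching, sampling every job and settling on the best one comes out ahead. Once each switch costs something and the horizon is short, staying put wins, even though a better job exists.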
Lower switching costs in my MAB problem have been the single best thing about self-employment. I can explore many different paths at the same time. Today, I spend parts of my day on:
BI, dashboards, and analytics
advanced AI and data science
building my own software
I can "explore" all these "arms" without multi-year commitments. I quickly learn what I do (and don’t!) enjoy about each.
Prepare for your priority function to change
I also don’t need to solve for everything with any one job. I can spend different parts of my day optimizing for each thing I care about. I spend part of my day on work that:
doesn’t teach me much, but pays well.
isn’t interesting, but teaches me new skills.
is interesting, but doesn’t pay well.
Collectively, I can optimize my day by tuning these categories to solve for whatever I want at the time. Some months I want to dial up the money, so I do more of the Type #1 work (pays well). Some months I want to dial up the learning, so I do more of the Type #2 work (teaches me new skills). Some months I get bored, so I do more of the Type #3 work (interesting).
Life is a Multi-Armed Bandit
All solutions to the MAB problem involve balancing exploration with exploitation. Successfully finding that balance in your life requires that you:
know what priorities you are optimizing for with a decision
minimize the cost of exploration
Every person’s solution depends on what is important to them.
People may stay in the same job for years because it gives them everything they need. Or, they may be optimizing for things outside of work.
People may constantly try new hobbies because they love learning new things.
I’ve loved the balance that my self-employed life has provided me. Self-employment has forced me to understand what I’m optimizing for and to design my day around those things.
Subscribe below to stay tuned! By sharing my experiences, I hope to provide advice to entrepreneurs facing similar challenges. Feel free to email me with any questions.