Neuro-dynamic Programming: Building human curiosity into artificial intelligence

The media and watercooler chatter alike increasingly focus on how advances in machine learning and artificial intelligence (AI) are boosting the ability of predictive analytics to benefit businesses’ bottom lines. Some of that talk ponders the potential for smart machines to replace humans in higher-complexity jobs. No doubt, smart machines are getting smarter. But even the smartest machines still lack fundamental human characteristics that are absolutely critical to enabling people to solve problems. One of these key capabilities is curiosity – surely a computer can’t replicate that, can it?

Well, welcome to the evolving world of neuro-dynamic programming. It’s an analytic methodology for learning and anticipating how current and future actions are likely to contribute to a long-term cumulative reward. This technique is related to advanced AI reinforcement learning methods, which take inspiration from behaviorist psychology to attribute future reward/penalty back to earlier steps in a decision sequence, whereas traditional supervised learning attributes reward only to the current decision. These advanced methods focus on experimentation and prediction. They mimic the way the brain learns complex task sequences through pleasurable or painful feedback signals that may occur later in time – essentially, how humans seek and achieve long-term positive results.

Clearly, analytics that can “think” well ahead and focus on the most favorable outcomes are most welcome, since many operational decisions about customers have long-term consequences. High customer lifetime value and healthy, sustainable business cash flow are both produced by a series of interactions: the business takes an action, the customer reacts, the business responds to the new state of the relationship with another action, the customer reacts … and so on. In this way, neuro-dynamic programming enables smart machines to think ahead. They may make moves early in the decision chain that do not appear optimal in the short run but that lead to a better long-term outcome.
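To make the idea of "thinking ahead" concrete, here is a minimal sketch of value iteration on a toy customer relationship. Everything in it is an assumption made up for illustration: the three states, the two actions (`hard_sell`, `nurture`), the rewards and the discount factor `GAMMA` do not come from any real system. It only shows how an action with a small immediate payoff can win once future rewards are counted.

```python
# Toy customer-relationship model. All states, actions and numbers are
# hypothetical; they exist only to illustrate long-term versus short-term choice.
GAMMA = 0.9  # assumed discount factor: how much a future reward counts today

STATES = ["new", "loyal", "churned"]
ACTIONS = ["hard_sell", "nurture"]

# (state, action) -> (next_state, immediate_reward); deterministic for simplicity
TRANSITIONS = {
    ("new", "hard_sell"): ("churned", 10.0),
    ("new", "nurture"): ("loyal", 2.0),
    ("loyal", "hard_sell"): ("churned", 12.0),
    ("loyal", "nurture"): ("loyal", 8.0),
    ("churned", "hard_sell"): ("churned", 0.0),
    ("churned", "nurture"): ("churned", 0.0),
}


def action_value(state, action, values):
    """Immediate reward plus the discounted value of the state we land in."""
    next_state, reward = TRANSITIONS[(state, action)]
    return reward + GAMMA * values[next_state]


def value_iteration(sweeps=500):
    """Repeatedly back up the best achievable long-term reward for each state."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        values = {
            s: max(action_value(s, a, values) for a in ACTIONS) for s in STATES
        }
    return values


def greedy_action(state):
    """Short-sighted choice: look at the immediate reward only."""
    return max(ACTIONS, key=lambda a: TRANSITIONS[(state, a)][1])


def long_term_action(state, values):
    """Far-sighted choice: immediate reward plus discounted future value."""
    return max(ACTIONS, key=lambda a: action_value(state, a, values))


values = value_iteration()
print("short-run choice for a new customer:", greedy_action("new"))
print("long-run choice for a new customer:", long_term_action("new", values))
```

In this toy setup the short-sighted rule prefers the hard sell to a new customer because it pays more immediately, while the long-run rule prefers nurturing because a loyal customer keeps producing reward over time.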

Another way to think about this concept is to consider a group of dumb software agents (like individual ants). The agents interact with their environment and are rewarded or penalized around a small set of success criteria. Gradually, “genes” of successful behavior emerge as the agents begin to map out the risk of various interrelated activities. Those agents with few successful genes receive a low “fitness” score and die out, whereas those with many successful genes score high and are allowed to reproduce, mutate or combine with other high-scoring agents. In this way, the overall performance of the group increases.

Because the environment is changing, these agents not only act in the optimal way based on their current best “map of the world,” they also experiment. Using probabilities, they make slight variations and mutations around the optimal strategy and its associated genes. As they receive rewards and penalties, they learn from these experiments and continually adjust to a changing fitness landscape.
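The agent picture above can be sketched as a small genetic algorithm. This is an illustrative toy, not the method of any particular product: the bit-string genomes, the hidden `TARGET` pattern standing in for "successful behavior", the mutation rate and the population size are all assumptions. For simplicity the environment here is fixed, whereas the text describes one that keeps changing.

```python
import random

# Hypothetical stand-in for "successful behavior": agents score one point
# for every gene that matches this hidden pattern.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]


def fitness(genome):
    """Count the genes that match the target pattern."""
    return sum(g == t for g, t in zip(genome, TARGET))


def mutate(genome, rng, rate=0.05):
    """Experiment: flip each gene with a small probability."""
    return [1 - g if rng.random() < rate else g for g in genome]


def crossover(parent_a, parent_b, rng):
    """Combine two high-scoring agents at a random cut point."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def evolve(generations=40, pop_size=20, seed=0):
    """Return the best genome found and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Low scorers die out; the top half survives unchanged.
        survivors = population[: pop_size // 2]
        # Survivors reproduce, combine and mutate to refill the population.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, history = evolve()
print("best fitness per generation:", history)
```

Because the survivors are carried over unchanged, the best score in the group never drops from one generation to the next, while the mutated children keep probing nearby strategies.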

As you can see in Figure 1, at any point in the sequence, the current state of the customer relationship is the result not only of the just-taken action, but also of the string of previous actions. Just as in a chess game, where a checkmate could be rooted 10 moves back – or even in the first move – the loss of a valuable customer may have started with actions taken months ago. To be successful, a business needs to understand this dynamic.

Figure 2 depicts how these analytics learn about long-term effects by assigning credits for successful outcomes and penalties for unsuccessful ones. Although the action immediately before the outcome may receive a larger share of the credits or penalties, reinforcement learning distributes some amount of rewards/penalties across the entire sequence of actions.
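One simple way to picture this credit assignment is a geometric decay, in which each step further from the outcome receives a fixed fraction of the credit given to the step after it. The sketch below assumes that scheme; the function name `assign_credit`, the `decay` value and the example actions are hypothetical, and real reinforcement learning methods estimate these quantities from data rather than applying a fixed rule.

```python
def assign_credit(actions, outcome, decay=0.5):
    """Spread an outcome's reward (or penalty) back over the action sequence.

    The last action gets the full outcome; each earlier action gets `decay`
    times the credit of the action that followed it. `decay` is an assumed
    illustration parameter, playing the role of a discount factor.
    """
    last = len(actions) - 1
    return {
        action: outcome * decay ** (last - position)
        for position, action in enumerate(actions)
    }


# Hypothetical sequence of business actions that ended in a renewal worth 100.
sequence = ["welcome_email", "discount_offer", "support_call", "renewal_reminder"]
for action, credit in assign_credit(sequence, outcome=100.0).items():
    print(f"{action}: {credit}")
```

A negative outcome, such as a lost customer, would be passed in as a negative number, so the same rule distributes penalties across the sequence.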

 
