#OFN Key Concepts of Reward Maximization

Agent and Environment Interaction:

The AI system (agent) interacts with its environment, receiving feedback in the form of rewards for its actions.

Example in #OpenfabricAI: A trading bot in a financial market takes actions (buy, sell, hold) and receives profits or losses as rewards.
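
The interaction loop in the trading example can be sketched roughly as follows. This is a toy illustration, not an Openfabric API: the names `run_episode` and `always_buy`, the single-unit position, and the price list are all assumptions made for the example.

```python
from typing import Callable, List


def run_episode(prices: List[float],
                choose_action: Callable[[float, int], str]) -> float:
    """Run one pass over a price series and return the summed rewards.

    Hypothetical setup: the agent holds either 0 or 1 unit, and the reward
    at each step is the profit or loss on that position.
    """
    position = 0          # units currently held (0 or 1 in this toy setup)
    total_reward = 0.0
    for t in range(len(prices) - 1):
        # The agent observes the current price and its position, then acts.
        action = choose_action(prices[t], position)
        if action == "buy":
            position = 1
        elif action == "sell":
            position = 0
        # "hold" leaves the position unchanged.

        # The environment responds with a reward: profit or loss this step.
        reward = position * (prices[t + 1] - prices[t])
        total_reward += reward
    return total_reward


def always_buy(price: float, position: int) -> str:
    """A deliberately naive agent that buys at every step."""
    return "buy"
```

Calling `run_episode([10.0, 11.0, 9.0], always_buy)` walks through two steps, one with a gain and one with a loss, which shows how the reward signal reflects the outcome of each action.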

Reward Function:

A function that maps each action taken in a given state to a numerical reward.

Example: In a recommendation engine, a reward could be assigned based on whether a user clicks on a suggested item or makes a purchase.
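
A minimal sketch of such a reward function is shown below. The reward magnitudes and the name `recommendation_reward` are assumptions chosen for illustration; the only idea taken from the text is that a click or a purchase earns a reward.

```python
# Hypothetical reward values; real systems would tune these.
CLICK_REWARD = 1.0
PURCHASE_REWARD = 5.0


def recommendation_reward(clicked: bool, purchased: bool) -> float:
    """Return a numerical reward for one recommendation shown to a user."""
    if purchased:
        return PURCHASE_REWARD   # a purchase is assumed to be worth the most
    if clicked:
        return CLICK_REWARD      # a click alone earns a smaller reward
    return 0.0                   # the suggestion was ignored
```

Weighting a purchase above a click is a design choice: it nudges the agent toward recommendations that lead to sales instead of ones that only attract attention.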

Cumulative Reward:

The goal is to maximize the total expected reward over time, not just the immediate reward.

Formula:

G_t = R_{t+1} + γ R_{t+2} + γ^2 R_{t+3} + …

G_t is the total return starting from time step t.

γ (gamma) is the discount factor, a value between 0 and 1 that controls the importance of future rewards: the closer it is to 0, the more the agent favors immediate rewards.
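
The discounted return can be computed for a finite list of rewards with a short helper. This is a sketch under the assumption that the episode ends after the last listed reward; the function name `discounted_return` is chosen for this example.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ...

    `rewards` holds R_{t+1}, R_{t+2}, ... in order. Working backwards uses
    the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Rewards 1, 2, 3 with gamma = 0.5: 1 + 0.5 * 2 + 0.25 * 3 = 2.75
print(discounted_return([1.0, 2.0, 3.0], gamma=0.5))
```

With gamma set to 0 only the first reward counts, and with gamma set to 1 the result is the plain sum of all rewards.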

Policy (π):

A strategy that defines which action to take in each state; the aim is to find the policy that maximizes cumulative reward.

Example: A chatbot's policy determines how to respond to user inputs to keep users engaged and satisfied.
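
One simple way to represent a policy in code is as a function from state to action. The sketch below assumes the agent already has estimated values for each action in each state and always picks the highest one (a "greedy" policy). The states, actions, and numbers are invented for illustration and do not come from any real chatbot.

```python
from typing import Dict

# Hypothetical estimates of how much reward each response tends to earn.
ESTIMATED_VALUES: Dict[str, Dict[str, float]] = {
    "greeting": {"say_hello": 0.9, "ask_clarification": 0.2},
    "complaint": {"apologize": 0.8, "say_hello": 0.1},
}


def greedy_policy(state: str, values: Dict[str, Dict[str, float]]) -> str:
    """Return the action with the highest estimated value in this state."""
    action_values = values[state]
    return max(action_values, key=action_values.get)
```

For example, `greedy_policy("complaint", ESTIMATED_VALUES)` selects `"apologize"`, since that response has the highest estimated value for that state.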