In this article, we continue our look at reinforcement learning algorithms with Thompson sampling. Its mathematical basis is Bayesian inference. Let's start with the basic principles of the algorithm.
Basic Principles of Thompson Sampling Algorithms
We return to the multi-armed bandit (slot machine) problem used earlier. As shown in the figure, the horizontal axis represents reward: the further to the right, the larger the reward. The three vertical lines mark the true average reward of three different slot machines.
Before the algorithm starts, we know nothing, so we first need to collect some basic data. The four blue data points in the figure are the rewards obtained by pressing the blue slot machine. From these rewards we can build a distribution. In the same way, we obtain a distribution for the green slot machine and another for the yellow one.
These three distributions do not describe the rewards themselves. Each one describes our belief about the expected reward of its machine. Next, we draw one random sample from each of the three distributions and press the machine whose sample is the largest. Because the samples are random, the green sample may still come out larger than the yellow one, even though the actual expectation of yellow is the highest, and in that case we press the green machine.
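The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: it assumes Gaussian rewards with a known noise level (`noise_std`) and a flat prior, so the belief about each machine's mean is a normal distribution centered on the sample mean. The reward values and the helper names `posterior` and `choose_machine` are hypothetical.

```python
import math
import random


def posterior(rewards, noise_std=1.0):
    """Return (mean, std) of the belief about a machine's average reward.

    Assumption: Gaussian rewards with known noise_std and a flat prior,
    so the belief is Normal(sample mean, noise_std / sqrt(n)).
    """
    n = len(rewards)
    return sum(rewards) / n, noise_std / math.sqrt(n)


def choose_machine(observations, rng):
    """Draw one sample per machine from its belief and pick the largest."""
    samples = {}
    for name, rewards in observations.items():
        mean, std = posterior(rewards)
        samples[name] = rng.gauss(mean, std)
    return max(samples, key=samples.get), samples


# Hypothetical initial pulls, four per machine.
observations = {
    "blue": [0.8, 1.3, 1.1, 0.9],
    "green": [2.1, 1.7, 2.4, 1.9],
    "yellow": [2.6, 2.2, 2.9, 2.4],
}

rng = random.Random(0)
picks = [choose_machine(observations, rng)[0] for _ in range(1000)]
print({name: picks.count(name) for name in observations})
```

Repeating the draw many times on the same data shows that yellow is chosen most often, while green still wins a noticeable share of the draws.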
When we press the chosen machine, we observe a new reward value. Suppose the green machine was chosen: we use the new reward to adjust the distribution of the green machine.
With one more observation, the green distribution becomes taller and narrower. The following steps repeat the same procedure: draw a sample from each distribution, press the machine with the highest sampled value, and adjust that machine's distribution using the reward obtained.
After the game has gone through many steps, the distributions become very narrow, especially the yellow one, which ends up centered almost exactly on its actual expectation.
By this point, because we keep pressing the machine with the highest sampled value, yellow is pressed more and more often, so its distribution keeps getting narrower. Blue is rarely played, so its distribution stays relatively wide.
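Why the distribution gets taller and narrower can be made precise under one simple model. The following is an assumption chosen for illustration (Gaussian rewards with known noise standard deviation $\sigma$ and a flat prior); the article does not state which model it uses. Here $\mu$ is the machine's unknown expected reward and $x_1, \dots, x_n$ are the rewards observed so far.

```latex
\mu \mid x_{1:n} \sim \mathcal{N}\!\left(\bar{x}_n,\ \frac{\sigma^2}{n}\right),
\qquad
\bar{x}_n = \frac{1}{n}\sum_{k=1}^{n} x_k

% After one more reward x_{n+1} from the same machine:
\bar{x}_{n+1} = \frac{n\,\bar{x}_n + x_{n+1}}{n+1},
\qquad
\frac{\sigma}{\sqrt{n+1}} < \frac{\sigma}{\sqrt{n}}
```

Each new observation shrinks the standard deviation of the belief, and since the peak of a normal density is $1/(\text{std}\cdot\sqrt{2\pi})$, a smaller standard deviation also means a taller peak.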
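The long-run behavior can be checked with a small simulation. This is a sketch under assumptions, not the article's experiment: the true means in `TRUE_MEANS`, the noise level, and the Gaussian belief with a flat prior are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical true average rewards; yellow is the best machine.
TRUE_MEANS = {"blue": 1.0, "green": 2.0, "yellow": 2.5}
NOISE_STD = 1.0  # assumed known reward noise


def run(rounds, seed=0):
    rng = random.Random(seed)
    # Pull every machine once so that each belief is defined.
    totals = {m: rng.gauss(mu, NOISE_STD) for m, mu in TRUE_MEANS.items()}
    counts = {m: 1 for m in TRUE_MEANS}
    for _ in range(rounds):
        # One sample per machine from Normal(mean, NOISE_STD / sqrt(n)).
        samples = {
            m: rng.gauss(totals[m] / counts[m], NOISE_STD / math.sqrt(counts[m]))
            for m in TRUE_MEANS
        }
        chosen = max(samples, key=samples.get)
        # Observe a reward from the chosen machine and update its belief.
        totals[chosen] += rng.gauss(TRUE_MEANS[chosen], NOISE_STD)
        counts[chosen] += 1
    # Width (standard deviation) of each belief after the last round.
    widths = {m: NOISE_STD / math.sqrt(counts[m]) for m in TRUE_MEANS}
    return counts, widths


counts, widths = run(2000)
print("pulls: ", counts)
print("widths:", widths)
```

Machines that are pressed often end up with a small width, while rarely pressed machines keep a wide belief, which matches the picture described above.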
Thompson sampling algorithm vs. upper confidence bound (UCB) algorithm
We have used both the Thompson sampling algorithm and the UCB algorithm to tackle the multi-armed bandit problem. Now let's compare the two, starting from their basic principles.
First, UCB is a deterministic algorithm. The decision made in each round depends only on the upper bounds of the confidence intervals, and each upper bound depends only on the observations collected so far. So when the observations of all machines are the same, UCB always makes the same decision, and given the same rewards, the reward of each round and the total reward are fully determined.
Thompson sampling, by contrast, is a stochastic algorithm. One or more of its steps are driven by a random function, so the outcome partly depends on luck. In the sampling step above, for example, the actual expectation of yellow is greater than that of green, yet the sample drawn for green may still be larger than the sample drawn for yellow. The same observations can therefore lead to different decisions.
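The difference can be seen by asking both rules for a decision many times on the same fixed history. This is a sketch under assumptions: rewards are taken to be 0 or 1, the UCB rule uses the common UCB1 bound (mean plus `sqrt(2 ln n / n_i)`), which may differ from the exact bound used earlier in this series, and the `history` numbers are hypothetical.

```python
import math
import random

# Hypothetical history: (number of pulls, total 0/1 reward) per machine.
history = {"blue": (20, 4), "green": (20, 9), "yellow": (20, 11)}


def ucb_choice(history):
    """UCB1 rule: pick the machine with the largest upper confidence bound."""
    total_pulls = sum(n for n, _ in history.values())

    def upper_bound(item):
        n, reward = item[1]
        return reward / n + math.sqrt(2 * math.log(total_pulls) / n)

    return max(history.items(), key=upper_bound)[0]


def thompson_choice(history, rng):
    """Thompson rule: sample each machine's Beta belief, pick the largest."""
    samples = {
        name: rng.betavariate(reward + 1, n - reward + 1)
        for name, (n, reward) in history.items()
    }
    return max(samples, key=samples.get)


# Ask each rule 200 times without changing the history.
ucb_picks = {ucb_choice(history) for _ in range(200)}
rng = random.Random(1)
thompson_picks = {thompson_choice(history, rng) for _ in range(200)}

print("UCB picks:     ", ucb_picks)
print("Thompson picks:", thompson_picks)
```

UCB returns the same machine every time, while Thompson sampling returns different machines on different calls even though nothing about the history has changed.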
UCB has another characteristic: it needs to update its upper bounds in real time, after every round, as described in the previous article on the principle of the UCB algorithm. Thompson sampling, on the other hand, tolerates delayed updates and even batch updates. For example, when we run a batch of advertisements on the Internet, the results may only come back later, and the algorithm can still work with them. Finally, practical applications and research in recent years have found that Thompson sampling tends to perform better in practice than the UCB algorithm.
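A batch update can be sketched as follows. This is an illustration under assumptions, not a production design: the click rates in `CLICK_RATES`, the batch size, and the function name `run_batched` are hypothetical, and rewards are assumed to be 0 or 1 with a Beta belief per ad.

```python
import random

# Hypothetical true click rates for three ads.
CLICK_RATES = [0.05, 0.12, 0.08]


def run_batched(num_batches, batch_size, seed=0):
    rng = random.Random(seed)
    wins = [0] * len(CLICK_RATES)    # clicks seen so far per ad
    losses = [0] * len(CLICK_RATES)  # non-clicks seen so far per ad
    for _ in range(num_batches):
        pending = []  # outcomes that have not been fed back yet
        for _ in range(batch_size):
            # Decisions inside a batch all use the same, not yet updated beliefs.
            samples = [
                rng.betavariate(wins[i] + 1, losses[i] + 1)
                for i in range(len(CLICK_RATES))
            ]
            ad = samples.index(max(samples))
            clicked = rng.random() < CLICK_RATES[ad]
            pending.append((ad, clicked))
        # The delayed results arrive together and are applied in one go.
        for ad, clicked in pending:
            if clicked:
                wins[ad] += 1
            else:
                losses[ad] += 1
    return wins, losses


wins, losses = run_batched(num_batches=50, batch_size=100)
shown = [w + l for w, l in zip(wins, losses)]
print("times shown per ad:", shown)
```

Because each decision is a fresh random sample, the choices within a batch still vary even though the beliefs are frozen. A deterministic rule with frozen statistics would show the same ad for the whole batch.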
Code Implementation
First, look at the calculation logic of the Thompson sampling algorithm:
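For rewards that are either 0 or 1, such as whether an advertisement is clicked, a commonly used formulation is the following. It is given here as an assumption about the intended logic, with notation introduced for illustration: $N_i^1(n)$ and $N_i^0(n)$ are the number of times ad $i$ has returned reward 1 and reward 0 up to round $n$.

```latex
% Step 1: for each ad i, draw a random value from its Beta belief
\theta_i(n) \sim \mathrm{Beta}\!\left(N_i^1(n) + 1,\; N_i^0(n) + 1\right)

% Step 2: select the ad with the largest draw
i^{*}(n) = \arg\max_{i}\ \theta_i(n)

% Step 3: observe the reward r \in \{0, 1\} of ad i^* and update its counts
N_{i^{*}}^{1}(n+1) = N_{i^{*}}^{1}(n) + r,
\qquad
N_{i^{*}}^{0}(n+1) = N_{i^{*}}^{0}(n) + (1 - r)
```

The `+ 1` terms correspond to a uniform prior, $\mathrm{Beta}(1, 1)$, on each ad's click probability before any data is seen.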
The code is directly posted here:
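A minimal sketch of such an implementation is given below. It is not the original listing and does not use the original data set: the click table is simulated from the hypothetical rates in `CLICK_RATES`, which are chosen here so that the fifth ad (index 4) is the best one, and the function names are illustrative.

```python
import random

# Hypothetical click rates for 10 ads; the fifth ad (index 4) is set highest.
CLICK_RATES = [0.17, 0.13, 0.07, 0.12, 0.27, 0.01, 0.11, 0.21, 0.09, 0.05]


def simulate_clicks(rounds, rng):
    """Build a rounds x ads table of 0/1 outcomes (stand-in for a real log)."""
    return [
        [1 if rng.random() < p else 0 for p in CLICK_RATES]
        for _ in range(rounds)
    ]


def thompson_sampling(clicks, rng):
    num_ads = len(clicks[0])
    ones = [0] * num_ads   # times each ad returned reward 1
    zeros = [0] * num_ads  # times each ad returned reward 0
    selected = []
    total_reward = 0
    for row in clicks:
        # Draw one value per ad from Beta(ones + 1, zeros + 1).
        draws = [rng.betavariate(ones[i] + 1, zeros[i] + 1) for i in range(num_ads)]
        ad = draws.index(max(draws))
        reward = row[ad]  # only the chosen ad's outcome is revealed
        if reward == 1:
            ones[ad] += 1
        else:
            zeros[ad] += 1
        selected.append(ad)
        total_reward += reward
    return selected, total_reward


rng = random.Random(42)
clicks = simulate_clicks(10000, rng)
selected, total_reward = thompson_sampling(clicks, rng)
most_shown = max(range(len(CLICK_RATES)), key=selected.count)
print("total reward:", total_reward)
print("most shown ad index:", most_shown)
```

Since the data here is simulated, the printed total is not comparable with the figures from the original experiment; the sketch only shows the structure of the loop.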
The final total reward is much higher than the one obtained with the UCB algorithm in the previous article, and the best advertisement is ad5. In this experiment, then, Thompson sampling performs better than the UCB algorithm.
The above covers the basics of the Thompson sampling algorithm in reinforcement learning.