Notes:
reinforcement learning: there is no signal indicating how large the error is, only a reward for the action taken
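
A minimal sketch of the point in the note above, contrasting the two kinds of feedback. The function names, the `tolerance` parameter, and the binary 0/1 reward are illustrative assumptions, not taken from the notes. Real rewards can be graded, but they still score only the action that was tried rather than reporting the distance to the correct answer.

```python
def supervised_feedback(prediction: float, target: float) -> float:
    """Supervised-style feedback: the learner sees the signed error,
    so it knows both how far off it was and in which direction."""
    return target - prediction


def reinforcement_feedback(action: float, target: float,
                           tolerance: float = 0.5) -> float:
    """RL-style feedback (assumed binary reward): 1.0 if the action is
    within `tolerance` of the hidden target, otherwise 0.0.
    The learner never sees `target` or the size of the miss."""
    return 1.0 if abs(action - target) <= tolerance else 0.0


target = 3.0        # known to the environment, hidden from the RL learner
small_miss = 2.0    # off by 1
large_miss = -10.0  # off by 13

# Supervised feedback distinguishes the two misses.
err_small = supervised_feedback(small_miss, target)
err_large = supervised_feedback(large_miss, target)

# Reward feedback is identical for both misses.
rew_small = reinforcement_feedback(small_miss, target)
rew_large = reinforcement_feedback(large_miss, target)

print(err_small, err_large)
print(rew_small, rew_large)
```

With this assumed reward, both misses earn the same reward, so the learner has to try other actions to find out which ones do better.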