Automated negotiation plays a crucial role in decision support for bilateral energy transactions. An adequate analysis of the past actions of opposing negotiators can improve the decision-making process of market players, allowing them to choose the most appropriate parties to negotiate with in order to improve their outcomes. This paper proposes a new model to estimate the expected prices that can be achieved in bilateral contracts under a specific context, enabling adequate risk management in the negotiation process. The proposed approach is based on an adaptation of the Q-Learning reinforcement learning algorithm to choose the best scenario (set of forecast contract prices) from a set of possible scenarios that are determined using several forecasting and estimation methods. The learning process assesses the probability of occurrence of each scenario by comparing each expected scenario with the real scenario. The final chosen scenario is the one that presents the highest expected utility value. Moreover, the learning method can determine the best scenario for each context, since the behaviour of players can change according to the negotiation environment, and these conditions in turn influence the final contract price of negotiations. This approach allows the supported player to be prepared for the negotiation scenario that is most likely to provide a reliable approximation of the actual negotiation environment.
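
The scenario-selection idea described above can be illustrated with a minimal sketch. It is not the model proposed in this paper, only a simplified, stateless Q-Learning-style update under stated assumptions: the class name `ScenarioLearner`, the `reward` function (one minus the mean relative price error), the learning rate `alpha`, and the string context labels are all hypothetical choices made for illustration. The learned value is used here as a stand-in for the confidence in each scenario, and no separate utility function is modelled.

```python
from collections import defaultdict


def reward(forecast, actual):
    """Hypothetical reward in [0, 1]: 1 minus the mean relative price error."""
    errors = [abs(f - a) / abs(a) for f, a in zip(forecast, actual)]
    return max(0.0, 1.0 - sum(errors) / len(errors))


class ScenarioLearner:
    """Keeps one learned value per (context, scenario) pair (an assumption)."""

    def __init__(self, n_scenarios, alpha=0.5):
        self.alpha = alpha  # learning rate, chosen arbitrarily for this sketch
        self.q = defaultdict(lambda: [0.0] * n_scenarios)

    def update(self, context, forecasts, actual):
        # Compare every forecast scenario with the observed contract prices
        # and move its value towards the resulting reward.
        q = self.q[context]
        for i, forecast in enumerate(forecasts):
            q[i] += self.alpha * (reward(forecast, actual) - q[i])

    def best(self, context):
        # Index of the scenario with the highest learned value in this context.
        q = self.q[context]
        return max(range(len(q)), key=q.__getitem__)


# Usage with made-up prices: two scenarios, each a list of forecast prices.
learner = ScenarioLearner(n_scenarios=2)
learner.update("weekday", [[50.0, 52.0], [40.0, 41.0]], actual=[49.0, 51.0])
print(learner.best("weekday"))
```

Keeping a separate value table per context reflects the point that the best scenario may differ between negotiation environments; a fuller treatment would also weigh each scenario by its utility for the supported player.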