Wireless Sensor Networks (WSNs) offer notable advantages over conventional methods in many real-time applications, such as healthcare, temperature sensing, smart homes, homeland security, and environmental monitoring. However, limited resources, short network lifetime, and security vulnerabilities remain challenging issues for WSNs. Moreover, WSN performance is susceptible to network anomalies, particularly misdirection attacks. These issues motivate us to develop a security-aware application. In this work, we therefore present a Reinforcement Learning (RL) algorithm for Misdirection Attack Detection and Prevention (RL-MADP) in WSNs. In addition to a flat WSN architecture, the proposed approach adopts a Markov Decision Process (MDP) formulation from RL, in which each sensor node is fully aware of its environment. RL-MADP is an online method that incurs minimal computation cost and performs load balancing that preserves higher residual energy, thereby prolonging the network lifetime.
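To make the MDP view concrete, the following is a minimal sketch of how a single node could learn online to avoid a misdirecting neighbor using a generic tabular Q-learning update. It is not the RL-MADP algorithm itself: the node names (`A`, `B`, `M`), the reward shape (residual-energy fraction on delivery, a fixed penalty otherwise), the delivery feedback, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are all assumptions chosen for illustration.

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative values, not from the paper


def choose_next_hop(q, node, neighbors, rng):
    """Epsilon-greedy choice of a forwarding neighbor."""
    if rng.random() < EPSILON:
        return rng.choice(neighbors)
    return max(neighbors, key=lambda n: q.get((node, n), 0.0))


def q_update(q, node, hop, reward, next_hops):
    """Standard one-step Q-learning update for the (node, hop) pair."""
    best_next = max((q.get((hop, n), 0.0) for n in next_hops), default=0.0)
    old = q.get((node, hop), 0.0)
    q[(node, hop)] = old + ALPHA * (reward + GAMMA * best_next - old)


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {}
    neighbors = ["M", "B"]         # hypothetical: M misdirects, B forwards to the sink
    energy = {"M": 0.9, "B": 0.8}  # hypothetical residual-energy fractions
    for _ in range(episodes):
        hop = choose_next_hop(q, "A", neighbors, rng)
        delivered = hop == "B"     # assumed feedback, e.g. a sink acknowledgement
        reward = energy[hop] if delivered else -1.0
        q_update(q, "A", hop, reward, next_hops=[])  # the chosen hop ends the episode
    return q


q_table = train()
```

In this toy setting the learned value of forwarding through `B` ends up higher than that of forwarding through `M`, so the greedy policy steers traffic away from the misdirecting neighbor. Each update touches a single table entry, which is in keeping with the low per-step computation cost claimed for an online method.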