This paper proposes a new framework for controlling traffic signal lights by applying an automated goal-directed learning and decision-making scheme, namely reinforcement learning (RL), to seek the best possible traffic signal actions in response to changes in the network state, which is modelled by the signalised cell transmission model (CTM). Q-learning, one of the RL tools, is employed to find the traffic signal solution because of its adaptability in finding a real-time solution as the state changes. The goal is for RL to minimise the total network delay. Surprisingly, using the total network delay as the reward function did not give results as good as initially expected. Rather, both simulation and mathematical derivation confirm that using the newly proposed red light delay as the RL reward function gives better performance than using the total network delay. The investigated scenarios include situations where the sum of the overall traffic demands exceeds the maximum flow capacity. Reported results show that the proposed framework, which uses RL and CTM at the macroscopic level, can find, in a computationally efficient manner, a control solution close to the best periodic signal solution (BPSS) obtained by brute-force search. In a practical case study conducted with the AIMSUN microscopic traffic simulator, the proposed CTM-based RL reduces the average delay by 40% with a bus lane and 38% without a bus lane, compared with the currently used traffic signal strategy. Therefore, the CTM-based RL algorithm could be a useful tool for setting traffic signals in practice.
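
To make the idea of a red light delay reward concrete, the following is a minimal sketch of tabular Q-learning on a toy two-approach intersection. It is an illustration under stated assumptions, not the framework evaluated in this paper: the simple queue update stands in for the signalised CTM, and the names `step` and `train`, the saturation flow, the queue cap, the arrival probabilities, and the learning parameters are all hypothetical. The reward is the negative number of vehicles held on the approach facing red in the current step.

```python
import random

def step(queues, phase, action, arrivals, sat_flow=2, cap=5):
    """Advance a toy two-approach intersection by one time step.

    queues:  (q0, q1) vehicles waiting on approaches 0 and 1
    phase:   index of the approach that currently has green
    action:  0 = keep the current phase, 1 = switch to the other approach
    Returns (new_queues, new_phase, reward).
    All dynamics here are assumptions for illustration, not the CTM.
    """
    if action == 1:
        phase = 1 - phase
    q = list(queues)
    # Serve the green approach, then add arrivals (capped so the table is finite).
    q[phase] = max(0, q[phase] - sat_flow)
    q = [min(cap, q[i] + arrivals[i]) for i in range(2)]
    # Red light delay (toy version): vehicles held on the approach facing red.
    red_delay = q[1 - phase]
    return tuple(q), phase, -red_delay

def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table mapping (q0, q1, phase) -> [value of keep, value of switch]."""
    rng = random.Random(seed)
    Q = {}
    queues, phase = (0, 0), 0
    for _ in range(steps):
        s = (queues[0], queues[1], phase)
        values = Q.setdefault(s, [0.0, 0.0])
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = 0 if values[0] >= values[1] else 1
        # Hypothetical Bernoulli arrivals: approach 0 is busier than approach 1.
        arrivals = (int(rng.random() < 0.6), int(rng.random() < 0.3))
        queues, phase, r = step(queues, phase, a, arrivals)
        s_next = (queues[0], queues[1], phase)
        best_next = max(Q.setdefault(s_next, [0.0, 0.0]))
        # Standard Q-learning update.
        values[a] += alpha * (r + gamma * best_next - values[a])
    return Q
```

Calling `train()` returns the learned table; the greedy action in a state `s` is the index of the larger entry in `Q[s]`. Replacing `red_delay` with the total queue length `sum(q)` would give a toy analogue of the total network delay reward for comparison.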