Chip attach is the bottleneck operation in semiconductor assembly. Chip attach scheduling is in nature unrelated parallel machine scheduling considering practical issues, for example, machine-job qualification, sequence-dependant setup times, initial machine status, and engineering time. The major scheduling objective is to minimize the total weighted unsatisfied Target Production Volume in the schedule horizon. To apply Q-learning algorithm, the scheduling problem is converted into reinforcement learning problem by constructing elaborate system state representation, actions, and reward function. We select five heuristics as actions and prove the equivalence of reward function and the scheduling objective function. We also conduct experiments with industrial datasets to compare the Q-learning algorithm, five action heuristics, and Largest Weight First (LWF) heuristics used in industry. Experiment results show that Q-learning is remarkably superior to the six heuristics. Compared with LWF, Q-learning reduces three performance measures, objective function value, unsatisfied Target Production Volume index, and unsatisfied job type index, by considerable amounts of 80.92%, 52.20%, and 31.81%, respectively.