BACKGROUND Around the world, and for many years, students have struggled to learn to program computers. The reasons for this are poorly understood by their lecturers.
PURPOSE When the intuitions of many skilled lecturers have failed to solve a pedagogical problem, a systematic research programme is needed. We have implemented a research programme based on three elements: (1) a theory that provides an organising conceptual framework, (2) representative data on how the class performs on formative assessment tasks, and (3) microgenetic data from one-on-one think-aloud sessions, to establish why students struggle with some of the formative tasks.
DESIGN / METHOD We have adopted neo-Piagetian theory as our organising framework. We collect data by two methods. The first method is a series of small tests that students complete during lectures, at roughly two-week intervals. These tests do not count toward the students' final grade, which affords us the opportunity to ask unusual questions that probe the boundaries of student understanding. The second method is think-aloud sessions, in which a small number of selected, volunteer students attempt problems similar to those in the in-class tests.
RESULTS The results in this paper serve to illustrate our research programme rather than answer a single, tight research question. These illustrative results focus upon one very simple type of programming question that was put to students very early in their first programming subject. That question required students to write code to swap the values in two variables (e.g., temp = a; a = b; b = temp). The common intuition among programming lecturers is that students should be able to solve such a problem easily by, say, week 4 of semester. On the contrary, we found that 40% of students in a class at one of the participating institutions answered this question incorrectly in week 4 of semester.
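For readers unfamiliar with the swap task, the following minimal Python sketch shows the three-assignment solution quoted above alongside one plausible incorrect attempt. The function names, the faulty variant, and the trace format are our own illustrative assumptions; they are not taken from the test instrument or from any student's answer.

```python
def swap_correct(a, b):
    """Swap via a temporary variable, mirroring temp = a; a = b; b = temp."""
    temp = a  # remember the original value of a
    a = b     # a now holds the original value of b
    b = temp  # b receives the remembered original value of a
    return a, b


def swap_faulty(a, b):
    """Hypothetical incorrect attempt: without a temporary, the original a is lost."""
    a = b  # the original value of a is overwritten here
    b = a  # b is merely assigned its own original value again
    return a, b


def trace_swap(a, b):
    """Record the variable values after each assignment, as a manual trace would."""
    states = []
    temp = a
    states.append({"a": a, "b": b, "temp": temp})
    a = b
    states.append({"a": a, "b": b, "temp": temp})
    b = temp
    states.append({"a": a, "b": b, "temp": temp})
    return states


if __name__ == "__main__":
    print(swap_correct(1, 2))
    print(swap_faulty(1, 2))
    for state in trace_swap(1, 2):
        print(state)
```

Writing out the state after each assignment, as trace_swap does, is one way to make visible the step at which the faulty variant loses the original value of a.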
CONCLUSIONS What is emerging from this research programme is evidence for three different ways in which students reason about programming, which correspond to the first three neo-Piagetian stages (Lister, 2011). In the lowest and least sophisticated stage, known as the sensorimotor stage, novices exhibit two types of problems: (1) misconceptions that are already well known in the literature on novice programmers (e.g., Du Boulay, 1989), and/or (2) an approach to manually executing ("tracing") code that is poorly organised and thus error prone. Novices at the next stage, known as the preoperational stage, can correctly trace code, but they cannot reliably reason about a program in terms of abstractions of the code (e.g., diagrams). It is only at the third stage, the concrete operational stage, that students begin to exhibit some capacity to reason about code abstractions. However, traditional approaches to teaching programming implicitly assume that students begin at the concrete operational stage.