Mobile Learning for Science and Mathematics School Education: A Systematic Review of Empirical Evidence

The ubiquity, flexibility, ease of access and diverse capabilities of mobile technologies make them valuable and a necessity in current times. However, they are under-utilized assets in mathematics and science school education. This article analyses the high quality empirical evidence on mobile learning in secondary school science and mathematics education. Our study employed a Systematic Literature Review (SLR) using well-accepted and robust guidelines. The SLR resulted in the detailed analysis of 49 studies (60 papers) published during 2003 – 2016. Content and thematic analyses were used to ascertain pedagogical approaches, methodological designs, foci, and intended and achieved outcomes of the studies. The apps and technologies used in these studies were further classified for domain, type and context of use. The review has highlighted gaps in existing literature on the topic and has provided insights that have implications for future research.


Introduction
The proliferation of mobile technologies is a commonly observed phenomenon around the globe as the number of mobile subscriptions has shown an exponential growth (Tsinakos, 2013). The availability of smart phones at affordable prices has led to an increase in the use of applications (apps) for various aspects of life such as communication, travelling, entertainment, productivity and learning. In the last decade, a significant number of initiatives have been launched that aim to utilise mobile technologies and apps for educational purposes (Kearney, Burden, & Rai, 2015).
School education is being exhorted to build a creative, well-informed, digitally capable society with flexible knowledge and skills (Ainley, 2010;Sharples et al., 2016). However, there is a disjunction between such exhortations and actual practice in schools. It is clear that while trials in schools demonstrate benefits in utilising learning technologies such as mobile devices, their widespread effective application in schools has not been realized (Milrad et al., 2013;Rushby, 2012;Selwyn, 2010). Numerous barriers and challenges have been identified, for example, lack of resources due to financial limitations, lack of effective educational policies for mobile learning (m-Learning), lack of human resources and skilled personnel for effective implementation of mobile pedagogies, lack of hardware resources such as infrastructure and bandwidth, and reservations of parents due to the perceptions of health and psychological issues associated with the prolonged use of mobile devices by students (Tsinakos, 2013;Yu, Lee, & Ewing, 2014). Other challenges include changing teachers' pedagogical beliefs (Ertmer & Ottenbreit-Leftwich, 2010); lack of trained educators (Cheon, Lee, Crooks, & Song, 2012;Crompton, Olszewski, & Bielefeldt, 2016;Milrad et al., 2013) and a lack of pre-service and in-service teacher education in mobile pedagogies (Goktas, Yildirim, & Yildirim, 2009). Further, adoption of mobile technologies in school education is occurring without an empirical understanding of the complex, dynamic relationship between these technologies and the epistemological and pedagogical systems that underpin teaching and learning. It is important to have a thorough understanding of the effectiveness of the mobile technology use in education before mandating m-Learning in educational policies and practices. The research community bears the responsibility for conducting high quality research to provide evidence for the effectiveness of m-Learning pedagogies.
Concurrently with the imperative to better understand the use of mobile devices for learning, there is strong political will in many countries to improve teaching and learning in mathematics and science education to underpin innovation driving economic growth and to build the capability of tomorrow's workforce for future job markets (e.g. Office of the Chief Scientist, 2014). One problem experienced in the education research community is that numerous small-scale studies are conducted but the findings of these studies are not aggregated and synthesized to guide further work. It is important to learn from previous studies that have been conducted in this area, as existing empirical evidence can help educators and policy-makers in making more informed decisions. This article responds to this need by providing an analysis of the empirical research that has occurred on m-learning in mathematics and science. This analysis comprised a Systematic Literature Review (SLR) that is a component of a three-year Australian government funded research project aiming to optimise the use of mobile learning in mathematics and science in secondary schools.
In contrast to the traditional literature reviews where researchers often use ad hoc literature selection, Systematic Literature Reviews are a methodical, rigorous and transparent way to search, select, extract and synthesise the information from published empirical evidence on a topic of interest in order to answer research questions. SLRs provide high quality scientific (empirical) evidence on a specific topic. The aim of an SLR is to be as unbiased as possible, auditable, and repeatable for other researchers. SLRs are a well-established tool of the evidence based paradigm that has gained increasing credibility in many research disciplines in recent years such as medicine, engineering, social sciences and education.
Our SLR is based on the following three research questions regarding studies of m-learning in secondary school science and mathematics education:
RQ1: What are the reported research methodologies, foci and outcomes in existing literature?
RQ2: What mobile technologies and/or apps are investigated in these studies?
RQ3: What are the pedagogical approaches being reported? What are the contextual settings under which these technologies are investigated?
This article is structured as follows: Section 2 provides background and an overview of the existing reviews relevant to the topic. Section 3 details the SLR planning and execution. Section 4 describes the results of analysis. Section 5 discusses the results and Section 6 provides conclusions and suggestions for future research.

Background
This paper focuses on the use of mobile devices and apps for learning in mathematics and science. One reason for the focus on these two disciplines is that the current article reports on a component of a funded research project conducted by the authors, which investigates ways in which learning in mathematics and science can be optimised through the use of mobile technologies. Mathematics and science teaching and learning are currently strong government priority areas in Australia and many other developed countries (Office of the Chief Scientist, 2014). The diversity within science and mathematics pedagogical approaches (ranging from extended open inquiries to drill and practice), makes the interaction of mobile learning in mathematics and science practices of interest.
Underpinning current practice in mathematics and science learning are two dominant and interrelated theories: those of social constructivism and socio-cultural theory (McRobbie & Tobin, 1997;Salomon & Perkins, 1998).
However, in practice, many mathematics classrooms follow more transmissive ways of teaching, which incorporate drill and rote learning rather than investigative approaches (Gill & Boote, 2012). In contrast, science practice can often be observed to use inquiry methods, learners' questions and generative approaches which are underpinned by the theories above (Burden & Kearney, 2016;Krajcik, 2002). Given that mobile devices are well suited to support learning underpinned by socio-cultural perspectives, such as authenticity, collaboration and personalization (Kearney, Schuck, Burden, & Aubusson, 2012), it is of interest to investigate what the literature tells us about their use for mathematics and science learning and to examine the contrast in approaches used.
With the increased interest in m-Learning over the past decade, several authors have reviewed its history (e.g. Crompton, 2013; Parsons, 2014), with dedicated literature reviews seeking to capture specific facets of this field (see Frohberg, Göth, & Schwabe, 2009; Naismith, Lonsdale, Vavoula, & Sharples, 2004; Parsons, 2014; Wingkvist & Ericsson, 2011). A number of literature reviews on m-learning apps exist. Table 1 shows literature reviews that have provided integrated results from analysis of research papers published on m-Learning, and Table 2 lists the studies that have evaluated use of educational apps for learning. Our systematic review differs from existing literature reviews on the relevant topic in the following aspects:
1) Scope of educational level: In our review, we have focused on the studies where apps are examined for use only at the secondary school level (students of ages 12-18). Our project focuses on this level of schooling due to the evidence of the lack of engagement of students in mathematics and science in these years (Ainley, Kos, & Nicholas, 2008; Palmer, Burke, & Aubusson, 2017). The importance of understanding student engagement in mathematics and science in secondary schools provided the motivation for applying for and being awarded a large competitive national research grant. Consequently, investigating how the use of mobile devices can optimise learning in these disciplines in secondary schools is critical. This limits our scope in comparison to existing reviews.
2) Scope of subject domain: Our focus is to find the empirical studies that report on m-Learning in science and/or mathematics education only. The existing reviews either investigate all disciplines (Barbosa, 2013; Hung & Zhang, 2012; G. J. Hwang & Tsai, 2011; Naismith et al., 2004; Pereira & Rodrigues, 2013; Y.-T. Sung et al., 2016; Wingkvist & Ericsson, 2011; Wu et al., 2012), or investigate only mathematics (Crompton & Burke, 2015), or only science (Crompton, Burke, et al., 2016; Zydney & Warner, 2016) learning contexts.
3) Focus of review: We provide extensive cross analysis of various attributes of the results from aggregated reviews of our SLR, e.g. study foci and outcomes, study contexts, and the pedagogical approaches reported in the selected studies.
4) Review timeline: Our review covers publications from 2000 - 2016.
5) Type of included studies: We have strictly followed evidence-based guidelines in our review (Budgen & Brereton, 2006; Dwan, Gamble, Williamson, & Kirkham, 2013). The included studies are those that have followed rigorous empirical research designs for investigating mathematics or science learning/teaching with mobile apps/technologies. Evidence-based reviews require the researchers to follow conscientious and judicious methods to collect and collate the best available evidence in order to present more accurate results.
Some of the existing reviews on the topic have provided conflicting observations. Hwang and Wu reviewed studies published during 2008 -2012 that investigated m-learning apps/technologies (G.-J. Hwang & Wu, 2014). They found that 83% of the studies that aimed to measure improvements in student learning reported achieving their stated objectives. Hsu and Ching analysed the experimental studies on mobile computer supported collaborative learning published during 2004 -2011 (Hsu & Ching, 2013). They found that six out of nine studies found improvements in students' learning. In contrast, the review provided by Schmitz et al. on game based m-Learning did not find significant evidence of improvements in students' learning (Schmitz, Klemke, & Specht, 2012). Similarly, Cheung and Hew reported that there was no significant difference in students' learning in studies that compared m-Learning to traditional learning (Cheung & Hew, 2009). Given the varied findings of past reviews it is appropriate to include more recent research in the field to determine the extent to which consensus is evident.

Systematic review method and execution
The primary focus of studies in this review is mathematics or science learning/teaching with mobile apps/technologies. We conducted a systematic review by following the guidelines of the Evidence Based paradigm (Budgen & Brereton, 2006;Dwan et al., 2013). An SLR follows a rigorous and scrupulous procedure to search and select the sample studies for coding and analysis. It is a methodical and meticulous process of collecting and collating the published empirical studies of acceptable quality with systematic criteria for selection to reduce researcher bias and provide transparency to the process. An SLR is well suited for providing a summative overview of existing empirical research undertaken within the field. To ensure the authenticity and reliability of the results, the systematic review process is validated by multiple researchers. In this paper, the following team members participated in the review process in different roles: To make the review transparent and repeatable, it is important to provide sufficient details of the review process.
In the following section we briefly describe our search, selection, thematic coding and analysis process.

Search strategy
Accurately searching all possible relevant primary empirical studies is the most crucial step in a systematic review. We took the following steps in order to thoroughly search for the relevant studies:
1. Extracting major search terms from the Research Question(s).
2. Identifying the relevant terms, synonyms and alternative spellings for the major search terms that are used in published literature.
3. Constructing a search string from major search terms to be used in online digital libraries for abstract based search.
4. Selecting a range of online databases, journals and conference proceedings for searching. The search string was customized for different interfaces of digital libraries.
5. Managing the results (citations and abstracts) using EndNote.
Based on our main research questions, we had four major search terms, i.e. Mobile Learning, Maths, Science, Secondary Education. From the major search terms, we identified the synonyms and alternative terms (Table 4). Concatenating the terms, we obtained the following search string, which was used to search the abstracts of the relevant papers:
(("Mobile learning" OR "mLearn*" OR "m-Learn*" OR "Mobile pedagog*" OR "ubiquitous learning" OR "wireless learning" OR "seamless learning" OR "iPad trials" OR "iPad project" OR "mobile technology enhanced learning") AND ("Math*" OR "Science" OR "Sciences" OR "Biology" OR "Chemistry" OR "Physics" OR "Science Education" OR "STEM" OR "Geology" OR "Environmental education") AND ("secondary education" OR "high school" OR "middle school" OR "7-12"))
The string was modified for different online databases as required while keeping the logical order consistent. We applied the search string to a range of databases to ensure that we did not miss any relevant studies.
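As an illustration of step 3, the following minimal sketch assembles a Boolean search string from groups of synonyms. The term lists are copied from the search string reported above (with quoting normalized); the function and variable names are hypothetical, and the review does not describe any tooling used for this step.

```python
# Illustrative sketch only: names below are assumptions, not part of the review.
# Term lists are taken from the search string reported in the text.
MOBILE_TERMS = [
    "Mobile learning", "mLearn*", "m-Learn*", "Mobile pedagog*",
    "ubiquitous learning", "wireless learning", "seamless learning",
    "iPad trials", "iPad project", "mobile technology enhanced learning",
]
SUBJECT_TERMS = [
    "Math*", "Science", "Sciences", "Biology", "Chemistry", "Physics",
    "Science Education", "STEM", "Geology", "Environmental education",
]
LEVEL_TERMS = ["secondary education", "high school", "middle school", "7-12"]


def or_group(terms):
    """Quote each term and join the synonyms of one concept with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_search_string(*groups):
    """Join the OR-groups with AND, keeping the logical order of the groups."""
    return "(" + " AND ".join(or_group(g) for g in groups) + ")"


if __name__ == "__main__":
    print(build_search_string(MOBILE_TERMS, SUBJECT_TERMS, LEVEL_TERMS))
```

Keeping each concept as a separate list makes it easier to adapt the string to a database whose interface uses different wildcard or quoting rules, while the AND/OR structure stays the same.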
The following databases were selected for our searches: As mobile learning is an emerging and dynamic discipline, significant research into the field may be found in conference publications that may or may not necessarily be included within the online databases. The following specific conferences and journals were also included in this research:

Study selection
Once all the results were obtained from all selected sources, we applied the selection criteria to filter out the irrelevant studies. First, irrelevant papers that were retrieved due to poor performance of search engines were excluded from the results by reading their titles and abstracts. Duplicate citations (papers) were discarded prior to applying the selection filter. The remaining papers were filtered with the criteria described in Table 6.
The first author applied the criteria to all papers for study selection. The second and third authors randomly checked among the results to reduce selection bias. Any issues related to selection of a paper were resolved by the first three authors in discussion at that stage. The included papers were given an identification number. If more than one paper was using/describing results from the same empirical study, or if multiple publications from one study existed as conference and extended journal versions, they were treated as one study and hence given one identification number.
Table 6 (fragment): criteria under which a paper was excluded
• The study does not provide sufficient details of empirical research design and data analysis
• The result is a thesis/editorial/book review
• The paper is focused on special needs education or special cases
• The paper is conceptual or discursive in nature, e.g. focusing on personal opinions, theory or conceptual work
Based on the retrieved results, we performed secondary searches by scanning and reviewing the references and citations at the end of the included papers. We also scanned the references provided in the following existing systematic reviews to find any study that we might have missed in our results (Barbosa, 2013; Crompton & Burke, 2015; Crompton, Burke, et al., 2016; Hung & Zhang, 2012; G. J. Hwang & Tsai, 2011; Naismith et al., 2004; Parsons, 2014; Pereira & Rodrigues, 2013; Y.-T. Sung et al., 2016; Vogel et al., 2006; Wingkvist & Ericsson, 2011; Wu et al., 2012; Zydney & Warner, 2016). Those papers that appeared to be eligible for consideration were treated with the same study selection criteria set for the primary search selection as given in Table 6. The complete search and selection process is summarized in Figure 2. The searches on online databases resulted in a total of 3131 papers. After scanning the titles and abstracts, 559 were initially included for further scrutiny. Based on reading abstracts, 103 papers passed all inclusion criteria checks and proceeded to full text review.
Multiple publications from the same authors and studies were grouped under one study ID. After reading the full text of the 103 papers, 53 studies (64 papers) were selected for final inclusion. Appendix A provides the list of 53 studies (64 papers) that are included in our review. At the data extraction stage, another 4 papers (S23, S27, S40, S53) were excluded from the sample after validity review by the third, fourth and fifth authors. The rejection at this stage was mainly due to the low quality of publications (e.g. research design not suitable for the study) or due to the fact that the paper did not provide answers to any of our research questions. This left us with 49 studies (60 papers) for synthesis and analysis. The inter-rater reliability for study selection among all authors is thus 92.5%.
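The text does not state how the 92.5% figure was derived. One reading that is consistent with the reported counts is simple percentage agreement: the share of the 53 initially selected studies that were retained after validity review (49). The sketch below checks that arithmetic and also reports the retention rate at the earlier screening stages; the function and variable names are hypothetical.

```python
# Illustrative sketch: counts are taken from the text; the interpretation of
# "inter-rater reliability" as simple percentage agreement is an assumption.
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


papers_retrieved = 3131      # results of the database searches
papers_after_scan = 559      # kept after scanning titles and abstracts
papers_full_text = 103       # passed to full text review
studies_selected = 53        # selected after full text review
studies_retained = 49        # left after validity review at data extraction

agreement = percent(studies_retained, studies_selected)
scan_rate = percent(papers_after_scan, papers_retrieved)
full_text_rate = percent(papers_full_text, papers_after_scan)

if __name__ == "__main__":
    print(f"Agreement on study selection: {agreement}%")
    print(f"Kept after title/abstract scan: {scan_rate}%")
    print(f"Passed to full text review: {full_text_rate}%")
```

Percentage agreement does not correct for agreement expected by chance; a chance-corrected statistic would need the per-rater decisions, which are not reported here.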

Quality assessment
The quality of the included papers is an important aspect of an SLR. We assessed the quality of the studies on the following three criteria:

3.3.1 Research design description
If during data extraction a paper was found to not provide sufficient details of the empirical design and analysis, it was discarded from the included sample. The included studies were scrutinized to ensure they had the description and details of research objectives, design or methodology, participants' demographics, participant selection procedure, intervention (an app, device or technology), analysis and results.

3.3.2 Publication outlet
To appraise the quality of the publication outlets we checked the impact factor of the journals and the SCImago journal ranking (SJR). The dominant majority of our included studies are journal articles (see Figure 3). Journal of Computers and Education provided the highest number of publications on the topic compared to the other publication outlets (see Table 7).

3.3.3 Impact of the papers
Google Scholar citations are an indication of the impact a paper makes in the research community. We checked the citation count for the papers to gauge their influence on the published work. Our included set of studies ranges from acceptable quality to very good quality publications based on research design and publication outlet. Table 8 presents the publications with high quality rankings based on a combination of highest Google Scholar citation count and additional data regarding the impact factor and SCImago Journal Ranking (SJR) to provide a general overview of the quality of the results.

Data extraction
The following demographic data were extracted from the included publications to be later used for cross analysis: title, authors, type of outlet (journal or conference), name of the outlet, publication year, full citation, geographical location (where the study was conducted), research methodology, types of participants (teachers and/or students), number of participants, duration of study, and types of apps/technologies (mathematics and/or science).
The following details were used for coding data in order to answer our research questions: The first author performed the first level of data extraction and coding in NVivo with continuous verification from the second author. At the end of the data coding stage, the results were discussed with the third, fourth and fifth authors and the differences of opinion were resolved in discussion.

Data synthesis and analysis
We used NVivo for coding, synthesis and analysis of the studies. We first extracted demographic attributes for all studies which were later used for classification during analysis. All the PDF files of included studies were coded in NVivo for extracting answers to the research questions according to the guidelines provided in Table 9. The study foci and the outcomes were analysed iteratively by the first author in discussion with the second author and later validated and verified by the other three team members. The following steps were used for synthesizing the data:
Step 1
• Coding the text that was relevant to the objectives and outcomes of the paper directly from the source as stated by the authors
• Extracting demographic/classification attributes from all studies such as domain of study, sample type and size, type of app, research methodology and duration of trial, geographical location of study, year of publication etc.
Step 2
• Summarizing the coded text for each of the papers
• Frequency analysis for demographic attributes
Step 3
• Identifying multiple themes for the foci of the studies using "thematic analysis technique" and outcomes with "content analysis technique"
Step 4
• Classifying outcomes into four categories, i.e.
- Achieved (if the stated outcomes of the study were achieved using the given app/technology),
- Not Achieved (if the study reported that stated anticipated outcomes were not achieved),
- Inconclusive (if the study did not provide any information about anticipated outcomes or it was not clear that the anticipated outcomes had been achieved) and
- Mixed (if some outcomes were achieved and others were not)
Step 5
• Synthesizing similar themes and categories of foci in results and creating a list
• Cross analyzing with demographic/classification attributes
Following is the timeline for our review

Limitation of this SLR
Although we followed a rigorous search and selection strategy based on the guidelines of the evidence based paradigm to ensure the completeness of our sample, some relevant papers may still be missing from our data collection because they are unavailable in electronic resources. While analysing the study methodologies, pedagogies, foci and outcomes, the names given to the themes that emerged were derived from the usage by the authors of the papers included in the SLR. By following the SLR guidelines meticulously we tried to ensure that the results of the SLR would be unbiased, although this does not protect against publication bias within the primary studies.

Results
Our systematic review resulted in 49 studies (60 papers). In this section we present the results from data synthesis and analysis.

Frequency analysis of study attributes

4.1.1 Year of publication
The 49 included studies were published during the period 2003-2016. From 2010 onwards there is a sudden increase in empirical research on the topic, which may be attributed to the increased usage of smart phones and other mobile devices in every aspect of daily routines (Tsinakos, 2013). Figure 5 shows the studies with respect to their publication timeline.

Research paradigms and methodologies
Our included set of papers contains diverse approaches to research design with both qualitative and quantitative research methodologies. Figure 7 shows the research paradigms and methodologies adopted in the included studies, with 63% of studies conducted with qualitative research methods. Figure 7 also shows further classifications of research methodologies under each paradigm as they are explicitly stated by the authors in included studies. The low incidence of mixed methods studies (6%) was noteworthy. In 70% of the included studies, school students were the participants/respondents (Figure 8). Teachers participated in 14% of the studies whereas 16% had a sample consisting of both teachers and students as participants/respondents. Figure 9a shows the cross analysis of the duration of studies and the sample size of the participants. We have put the two diagrams together so that the color legend for study duration is also reflected in the sample size of respondents. The duration of the studies is classified into three categories:
• Long Term: if the study was conducted over a period of more than three months;
• Medium Term: if the study was conducted over a period of one to two months;
• Short-term: if the study was conducted in less than a month.
The sample size of students that were involved in the included studies is classified in six categories based on their numbers (shown in Figure 9b). The cross analysis of study duration and sample size of participants indicates that most studies were conducted with fewer than 50 participants in short-term projects. Only the Long-term studies engaged more than 200 participants.
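The duration categories can be written as a small classification rule. The sketch below follows the thresholds exactly as stated in the text; because those thresholds leave durations above two months and up to three months unassigned, the sketch returns an explicit "Unspecified" label for that range rather than guessing. The function name and the "Unspecified" label are assumptions introduced for illustration.

```python
# Illustrative sketch: thresholds follow the category definitions in the text.
# The "Unspecified" label is an assumption for durations the text does not cover.
def classify_duration(months):
    """Map a study duration in months to the review's duration category."""
    if months < 1:
        return "Short-term"
    if months <= 2:
        return "Medium Term"
    if months > 3:
        return "Long Term"
    # More than two and up to three months: not covered by the stated categories.
    return "Unspecified"


if __name__ == "__main__":
    for m in (0.5, 1.5, 2.5, 6):
        print(m, "->", classify_duration(m))
```

Applying a rule like this to each study's reported duration would give the grouping needed for a cross tabulation against sample size; the six sample-size bands are not given in the text, so they are not modelled here.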

Foci of the studies
The main focus of all these studies was to investigate the effectiveness of mobile technologies/apps for student learning. However, there were additional foci mentioned in some of the studies as shown in Table 11. One focus was on how a particular app might be effective for learning (Effectiveness of Using the App). Other foci which considered pedagogical aspects included: Collaborative Learning, Student Engagement and Constructivist Learning. Effectiveness of Using the App and Design of App were the main points of investigation for the majority of the studies followed by assessment of technology implementation (Technology Implementation category).
It is interesting to note that these three most frequent research foci in our included studies relate more to the technological than to the pedagogical aspects of m-Learning. This is in agreement with the findings of the review by Wu et al. (2012). They reported that evaluating the effectiveness of mobile learning was the focus of 58% of the included studies (n=164) and 32% focused on investigating the design of mobile learning systems.

Outcomes of the studies
In our analysis, we use the term 'outcomes of the studies' to report whether or not the included studies reported the achievement of the stated objectives. These outcomes were analysed through content analysis and thus were classified into the following:
• Achieved (if the stated objectives of the study were reported to have been achieved);
• Not Achieved (if the stated objectives of the study were reported to have not been achieved);
• Inconclusive (if the paper was not clear about the achievements of the stated objectives); and
• Mixed (if some of the stated objectives were achieved while others were not).
From Figure 10 we can observe that 65% of the studies reported Achieved, 28% were Inconclusive and only 7% reported Mixed and/or Not Achieved study outcomes. Table 12 provides an analysis of outcomes from included studies in our SLR in comparison to other reviews that performed similar analysis in relation to achieved outcomes from the studies. Our review indicates a lower number of achieved outcomes than the others, but it is hard to generalize any trend from this data. There are numerous factors to be considered here that can impact the outcomes of the included empirical studies such as the subject domain, context and settings, adopted pedagogical approaches and the cognitive skills required for using the app.

4.1.6 Domains of the studies
There were more empirical studies in our SLR investigating use of science apps than mathematics apps, as shown in Figure 11. Liu et al. reported that in their review (of all domains) natural sciences dominated studies of m-Learning apps (Liu et al., 2014). Previously, Wu et al. made similar observations when they analysed the domains of apps for m-Learning and found that applied sciences dominated the results with the majority of m-Learning studies in health and environmental sciences (Wu et al., 2012).
Contrary to the trend in science, Table 13 shows that the majority of the studies in mathematics did not specify the sub-domain of investigation. Among those that did specify the sub-domain, geometry and algebra were investigated more often than other sub-domains. In science, environmental sciences is the sub-domain that was the focus of most of the studies, followed by geography and physics.

Types of apps / technologies
The included studies evaluated the use of two types of apps or technologies for learning; either developed by the authors/researchers or by third party developers. In software development, the evaluation of software quality is mainly done to assess whether the software is achieving the aims for which it was designed and is typically carried out by at least three stages of testing that involves people other than the programmer who wrote the code. These three levels of testing are performed to ensure that as many defects as possible are found and fixed and, more importantly, to increase the likelihood that the final software product satisfies the real needs of the actual users, in these papers, learning needs of students. When a mobile app is developed by the researchers/teachers and evaluated solely by the developer(s), it is possible that the software quality assurance processes have not been followed rigorously enough. The pedagogical affordances of the app may not have been fully explored or tested properly if no one else was involved in the evaluation. Hence dividing the apps into these two categories makes a significant contribution to informing us of the quality and relevance of the developed app in relation to the pedagogical mindset that is our focus in this paper. Throughout the paper, we will be referring to these two types as follow: • Self-developed: where the authors/researchers developed an app or technology and then investigated its use for learning. Of the studies included, 59% empirically investigated the design and/or use of an app that was proposed and developed by the researchers for a particular study; • Third party: where authors/researchers used a third party or commercially developed app or technology and investigated its use for learning. 
We were additionally interested to use the above mentioned differentiation of the apps for the following reasons: • Self-developed apps are developed for specific contexts and it is difficult to assess the generalisability of the use of the app or technology in another similar context (as there are not any replicated studies available); • In Self-developed app studies, researchers' bias may influence the reporting study outcomes. Figure 12: Type of mobile apps/technologies (Self-developed/Third party) Figure 12 shows that 59% of the included studies investigated use of self-developed apps for learning. This is in agreement with the findings from the review of (Zydney & Warner, 2016) where they reported that the majority of the studies evaluated in house 'customised apps' (self-developed by researchers) in their studies rather than using 'off-the-shelf apps' (Third party). They pointed out that the self-developed apps can restrict the availability of the apps to the public in comparison to the commercial mobile apps. This is due to the specificity of the platforms used by the researchers to design/develop the apps or the lack of availability of apps over internet or in apps stores.
The included studies evaluated the use of a wide range of apps and technologies for learning (see Table 14). 'Math4Mobile' is the only app investigated in two different studies (S3, S8). The most frequently investigated technology in the included studies is 'Augmented Reality' (S4, S20, S21, S44, S46, and S47). It is noteworthy that all six of these studies investigated the use of self-developed apps/technologies.
The columns in Table 14 are: unique study ID, name of the app/technology used in the study, domain (mathematics or science), domain specificity or generality of the app, self-developed or Third party, and the context of use of the app/technology in the study. This table gives a comprehensive profile of the apps and technologies used in the reviewed studies. In particular, we have determined whether each app/technology is domain specific or generic. Among the self-developed mathematics apps used in the studies, only one is generic, while in science there are eight generic apps. Furthermore, in the three studies that focused on both mathematics and science, the self-developed app was domain specific, whereas the Third party apps were generic.
Study ID | App/technology | Domain | Specific/Generic | Self-developed/Third party | Context of use
 | Flyer messaging app | S | G | N | A mobile peer-to-peer messaging tool that provides meta-cognitive and procedural support, while tutors and a nature guide provided more dynamic scaffolding in order to support argumentative discussions between groups of students during the co-creation of knowledge claims.
S36 | Minecraft on iPad | S | G | N | The simulated universe of Minecraft was used with the aim of allowing experiential learning, where emphasis was on knowledge construction rather than transmission, to take place more easily through experiencing and interacting with the environment.
S51 | WebQuest | S | G | N | Students learned about resource recycling and classification through an instructional website based on the teaching tool of WebQuest.

Figure 13: Apps and Technologies in included studies
There was a stronger emphasis on investigating the use of self-developed apps in science education studies. It would be interesting to investigate reasons for this trend in future research.

Pedagogical approaches
Previous reviews have indicated that much of the research in m-Learning is not grounded in pedagogical theory (Cheung & Hew, 2009; Zydney & Warner, 2016). In our results, a number of studies do not explicitly define the pedagogical approach underlying the design or use of the app or technology they investigated. However, in many of these studies the approach can be inferred from the way the app was used. Table 16 lists and classifies the studies for which the pedagogical stance (or what some authors referred to as the theoretical underpinnings) was reported or inferred. The papers excluded from this table did not discuss the pedagogical approaches used, and none could be inferred from reading these papers.
The most frequently reported pedagogical approach in our included set of studies is Collaborative Learning, followed by Inquiry-based Learning (IBL), and Project and Problem-based Learning. The pedagogical approaches used and inferred from analysis of the papers are further discussed in Section 5. The themes of Collaborative Learning, IBL and Project and Problem-based Learning were a major focus of m-Learning studies in this review, in line with the strong emphasis on these pedagogies in mathematics and science teaching (e.g. Plass et al., 2013; H.-Y. Sung & Hwang, 2013). There was also a notable dearth of studies focusing on transmissive or instructionist pedagogies. This low frequency is noteworthy given the prevalence of these more traditional pedagogies in mathematics education (Brousseau, 1997), the long-term influence of behaviourist theories on education software development (Wiburg, 1995) and the subsequent dominance, in app repositories, of drill and practice and tutorial style education apps that are underpinned by these theories (Goodwin & Highfield, 2013).

Settings/contexts used in studies
Table 17 shows the main settings in which participants in each study were using mobile technologies to support their mathematics and/or science learning.
Context/setting | Frequency | Study IDs
Formal | 20 | S1, S5, S6, S7, S11, S12, S13, S14, S15, S17, S22, S25, S29, S32, S35, S36, S37, S38, S46*, S52
Informal | 1 | S30
Semi-Formal | 12 | S3, S4, S21, S26, S31, S34, S39, S45, S46*, S47, S50, S51
Multiple | 12 | S2, S8, S9, S10, S18, S19, S20, S28, S33, S41, S42, S48
For the purpose of this paper (and our wider project), formal settings are defined as traditional institution-based learning environments such as high school classrooms and laboratories; semi-formal settings are out-of-classroom contexts pre-determined by a teacher, such as school playgrounds, museums and field trips; and informal settings are recreational or everyday spaces chosen by learners, such as trains, cafes and parks. Finally, the multiple settings category covers studies in which participants used their mobile devices in more than one setting, across at least two (physical) learning spaces and contexts. In the analysis of our sample, only one study needed to be allocated to more than one category (see Table 17). This study (S46) included two settings that were mutually exclusive, with participants choosing to participate in the task in only one of these settings (not both/multiple). Only one study (S30) was specifically conducted in an informal setting, which is surprising because mobility is a key advantage of m-Learning, allowing students to learn beyond the traditional classroom environment (Kukulska-Hulme & Traxler, 2005; Schuck, Kearney, & Burden, 2017). This finding is consistent with the trend in m-Learning studies observed by other reviews. Crompton and Burke (2015) focused only on mathematics and found that 83% of their sample studies were conducted in formal educational contexts; the remaining studies did not describe any settings as informal, but rather described the educational activities as taking place outdoors in the natural environment. Zydney and Warner (2016) raised similar concerns about the dominance of formal contexts in the empirical studies in their sample, arguing that explorative science learning should largely take place outside formal classroom settings, in the natural environment.
Another surprising result was the lower number of studies focusing on multiple settings compared to formal settings. Given the recent emphasis in the m-Learning literature on 'seamless learning' across contexts (Burden & Kearney, 2016;Rushby, 2012;Schuck et al., 2017;Toh, So, Seow, Chen, & Looi, 2013), we expected to find more studies in the multiple settings category in these mathematics and science education studies.
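As a quick consistency check on Table 17, the frequencies can be recomputed from the listed study IDs. The following Python sketch is illustrative only: the dictionary simply transcribes the table, and the variable names are hypothetical rather than part of the review protocol.

```python
from collections import Counter

# Study IDs transcribed from Table 17 (S46 is listed under two settings).
settings = {
    "Formal": ["S1", "S5", "S6", "S7", "S11", "S12", "S13", "S14", "S15", "S17",
               "S22", "S25", "S29", "S32", "S35", "S36", "S37", "S38", "S46", "S52"],
    "Informal": ["S30"],
    "Semi-Formal": ["S3", "S4", "S21", "S26", "S31", "S34", "S39", "S45",
                    "S46", "S47", "S50", "S51"],
    "Multiple": ["S2", "S8", "S9", "S10", "S18", "S19", "S20", "S28",
                 "S33", "S41", "S42", "S48"],
}

# Number of studies listed under each setting.
frequencies = {name: len(ids) for name, ids in settings.items()}

# Studies that were allocated to more than one category.
membership = Counter(sid for ids in settings.values() for sid in ids)
multi_category = sorted(sid for sid, n in membership.items() if n > 1)

print(frequencies)
print(multi_category)
```

The tally reproduces the reported frequencies (20, 1, 12 and 12) and flags S46 as the only study placed in two categories.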

Cross analysis of the studies' attributes
Cross analysis of variables provides a richer picture of the collected data and their relationships. We performed this additional analysis to identify potential gaps and patterns that exist. In this section, we present the cross analysis of the relevant variables, highlighting the patterns and gaps that were identified.

Research paradigms versus studies' foci and outcomes
Table 18 presents the studies in cross analysis with their research paradigm, foci and outcomes. The three interesting observations from this table are:
(1) The studies that focused on Evaluation of Student Perceptions in m-Learning predominantly used quantitative paradigms;
(2) The studies with a focus on Effectiveness of Using App in m-Learning mostly reported achieved outcomes, independent of the research paradigm used;
(3) Most studies in the Technology Implementation, Student Engagement and Collaborative Learning categories were qualitative.

Research paradigms versus studies' contexts and outcomes
Table 19 presents the studies in cross analysis with their research paradigm, contexts and outcomes. The main interesting observations from this table are:
(1) The one study in informal settings was conducted using a quantitative paradigm.

Contexts and pedagogical approaches of the studies
(1) Only one study was conducted in an informal context (S30), which is surprising, given the mobile attributes of the technologies;
(2) The one study conducted in an informal context used an instructionist pedagogical approach, which is unexpected given that informal settings are thought to privilege more diverse pedagogies;
(3) The dominant pedagogical approaches in the formal, semi-formal and multiple settings are collaborative and inquiry-based learning. This is to be expected given that the theoretical underpinnings of these approaches encourage investigation in a range of settings. However, it is surprising that none of the studies adopting these pedagogical approaches were conducted in informal settings.

Pedagogical approaches versus studies' foci and outcomes
Table 21 shows the cross analysis of pedagogical approaches of the studies against their foci and outcomes. The main observations are:
(1) Studies that used the pedagogical approach of Collaborative Learning reported more achieved outcomes than others.
(2) Effectiveness of Using App and Design of App were the foci of studies across various pedagogical approaches, and these studies reported more achieved outcomes.
(3) All the studies with the Game-based Learning pedagogy theme and the Realistic/Context-aware Ubiquitous Learning theme reported their outcomes as achieved. This observation concurs with the findings of Ke (2009), who performed a qualitative meta-analysis of studies that investigated games-based learning tools and reported that games-based pedagogies in m-Learning generally produce positive outcomes.
(4) Studies with the reported pedagogical approach of Inquiry-based Learning had a range of study foci and produced contrasting outcomes.

(1) In the domain of science, more achieved outcomes are reported under most of the pedagogies when the app is self-developed, whereas the opposite can be observed in mathematics;
(2) There are more studies in science using inquiry-based learning, collaborative learning and game-based learning, whereas in mathematics the dominant pedagogies in reported studies are collaborative and project/problem-based learning.

Summary of SLR results
In summary, the following findings hold for each research question: (RQ1) What are the reported research methodologies, foci, outcomes in existing literature? Quasi experimental and experimental designs are the most frequently used quantitative research methodologies in the studies reported. The qualitative research studies in the SLR employed a greater variety of research methods. The most frequently stated foci in our included studies are 'effectiveness of using app', 'design of the app' and 'technology implementation'. Sixty-five percent of the included studies reported achieved outcomes of their stated foci, whereas 28% were inconclusive and 7% were either mixed or not achieved.
(RQ2) What mobile technologies and/or apps are used in these studies? Fifty-nine percent of the studies investigated use of self-developed apps/technologies and 41% investigated use of third party apps/technologies. There was a stronger trend of investigating domain specific self-developed apps in science, whereas in mathematics more generic third party apps were used in the studies. Augmented Reality was the most frequently applied technology in included studies.
(RQ3) What are the pedagogical approaches being adopted? What are the contextual settings under which these technologies are investigated?
The collaborative learning, inquiry based learning, and project/problem based learning pedagogical approaches were the most frequently reported in the included SLR studies. The majority of the studies were conducted in formal and semi-formal contexts; in comparison, informal contexts have largely been neglected.

Discussion
The discussion elaborates on the findings by grouping identified pedagogies according to their underlying theoretical frameworks. Three overarching pedagogical approaches are discussed, with particular reference to the discipline areas of mathematics and science, given these were the focus of the SLR. We then examine the relationship between mobile learning and these pedagogies. The discussion goes on to articulate the strengths and weaknesses of the reported methodologies in the SLR. We conclude the discussion with a consideration of the implications for future studies of mobile learning in mathematics and science, noting the silences in the current literature.
This study examined literature concerning use of mobile apps in mathematics and science in secondary schools for reasons stated earlier (the need to understand student engagement in these disciplines in schools and the role of mobile learning in facilitating this engagement). It addressed a gap in the secondary education literature as most previous analyses of mobile learning research have focused on tertiary education contexts. It is likely that in the future, a systematic literature review of mobile learning studies in primary school education will be needed as practice expands with mobile devices in primary schools.

Pedagogical approaches
On examination of the pedagogies mentioned in the papers in the SLR, it could be seen that there were links and commonalities amongst various approaches identified in Table 16. It became clear that three overarching pedagogical themes were: Collaboration, Inquiry-based Learning (IBL) and Realistic Learning. These approaches were not mutually exclusive and many papers fitted in more than one category. In what follows we provide broad descriptions of these three themes that align with the way they were used in the articles in this SLR.
The Collaboration theme indicated students working together in groups, sharing goals, understandings and discussions to achieve agreed objectives (Martin del Pozo, Gómez-Pablos, & Muñoz-Repiso, 2017). On reading the five papers in the SLR that reported or suggested the use of social constructivist pedagogies, it was clear that these too aligned with the above definition of collaboration. Consequently, Collaboration as an overarching theme described the approach in 18 papers. However, given that three papers fitted in both Collaboration and social constructivism, the total number of papers was 15 in this overarching category.
The overarching Inquiry-based Learning theme focuses on questioning, investigating, critical thinking and problem solving where evidence is gathered, findings reported and explanations elicited and negotiated (Marshall, Horton, Igo, & Switzer, 2009). Consequently, Project-based Learning and problem solving fitted under the IBL category. There were nine Project and Problem-based Learning papers but one was also in the IBL category. Consequently, the total number of papers fitting in this overarching theme was 19.
When considering the second set of papers, one of the authors, a mathematics educator, was reminded of the similarity of the approaches to the Realistic Mathematics movement. This pedagogical approach was developed by Freudenthal in the 1970s and has enjoyed varying popularity in mathematics education (Freudenthal, 2006). Consequently, we constructed a Realistic Learning pedagogical theme, in which we grouped studies indicating mathematics or science education located in authentic contexts, with an aim of enabling students to make meaning of the subject matter. In the Realistic Mathematics approach, the context may not be real but must be able to be readily imagined by the student as a real scenario (Van den Heuvel-Panhuizen & Drijvers, 2014). We suggest that the ideas proposed by the Realistic Mathematics movement link well to developments in the mobile learning area, such as situated learning and context-aware ubiquitous learning. One reason for suggesting this fit is that mobile learning contexts are able to locate students in virtual contexts that are easily imagined and understood as real. We therefore developed a somewhat innovative grouping of pedagogies in a Realistic Learning theme. Fitting with this description of Realistic Learning pedagogies are the tabulated sub-categories of Experiential Learning (3), Knowledge Building (1), Situated learning (1) and Realistic/Context-Aware Ubiquitous Learning (7). There was no overlap within this category, so a total of 12 papers fitted in this overarching theme. See Figure 15 below. Three papers did not fit into these three overarching themes, two of which were instructionist in nature and one that was concerned with student self-evaluation. Therefore, it is clear that principles of social and authentic learning underpinned the pedagogical approaches of a large majority of papers in the SLR. It is interesting that all papers, except for these three, fitted into at least one of the three overarching themes.
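The theme totals above follow from counting each paper once even when it carries two labels. The Python sketch below reproduces that arithmetic from the figures stated in the text. The helper name `distinct_papers` is hypothetical, and the count of 11 papers labelled IBL is not stated in the text; it is inferred here from the reported total of 19.

```python
def distinct_papers(label_counts, double_counted):
    """Distinct papers in a theme, given per-label counts and the number
    of papers that appear under two labels (counted once, not twice)."""
    return sum(label_counts) - double_counted

# Collaboration: 18 label entries in total, three papers carrying both labels.
collaboration = distinct_papers([18], 3)

# IBL: nine Project and Problem-based Learning papers, one also labelled IBL.
# ASSUMPTION: 11 IBL-labelled papers, inferred from the reported total of 19.
ibl = distinct_papers([11, 9], 1)

# Realistic Learning: four sub-categories with no overlap.
realistic = distinct_papers([3, 1, 1, 7], 0)

print(collaboration, ibl, realistic)
```

Because the three overarching themes are not mutually exclusive, these totals should not be added together to obtain a paper count for the whole review.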
It is noteworthy that many papers were identified under the Games-based Learning category (7). However, within this group of papers (see Table 16), different pedagogical approaches informed or underpinned the game designs. There was one paper within this group that was instructionist in nature, two used realistic pedagogies, three used IBL, one other used IBL and realism; and the game in one study was categorised within all three overarching themes. For this reason, GBL was not identified as an overarching approach. The pedagogies used were dispersed amongst the other approaches.
A number of the papers described as adopting Realistic pedagogies stemmed from a mathematics tradition, which is to be expected as this is the discipline in which this pedagogy was initially proposed and developed (Freudenthal, 2006). Similarly, in science education there is a tradition of inquiry-based learning, and IBL is less frequently described in the mathematics education papers. However, although the IBL and Realistic approaches stem from different disciplines, they are similar in nature, and both emphasise more progressive, authentic and social characteristics.
While mathematics teaching in secondary schools is often characterised as being focused on drill and practice and other transmissive pedagogies (Brousseau, 1997), it is noteworthy that this focus did not appear in the pedagogical approaches reported or inferred in the studies in the SLR. This disparity in the literature indicates that further research on current pedagogical approaches utilised in mathematics education would be beneficial. It would be interesting to investigate whether the more Realistic pedagogies are restricted to mathematics tasks using mobile devices or whether they now occur more broadly in mathematics education. If indeed, these approaches are occurring more frequently in mobile learning, this is an argument for strongly encouraging the use of mobile devices in mathematics education.
Given that previous research has identified the proliferation of drill and practice and transmissive apps in the app stores (Goodwin & Highfield, 2013;Murray & Olcese, 2011), the findings in this SLR possibly point to a change in approaches of software designers. It is possible that the interest of teachers in more authentic and social approaches has encouraged developers to design apps that meet emerging market needs. Alternatively, educational researchers' interest in more progressive mobile pedagogical approaches may have influenced researchers to conduct studies in which such approaches were used. Further research could investigate whether today's teachers are still using apps with transmissive underpinnings, and whether this use tends to be largely unreported by researchers. This is possibly a limitation of the body of work reported in the SLR.

Researchers' bias
Studies incorporating more critical analysis are needed to address the perception of researchers' bias towards achieved objectives, especially in studies involving self-developed apps. The results showed a significant trend by researchers to investigate the use of self-developed mobile apps for science learning (Figure 12). This also contributes to the issues of platform mismatch and lack of availability of these apps to the public (Zydney & Warner, 2016). The field of mobile learning research could be improved if there were more independent educational studies into the use of apps for learning. Further, as noted in 5.1, researchers may have chosen to study practices that they perceived to be progressive, thus contributing to a possibly skewed emphasis in the SLR regarding common pedagogical approaches.

Reports on achieving stated objectives
The SLR results show a small percentage of studies that did not achieve their stated objectives. There are insufficient data to make any conclusive remarks on the reasons for this occurrence. It is not clear whether this is due to the long-observed pressure in quantitative paradigms on authors to publish favourable results in order to win research funds. It has long been acknowledged that published empirical work is skewed towards statistically significant (p < 0.05) research (Dickersin & Min, 1993; Dwan et al., 2013) as well as towards studies that report successful achievement of stated objectives. This publication bias makes it difficult for reviews and meta-analyses to present an accurate picture of the effectiveness of m-Learning (Dwan et al., 2013).

No replication
None of these empirical studies has been replicated to increase confidence in, and the reliability of, their results, which raises doubts about how useful the results are for school mathematics and science learning. Although specific studies have not been replicated, the overall trend in the SLR results, indicating achieved objectives across a range of apps in both mathematics and science in varied contexts, lends weight to the generalisability of the overarching positive impact of m-Learning. However, the SLR provides limited data on the generalisability of findings regarding the use of specific apps. In the case of qualitative studies, there are limited data on the transferability of the findings due to the idiosyncratic nature of the foci of the studies in the SLR.

Dearth of mixed-methods studies
The SLR revealed a gap in the literature concerning mixed methods studies. There were few such studies identified. We argue that a range of qualitative and quantitative methods are key to fully interrogating mobile learning phenomena. The challenge of effective adoption and utilisation of mobile technologies in schools can only be addressed if we understand the interactions between the complex social dynamics of the learning environment and the technology (Salomon & Perkins, 1998;Wertsch, 1993). Teacher and student decisions about adoption and use of technologies vary according to a wide range of interacting factors such as: pedagogical beliefs and confidence levels (Ertmer & Ottenbreit-Leftwich, 2010); socio-economic gaps between student cohorts that affect access to technology (Somekh, 2004); user choices that trade off various benefits and costs; ease of use; and school contextual factors that promote or inhibit innovation (Burke, Schuck, Aubusson, Kearney, & Frischknecht, 2017).
To understand this complex socio-technological-educational environment of school mathematics and science education, a bricolage of research methodologies is essential. Facer (2003, p. 226) recognised that there is "no single theoretical framework available sufficiently rich to allow us to prise open all of the complexities" inherent in educational technology innovation. Hence, in addition to the qualitative and quantitative methodologies listed in Figure 7, we recommend further complementary approaches such as choice methodologies (Aubusson, Burke, Schuck, Kearney, & Frischknecht, 2014), user-centred design, and sentiment analysis (Bano, Zowghi, & Kearney, 2017). This combination of research methodologies will enable more effective and rich investigations of school-based innovation in mobile technology-enhanced learning.

Focus on student learning outcomes
The majority of studies investigated either processes associated with teaching and learning (for example, collaboration or constructivist learning) or the design and features of apps used to enhance learning. There was a gap in the literature concerning discipline knowledge. Only a minority of studies reported on learning outcomes related to specific mathematics or science knowledge. We recognise that the measurement of learning in mathematics and science is not unproblematic (Duit & Treagust, 2003; Yi & Lee, 2017). Nevertheless, the impact of studies in this field would be increased if more studies reported on mathematics or science learning outcomes arising from specific pedagogical interventions combined with the use of apps.

Seamless learning and Third Space learning: A gap in the literature
As evidenced in Figure 14, most reported studies focused on m-Learning in formal, arguably contrived learning spaces such as school mathematics classrooms and science laboratories. Apart from the obvious focus of learning in more authentic informal settings, such as 'in-situ' investigations in everyday, out-of-school learner-generated contexts, future research needs to probe the rich possibilities of mathematics and science learning extending across a wider range of contexts, including recreational spaces and informal virtual spaces such as social media or multi-player game sites, or what has been described as 'Third Space learning'. Most teachers are faced with the situation of teaching with mobile devices in a scheduled, face-to-face classroom environment to meet formal curriculum requirements. Future studies of Third Space mathematics and science learning need to be mindful that the classroom is, of necessity, one of the spaces in which learning takes place. However, there is a need to exploit the mobility offered by devices, and it is therefore of interest to explore ways of bridging learning experiences across different spaces (Sharples, 2015; Wong, 2015). Such studies could consider the use of apps that might encourage more subtle, seamless contextual boundary crossing, for example between virtual and physical spaces (Wong & Looi, 2011), between traditional school classrooms and laboratories and more contemporary break-out or maker spaces, or across disciplines, such as in inter-disciplinary projects.

Conclusion
In this paper we have presented the results from the analysis of 49 empirical studies (60 papers) published from 2003 to 2016 that focused on investigating mathematics or science learning and teaching with mobile apps and technologies in secondary school education. The SLR was conducted with the aim of providing insights into the nature of recent research on mobile learning in secondary school science and mathematics. Such insights are valuable in suggesting new directions for research studies and in providing a coherent and larger-scale understanding of contemporary research trends. This broader investigation is in contrast to most educational research studies, which tend to be small-scale in nature (see Figure 9).
Insights gained from this SLR concerned the foci of studies, the pedagogical approaches and the contexts in which the studies took place. The most frequently stated foci in our included studies were: Effectiveness of Using Apps; Design of the Apps; and Technology Implementation. Most of the identified pedagogical approaches fitted into three overarching themes which could be characterised as emphasising authenticity and social learning: Inquiry-based Learning, Collaboration and Realistic Learning. Few approaches were characterised as instructionist. This could suggest that educators' and researchers' interest in progressive pedagogies is now driving app design, development and use, after decades of behaviourist traditions dominating these endeavours.
Future research needs to include longer-term studies and more varied mixed methods approaches, and focus on more diverse mathematics and science learning contexts across a variety of formal and informal spaces. An interesting observation that we draw from this SLR relates not to what is reported in the studies but to what is almost absent from the studies. Many of the studies in this SLR do not appear to consider one of the defining features of mobile learning: the ability to learn across contexts using the device. Most of the studies were located in formal learning contexts where the mobile learning did not exhibit the flexibility possible in the Third Space, highlighting the fact that seamless learning needs to be further scrutinized in studies of secondary school mathematics and science.