Towards more replicable content analysis for learning analytics
- Publisher: ACM
- Publication Type: Conference Proceeding
- Citation: LAK23: 13th International Learning Analytics and Knowledge Conference, 2023, pp. 303-314
- Issue Date: 2023-03-13
Filename | Description | Size
---|---|---
3576050.3576096.pdf | Published version | 529.92 kB
This item is closed access and not available.
Content analysis (CA) is a method frequently used in the learning sciences and so increasingly applied in learning analytics (LA). Despite this ubiquity, CA is a subtle method, with many complexities and decision points affecting the outcomes it generates. Although it appears to be a neutral quantitative approach, coding CA constructs requires attention to decision-making and context that aligns it with a more subjective, qualitative interpretation of data. Despite these challenges, we increasingly see the labels in CA-derived datasets used as training sets for machine learning (ML) methods in LA. However, the scarcity of widely shareable datasets means research groups usually work independently to generate labelled data, with few attempts made to compare practice and results across groups. A risk is emerging that different groups are coding constructs in different ways, leading to results that will not prove replicable. We report on two replication studies using a previously reported construct. A failure to achieve high inter-rater reliability suggests that coding of this scheme is not currently replicable across different research groups. We point to potential dangers in this result for those who would use ML to automate the detection of various educationally relevant constructs in LA.
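The abstract's findings turn on inter-rater reliability between coders applying the same scheme. As an illustration only (the abstract does not name the reliability statistic used), below is a minimal sketch of Cohen's kappa, a common chance-corrected agreement measure; the coding labels and the `cohen_kappa` helper are hypothetical, not taken from the paper.

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # Observed proportion of items where the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if each coder assigned labels at their own base rates.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two coders applying a shared CA scheme.
a = ["claim", "evidence", "claim", "other", "evidence", "claim"]
b = ["claim", "claim",    "claim", "other", "evidence", "other"]
print(f"kappa = {cohen_kappa(a, b):.2f}")  # ~0.48: only moderate agreement
```

A kappa well below the conventional 0.8 threshold, as in this made-up example, is the kind of result that would cast doubt on whether a coding scheme transfers across research groups, and hence on labels derived from it as ML training data.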
Please use this identifier to cite or link to this item: