Student data collection: helpful history, harmful possibilities?
08 Feb 2021
Editor’s note: The CDL is proud to launch a series of occasional perspectives offered by our student affiliates. As part of their work in the center, CDL student affiliates are encouraged to explore matters related to digital learning that interest them, and to develop and express their own research-informed opinions on these matters.
The spring and fall semesters of 2020 presented many challenges to students and faculty, and we were resilient: we changed in response to the mounting challenges, and kept changing. We adapted to modes of digital learning that have become familiar to us since the outbreak of the novel coronavirus last spring, like video-conferencing on Zoom or Google Meet and interacting with courses primarily through Canvas. Now that we are more or less acclimated to digital learning during the pandemic, we have a moment to pause and ask questions. Perhaps it would be more accurate to say we need this moment to pause and ask questions before we proceed further along the path of digital pedagogy, lest we be caught unprepared by an unexpected turn: the inevitable fork in the path.
As a student and intern for the CDL, I am curious about how students’ data is being collected and used during this time, particularly through the college’s learning management system, Canvas. Most of us are aware, to varying degrees, that websites and other platforms we interact with track our data. We may not be fully aware of the scope and means of data tracking and collection, but we make up for this with a general wariness about our actions on the internet. We are advised to proceed with caution in arenas we do not clearly understand, like the overlap between “big data” and the internet. Being aware of where your data is going (and why) is key to maintaining a sense of autonomy over your digital self. So, what about data collection on an institutional level? This fall semester, almost every SUNY student submitted assignments, wrote discussion posts, took quizzes, or otherwise interacted with material on Canvas or a similar learning management platform to fulfill their degree requirements. Given our increasing reliance on and trust in Canvas, we should take a moment to explore some important questions: How is student data collected, both on an institutional level and through third-party applications like Canvas? What data is being collected? And, perhaps most important, to what end is students’ data being collected? As students navigating digital learning, it’s important that we address these questions and their implications now, both to secure our own digital autonomy and to protect that of future college students.
First, how is student data collected? There is some data we give to Geneseo with our implicit consent, like the contents of our college application, SAT and ACT scores, and midterm/final grades. The data collected here is straightforward and anticipated: colleges require this data to admit and evaluate students. But other data is collected without our explicit or implicit consent, such as data available through student interaction-monitoring on Canvas. When we apply to colleges and take exams, we expect certain data to be collected in order to reflect our academic achievements and who we are as students. However, many students are not aware that their data is being collected and monitored by Canvas with a similar aim. For example, a professor may review the number of times a student clicked on a module or page in their course as well as the amount of time the student spent on that module or page. Canvas automatically collects and analyzes this student data, which is then made accessible to faculty and administration. Most of the time, this information is helpful: if a student is not engaging with a course, this is reflected in the data collected by Canvas and communicated to the professor, who can make the necessary changes to help the student succeed. More specific algorithms applied to this data might indicate whether the student needs more individual assistance or whether there is a larger issue with the professor’s teaching style. On the other hand, this information could be misleading. Does a certain number of clicks and a certain amount of time spent on material accurately convey a student’s engagement with a course? Students might engage with course material in ways that Canvas cannot accurately depict, like doing independent research online or participating in lively discourse on Zoom. If your learning style deviates from the norm established by Canvas, you might be unfairly viewed as an unengaged student.
Student data monitoring systems come with an embedded assumption that unengaged students (judged as such using Canvas data) are irresponsible and less likely to succeed. The monitoring system does not take into account different approaches to learning course material, extenuating circumstances in an individual’s life, and other forms of human complexity.
Despite its shortfalls, automated data collection on third-party applications like Canvas is often used to assist students and improve academic institutions through the incorporation of an early alert system. Early alert systems collect and generate data that serves to fulfill the college’s ethical obligation to help students succeed. SUNY Geneseo employs an early alert system through Navigate, a product of the higher education research company EAB. Navigate collects data on students’ educational progress and performance. It applies algorithms to the collected data and presents it in a more complete and helpful form, such as using statistical probability to predict a student’s likelihood of success or of changing majors. As always, there are good and bad applications of this data. If a college doesn’t care whether its students succeed, it might ignore data that indicates a struggling student and do nothing to intervene. SUNY Geneseo collects and frequently reviews students’ data in order to ensure student success by staging student interventions when necessary. For example, Geneseo administration can view which students have registered for the next semester and which students have not. As an institution committed to student success, Geneseo reaches out to the unregistered students to find out what support or guidance they may need. The algorithms used in early alert systems give faculty and administration insight that might be beneficial to share with struggling students; for example, a biology major who receives mostly C’s and is uncertain about their future in the major might be better suited as an anthropology major, based on data collected for that particular student as well as data collected for all biology majors at the institution. In this sense, the collection of student data in all forms helps an institution make important decisions and see which areas it needs to invest in to better support students.
Since this data collection occurs at an institutional level, it is also beneficial for reviewing and improving the processes of the institution itself. EAB collects student data from colleges nationwide in order to judge institutional success rates. This information is essential for informing needed institutional shifts and community-based change. Recently, EAB created the Moonshot for Equity initiative in response to data collected from higher education institutions that indicated overwhelming equity gaps faced by students of color. This is an example of institutional data collection put to good purpose. But we should remain somewhat skeptical of data collection on a more individualized scale.
Since our data is being collected without our explicit or implicit consent, why is it permissible for an outside party to make personal inferences about students based on this data? Can the data collected by learning management platforms truly reflect who we are as individuals? Learning analytics, an advanced form of data mining not used by Canvas at this time, strives to compose a “data double” of the individual, a move that flattens our human complexity: “in doing so, individuals are taken from a corporeal whole and transformed into binary code as ‘data doubles’ with the purpose of changing ‘the body into pure information, such that it can be rendered more mobile and comparable.’ The problem is that the data double fails to be a ‘comprehensive or representative’ reflection of human life, yet powerful actors use it to influence a person’s behavior” (Jones 2019). Ascribing identity based on data collected without our awareness or consent seems unethical, regardless of its ends. For one, the data collected by learning management systems (LMSs) may yield an inaccurate profile that “follows” us from one institution to another, impacting our future education and career opportunities, especially if we use (and are used by) LMSs from an early age. This is particularly concerning right now, when LMSs are being used in K-12 settings more than ever before. More generally, “students may rightfully be worried that the data and insights mined from [Canvas] will become a part of their permanent educational record and lead to decontextualized decision making” (Jones 2019). We have limited control over the algorithmic calculations that prescribe us identities on these platforms, especially when it is unclear precisely which data carries the most weight. We have even less control over where this information ends up.
Recently, Instructure, the company behind Canvas, was purchased by a private equity firm for two billion dollars, rendering our data more vulnerable than ever: “With no federal privacy laws governing student data brokers, student data can be collected, sold, and bought without any apparent legal protections from widespread exploitation” (Marachi and Quill 2020).
Since the outbreak of the novel coronavirus last spring, we have depended on Canvas as a learning tool more than ever, and rejecting its widespread use is not a viable option. Despite a seeming lack of control over our own data within this system, we shouldn’t feel powerless. In fact, now that this information is gradually becoming more transparent, we should feel empowered to make informed decisions based on our new awareness. Part of this awareness is recognizing and questioning the current opacity concerning information practices in higher education. To start, many students are not aware of the extent to which their data is gathered, not to mention to what end the institution is gathering this information. Since it’s their data being collected, students retain the right to question institutional data collection policies and to demand clearer communication of these policies. Further, students should exercise their individual agency by holding a hard line on questions concerning consent and control. If the system was created with improvement in mind, why is consent absent from the model? If we are not extended the ability to consent to data collection and analysis within a learning management system like Canvas, it’s up to us to demand answers to questions regarding our personal privacy. Students must remember that their voice, both as individuals and as a collective, matters, and is a powerful force for addressing institutional oversights.
Moving forward, we should remain cognizant of the tension inherent in data collection: its ability to both help and harm students. Improvements to the learning management model are needed to reduce its potential for causing harm. Chief among these improvements would be informing students and faculty how and why their data is being collected, and then offering them the ability to consent to this collection, or otherwise decide how their data will be used. We might also envision alternative, open-source platforms that offer a space for digital learning without the threat of student data being collected and sold by third-party applications. It is our right as students to address the policies that unfairly affect us, but it is also our ethical duty to exercise agency for ourselves and our information when our autonomy is threatened. We can recognize the good that comes from institutional data collection while simultaneously questioning the ethics of nonconsensual, automated data collection on an individual level. Asking questions is the humble beginning of enacting lasting change. If we ignore our opportunity to ask these questions, if we are silent in this moment, we risk power going unchecked, incurring changes that may prove more harmful than helpful to college students nationwide.
The author thanks SUNY Geneseo Dean of Academic Planning and Advising Celia Easton and Associate Provost for Academic Success Joe Cope for information on the early alert system and other forms of data collection on an institutional level.
Jones, Kyle M. L. “Learning analytics and higher education: a proposed model for establishing informed consent mechanisms to promote student privacy and autonomy.” International Journal of Educational Technology in Higher Education, vol. 16, no. 24, 2019. https://doi.org/10.1186/s41239-019-0155-0
Marachi, Roxana, and Lawrence Quill. “The case of Canvas: Longitudinal datafication through learning management systems.” Teaching in Higher Education, vol. 25, no. 4, 2020, pp. 418–434.