Danish researchers want to use AI to identify children struggling with writing

January 17, 3:31 PM
Janus Madsen, founder of Writereader and one of the initiators of the ATEL project. Illustration: Writereader.
The ATEL research project, which studies school children’s early writing development, has already yielded important knowledge—now the hope is that it will also lead to algorithms that may be able to help children with reading and writing difficulties.

When it comes to research into children’s language development, the focus has traditionally been on reading and spelling. So far, writing has been relatively underrepresented, but recent developments in digital teaching have provided new opportunities to study the development of written language in the youngest primary school children.

Automated Tracking of Early Stage Literacy Skills, ATEL

The project started in 2018 and is expected to be completed in 2023.

The project aims to develop a model for determining developmental levels in children’s early writing. The model is intended to form the basis for the development of automated assessment of students’ texts written in digital didactic writing tools, which in this case is the Skriv og Læs learning tool.

The project is worth DKK 14 million, 66 percent of which comes from the Innovation Fund Denmark, and the rest is self-financed.

The project is led by the company Writereader, the Danish School of Education (DPU), and DTU. The National Centre for Reading has also participated.


This is the starting point for the project Automated Tracking of Early Stage Literacy Skills, or simply ATEL. The project is partly focused on studying the way in which the children develop written language, and partly focused on building AI models which should help get that knowledge out and into practical teaching.

This is what Janus Madsen, founder of the edtech company Writereader and one of the people behind the project, tells us. The company is behind the digital teaching tool Skriv og Læs (“Write and Read”), which is used in approximately half of all Danish schools and forms the basis for the data collection for the AI project.

“The scoring model we are fine-tuning is an AI model that assesses children’s writing and that should help teachers figure out how to help their students. We hope that it can also be expanded to spot patterns related to writing and reading difficulties, for example dyslexia,” explains Janus Madsen, who worked as a primary school teacher for 17 years before founding Writereader.

Data has been collected through the use of the Skriv og Læs platform in Svendborg among pupils in grades 0–2 during the past 3.5 years. A total of 2,100 primary school pupils and their teachers have participated.

Still unsure if it will work in practice

The ATEL project actually consists of two different tracks. The first, which is the one Janus Madsen refers to, has already produced important educational research. Namely, the project participants from the Danish School of Education (DPU) have found four specific dimensions within children’s written language related to not only spelling, but also their understanding of content, text production, and sentence construction.


The idea is that one will be better able to bring that knowledge into play with an algorithm that can place students at four different stages of development within the four dimensions. In theory, it can open up much more differentiated teaching in writing for the youngest students.
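As a rough illustration of what such a placement could look like as data, the sketch below stores one stage per dimension for a single student. The dimension names follow the four areas mentioned above; the class name, the numbering of stages from 1 to 4, and the helper method are assumptions made for this example and are not taken from the ATEL project.

```python
from dataclasses import dataclass

# The four dimensions named in the article; the identifiers are illustrative.
DIMENSIONS = (
    "spelling",
    "content_understanding",
    "text_production",
    "sentence_construction",
)
# Assumption: the four stages of development are numbered 1 (earliest) to 4.
STAGES = (1, 2, 3, 4)


@dataclass
class WritingProfile:
    """Hypothetical record: one developmental stage per dimension."""

    stages: dict

    def __post_init__(self):
        # Every dimension must be present exactly once, with a valid stage.
        if set(self.stages) != set(DIMENSIONS):
            raise ValueError("profile must cover exactly the four dimensions")
        for dimension, stage in self.stages.items():
            if stage not in STAGES:
                raise ValueError(f"invalid stage {stage!r} for {dimension}")

    def weakest_dimensions(self):
        """Return the dimensions in which the student is at the lowest stage."""
        lowest = min(self.stages.values())
        return [d for d in DIMENSIONS if self.stages[d] == lowest]


# Made-up example student, not real project data.
profile = WritingProfile(
    {
        "spelling": 3,
        "content_understanding": 2,
        "text_production": 1,
        "sentence_construction": 2,
    }
)
print(profile.weakest_dimensions())
```

A teacher-facing tool could use a structure like this to suggest where a student might need differentiated support, though how the project presents its results is not described in the article.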

The work on machine learning in that part of the project is also the most experimental, according to Michael Riis Andersen, associate professor at DTU Compute, who is helping develop the AI models.

“We are relatively certain that we can go a long way in solving the problem from a machine learning perspective if we can get enough data. But whether the models can work effectively in practice across many different schools and even more students is not something we can say with certainty yet,” he explains.

Warning flags

Because this part of the project is more complicated, the DTU researchers cannot use the whole data set. They can only use the texts that have been processed and sorted in relation to the different stages of development by their partners from the Danish School of Education. So far, one of the most difficult points has proven to be the analysis of the students’ text production, because it requires a deeper understanding of the text than the other dimensions do.


“There are fairly well-functioning models in English for some aspects of text production compared to plain text. For example, if you have two sentences and write ‘it’ in the last of them with reference to a word in the previous sentence, the models can pick it up. We also have a bit of that in Danish, but our project is complicated by the fact that we are talking about texts written by children,” Michael Riis Andersen explains and emphasizes that the above example—what is also called coreference resolution—is only one minor part of a complete analysis of text production.

At the end of the ATEL project, dyslexia is also to be looked at. Children’s texts will be compared to the results in the national tests used to screen for writing and reading difficulties.

“The question is whether we can go back into the texts of children who have been found to be dyslexic or have other writing or reading difficulties and investigate whether there are any patterns in their early writing related to the four dimensions and stages of development which we have found in their written language. Something that we can use to automatically raise a flag that says this child deserves special attention,” Janus Madsen says.

He emphasizes that the idea is not to replace the national tests, but that it can contribute to a more complex picture of the individual student’s written language development.

Trained on children’s books

The part of the project that the ATEL project group has been working with for the longest time and that has thus also seen the biggest improvements is what is called adult writing—a rewritten and comprehensible version of the text that the young students have written and based on which they can model their own writing. This is typically something that takes a long time for the teacher, but also something that is important in relation to following the children’s development of written language.

“The idea is that the algorithm should come up with suggestions for adult-written versions of the children’s texts based on what a lot of other children have previously written and had ‘translated’. With this, we hope to be able to get things in motion and make the adult writing more efficient, because the more the teachers rewrite, the more data we get, which will also make our models smarter,” Janus Madsen explains and says that he expects that the models will be tested on teachers outside Svendborg in the course of the year.

The team started developing the models when around 100,000 texts from the children and their teachers were available; at present, there are around 400,000 usable texts.

“This type of data is so different from other types of textual data we’re familiar with because it’s children’s writing. So both form and content differ greatly from other types of problems one sees in connection with text-related projects,” Michael Riis Andersen explains.

This also means that it has not been possible to use, for example, Wikipedia articles for further training of the models, as is done in many other projects. The language is simply too complicated for it to make sense compared to the children’s very simple texts.


“We have been lucky in that we have also secured a large collection of children’s books from a major Danish publisher, and our hypothesis is that even if the children’s books are not written by children, their universe matches our project much better. It’s something we’re still looking at, but right now it points to an improvement in the model,” Michael Riis Andersen says.

Illustration: private photo.

A step up the ladder of abstraction

The biggest challenge for the project is that it is fairly unique, but that does not mean that there is no other research that can be consulted. Some of the existing models the team has relied on are used for machine translation.

“It’s about taking a step up the abstraction ladder and looking at what formats and types of problems we’re working with. In terms of methodology, it is also reminiscent of what one does when translating from French to Danish on a computer; we just work with children’s language and adult language,” Michael Riis Andersen says.

When it comes to measuring the success of the model, Michael Riis Andersen and the rest of the DTU team are looking at several metrics. One of the most important is the edit distance, i.e. how close the computer-generated adult-written version can come to the one written by the teacher.

“In the latest figures I have, about 40 percent of our computer-generated texts were identical to those the teachers had written,” Michael Riis Andersen says and adds:

“However, it’s not a perfect measure because it doesn’t take into account the severity of misspellings. In other words, if it’s only a few silent letters that the student is missing or if we’re talking about bigger mistakes. Proposing a text that is 100% identical to the teacher’s is often a very strict requirement, so we also measure how often we are closer to the adult text than the children’s text, and we are closer in 65-70 percent of the cases.”
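The two measures he describes can be sketched in a few lines. The example below assumes a character-level Levenshtein edit distance, which is one common definition; the article does not say which variant the DTU team uses. The function names and the toy English sentences are made up for illustration and are not project data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def evaluate(samples):
    """samples: (child_text, model_text, teacher_text) triples.
    Returns the share of model texts identical to the teacher's version
    and the share that are strictly closer to it than the child's text."""
    if not samples:
        raise ValueError("need at least one sample")
    exact = closer = 0
    for child, model, teacher in samples:
        if model == teacher:
            exact += 1
        if edit_distance(model, teacher) < edit_distance(child, teacher):
            closer += 1
    n = len(samples)
    return {"exact_match_rate": exact / n, "closer_rate": closer / n}


# Hypothetical toy data: (child text, model suggestion, teacher rewrite).
samples = [
    ("i lik my dog", "I like my dog.", "I like my dog."),
    ("the kat is big", "The kat is big.", "The cat is big."),
    ("we pla bol", "we pla bol", "We play ball."),
    ("mi mom is nis", "My mom is nice.", "My mom is nice."),
]
print(evaluate(samples))
```

On this toy data, two of the four suggestions match the teacher exactly and three are closer to the teacher's version than the child's original was. As the quote notes, a plain edit distance treats every character error alike, so it says nothing about how serious a given misspelling is.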

Therefore, he believes that, in the vast majority of cases, the computer-generated text can be beneficial to the teacher because it is easier to understand. In the long term, the hope is also that this way of measuring the results can be used to disqualify computer-generated texts that are too bad, so that in those cases the teachers do not have to spend time dealing with both a broken student text and a broken computer text.

An example of a text written by a child and the computer-generated “adult version” Illustration: Writereader.

Requires extra effort at the schools

While the project is still ongoing, the focus is on collecting usable texts from students and teachers.

“An AI model like this will never be finished, it can always be improved, but we are certain that the project will end with us getting the results we had intended. And it also seems that it will be better than what we had hoped for when we originally started,” Janus Madsen from Writereader says.

Janus Madsen has already spent much of his time during the project visiting the schools. In particular, kindergarten class teachers, who do not usually work so much with early writing, required some support—according to the project description, the children must use the Skriv og Læs platform at least 60 minutes per week.

“There are a lot of people who are open to projects in the primary schools. But there is also a constant stream of new initiatives that the teachers have to decide whether to participate in, so supporting them and keeping them focused has been a very large part of the project management. Especially when the project is as long-term as this one,” Janus Madsen says.

In order to save the teachers as much time as possible when using the Skriv og Læs platform, Writereader also added a number of additional tools to the app, such as writing templates and images that teachers could lean on, so that it became easier to do writing exercises.

“During my time as a primary school teacher, I experienced that there is a great deal of teaching that is based on opinions, so my goal in creating digital teaching aids is that they must be based on research. That was also my idea with this project; that we should link research and practice,” he says.

At present, there is about half a year of the project left, and the group has begun to look at the possibilities of continuing the work when the project framework expires.
