
Welcome to My Personal Website
Hello and welcome to my corner of the internet! I am Auday Berro, a computer scientist specializing in natural language processing (NLP) and dialogue services. My work centers on paraphrase generation and on harnessing language models, including large language models (LLMs), to push the boundaries of what is possible in human-computer interaction.
About Me
Dr. Auday Berro received his PhD in Computer Science from Université Claude Bernard Lyon 1 (UCBL, France) on June 25, 2024, under the supervision of Prof. Boualem Benatallah. Since December 2024, he has been a postdoctoral researcher at the LIRIS laboratory in Lyon, France, under the supervision of Prof. Mohand-Saïd Hacid. His research focuses on AI-driven digital health, leveraging real-world patient data, artificial intelligence, and reinforcement learning to optimize sleep patterns and enhance mental well-being through adaptive, personalized interventions.
Before joining UCBL, he obtained a Bachelor's degree in Computer Science from the Lebanese University, Faculty of Sciences 4 (Zahlé, Lebanon) in 2017, followed by a Master's degree in Computer Science from the Lebanese University, Faculty of Sciences 1 (Hadath, Lebanon) in 2018. In 2019, he completed a Master 2 in Data Science for Risk Analysis, also at Faculty of Sciences 1, under the supervision of Prof. Ali Jaber. During this period, he undertook a research internship entitled "Polluscope: microservices based data-intensive workflow", which focused on the automatic composition of Big Data applications using cloud infrastructure and Docker containerization. The internship was supervised by Dr. Yehia Taher within the ADAM team of the DAVID laboratory at the University of Versailles - Saint-Quentin-en-Yvelines (Versailles, France).
My Research
In the dynamic field of NLP, my research has focused on improving human-computer interaction through a better understanding of user requests in textual natural language. My PhD thesis, entitled "Paraphrase Generation for Conversational Services Learning", explores automated paraphrase generation techniques to improve training data for conversational AI, such as task-oriented chatbots. It presents ParapLine, a two-stage pipeline for generating diverse and relevant paraphrases by combining several generation methods and employing filtering mechanisms. This work also presents a taxonomy of errors identified in transformer-based paraphrase models and a manually annotated dataset for error analysis. In addition, it compares the effectiveness of large language models such as GPT-3.5 with crowdsourcing for the generation of syntactically diverse paraphrases, finding that LLMs offer a cost-effective alternative with superior syntactic novelty.
What I Do:
- Natural Language Understanding: Developing scalable and cost-effective methods for acquiring high-quality training data for task-oriented dialogue services
- Paraphrase Generation: Leveraging and developing techniques to reformulate user utterances into multiple variations while preserving their meaning, which is crucial for training robust dialogue services
- Data Quality and Diversity: Implementing and evaluating paraphrase generation pipelines to ensure that generated datasets are both semantically relevant and diverse, lexically as well as syntactically
Key Contributions throughout my research:
- Designed and evaluated a paraphrase generation pipeline.
- Developed a taxonomy of error types for transformer-based paraphrase generation models.
- Dataset and model development: Created a new manually annotated dataset for paraphrasing tasks and used it to develop a multi-label annotation model by fine-tuning a BERT language model.
- Explored the potential of LLMs like GPT-3.5 Turbo for generating syntactically diverse paraphrases.
My research interests include:
- Dialogue Services: training data acquisition, intent and entity recognition
- Quality Control in Crowdsourcing and AI/LLM paraphrase generation
- AI/crowd-based Training Data Curation and Quality
- Prompt engineering and retrieval-augmented generation (RAG)
My Journey
My journey into computer science began in early childhood, with a fascination for the development of video games. I was intrigued by the way games like Crash Bandicoot on the PlayStation 1 reacted to my joystick movements, and how the CD allowed the console to display the game on screen. This initial curiosity turned into a deep passion for the field. Lively discussions about science and mathematics with my father and his friends, in particular Prof. Ir. Isin Argun, fueled my interest. Over the years, I have worked on a variety of projects, from my undergraduate work at the Lebanese University to my doctoral thesis, each of which has strengthened my expertise and understanding of this ever-evolving field.
Conference proceedings


Berro, A., Benatallah, B., Gaci, Y., & Benabdeslem, K. (2024). Error Types in Transformer-Based Paraphrasing Models: A Taxonomy, Paraphrase Annotation Model and Dataset. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2024), Vilnius, Lithuania. Cham: Springer Nature Switzerland. pp. 332-349.

Dos Santos, V., Benatallah, B., Berro, A., & MacMahon, S. (2024). Diverse Utterances Generation with GPT to Improve Task-Oriented Chatbots and APIs Integration. The International Conference on Foundation and Large Language Models (FLLM 2024), Dubai, UAE.

Berro, A., Yaghoub-Zadeh-Fard, M. A., Baez, M., Benatallah, B., & Benabdeslem, K. (2021). An Extensible and Reusable Pipeline for Automated Utterance Paraphrases. Proceedings of the VLDB Endowment, 14(12), Aug 2021. pp. 2839-2842.

Berro, A., Baez, M., Benatallah, B., Benabdeslem, K., & Yaghoub-Zadeh-Fard, M. A. (2021). Automated Paraphrase Generation with Over-generation and Pruning Services. Lecture Notes in Computer Science, Service-Oriented Computing 19th International Conference, ICSOC 2021, Virtual Event, November 22–25, 2021, Proceedings, 13121, pp.400-414.

Ramírez, J., Berro, A., Baez, M., Benatallah, B., & Casati, F. (2021). Crowdsourcing Diverse Paraphrases for Training Task-oriented Bots. HCOMP 2021.

Ramírez, J., Baez, M., Berro, A., Benatallah, B., & Casati, F. (2022). Crowdsourcing Syntactically Diverse Paraphrases with Diversity-Aware Prompts and Workflows. International Conference on Advanced Information Systems Engineering, Jun 2022, Leuven, Belgium. pp.253-269.

Bouguelia, S., Berro, A., Benatallah, B., Baez, M., Brabra, H., Zamanirad, S., & Kheddouci, H. (2022). Process-oriented intents: a cornerstone for superimposition of natural language conversations over composite services. The 20th International Conference on Service-Oriented Computing (ICSOC), Nov 2022, Seville, Spain. pp.575-583.
Unpublished / In process
- Conference paper - Berro, A., & Benatallah, B., CLONE: A Duplication evaluation metric for Paraphrases Generation Models
- Journal - Yaghoub-Zadeh-Fard, M. A., Berro, A., Benatallah, B., Ramírez, J., & Baez, M. Models and Crowdsourcing Prompts for the Automatic Detection of Malicious Worker Behavior in Crowdsourced Paraphrases.
Teaching
- Algorithms and data structures (INF-TC1): École Centrale de Lyon - Écully, France
- Object-oriented design and programming (INF-TC2): École Centrale de Lyon - Écully, France
- Practical IT project (INF-TC3): École Centrale de Lyon - Écully, France
- PHP ASPE BUT info: IUT de Lyon 1 - Villeurbanne, France
- C-language: IUT de Lyon 1 - Villeurbanne, France
Hobbies and other interests
- Enjoying nature walks, music, and drawing.
- Watching anime and cartoons: Berserk, Naruto, One Piece, Dragon Ball Z, Bobobo-bo Bo-bobo, Great Teacher Onizuka, The Simpsons, Regular Show, and Bleach.
- Cooking, Maté & Kishk aficionado.
- Reading: history, politics, mythologies, and novels.
Get in Touch
I believe in the power of knowledge sharing and collaboration. Whether you are interested in discussing potential projects, have questions about my research, or simply want to connect, feel free to reach out at audayberro@gmail.com.
Thank you for visiting my website. I look forward to connecting with you!