Lab Members

Director

Ali Emami, Assistant Professor of Computer Science

MSc Students

  • Robert Morabito (2022-2023, Undergraduate; 2024 – Present, MSc)
  • Kaige Chen (Fall 2024 – Present)
  • Kazi Nishat Anwar (Fall 2024 – Present, Co-supervised with Dr. Nishat)
  • Nikta Gohari Sadr (Fall 2023 – Present)
  • Sarfaroz Yunusov (Fall 2023 – Present)

Undergraduate Researchers

  • Tyler McDonald (Summer 2023 – Present, NSERC Undergraduate Student Research Awardee)
  • Sangmitra Madhusudan (Summer 2024 - Present)
  • Anthony Colisimo (Summer 2024 - Present, Co-supervised with Dr. Li)
  • Skye Reid (Summer 2024)
  • QiQi Gao (Summer 2022 – Summer 2023)
  • Ghofrane Faidi (Summer 2024)
  • Angel Loredo (Summer 2024)
  • Harsh Lalai (Summer 2024)

Alumni

  • Abhishek Kumar, M.Sc. (Full Stack Data Systems Specialist at 360 Energy Inc., 2024 – Present)
Lab photos: June 2024, Niagara Falls, Canada; August 2024, ACL 2024, Bangkok.

Our Mission

The Brock NLP lab works on developing fair, robust, and reliable AI systems. Our research focuses on three key areas:

  1. Bias Detection and Mitigation in AI Models
  2. Reasoning and Benchmarking of AI Systems
  3. AI Interpretability and Reliability

Research Areas

1. Bias Detection and Mitigation in AI Models

We’re working on identifying and mitigating several forms of bias in AI models. We are finding that the very recognition and classification of what is “toxic” or “biased” is quite tricky and culturally and temporally bound. Now more than ever, it is time to collaborate with experts from beyond the field (e.g., Psychology, Anthropology, Philosophy) to tackle these problems!

A glimpse at our latest dataset, which pits language models against increasingly explicit problematic content. We find that LLMs and humans tend to disagree about what is acceptable and what is not! [Read more about this study](/publication/morabito2024stopbenchmarkinglargelanguage/).
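
For a rough sense of how this kind of sensitivity testing can be scored, here is a minimal sketch (not the STOP benchmark itself): walk a model through a progression of increasingly explicit statements, record the first point it flags, and compare that point to where human annotators draw the line. The `model_flags` callable and the gap metric are illustrative assumptions, not artifacts from the paper.

```python
# Minimal sketch of sensitivity testing on an offensive progression (illustrative only).
# Assumption: `model_flags` is a hypothetical wrapper that returns True when the
# evaluated LLM judges a statement unacceptable.
from typing import Callable, List, Optional

def first_flag(progression: List[str], model_flags: Callable[[str], bool]) -> Optional[int]:
    """Index of the first statement in the progression the model flags, or None if it never does."""
    for i, statement in enumerate(progression):
        if model_flags(statement):
            return i
    return None

def sensitivity_gap(model_idx: Optional[int], human_idx: Optional[int], length: int) -> int:
    """Positive values mean the model flags later (is less sensitive) than human annotators."""
    model_point = model_idx if model_idx is not None else length
    human_point = human_idx if human_idx is not None else length
    return model_point - human_point
```
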
Debiasing shouldn't always lead to *good* outcomes if the bias specifications are negative. For example, given a debiasing method that is developed to reduce toxicity in LMs, if the definition of toxicity used by the debiasing method is *reversed*, would the debiasing results also be reversed? Apparently not always!! [Read more about this study](/publication/morabito-2023-debiasing/).
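
A minimal sketch of the consistency check described above, under stated assumptions: `generate_with_spec(prompt, spec)` is a hypothetical helper that produces a continuation conditioned on a bias specification, and `toxicity_score(text)` scores it with whatever toxicity classifier is at hand. The specification strings are placeholders, not the ones used in the paper.

```python
# Hedged sketch of a debiasing-consistency check (illustrative, not the paper's exact setup).
from statistics import mean

def consistency_check(prompts, generate_with_spec, toxicity_score):
    """Average toxicity of continuations under the original vs. the reversed bias specification."""
    spec = "The following text is respectful and non-toxic:"   # placeholder specification
    reversed_spec = "The following text is rude and toxic:"    # its reversal
    with_spec = mean(toxicity_score(generate_with_spec(p, spec)) for p in prompts)
    with_reversed = mean(toxicity_score(generate_with_spec(p, reversed_spec)) for p in prompts)
    # A *consistent* debiasing method should show clearly higher toxicity under the
    # reversed specification; the finding above is that this often fails to happen.
    return with_spec, with_reversed
```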

Recent Publications:

  • Morabito, R., Madhusudan, S., McDonald, T., Emami, A. (2024). STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions. In Proceedings of EMNLP 2024.
  • Kumar, A., Yunusov, S., Emami, A. (2024). Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models. In Proceedings of ACL 2024.
  • Morabito, R., Kabbara, J., Emami, A. (2023). Debiasing should be Good and Bad: Measuring the Consistency of Debiasing Techniques in Language Models. In Findings of ACL 2023.

2. Reasoning and Benchmarking of AI Systems

We’re looking to test, harness, and push the boundaries of the reasoning capabilities of AI systems. At the same time, we believe in this artificial “intelligence” as more of a means than an end. AI applied towards diversifying storytelling education, multilingual/multicultural representation, and precise language understanding are examples of some of these means!

We find that LLMs are able to create personalized stories that reflect diverse identities, and we show that readers prefer them over generalized stories (whether human-written or LLM generated). Impressively, they were even better at conveying intended morals and were more engaging to readers -- this might just revolutionize how we create inclusive literature! [Read more about this study](/publication/yunusov2024mirrorstoriesreflectingdiversitypersonalized).
We developed a platform that allows users to challenge language models of their choice, *on-the-fly*, with difficult alterations of Winograd Schema Challenge problems -- they get to see immediately if their created instance stumped the model, and the new instances are sent to a batch that the models iteratively fine-tune on in order to become more robust to adversarial perturbations. It's human-in-the-loop, but also presented to the general public! Interface of EvoGrad at [https://www.evograd.com/](https://evograd.com). [Read more about this study](/publication/sun-2024-evo/).
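
A minimal sketch of the human-in-the-loop idea behind such a platform, under stated assumptions: `solve` and `fine_tune` are hypothetical stand-ins for the model's prediction and training steps, and the batching threshold is arbitrary. This is not EvoGrad's actual backend.

```python
# Hedged sketch of a human-in-the-loop adversarial round (illustrative only).
def human_in_the_loop_round(model, user_submissions, solve, fine_tune, batch_size=32):
    """Collect user-perturbed Winograd instances that stump the model, then fine-tune on them."""
    stumpers = []
    for instance in user_submissions:                 # each instance: (sentence, options, answer)
        sentence, options, answer = instance
        prediction = solve(model, sentence, options)  # immediate feedback for the contributor
        if prediction != answer:                      # the instance stumped the model
            stumpers.append(instance)
        if len(stumpers) >= batch_size:
            model = fine_tune(model, stumpers)        # next round's model sees the adversarial batch
            stumpers = []
    return model
```
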
We developed a powerful prompting technique we coin *Tree-of-Experts* that outperforms recent techniques in having LLMs create challenging, multi-constrained Winograd Schema instances. *Paradoxically*, we also found that LLMs struggle to solve their own problems, despite having provided the exact key to the solution while constructing the sentences! We coin this the *Generation-Evaluation Inconsistency*. [Read more about this study](/publication/zahraei-2024-wsc/).
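To make the Generation-Evaluation Inconsistency concrete, here is a hedged sketch of how one might measure it: ask the same model to first author a Winograd-style item with its answer key, and then to solve it. The `llm` callable and the prompt/format strings are simplified assumptions, not the Tree-of-Experts prompts from the paper.

```python
# Hedged sketch of measuring the Generation-Evaluation Inconsistency (illustrative only).
# Assumption: `llm` is a hypothetical text-in/text-out callable that follows the requested format.
def generation_evaluation_inconsistency(llm, n_items=50):
    """Fraction of self-authored Winograd-style items the same model then fails to solve."""
    mismatches = 0
    for _ in range(n_items):
        # 1) Ask the model to write an item together with its answer key.
        item = llm(
            "Write a Winograd-schema-style sentence with an ambiguous pronoun and "
            "two candidate antecedents, and state which antecedent is correct. "
            "Format: SENTENCE ||| OPTION_A ||| OPTION_B ||| ANSWER"
        )
        sentence, option_a, option_b, answer = [part.strip() for part in item.split("|||")]
        # 2) Ask the *same* model to solve the item it just wrote.
        solved = llm(
            f"{sentence}\nWhich antecedent does the pronoun refer to: "
            f"(A) {option_a} or (B) {option_b}? Answer with the option text only."
        )
        if answer.lower() not in solved.lower():
            mismatches += 1
    return mismatches / n_items
```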

Recent Publications:

  • Yunusov, S., Sidat, H., Emami, A. (2024). MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models. In Proceedings of EMNLP 2024.
  • Sun, J.H., & Emami, A. (2024). EvoGrad: A Dynamic Take on the Winograd Schema Challenge with Human Adversaries. In Proceedings of COLING-LREC 2024.
  • Zahraei, P.S., & Emami, A. (2024). WSC+: Enhancing The Winograd Schema Challenge Using Tree-of-Experts. In Proceedings of EACL 2024.

3. AI Interpretability and Reliability

We’re working on probing the inner workings of AI models (frustratingly blackbox as they are!), focusing on understanding their decision-making processes, biases, and limitations to enhance their reliability, interpretability, and overall performance.

In the spirit of the Winograd Schema Challenge, ever wonder if it applies multi-modally? That is, when an image generation model, like Stable Diffusion, is generating an image with antecedents and a pronoun, which antecedent does it attend to at the pronoun word? Diffusion Attentive Attribution Maps ([DAAM](https://github.com/castorini/daam)), a recent work, makes that analysis possible, and so we applied it to a new task we created, called *WinoVis*. The results were super interesting! [Read more about this study](/publication/park-2024-winovis/).
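
A hedged sketch of how such a pronoun-attribution probe can be run with DAAM, assuming the `trace` / `compute_global_heat_map` / `compute_word_heat_map` / `plot_overlay` interface shown in the DAAM README (exact names may vary by version); the Stable Diffusion checkpoint and the trophy/suitcase prompt are illustrative choices, not WinoVis items.

```python
# Hedged sketch of probing pronoun attribution in a text-to-image model with DAAM.
# Assumption: DAAM's trace API as shown in its README; attribute names may vary by version.
import torch
from daam import trace
from diffusers import StableDiffusionPipeline
from matplotlib import pyplot as plt

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base", torch_dtype=torch.float16
).to("cuda")

prompt = "The trophy does not fit in the suitcase because it is too big"  # illustrative, not a WinoVis item

with torch.no_grad():
    with trace(pipe) as tc:
        out = pipe(prompt, num_inference_steps=30)
        heat = tc.compute_global_heat_map()
        # Overlay the attribution map for the pronoun and each candidate antecedent;
        # comparing where "it" lands relative to "trophy" and "suitcase" is the probe.
        for word in ("it", "trophy", "suitcase"):
            heat.compute_word_heat_map(word).plot_overlay(out.images[0])
            plt.title(word)
            plt.show()
```
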
If a model says it's certain, does it mean it actually is? We have access to *two* sources of certainty in a model: its *expressed* certainty, which it can provide when asked, say, on a scale of 1-10 how certain it is, and its *internal* certainty, which can be accessed in many recent models via the log probabilities of its tokens. When normalized and systematically tested, we can learn a ton about which models are most reliable/'honest' and when! [Read more about this study](/publication/kumar-2024-confidence/).
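Here is a minimal sketch of how expressed and internal certainty can be compared, assuming two hypothetical wrappers: `answer_with_logprobs(question)` returns an answer plus its token log-probabilities, and `ask_confidence(question, answer)` returns the model's own 1-10 rating. The alignment metric (Spearman correlation) is one reasonable choice, not necessarily the paper's exact setup.

```python
# Hedged sketch of comparing expressed vs. internal certainty (illustrative only).
import math
from scipy.stats import spearmanr

def confidence_probability_alignment(questions, answer_with_logprobs, ask_confidence):
    """Spearman correlation between internal (log-prob based) and expressed (verbalized) certainty."""
    internal, expressed = [], []
    for question in questions:
        answer, token_logprobs = answer_with_logprobs(question)
        # Internal certainty: geometric-mean per-token probability of the generated answer.
        internal.append(math.exp(sum(token_logprobs) / max(len(token_logprobs), 1)))
        # Expressed certainty: the model's own 1-10 self-rating, normalized to [0, 1].
        expressed.append(ask_confidence(question, answer) / 10.0)
    correlation, _ = spearmanr(internal, expressed)
    return correlation
```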

Recent Publications:

  • Park, B., Janecek, M., Li, Y., Ezzati-Jivan, N., Emami, A. (2024). Picturing Ambiguity: A Visual Twist on the Winograd Schema Challenge. In Proceedings of ACL 2024.
  • Kumar, A., Morabito, R., Umbet, S., Kabbara, J., Emami, A. (2024). Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models. In Proceedings of ACL 2024.

Our work is regularly presented at conferences such as ACL, EMNLP, NAACL, EACL, COLING-LREC, ICML, and NeurIPS.

Research Focus Areas

A fun word cloud generated from all of our research works!

Map of Student Origins

Join Us

We are recruiting new graduate students for Fall 2024.

Undergraduates: Please don’t hesitate to email me to inquire about research projects that I (or better yet, you) may have in mind. Please also attach your transcript as well as a brief description of which areas of my research interests (e.g., natural language processing) you would like to work on and why. I highly encourage, and prefer, students who are planning on a summer internship (under the NSERC USRA or SURA program), or are planning to do an Honours thesis.

Graduates: M.Sc. (Computer Science) and PhD (Intelligent Systems and Data Science) admissions are handled centrally in our department. Please see this page for application instructions.