This was a project I completed as part of a voluntary research assistant role I assumed during my time as an Ops Clerk in the Singapore Army. I worked with Walled AI Labs, a non-profit initiative within the larger Walled AI started by my close friend, Dr Rishabh Bhardwaj.

I spent a short stretch from June to August 2024 developing models, datasets, and a large standalone library named walledeval. These systems were built to evaluate the safety of large language models (LLMs) like GPT-4 and Claude. The library was designed to be user-friendly and accessible to researchers and practitioners in the field of AI safety.

Under Rishabh’s tutelage, I gained invaluable experience in the field of AI safety and research methodologies. This project not only honed my technical skills but also deepened my understanding of the ethical implications and safety concerns associated with large language models.

We recently published a paper at EMNLP 2024, in the System Demonstrations track! Check it out here. The paper details the library's capabilities and how it can be used to evaluate the safety of LLMs.