OVERVIEW
This research evaluates the effectiveness of Large Language Models (LLMs) as autonomous agents within LLM-Powered Applications (LPAs), focusing on their task completion abilities and security. The study identifies key strengths, limitations, and vulnerabilities, and its initial findings are aimed at improving LLM-based systems for high-stakes applications. A paper is in progress and will soon be submitted to the Association for Computing Machinery (ACM).
MY CONTRIBUTIONS
I am responsible for designing the tasks to be tested, selecting the appropriate frameworks and tools for the research, and deploying and testing the system. I am currently documenting our findings in a paper for submission to the Association for Computing Machinery (ACM).
Additionally, I developed the rubric and assessment methods to ensure a thorough evaluation process aligned with the research objectives. I also created the research poster and will be presenting our findings at upcoming conferences.