Large language models (LLMs) have achieved remarkable success across a range of natural language processing tasks. Scientific text summarization is particularly difficult because scientific literature relies on dense, specialized terminology. Evaluating LLMs on this task therefore requires carefully designed benchmarks and metrics. Several investigations have analyz