Haize Labs, a leader in AI safety and alignment, has partnered with AI21 Labs to align the Jamba large language model (LLM) with the ethical and operational needs of businesses. This collaboration highlights Haize Labs’ innovative automated red-teaming platform, which was instrumental in ensuring Jamba adheres to rigorous safety, transparency, and reliability standards.

Haize Labs and AI21 Labs partnered to enable trust and safety for enterprise LLM applications
Figure 1: Haize Labs and AI21 Labs partnered to enable trust and safety for enterprise LLM applications.

You can find the full details in our report here.

A Business AI Code of Conduct: Haize Labs’ Role in Defining Ethical AI

At the core of this initiative is the development of a Business AI Code of Conduct, a framework that defines how LLMs should operate in professional settings. Haize Labs worked closely with AI21 to translate the principles of traditional employee codes of conduct into actionable tenets for AI systems. These tenets formed the foundation for training and aligning Jamba to meet the complex ethical demands of businesses, guided by rigorous red-teaming from Haize Labs.

Key contributions by Haize Labs include:

  • Automated Adversarial Testing: Haize Labs generated dozens of thousands of adversarial prompts designed to challenge the model’s adherence to the code of conduct, identifying vulnerabilities and improving alignment.
  • AI and Human Feedback Loops: Leveraging its expertise, Haize Labs designed a system that integrates automated feedback with human review to continuously refine model behavior.
  • Transparent Frameworks: By providing datasets and evaluations for public review, AI21 and Haize Labs ensure accountability and encourage collaborative improvements.
Benign Harmful
Figure 2: Examples of automated feedback results on benign (Top) and harmful (Bottom) LLM responses. Here, a higher score indicates a worse (more harmful) response.

Advanced Training and Alignment Powered by Haize Labs

Haize Labs’ red-teaming platform revolutionizes how LLMs are trained and tested. For Jamba, the process included:

  1. Comprehensive Adversarial Testing: Haize Labs deployed search and optimization algorithms to uncover harmful inputs that might bypass Jamba’s refusal mechanisms. These prompts were scored by AI judges calibrated to AI21’s standards.
  2. Iterative Refinement: Responses that violated the code of conduct were incorporated into adversarial training, creating a feedback loop to improve future performance.

This approach allows businesses to trust that the models will consistently reject harmful or unethical requests, aligning with their values and expectations.

Aligning with Global Standards: Haize Labs’ Commitment

Haize Labs ensured that Jamba’s alignment process adheres to the OECD AI Principles, focusing on:

  • Inclusive Growth and Well-being
  • Human-centered Values and Fairness
  • Transparency and Explainability
  • Robustness, Security, and Safety
  • Accountability

Through 60 specific tenets mapped to these principles, Haize Labs enforced clear behavioral expectations for Jamba, ensuring it rejects harmful, divisive, or unethical content while upholding human rights and democratic values.

Results Driven by Haize Labs’ Expertise

Ultimately, Haize Labs’ contributions yielded measurable success:

  • Improved Alignment Scores: By designing model interventions robust to Haize Labs’ adversarial prompts, Jamba achieved mean safety scores significantly closer to ideal thresholds for ethical behavior.
  • Customizable System Instructions: Haize Labs aided AI21 in tailoring their model’s behavior, making it easy to align Jamba with company-specific values.

Pioneering the Future of Ethical AI

Haize Labs and AI21 view the Business AI Code of Conduct as a living framework, designed to evolve alongside societal and business needs. By placing a premium on transparency, collaboration, and adaptability, we empowers organizations to build AI systems that are safe, ethical, and highly effective.

For businesses seeking to deploy AI responsibly, Haize Labs demonstrates how state-of-the-art automated red-teaming can transform LLMs into trusted partners in the enterprise workplace. This partnership with AI21 exemplifies how testing-oriented AI development can set new standards for professionalism and safety.