Model Card for Llama-3.3-Argunaut-1-70B-SPIN

This model is a fine-tuned version of DebateLabKIT/Llama-3.3-Argunaut-1-70B-SFT. It has been trained using TRL and vLLM.

It is released as part of the Argunauts Project.

Quick start

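from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# device_map="auto" (needs the accelerate package) shards the 70B weights across all
# visible GPUs; in 16-bit precision they do not fit on a single 80 GB card.
generator = pipeline("text-generation", model="DebateLabKIT/Llama-3.3-Argunaut-1-70B-SPIN", device_map="auto", torch_dtype="auto")
# Chat-formatted input; return_full_text=False returns only the newly generated reply.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])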

Training procedure

This model was trained with Self-Play Fine-Tuning (SPIN), a method introduced in "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (Chen et al., 2024).

A general description of the training procedure can be found in this blog post.
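In SPIN, each iteration pits the model against its previous iterate: for every prompt in the SFT data, the SFT response serves as the preferred ("chosen") answer, a response sampled from the current model serves as the dispreferred ("rejected") one, and the model is then trained on these pairs with a DPO-style loss, using the previous iterate as reference. The minimal sketch below shows only the pair construction. `SFTExample`, `build_spin_pairs` and the stand-in generator are hypothetical names for illustration; the card's actual TRL/vLLM pipeline is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SFTExample:
    prompt: str
    response: str  # target response from the SFT data


def build_spin_pairs(
    sft_data: List[SFTExample],
    generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build the preference pairs for one SPIN iteration.

    The SFT target is used as 'chosen'; a response sampled from the
    current model (the previous iterate) is used as 'rejected'.
    """
    pairs = []
    for example in sft_data:
        synthetic = generate(example.prompt)  # sample from the current checkpoint
        pairs.append(
            {"prompt": example.prompt, "chosen": example.response, "rejected": synthetic}
        )
    return pairs


# Usage with a stand-in generator (hypothetical; a real run would sample from the model).
data = [SFTExample("Reconstruct the argument.", "(1) P\n(2) P -> Q\n-----\n(3) Q")]
pairs = build_spin_pairs(data, lambda prompt: "I am not sure.")
print(pairs[0]["rejected"])  # I am not sure.
```

The `prompt`/`chosen`/`rejected` keys follow the preference-data layout that TRL's DPO trainer consumes, which is why a SPIN round can reuse a DPO-style training loop.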

For SPIN training, we use:

Hugging Face's TRL
Microsoft's DeepSpeed
Cognitive Computations' Spectrum
LinkedIn's Liger Kernels

Training parameters and setup (identical for each epoch):

Parameter	Value
DPO loss type	sigmoid
Learning rate	2e-7
LR scheduler	linear
Effective batch size	64
DeepSpeed ZeRO	stage 3
DeepSpeed CPU offload	optimizer & parameters
Spectrum: top % of layers targeted (ranked by SNR)	30%
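Three table entries can be made concrete in a few lines. The sketch below assumes: a DPO temperature `beta = 0.1` (the table does not list beta), a linear schedule with no warmup, and a simplified Spectrum-style selection over SNR scores that are simply given. All function names are hypothetical; the actual Spectrum tool derives SNR from the singular values of each weight matrix, and its exact ranking and rounding rules may differ.

```python
import math
from typing import Dict, List


def dpo_sigmoid_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                     ref_chosen_logp: float, ref_rejected_logp: float,
                     beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one pair: -log(sigmoid(beta * margin)).

    In SPIN, 'chosen' is the SFT target and 'rejected' the model's own sample.
    beta=0.1 is an assumed value, not taken from the card.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def linear_lr(step: int, total_steps: int, peak_lr: float = 2e-7) -> float:
    """Linear decay from peak_lr to 0 over total_steps (no warmup assumed)."""
    return peak_lr * max(0.0, (total_steps - step) / total_steps)


def spectrum_select(snr_by_module: Dict[str, float], top_fraction: float = 0.30) -> List[str]:
    """Simplified Spectrum-style selection: keep the modules with the highest SNR trainable."""
    k = max(1, round(top_fraction * len(snr_by_module)))
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    return ranked[:k]


# Policy identical to the reference: margin 0, so the loss is log(2).
print(dpo_sigmoid_loss(-10.0, -12.0, -10.0, -12.0))
print(linear_lr(50, 100))  # halfway through training: half the peak rate
print(spectrum_select({f"layer{i}": float(i) for i in range(10)}))  # ['layer9', 'layer8', 'layer7']
```

The loss falls below log(2) as soon as the policy raises the likelihood of the SFT target relative to its own sample more than the reference does, which is the signal SPIN relies on.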

Training one epoch takes roughly 4 hours on 4 H100 GPUs.
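As a back-of-the-envelope check, 4 hours on 4 GPUs amounts to about 16 GPU-hours per epoch. If the same 4 GPUs also produce the effective batch size of 64, then per-device batch size times gradient-accumulation steps must equal 16. The card does not state how that product is split; the per-device batch size of 2 below is a hypothetical value, and both helper names are illustrative.

```python
def gpu_hours(wall_clock_hours: float, num_gpus: int) -> float:
    """Total accelerator time consumed by a run."""
    return wall_clock_hours * num_gpus


def grad_accum_steps(effective_batch: int, per_device_batch: int, num_gpus: int) -> int:
    """Accumulation steps so that per_device_batch * num_gpus * steps == effective_batch."""
    per_step = per_device_batch * num_gpus
    if effective_batch % per_step != 0:
        raise ValueError("effective batch size is not a multiple of per_device_batch * num_gpus")
    return effective_batch // per_step


print(gpu_hours(4, 4))             # 16 GPU-hours per epoch
print(grad_accum_steps(64, 2, 4))  # 8 (per-device batch of 2 is hypothetical)
```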

Framework versions

TRL: 0.14.0
Transformers: 4.46.3
PyTorch: 2.4.1
Datasets: 3.1.0
Tokenizers: 0.20.3

Chat Example




👤 You
Hi, are you familiar with Argdown?
🤖 Assistant
Yes, I am familiar with Argdown. It is a markup language specifically designed for representing and analyzing argumentative texts in a structured format. Argdown allows users to create detailed, formalized reconstructions of arguments by breaking them down into premises and conclusions. This makes it easier to understand the logical flow of an argument, identify any gaps or flaws in reasoning, and facilitate discussions or debates about the topic.
In addition to its markup language features, Argdown is often integrated with tools like argument mapping software, which can visually represent the structure of arguments. These visualizations are helpful for spotting relationships between different parts of an argument and can be useful both for teaching critical thinking skills and for analyzing complex arguments in various fields such as law, philosophy, and social sciences.
Are you working on a project that involves using Argdown? I'd be happy to provide more information or help with specific questions!
👤 You
Can you please map the following text?
School uniforms deter crime and increase student safety.
In Long Beach, California, after two years of a district-wide K-8 mandatory uniform policy, reports of assault and battery in the district’s schools decreased by 34%, assault with a deadly weapon dropped by 50%, fighting incidents went down by 51%, sex offenses were cut by 74%, robbery dropped by 65%, possession of weapons (or weapon “look-alikes”) decreased by 52%, possession of drugs went down by 69%, and vandalism was lowered by 18%. [64]
One year after Sparks Middle School in Nevada instituted a uniform policy, school police data showed a 63% drop in police log reports, and decreases were also noted in gang activity, student fights, graffiti, property damage, and battery. A peer-reviewed study found that schools with uniform policies had 12% fewer firearm-related incidents and 15% fewer drug-related incidents than schools without uniforms. [25][69]
School uniforms also prevent students from concealing weapons under baggy clothing, make it easier to keep track of students on field trips, and make intruders on campus more visible. Frank Quatrone, superintendent in the Lodi school district of New Jersey, states, “When you have students dressed alike, you make them safer. If someone were to come into a building, the intruder could easily be recognized.” [6][38]
Further, school uniforms create a level playing field among students, reducing peer pressure and bullying. When all students are dressed alike, competition between students over clothing choices and the teasing of those who are dressed in less expensive or less fashionable outfits can be eliminated. Research by the Schoolwear Association found that 83% of teachers thought “a good school uniform…could prevent bullying based on appearance or economic background.” Arminta Jacobson, founder and director of the Center for Parent Education at the University of North Texas, states that uniforms put “all kids on the same playing field in terms of their appearance. I think it probably gives them a sense of belonging and a feeling of being socially accepted.” [5][91]
🤖 Assistant
The argument presented can be broken down into several key points and mapped as follows:
School uniforms deter crime and increase student safety.
  <+ The implementation of school uniform policies has been shown to significantly decrease incidents of violence and crime in schools, including Long Beach, California, and Sparks Middle School in Nevada.
    <- There may not be a direct correlation between the adoption of school uniform policies and decreases in violence and crime, as other factors could also have contributed to these reductions.
      <+ Changes in local law enforcement strategies or increased police presence might have coincided with the implementation of uniform policies and could independently account for some of the reported declines.
    <- The decrease in violent incidents may be attributed to a broader trend rather than being directly caused by school uniform policies.
  <+ By eliminating the opportunity for students to wear baggy clothing that can conceal weapons, school uniforms increase student safety.
    <+ School uniforms help keep track of students on field trips and make intruders more visible on campus.
      <- The ability to recognize an intruder based on their attire assumes a static environment where no one changes clothes, which is not practical in real scenarios.
  <+ By making all students dress alike, school uniforms eliminate competition among students over clothing choices, thereby reducing peer pressure and bullying.
    <+ Reducing social conflicts helps create an environment that encourages academic focus and productivity.
      <- Some level of competition can foster healthy rivalry and motivate students to work harder academically, which could be beneficial in the right context.
  <+ School uniforms promote a sense of equality among students from different economic backgrounds, reducing discrimination based on clothing choices.
    <- Uniforms do not eliminate social conflicts that arise from non-visual differences such as personality traits or cultural background.
      <- The ability to recognize an intruder based on their attire assumes a static environment where no one changes clothes, which is not practical in real scenarios.
  <+ By creating a level playing field among students, school uniforms promote a sense of belonging and social acceptance, reducing bullying based on appearance or economic background.
    <+ Schools with uniform policies have fewer incidents related to firearms and drugs compared to those without such policies.
      <- The comparison between schools may be flawed due to differences in socio-economic status, geographic location, and existing safety measures.


🛑 Poor answer
The model added an objection early on that is not in the text, got confused, and started repeating itself (the objection about recognizing intruders appears twice).
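Repetition of this kind (the same claim attached at two places in a map) can be caught mechanically. A small sketch, assuming one claim per line and only the support/attack markers `<+`, `<-`, `+>` and `->`; `find_repeated_claims` is a hypothetical helper, not part of any Argdown tooling.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Leading indentation plus an optional support/attack marker; other Argdown
# relation types are not handled in this sketch.
RELATION = re.compile(r"^\s*(?:<\+|<-|\+>|->)?\s*")


def find_repeated_claims(argdown_map: str) -> Dict[str, List[int]]:
    """Return claims that occur on more than one line, with 1-based line numbers."""
    seen: Dict[str, List[int]] = defaultdict(list)
    for lineno, line in enumerate(argdown_map.splitlines(), start=1):
        claim = RELATION.sub("", line, count=1).strip()
        if claim:
            seen[claim].append(lineno)
    return {claim: lines for claim, lines in seen.items() if len(lines) > 1}


example = """School uniforms deter crime.
  <+ Uniforms make intruders visible.
    <- Intruders can change clothes.
  <+ Uniforms reduce bullying.
    <- Intruders can change clothes."""
print(find_repeated_claims(example))  # {'Intruders can change clothes.': [3, 5]}
```

Matching on the stripped claim text rather than the raw line means a claim is flagged even when it reappears at a different depth or under a different relation.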

👤 You
Please summarize, in your own words, the main argument presented in the text. No more than 2-3 sentences, please.
🤖 Assistant
School uniforms significantly contribute to reducing violence, crime, and bullying in schools by eliminating competition over clothing choices and creating a sense of equality among students from different economic backgrounds. Additionally, by making all students dress alike, school uniforms prevent the concealment of weapons under baggy clothing, make it easier to keep track of students on field trips, and increase visibility for intruders on campus. This leads to an overall safer learning environment.
👤 You
Let's focus on this argument. Just for the protocol, what's the conclusion?
🤖 Assistant
School uniforms deter crime and increase student safety.
👤 You
Can you reconstruct this argument as premise conclusion structure, using argdown syntax?
🤖 Assistant  
(1) The implementation of school uniform policies has been shown to significantly decrease incidents of violence and crime in schools, including Long Beach, California, and Sparks Middle School in Nevada.
-- with FOL from (1) --
(2) Schools that implement uniform policies tend to have fewer violent incidents.
(3) By making all students dress alike, school uniforms eliminate competition among students over clothing choices.
(4) Competition among students over clothing choices is a source of peer pressure and bullying.
(5) Reducing peer pressure and bullying contributes to increased student safety.
-- with FOL from (2) (3) (4) (5) --
(6) School uniforms can help reduce violence and crime in schools by eliminating competition over clothing choices.
(7) By eliminating the opportunity for students to wear baggy clothing that can conceal weapons, school uniforms increase student safety.
(8) School uniforms help keep track of students on field trips and make intruders more visible on campus.
-- with FOL from (7) (8) --
(9) By making all students dress alike, school uniforms prevent the concealment of weapons under baggy clothing, make it easier to keep track of students on field trips, and increase visibility for intruders on campus.
-- with FOL from (6) (9) --
(10) School uniforms deter crime and increase student safety.

👤 You
That's way too complicated. Just start with 2 or 3 premises.
🤖 Assistant  
(1) The implementation of school uniform policies has been shown to significantly decrease incidents of violence and crime in schools, including Long Beach, California, and Sparks Middle School in Nevada.
-- with FOL from (1) --
(2) Schools that implement uniform policies tend to have fewer violent incidents.
(3) By making all students dress alike, school uniforms eliminate competition among students over clothing choices.
(4) Competition among students over clothing choices is a source of peer pressure and bullying.
(5) Reducing peer pressure and bullying contributes to increased student safety.
-- with FOL from (2) (3) (4) (5) --
(6) School uniforms can help reduce violence and crime in schools by eliminating competition over clothing choices.
(7) By eliminating the opportunity for students to wear baggy clothing that can conceal weapons, school uniforms increase student safety.
(8) School uniforms help keep track of students on field trips and make intruders more visible on campus.
-- with FOL from (7) (8) --
(9) By making all students dress alike, school uniforms prevent the concealment of weapons under baggy clothing, make it easier to keep track of students on field trips, and increase visibility for intruders on campus.
-- with FOL from (6) (9) --
(10) School uniforms deter crime and increase student safety.


🛑 Poor answer
Just repeating the previous analysis, no simplification. 

👤 You
No. Please simplify the argument radically, and render the gist in your own words.
🤖 Assistant
The argument's gist is:
<Uniforms reduce bullying>: By making all students dress alike, school uniforms eliminate competition among students over clothing choices.
    +> School uniforms deter crime and increase student safety.

👤 You
Render this as a premise-conclusion structure in argdown, please
🤖 Assistant
The argument can be reconstructed as follows:
(1) By making all students dress alike, school uniforms eliminate competition among students over clothing choices.
(2) Competition among students over clothing choices is a source of peer pressure and bullying.
(3) Reducing peer pressure and bullying contributes to increased student safety.
-----
(4) School uniforms deter crime and increase student safety.

👤 You
To formalize this argument, would you rather use propositional logic (with propositional constants p, q, r, etc.) or predicate logic (with predicate constants F, G, etc.)?
🤖 Assistant
This is a good question. I will first try to provide an answer using argdown syntax.
👤 You
Ok, so please try to formalize the argument above.
🤖 Assistant
The premises can be formalized as follows:
Premise (1): p -> q
Premise (2): r & s
Premise (3): t -> not u  
👤 You
What do the propositional constants p, q, etc. stand for? Please provide the keys.
🤖 Assistant  
[["p", "school uniforms are worn by all students"], ["q", "competition among students over clothing choices is eliminated"], ["r", "students are competing with each other about their clothing choices"], ["s", "peer pressure and bullying are a problem in schools"], ["t", "peer pressure and bullying are reduced"], ["u", "student safety is not increased"]]

👤 You
And how would you formalize the conclusion?
🤖 Assistant
The argument's conclusion can be formalized as: w
👤 You
Can you reconsider this and try to use the previously introduced placeholders?
🤖 Assistant
Formalization of conclusion (4): p -> not u
👤 You
OK, please integrate your formalization as inline data into the Argdown argument reconstruction.
🤖 Assistant  
(1) If school uniforms are worn by all students, then competition among students over clothing choices is eliminated. {"formalization": "p -> q", "keys": {"p": "school uniforms are worn by all students", "q": "competition among students over clothing choices is eliminated"}}
(2) Students are competing with each other about their clothing choices and peer pressure and bullying are a problem in schools. {"formalization": "r & s", "keys": {"r": "students are competing with each other about their clothing choices", "s": "peer pressure and bullying are a problem in schools"}}
(3) If peer pressure and bullying are reduced, then student safety is not increased. {"formalization": "t -> not u", "keys": {"t": "peer pressure and bullying are reduced", "u": "student safety is not increased"}}
-----
(4) If school uniforms are worn by all students, then student safety is not increased. {"formalization": "p -> not u", "keys": {"p": "school uniforms are worn by all students", "t": "peer pressure and bullying are reduced", "u": "student safety is not increased"}, "follows_from": [1, 2, 3]}

👤 You
Wonderful. Now, please write a Z3 program that checks whether the inference is valid.
🤖 Assistant  
(declare-sort Universal)
(declare-fun p () Bool) ;; school uniforms are worn by all students
(declare-fun q () Bool) ;; competition among students over clothing choices is eliminated
(declare-fun r () Bool) ;; students are competing with each other about their clothing choices
(declare-fun s () Bool) ;; peer pressure and bullying are a problem in schools
(declare-fun t () Bool) ;; peer pressure and bullying are reduced
(declare-fun u () Bool) ;; student safety is not increased
(define-fun premise1 () Bool (=> p q))
(define-fun premise2 () Bool (and r s))
(define-fun premise3 () Bool (=> t (not u)))
(define-fun conclusion4 () Bool (=> p (not u)))
(define-fun argument () Bool (=> (and premise1 premise2 premise3) conclusion4))
(assert (not argument))
(check-sat)

👤 You
Thanks so much. I just pasted that into the Z3 playground, according to which the argument is NOT valid. Can you explain why, and suggest how to improve your reconstruction?
🤖 Assistant
The argument's conclusion does not logically follow from its premises.
To fix this, we should revisit our initial analysis of the text.
👤 You
But before going back to the text, can you suggest a modification of the argument we've just checked, which I repeat here for clarity:
(1) If school uniforms are worn by all students, then competition among students over clothing choices is eliminated. {"formalization": "p -> q", "keys": {"p": "school uniforms are worn by all students", "q": "competition among students over clothing choices is eliminated"}}
(2) Students are competing with each other about their clothing choices and peer pressure and bullying are a problem in schools. {"formalization": "r & s", "keys": {"r": "students are competing with each other about their clothing choices", "s": "peer pressure and bullying are a problem in schools"}}
(3) If peer pressure and bullying are reduced, then student safety is not increased. {"formalization": "t -> not u", "keys": {"t": "peer pressure and bullying are reduced", "u": "student safety is not increased"}}
-----
(4) If school uniforms are worn by all students, then student safety is not increased. {"formalization": "p -> not u", "keys": {"p": "school uniforms are worn by all students", "t": "peer pressure and bullying are reduced", "u": "student safety is not increased"}, "follows_from": [1, 2, 3]}

What could we change here to render the argument valid? Just point out potential modifications.
🤖 Assistant
To make this an inferentially valid argument, I would suggest revising it as follows:
Replace (4) with: not p.
Here's the revised argument:
(1) If school uniforms are worn by all students, then competition among students over clothing choices is eliminated. {"formalization": "p -> q", "keys": {}}
(2) Students are competing with each other about their clothing choices and peer pressure and bullying are a problem in schools. {"formalization": "r & s", "keys": {}}
(3) If peer pressure and bullying are reduced, then student safety is not increased. {"formalization": "t -> not u", "keys": {}}
-----
(4) School uniforms are not worn by all students. {"formalization": "not p", "keys": {}, "follows_from": [1, 2, 3]}
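Z3 answers `sat` to `(assert (not argument))`, meaning some truth assignment makes all premises true and the conclusion false. With only six atoms, the same check can be done by brute force over all 2^6 assignments. The minimal pure-Python sketch below needs no Z3; `find_counterexample` is a hypothetical helper, and the formulas are transcribed from the formalizations in the chat above.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Valuation = Dict[str, bool]
Formula = Callable[[Valuation], bool]


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def find_counterexample(premises: List[Formula], conclusion: Formula,
                        atoms: List[str]) -> Optional[Valuation]:
    """Return an assignment making all premises true and the conclusion false, or None if valid."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(premise(v) for premise in premises) and not conclusion(v):
            return v
    return None


ATOMS = ["p", "q", "r", "s", "t", "u"]
PREMISES: List[Formula] = [
    lambda v: implies(v["p"], v["q"]),      # (1) p -> q
    lambda v: v["r"] and v["s"],            # (2) r & s
    lambda v: implies(v["t"], not v["u"]),  # (3) t -> not u
]


def original_conclusion(v: Valuation) -> bool:
    return implies(v["p"], not v["u"])  # (4) p -> not u


def revised_conclusion(v: Valuation) -> bool:
    return not v["p"]  # suggested replacement for (4): not p


print(find_counterexample(PREMISES, original_conclusion, ATOMS))
# {'p': True, 'q': True, 'r': True, 's': True, 't': False, 'u': True}
print(find_counterexample(PREMISES, revised_conclusion, ATOMS))
# {'p': True, 'q': True, 'r': True, 's': True, 't': False, 'u': False}
```

Both calls return a counterexample. Nothing in the premises links q to t, and r and s occur nowhere else, so neither `p -> not u` nor the suggested `not p` follows. Adding a bridging premise such as `q -> t` would let `p -> not u` follow by chaining (1), the new premise, and (3).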

Citations

Cite SPIN as:
@misc{chen2024selfplayfinetuningconvertsweak,
      title={Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models}, 
      author={Zixiang Chen and Yihe Deng and Huizhuo Yuan and Kaixuan Ji and Quanquan Gu},
      year={2024},
      eprint={2401.01335},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2401.01335}, 
}

Cite TRL as:
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
