Trangle Heshvp

Trangle

AI & ML interests

None yet

Recent Activity

liked a model 3 days ago

meituan/DeepSeek-R1-Block-INT8

liked a dataset 5 days ago

qihoo360/Light-R1-DPOData

liked a model 5 days ago

qihoo360/Light-R1-32B

Trangle's activity

New activity in microsoft/Phi-3-small-8k-instruct 10 months ago

Is it possible that this is a small model of GPT-3.5?

#6 opened 10 months ago by

Why a different architecture from mini and medium?

#5 opened 10 months ago by

New activity in 152334H/tinystories 11 months ago

Excuse me, how do I load or use this dataset? Thanks

#1 opened 11 months ago by

New activity in databricks/dbrx-base 12 months ago

Please, authorize access for the base weight!

#5 opened 12 months ago by

New activity in brucethemoose/jondurbin_bagel-dpo-34b-v0.2-exl2-4bpw-fiction about 1 year ago

Is there documentation for quantization alignment in long text?

#4 opened about 1 year ago by

New activity in jondurbin/bagel-dpo-34b-v0.2 about 1 year ago

For the original 200k context, would it be better to do an NTK patch with 4k?

#5 opened about 1 year ago by

New activity in alpindale/goliath-120b over 1 year ago

Crazy

#5 opened over 1 year ago by

New activity in internlm/internlm-chat-20b over 1 year ago

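Curious why large models these days are not compared against Tongyi Qianwen; Qianwen outperforms Baichuan in comprehension and instruction following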

#2 opened over 1 year ago by

New activity in Qwen/Qwen-7B-Chat over 1 year ago

FlashAttention still needs to be disabled during inference; with it enabled, the output is currently garbled

#27 opened over 1 year ago by

New activity in TigerResearch/tigerbot-13b-chat-v1 over 1 year ago

For the model trained on llama2, you have a bug that has not been fixed

#1 opened over 1 year ago by
