🏝️
Happy coding, happy life!
  • Shanghai, China
zigzagcai/README.md

Hi there 😄

I am Zheng Cai, nickname zigzagcai, an AI Infra Engineer and Lifelong Learner.

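I have a general interest in (M)LLM pre-/post-training and love to share my thoughts in blog posts on Zhihu: "Thoughts prompted by InternLM-7B failing to converge when trained on the A800 platform" (由A800平台训练InternLM-7B无法收敛引发的思考) and "Mamba-1 training with support for variable-length sequences" (支持变长序列的Mamba-1训练).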

🥑 For now, I have a personal interest in Agentic RL and Inference-Time Scaling, and believe they will bring a new paradigm shift.

🍓 For AI, I believe that more is different and that intelligence emerges from complexity, and I like the ideas behind The Bitter Lesson.

🍒 For Infra, I love to build practical distributed systems that orchestrate computation/communication/caching to scale up and scale out better, and believe in the ideas behind The Hardware Lottery.

So, what I try to do is build a bridge between various accelerators and large models, with the hope of achieving efficient system-model co-design in the new AI paradigm (Self-Evolving Agentic AI Systems).

Pinned

  1. DeepSeekV3 Public

    Simple and efficient implementation of the 671B DeepSeek V3 that is trainable with FSDP+EP, targeted at the HuggingFace ecosystem

    Python 6 1

  2. codelion/openevolve Public

    Open-source implementation of AlphaEvolve

    Python 4k 579

  3. varlen_mamba Public

    Forked from state-spaces/mamba

    Mamba SSM architecture that supports training on variable-length sequences

    Python 12 1

  4. state-spaces/mamba Public

    Mamba SSM architecture

    Python 16k 1.5k

  5. InternLM/InternEvo Public

    InternEvo is an open-source lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

    Python 408 69

  6. hpcaitech/ColossalAI Public

    Making large AI models cheaper, faster and more accessible

    Python 41.2k 4.5k