[ChatGPT] AI Chat General Thread Part 6 [Bing, Bard]

暇つぶし2ch
890: Anonymous (名無しさん＠お腹いっぱい。)
23/06/21 13:59:06.23 7ZDtogiM.net
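@ImAI_Eruel: A huge leak about GPT-4.
The lead developer of Meta's PyTorch has acknowledged it too, and a similar story was apparently already circulating in some circles, so it looks fairly credible.
The claim is:
"GPT-4 is a mixture model made up of eight 220B-parameter models (220 billion x 8 = 1.76 trillion parameters), and each model is trained on different data/tasks."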

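Some supplementary notes:
・It reportedly runs 16 inference passes per output and uses a few minor tricks, but the details are unknown.
・The model in use now is likely smaller than the one used back then.
・There is also a 1.2 trillion parameter claim, which would suggest that parameters are shared somewhere in the model.
・Given the model size, running costs would be enormous, which makes Altman's earlier remark that features could not be fully rolled out because of a GPU shortage easy to believe.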
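
As a rough sanity check on the numbers above, here is a small back-of-the-envelope sketch. Only the 8 x 220B and 1.2 trillion figures come from the quoted posts. The fp16 weight size (2 bytes per parameter) and the idea that a single block of parameters is shared by all eight experts are assumptions made purely for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the rumored GPT-4 figures.
# From the quoted posts: 8 experts, 220B parameters each, and a
# competing "1.2 trillion" total.
# Assumptions (NOT from the source): fp16 weights, and one block of
# parameters shared by every expert to reconcile the two totals.

N_EXPERTS = 8
PARAMS_PER_EXPERT = 220 * 10**9   # 220B, as quoted
ALT_TOTAL = 1200 * 10**9          # the 1.2 trillion claim
BYTES_PER_PARAM_FP16 = 2          # assumption


def total_params(n_experts: int, per_expert: int) -> int:
    """Parameter count if no parameters are shared between experts."""
    return n_experts * per_expert


def weight_memory_tb(n_params: int, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in terabytes (10**12 bytes)."""
    return n_params * bytes_per_param / 1e12


def shared_params_for_total(n_experts: int, per_expert: int, unique_total: int) -> float:
    """Size s of a block shared by all experts such that the number of
    unique parameters equals unique_total.

    Unique count: n * (p - s) + s = U, so s = (n * p - U) / (n - 1).
    """
    return (n_experts * per_expert - unique_total) / (n_experts - 1)


total = total_params(N_EXPERTS, PARAMS_PER_EXPERT)
memory_tb = weight_memory_tb(total, BYTES_PER_PARAM_FP16)
shared = shared_params_for_total(N_EXPERTS, PARAMS_PER_EXPERT, ALT_TOTAL)

print(f"total parameters (no sharing): {total / 1e12:.2f} trillion")
print(f"fp16 weight memory:            {memory_tb:.2f} TB")
print(f"shared block for a 1.2T total: {shared / 1e9:.0f}B per expert")
```

Under these assumptions, the unshared total is 1.76 trillion parameters, the fp16 weights alone take about 3.52 TB, and roughly 80B of each expert's 220B would have to be shared to land on 1.2 trillion. None of this says how the real model is built. It only shows that the two rumored totals are not mutually exclusive.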

@soumithchintala: i might have heard the same 😃 -- I guess info like this is passed around but no one wants to say it out loud.
GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference.
Glad that Geohot said it out loud.

Though, at this point, GPT-4 is probably distilled to be more efficient.

@pommedeterre33: Unexpected description of GPT4 architecture from geohotz in a recent interview he gave. At least it’s plausible.
URL link (pbs.twimg.com)
