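Start date:
July 22, 2023
Subject area:
Computer science and artificial intelligence; science and engineering
Instructor:
Osman (tenured full professor, Carnegie Mellon University (CMU))
Program length:
2 weeks of online research + 2 weeks of in-person instruction
Language:
English
Recommended student level:
Undergraduate students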
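Program deliverables:
- 2 weeks of online research + 2 weeks of intensive in-person research and a lab workshop
- Opportunity to interact with a Nobel laureate
- Project report
- Reference letter from the lead instructor for outstanding students
- Guidance on submitting and publishing a full paper at an international conference indexed by EI/CPCI/Scopus/ProQuest/Crossref/EBSCO or an equivalent index (co-first author or sole first author, by choice)
- Certificate of completion
- Transcript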
Program description:
The multi-armed bandit problem is a classical problem in probability theory and an important building block of deep reinforcement learning. The multi-armed bandit (MAB) framework was proposed to solve this kind of sequential decision-making problem under uncertainty. In recent years the framework has attracted wide attention for its strong performance and its ability to learn from limited feedback, with applications ranging from recommendation systems and information retrieval to healthcare and financial investment. This program is built around the MAB framework: students will study the basic model and its applications in depth and will learn the widely used Upper Confidence Bound (UCB) and Thompson Sampling algorithms. The instructor will also present his own latest research in the field.

This is an introductory course on multi-armed bandits, a framework for sequential decision-making under uncertainty with broad applications in recommendation systems, dynamic pricing, clinical trials, financial investments, and other areas. The course covers the classical multi-armed bandit model and its applications, several widely used algorithms for solving it, including Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), and Thompson Sampling (TS), and the performance analysis of these algorithms. The lectures conclude with the instructor's recent work on correlated and structured bandits.
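
For readers unfamiliar with the two algorithms named above, the following is a minimal illustrative sketch, not course material. It assumes Bernoulli (0/1) rewards, uses the standard UCB1 index (empirical mean plus sqrt(2 ln t / n)), and uses Thompson Sampling with a uniform Beta(1, 1) prior. The function names `ucb1` and `thompson` and the example arm means are hypothetical choices made for this illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (assumed reward model); return pulls per arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of times each arm was pulled
    sums = [0.0] * k    # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each index is defined
        else:
            # choose the arm with the largest empirical mean + exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


def thompson(means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling; return pulls per arm."""
    rng = random.Random(seed)
    k = len(means)
    wins = [0] * k      # observed rewards of 1 per arm
    losses = [0] * k    # observed rewards of 0 per arm
    for _ in range(horizon):
        # draw one plausible mean per arm from its Beta posterior
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        if rng.random() < means[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [wins[i] + losses[i] for i in range(k)]


if __name__ == "__main__":
    true_means = [0.1, 0.5, 0.9]  # hypothetical arms, unknown to the learner
    print("UCB1 pulls:    ", ucb1(true_means, 3000))
    print("Thompson pulls:", thompson(true_means, 3000))
```

In both cases the pull counts should concentrate on the arm with the highest mean as the horizon grows, which is the behavior that the performance (regret) analysis mentioned above makes precise.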