贝叶斯优化是一种用于对黑盒函数进行全局优化的序贯设计策略[1][2][3],它不假定任何函数形式。该方法通常用于优化代价高昂、难以评估的函数。随着21世纪人工智能的兴起,贝叶斯优化算法在机器学习问题中获得了显著应用,常用于超参数优化(hyperparameter values)。[4][5]
历史
编辑术语通常归功于Jonas Mockus,并在其1970年代和1980年代关于全局优化的一系列出版物中提出。[6][7][1]
早期数学基础
编辑1960年代到1980年代
编辑贝叶斯优化的最早想法[8] sprang in 1964, from a paper by American applied mathematician Harold J. Kushner,[9] 可追溯到1964年,由美国应用数学家Harold J. Kushner在一篇题为“A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise” (页面存档备份,存于互联网档案馆)的论文中提出。虽然该论文并未直接提出贝叶斯优化,但他首次提出了一种在噪声环境下定位任意多峰曲线最大点的新方法,该方法为后续的贝叶斯优化提供了重要的理论基础。
到1980年代,我们现在使用的贝叶斯优化框架已被明确建立。1978年,立陶宛科学家Jonas Mockus[10]在其论文“The Application of Bayesian Methods for Seeking the Extremum”中讨论了如何在各种不确定条件下使用贝叶斯方法寻找函数的极值。在该论文中,Mockus首次提出了期望改进(Expected Improvement principle (页面存档备份,存于互联网档案馆),简称EI)原则,EI是贝叶斯优化的核心采样策略之一。该准则通过最大化期望改进在探索与利用之间取得平衡,从而高效地优化函数。由于该原则的实用性和深远影响,Jonas Mockus被广泛认为是贝叶斯优化的奠基人。尽管期望改进是最早提出的核心采样策略之一,但并非唯一;随着现代发展,还出现了改进概率(Probability of Improvement,PI)、上置信界(Upper Confidence Bound,UCB)[11]等策略。
从理论到实践
编辑在1990年代,贝叶斯优化开始逐步从纯理论向实际应用过渡。1998年,Donald R. Jones及其合作者[12]发表了一篇题为“Efficient Global Optimization of Expensive Black-Box Functions”的论文[13]。在该论文中,他们提出了高斯过程(Gaussian Process,GP)并详细阐述了Jonas Mockus在1978年提出的期望改进准则。通过Donald R. Jones及其同事的工作,贝叶斯优化开始在计算机科学和工程等领域崭露头角。然而,由于当时计算能力的限制,贝叶斯优化在计算复杂度方面仍受到较大影响。
进入21世纪,随着人工智能和仿生机器人技术的兴起,贝叶斯优化被广泛用于机器学习和深度学习,成为超参数调优的重要工具。[14]如Google、Facebook和OpenAI等公司已将贝叶斯优化纳入其深度学习框架以提高搜索效率。然而,贝叶斯优化仍面临诸多挑战,例如当使用高斯过程[15]作为代理模型时,在数据量大的情形下高斯过程的训练会非常缓慢且计算代价很高,这使得该优化方法在更复杂的药物开发与医学实验中难以良好适用。
策略
编辑贝叶斯优化用于如下形式的问题: ,其中 为所有可能参数 的集合,通常为最佳使用场景设定維度不超过20( ),且其成员资格可以容易地被评估。对于计算代价高昂导致评估困难的函数,贝叶斯优化尤为有利。目标函数 是连续的且具有未知结构,称为“黑盒”函数;在评估后只能观测到 ,而其导数通常不可用。[17]
由于目标函数未知,贝叶斯策略将其视为随机函数并在其上放置先验分布。先验反映了对函数行为的信念。在收集到被视为数据的函数评估后,将先验更新为目标函数的后验分布。后验分布进而用于构造采集函数(acquisition function,也常称为填充采样准则),以决定下一次查询点。
用于定义目标函数先验/后验分布的方法有若干种。最常见的两种方法是使用高斯过程(Gaussian processes)进行克里金(kriging)方法。另一种较低成本的方法使用树结构Parzen估计器为“高”和“低”点构建两组分布,然后寻找使期望改进最大的位点。[18]
标准贝叶斯优化依赖于每个 易于评估这一假设;偏离该假设的问题被称为“奇异贝叶斯优化”(exotic Bayesian optimization)问题。如果已知存在噪声、评估以并行方式进行、评估质量在难度与准确性之间存在权衡、存在随机环境条件,或评估涉及导数,优化问题就可能变为奇异问题。[17]
采集函数
编辑采集函数的示例包括:
- 改进概率(probability of improvement)
- 期望改进(expected improvement)
- 贝叶斯期望损失(Bayesian expected losses)
- 上置信界(Upper Confidence Bound,UCB)或下置信界
- 汤普森采样(Thompson sampling)
以及这些方法的混合体。[19]这些采集函数在探索与利用之间进行权衡,以尽量减少函数查询次数。因此,贝叶斯优化适用于评估代价昂贵的函数。
求解方法
编辑通常通过离散化或借助辅助优化器来寻找采集函数的最大值。采集函数的最大化采用数值优化技术,例如牛頓法或拟牛顿法,如BFGS算法。
应用
编辑该方法已被广泛应用于解决问题,[20]包括排序学习[21]、计算机图形学与视觉设计[22][23][24]、机器人学[25][26][27][28] 、無線感測網路[29][30]、自动算法配置[31][32] 、自动机器学习工具箱[33][34][35]、强化学习[36]、规划、视觉注意力、深度学习中的架构配置、静态程序分析、粒子物理实验[37][38]、质量-多样性优化[39][40][41]、化学、材料设计和药物开发等领域[17][42][43][44]。
贝叶斯优化也被应用于面部识别领域。[45]方向梯度直方图(Histogram of Oriented Gradients,HOG)算法作为一种流行的特征提取方法,其性能高度依赖参数设置。优化这些参数虽具挑战性但对提高准确率至关重要。[45]一种基于树结构帕森估计器(Tree-structured Parzen Estimator,TPE)的贝叶斯优化方法来同时优化HOG算法参数和图像尺寸,用于面部识别。[45]该优化方法具有适配其他计算机视觉应用的潜力,并推动基于手工参数的特征提取算法的发展。[45]
参见
编辑参考
编辑- ^ 1.0 1.1 Močkus, J. Bayesian Approach to Global Optimization. Dordrecht: Kluwer Academic. 1989. ISBN 0-7923-0115-3.
- ^ Garnett, Roman. Bayesian Optimization. Cambridge University Press. 2023 [2026-02-12]. ISBN 978-1-108-42578-0. (原始内容存档于2025-08-10).
- ^ Hennig, P.; Osborne, M. A.; Kersting, H. P. Probabilistic Numerics (PDF). Cambridge University Press. 2022: 243–278 [2026-02-12]. ISBN 978-1107163447. (原始内容存档 (PDF)于2025-12-03).
- ^ Snoek, Jasper. Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems 25 (NIPS 2012). 2012, 25 [2026-02-12]. arXiv:1206.2944 . (原始内容存档于2025-12-17).
- ^ Klein, Aaron. Fast bayesian optimization of machine learning hyperparameters on large datasets. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR. 2017: 528–536 [2026-02-12]. arXiv:1605.07079 . (原始内容存档于2025-11-13).
- ^ Močkus, Jonas. On bayesian methods for seeking the extremum. Optimization Techniques IFIP Technical Conference Novosibirsk, July 1–7, 1974. Lecture Notes in Computer Science 27. 1975: 400–404. ISBN 978-3-540-07165-5. doi:10.1007/3-540-07165-2_55 .
- ^ Močkus, Jonas. On Bayesian Methods for Seeking the Extremum and their Application. IFIP Congress. 1977: 195–200.
- ^ GARNETT, ROMAN. BAYESIAN OPTIMIZATION First published 2023. Cambridge University Press. 2023. ISBN 978-1-108-42578-0.
- ^ Kushner, Harold. vivo.brown.edu. [2026-02-12]. (原始内容存档于2021-04-16).
- ^ Jonas Mockus. Kaunas University of Technology. [2025-03-06]. (原始内容存档于2025-09-08) (英语).
- ^ Kaufmann, Emilie; Cappe, Olivier; Garivier, Aurelien. On Bayesian Upper Confidence Bounds for Bandit Problems. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (PMLR). 2012-03-21: 592–600 [2026-02-12]. (原始内容存档于2026-01-02) (英语).
- ^ Donald R. Jones. scholar.google.com. [2025-02-25]. (原始内容存档于2025-05-15).
- ^ Jones, Donald R.; Schonlau, Matthias; Welch, William J. Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization. 1998, 13 (4): 455–492 [2026-02-12]. doi:10.1023/A:1008306431147. (原始内容存档于2025-02-15).
- ^ T. T. Joy, S. Rana, S. Gupta and S. Venkatesh, "Hyperparameter tuning for big data using Bayesian optimisation," 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016, pp. 2574-2579, doi: 10.1109/ICPR.2016.7900023. keywords: {Big Data;Bayes methods;Optimization;Tuning;Data models;Gaussian processes;Noise measurement},
- ^ Mackay, D. J. C. Introduction to Gaussian processes. Bishop, C. M. (编). Neural Networks and Machine Learning. NATO ASI Series 168. 1998: 133–165 [2025-03-06]. (原始内容存档于2024-04-23).
- ^ Wilson, Samuel, ParBayesianOptimization R package, 2019-11-22 [2019-12-12]
- ^ 17.0 17.1 17.2 Frazier, Peter I. A Tutorial on Bayesian Optimization. 2018-07-08. arXiv:1807.02811 [stat.ML].
- ^ J. S. Bergstra, R. Bardenet, Y. Bengio, B. Kégl: Algorithms for Hyper-Parameter Optimization (页面存档备份,存于互联网档案馆). Advances in Neural Information Processing Systems: 2546–2554 (2011)
- ^ Matthew W. Hoffman, Eric Brochu, Nando de Freitas: Portfolio Allocation for Bayesian Optimization. Uncertainty in Artificial Intelligence: 327–336 (2011)
- ^ Eric Brochu, Vlad M. Cora, Nando de Freitas: A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning (页面存档备份,存于互联网档案馆). CoRR abs/1012.2599 (2010)
- ^ Eric Brochu, Nando de Freitas, Abhijeet Ghosh: Active Preference Learning with Discrete Choice Data (页面存档备份,存于互联网档案馆). Advances in Neural Information Processing Systems: 409-416 (2007)
- ^ Eric Brochu, Tyson Brochu, Nando de Freitas: A Bayesian Interactive Optimization Approach to Procedural Animation Design (页面存档备份,存于互联网档案馆). Symposium on Computer Animation 2010: 103–112
- ^ Yuki Koyama, Issei Sato, Daisuke Sakamoto, Takeo Igarashi: Sequential Line Search for Efficient Visual Design Optimization by Crowds (页面存档备份,存于互联网档案馆). ACM Transactions on Graphics, Volume 36, Issue 4, pp.48:1–48:11 (2017). DOI: https://doi.org/10.1145/3072959.3073598
- ^ Yuki Koyama, Issei Sato, Masataka Goto: Sequential Gallery for Interactive Visual Design Optimization (页面存档备份,存于互联网档案馆). ACM Transactions on Graphics, Volume 39, Issue 4, pp.88:1–88:12 (2020). DOI: https://doi.org/10.1145/3386569.3392444
- ^ Daniel J. Lizotte, Tao Wang, Michael H. Bowling, Dale Schuurmans: Automatic Gait Optimization with Gaussian Process Regression 互联网档案馆的存檔,存档日期2017-08-12.. International Joint Conference on Artificial Intelligence: 944–949 (2007)
- ^ Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, Jose Castellanos and Arnaud Doucet. A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot (页面存档备份,存于互联网档案馆). Autonomous Robots. Volume 27, Issue 2, pp 93–103 (2009)
- ^ Scott Kuindersma, Roderic Grupen, and Andrew Barto. Variable Risk Control via Stochastic Optimization. International Journal of Robotics Research, volume 32, number 7, pp 806–825 (2013)
- ^ Roberto Calandra, André Seyfarth, Jan Peters, and Marc P. Deisenroth Bayesian optimization for learning gaits under uncertainty (页面存档备份,存于互联网档案馆). Ann. Math. Artif. Intell. Volume 76, Issue 1, pp 5-23 (2016) DOI:10.1007/s10472-015-9463-9
- ^ Niranjan Srinivas, Andreas Krause, Sham M. Kakade, Matthias W. Seeger: Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting (页面存档备份,存于互联网档案馆). IEEE Transactions on Information Theory 58(5):3250–3265 (2012)
- ^ Garnett, R.; Osborne, M. A.; Roberts, S. J. Bayesian optimization for sensor set selection. ACM. 2010-04-12 [2026-02-12]. ISBN 978-1-60558-988-6. doi:10.1145/1791212.1791238. (原始内容存档于2025-06-27).
- ^ Frank Hutter, Holger Hoos, and Kevin Leyton-Brown (2011). Sequential model-based optimization for general algorithm configuration (页面存档备份,存于互联网档案馆), Learning and Intelligent Optimization
- ^ J. Snoek, H. Larochelle, R. P. Adams Practical Bayesian Optimization of Machine Learning Algorithms (页面存档备份,存于互联网档案馆). Advances in Neural Information Processing Systems: 2951-2959 (2012)
- ^ J. Bergstra, D. Yamins, D. D. Cox (2013). Hyperopt: A Python Library for Optimizing the Hyperparameters of Machine Learning Algorithms (页面存档备份,存于互联网档案馆). Proc. SciPy 2013.
- ^ Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms (页面存档备份,存于互联网档案馆). KDD 2013: 847–855
- ^ Jasper Snoek, Hugo Larochelle and Ryan Prescott Adams. Practical Bayesian Optimization of Machine Learning Algorithms (页面存档备份,存于互联网档案馆). Advances in Neural Information Processing Systems, 2012
- ^ Berkenkamp, Felix. Safe Exploration in Reinforcement Learning: Theory and Applications in Robotics (Doctoral Thesis论文). ETH Zurich. 2019 [2026-02-12]. doi:10.3929/ethz-b-000370833. hdl:20.500.11850/370833. (原始内容存档于2025-04-13) (英语).
- ^ Philip Ilten, Mike Williams, Yunjie Yang. Event generator tuning using Bayesian optimization (页面存档备份,存于互联网档案馆). 2017 JINST 12 P04028. DOI: 10.1088/1748-0221/12/04/P04028
- ^ Evaristo Cisbani et al. AI-optimized detector design for the future Electron-Ion Collider: the dual-radiator RICH case (页面存档备份,存于互联网档案馆) 2020 JINST 15 P05009. DOI: 10.1088/1748-0221/15/05/P05009
- ^ Kent, Paul; Gaier, Adam; Mouret, Jean-Baptiste; Branke, Juergen. BOP-Elites, a Bayesian Optimisation Approach to Quality Diversity Search with Black-Box descriptor functions. 2023-07-19. arXiv:2307.09326 [math.OC]. Preprint: Arxiv.
- ^ Kent, Paul; Branke, Juergen. Bayesian Quality Diversity Search with Interactive Illumination. Proceedings of the Genetic and Evolutionary Computation Conference (PDF). GECCO '23. New York, NY, USA: Association for Computing Machinery. 2023-07-12: 1019–1026 [2026-02-12]. ISBN 979-8-4007-0119-1. S2CID 259833672. doi:10.1145/3583131.3590486. (原始内容存档 (PDF)于2024-04-23).
- ^ Gaier, Adam; Asteroth, Alexander; Mouret, Jean-Baptiste. Data-Efficient Design Exploration through Surrogate-Assisted Illumination. Evolutionary Computation. 2018-09-01, 26 (3): 381–410. ISSN 1063-6560. PMID 29883202. S2CID 47003986. arXiv:1806.05865 . doi:10.1162/evco_a_00231 .
- ^ Gomez-Bombarelli et al. Automatic Chemical Design using a Data-Driven Continuous Representation of Molecules (页面存档备份,存于互联网档案馆). ACS Central Science, Volume 4, Issue 2, 268-276 (2018)
- ^ Griffiths et al. Constrained Bayesian Optimization for Automatic Chemical Design using Variational Autoencoders (页面存档备份,存于互联网档案馆) Chemical Science: 11, 577-586 (2020)
- ^ Serles et al. Ultrahigh Specific Strength by Bayesian Optimization of Carbon Nanolattices (页面存档备份,存于互联网档案馆) Advanced Materials, 37 (14) 2410651 (2025)
- ^ 45.0 45.1 45.2 45.3 Mohammed Mehdi Bouchene: Bayesian Optimization of Histogram of Oriented Gradients (Hog) Parameters for Facial Recognition. SSRN (2023)