The complexity of manycore System-on-chips (SoCs) is growing faster than our ability to manage them to reduce the overall energy consumption. Further, as SoC design moves toward three-dimensional (3D) architectures, the core’s power density increases leading to unacceptable high peak chip temperatures. In this article, we consider the optimization problem of dynamic power management (DPM) in manycore SoCs for an allowable performance penalty (say, 5%) and admissible peak chip temperature. We employ a machine learning– (ML) based DPM policy, which selects the voltage/frequency levels for different cluster of cores as a function of the application workload features such as core computation and inter-core traffic, and so on. We propose a novel learning-to-search (L2S) framework to automatically identify an optimized sequence of DPM decisions from a large combinatorial space for joint energy-thermal optimization for one or more given applications. The optimized DPM decisions are given to a supervised learning algorithm to train a DPM policy, which mimics the corresponding decision-making behavior. Our experiments on two different manycore architectures designed using wireless interconnect and monolithic 3D demonstrate that principles behind the L2S framework are applicable for more than one configuration. Moreover, L2S-based DPM policies achieve up to 30% energy-delay product savings and reduce the peak chip temperature by up to 17 °C compared to the state-of-the-art ML methods for an allowable performance overhead of only 5%.