A Generalized Hardware Debugging Approach for Large Language Models with Semi-Synthetic Datasets
Document Type
Article
Publication Date
11-26-2024
Department
Department of Electrical and Computer Engineering
Abstract
Large Language Models (LLMs) have precipitated emerging trends toward intelligent automation. However, integrating LLMs into the hardware debugging domain faces a challenge: datasets for hardware-focused LLMs suffer from a dual dilemma of scarcity and subpar quality. Traditional hardware debugging approaches that rely on experienced engineers to craft detailed prompts do not scale cheaply. Likewise, strategies that depend on existing LLMs and randomly generated prompts fail to achieve sufficient reliability. We propose a directed, semi-synthetic data synthesis method that leverages version control information and journalistic event descriptions. To produce high-quality data, the approach combines version control data from hardware projects with the 5W1H (Who, What, When, Where, Why, How) journalistic principles, allowing dataset volume to scale linearly without depending on skilled labor. We have applied this method to a collected corpus of open-source hardware designs and fine-tuned fifteen general-purpose LLMs for hardware debugging tasks, thereby validating the efficacy of our approach.
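The abstract describes pairing version control records with the 5W1H journalistic principles to synthesize debugging data. As a rough illustration only, and not the authors' actual pipeline, the Python sketch below maps a hypothetical commit record onto the six 5W1H fields to form a training prompt; the CommitRecord fields, the commit_to_5w1h_prompt helper, and the example FIFO fix are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommitRecord:
    """Minimal slice of version-control metadata (illustrative fields only)."""
    author: str         # Who made the change
    message: str        # What the change does
    date: str           # When it was committed
    files: List[str]    # Where in the design it applies
    issue_summary: str  # Why it was needed (e.g., linked bug report)
    diff: str           # How the fix was implemented


def commit_to_5w1h_prompt(c: CommitRecord) -> str:
    """Render a commit as a 5W1H-structured hardware-debugging prompt."""
    return (
        f"Who:   {c.author}\n"
        f"What:  {c.message}\n"
        f"When:  {c.date}\n"
        f"Where: {', '.join(c.files)}\n"
        f"Why:   {c.issue_summary}\n"
        f"How:\n{c.diff}\n"
        "Task: explain the hardware bug this change fixes and why the fix works."
    )


if __name__ == "__main__":
    # Hypothetical example; the repository, issue number, and diff are made up.
    example = CommitRecord(
        author="maintainer",
        message="Fix FIFO overflow when write enable is asserted during reset",
        date="2024-03-01",
        files=["rtl/fifo.v"],
        issue_summary="Issue report: data corruption after back-to-back resets",
        diff="- if (wr_en)\n+ if (wr_en && !rst)",
    )
    print(commit_to_5w1h_prompt(example))
```

A generator along these lines can be run over every commit in a project's history, which is one plausible reading of how dataset volume scales linearly with available version control data rather than with expert labor.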
Publication Title
IEEE Transactions on Circuits and Systems I: Regular Papers
Recommended Citation
Fu, W., Li, S., Zhao, Y., Yang, K., Zhang, X., Jin, Y., & Guo, X. (2024). A Generalized Hardware Debugging Approach for Large Language Models with Semi-Synthetic Datasets. IEEE Transactions on Circuits and Systems I: Regular Papers. http://doi.org/10.1109/TCSI.2024.3487486
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/1315