arxiv:2306.06755

Attention, Compilation, and Solver-based Symbolic Analysis are All You Need

Published on Jun 11, 2023

Authors:

Aryan Mahajan

Abstract

In this paper, we present a Java-to-Python (J2P) and Python-to-Java (P2J) back-to-back code translation method, and an associated tool called CoTran, based on large language models (LLMs). Our method leverages the attention mechanism of LLMs, compilation, and symbolic execution-based test generation for equivalence testing between the input and output programs. More precisely, we modify the typical LLM training loop to incorporate compiler and symbolic execution loss. Via extensive experiments comparing CoTran with 10 other transpilers and LLM-based translation tools over a benchmark of more than 57,000 Java-Python equivalent pairs, we show that CoTran outperforms them on relevant metrics such as compilation accuracy and runtime equivalence accuracy. For example, our tool achieves 97.43% compilation accuracy and 49.66% runtime equivalence accuracy for J2P translation, whereas the nearest competing tool achieves only 96.44% and 6.8%, respectively.
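
The abstract does not define its two headline metrics, so the following is only a minimal sketch of how such metrics might be computed for translated Python programs, not CoTran's actual evaluation code. It assumes that "compilation accuracy" is the percentage of outputs that compile and that "runtime equivalence" means matching a reference on every test input. The helper names (`compiles`, `compilation_accuracy`, `runtime_equivalent`) are hypothetical, and Python's built-in `compile()` stands in for a real compiler check.

```python
from typing import Any, Callable, Iterable, List


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python.

    Assumption: the built-in compile() is used as a stand-in for
    whatever compiler check a real evaluation pipeline would run.
    """
    try:
        compile(source, "<translated>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def compilation_accuracy(programs: List[str]) -> float:
    """Percentage of programs that pass the compile check."""
    if not programs:
        return 0.0
    return 100.0 * sum(compiles(p) for p in programs) / len(programs)


def runtime_equivalent(
    reference: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> bool:
    """True only if `candidate` matches `reference` on every test input.

    Assumption: equivalence is approximated by agreement on a finite
    set of inputs; a candidate that raises is counted as not equivalent.
    """
    for x in inputs:
        try:
            if candidate(x) != reference(x):
                return False
        except Exception:
            return False
    return True


if __name__ == "__main__":
    outputs = ["x = 1\nprint(x)", "def f(:\n    pass", "for i in range(3): pass"]
    print(round(compilation_accuracy(outputs), 2))
    print(runtime_equivalent(lambda n: n * 2, lambda n: n + n, [0, 1, 2]))
```

In this sketch the test inputs are supplied by hand; in the method described above they would come from symbolic execution-based test generation.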

