
A Cobol to Python Converter with AutoGen Agents

In this story, we are going to explore a Cobol to Python converter written with Microsoft’s AutoGen framework. AutoGen is a Python-based framework that lets you orchestrate multiple types of agents using different conversation patterns.

About AutoGen

AutoGen has the following basic agents:

  • ConversableAgent: the base class for all other AutoGen agents. It contains the base functionality to send messages to and receive messages from other agents, and to initiate or continue a conversation.

  • UserProxyAgent: a proxy agent for humans. By default it solicits human input as the agent’s reply at each interaction turn, and it can also execute code and call functions. We use this type of agent in our converter to start and terminate the conversation or to execute some code.

  • AssistantAgent: the agent which interacts with the LLM and typically generates text. It neither executes code nor interacts with the user. We use this agent to generate Python code, unit tests and code reviews.

You can also create a GroupChat, which lets you manage a group of AssistantAgents that interact with each other.

Finally, AutoGen also allows you to create your own custom agents, but we have not used this feature in this converter.

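The conversation patterns can be:

  • two-way chat between a user proxy agent and an assistant agent

  • group chat involving a user proxy agent and a group of assistant agents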

  • multi-group chat involving multiple groups of agents which interact with each other

For more details, please visit the AutoGen web page: http://microsoft.github.io/autogen/docs/Use-Cases/agent_chat 
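The two-way chat pattern can be illustrated without AutoGen at all. The sketch below is a plain-Python model and not AutoGen’s API: `ToyAgent`, `two_way_chat` and the hard-coded reply functions are hypothetical names, and a real assistant agent would call an LLM where the lambdas are. It only shows the turn-taking and the stop condition on a TERMINATE reply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class ToyAgent:
    """Stand-in for an agent: a name plus a function mapping the history to a reply."""

    name: str
    reply_fn: Callable[[List[Message]], str]


def two_way_chat(
    initiator: ToyAgent, responder: ToyAgent, opening_message: str, max_turns: int = 10
) -> List[Message]:
    """Alternate replies between two agents until one says TERMINATE or turns run out."""
    history: List[Message] = [{"sender": initiator.name, "content": opening_message}]
    speakers = [responder, initiator]  # the responder answers the opening message first
    for turn in range(max_turns):
        speaker = speakers[turn % 2]
        reply = speaker.reply_fn(history)
        history.append({"sender": speaker.name, "content": reply})
        if "TERMINATE" in reply:
            break
    return history


# Hard-coded replies stand in for LLM calls (assumption for illustration only).
coder = ToyAgent("coder", lambda history: "def main():\n    print('hello')")
user_proxy = ToyAgent(
    "user_proxy",
    lambda history: "TERMINATE"
    if "def main" in history[-1]["content"]
    else "Please provide a main method.",
)

chat = two_way_chat(user_proxy, coder, "Convert this Cobol program to Python.")
for message in chat:
    print(f"{message['sender']}: {message['content']}")
```

A group chat follows the same idea, except that a manager picks the next speaker from more than two agents.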

What is the goal of the Cobol Converter?

The goal of this command line tool is to read Cobol files and convert them into Python code with documentation and unit tests. Its secondary goal is to convert the Python code into REST based applications using FastAPI.

High-level workflow

How does the Cobol Converter work?

The Cobol converter combines two types of tools: AutoGen agents backed by an LLM (gpt-4-1106-preview), and conventional tools used for checking and formatting code.

There are two ensembles (teams) of AutoGen agents at work in this application:

  • Cobol conversion team

  • REST conversion team

Agents Teams

Cobol Conversion Team

The Cobol conversion team converts Cobol to Python with documentation and unit tests. It has three agents apart from the user proxy agent which receives the user input:

  • The Cobol conversion agent: responsible for converting the Cobol code using an LLM

  • The unit test agent: generates the unit tests from the Python code using an LLM

  • The code reviewer: reviews the converted Python code and also the unit tests

REST Conversion Team

The REST conversion team takes the converted Python code as input and converts it into a REST interface. If the application is some kind of command line application, it is turned into a REST-based application using FastAPI.

Full workflow of the Cobol Agent

The Cobol to Python converter workflow consists of a loop which processes each single Cobol file and uses the two agent ensembles and the traditional tools (Black and Pylint).

Below is an annotated version of this workflow:

Full conversion workflow

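The Cobol converter workflow has the following main stages:

  1. An initial loop processes each Cobol file in a directory.

  2. This is the task block of the Cobol conversion team. In this block of tasks the Cobol code is converted, unit tests are created and the code is reviewed. When the Cobol conversion team stops, all the relevant Python code and text blocks are extracted from the agents’ messages.

  3. In this task block, the code review is written to disk. The Python code is also formatted and written to disk. The code is then analysed with Pylint and the result of the analysis is written to disk. The unit tests are also executed and their output is saved in a file.

  4. This is the task block of the REST conversion team. It has two agents which interact with each other: the REST code converter and the code reviewer. After they have generated the code, the Python code is extracted along with the code review.

  5. In this block, the code review is written to disk and the REST interface is actually executed in a process, to see if the code compiles and also runs. The process is then shut down, and the code is formatted and written to disk. Pylint is then used to analyse the code and this analysis is also written to disk.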

The converter finishes after all Cobol files have been processed.
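Stage 5 starts the generated REST code in a process to check that it runs, then shuts the process down. The following is a minimal sketch of such a check using only the standard library. It is an assumption about how this could be done, not the converter’s actual code: `smoke_run`, the grace period and the two throwaway scripts are hypothetical, and a sleeping script stands in for a real FastAPI server.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def smoke_run(script: Path, grace_seconds: float = 3.0) -> Tuple[bool, str]:
    """Start a script and report whether it is still alive (or exited cleanly)
    after a grace period. Returns (ok, captured output)."""
    process = subprocess.Popen(
        [sys.executable, str(script)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        # A script with a syntax or import error exits almost immediately.
        process.wait(timeout=grace_seconds)
        ok = process.returncode == 0
    except subprocess.TimeoutExpired:
        # Still running after the grace period: treat it as a started server.
        ok = True
        process.terminate()
    try:
        output, _ = process.communicate(timeout=5)
    except subprocess.TimeoutExpired:
        process.kill()  # the process ignored the polite request to stop
        output, _ = process.communicate()
    return ok, output


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a long-running REST server (assumption for illustration).
    server = Path(tmp) / "fake_server.py"
    server.write_text("import time\ntime.sleep(30)\n", encoding="utf-8")
    # Stand-in for generated code that does not even compile.
    broken = Path(tmp) / "broken.py"
    broken.write_text("def broken(:\n    pass\n", encoding="utf-8")

    server_ok, _ = smoke_run(server, grace_seconds=0.5)
    broken_ok, broken_output = smoke_run(broken)
    print("server ok:", server_ok)
    print("broken ok:", broken_ok)
```

A check like this only tells you that the process starts. It says nothing about whether the endpoints behave like the original Cobol program.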

The Cobol Converter Output

For a small Cobol file like this one, you should get an output like this one:

Conversion output example

The files are:

  • rest_critique_write_student_2.txt: the code review of the REST implementation

  • rest_write_student_2.py: the REST-based implementation

  • rest_write_student_2.py_lint: the static code analysis result for the REST implementation

  • test_write_student_2.py: the unit tests for the converted file

  • test_write_student_2.py_lint: the static code analysis report for the unit tests of the converted file

  • test_write_student_2.py_test_output: the execution log of test_write_student_2.py

  • write_student_2.py: the converted Cobol file

  • write_student_2.py_lint: the static code analysis of write_student_2.py
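A log such as the `_test_output` file in the list above could be produced along the following lines. This is a sketch under assumptions rather than the converter’s actual code: `run_unit_tests` is a hypothetical helper, and it relies on the generated test file having a main method that runs the tests. The tiny test file written in the demo is also made up.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_unit_tests(test_file: Path, timeout_seconds: int = 120) -> Path:
    """Run a test file in a subprocess and save its output next to it."""
    result = subprocess.run(
        [sys.executable, test_file.name],
        cwd=test_file.parent,  # so the tests can import the module under test
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    # unittest reports on stderr, so both streams go into the log.
    log_file = test_file.with_name(test_file.name + "_test_output")
    log_file.write_text(result.stdout + result.stderr, encoding="utf-8")
    return log_file


DEMO_TEST = '''import unittest


class TestDemo(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


if __name__ == "__main__":
    unittest.main()
'''

with tempfile.TemporaryDirectory() as tmp:
    test_path = Path(tmp) / "test_demo.py"
    test_path.write_text(DEMO_TEST, encoding="utf-8")
    log_path = run_unit_tests(test_path)
    log_name = log_path.name
    log_text = log_path.read_text(encoding="utf-8")
    print(log_text)
```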

If you are interested in the converted files, please check this Google Drive link: http://drive.google.com/drive/u/2/folders/1F7dqo5F2_zDzD8GcLlQFj70ZLdox5SL9 

Implementation

The whole code of this command line tool can be found in this repository: http://github.com/onepointconsulting/cobol-converter 

Installation, Configuration, Running

The Cobol code converter is a Python 3.11 application which requires Conda to be installed.

The installation instructions can be found in the README of this project.

The project’s configuration depends on a .env file, similar to the .env_local file that you can find in this project.

The Cobol files are read from a directory which should be under the project root folder. This directory is referenced by the SOURCE_CODE_DIR environment variable.

The main entry point of the application is this file: http://github.com/onepointconsulting/cobol-converter/blob/main/cobol_converter/cobol_converter_main.py

This main entry point accepts three arguments that determine how the output files are written: overwrite (overwrites the output files), clear (clears the output files) and only_new (only writes files that have not been translated yet).
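To make the three modes more concrete, here is a minimal sketch of how the files to convert could be selected. Everything beyond the mode names and SOURCE_CODE_DIR is an assumption: `select_files` and `source_dir_from_env` are hypothetical helpers, the `.cbl` and `.cob` suffixes are a guess at which files count as Cobol, and "clear" is read here as "delete the old output, then convert everything".

```python
import os
import tempfile
from pathlib import Path
from typing import List

COBOL_SUFFIXES = {".cbl", ".cob"}  # assumption: suffixes treated as Cobol sources


def source_dir_from_env() -> Path:
    """Read the Cobol source directory from the SOURCE_CODE_DIR variable."""
    return Path(os.environ["SOURCE_CODE_DIR"])


def select_files(source_dir: Path, target_dir: Path, mode: str) -> List[Path]:
    """Return the Cobol files that should be converted for the given mode."""
    cobol_files = sorted(
        p for p in source_dir.iterdir() if p.suffix.lower() in COBOL_SUFFIXES
    )
    if mode == "overwrite":
        return cobol_files
    if mode == "clear":
        # Assumed meaning: remove previous output files before converting.
        for old_file in target_dir.iterdir():
            if old_file.is_file():
                old_file.unlink()
        return cobol_files
    if mode == "only_new":
        # Skip files whose converted Python file already exists.
        return [p for p in cobol_files if not (target_dir / f"{p.stem}.py").exists()]
    raise ValueError(f"Unknown mode: {mode}")


with tempfile.TemporaryDirectory() as tmp:
    os.environ["SOURCE_CODE_DIR"] = str(Path(tmp) / "cobol")
    source, target = source_dir_from_env(), Path(tmp) / "out"
    source.mkdir()
    target.mkdir()
    for name in ("a.cbl", "b.cob", "notes.txt"):
        (source / name).write_text("", encoding="utf-8")
    (target / "a.py").write_text("", encoding="utf-8")  # a.cbl was converted before

    only_new = [p.name for p in select_files(source, target, "only_new")]
    overwrite = [p.name for p in select_files(source, target, "overwrite")]
    cleared = [p.name for p in select_files(source, target, "clear")]
    leftovers = list(target.iterdir())
    print(only_new, overwrite, cleared, leftovers)
```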

Prompts

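We have separated the agent prompts from the code. All prompts for the agents and the user proxy are in this TOML file: http://github.com/onepointconsulting/cobol-converter/blob/main/prompts.toml 

Here are some of the prompts used by the Cobol conversion team: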

[agents]

   [agents.python_coder]

   system_message = """You are a helpful AI assistant.

You convert Cobol code into Python code. Please do not provide unit tests. Instead, provide a main method to run the application.

Also do not omit any code for brevity. We want to see the whole code."""

   [agents.python_unit_tester]

   system_message = """You are a helpful AI assistant.

You create unit tests based on the unit test library for Python code in the conversation.

Please copy the original Python code that you are testing to your response.

Please make sure to import the unit test library. Provide a main method to run the tests."""

   [agents.code_critic]

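   system_message = """Critic. You are a helpful assistant highly skilled in evaluating the quality of a given code by providing a score from 1 (bad) - 10 (good) while providing clear rationale. You must consider coding best practices for each evaluation. Specifically, you can carefully evaluate the code across the following dimensions

- bugs (bugs): are there bugs, logic errors, syntax errors or typos? Are there any reasons why the code may fail to compile? How should it be fixed? If any bug exists, the bug score must be less than 5.

- Goal compliance (compliance): how well was the Cobol code converted?

- Data encoding (encoding): how good are the unit tests that you can find?


You must provide a score for each of the above dimensions.

{bugs: 0, transformation: 0, compliance: 0, type: 0, encoding: 0, aesthetics: 0}

Do not suggest code.

Finally, based on the critique above, suggest a concrete list of actions that the coder should take to improve the code.

If the unit tests are already available and look ok, reply with TERMINATE"""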

Agent Teams Setup

The agent teams are set up in these two files:

The Cobol conversion team is set up in this file: http://github.com/onepointconsulting/cobol-converter/blob/main/cobol_converter/service/agent_setup.py 

The REST conversion team is set up in this file: http://github.com/onepointconsulting/cobol-converter/blob/main/cobol_converter/service/agent_rest_setup.py 

Main Workflow Implementation

The main implementation of the workflow can be found in this file: http://github.com/onepointconsulting/cobol-converter/blob/main/cobol_converter/service/cobol_conversion_service.py 

The link below points to the method which processes every Cobol file and performs the Cobol to Python conversion, unit test generation, formatting, static code analysis and REST interface creation:

http://github.com/onepointconsulting/cobol-converter/blob/main/cobol_converter/service/cobol_conversion_service.py#L94 
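One step that such a method needs is pulling the Python code out of the agents’ chat replies, since LLM answers usually wrap code in Markdown fences. A minimal sketch of that extraction follows. It is an illustration and not the repository’s implementation: `extract_code_blocks` is a hypothetical helper and the sample reply is made up.

```python
import re
from typing import List

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Matches: opening fence, optional language tag, newline, body, closing fence.
CODE_BLOCK = re.compile(FENCE + r"(\w*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(message: str, language: str = "python") -> List[str]:
    """Return the bodies of all fenced blocks tagged with the given language."""
    return [
        body.strip()
        for tag, body in CODE_BLOCK.findall(message)
        if tag.lower() == language
    ]


# Made-up agent reply containing one Python block and one plain text block.
reply = (
    "Here is the converted program:\n"
    + FENCE + "python\ndef main():\n    print('hi')\n" + FENCE
    + "\nAnd a note:\n"
    + FENCE + "text\nnot code\n" + FENCE
)

blocks = extract_code_blocks(reply)
print(blocks[0])
```

Filtering on the language tag keeps review text and logs out of the files that are later formatted and linted.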

For more details about the code, please check the code repository.

Takeaways

The Cobol Converter is a first step that can convert Cobol to Python in a short period of time. Smaller, simpler programs can be translated correctly and can even be executed (e.g. the write_student_2 example). However, more complicated programs (e.g. the tic tac toe game) were not functionally equivalent when translated to Python (i.e. you could start the program, but the game was not playable), even though the output was syntactically correct.

The LLM we used was gpt-4-1106-preview. We have not used any other models. There might be other models which do a better job at converting Cobol to Python.

Ways to improve the conversion

We can think of a couple of ways to improve the conversion:

  • A fine-tuned LLM specialised in converting Cobol dialects. This would be extremely important to have, especially if you intend to do many conversions.

  • Building human feedback into the script generation process. The human feedback team should consist of at least one developer who understands Cobol and the target language well. No matter how good LLMs are these days, they might still generate code in unexpected ways or in ways that are not aligned with your business goals. It would be very valuable if human developer feedback could be fed into the code generation process.

If you opt for fine-tuning a model, you will be able to refine it over time by using the result of every successful conversion as new fine-tuning data.

Gil Fernandes, Onepoint Consulting
