metadata
title: ReSym Space
emoji: 🐢
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: false

ReSym Space

This is a space for testing the models from the ReSym artifacts. Sadly, at the time of writing, not all of ReSym is publicly available; in particular, the Prolog component has not been released.

This space simply runs inference with the two pretrained models available as part of the ReSym artifacts. It takes a variable name and some decompiled code as input, and outputs the variable type and other information.

The examples are randomly selected from vardecoder_test.jsonl. As a result, the fields do not always parse correctly.
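Random selection from a JSON Lines file can be sketched as follows. This is an illustration, not necessarily the code in app.py: the function name `sample_examples` and the fixed seed are assumptions, and each record is treated as an opaque JSON object because the field layout of vardecoder_test.jsonl is not described here.

```python
import json
import random


def sample_examples(path, k, seed=0):
    """Return up to k randomly chosen records from a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        # One JSON object per non-empty line.
        records = [json.loads(line) for line in f if line.strip()]
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    return rng.sample(records, min(k, len(records)))
```

Because the records are picked without any filtering, an example whose fields do not parse cleanly is just as likely to be selected as one that does.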

Disclaimer

I'm not a ReSym developer and I may have messed something up. In particular, you must list the variable names from the decompiled code as part of the prompt, and I reused some of their own code to do this.

Known Issues / Oddities

sub_40FD86

We do not get the same results for sub_40FD86. In fact, we don't create the same prompt. The prompt in vardecoder_test.jsonl is:

What are the original name and data type of variables `v3`, `v4`, `v5`?

It's unclear why a1, a2, and result are not listed.
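To make the mismatch concrete, here is a minimal sketch of how the question line could be assembled from a list of variable names. The helper `build_prompt` is hypothetical and is not ReSym's code; only the question line is shown, and the order of the names in the second call is an assumption.

```python
def build_prompt(variable_names):
    """Format the question in the style of the vardecoder_test.jsonl prompt."""
    quoted = ", ".join(f"`{name}`" for name in variable_names)
    return f"What are the original name and data type of variables {quoted}?"


# Question as it appears in vardecoder_test.jsonl for sub_40FD86.
dataset_question = build_prompt(["v3", "v4", "v5"])

# Question if a1, a2, and result were listed too (order assumed).
full_question = build_prompt(["a1", "a2", "result", "v3", "v4", "v5"])
```

The two questions differ only in the variable list, so any difference in model output for this function may come from the prompt rather than from the model.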

first_token weirdness

The example inference scripts take the first token of the ground-truth output and include it in the prompt. Technically this is data leakage, but since the first token is usually already part of the prompt (a variable name or field expression) it's probably OK? But it's also pretty weird.
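The idea can be sketched roughly as below. This is not the ReSym inference script: `add_first_token` is a hypothetical helper, and splitting on whitespace is an assumed stand-in for whatever tokenization the real scripts use.

```python
def add_first_token(prompt, expected_output):
    """Append the first whitespace-delimited token of the expected output to the prompt.

    Whitespace splitting is an assumption made for illustration only.
    """
    tokens = expected_output.split()
    first_token = tokens[0] if tokens else ""
    return prompt + first_token, first_token
```

The leak is small when the first token merely repeats a variable name that the prompt already contains, but the model still sees a piece of the answer it is supposed to produce.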

Indentation

Some decompilations in the dataset have whitespace for indentation included, and some do not.

field_access_driver clang parser

ReSym uses a clang-based parsing tool to extract field accesses. The tool still outputs the field accesses even if the code does not parse correctly. This seems to be by design, so I am doing this too. Otherwise, most of the ReSym examples would not work, because external functions and variables are not properly declared.

Another oddity is that sometimes the field access driver will output a field access expression of "". This appears to be a bug in the field access driver.
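One possible workaround, which is not necessarily what this space does, is to drop empty expressions before building the prompt. The helper name `drop_empty_accesses` is hypothetical, and the input is assumed to be a plain list of expression strings.

```python
def drop_empty_accesses(expressions):
    """Remove empty or whitespace-only field access expressions."""
    return [expr for expr in expressions if expr.strip()]
```

Filtering like this hides the symptom only; the empty expression may still mean that the driver missed a real field access.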

Other

  • ReSym's parser fails for functions whose names are not auto-generated (presumably anything that does not look like sub_40FD86)

Todo

  • Test field decoding more