---
title: ReSym Space
emoji: 🐢
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: false
---
# ReSym Space
This is a space for testing the models from the [ReSym
artifacts](https://github.com/lt-asset/resym). Sadly, at the time I am writing
this, not all of ReSym is publicly available; specifically, the Prolog component
is [not available](https://github.com/lt-asset/resym/issues/2).
This space simply performs inference on the two pretrained models available as
part of the ReSym artifacts. It takes a variable name and some decompiled code
as input, and outputs the variable type and other information.
The examples are randomly selected from `vardecoder_test.jsonl`. As a result, the fields do not always parse correctly.
## Disclaimer
I'm not a ReSym developer, and I may have messed something up. In particular,
the prompt must list the variable names that appear in the decompiled code,
and I reused some of ReSym's own code to do this.
## Known Issues / Oddities
### sub_40FD86
We do not get the same results for `sub_40FD86`. In fact, we don't even create the same prompt. The prompt in `vardecoder_test.jsonl` is:

> What are the original name and data type of variables `v3`, `v4`, `v5`?

It's unclear why `a1`, `a2`, and `result` are not listed.
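
To make the mismatch concrete, here is a minimal sketch of how the question line can be assembled from a list of variable names. The helper name `build_var_question` is hypothetical and this is not ReSym's actual code; the wording of the question is copied from the dataset prompt quoted above, and the ordering of the variables in the second call is an assumption.

```python
def build_var_question(var_names):
    """Format a question in the style of the vardecoder_test.jsonl prompt.

    Hypothetical helper for illustration; not taken from ReSym.
    """
    quoted = ", ".join(f"`{name}`" for name in var_names)
    return f"What are the original name and data type of variables {quoted}?"


# The prompt as it appears in the dataset (locals only).
dataset_prompt = build_var_question(["v3", "v4", "v5"])

# A prompt that also lists the arguments and `result`; the ordering
# here is an assumption.
full_prompt = build_var_question(["a1", "a2", "result", "v3", "v4", "v5"])

print(dataset_prompt)
print(full_prompt)
```

Any difference in which variables are listed changes the prompt text, so the model output is not expected to match the dataset's.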
### `first_token` weirdness
The [example
inference](https://github.com/lt-asset/resym/blob/main/training_src/fielddecoder_inf.py)
scripts get the first token of the output and include it in the prompt.
Technically this is data leakage, but since the first token is usually part of
the prompt (a variable name or field expression) it's probably OK? But it's
also pretty weird.
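
The idea can be sketched roughly as follows. This is an illustration, not a copy of the ReSym script: the function name `add_first_token`, the `:` separator, and the sample strings are all assumptions made for the example.

```python
def add_first_token(prompt, expected_output, sep=":"):
    """Append the leading token of the expected output to the prompt.

    Hypothetical sketch: assumes the expected output has the form
    "<expression><sep> <prediction>". The real script may tokenize
    differently.
    """
    first_token = expected_output.split(sep, 1)[0]
    return prompt + first_token + sep


# Made-up example strings, only to show the shape of the result.
prompt = "What are the variable name and type for the following?\n"
expected = "v3: some_name, int"
print(add_first_token(prompt, expected))
```

The model is then asked to continue from a prompt that already contains the start of the answer, which is why this looks like leakage even when that token is just a name already present in the prompt.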
### Indentation
Some decompilations in the dataset have whitespace for indentation included, and
some do not.
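
If you want to compare prompts across the two styles, one option is to strip leading whitespace from every line before comparing. The sketch below is a hypothetical helper; nothing here indicates that ReSym or this space normalizes indentation this way.

```python
def strip_indentation(code: str) -> str:
    """Remove leading whitespace from each line of decompiled code.

    Hypothetical normalization helper for comparison purposes only.
    """
    return "\n".join(line.lstrip() for line in code.splitlines())


indented = "int f()\n{\n  return 0;\n}"
print(strip_indentation(indented))
```

Since the models were presumably trained on a mix of both styles, normalizing the input may or may not change the predictions.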
### `field_access_driver` clang parser
ReSym uses a clang-based parsing tool to extract field accesses. The tool still
outputs the field accesses even if the code does not parse correctly. This
seems to be by design, so I am doing this too. Otherwise, most of the ReSym
examples do not work, because external functions and variables are not properly
declared.
Another oddity is that sometimes the field access driver will output a field
access expression of `""`. This appears to be a bug in the field access driver.
### Other
* ReSym's parser fails for functions with a non-automatic name (that is, anything other than a decompiler-generated name such as `sub_40FD86`).
## Todo
* Test field decoding more