Challenge: optimus-prime Category: Reverse Engineering Author: mitza (Tudor Mihai Alexandru)

optimus-prime - full reverse solve writeup

This challenge hint was: "It is rumored that a quantum computer could solve this in under one second." Real meaning: the binary computes the flag character by character with intentionally expensive math and recursion, so direct execution is slow. The practical solve is to reverse the algorithm and reimplement it with memoization.

Goal format: DCTF{[0-9a-f]{64}}. Binary provided: optimus-prime.

Quick Recon
Protection and Anti-Analysis Checks
Finding the Real Decrypt Loop
Extracting Static Tables
Reconstructing Core Functions
Per-Character Formula
Building the Offline Solver
Flag
Takeaways

1. Quick Recon

First triage:

ls -la
file ./optimus-prime
checksec --file=./optimus-prime
strings -n 4 ./optimus-prime | head -n 80

Important findings:

64-bit stripped PIE ELF, dynamically linked.
NX enabled, Full RELRO, no canary.
Interesting strings: /proc/self/status, TracerPid:, Computing flag..., Decrypted flag: %s.

Running the binary prints Computing flag... and starts outputting flag characters immediately, but very slowly. So this is not a strict checker; it is a slow flag generator.

2. Protection and Anti-Analysis Checks

From disassembly, main calls four functions in order:

fcn.00001390
fcn.00001430
fcn.00001280
fcn.00001f50

Reverse summary:

fcn.00001390: reads /proc/self/status, parses TracerPid:, exits if non-zero. Basic anti-debug.
fcn.00001430: measures runtime of a fixed arithmetic loop and exits if slower than 0xC350 microseconds (50 ms). Anti-instrumentation/anti-emulation timing check.
fcn.00001280: extra arithmetic warmup/obfuscation loop (100 rounds, modulo-like transformations).
fcn.00001f50: actual flag generation and printing loop.

None of these checks block a pure static + offline reimplementation strategy.

3. Finding the Real Decrypt Loop

Function 0x1f50 prints one character per iteration and flushes stdout each time. Loop counter is r15, from 0 to 0x45, so output length is 70 bytes.

Per iteration:

Load one 32-bit value from table at .rodata + 0x3100.
Compute a 64-bit state via recursive helper fcn.00001c90 (for these inputs, always taken).
Fold bytes and apply small algebraic transform to produce one byte.
XOR with byte from key table at .rodata + 0x30a0.
Print resulting character.

The control flow has branch noise that can output constants for tiny inputs, but table inputs are all > 2, so only the expensive recursive path matters.

4. Extracting Static Tables

Dumping .rodata gave all required constants:

objdump -s -j .rodata ./optimus-prime

Relevant regions:

0x3100..: 70 dwords: 0x0f, 0x17, 0x1f, ... i.e. n[i] = 0x0f + 8*i
0x30a0..: 70-byte XOR key stream
0x3060..: qword constants used by helper mixers

Extracted key bytes:

4c9e068a51f18a75a89368ba7f78f00814ad4c42a978d4d59859273b9737
ffe82b521d5c8c4902f06cf52f95c5111cce427f705b4afd60c0f4076180
e687a4913b8cbac74df8

5. Reconstructing Core Functions

The heavy part is a recursive chain:

f1c90(n)  [at 0x1c90]
  ├─ uses f1530(x, m) [0x1530]
  ├─ uses f16e0(x, m) [0x16e0]
  └─ uses f1b30(x, m) [0x1b30]

All four functions are recursive and full of equivalent-but-obscure base cases. Manual execution is unrealistic. I converted each to Python using unsigned 64-bit arithmetic and memoization.

Important constants seen across functions:

0x9e3779b97f4a7c15
0x9e3779b97f4ac2fa
0xc6a4a7935bd1e995
0xdeadbeef
0xbeef
0xbabe
0x0123456789abcdef
0x0123456789ab7751

6. Per-Character Formula

Once a 64-bit value x = f1c90(n[i]) is available, one output byte is:

y = x ^ (x >> 8) ^ (x >> 16) ^ (x >> 24) ^ (x >> 32) ^ (x >> 40) ^ (x >> 48) ^ (x >> 56)
z = (y * 8) ^ (y << 7) ^ y
al = ((z & 0xffffffff) * 0x15) & 0xff
out[i] = al ^ key[i]

This is equivalent to a custom byte-mixing + keystream XOR decryption layer.

7. Building the Offline Solver

The following script mirrors the reversed logic (with memoization) and reconstructs all 70 characters instantly.

from functools import lru_cache

MASK = (1 << 64) - 1
C1 = 0x9e3779b97f4a7c15
C1_ALT = 0x9e3779b97f4ac2fa
C2 = 0xc6a4a7935bd1e995
D = 0xdeadbeef
BEEF = 0xbeef
BABE = 0xbabe
C3 = 0x0123456789abcdef
C3_ALT = 0x0123456789ab7751

def u64(x): return x & MASK
def rol(x, r): return u64((x << r) | (x >> (64 - r)))
def swap32(x): return ((x & 0xffffffff) << 32) | ((x >> 32) & 0xffffffff)

@lru_cache(maxsize=None)
def f1530(x, m):
    x = u64(x)
    if m <= 1: r12 = x
    elif m == 2: r12 = u64(C1 ^ x)
    elif m == 3: r12 = u64((C2 * x) ^ D)
    else: r12 = f1530(x, m - 1)

    xp1 = u64(x + 1)
    if m <= 2: r13 = xp1
    elif m == 3: r13 = u64(xp1 ^ C1)
    elif m == 4: r13 = u64((C2 * xp1) ^ D)
    else: r13 = f1530(xp1, m - 2)

    xb = u64(x ^ BEEF)
    if m <= 3: rdi = xb
    elif m == 4: rdi = u64(C1_ALT ^ x)
    elif m == 5: rdi = u64((C2 * xb) ^ D)
    else: rdi = f1530(xb, m - 3)

    return u64(r12 + r13 + rdi)

@lru_cache(maxsize=None)
def f1b30(x, m):
    x = u64(x)
    if m <= 1: a = x
    elif m == 2: a = u64(C3 ^ x)
    elif m == 3: a = rol(x, 7)
    else: a = f1b30(x, m - 1)

    xp = u64(x + 0xcafe)
    if m <= 2: b = xp
    elif m == 3: b = u64(xp ^ C3)
    elif m == 4: b = rol(xp, 7)
    else: b = f1b30(xp, m - 2)

    xb = u64(x ^ BABE)
    if m <= 3: c = xb
    elif m == 4: c = u64(C3_ALT ^ x)
    elif m == 5: c = rol(xb, 7)
    else: c = f1b30(xb, m - 3)

    w = u64(a ^ b) >> 13
    return u64(a + b + c + w + w)

def mix_final(rdi, r12, rbp):
    return u64(rdi + r12 + rbp + swap32(u64(r12 ^ rbp)))

@lru_cache(maxsize=None)
def f16e0(x, m):
    x = u64(x)
    xh = u64(x ^ 0x1337)

    if m == 3:
        A = u64(xh * C1)
        B = u64((C2 * x) ^ D)
        return mix_final(u64(x + 3), A, B)

    if m == 4:
        A = u64(xh * C1)
        B = u64((C2 * x) ^ D)
        p = mix_final(u64(x + 3), A, B)
        return mix_final(u64((x + 4) * C1), u64((C2 * xh) ^ D), p)

    if m == 5:
        A = u64(xh * C1)
        B = u64((C2 * x) ^ D)
        p = mix_final(u64(x + 3), A, B)
        q = mix_final(u64((x + 4) * C1), u64((C2 * xh) ^ D), p)
        return mix_final(u64((C2 * u64(x + 5)) ^ D), f16e0(xh, 3), q)

    if m == 6:
        p = mix_final(u64((x + 4) * C1), u64((C2 * xh) ^ D), f16e0(x, 3))
        q = mix_final(u64((C2 * u64(x + 5)) ^ D), f16e0(xh, 3), p)
        return mix_final(f16e0(u64(x + 6), 3), f16e0(xh, 4), q)

    raise ValueError(m)

@lru_cache(maxsize=None)
def f1c90(n):
    if n <= 2:
        r12 = rbx = r13 = 1
    elif n == 3:
        r12, rbx, r13 = 5, 1, 1
    elif n == 4:
        r12, rbx, r13 = f1c90(3), 5, 1
    elif n == 5:
        r12, rbx, r13 = f1c90(4), f1c90(3), 5
    else:
        r12, rbx, r13 = f1c90(n - 1), f1c90(n - 2), f1c90(n - 3)

    var38 = f1530(n, (n % 5) + 3)
    var30 = f16e0(n, (n % 4) + 3)
    var28 = f1b30(n, (n % 3) + 3)
    acc = (var38 & 0xff) + (var30 & 0xff) + (var28 & 0xff)
    return u64(acc + r13 + r12 + rbx)

key = bytes.fromhex(
    "4c9e068a51f18a75a89368ba7f78f00814ad4c42a978d4d59859273b9737"
    "ffe82b521d5c8c4902f06cf52f95c5111cce427f705b4afd60c0f4076180"
    "e687a4913b8cbac74df8"
)

vals = [0x0f + 8 * i for i in range(70)]
out = []
for i, n in enumerate(vals):
    x = f1c90(n)
    y = x ^ (x >> 8) ^ (x >> 16) ^ (x >> 24) ^ (x >> 32) ^ (x >> 40) ^ (x >> 48) ^ (x >> 56)
    z = u64(u64(y * 8) ^ u64(y << 7) ^ y)
    al = ((z & 0xffffffff) * 0x15) & 0xff
    out.append(al ^ key[i])

print(bytes(out).decode())

Output:

[flag output intentionally omitted]

This matches required format exactly: DCTF{[0-9a-f]{64}}.

8. Flag

Final flag:
[withheld]

9. Takeaways

When a binary prints output but is extremely slow, treat it like an obfuscated generator and reimplement offline.
Anti-debug checks based on TracerPid and time thresholds are easy to bypass with static analysis.
Memoization is essential for recursive RE challenges; otherwise runtime explodes.
Always extract and trust static tables from .rodata first, then rebuild logic around them.

Writeup by mitza (Tudor Mihai Alexandru).
More challenge notes: Cyber-EDU profile