How to easily encrypt a file in Python

    The idea behind file encryption in Python

    In this article, we will write a simple Python script that uses XOR to encrypt a file.
    XOR is one of the simplest encoding methods malware developers use to obfuscate their code, thanks to two significant advantages:

    • simplicity (no need for an external encryption library, which would be easy to spot)
    • minimal code length

    Given that, XOR-encoded content can be hard for an analyst to spot, especially during a quick static analysis such as a string search, import table analysis or shellcode analysis.

    Prerequisites

    For this mini project, the only thing you need is a working installation of Python 3.

    What is XOR

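    XOR (exclusive or) is a logical operation that takes two operands and returns true when they differ and false when they are the same. Applied to integers with Python's ^ operator, it works bit by bit: each output bit is 1 only where the two input bits differ.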

    It has the property that XORing a value twice with the same key gives back the starting value, which makes it possible to use this operator for symmetric encryption: the same key both encrypts and decrypts.
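
    The round-trip property can be checked in a few lines. This is only a minimal sketch: the helper name xor_bytes and the sample values are made up for illustration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    return bytes(b ^ key for b in data)


plaintext = b"hello"
key = 65

encoded = xor_bytes(plaintext, key)   # first pass scrambles the bytes
decoded = xor_bytes(encoded, key)     # second pass with the same key undoes it

print(encoded)
print(decoded)
```

    The second call uses exactly the same function and key as the first, which is what makes the scheme symmetric.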

    The code

    We want to read a file and XOR it byte by byte with a number (the key, between 0 and 255) passed as input:

    import sys


    def xor_content(content, key: int):
        # XOR every byte of the input with the single-byte key
        return [x ^ key for x in content]

    if __name__ == "__main__":

        if len(sys.argv) < 4:
            print("USAGE:")
            print("python3 xor_file.py <KEY> <INPUT_FILENAME> <OUTPUT_FILENAME>")
            sys.exit(1)

        # Read the input as raw bytes, XOR it, and write the result
        with open(sys.argv[2], 'rb') as i_file:
            in_content = i_file.read()
        output = xor_content(in_content, int(sys.argv[1]))
        with open(sys.argv[3], 'wb') as o_file:
            o_file.write(bytearray(output))

    This code takes in a key and two file names as input. It then reads the first file, xors every byte with the key, and writes the result to the second file.

    We can launch it in this way:

    python3 xor_file.py 65 input.txt output.txt

    Running this XORs every byte of the file “input.txt” with 65 and writes the result to “output.txt”. Running the script again on “output.txt” with the same key restores the original content.
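
    One detail worth keeping in mind: a bytearray can only hold values from 0 to 255, so a key outside that range makes the write fail with a ValueError. A possible guard is sketched below; parse_key is a hypothetical helper, not part of the script above.

```python
def parse_key(text: str) -> int:
    """Convert a command-line argument to a single-byte XOR key."""
    key = int(text)
    if not 0 <= key <= 255:
        raise ValueError(f"key must be between 0 and 255, got {key}")
    return key


print(parse_key("65"))
try:
    parse_key("300")
except ValueError as err:
    print(f"rejected: {err}")
```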

    Improvements

    Encrypting a file in Python seems really easy, but we can do something more.

    A single-byte XOR encoding has a potential weakness: a null byte XORed with the key returns the key itself. Files usually contain a lot of null bytes, so the encoded file shows a bunch of bytes equal to the key, which gives it away.
    A related problem occurs with a byte equal to the key: it is encoded as a null byte.
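
    To see how much the null bytes reveal, here is a small sketch that encodes a made-up sample (a short header followed by null padding) and then counts the output bytes. The sample data and the key are assumptions chosen for illustration.

```python
from collections import Counter

key = 0x41
# Hypothetical sample: a few header bytes followed by 32 bytes of null padding
sample = b"MZ\x90\x00\x03" + bytes(32)

encoded = bytes(b ^ key for b in sample)

# Null bytes become the key, so the most common output byte gives it away
most_common_byte, count = Counter(encoded).most_common(1)[0]
print(hex(most_common_byte), count)
```

    An analyst does not need the original file for this: a simple byte frequency count on the encoded data is enough to suggest the key.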

    Malware authors mitigate this with a NULL-preserving single-byte XOR encoding scheme.
    It simply means leaving NULL bytes and bytes equal to the key unchanged, and XORing everything else.
    So let’s implement it by changing the xor_content function:

    def xor_content(content, key: int):
        # Copy NULL bytes and bytes equal to the key unchanged; XOR the rest
        return [x if x in (0, key) else x ^ key for x in content]
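
    A NULL-preserving encoder has to copy the skipped bytes through unchanged rather than drop them, otherwise the output gets shorter and can no longer be decoded. The sketch below uses a hypothetical function name and made-up sample bytes to check that the scheme keeps the length and is still its own inverse.

```python
def xor_null_preserving(data: bytes, key: int) -> bytes:
    """XOR each byte with key, leaving 0x00 and bytes equal to key untouched."""
    return bytes(b if b in (0, key) else b ^ key for b in data)


key = 0x41
sample = b"MZ\x00\x00A\x00!"

encoded = xor_null_preserving(sample, key)
decoded = xor_null_preserving(encoded, key)

print(encoded)
print(decoded == sample)
```

    Decoding works with the same function because a byte that is neither 0 nor the key can never turn into 0 or the key when XORed with the key.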

    Bruteforcing XOR encoding

    Our encoder works, but decoding only works if we already know the key, for example after finding it in the disassembled code.
    Now let’s write a tool that brute-forces the encoded file, trying every key until a specific sequence of bytes shows up.

    First of all, we need a function that checks whether sub is a contiguous subarray of arr.

    def is_subarray(sub, arr):
        if len(sub) > len(arr):
            return False
        for i in range(len(arr)-len(sub)+1):
            if arr[i:i+len(sub)] == sub:
                return True
        return False

    The main part of the script opens the file and reads its contents into a list of byte values.
    It then uses a for loop to try every key value from 1 to 127 (range(1, 128)); widening the range to 256 would cover every possible single-byte key.
    For each key, it XORs the file contents with the key and checks whether the result contains the byte sequence passed, in hex, as the second argument.
    If it does, it prints the key and exits the program;
    if no key matches, it prints “Key not found”.
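
    The loop described above can be sketched compactly on in-memory bytes, where Python's in operator does the subsequence check. The function name find_key and the sample data are assumptions for illustration; the key range 1 to 127 mirrors the description.

```python
def find_key(data: bytes, pattern: bytes):
    """Return the first key in 1..127 whose decoding contains pattern, else None."""
    for key in range(1, 128):
        # NULL-preserving decode: 0x00 and bytes equal to the key stay as they are
        decoded = bytes(b if b in (0, key) else b ^ key for b in data)
        if pattern in decoded:
            return key
    return None


# Hypothetical sample: an "MZ" header encoded with key 88
secret_key = 88
plain = b"MZ\x90\x00\x03\x00\x00\x00"
encoded = bytes(b if b in (0, secret_key) else b ^ secret_key for b in plain)

print(find_key(encoded, bytes.fromhex("4d5a")))
```

    With a short pattern, a wrong key can match by chance on larger files, so a longer known byte sequence gives a more reliable result.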

    Putting it all together

    Now let’s see the complete code for both scripts.

    # xor_file.py
    import sys


    def xor_content(content, key: int):
        # Copy NULL bytes and bytes equal to the key unchanged; XOR the rest
        return [x if x in (0, key) else x ^ key for x in content]

    if __name__ == "__main__":

        if len(sys.argv) < 4:
            print("USAGE:")
            print("python3 xor_file.py <KEY> <INPUT_FILENAME> <OUTPUT_FILENAME>")
            sys.exit(1)

        # Read the input as raw bytes, XOR it, and write the result
        with open(sys.argv[2], 'rb') as i_file:
            in_content = i_file.read()
        output = xor_content(in_content, int(sys.argv[1]))
        with open(sys.argv[3], 'wb') as o_file:
            o_file.write(bytearray(output))

    # bruteforce.py
    import sys


    def xor_content(content, key: int):
        # Copy NULL bytes and bytes equal to the key unchanged; XOR the rest
        return [x if x in (0, key) else x ^ key for x in content]

    def is_subarray(sub, arr):
        # True if the list sub appears as a contiguous slice of the list arr
        if len(sub) > len(arr):
            return False
        for i in range(len(arr) - len(sub) + 1):
            if arr[i:i + len(sub)] == sub:
                return True
        return False

    if __name__ == "__main__":

        if len(sys.argv) < 3:
            print("USAGE:")
            print("python3 bruteforce.py <INPUT_FILENAME> <KEY_TO_SEARCH>")
            sys.exit(1)

        with open(sys.argv[1], 'rb') as i_file:
            in_content = list(i_file.read())
        # Byte sequence to look for, given on the command line as hex (e.g. 4d5a)
        pattern = list(bytearray.fromhex(sys.argv[2]))
        for k in range(1, 128):
            output = xor_content(in_content, k)
            if is_subarray(pattern, output):
                print(f"The key is {k}")
                sys.exit(0)
        print("Key not found")

    Conclusion

    At this point we have all that we need, and we can test it with these simple commands:

    python3 xor_file.py 88 in.txt out.txt 
    python3 bruteforce.py out.txt 4d5a
    
    

    If everything is right the output should be:

    The key is 88

    Note: “4d5a”, passed as input in our example, is the hexadecimal value of “MZ”, the first two bytes of a DOS MZ executable. The test therefore assumes that in.txt is such a file, or at least contains those two bytes.
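
    As a quick check of what that argument means, the snippet below decodes the hex string and shows what the same two bytes look like after a single-byte XOR with key 88 (the key used in the example commands). It is only an illustration of why a plain string search for “MZ” fails on the encoded file.

```python
pattern = bytes.fromhex("4d5a")
print(pattern)                      # the two signature bytes as ASCII

# Under single-byte XOR with key 88 the same signature is stored as other bytes
hidden = bytes(b ^ 88 for b in pattern)
print(hidden.hex())
```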

    This is a technique to hinder static analysis. If you want some simple ideas for doing the same against dynamic analysis, take a look at these posts: