Initially, I was thinking along the lines of traditional blocks on the disk, but I quickly realised that the block number 1073741825 was waaaaaay too large to be referring to the image itself.
I next tried my luck by running TSK tools (compiled with libewf) over the images.
First, I ran mmls to get the partition offsets:
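The mmls call would have looked something like this (mirroring the fls options used below; the 2048-sector offset used in the next step comes from its output):

mmls -i ewf /mnt/hgfs/Case2-HDFS/HDFS-Master.E01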
Then I used the offset of the main partition to run fls over the image:
fls -o 2048 -r -m / -i ewf /mnt/hgfs/Case2-HDFS/HDFS-Master.E01 | grep 1073741825
I grepped the fls output for the block number, just in case there was an inode or file name that would match. There were a few hits for an interesting file called blk_1073741825, which seemed to be related to Hadoop file blocks.
I jumped into an Autopsy case with the challenge images preloaded and looked into these files.
From the content of the file, it looked like the typical content of a Unix sources file. I tried sources and sources.list as answers, but they were both wrong…
I decided to take a few steps back and try to understand what I was looking at. After *lots* of googling about HDFS, I managed to conclude the following:
- The distributed file system allocates blocks when replicating the data across the nodes
- The blk_* files contain the raw bytes of the portion of the file (or, in this case, the whole file) being replicated
- There are transaction logs in edits_* files that contain the metadata of the HDFS replication operations. More info on this below:
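As a rough sketch of where these files live on a running node (assuming Hadoop's default data directories; the actual locations depend on the dfs.datanode.data.dir and dfs.namenode.name.dir settings):

find /tmp/hadoop-*/dfs/data -name 'blk_*'          # block files under the datanode's data directory
find /tmp/hadoop-*/dfs/name/current -name 'edits_*' # binary edit logs under the namenode's name directory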
These files are in a binary format, and although I actually solved the challenge at this point when I saw /text/AptSource, I looked a bit deeper to understand the format.
To parse the files, Hadoop has inbuilt functionality which is documented here.
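A minimal sketch of that functionality, assuming the Offline Edits Viewer is the tool in question (the edits file name here is illustrative): it converts a binary edits file into readable XML.

hdfs oev -p xml -i edits_0000000000000000001-0000000000000000042 -o edits.xml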
(I ran out of time but would have liked to test this out! Hopefully, some of the other write-ups explain this a bit better :P)
Answer: AptSource