
Please don’t use unknown online services for customer verification.

I didn’t really know how to title this post, as the above seems almost too obvious. Unfortunately, I’ve found more and more private sector companies using random online services for customer verification and ‘OSINT’ work. I was finally prompted to write this after seeing a company upload a customer’s passport to Forensically (which is a fantastic site) and then to a popular reverse image search site. I believe Forensically and the reverse image search site are both above board, but I don’t know that, and neither did the company when they uploaded the passport.

The issue isn’t just that this is happening. It’s also that the people I’ve spoken to don’t understand why it’s an issue, even in the wake of GDPR (which you’ll be painfully aware of if you’re in Europe). What’s most confusing is that those working in counter-fraud/customer verification know how attractive photos of passports and the like are to criminals, yet they don’t seem to make the connection between using random online services and the potential for criminality.

Since challenging a few companies, the two phrases I’ve heard on repeat are:
“We’ve checked the site’s code of conduct/details and it’s fine” or “We’ve checked the code and nothing gets ‘uploaded’ anywhere, it’s all on-page JavaScript etc.”

This might be true at 9am, but it might have changed by 10:30am – have you checked again? Each time you uploaded something? Really?

Nothing stops a random site on the internet from doing whatever it likes and changing what it does at will. What if your favourite reverse image search tool decided one day to start publishing all the pictures you’ve reverse searched? Would you know if it changed? Would your staff? What would you be able to do about it?

Similarly with tools that are all running ‘on page’ and not uploading the data to a server – would you know if that changed? It’s usually a single line of code to dump all of the results to a server somewhere.
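To make the “would you know if that changed?” question concrete, here is a minimal Python sketch of the least you would need to do just to notice a change. It assumes you record a SHA-256 fingerprint of each script an ‘on-page’ tool loads and compare against it before every use. The URL and script contents are hypothetical. It also shows the limits of this approach. A hash only tells you what the server sent *you* at the moment you checked, and a site can serve different code to different visitors or pull in scripts dynamically. It is no substitute for a proper agreement.

```python
import hashlib
import urllib.request
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a resource's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download a resource exactly as the server returns it to *this* client."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def changed_resources(baseline: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """URLs whose current content no longer matches the recorded fingerprint,
    including resources that have appeared or disappeared since the baseline."""
    changed = []
    for url in sorted(baseline.keys() | current.keys()):
        recorded = baseline.get(url)
        observed = fingerprint(current[url]) if url in current else None
        if recorded != observed:
            changed.append(url)
    return changed


# Hypothetical example using in-memory content. In practice you would build
# `current` with fetch() for every script the tool loads, before every upload.
TOOL_JS = "https://tool.example/app.js"
original = b"function analyse(img) { /* runs locally */ }"
baseline = {TOOL_JS: fingerprint(original)}

current = {TOOL_JS: b"function analyse(img) { /* now also sends img elsewhere */ }"}
print(changed_resources(baseline, current))
```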

Not to ramp up the scare tactics, but I’ve personally heard of a few people running sites with free services getting shady offers involving skimming data, and it only takes a few bad days to make those offers seem a lot more appealing. Obviously this can happen in any business context, but it’s far more likely if they’re not breaking any contracts.

How can I properly procure an online service?

The first step with using anything to process customer data is always to get straight on the phone to your legal department. There’s no substitute for proper legal advice, and with data being such a hot issue at the moment, it’s vital to stay on the right side of the law with it all.

Some more general guidelines, in case you don’t have a great legal department:
If you can’t get something that works completely offline (and make sure that it’s properly firewalled off), you need a proper, legally binding agreement. If you’re in Europe, you need a ‘data processing agreement’ (https://gdpr.eu/what-is-data-processing-agreement/) for each service you use to achieve basic compliance. Even if you’re not based in Europe, you’re still going to run into trouble if you don’t have a similar sort of agreement in place. The agreement doesn’t need to be a 300-page tome – I’ve seen perfectly acceptable agreements on two sides of A4.

When dealing with any supplier, make sure to ask what happens with the data – do they store it? How long for? Do they have external auditing procedures? This all needs to be in writing. Often suppliers will have a boilerplate contract which won’t go into much detail – ignore these (or fill them out if you must) and send over a list of all your questions and write your own contract up. My advice is to make the questions as direct as possible and with as little wiggle room as possible. Don’t allow for responses like ‘we keep your data for as long as is reasonable’ – you want it in a numerical format. If they can’t give you an exact timeframe, write your own clause in – ‘as soon as possible, but always within x weeks/months’. Ideally, they shouldn’t store anything at all, but this somehow seems impossible for most services.

You also need to make sure that you’re notified of any changes to their service. This can be a tricky one to negotiate, as most companies will only really want to notify you of big updates or changes. Don’t settle for this – you need to be informed of any change to the live code base, and to have a designated point of contact to talk these changes through. You also need to be able to leave the contract if any of the changes aren’t to your liking.

Final Notes

I know this all sounds like common sense, but somehow the risks get forgotten somewhere down the line. Over the last year or two, despite all the data leaks we’ve seen, some counter-fraud and verification teams have still been using online services on faith alone. Check what you’re using, how you’re using it and whether you have everything you need in place.

Social Engineering – basically just fraud.

I’ve been doing Open Source Intelligence (OSINT) gathering professionally for quite a while now – back from the days when it was just a GeoCities check to see if the subject had been ridiculous enough to create a page dedicated to their criminal enterprise (surprisingly/unsurprisingly, this was very common). However, I’ve only recently shuffled my way into the OSINT ‘scene’ which has popped up on Reddit/Twitter/forums. It’s great to see that there’s a network of people passionate about the subject area and there is a lot of great sharing and caring going on.

However, as with any community, a lot of buzzwords creep in, it starts to become a ‘club’, and newcomers flock in with wildly varying levels of experience. It’s great that things are opening up, as it all adds new perspectives, but since getting involved I’ve seen a lot of people posting methodologies and suggestions which are…well, pretty much just illegal. There’s no way to sugar-coat it, and it doesn’t matter what jurisdiction you’re in: some of what is being shared as ‘OSINT methodologies’ falls directly into harassment, stalking or fraud.

The main offender for this is ‘Social Engineering’.

Social Engineering – what it is.

The Wikipedia entry for Social Engineering (the closest we can get to a current consensus on the term) is:

Social engineering, in the context of information security, refers to psychological manipulation of people into performing actions or divulging confidential information. 


For anyone unsure of what Social Engineering is (which seems to be a lot of people), this clip from the 1995 film Hackers is the single best explanation:

As you can see, it’s basically lying to someone to get what you want via invoking the most holy trinity of the BLT. This is otherwise known as fraud.

Here are the UK and US definitions of fraud, in case anyone has forgotten:

Fraud Act 2006 (Section 2)

[A person commits fraud if they make…] a false representation, dishonestly, knowing that the representation was or might be untrue or misleading, with intent to make a gain for himself or another, to cause loss to another or to expose another to risk of loss.


US Code Title 18, Section 911 (it’s a bit trickier in the US, as there are many laws to choose from which cover it):

Whoever falsely and willfully represents himself to be a citizen of the United States shall be fined under this title or imprisoned not more than three years, or both.

(Many more under Chapter 47, Chapter 63 and stated cases)

Let’s take the UK definition and apply it to our man Crash Override. He makes a false representation outright by saying he’s “Mr Eddie Vedder from accounting”. He knows this is untrue and misleading, and he does it for gain (to get access to the modem number). It’s pretty straightforward.

As you can see, social engineering, as defined, is just fraud by another name. Now some might argue that ‘manipulation’ doesn’t have to involve fraud, but I honestly can’t think of a ‘manipulation’ which wouldn’t be fraudulent in some way under most legal systems.

That isn’t what *I* mean though.

In computer security discussions, the term ‘social engineering’ is well understood – it’s the phishing scams and ransomware attacks. However, this term, which most people seem to understand in the context of compsec, somehow gets distorted when we talk about OSINT. I’ve seen posts with things like ‘if that doesn’t work try a bit of social engineering to see if you can find out x’ or ‘I couldn’t find out anything online so I used social engineering to get what I needed’.

Now I don’t think that people are quite recommending ransomware or similar – it’s more likely one of the below:

  • The writer doesn’t really understand what ‘social engineering’ is and just uses it as a buzzword for anything from adding a subject as a friend on Facebook to holding their spouse hostage.
  • They’re using it to reference social media OSINT methodologies.
  • The writer doesn’t want to say ‘lie to them to get what you want’.

Now the first one is a distinct possibility – we’ve all heard countless people use phrases in that sort of clunky, ‘I’ve-only-heard-this-at-a-conference’ way and I feel that a lot of people are using the term Social Engineering to sound a bit more ‘exciting’. It certainly sounds cooler than ‘…and then I looked at his Facebook feed until my eyes turned into sandpaper-y cubes.’

The second, I think, is mostly a way for OSINT practitioners to flag up that they do ‘social media stuff’ as well as Experian checks. I’ve been to a number of conferences recently where a worried director exclaims ‘won’t someone think of the social media platforms!’ after too much talk about any other type of OSINT service (or whilst the speaker is just taking a breath), and ‘social engineering’ makes it sound like they know how to do the facebooks.

More than that, I think some ‘OSINT evangelists’ are also pushing such language for marketing – the whole ‘we’re willing to go to the very edge of legality/I can kill a man with my intersects alone’ vibe unfortunately sells contracts.

I know. Why is this important?

Mostly because I feel there’s a lingering misunderstanding that using the word OSINT makes you somehow exempt from the usual rules. It’s the same as the ‘if it’s on the net then I can do what I like with it’ misconception.

Everyone who has been involved in this field in a professional setting knows that isn’t the case, but unfortunately a lot of newcomers seem to believe that there’s some sort of magical get-out-of-jail-free card available under the umbrella of ‘doing OSINT’. It’s always been a problem, but using ‘criminal’ terms like social engineering in a ‘valid tactic’ sort of way starts to muddy the water more than ever.

This is compounded by a mix-up between what is acceptable when hired to do security/pentesting and what is acceptable without such a contract. This may seem obvious to some, but for newcomers it’s not. They read @OSINT_BLACK_OPS_SN1PER_HACK3R tweet ‘wasn’t getting anywhere on my new contract so did a bit of social engineering and now I’m the CEO’s dentist’ and think ‘I’m learning how to do OSINT gathering! I’ll social engineer my local gardening club and see what Margery is really up to!’

Instead of the 50k bonus and as much mouthwash as he can swig for the rest of his life, our intrepid newbie ends up having awkward bedtime chats with his cellmate and fellow stalker Profusely Sweaty Greg, all the while wondering why the magical shield of ‘just OSINT’ing’ didn’t protect him.

For some, this whole post will seem all very patronising and obvious – to others, it will seem like pedantry. I understand this, and I’m not trying to say that we should jealously guard our secrets or make the community any less welcoming to newcomers, but for those who are experienced I’d really like to ask you to try to share your knowledge responsibly and throw in a quick comment the next time you see someone getting told to ‘do a bit of social engineering’.

The road map from v0.6 to v0.7

Version 0.6 has been a bit of a turning point for the project. I’d say that Metadata Interrogator now does what I wanted it to do when I set out to create it. I wouldn’t say it’s reached ‘v1.0’ yet, but it analyses a whole raft of metadata beyond the usual EXIF data, it has a useful timeline, and it highlights interesting data both when comparing files and across a whole data set.

As it’s got to that stage, I don’t think it’s necessary to do quite so many small iterations, and so the move between 0.6 and 0.7 is going to be a big one and take some time to do. The below is a sort of road map on where I want to take the project.

  1. Performance – This is the big one: currently it takes far too long to load up and analyse files. To try to get better performance (and maybe a smaller file size?!) I’ll try porting it across to Nuitka (https://nuitka.net). I’ve heard fantastic things about it, and I’m sure it’ll improve things no end. Early tests have not ended well, however, so time will tell if I can gather together enough goat entrails and black candles to make this work.
  2. Release on Mac and Linux – I’m holding off on this till I sort out Nuitka, but it’s a main priority after that.
  3. Installer version – Whether this is necessary depends on the gains I get from Nuitka. If they’re impressive I’ll keep it portable; if not, I’ll add an installable version, which will give faster access for regular users.
  4. Pre-computed Analysis – One thing I’m very interested in is analysing large data sets of files to work out what ‘normal’ metadata drift on a single device looks like: how many (and which) fields change as one smartphone takes 100 different pictures. This will then be built into the software to highlight where the differences aren’t ‘normal’.
  5. Overhaul the UI – The UI works, but it’s still a bit janky in parts. This isn’t a top priority as it all seems to work roughly fine for me, but it’s not ideal.
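The ‘pre-computed analysis’ idea in point 4 can be sketched roughly as follows. This is a minimal Python illustration built on my own assumptions, not the actual Metadata Interrogator implementation. Each file’s metadata is a flat dict of field name to value. The baseline files all come from one device and are sorted by capture time. A field counts as ‘unusual’ if it differs in a candidate file but changed in fewer than 5% of consecutive baseline pairs. The field names, the sample values and the 0.05 threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, List

Metadata = Dict[str, str]  # field name -> value, e.g. as extracted from EXIF


def field_change_rates(baseline: List[Metadata]) -> Dict[str, float]:
    """Fraction of consecutive baseline pairs in which each field's value changed.

    Assumes `baseline` comes from a single device, sorted by capture time."""
    if len(baseline) < 2:
        raise ValueError("need at least two files from the same device")
    changes: Counter = Counter()
    seen = set()
    for prev, curr in zip(baseline, baseline[1:]):
        for field in prev.keys() | curr.keys():
            seen.add(field)
            if prev.get(field) != curr.get(field):
                changes[field] += 1
    pairs = len(baseline) - 1
    return {field: changes[field] / pairs for field in seen}


def unusual_changes(reference: Metadata, candidate: Metadata,
                    rates: Dict[str, float], threshold: float = 0.05) -> List[str]:
    """Fields that differ from the reference file but rarely change on this device.

    Fields never seen in the baseline get a rate of 0.0, so they are flagged too."""
    return [
        field
        for field in sorted(reference.keys() | candidate.keys())
        if reference.get(field) != candidate.get(field)
        and rates.get(field, 0.0) < threshold
    ]


# Hypothetical baseline from one phone: exposure varies shot to shot, the rest doesn't.
baseline = [
    {"Make": "ExampleCam", "Model": "X1", "Software": "1.0", "ExposureTime": "1/100"},
    {"Make": "ExampleCam", "Model": "X1", "Software": "1.0", "ExposureTime": "1/60"},
    {"Make": "ExampleCam", "Model": "X1", "Software": "1.0", "ExposureTime": "1/250"},
]
rates = field_change_rates(baseline)

candidate = {"Make": "ExampleCam", "Model": "X1", "Software": "PhotoEditor 2", "ExposureTime": "1/30"}
print(unusual_changes(baseline[-1], candidate, rates))  # ['Software']
```

In this toy example, the changed exposure is ignored because it changes on every shot. The changed Software field is flagged because it never changed across the baseline.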

As always, any comments or suggestions would be great.

Trimming down the Metadata Interrogator

Currently, Metadata Interrogator comes in at a completely-unreasonable-in-this-day-and-age 76MB file size. As promised, I tried to reduce the size of Metadata Interrogator by using a different table GUI (pure Tkinter rather than Pandastable). Unfortunately, even with my best efforts, the file size reduction wasn’t that significant: 76MB down to 55MB.

Whilst it is a reduction in size (and I might be able to shave off another 5-10MB), it’s not really worth it given the loss of functionality and the extra work of maintaining two versions.

Now that we’re in the age of widespread broadband, terabyte USB sticks and colour TVs I feel it’s a roughly acceptable file size, although do get in touch if you have a really burning use-case for a much smaller file size.

Going forward, I’ll be working on new functionality and optimising load times. I’ll also release a zipped package version, which should run faster and still stay relatively portable.

Digital Document Forensics Training

Digital Document Forensics (DDF) is really what Metadata Interrogator is all about: gathering as much information as possible from a file, especially any ‘hidden’ attributes that might give us clues to who/what/where/when/how the file was made.
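Here is one small illustration of the kind of ‘hidden’ attribute meant here. A modern Word document (.docx) is a ZIP archive whose docProps/core.xml part typically records the author, who last saved it, creation and modification times, and a revision count. The Python sketch below reads those fields using only the standard library. It is a minimal example, not how Metadata Interrogator works, and the sample document it builds in memory is made up. These values are also trivially editable, so treat them as clues to weigh, not proof.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Union

# XML namespaces used in the Office Open XML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly name -> element path inside docProps/core.xml.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def core_properties(source: Union[str, BinaryIO]) -> Dict[str, str]:
    """Return whichever core properties are present in a .docx/.xlsx/.pptx."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for name, path in FIELDS.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            props[name] = element.text.strip()
    return props


# Build a made-up, minimal package in memory to demonstrate.
SAMPLE_CORE = (
    '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}" xmlns:dcterms="{dcterms}">'
    "<dc:creator>A. Author</dc:creator>"
    "<cp:lastModifiedBy>B. Editor</cp:lastModifiedBy>"
    "<dcterms:created>2019-03-01T09:00:00Z</dcterms:created>"
    "<cp:revision>7</cp:revision>"
    "</cp:coreProperties>"
).format(**NS)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("docProps/core.xml", SAMPLE_CORE)
buffer.seek(0)

print(core_properties(buffer))
```

A creator that differs from the last editor, or a revision count that doesn’t match the story you’re being told, is exactly the sort of thing worth a closer look.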

DDF has obvious utility in a number of sectors, but if you want to use it for counter-fraud/customer validation/KYC, you might be interested in an online course that I run on the subject. Your keen forensic senses might be able to warn you that this is a slightly promotional post.

At the moment, the course doesn’t use Metadata Interrogator, as I’m not yet confident it’s stable enough (although hopefully that will change soon!), and the course itself is much wider in scope than just gathering metadata. You also get a fancy certificate.

The training course covers (amongst other things):

  • File-specific analysis of PDF, MS Office and image files
  • Email Analysis
  • File Signature analysis/magic numbers
  • Best practice of evidence handling (Hashing/storage)
  • Creating a professional forensic report.
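Two items on that list lend themselves to a short illustration: file signatures and hashing. The Python sketch below checks a file’s leading bytes against a handful of well-known signatures (‘magic numbers’) and flags a mismatch with the file’s extension. It also records a SHA-256 hash, so you can later show the evidence hasn’t changed. The signature table is a small, illustrative subset I’ve chosen. It is not exhaustive and not taken from the course material. Note that the ZIP signature also covers .docx/.xlsx/.pptx files.

```python
import hashlib
from pathlib import Path, PurePath
from typing import Dict, Optional

# A small, illustrative subset of well-known file signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),  # also .docx/.xlsx/.pptx (Office Open XML)
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls/.ppt
]

# What each extension *claims* the file should be.
EXPECTED_BY_EXTENSION = {
    ".pdf": "pdf", ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg", ".gif": "gif",
    ".zip": "zip", ".docx": "zip", ".xlsx": "zip", ".pptx": "zip",
    ".doc": "ole2", ".xls": "ole2", ".ppt": "ole2",
}


def identify(header: bytes) -> Optional[str]:
    """Return the file type whose signature the header starts with, if any."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return None


def examine_bytes(name: str, data: bytes) -> Dict[str, object]:
    """Compare the detected type with the extension and hash the content."""
    expected = EXPECTED_BY_EXTENSION.get(PurePath(name).suffix.lower())
    detected = identify(data[:16])
    return {
        "name": name,
        "expected": expected,
        "detected": detected,
        "mismatch": expected is not None and detected != expected,
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def examine_file(path: Path) -> Dict[str, object]:
    """Convenience wrapper for files on disk (reads the whole file into memory)."""
    return examine_bytes(path.name, path.read_bytes())


# A 'photo' that is really a PDF:
report = examine_bytes("invoice.jpg", b"%PDF-1.4\n%...\n")
print(report["expected"], report["detected"], report["mismatch"])
```

For large evidence files, you would hash in chunks rather than reading everything into memory. The design point is the same either way: record the hash at the moment of acquisition, alongside the finding.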

If you’re interested, head over to: http://pdacounterfraud.co.uk/forensic-document-analysis/

MakerNote – the greatest secret there ever was.

You may or may not have noticed that there are tons of MakerNote fields that come up when photos are analysed. Some of these are followed by a descriptor, some just have something like 0x0002 and then a jumble of numbers and letters as the result. Whilst it sounds unlikely, this isn’t my awful programming causing this.

MakerNote is a tag (0x927C) within the Exif standard that allows device manufacturers to store whatever they want, and most use it to hold their own set of custom fields. For some reason, these fields are also a jealously guarded secret: some just aren’t documented, and others are encrypted. Even with my biggest and shiniest security hat on, I can’t really understand what they could be storing in them that requires such secrecy. I can’t think of anything you’d need to store in a file that would give insight into anything proprietary or have security implications.
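Why do entries like 0x0002 turn up next to a jumble of bytes? Many MakerNotes are laid out like an ordinary TIFF/Exif IFD: a 2-byte entry count, then 12-byte entries of tag, type, count and a 4-byte value-or-offset. The difference is that the tag numbers are only documented by the manufacturer. The Python sketch below lists those raw entries. It assumes the blob is a bare IFD in a known byte order with no vendor header. In reality, some MakerNotes start with a manufacturer-specific header, measure offsets from different base positions, or are encrypted. Treat it as an illustration, not a working decoder. The two-entry blob is made up.

```python
import struct
from typing import Dict, List

# Byte size of each standard TIFF/Exif field type (1=BYTE ... 12=DOUBLE).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def list_ifd_entries(blob: bytes, offset: int = 0, little_endian: bool = True) -> List[Dict[str, object]]:
    """Return the raw entries of an IFD starting at `offset` in `blob`."""
    order = "<" if little_endian else ">"
    (entry_count,) = struct.unpack_from(order + "H", blob, offset)
    entries = []
    for i in range(entry_count):
        start = offset + 2 + 12 * i
        tag, field_type, count = struct.unpack_from(order + "HHI", blob, start)
        raw = blob[start + 8:start + 12]
        size = TYPE_SIZES.get(field_type)
        # Values of 4 bytes or fewer are stored inline; larger ones live
        # elsewhere and these 4 bytes hold an offset to them.
        inline = size is not None and size * count <= 4
        entry = {"tag": tag, "type": field_type, "count": count, "inline": inline, "raw": raw}
        if not inline:
            (entry["value_offset"],) = struct.unpack(order + "I", raw)
        entries.append(entry)
    return entries


# Hypothetical two-entry MakerNote: tag 0x0002 (one SHORT, value 42) and
# tag 0x0010 (32 UNDEFINED bytes stored at offset 100).
blob = (
    struct.pack("<H", 2)
    + struct.pack("<HHI4s", 0x0002, 3, 1, struct.pack("<HH", 42, 0))
    + struct.pack("<HHI4s", 0x0010, 7, 32, struct.pack("<I", 100))
    + struct.pack("<I", 0)  # offset of the next IFD (none)
)

for e in list_ifd_entries(blob):
    print(f"0x{e['tag']:04X} type={e['type']} count={e['count']} raw={e['raw'].hex()}")
```

Without the manufacturer’s documentation, all you get is the tag number, a type and some bytes. That is exactly the ‘0x0002 and then a jumble’ you see in the results.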

Various efforts have gone into trying to decipher what these fields mean (the known ones are included in Metadata Interrogator), but some are very difficult to guess. If there’s anyone out there who wants to team up on trying to decipher the fields of common devices, please get in touch.