Web Input - Securing Data, First Level of Defense

Posted by TxtEdMacs on May 19, 2008 9:45 AM
LXer Linux News; By Herschel Cohen

LXer Feature: 19-May-2008

This article focuses on testing the reliability of user input at the lowest level. The first line of defense is the use of automated searches that might detect malicious input. Personally, I wish there were a better option. Realistically, we are confronting coders with superior skills who also have the advantages of surprise, stealth and economic incentive, whereas we can only react to new or suspected threats as they arise or, worse, after they are discovered. However, those specializing in cracking sites employ less skilled, so-called script kiddies to carry out their attacks from a safe, detached distance. The creators thus get to gauge the effectiveness of their attacks at minimal personal risk, but that arrangement might work to our benefit: we are not fending off the first team.

Despite the focus on input details, I still wish to examine the security issue in global terms. Qualitatively, my questions are: how great is the risk, are there identifiable threat vectors, and what effective steps can be taken to counter them? To clarify terms: by effective measures I mean not only measures proportionate to the threat, but also efforts whose economic cost to us is commensurate with the gain. Or are we doomed to ever greater exertion at the expense of the value of our sites?

Those are the issues; just do not expect a definitive guide from me. I prefer to gain leverage using the simplest techniques, those that are economical in the time and effort expended. Moreover, I will take advantage of every odd, atypical or even unique peculiarity of a site and its users to make things harder for the break-in artists. I advise you to seek similar advantages.

The Broader Perspective

Luckily for most of us, when an intruder arrives figuratively at our site's doorstep, it is not by design. If they break in, it is a crime of opportunity, not because you were specifically targeted. If so, we can discourage them long enough that they will seek "lower hanging fruit" elsewhere. Therefore, I suggest resisting incursions with various methods at different locations, each just enough to deter further effort.

Suppose your site is targeted: do you surrender to the inevitable? I think not, because more likely your site belongs to a group holding desirable content, e.g. financial data or personal identity information that can be easily sold. Though valuable, your data is most likely not unique. Once again, with a layered defense we can resist enough to divert the effort to easier targets. Rarely will the effort be personal. Moreover, the intruder must worry about leaving a forensic trail by exerting too extensive an effort. The odds still favor your dealing with lower-skilled attackers. Therefore, resistance is not futile, and unexpected countermeasures should be employed whenever possible.

General Assessments

Truthfully, I have no accurate measure of the risk faced by site operators. Moreover, we do not all have the same options or tools at our disposal. For example, those who own their servers and run them from their own physical locations with trained, competent staff have many advantages. Operating our own web server at a hosted location at least gives us hands-on control of the server, with a professional staff as aid. However, our own mistakes or inattention could allow our server to be compromised; should that happen, we would be at fault. On a shared site, conditions are murkier: errors of commission or omission, or worse, by the staff could leave our sites exposed. I have no easy solution for the latter, other than being observant and complaining bitterly if the intruder was allowed in the back door. However, document your assertions; suspicions alone do not suffice. Moreover, it would be very embarrassing if it could be shown that your error caused the breach and that others were compromised.

It would be nice if merely specifying Linux sufficed, but Linux alone cannot stop all intrusions. Whatever situation we find ourselves in, we should put up any barrier we can to deny unwanted visitors easy access at restricted entry points. Further, I advise using whatever distinguishes your site from most others to place traps or spurious directions. Study and adopt any tactic you are comfortable employing. We must all do what we can to stop criminal trespass.

Realistically, if those devoted to breaching site defenses work on the problem full time, most of us are in no position to counter them. We are trying to build, maintain and add content to web sites; we do not have the time to learn all the tricks needed to match our best adversaries. However, that expenditure of energy seems unnecessary. Most sites give the appearance of not having been breached, and some are very poorly built. Among the examples I have seen was a local site requiring a very long numeric ID with a guessable password, both of which, to my dismay, were sent in clear text. Another was a commercial site that did the same thing, this time including full credit card information. A third experience was much more significant: I received a new credit card a year early because the bank's data store had been breached. So we can conclude that risky sites exist for the taking and that successful break-ins occur at what should be professionally defended sites. Therefore, I think we need to design our defenses to fend off attacks as a matter of principle.

My Working Principles

     All External Data is Suspect

Perl scripting has an interesting concept that I mentioned previously, though not by name: in Perl, all external input is treated as "tainted", meaning untrusted. Much of the discussion I read when I played with Perl and the web concerned the need to untaint data. However, the effort can be daunting when the tool is regular expression pattern matching, because it is difficult to write regular expressions that reliably match suspicious code. False positives can make the costs prohibitive.
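Python has no taint mode, but the Perl idiom can be mimicked to show the idea. Under Perl's `-T` switch, the usual way to untaint a value is to extract it through a capture group of a successful regular expression match. The sketch below assumes a hypothetical `Tainted` wrapper (not a standard library class) that releases only text captured by a whitelist pattern matching the entire input:

```python
import re
from typing import Optional


class Tainted:
    """Hypothetical wrapper for raw external input; untaint() is the only exit."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def untaint(self, pattern: str) -> Optional[str]:
        # Mirror Perl's idiom: only text captured by group 1 of a whitelist
        # pattern that matches the WHOLE input is released.
        match = re.fullmatch(pattern, self._raw)
        if match is None:
            return None
        return match.group(1)

    def __str__(self) -> str:
        # Refuse implicit conversion so tainted data cannot slip into output.
        raise TypeError("tainted value used without untainting")


zip_code = Tainted("12345").untaint(r"(\d{5})")
rejected = Tainted("12345; DROP TABLE x").untaint(r"(\d{5})")
```

Note that a whitelist says what is allowed rather than trying to enumerate what is malicious, which sidesteps part of the false-positive problem described above, at the cost of needing a precise idea of valid input for each field.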

     Verifying or Untainting Data

Much of that effort is predicated on extensive use of carefully built regular expressions to catch potentially malicious code. If I were to take this route, I would rely on whatever knowledge I could garner from more experienced users on the various scripting forums and mailing lists. Those lessons might not suffice for the input content I need. Moreover, I would have to resign myself to following recent security breaches so as to be prepared to revise previous solutions. I have concluded it would forever be a reactive battle against those writing new ways to fool the standard responses, who are better coders than I. I would lose all control, and my efforts would be dictated by external factors.

I see pattern matching with regular expressions as just one tool in our arsenal. Yes, I will use it, but sparingly. Where possible, I prefer discarding user input to vetting it with exhaustive testing. We need to be unpredictable to make code intrusions less likely. Intruders are in a business that values return on investment, where yield matters.

     Simple is Better

The best programmers I have encountered see laziness as a virtue. Hence, beg, borrow or steal anything that will lessen your effort to combat the threat. I suggest being open to a layered approach, where a simple non-code defense combined with more complex code routines yields a stouter defense. Moreover, search for characteristics that are unique to your site, your applications and your users, and use that uniqueness to stymie the more standard break-in attacks. Above all, if you can, keep it simple.

Please Note

I see no reason to repeat the input form in this installment. However, if you need a reminder, I suggest checking the original introductory article on LXer or the one on data analysis to view the form. I should also say that this series is essentially a thought experiment; hence, I do not feel the urgency I would if I were really implementing this model system. My stance is therefore more detached, trying to balance limited, simple efforts that might suffice. That is one of the reasons I opt for the hybrid approach.

First Level of Protection

The data fields are copied from the data analysis article in the link just above, which also shows my disposition of each field based on perceived need. Here each one is discussed with its reasoning in more detail, followed by the security that can be applied. Here is the list:

  • Dates - datetime - internal
  • Links - text, title and file name - user input & internal
  • Author - text, user input
  • Summary - text, user input
  • Keywords - text, user input
  • Content - text, user input

Let's go through the list, but keep in mind your needs and options probably differ from mine.

     Date Data

I could dispense with this field as user input, because what I essentially want is a date-time stamp of when the data was submitted, and its relation to our predetermined publication schedule. I also see the field as a possible avenue for introducing devious code. Indeed, it has been shown that it might be possible to gain some administrative rights [1.] via an attack using a date-type entry [2., 3.].

In a subsequent article I will show you the form code, where I intended, but did not implement, obtaining the system date, formatting it and placing it in the field. If the field were taken as data input, the code would have to confirm that it actually meets the criteria of a date type, reformat it for entry if there were a correctable error, and run some sort of search for malicious code. I avoid most of those difficulties by throwing away the input for this field. If possible, I suggest you do the same.
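Throwing the date input away can be as simple as never reading it. The sketch below assumes the submitted form arrives as a plain dictionary with a hypothetical `"date"` key, and that a UTC timestamp in `YYYY-MM-DD HH:MM:SS` form suits the database; both are illustrative choices:

```python
from datetime import datetime, timezone


def stamp_submission(form: dict) -> dict:
    """Return a copy of the form whose date comes only from the server clock."""
    cleaned = dict(form)
    # Whatever the browser sent under "date" is discarded unread:
    # no parsing, no validation, no pattern search.
    cleaned.pop("date", None)
    # The server's own clock is the sole source of the timestamp.
    cleaned["date"] = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return cleaned


result = stamp_submission({"date": "2008-05-19'; DROP TABLE x", "author": "A"})
```

Because the hostile value is never parsed, there is nothing to validate and no date-handling code path for an attacker to probe.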

     Author's Name

Here I had the advantage of a single author, though one who might have opted to use more than one name. That is one of the reasons I suggested avoiding a pick list. I would also have avoided pre-filling the name, even though it should be known from the identity and password form that leads to this one. Do not give away information without good cause.

My test would have been an exact match of names; even a typographical error would discard the entire submission. I would not bother running regular expression tests. Any added content other than trailing spaces would suffice to fail the input. Moreover, I would send no error message from the server back to the machine doing the input.
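The exact-match test fits in a few lines. This sketch assumes a hypothetical server-side set of agreed names (`"Jane Doe"` is a placeholder) and a form held in a dictionary; only trailing spaces are forgiven, and a failed match silently drops the whole submission:

```python
from typing import Optional

# Hypothetical: the name(s) agreed with the author, kept on the server only.
KNOWN_AUTHORS = frozenset({"Jane Doe"})


def author_ok(submitted: str) -> bool:
    # Strip trailing spaces only; tabs, leading spaces, case changes,
    # typos or appended code all fail the comparison.
    return submitted.rstrip(" ") in KNOWN_AUTHORS


def accept_submission(form: dict) -> Optional[dict]:
    if not author_ok(form.get("author", "")):
        return None  # discard everything; no error detail goes back to the client
    return form
```

No regular expression is involved: an attacker cannot craft input that "almost" matches, because anything short of the exact agreed string is rejected.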

I know this may not be possible for every site. Too many people insist on their personal ease while blaming later security problems on others. Hence, for those in such an adverse position, reject any input containing tags, indications of redirection, SQL commands, SQL comment symbols or unexpected keywords or commands [4.]. Another option is to use my suggested exact-match test of the name and, if it fails, flag the contents for visual inspection before they are processed further. Even the obdurate might understand the need to halt suspicious input.
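For sites that cannot use the exact match, the fallback scan might look like the sketch below. The patterns are illustrative assumptions, far from exhaustive, and will produce false positives (a name containing the word "Update" would be flagged); following the second option above, a hit routes the input to visual inspection rather than processing it further:

```python
import re

# Illustrative rules only; what counts as suspicious depends on your content [4.].
SUSPICIOUS = {
    "html tag": re.compile(r"<\s*/?\s*[a-z!]", re.IGNORECASE),
    "redirect": re.compile(
        r"http-equiv\s*=\s*[\"']?\s*refresh|\bwindow\s*\.\s*location\b",
        re.IGNORECASE,
    ),
    "sql comment": re.compile(r"--|#|/\*"),
    "sql keyword": re.compile(
        r"\b(select|insert|update|delete|drop|union|exec)\b", re.IGNORECASE
    ),
}


def scan_name(value: str) -> list[str]:
    """Return the labels of every rule the value trips (empty list = clean)."""
    return [label for label, rx in SUSPICIOUS.items() if rx.search(value)]


def route_name(value: str) -> str:
    # Flag for a human to look at instead of processing further.
    return "inspect" if scan_name(value) else "accept"
```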

     Title and File Name

We are not evaluating the link's veracity; hence, I would examine the title and file name separately, using regular expression matching for the same set of questionable content outlined just above. I have no simpler option for either, except for the file name: while a dash ("-") or an underscore ("_") might be acceptable, a period would not. Moreover, the presence of a forward or back slash would kill the process, as would most other symbols. Here I would have had the advantage of a prior agreement with my author.
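The file name rule is one of the few places where a strict whitelist is easy to write. This sketch assumes the naming agreement allows only ASCII letters, digits, dashes and underscores, with a hypothetical 64-character cap; any extension would be added internally, never taken from the user:

```python
import re

# Assumed naming agreement: ASCII letters, digits, "-" and "_" only.
# No period (so no extension or "..") and no forward or back slashes.
FILE_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def file_name_ok(name: str) -> bool:
    # fullmatch() must cover the entire string; a pattern ending in "$"
    # used with re.match() would still accept a trailing newline.
    return FILE_NAME.fullmatch(name) is not None
```

The explicit `A-Za-z0-9` class is used instead of `\w`, because in Python 3 `\w` also matches non-ASCII letters, which is broader than the agreement assumed here.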


This is probably the last field that has a chance of working with the rough regular expression matching outlined above. However, the search for SQL keywords would have to be modified to detect an entire clause, because single instances of some of those words would be appropriate. Moreover, I would have to worry about SQL syntax that might work on my particular database server, which might not be an easy set to define. Commas might be used as delimiters to find overly long text strings, but I would not rush into creating a working regular expression to catch that potentially questionable content.
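Searching for a whole clause rather than a single keyword might look like this sketch. The clause patterns are assumptions chosen for illustration; they are not a complete list, and they do not cover dialect-specific or abbreviated syntax your database server may accept:

```python
import re

# Single words such as "select" or "update" are ordinary English, so only
# keyword *sequences* resembling a statement fragment are flagged.
SQL_CLAUSES = [
    re.compile(r"\bselect\b.+?\bfrom\b", re.IGNORECASE | re.DOTALL),
    re.compile(r"\bunion\s+(all\s+)?select\b", re.IGNORECASE),
    re.compile(r"\b(drop|truncate|alter)\s+table\b", re.IGNORECASE),
    re.compile(r"\binsert\s+into\b", re.IGNORECASE),
    re.compile(r"\bdelete\s+from\b", re.IGNORECASE),
    re.compile(r"\bupdate\s+\w+\s+set\b", re.IGNORECASE),
]


def looks_like_sql(text: str) -> bool:
    return any(rx.search(text) for rx in SQL_CLAUSES)
```

Even clause-level matching misfires: "Select the best photos from the album" trips the first pattern, which is exactly the false-positive cost described above.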


It gets a lot harder with more lengthy input that might actually describe suspicious content, yet be fully valid. Nonetheless, a few symbols might still be questionable and should at least be flagged, if not cause the input to be rejected. For example, I would be very suspicious of the pound ("#") symbol, yet you have just seen a perfectly valid usage. My worry is that it could be used as part of an SQL injection attack: in MySQL, for example, that symbol comments out all text that follows it on the line. The classic case is where the intruder guesses a valid user id and the comment negates the second part of the test, the password check, allowing entry. However, with my author that would be unlikely.
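The classic case can be reproduced with Python's built-in `sqlite3` module. SQLite does not treat `#` as a comment, so this sketch uses `--`, the standard SQL comment marker that SQLite (and MySQL, followed by a space) also accepts; the table, user and password are hypothetical, and the query is deliberately built the unsafe way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")


def naive_login(name: str, password: str) -> bool:
    # UNSAFE on purpose: user text is pasted straight into the SQL string.
    sql = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(sql).fetchone()[0] > 0


# A guessed user id plus a comment marker: everything after "--",
# including the password test, is ignored by the database.
print(naive_login("admin'--", "wrong guess"))  # True
print(naive_login("admin", "wrong guess"))     # False
```

Passing values as bound parameters, e.g. `conn.execute("SELECT COUNT(*) FROM users WHERE name = ? AND password = ?", (name, password))`, defeats this particular trick, though that is a separate layer from the input searches discussed here.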

I will mention one instance that should arouse suspicion immediately: any indication of an HTML redirect should cause the input to be rejected, wherever it is found. That applies to the previous fields as well.
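Because a redirect is rejected in every field, the check can run over the whole form at once. The patterns below are illustrative assumptions covering a `<meta>` refresh and common JavaScript location changes; they are not exhaustive:

```python
import re

# Illustrative signs of a redirect in submitted text; not a complete list.
REDIRECT_PATTERNS = [
    re.compile(r"<\s*meta\b[^>]*http-equiv\s*=\s*[\"']?\s*refresh", re.IGNORECASE),
    re.compile(r"\b(window|document|top|self)\s*\.\s*location\b", re.IGNORECASE),
    re.compile(r"\blocation\s*\.\s*(href|replace|assign)\b", re.IGNORECASE),
]


def has_redirect(text: str) -> bool:
    return any(rx.search(text) for rx in REDIRECT_PATTERNS)


def check_fields(form: dict) -> bool:
    """False (reject the whole submission) if any field hints at a redirect."""
    return not any(has_redirect(str(value)) for value in form.values())
```

The plain word "location" in ordinary prose is not flagged; only the dotted forms used in scripts are.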

I would impose an upper size limit by agreement with my authors, but I would probably allow the field to hold about twice that limit. The test: if the portion beyond the agreed limit is anything other than empty space, I would reject the input.
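The size test needs no pattern matching at all. This sketch assumes a hypothetical agreed limit of 2000 characters, a field capacity of twice that, and treats "empty space" as any whitespace; the check runs on the server, since a browser-side field length can be bypassed:

```python
# Hypothetical agreed limit; the form field itself would hold twice this.
AGREED_LIMIT = 2000
FIELD_CAPACITY = 2 * AGREED_LIMIT


def within_agreed_size(text: str) -> bool:
    if len(text) > FIELD_CAPACITY:
        return False  # longer than our own form allows: did not come from it
    # Only whitespace may sit beyond the agreed limit.
    return text[AGREED_LIMIT:].strip() == ""
```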


I judge that regular expression pattern matching, in my hands, would yield too many false positives. I would flag the content as dangerous only upon finding a redirection or content beyond the agreed maximum size. Even with the single author I had, the variation would not be easily predictable.

What Safety Have We Bought?

In my assessment, and in the vernacular: not much. But this is only the first level of resistance. We can employ other simple, more effective steps if we can mask our intent and means. For reasons I will explain in a subsequent article, that may not be easy to accomplish against the hardened break-in professional. Therefore, let us hope we face only the more amateurish B Team.

Hopes alone will not protect your site or server; hence, searches for problematic input must be run. As I mentioned, try to concentrate on the easier cases, that is, odd symbols and HTML that are out of place where the expected input is simple text. Moreover, redirection to other web pages should be extremely rare to nonexistent, so its presence should be taken as a sign of dangerous input. Furthermore, the SQL comment symbol should be treated with suspicion; since it has other valid uses, search for it where it does not belong. SQL injection could be the greatest danger if your site is highly automated, yet its English-like keywords are hard to flag if the suspicious commands are not known in advance. Another problem is that abbreviated commands work on some database engines. Yes, searches have a role, but they will not fully protect your site.
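Searching for comment symbols only "where they do not belong" can be expressed as a per-field rule. The set of fields below is a hypothetical choice for illustration; a free-text field such as the content is deliberately left out, since "#" has legitimate uses there:

```python
import re

# "--" and "#" end-of-line comments, and the start of a "/* */" block.
COMMENT_MARKERS = re.compile(r"--|#|/\*")

# Hypothetical: fields where a comment marker has no legitimate use.
NO_COMMENT_FIELDS = {"author", "link_title", "link_file_name"}


def misplaced_comments(form: dict) -> list[str]:
    """Names of fields containing a comment marker where none belongs."""
    return sorted(
        name
        for name, value in form.items()
        if name in NO_COMMENT_FIELDS and COMMENT_MARKERS.search(str(value))
    )
```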

What's Next?

In the next installments I will offer a combination of qualitative measures that many could employ to make intrusion more difficult. In a few cases I will suggest more detailed methods I would have investigated if I were to implement a similar HTML form to gather user input. To repeat myself, it has been my experience that sites purporting to use security are often lax and/or ignorant of basic steps. Unfortunately, I have encountered that even on a commercial site. This would be laughable were it not for the risk they unknowingly impose on their users. So don't make those errors. I have also seen it on a hosting site, where the clientele implicitly depend on the competence of the support staff and their procedures. There your options are limited: get your own server or, like most of us, hope you do not become a victim.

Until next time, do what you can to be safe.

For corrections, suggested extensions or comments, please check for the email address on the site http://bst-softwaredevs.com. Thanks.

     © Herschel Cohen, All Rights Reserved


1. The previous link is a summary; this one leads to the PDF, three pages showing code for how the attack might work.

2. I would suggest swallowing more than a pinch of skepticism before seeing Oracle as fatally flawed. The same technique should apply to other databases too.

3. My skepticism above is warranted: execute immediate*, as the feature is known in Sybase, was added in version 12.0 and extended in 12.5, the last version I used. Moreover, SQL Server has the exec() function (a.k.a. dynamic SQL), which appears able to execute code similar to that which is so dangerous in Oracle.

4. Again, this depends on content; e.g. an article discussing database commands and techniques used in creating web pages might contain matches to what could otherwise be nasty trick code.

* See Tips, Tricks & Recipes for Sybase ASE by Rob Verschoor, Sypron Publications, Netherlands, pp. 229-233 (2003) [later editions are available].
