Web Input - Securing Data, Second Level of Defense

Posted by TxtEdMacs on Jun 4, 2008 11:51 AM
LXer Linux News; By Herschel Cohen

LXer Feature: 04-Jun-2008

My implicit presumption in this series is that break-ins are unplanned, opportunistic occurrences: an attempt is triggered when an intruder encounters an input form. As I mentioned previously, do not give information away needlessly. Moreover, I strongly suggest you consider becoming passive-aggressive by making your presentation of the form and its expected input somewhat unpredictable. I also advise turning your data input into a simple waste of time and effort for those not trained to use the entryway.

Preventive Methods

Simple Rules

"Don't give away the store"
Separate Site & Input Server
Set Traps
Hybrid Approach - When in Doubt, Check

Now in more detail.

"Don't give away the store"

The foregoing is a sage dictum. To see one instance of what I mean, let's look at the data entry form from a previous article:

    Figure 1. News Item/Article Input HTML Form

The header showing the target site was used for a quick demonstration where I borrowed from an existing template. There should be no need to inform any visitor of the purpose of the form; it is obvious. Its intended target, however, is not. Therefore, do not give that information away: do not use an identifying header as shown above.

I use this as an extreme example; however, also be certain your labels do not contain an excess of information. The less said the better. I suggest investing time in training your users.

Separate Site & Input Server

When I mocked up this form, I planned to have it reside on a related site, but with no obvious surface connection to OpenSourceToday (dot) org. Now I would suggest an even more radical cleavage between the two. If possible, run it off a different machine, preferably a server you completely own. For example, use a home server piggybacked onto an individual's account that has access to a static IP. If you follow this suggestion, the load should be minimal, since only those trusted [1.] should be accessing this server [2.]. Do not fear excluding questionable visits.
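Footnote 2 mentions limiting access to known IP addresses. A minimal sketch of such a check might look like the following. The function name and the address ranges are assumptions made for illustration (the ranges are reserved documentation addresses, not real users), and how the connection's address reaches your code depends on your server.

```python
import ipaddress

# Hypothetical allowlist of trusted sources. These are reserved
# documentation ranges; substitute the addresses of your own users.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # e.g. an office network
    ipaddress.ip_network("198.51.100.17/32"),  # e.g. one static home IP
]


def is_trusted_source(remote_addr: str) -> bool:
    """Return True only if remote_addr lies inside a trusted network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Anything that does not parse as an IP address is excluded.
        return False
    return any(addr in net for net in TRUSTED_NETWORKS)
```

A request from any address outside the list would simply never be shown the form. Keep in mind that this is only one layer: addresses can change, and a trusted machine can itself be compromised.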

I intend to make further suggestions; however, each person should be thinking about how their site offers differing opportunities to fend off unwanted intrusions. Moreover, at best, this is a guide, not a how-to giving stepwise instructions on hardening your site. My intention is to stimulate thought; some suggestions might not even be possible to implement (those may appear in a later installment). I think those suggested here should work in most instances.

Set Traps

I will keep the character of this portion of the discussion consistent by stressing the qualitative approaches. My thoughts revolve around adding simple inputs that require ambiguous values. These fields would give no guidance to the user and would not contain default values. Furthermore, their values have to be consistent with other entries, or the entire content is labeled as suspicious and dumped [3.].

My model system would have two Boolean input fields added. In my design the first might have almost any descriptive label; however, any input in it would result in an immediate failure once it reached the server. Do not use JavaScript on the form for this check; it could be read easily by viewing the page source. If the first is left empty, the second would require something that plays the role of saying either true or false. The actual values used would be only those agreed upon by the code and the user(s). The users must be trained.
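The two trap fields could be checked on the server along the following lines. This is only a sketch under stated assumptions: the field names (`category_hint`, `priority`) and the agreed values (`amber`, `slate`) are invented for illustration and appear nowhere in the original design.

```python
from typing import Dict, Optional, Tuple

# Hypothetical names and values, chosen so that they reveal nothing.
DECOY_FIELD = "category_hint"   # first field: must always arrive empty
SIGNAL_FIELD = "priority"       # second field: must hold an agreed value
AGREED_VALUES = {"amber": True, "slate": False}  # known only to trained users


def check_trap_fields(form: Dict[str, str]) -> Tuple[bool, Optional[bool]]:
    """Return (suspicious, signal); signal is None when suspicious."""
    # Any content at all in the decoy field is an immediate failure.
    if form.get(DECOY_FIELD, "").strip():
        return True, None
    # The second field must carry one of the privately agreed values.
    value = form.get(SIGNAL_FIELD, "").strip().lower()
    if value not in AGREED_VALUES:
        return True, None
    return False, AGREED_VALUES[value]
```

Because the check runs only on the server, nothing in the page source hints at which values are acceptable. An intruder who types an obvious "true" or "yes" is flagged like any other unexpected input.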

On the OpenSourceToday site I would have added a field asking whether the entry was a "News Item?". If it evaluated as true, then both the Article and File Name fields [4.] had to be empty. If either or both had content, then again the entire content could be excluded with confidence.
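The consistency rule for the "News Item?" field is small enough to sketch directly. The function and parameter names are assumptions. The text states no rule for entries that are not news items, so this sketch places no constraint on them.

```python
def news_item_is_consistent(is_news_item: bool, article: str, file_name: str) -> bool:
    """Return False when a claimed news item also carries article content."""
    if is_news_item:
        # A news item must leave both the Article and File Name fields empty.
        return not article.strip() and not file_name.strip()
    # Assumption: no rule is stated for non-news entries, so accept them here.
    return True
```

For example, `news_item_is_consistent(True, "", "story.html")` is `False`, so that submission would be excluded.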

There is satisfaction in using techniques such as these, which are simple and easy to implement. However, they are too easily bypassed by simple social engineering to be relied upon for the long term. Therefore, give some thought to how to extend their life expectancy [5.].

Hybrid Approach

The human eye is a quick judge of bogus content. When I merely look at even a non-brazen subject line, I can sense the scam or spam much better than a coded routine can. In the same sense, I think it is wise to scan even those inputs deemed good by code alone. Even obvious intrusion attempts can teach us where to put our efforts. Moreover, it is wise to inspect some percentage of those inputs that were passed and confirmed before allowing them to be stored in your database [6.]. The goal of many intrusions is to compromise the data storage structure; hence, your database must be protected.

I strongly suggest viewing some subset of your total received submissions. In addition, I suggest giving thought to how to fend off attacks with multiple and varied means. I think the more unexpected your approach, the better your chances of avoiding having your site taken over by the unsavory. Nonetheless, some means suggested may be beyond the ability of your users. Thus, your limits are defined by those whom you serve.
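Choosing the subset to view can itself be left to a few lines of code. The sketch below draws a random sample from the submissions that passed the automated checks. The function name, the 10% default, and the rule of always picking at least one item are assumptions, not recommendations from the text.

```python
import random


def select_for_review(accepted_ids, fraction=0.1, rng=None):
    """Pick a random sample of accepted submissions for a human to inspect."""
    rng = rng or random.Random()
    ids = list(accepted_ids)
    if not ids:
        return []
    # Always review at least one item, and never more than exist.
    count = min(len(ids), max(1, round(len(ids) * fraction)))
    return rng.sample(ids, count)
```

A random choice keeps the review itself unpredictable, which fits the general advice here. Entries flagged as suspicious are a separate pile and deserve a look of their own.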

Review

I want you to notice that while some code is necessary to employ the suggested countermeasures, it is quite simple in comparison to the creation of regular expressions that will accurately pick out dubious insertions. Nonetheless, do not become overly enamored with this combination of automated trapping and this second line of defense. See the latter more as the judicious placement of a limited number of slippery areas that might for a time fool some intruders. All too soon it will be compromised if you have a significant number of potential users. At least one will complain openly, to all who will listen, giving the details on how to bypass the traps you have set. Whatever inputs are needed will be divulged; hence, you will lose the advantage of surprise. Therefore, I advise you to have other ideas ready that will make it more difficult for potential intruders, with the expectation that those too will be compromised by users who see every change as a personal burden.

Again, I urge you not to become too predictable. For example, should the details of the two added Boolean inputs be compromised, do not remove them from the form. There are still other options. You can cease evaluating the inputs, or you could assign subsets of users differing correct inputs. Whatever route you select, let the intruder worry by giving no obvious clues. Moreover, as I will stress in the next installment, simply thank users for their input, especially for input known to be trash. Real users can be informed of perceived entry problems by separate routes.
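Two of these ideas, differing correct inputs per subset of users and an identical thank-you for every submission, can be combined in a short sketch. The group names, user names, and agreed values below are all hypothetical, and a real site would load them from somewhere other than the source code.

```python
from typing import Tuple

# Hypothetical groups, each with its own privately agreed values.
GROUP_VALUES = {
    "editors": {"amber": True, "slate": False},
    "contributors": {"north": True, "south": False},
}
USER_GROUPS = {"alice": "editors", "bob": "contributors"}

THANK_YOU = "Thank you for your submission."


def handle_submission(user: str, signal_value: str) -> Tuple[str, bool]:
    """Return (response_text, accepted); the text never reveals the verdict."""
    agreed = GROUP_VALUES.get(USER_GROUPS.get(user), {})
    accepted = signal_value.strip().lower() in agreed
    # Every sender, trusted or not, sees exactly the same reply.
    return THANK_YOU, accepted
```

With this arrangement a value leaked by one group is useless to someone posing as a member of another, and the uniform reply gives an intruder nothing to test against.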

Yes, we have made it a bit more difficult to breach our defenses, but as I mentioned, this will not suffice for the longer term. I think we need another layer of defenses to hinder and discourage those who are in the business of compromising servers and sites.

What's Next?

First, let me remind you and myself that the goal really is automating use of the inputs to create dynamic content for a site like OpenSourceToday. However, I think it is also obvious that more countermeasures are necessary to give us confidence that the data extracted is reliable before we move to pasting it onto the site. Thus, we require at least one, if not more, installments focusing on applying more secure methods to validate the data input.

I intend to broach the necessity of informing those responsible that there are indications the site is under attack. Then, I would suggest an easily implemented, tertiary method to indirectly confirm the validity of the input. However, I might stress first the need to limit access to the form, with up-front code checks of the connection source, not the input (see footnote 2). I have many other ideas, but this phase must end so that we can finally see useful work performed. Thus, even if I end the security issues portion abruptly and seemingly arbitrarily, let me remind you that this is a guide, not a cookbook, so you must devise what is best for your purposes.

For corrections, suggested extensions, or comments, please write; check for the email address on the site http://bst-softwaredevs.com. Thanks.

____________________________________________________________________

    1.  Take a look at some of the suggestions in the O'Reilly
        book Web Database Applications with PHP and MySQL,
        Chapter 11, "Authentication and Security".

    2.  One of the techniques is limiting access to known IP
        addresses and excluding all others.

    3.  The term "dumped" is not literal; it is possible to
        learn from the content how these people attempt to
        bypass tests.  It can be a guide on how to revise your
        own protective code and devices.

    4.  News items were meant to be only short summaries with
        short lifetimes, for which the bulk of an article or a
        file name was excessive effort.  Look at an example here.

    5.  I have some ideas; if you are interested, write to me
        directly.  I will either add some suggestions in a later
        installment or respond directly.

    6.  Why all input goes into temporary storage until it is
        certified will be discussed in a later installment.