Secure Web Input - Data Analysis

Posted by TxtEdMacs on May 3, 2008 11:40 AM EDT
LXer Linux News; By Herschel Cohen

LXer Feature: 03-May-2008


Overview



In the introductory article for this series I painted my intent in broad strokes. Now my task is to determine the absolute minimum data set I need to extract from the user input form, that is, what is sufficient to build dynamic content into the Open Source Today (dot) org site. The constraints I use may seem arbitrary; however, you too should use whatever is afforded you to simplify your tasks. Yes, be aware of potential complexities, but extend your data model only when you have no alternative, not a moment sooner.



General Principle: the Wisdom of KISS



As an undergrad, engineering aphorisms irritated me; they were so obvious. Nonetheless, later experience taught me both their truth and the obliviousness of those to whom they applied all too well. Though by temperament I am not predisposed to simplicity, I now see simplicity as a virtue. Recognizing where a system can become complex may exhibit insight; however, until more is needed, it is best to add only the data identities and types you absolutely require.



Beginning the Process



To avoid the need to leave this page, I am showing both graphics that were in the introductory article. That will make the discussion of the required data simpler. First is the data input form:



  Figure 1. News Item/Article Input HTML Form


Next is the article listing example:



  Figure 2. Sample OpenSourceToday Article Listings


Policy - Need vs. Risk



     Date Data

You can see that both graphics contain this type of data, and in the same format. Moreover, I intended to use at least two datetime data elements: the date (and time) the item was submitted, and the expected or actual publication date. Neither would be taken from the submitted form. Data integrity and reduced risk are a partial explanation; I will say more in the next article.
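A minimal sketch of this policy in Python, assuming the form arrives as a plain dictionary. The function name and the field names `submitted_at` and `published_at` are hypothetical, chosen only for illustration; the point is that any date the client sends is discarded and the server supplies its own.

```python
from datetime import datetime, timezone


def stamp_submission(form_data: dict) -> dict:
    """Return a copy of the form data carrying only server-generated dates."""
    # Discard any date fields the client may have sent (hypothetical names).
    cleaned = {
        key: value
        for key, value in form_data.items()
        if key not in ("submitted_at", "published_at")
    }
    # The submission time comes from the server clock, never from the form.
    cleaned["submitted_at"] = datetime.now(timezone.utc)
    # The publication date is filled in later, when the item goes live.
    cleaned["published_at"] = None
    return cleaned
```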



     Title and File Name

Both should be simple text data types, primarily alphanumeric with a very limited set of symbols allowed. Both are required to build the link seen in the article listing, but alone they do not carry enough information to perform the task. The actual location of the file, however, is known at the server, which is one more reason not to involve the user in it. What the user must supply is a unique file name.
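One way to express "primarily alphanumeric with a very limited set of symbols" is an allowlist pattern per field. The sketch below is only an assumption about what that set might be: the permitted symbols, the length limits, and the `.html` suffix are illustrative choices, not the site's actual rules.

```python
import re

# Assumed allowlists: letters, digits, spaces and a few punctuation marks.
TITLE_RE = re.compile(r"[A-Za-z0-9 .,:'!?()-]{1,120}")
# Assumed file name shape: no path separators, a fixed .html suffix.
FILE_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}\.html")


def is_valid_title(title: str) -> bool:
    """Accept the title only if every character is on the allowlist."""
    return TITLE_RE.fullmatch(title) is not None


def is_valid_file_name(name: str, existing: set) -> bool:
    """Accept a file name that matches the pattern and is not already in use."""
    return FILE_NAME_RE.fullmatch(name) is not None and name not in existing
```

Because the pattern has no slash or backslash in it, a name such as `../etc/passwd` is rejected outright; the directory is never the user's to choose.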



     Author's Name

Simple text allowing only alphanumeric input. I advise against using pick lists.



     Summary

This is a large text field that should contain mostly alphanumeric characters, but could easily contain external links. The size should be limited by the setting in the form.
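A size limit set in the form is enforced only by the browser, so a request built by hand can ignore it. A small server-side check, sketched here with an assumed limit of 500 characters and a hypothetical function name, repeats the same rule where the user cannot bypass it.

```python
MAX_SUMMARY_CHARS = 500  # assumed limit; mirror whatever the form declares


def check_summary(summary: str) -> str:
    """Return the trimmed summary, or raise ValueError if it breaks the limits."""
    text = summary.strip()
    if not text:
        raise ValueError("summary is empty")
    if len(text) > MAX_SUMMARY_CHARS:
        raise ValueError("summary exceeds the allowed length")
    return text
```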



This set alone suffices to create the article list entry shown in the graphic. Nonetheless, the user input form was created for submissions of articles. Therefore, the form requires a couple more data entry fields: the keywords and the article's content.



     Keywords

This type of data should be a limited set of words and/or short phrases separated by commas. Hence, alphanumerics should suffice, with unusual characters being suspect.
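The keyword rule can be sketched as a split on commas followed by an allowlist test on each piece. The pattern, the 40-character limit, and the cap of ten keywords are assumptions made for this example; anything that fails the test is set aside as suspect rather than silently repaired.

```python
import re

# Assumed shape of a keyword: alphanumerics, single spaces and hyphens.
KEYWORD_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 -]{0,39}")


def parse_keywords(raw: str, max_keywords: int = 10):
    """Split a comma-separated string into (accepted, suspect) keyword lists."""
    accepted, suspect = [], []
    for part in raw.split(","):
        keyword = " ".join(part.split())  # trim and collapse inner whitespace
        if not keyword:
            continue  # ignore empty entries such as a trailing comma
        if KEYWORD_RE.fullmatch(keyword):
            accepted.append(keyword)
        else:
            suspect.append(keyword)
    return accepted[:max_keywords], suspect
```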



     Content

The content is text laced with HTML tags for headings, underline, bold and italics, to name a few. Moreover, links to external sources are expected, and although an image might be rare, it too could be present. Furthermore, the content could be of significant size, which could potentially hide malicious, tricky code. Despite the risk, it must be accepted if articles are to be uploaded automatically.
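Since the content must be accepted with markup in it, one cautious first step is to audit which tags and attributes it uses before anything is stored. The sketch below uses Python's standard `html.parser` module; the allowed tag and attribute sets are assumptions for illustration. It only reports problems and is not a complete sanitizer, so it should be read as one layer among several.

```python
from html.parser import HTMLParser

# Assumed allowlists for article markup; adjust to the site's real needs.
ALLOWED_TAGS = {"h2", "h3", "p", "br", "u", "b", "i", "em", "strong", "a", "img"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}


class TagAudit(HTMLParser):
    """Collect a message for every tag or attribute outside the allowlists."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"tag <{tag}> not allowed")
            return
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                self.problems.append(f"attribute {name} on <{tag}> not allowed")
            elif name in ("href", "src"):
                # Only plain web links pass; javascript: and data: URLs do not.
                if not (value or "").lower().startswith(("http://", "https://")):
                    self.problems.append(f"{name} on <{tag}> must be an http(s) URL")


def audit_content(html_text: str) -> list:
    """Return the list of problems found; an empty list means none were seen."""
    parser = TagAudit()
    parser.feed(html_text)
    parser.close()
    return parser.problems
```

A submission with a non-empty problem list could be held for manual review instead of being published automatically.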



Summation



Those listed make up the data set that I saw as the minimum required to automate at least three important instances of content for the Open Source Today site. Notice there is not a one-to-one mapping of input fields to the output on either total or partial web pages. Where the option exists, internally known values are preferred over user input.



So here is the list:



  • Dates - datetime, internal
  • Links - text (title and file name), user input & internal
  • Author - text, user input
  • Summary - text, user input
  • Keywords - text, user input
  • Content - text, user input


Not much, but it would have sufficed for multiple usages.
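The list can be written down as a small record type, which makes the split between user input and internal values explicit. This is a sketch only: the class name, the field names, and the `ARTICLE_DIR` path are hypothetical. The link is assembled from the user's title and file name plus the directory known at the server.

```python
import html
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

ARTICLE_DIR = "/articles/"  # hypothetical location, known only at the server


@dataclass
class Submission:
    # User input
    title: str
    file_name: str
    author: str
    summary: str
    keywords: List[str]
    content: str
    # Internal values, never read from the form
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    published_at: Optional[datetime] = None

    def link(self) -> str:
        """Build the listing link from user text plus the internal directory."""
        href = html.escape(ARTICLE_DIR + self.file_name)
        return f'<a href="{href}">{html.escape(self.title)}</a>'
```

The same record feeds both the article listing (dates, link, author, summary) and the article page itself (keywords, content), which is why the mapping from input fields to output is not one-to-one.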



What's Next? Security



Begin reading on the topic now. Security is not my forte; I lack both the depth of knowledge and the temperament to be a good guide. Therefore, do not wait for the next article's appearance to research the field on your own.



I have indicated implicitly that security was an issue with a few of these data fields. However, the security problem is more general: any outside data input is suspect. So the best I can do is outline what I would have done to lower the risk. Be assured my suggestions will be simple. Remember too that I am dealing with a special case. Thus, it is my approach that is important, i.e. a layered set of measures to hinder an intruder, not a general prescription to obviate the security problems for all web sites.



For corrections, suggested extensions, or comments, please write; check for the email address on the site http://bst-softwaredevs.com. Thanks.

     © Herschel Cohen, All Rights Reserved


