Secure Web Input - Data Analysis
In the introductory article for this series I painted my intent in broad strokes. Now my task is to determine the absolute minimum data set I need to extract from the user input form, that is, what is sufficient to build dynamic content into the Open Source Today (dot) org site. The constraints I use may seem arbitrary; however, you too should use whatever constraints are afforded you to simplify your tasks. Yes, be aware of potential complexities, but extend your data model only when you have no alternative, and not a moment sooner.
General Principle: the Wisdom of KISS
As an engineering undergraduate, I found aphorisms irritating; they seemed so obvious. Later experience taught me both their truth and how oblivious to them are the very people they describe all too well. Though by temperament I am not predisposed to simplicity, I now see it as a virtue. Recognizing where a system can become complex may show insight; however, until the need arises, it is best to add only the data identities and types you absolutely require.
Beginning the Process
So that you need not leave this page, I am showing both graphics from the introductory article again. That will make the discussion of the required data simpler. First is the data input form:
Figure 1. News Item/Article Input HTML Form
Next is the article listing example:
Figure 2. Sample OpenSourceToday Article Listings
Policy - Need vs. Risk
You can see that both graphics contain date data, and in the same format. Moreover, I intended to use at least two datetime data elements: the date (and time) the item was submitted, and the expected or actual publication date. Neither would be taken from the submitted form. Data integrity and risk reduction are part of the explanation; I will say more in the next article.
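A minimal Python sketch of keeping both dates out of the user's hands: they are set on the server, never read from the form. The class name `SubmissionDates`, its fields, and the `YYYY-MM-DD` listing format are hypothetical; the actual format shown in the graphics is not reproduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SubmissionDates:
    """Hypothetical holder for the two datetime elements (server-side only)."""

    # Set automatically, in UTC, when the submission is received.
    submitted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Expected or actual publication date; assigned later by the site owner.
    published_at: Optional[datetime] = None

    def schedule(self, when: datetime) -> None:
        """Record the publication date; this is a server-side decision."""
        if when.tzinfo is None:
            raise ValueError("publication date must be timezone-aware")
        if when < self.submitted_at:
            raise ValueError("cannot publish before submission")
        self.published_at = when

    def listing_date(self) -> str:
        """Date shown in the article listing (hypothetical ISO format)."""
        shown = self.published_at or self.submitted_at
        return shown.strftime("%Y-%m-%d")
```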
Title and File Name
Both should be simple text data types, primarily alphanumeric, with a very limited set of allowed symbols. Both are required to build the link seen in the article listing, but alone they do not carry enough information to do so. However, the actual location of the file is known at the server, which is one more reason not to involve the user. What the user must supply is a unique file name.
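As a minimal sketch, assuming Python and whitelist regular expressions, the title and file name can each be checked against a short list of allowed characters, while the directory part of the link comes from a constant known only at the server. The patterns, length limits, the `ARTICLE_DIR` path and the function names are all hypothetical.

```python
import html
import re

# Hypothetical whitelist: letters, digits, spaces and a few punctuation marks.
TITLE_RE = re.compile(r"[A-Za-z0-9 ,.:'?!-]{1,120}")
# Hypothetical file-name rule: alphanumerics, '_' or '-', then one ".html" suffix.
FILENAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}\.html")

# Known only to the server; never taken from the user. Hypothetical path.
ARTICLE_DIR = "/articles/"


def validate_title(title: str) -> str:
    """Return the trimmed title, or raise if it breaks the whitelist."""
    title = title.strip()
    if not TITLE_RE.fullmatch(title):
        raise ValueError("title contains disallowed characters or is too long")
    return title


def validate_filename(name: str, existing: set[str]) -> str:
    """Accept only whitelisted, not-yet-used file names."""
    if not FILENAME_RE.fullmatch(name):
        raise ValueError("file name contains disallowed characters")
    if name in existing:
        raise ValueError("file name is not unique")
    return name


def listing_link(title: str, filename: str) -> str:
    """Build the article-listing anchor from already validated parts."""
    href = ARTICLE_DIR + filename
    return f'<a href="{html.escape(href)}">{html.escape(title)}</a>'
```

Because the file-name pattern forbids slashes and extra dots, an input such as `../etc/passwd` is rejected outright rather than cleaned up.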
Simple text allowing only alphanumeric input. I advise against using pick lists.
This is a large text field that should contain mostly alphanumeric characters but could easily contain external links. Its size should be limited by the setting in the form.
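A form's size setting (for example an HTML `maxlength` attribute) is enforced only by the browser, so a crafted request can ignore it. The sketch below re-checks the limit on the server and lists any external links for review; the 1000-character limit, the function name and the rough link pattern are assumptions for illustration.

```python
import re

# Hypothetical limit; it should match the form's setting, but is re-checked
# here because the browser-side limit can be bypassed.
MAX_TEXT_LEN = 1000

# Rough pattern for http(s) links; good enough to flag them, not to parse URLs.
LINK_RE = re.compile(r"https?://[^\s<>\"']+")


def check_long_text(text: str) -> list[str]:
    """Enforce the size limit and return any external links found."""
    if len(text) > MAX_TEXT_LEN:
        raise ValueError(f"text exceeds {MAX_TEXT_LEN} characters")
    return LINK_RE.findall(text)
```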
This set alone suffices to create the article list entry shown in the graphic. However, the user input form was created for article submissions, so it requires a couple more data entry fields: the keywords and the article's content.
Keyword data should be a limited set of words and/or short phrases separated by commas. Hence, alphanumerics should suffice, and unusual characters are suspect.
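A sketch of parsing the comma-separated keywords, assuming Python. The per-keyword pattern, the 40-character and 10-keyword caps, and the function name are hypothetical. Rather than stripping unusual characters, it rejects the whole entry, in keeping with treating such characters as suspect.

```python
import re

# Hypothetical rule: a word or short phrase of letters, digits, spaces, hyphens,
# starting with a letter or digit and at most 40 characters long.
KEYWORD_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 -]{0,39}")
MAX_KEYWORDS = 10  # hypothetical cap


def parse_keywords(raw: str) -> list[str]:
    """Split on commas, drop empty entries, reject anything suspect."""
    keywords = [k.strip() for k in raw.split(",") if k.strip()]
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError("too many keywords")
    for k in keywords:
        if not KEYWORD_RE.fullmatch(k):
            raise ValueError(f"suspect keyword: {k!r}")
    return keywords
```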
The content is text laced with HTML tags for headings, underline, bold and italics, to name a few. Moreover, links to external sources are expected, and while images may be rare, they too could be present. Furthermore, the content could be large enough to hide malicious, tricky code. Despite the risk, it must be accepted if articles are to be uploaded automatically.
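Content cannot be reduced to alphanumerics, so a whitelist of tags and attributes is one possible layer of defense. The sketch below uses Python's standard `html.parser` module to report anything outside a hypothetical whitelist; it audits rather than sanitizes, and the tag list, attribute rules and names are assumptions. A production site would be better served by a maintained sanitizer library.

```python
from html.parser import HTMLParser

# Hypothetical whitelist based on the tags the content is expected to use.
ALLOWED_TAGS = {"h1", "h2", "h3", "p", "br", "u", "b", "strong", "i", "em",
                "a", "img", "ul", "ol", "li"}
# Attributes permitted per tag; every other attribute (onclick, style, ...) is reported.
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
# Only absolute http(s) URLs pass; javascript:, data: and relative URLs do not.
SAFE_SCHEMES = ("http://", "https://")


class ContentAuditor(HTMLParser):
    """Collects problems found in submitted HTML; it does not rewrite the HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this;
        # self-closing tags such as <br/> are routed here too by default.
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"disallowed tag <{tag}>")
            return
        allowed = ALLOWED_ATTRS.get(tag, set())
        for name, value in attrs:
            if name not in allowed:
                self.problems.append(f"disallowed attribute {name} on <{tag}>")
            elif name in ("href", "src") and not (value or "").lower().startswith(SAFE_SCHEMES):
                self.problems.append(f"unsafe URL in {name}: {value!r}")


def audit_content(html_text: str) -> list[str]:
    """Return the problems found; an empty list means nothing was flagged."""
    auditor = ContentAuditor()
    auditor.feed(html_text)
    auditor.close()
    return auditor.problems
```

This check is deliberately strict: relative links are flagged along with `javascript:` URLs, and allowing them would be a conscious relaxation.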
The fields above make up the data set I saw as the minimum required to automate at least three important types of content for the Open Source Today site. Notice there is no one-to-one mapping of input fields to the output on either full or partial web pages. Where the option exists, internally known values are preferred over user input.
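To illustrate preferring internal values over user input, here is a sketch that copies only whitelisted form fields and then adds values known at the server, so a forged field such as `submitted_at` or `path` is simply ignored. The field names (a subset of the form's fields), `ARTICLE_DIR` and `build_record` are hypothetical, and per-field validation is left out for brevity.

```python
from datetime import datetime, timezone

# Fields the user is allowed to supply (hypothetical names, a subset of the form).
USER_FIELDS = ("title", "filename", "keywords", "content")

# Known only at the server (hypothetical path).
ARTICLE_DIR = "/articles/"


def build_record(form: dict[str, str]) -> dict[str, object]:
    """Copy only whitelisted user fields, then add server-known values.

    Any extra key in the form is dropped, so server values always win.
    """
    missing = [f for f in USER_FIELDS if not form.get(f, "").strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    record: dict[str, object] = {f: form[f].strip() for f in USER_FIELDS}
    record["path"] = ARTICLE_DIR + str(record["filename"])
    record["submitted_at"] = datetime.now(timezone.utc)
    record["published_at"] = None
    return record
```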
So here is the list:
Not much, but it would have sufficed for multiple uses.
What's Next? Security
Begin reading on the topic now. Security is not my forte; I lack both the depth of knowledge and the temperament to be a good guide. Therefore, do not wait for the next article to appear before researching the field on your own.
I have implicitly indicated that security is an issue with a few of these data fields. However, the security problem is more general: any outside data input is suspect. So the best I can do is outline what I would have done to lower the risk. Be assured my suggestions will be simple. Remember, too, that I am dealing with a special case. Thus, what matters is my approach, i.e., a layered set of defenses to hinder an intruder, not a general prescription that eliminates security problems for all web sites.
For corrections, suggested extensions or comments, please find the email address on the site http://bst-softwaredevs.com. Thanks.
© Herschel Cohen, All Rights Reserved