How to plan for Data Mining projects
http://www.dmreview.com/portals/portalarticle.cfm?articleId=1038094&topicId=230255
As someone who has suffered through the ambiguities of mis-scoped, data-mining-intensive projects, I found the following article to be fairly useful.
Eric King is bold enough to say that DM projects, especially first engagements, are an evolving optimization problem. Expectations should therefore be set accordingly: never expect a "final answer", and never anticipate a single pass.
A doomed project is typified by the following pattern, in which the buyers:
- Collect product literature from data mining tool vendors at industry events or as advertised in journals.
- Invite vendors whose retail price of their flagship product fits within available discretionary budgets to visit on site.
- Gain a free education in data mining through subjective presentations at the vendor's expense (too many are anxious to chase any sales bait, qualified or otherwise).
- Purchase a data mining tool from the vendor who presented last.
- Throw some data at the tool and await magical results.
- Stare at the numbers or even visualizations thereof, wondering why an angelic chorus did not accompany the results.
- Dismiss data mining as hyped and/or pie-in-the-sky technology, without knowing whether the results are useless or phenomenal.
1. Hire independent expertise in both the organizational/business problem being addressed and in data mining, and ensure some sort of symbiosis between the two, through a third-party liaison if need be. Bundling both roles into one person carries risks. An in-house business manager might reject some very significant results because they seem to contradict his experience (...it is actually preferable not to have the industry's strongest domain expert who also happens to do some data mining. While the consultant may appear impressive at the outset, too much industry expertise can introduce subjectivity and preconceived notions that may skew the way models are developed and interpreted.). A pure data manager, on the other hand, might rush to analyze the data instead of focusing first on amassing a comprehensive understanding and assessment of the client's business model and all available resources.
As for the second case, anybody who has worked in an analytics team will tell you that the biggest problem in directing a team of statisticians is instilling in them the fact that the essence of the problem at hand is business, not mathematics for its own sake.
2. The paper suggests using a DMPA (DM Project Assessment) to evolve a flexible strategy framework. The output is usually a situational assessment covering:
- Data Certification: A topical survey of the structure and nature of the data to support predictive analytics.
- Existing Resources: Additional tools may be recommended to support or replace existing products. Are the skills available in house to support the modeling process after deployment? What other technologies or methods have been used in the past? Are previous performance benchmarks available?
- Stakeholder Objectives: Are the questions to which executives seek answers aligned with the resources amassed in the findings? Are there desired and/or required performance levels? Are the benchmarks realistic from the consultant's experience?
- Functional Managers: There are many situations in which companies are either unable or unwilling to take the actions recommended by the model. (In the words of Jack Nicholson in A Few Good Men, it should be determined in advance if "You can't handle the truth!")
- Constraints: Are there hard boundaries that must be identified and built into the decision process - either before or after the model's implementation? Because virtually all data mining methods present a tradeoff between accuracy and explainability, a point on the scale should be defined. What are tolerable levels of false positives or negatives from the model?
- User Buy-in: If they won't adopt it, why build it? How may the system be designed to encourage dedicated use?
- IT Support: While usually not a deal-killer, IT is typically far more willing to support the model's function when they are included in the strategy and are invited to become data mining advocates. If IT is going to support another project that requires data access, it helps if they can also appreciate the high-level vision and benefits to the organization.
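The Constraints question about tolerable false positives and negatives can be made concrete before any modeling starts. The following is only a minimal sketch, not something from the article: the `Tolerance` class, the function names, the 0.25 limits, and the labels are all made up for illustration. It shows how an agreed tolerance could be checked against a model's predictions on held-out data.

```python
from dataclasses import dataclass


@dataclass
class Tolerance:
    # Hypothetical limits, assumed to be agreed with stakeholders up front.
    max_false_positive_rate: float
    max_false_negative_rate: float


def error_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate) for binary labels."""
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    negatives = sum(1 for a in actual if not a)
    positives = sum(1 for a in actual if a)
    # Guard against division by zero when one class is absent.
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def within_tolerance(actual, predicted, tolerance):
    """True only if both error rates are at or below the agreed limits."""
    fpr, fnr = error_rates(actual, predicted)
    return (fpr <= tolerance.max_false_positive_rate
            and fnr <= tolerance.max_false_negative_rate)


# Made-up labels: 1 = the event occurred, 0 = it did not.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]
tolerance = Tolerance(max_false_positive_rate=0.25,
                      max_false_negative_rate=0.25)

fpr, fnr = error_rates(actual, predicted)
print(f"FPR={fpr:.2f} FNR={fnr:.2f} "
      f"acceptable={within_tolerance(actual, predicted, tolerance)}")
```

Writing the limits down as data, rather than leaving them implicit, gives the client and the consultant something specific to argue about before the model exists.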
The value of the DMPA is all in the strategy, not the tactics.
Again the need for flexibility is stressed. The recommendations report from the DMPA will produce an overarching project plan. Early stages may be firmly priced. However, later stages may only be estimated because it cannot be known in advance what information will be derived from the data and how it should be leveraged.
In all, I think that a small pilot on some sample data can solve the same problem. The added advantage is that it would also give an inkling of what the final results would look like. A strategy can then be evolved from the issues faced in running the pilot, and an early idea of the "truth" might save the client emotional and monetary expense later if he decides that the results, however robust, are not something he can act on right now, whether because of internal resistance, lack of control over the top factors identified, or something else.
I think that clients have to understand that DM is a consultancy project. It can only offer a diagnosis or a suggestion; implementation is the client's responsibility.
Hence, like any consultancy, DM requires buy-in from the stakeholders and all-round honesty.

3 Comments:
I work with web analytics, a client-side data logging, wherein the only four-letter word I am made to think about is "DATA". Is it in some way similar to data analytics? Yet to read all the links that u have given! J!!!!!!
yes!
Web analytics is one of the most critical emerging areas of data analytics. Data analytics is simply finding patterns in large data sets by:
1. either summarizing the data at different levels using OLAP (e.g., auto insurance claims summarized by city, age, or gender), or
2. using the statistical property that large data converges to patterns in order to discover those patterns.
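To illustrate the first point, here is a rough sketch of an OLAP-style roll-up in plain Python. The claim records, field names, and the `roll_up` helper are invented for illustration; a real OLAP tool would do this against a cube or a database rather than a list in memory.

```python
from collections import defaultdict

# Made-up auto insurance claims; field names are assumptions.
claims = [
    {"city": "Springfield", "gender": "F", "age_band": "18-30", "amount": 1200.0},
    {"city": "Springfield", "gender": "M", "age_band": "31-50", "amount": 800.0},
    {"city": "Riverton", "gender": "F", "age_band": "31-50", "amount": 500.0},
    {"city": "Riverton", "gender": "M", "age_band": "18-30", "amount": 1500.0},
    {"city": "Springfield", "gender": "F", "age_band": "31-50", "amount": 300.0},
]


def roll_up(records, dimensions, measure):
    """Sum `measure` for every combination of values of `dimensions`."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record[measure]
    return dict(totals)


# The same data summarized at two different levels of detail.
by_city = roll_up(claims, ["city"], "amount")
by_city_gender = roll_up(claims, ["city", "gender"], "amount")

for key, total in sorted(by_city_gender.items()):
    print(key, total)
```

Adding or removing a dimension from the list is the code equivalent of drilling down or rolling up.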
Small world:)
Nice to have met u(ur blogs n comments)..Has bcum a kin of ambrosia to me(Lol)