
Exploring Data Cubes On NSimpleOlap (Alpha)

The goal of this article is to give a quick rundown of the current features of the NSimpleOlap library. The library is still in development, with new features being added and its API still undergoing refinement.

NSimpleOlap is an OLAP engine, more precisely an in-memory OLAP engine, intended for educational use and for application development. It has a simple API for setup and querying, which I hope will make it easier to demonstrate the usefulness of OLAP engines in solving certain classes of problems.

This library is provided as is and with no front-end, and it is not directed towards finance, fancy dashboards or the usual Business Intelligence use cases. In that space, over-hyping and exaggerated licensing fees have, in my opinion, limited the scope of use of these systems and undermined their acceptance, confining them to a BI silo that is often only for the eyes of business managers.

But it can be much more…

Starting With The Demo Application

I will start with the demo application, since it makes for an easier presentation and exploration of the main concepts required to query a data cube. It is a simple console application, a throwback in time for those who are more graphically minded, but it will do for the purposes of this article.

You can find the demo application and the NSimpleOlap library by following this link:

The seed data in this demo application is a very basic flat CSV file, although the base dimensions are referenced as numeric ids that have corresponding entries in supporting CSV files.

category, gender, place, Date, expenses, items
1, 1, 2, 2021-01-15,1000.12, 30
2, 2, 2, 2021-03-05,200.50, 5
4, 2, 5, 2021-10-17,11500.00, 101
3, 2, 2, 2021-08-25,100.00, 20
2, 1, 6, 2021-02-27,10.10, 5
1, 2, 2, 2021-08-30,700.10, 36
5, 2, 5, 2021-12-15,100.40, 31
1, 1, 3, 2021-09-07,100.12, 12
3, 2, 3, 2021-06-01,10.12, 30
2, 2, 2, 2021-06-05,10000.12, 30
1, 2, 1, 2021-05-04,100.12, 1
4, 2, 2, 2021-01-03,10.12, 6
2, 2, 3, 2021-11-09,100.12, 44
1, 2, 3, 2021-07-01,10.12, 8
4, 1, 1, 2021-04-24,100.12, 5
1, 1, 6, 2021-06-02,10.12, 7
4, 3, 6, 2021-05-18,100.12, 30
2, 1, 2, 2021-08-21,60.99, 8
1, 2, 2, 2021-02-16,6000.00, 89
4, 3, 6, 2021-03-07,600.00, 75
1, 1, 6, 2021-01-01,10.00, 12
4, 2, 2, 2021-07-28,2000.00, 30
5, 2, 6, 2021-12-20,50.10, 11
3, 1, 3, 2021-06-08,130.50, 2
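
As an illustration of the supporting files, a dimension file pairs each numeric id with a member name. A hypothetical categories file could look like the one below; the column layout matches the id and description fields configured in the builder later on, but the member names themselves are only illustrative, not the exact values shipped with the demo data.

id, description
1, toys
2, furniture
3, clothes
4, electronics
5, appliances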

Executing the demo application will show the following initial console messages.

You can type help to get a basic example of how to make simple queries, as well as the list of available dimensions and measures.

You can type a simple query, and get the outcome once you hit enter.

As you can see, the results aren't chronologically ordered, but appear in the order the cells were picked up by the query engine. This will be resolved once ordering of results is implemented.

Here’s another example.

And another example, this time focusing on the records where there is no data for gender.

As you can see, some of the outputs have many empty spaces because the test data isn't very big, so in terms of the space of all possible aggregations the current data cube is very sparse. But you can still view the data from different perspectives and get an idea of what is possible.

Starting Your Own Cube

At this stage of development you can define dimensions, measures and metrics. Regular dimensions define lists of attributes or entities (colour, gender, city, country, etc.), while Date dimensions need to be handled differently, since they follow defined calendar patterns and need to be generated from the incoming data in the facts tables.

Measures are the variables observed for the entities defined in the facts table; these can be quantities of goods sold or bought, the value or price of goods, the total value of an invoice, temperature, rainfall, etc. They are aggregated inside the cube in various combinations, although this entails a certain loss of context, since an aggregated cell that results from multiple data points won't tell you much about the pattern of the input data. But a cube is about exploring the forest, not the individual trees.

Metrics are expressions that are calculated at aggregation time; they let you make some extra calculations as well as keep some extra data context in the cell. These calculated values can be averages, minimum and maximum values, or any expression made by composing the implemented operations.

Setting Up Regular Dimensions

When adding new dimensions you will first need to set up your facts data source. In this particular example we need to specify a CSV file and add the fields from the file that we want as sources for the Cube. You will also need to specify the data source that holds the dimension members, indicating which column will be used as the id and which column will be used as the dimension member name.

CubeBuilder builder = new CubeBuilder();

builder.AddDataSource(dsbuild =>
        {
          dsbuild.SetName("sales")
            .SetSourceType(DataSourceType.CSV)
            .SetCSVConfig(csvbuild =>
            {
              csvbuild.SetFilePath("TestData//facts.csv")
                              .SetHasHeader();
            })
            .AddField("category", 0, typeof(int));
        })
        .AddDataSource(dsbuild =>
        {
          dsbuild.SetName("categories")
            .SetSourceType(DataSourceType.CSV)
            .AddField("id", 0, typeof(int))
            .AddField("description", 1, typeof(string))
            .SetCSVConfig(csvbuild =>
            {
              csvbuild.SetFilePath("TestData//dimension1.csv")
                              .SetHasHeader();
            });
        });

Then you will need to map the columns in your fact data source with your cube dimensions.

builder.SetSourceMappings((sourcebuild) =>
        {
          sourcebuild.SetSource("sales")
            .AddMapping("category", "category");
        });

And then add the metadata mappings from the dimension member data sources.

builder.MetaData(mbuild =>
        {
          mbuild.AddDimension("category", (dimbuild) =>
          {
            dimbuild.Source("categories")
              .ValueField("id")
              .DescField("description");
          });
        });

Setting Up Measures

Getting a measure into a cube requires only two steps. First, map the measure column from the facts data source.

builder.AddDataSource(dsbuild =>
        {
          dsbuild.SetName("sales")
            .SetSourceType(DataSourceType.CSV)
            .SetCSVConfig(csvbuild =>
            {
              csvbuild.SetFilePath("TestData//tableWithDate.csv")
                              .SetHasHeader();
            })
            .AddField("category", 0, typeof(int))
            .AddField("expenses", 4, typeof(double));
        });

And then add the measure metadata mapping for the cube.

builder.MetaData(mbuild =>
        {
          mbuild.AddDimension("category", (dimbuild) =>
          {
            dimbuild.Source("categories")
              .ValueField("id")
              .DescField("description");
          })
          .AddMeasure("spent", mesbuild =>
          {
            mesbuild.ValueField("expenses")
              .SetType(typeof(double));
          });
        });

Setting Up Date Dimensions

Adding a Date dimension will add an extra layer of complexity, since you will need to specify what kind of Date levels you want the data to be sliced into.

You will start by mapping the Date field and, in this case, specifying the date-time format used in the CSV file.

builder.AddDataSource(dsbuild =>
        {
          dsbuild.SetName("sales")
            .SetSourceType(DataSourceType.CSV)
            .SetCSVConfig(csvbuild =>
            {
              csvbuild.SetFilePath("TestData//tableWithDate.csv")
                              .SetHasHeader();
            })
            .AddField("category", 0, typeof(int))
            .AddDateField("date", 3, "yyyy-MM-dd")
            .AddField("expenses", 4, typeof(double));
        });

Add the mapping to the data source and indicate what label fields you want.

builder.SetSourceMappings((sourcebuild) =>
        {
          sourcebuild.SetSource("sales")
            .AddMapping("category", "category")
            .AddMapping("date", "Year", "Month", "Day");
        });

When defining the dimension metadata, specify the dimension labels and the type of information the data will be transformed into. In this case you will have three dimensions: Year, Month and Day.

builder.MetaData(mbuild =>
        {
          mbuild.AddDimension("category", (dimbuild) =>
          {
            dimbuild.Source("categories")
              .ValueField("id")
              .DescField("description");
          })
          .AddDimension("date", dimbuild => {
            dimbuild
            .SetToDateSource(DateTimeLevels.YEAR, DateTimeLevels.MONTH, DateTimeLevels.DAY)
            .SetLevelDimensions("Year", "Month", "Day");
          })
          .AddMeasure("spent", mesbuild =>
          {
            mesbuild.ValueField("expenses")
              .SetType(typeof(double));
          });
        });
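
Once the cube is created and processed, the generated Year, Month and Day dimensions can be queried like any other dimension. Below is a minimal sketch using the query API covered in the next sections; addressing the date levels as "Year.All" is my assumption about how their members are named, so check the library tests for the exact convention.

var cube = builder.Create<int>();
cube.Initialize();
cube.Process();

// Hypothetical query that slices the "spent" measure by the generated
// Year level dimension and by category.
var queryBuilder = cube.BuildQuery()
    .OnRows("Year.All")
    .OnColumns("category.All")
    .AddMeasuresOrMetrics("spent");

var query = queryBuilder.Create();
var result = query.StreamRows().ToList();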

Setting Up Metrics

At the moment metrics can only be set after the Cube is initialized, not at configuration time, since configuration-time metrics would require parsing text expressions. But you can still add metrics using the expression-building API.

Setting up a metric requires you to identify which measures you want to use and which maths operations are needed to build it. As a simple example…

var cube = builder.Create<int>();
cube.Initialize();

cube.BuildMetrics()
    .Add("Add10ToQuantity", exb => exb.Expression(e => e.Set("quantity").Sum(10)))
    .Create();

This won’t do much to further the understanding of the data but it’s a start.

For more useful expressions you can also combine two measures and get rates and ratios.

cube.BuildMetrics()
    .Add("RatioSpentToQuantity", exb => 
     exb.Expression(e => e.Set("spent").Divide(ex => ex.Set("quantity").Value())))
    .Create();

Or use some useful functions and retain some context from the source data.

cube.BuildMetrics()
        .Add("AverageOnQuantity",
          exb => exb.Expression(e => e.Set("quantity").Average()))
        .Add("MaxOnQuantity",
          exb => exb.Expression(e => e.Set("quantity").Max()))
        .Add("MinOnQuantity",
          exb => exb.Expression(e => e.Set("quantity").Min()))
        .Create();

Getting More With Queries

A data cube is nothing if it cannot be queried. The NSimpleOlap fluent query API borrows many concepts from the MDX query language, and you will need to get used to specifying your rows and columns as tuples. In general this is no different from setting paths or using something like XPath in XSLT or any DOM XML API. You are not only slicing the cube, you are also defining which data hierarchies you want to visualize.

Defining a simple query and sending the output to the text console.

cube.Process();

var queryBuilder = cube.BuildQuery()
    .OnRows("category.All.place.Paris")
    .OnColumns("sex.All")
    .AddMeasuresOrMetrics("quantity");

var query = queryBuilder.Create();

query.StreamRows().RenderInConsole();
|                                | sex male  | sex female
    category toys,place Paris    |     12     |      8
 category furniture,place Paris  |     2      |      30
  category clothes,place Paris   |            |      44

You can also select both measures and metrics at the same time in a query.

var queryBuilder = cube.BuildQuery()
    .OnColumns("sex.All")
    .AddMeasuresOrMetrics("quantity", "MaxOnQuantity", "MinOnQuantity");

var query = queryBuilder.Create();
var result = query.StreamRows().ToList();

Making filters on the aggregate values and the facts is also possible. First we will filter on the aggregates.

var queryBuilder = cube.BuildQuery()
    .OnRows("category.All.place.All")
    .OnColumns("sex.All")
    .AddMeasuresOrMetrics("quantity")
    .Where(b => b.Define(x => x.Dimension("sex").NotEquals("male")));

var query = queryBuilder.Create();
var result = query.StreamRows().ToList();

Then we will reduce the scope of the data by filtering on a measure.

var queryBuilder = cube.BuildQuery()
    .OnRows("category.All.place.All")
    .OnColumns("sex.All")
    .AddMeasuresOrMetrics("quantity")
    .Where(b => b.Define(x => x.Measure("quantity").IsEquals(5)));

var query = queryBuilder.Create();
var result = query.StreamRows().ToList();

Making filters on the facts will generate a cube with a smaller subset of data. This makes sense, since the main Cube doesn't keep the full context of the facts, and any operation that requires digging into the source facts requires generating a new Cube to represent those aggregations.

In Conclusion…

The NSimpleOlap core is getting more stable and it's already possible to query on complex hierarchies of dimensions. But there is still much to do: adding Time dimensions, adding dimension levels through metadata, transformers to convert measure data into interval dimensions (to be able to query age ranges, for example), etc. Also, some more work is required on a structure that enables better rendering of row and column headers in a hierarchical form. Much to do, and so little time…


Presenting NSimpleOlap (Alpha & Unstable)

NSimpleOlap is a project that I started in 2012 with the goal of building a stand-alone, embeddable .Net OLAP library that can be used within the context of console, desktop, or other types of applications. My initial motivation for starting this project was that, at the time, there weren't many lightweight Open Source implementations; the implementations that suited my preferences were too expensive, or only existed as server solutions, etc.

In my previous professional path building tools for Marketing Research I was exposed to many of the tropes of what is called Analytics, and that gave me some understanding of the basics of Business Intelligence. Even after leaving the Marketing Research business I kept an interest in the subject and the tools of the trade, and I researched the market for tools with similar business cases, like survey and questionnaire applications, and OLAP and BI server solutions. Some products struck a chord with me, like Jedox, Pentaho and JasperReports, and I even dabbled in Microsoft SQL Server Analysis Services. But these were not the products I was looking for.

Since my interests had shifted, I wanted an OLAP engine that could be used within the context of an application and that could do aggregations quickly on a limited dataset, or in real time but with some caveats. And although it's true that at the time there were some analytics solutions, like Tableau, that provided a full range of data, reporting and UI tools, and some real-time features, in 2012 I decided to start this project anyway.

At the beginning of 2012 the project was actually evolving very quickly, but unfortunately a personal mishap derailed everything, and for professional and personal reasons I wasn't able or motivated to restart development. But out of frustration and disillusionment with the way technical skills are evaluated, I decided to take a chance and get the project into a releasable state. It's my intention that this project will help educate more developers on the utility of aggregation engines beyond the fields of Business Intelligence and Finance.

At a personal level I am quite frustrated with the way interviews for developer roles are done, how technical skills are evaluated, and the whole selection process: the box ticking, the questions about algorithms and data structures that are rarely or never used, the online gamified technical tests, the code challenges that require several days of full-time work (and that are suspiciously like real-world problems), the bait and switch, etc. And that is just the recruiting process; the actual work itself very often provides very little in terms of career growth. In some cases, people you work with have an incentive to devalue your skills, steal your ideas or just take advantage of you. Also, it's annoying as hell to have to watch the constant manhood-measurement contests, the attention-seeking narcissists, the friendly backstabbers, and the occasional incompetent buffoon.

Well, that is off my chest… Rant over.

The Project

At the present moment the NSimpleOlap project is still in alpha stage and unstable, and it only allows for some basic querying and limited modes of aggregation. Some of its features are still experimental and are implemented in a way that allows for easy testing of different opportunities for optimization and/or feature enhancement. You can find it by going to the following Github repository:

At the conceptual level NSimpleOlap borrows a lot from the MDX query language, the model of the Star Schema, and the modelling and mapping conventions that are common when modelling data Cubes. As an example, tuples and tuple sets are the way you locate Cube cells, and they can be used to define what information comes in rows or in columns. Examples of tuples are as follows:

  • Category.Shoes.Country.Italy
  • Year.2012.Products.All
  • Gender.Female.State.California.Work.IT
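
To give a sense of where these tuples end up being used, here is a minimal sketch with the query API described further below; the dimension and member names are taken from the examples above and assume a cube in which those dimensions were actually defined.

// Assumes a cube that was built and processed as shown further below,
// with Category, Country, Year and Products dimensions and a "quantity" measure.
var queryBuilder = cube.BuildQuery()
    .OnRows("Category.Shoes.Country.Italy")
    .OnColumns("Year.2012.Products.All")
    .AddMeasuresOrMetrics("quantity");

var query = queryBuilder.Create();
var result = query.StreamRows().ToList();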

There are some concepts that you will need to be familiar with to use this library:

  1. Dimension – This is an entity or characteristic of your data points; it can be a socio-demographic variable like gender, age, region, etc., or a product name, year, month, etc.
  2. Dimension Member – This is a member of a dimension; in the case of gender an example would be “female”.
  3. Measure – This is a variable value from your data points; it can be the sale value, number of items bought, number of children, etc.
  4. Metric – This is a value calculated from the aggregated results; it can be an average, a percentage, or some other type of complex expression.

To be able to populate the Cube you will need to organize your data in a table that has all the facts, where the dimension columns contain numerical keys, and keep those keys and the relevant metadata in separate dimension definition data sources.

Building Your First Cube

Building a Cube requires some initial setup to identify the data sources, define the mappings and set up the useful metadata. In the following example we will build a Cube from data contained in CSV files, which will be used to define the Cube dimensions and measures.

CubeBuilder builder = new CubeBuilder();

builder.SetName("Hello World")
.SetSourceMappings((sourcebuild) =>
{
  sourcebuild.SetSource("sales")
  .AddMapping("category", "category")
  .AddMapping("sex", "sex");
})
.AddDataSource(dsbuild =>
{
  dsbuild.SetName("sales")
  .SetSourceType(DataSourceType.CSV)
  .SetCSVConfig(csvbuild =>
  {
    csvbuild.SetFilePath("TestData//table.csv")
    .SetHasHeader();
  })
  .AddField("category", 0, typeof(int))
  .AddField("sex", 1, typeof(int))
  .AddField("expenses", 3, typeof(double))
  .AddField("items", 4, typeof(int));
})
.AddDataSource(dsbuild =>
{
  dsbuild.SetName("categories")
  .SetSourceType(DataSourceType.CSV)
  .AddField("id", 0, typeof(int))
  .AddField("description", 1, typeof(string))
  .SetCSVConfig(csvbuild =>
  {
    csvbuild.SetFilePath("TestData//dimension1.csv")
    .SetHasHeader();
  });
})
.AddDataSource(dsbuild =>
{
  dsbuild.SetName("sexes")
  .SetSourceType(DataSourceType.CSV)
  .AddField("id", 0, typeof(int))
  .AddField("description", 1, typeof(string))
  .SetCSVConfig(csvbuild =>
  {
    csvbuild.SetFilePath("TestData//dimension2.csv")
             .SetHasHeader();
  });
})
.MetaData(mbuild =>
{
  mbuild.AddDimension("category", (dimbuild) =>
  {
  dimbuild.Source("categories")
    .ValueField("id")
    .DescField("description");
  })
  .AddDimension("sex", (dimbuild) =>
  {
  dimbuild.Source("sexes")
    .ValueField("id")
    .DescField("description");
  })
  .AddMeasure("spent", mesbuild =>
  {
  mesbuild.ValueField("expenses")
    .SetType(typeof(double));
  })
  .AddMeasure("quantity", mesbuild =>
  {
  mesbuild.ValueField("items")
    .SetType(typeof(int));
  });
});

Creating the Cube requires you to make the necessary method calls so that the data is loaded and processed. This can be done as follows.

var cube = builder.Create<int>();

cube.Initialize();
cube.Process();

Querying The Cube

Querying the Cube is done through the querying interface; here's a basic example:

var queryBuilder = cube.BuildQuery()
  .OnRows("sex.female")
  .OnColumns("category.shoes")
  .AddMeasuresOrMetrics("quantity");

var query = queryBuilder.Create();
var result = query.StreamCells().ToList();

In the previous example you streamed the results by cells, but you can also stream by rows:

var result_rows = query.StreamRows().ToList();

You can also add some basic expressions to filter on the table facts; this will limit the scope of the rows that will be aggregated.

var queryBuilder = cube.BuildQuery()
  .OnRows("sex.All")
  .OnColumns("category.All")
  .AddMeasuresOrMetrics("quantity")
  .Where(b => b.Define(x => x.Measure("quantity").IsEquals(5)));

var query = queryBuilder.Create();
var result = query.StreamCells().ToList();

Or you can add some basic expressions to filter on dimension members, which won’t affect the scope of the aggregated results.

var queryBuilder = cube.BuildQuery()
  .OnRows("sex.All")
  .OnColumns("category.All")
  .AddMeasuresOrMetrics("quantity")
  .Where(b => b.Define(x => x.Dimension("sex").NotEquals("male")));

var query = queryBuilder.Create();
var result = query.StreamCells().ToList();

Concluding & Hopes For the Future

In conclusion, there is still a lot of work to be done to get to feature sets like dimension levels, Date and Time dimension transformers, query expressions, etc. Hopefully these features will be coming in the near future.


How To Turn Around a Failing Software Project

In this article I will discuss some scenarios and situations that involve turning a project around and saving it from the jaws of failure. I will also discuss those cases where the chances are slim or non-existent, along with some of the possible solutions and pitfalls. A word of warning: this won't be about heartwarming stories, and people probably won't be singing kumbaya afterwards.

Failure and success can be framed in many ways for a project, depending on whether you are in a product company, a services company or a non-tech company developing internal projects. In the first case, success might be defined in terms of goals achieved (business and/or technical) and meeting deadlines; in a services company it is successful delivery, on time and on spec, for a particular customer or customers; and in the latter it is successful delivery on time, satisfying the requirements of the internal stakeholders. I know… This is all very fuzzy in terms of defining success.

While success is not guaranteed, failure, on the other hand, can rear its ugly head early on: deadlines are missed, budgets burn like a match, and technical flaws show themselves through high bug counts. As problems accumulate people start to get anxious; reputations are at stake, and sometimes the company's future is at stake. As the project spirals close to failure, changes are needed to turn the boat around with a new captain on board.

Naval metaphors aside, being a project manager under duress is very much like being a soccer team manager hired to avoid relegation to a minor league (yay, sports metaphor). Some managers specialize in turnarounds; it is not a pleasant job, and it usually comes with consequences for the team members, with some players picking up health issues that stay with them for the rest of their lives.

But project management isn't operating in a competitive sports tournament: projects aren't a sequence of games that need to be won against other teams, intermediate victories can be invisible to stakeholders, and defining success is not as easy as checking a score.

Best Case Scenarios

Failure presents itself in many forms. Sometimes the reasons are external, like the death of a key person, an unexpected downturn in the economy, or a discontinued middleware product, any of which can severely impact the chances of success. Many are internal: bad choices when selecting managers and team members, company culture and other human-related issues. From my experience, technical issues are most of the time more tractable than human issues, especially if all that is required is more time to get around the learning curve. Human issues, on the other hand, can spiral into a lot of drama and soap-opera re-enactments, with no extra dramatic music though.

When you are dropped into the drama of a failing project, the best situation you can hope for is:

  • Upper management is committed to the success of the project (a bit of desperation is nice, but not too much).
  • The previous project manager has left the company (more on that later).
  • The team members aren’t all duds and have enough skills to get the work done.
  • The budget isn’t running on fumes.
  • The company’s culture isn’t the cloak and dagger type.

The first thing you will need to do is understand what the project is: its goals and scope, timetables, technical architecture and technology dependencies. Afterwards, meet the team, understand what role each one has, read their interactions and their demeanour, and don't make any judgements based on first impressions. Discuss what the team has done so far, do one-on-one meetings if possible, and check whether the information from the team in the meetings and in the individual interactions is consistent, and whether the information the team is giving is consistent with the feedback from upper management.

If possible, before interacting with the customer or third-party stakeholders, find out whether the previous project manager lied, mischaracterized, deceived or in any way told tall tales that have framed their minds about the state of the project. If the answer is yes, then tread carefully… People often get very emotional when the facts are told, so start gathering some information about these third parties and don't do any reveals in your first meetings with them.

In the first meetings with your peers and direct superiors, discuss small on-boarding issues and check their responses; also evaluate how they act on small requests for information connected to the project and the company. In those meetings check how everyone behaves, determine whether there are managers who are prone to snap or bully, and check how other managers react when they feel under pressure. This will be important for evaluating to what degree the systems of rewards and sanctions are applied within the company and what type of personality is usually hired for managerial roles.

After the initial stages of getting to know the team, peers and third parties, make an initial evaluation and determine who can help you, who is neutral and who can hinder you. This is not a static evaluation but a continuous loop until the project is finished: when you implement an initiative, you monitor the feedback and responses, re-evaluate your position, and make corrective changes for the next iteration.

One of the items that requires quick evaluation is the overall composition of the team, identifying poor performers and team members that generate entropy. Team members who don't contribute much code and have a high rate of defects for the amount of code they contribute, or who break the builds frequently and don't listen to other team members' feedback, should be removed from the team as quickly and painlessly as possible. Check for internal replacements if possible, prioritizing developers with a good track record who know what they are doing. If you need to hire or contract, prioritize people you already know and have worked with before and who meet the previous conditions – keep the habit of staying in touch with developers who performed well in the past and who can still help you or can refer people you can work with.

Keep an eye out for armchair trainers who keep second-guessing your actions; these can be especially problematic before you can show results, and they can frame the attitudes of others and of upper management.

On the project side you will need to tightly manage scope; it is of the utmost importance to identify and focus on the core deliverables and avoid being distracted by side features. Renegotiate the scope with the stakeholders and have them agree to a primary set of deliverables. This is the moment when it is important to know in advance whether the previous manager over-promised or lied about the status of the project. Scope needs to be negotiated at every iteration, and it needs to fit the capacity and the skills of the team; over-promising or over-committing will not get your project on time and on spec.

To evaluate the status of the project you will need to meet regularly with the team members, either in stand-ups, sprint meetings or meetings to discuss project issues. Meet individually, if possible, to check on critical developments and evaluate what work is still required and whether there are any blockers. Insist on a continuous release system with at least a set of sanity tests to catch problems early. Verify the feedback from the QA team, check how the defect count is evolving, and verify whether individual team members need help resolving any defect. Prioritize stability over features; team members need to fix defects before starting new work.

On the QA side of things, prefer a data-driven approach: have a process for comparing the results of the software the team is developing against an older version of the product, against client or company data, or against business scenarios developed by the QA team or Business Analysts. Also use methods to generate noisy data to verify how the software handles boundary conditions and errors. This will provide a benchmark of progress that will help in gaining the stakeholders' trust.

Be alert for software dependencies that start breaking your builds, track their versions, and make a judgement call with the team about when to revert to an older version and when to fix the code that no longer works with the new version of the library. Schedule this work around deliveries and avoid doing it when pressed by a deadline. Also, avoid the temptation of switching software libraries for capricious reasons; do it only if there are clear advantages and the schedule allows for rework.

Automate as much as you can, and don't fall for the trap that says there is no time for automation. Start automating at the very beginning by identifying a suitable candidate to do this work, and identify the work that has the most repeatable patterns. By doing this you can create enough slack to mitigate any need for extra rework due to unforeseen events, a missed requirement or clearing defects.

Make clear and frequent reports on the milestones that were reached, but beware of sounding too optimistic. That might create an incentive for upper management to start cutting resources or budget, or to shorten deadlines before critical work is finished. It might also create exaggerated expectations that could be dashed by unknown unknowns. All of these represent extra unnecessary stress and have the potential of becoming a reputation risk for yourself.

Find ways to reward the team when a difficult milestone is reached; it is an important signal that their effort is appreciated. When communicating with them, be straightforward and clear. If there is a risk of the project being cancelled due to a change of mind of a stakeholder, let them know. Address these risks with the team, along with the actions being taken to mitigate them.

In this scenario a successful turnaround is still a lot of work, and you remain dependent on managing the relationship with third-party stakeholders. They need to be reassured by the quality of the deliverables, and they need to feel that the scope of the project is bringing them value. If they feel otherwise, they might be tempted to cancel their participation or cancel the project.

This best-case scenario provides a template that will be expanded in the next cases, which are much less optimistic.

Project Gone Wrong II

In this situation you will have your work cut out for you. There is an increased risk of things going wrong, mostly due to human issues. The project is already showing signs of stress, and there are extra factors that can increase the risk, like:

  • Upper management requires quick results but isn't committing much in the way of new resources or support.
  • The team is unbalanced, with not enough highly skilled developers.
  • The budget is conditional on reaching goals, so cancellation is a constant spectre.
  • The previous manager is still working in the organization.

In this situation you will need to quickly determine which developers are good enough to continue the work, and identify those you need to try to replace as quickly as you can. You will also need to verify whether the team lead is part of the problem and whether you will need to find a new one. Here, being an insider actually pays off: you probably already know the people in the team and who in the company can be allocated, so you are able to restructure the team for the project's needs. An outsider will have a tough time quickly telling apart those who can get results from those who sit idle browsing Facebook while faking being busy.

The other question is understanding why the previous manager left the project but kept their job in the company; the reasons will indicate whether that will become a problem for you or not:

  • Sickness or burnout, which could indicate that he or she might have been under a lot of pressure.
  • Calling in sick, which can be used to avoid career-destroying events and put the blame on someone else.
  • The previous manager has a powerful protector, and to avoid being tarnished by failure a new placement was found. In this case be very careful.
  • They just gave up on the project and negotiated another placement.

The previous manager can also act as a spoiler if your performance is better than theirs; to avoid looking bad, he or she might feel the need to shape other managers' opinions of you. So it is good to get an idea of his or her personality to estimate this risk; discreetly get some references from other people who have worked with them.

Find out whether there were other managers meddling or trying to change the direction of the project to fit their needs. Check their influence and their standing in the pecking order. Check whether they can be co-opted, or persuaded to step back and stay on the sidelines. Otherwise, you might have to use guile to put them out of action. A warning: using such means will earn you enemies who will not hesitate to punish you when the time comes.

Check with stakeholders and try to find out their current level of commitment and their general mood about the project. Also check whether there is any indication that they are contacting, or are already working with, other companies or teams to do similar work.

Address any issues of inter-departmental conflicts of interest, for example when QA is a separate team. Check to what extent the head of QA has an interest in getting a bigger share of the available budget, and whether the QA team applies tactics that maximize the amount of testing time or resources, for example finding critical defects a day before the release, on every release. Always check for the incentives that can reward this kind of behaviour and how it can damage the trust between the development and QA teams.

After identifying the sources of internal and external political risk, devise a strategy to counter and mitigate these risks, so that the development team is insulated from these issues and has the conditions to continue its work in relative peace.

Work to keep an effective control loop on the project status with the development team, track issues early on, and don't fall for the trap of magical thinking. Devise clever ways to get information from the available tracking tools so that you can automate part of your process.

Be on the lookout for motivational issues: team members becoming unresponsive, an undeclared conflict going on between two team members or, worse, a team that has split into factions. Work to resolve these issues as quickly as possible and don't let them fester, because your success will depend on getting everyone to work as a cohesive team.

In these scenarios, the challenge will be navigating the internal political situation in the office, keeping external stakeholders committed, and resolving any issues within the project team, all while starting with an unfavourable hand. And probably a lot of bumpy roads to cross…

Hopeless Cases

Not all projects are salvageable, and there are times you should consider forgoing such “opportunities” for development. You will need to be able to identify when these situations are presented to you, because some people might have an interest in passing the buck to you or someone else. To avoid being fooled you will need to read the signs, like:

  • No commitment by upper-management and/or external stakeholders in the project.
  • No budget, and there are only vague promises of new budget allocations.
  • The team is filled with flunkies.
  • The company is a hotbed of toxic office politics.

For example, you might be told that you are replacing another project manager on a much-hyped project, and that this manager will be promoted to a new position. After some due diligence on your part (or because you already know), you find that the project was nothing more than a placeholder for the previous PM, who had already been fast-tracked for a promotion but needed some sort of “fluff” project to justify it. The project was never meant to produce results, but now it needs to be shut down without tarnishing the reputation of the previous PM. In this case you are the designated fall guy, and there is no upside for you. Even if you are capable of showing results, upper management might actively undermine the project to protect the previous PM. Try to find a good excuse not to accept these situations, or get a new job.

Avoid getting yourself involved in projects where everyone expects a hail-Mary moment to save the day. If it reeks of desperation, run!

If the development team is filled with friends of the previous PM, or of some other manager, who have a carefree attitude about work, and you can only count on those in the team and can't hire new people to replace them, don't lose your hair: find another placement. Avoid working with people whose work or commitment you don't trust.

In Conclusion

Turning a project around and safely navigating it to success is an art. As you can see, I didn't talk much about technical aspects but rather about the human factor, because I believe that most projects that fail get into that situation through human action. The technical side is only as daunting as the time needed to master the learning curve and the resources required to reach completion, unless you are attempting the impossible given the technical means and concepts of the time.

Projects usually fail due to failures in scope management, failure to staff the project team with skilled people, failure to properly manage relationships between the team and stakeholders, lack of proper commitment from the higher-ups, and failure to manage office politics.

To be successful, you will need to find your own style for dealing with each one of these issues. A good piece of advice: find and retain your allies. You will need them, because no man or woman can do it all by themselves.


Machine Learning Road To Disappointment

From the media hype around AI and ML (Machine Learning) you would think that common usage is just around the corner, with promises of fantastic results in terms of productivity and dire news for human employment. Although some of this might be partially true, I believe that for the most part the promise is being oversold and that a partial disappointment is on the cards. I say partial disappointment because many of the techniques developed from the 1970s onwards have either matured or now have the conditions to make their use viable, but more is needed besides computing power and lots of available data.

I will talk here about ML rather than AI-related techniques, because many of these techniques have been taught in universities for decades but were seldom applied in the context of private enterprises — heck, even tried and proven statistical methods are rarely applied today, so it is no surprise that ML had little traction. In any case, ML was considered to be mostly in the realm of ivory-tower academic research, full of dreams and lots of promises but very little practical results, not that the results weren't sometimes impressive. There was difficulty in translating research into applications, and distrust on the part of decision makers about the viability of these techniques in getting results without a very high cost. When I talk about cost, it is not only time taken and money spent; it is also the credibility and reputation cost associated with projects that fail to meet expectations. There were practical difficulties that made it challenging to use ML, issues like:

  • Lack of appropriate datasets for training and testing models, which over time became much less of an issue.
  • Lack of computing power to run several model iterations, which again became less of an issue over time.
  • Lack of development environments with tools and libraries that could make testing and comparing different ML algorithms and models much easier.
  • Lack of people who had both ML technical experience and domain experience.

In the early 2000s using ML still meant implementing it in a general-purpose programming language or Matlab; some statistical packages like SAS provided some tools, and BI tools provided extensions that enabled some ML facilities. But these were the realm of the specialist, tools that were either too expensive or too time-consuming to allow wider usage. Only big institutions that generated massive amounts of data and had deep pockets could afford them. Inference models, decision trees and other techniques were used to detect credit fraud and health care fraud, parse genetic data, anything that meant sifting through tonnes of data to find a possible needle.

The advent of R and Python democratized access to ML, but the biggest kick to interest in ML came from Google, Apple and Facebook's work in the field. Without products like Siri, Google Translate, and many other related bots and autonomous agents, this field would still be relegated to the research labs. Now these uber-companies compete for the available AI/ML researchers to develop their portfolios of products in an AI arms race.

Like I said earlier, the lack of available ML professionals with domain knowledge was always a problem. Having an ML professional with no domain knowledge in the field creates its own type of friction: it makes communication with the related stakeholders difficult and generally means a steeper learning curve to develop a successful application. And since there is a lot to choose from in the ML toolbox, from linear and logistic regression, k-means, support vector machines, decision trees and random forests, each method with its own particular strengths and weaknesses, good judgement is a key factor.

But ML comes with another set of aggravations. Being mostly data-driven and statistical in nature, it fails in a key aspect: the human need for certainty and predictable outcomes in an organizational setting. An organizational structure likes predictability, our codes of law in some cases require it under threat of penalties, and shareholders love it. What ML can provide depends on the analyst's capacity to tweak and adapt the model, and on the training data available, so as to minimize the total error. This means that, at any given time, the model in use will flag some cases as false when they are true (a false negative), or flag some cases as positive when they are false (a false positive).

What manager would like to hear that the ML model for a critical business process has 63% accuracy, even if in reality the current process has only 55% accuracy but is well known and familiar? Now, if the current process has 90% accuracy with human operators but costs 20 times more and takes weeks instead of hours… Well, there is always that trade-off moment…

This somewhat uncertain pay-off meant that organizations focused their IT efforts on the development of systems that automate processes through sets of required rules, with the expectation that these are adequate for the business. For many cases this was more than adequate, and it has been quite successful. There is little need for AI in a simple CRUD front end that is merely an interface to push and pull form data in a database.

The problem arises when there is a need to classify data, when there is a low signal-to-noise ratio, or when there is too much data for a human to classify within a reasonable time frame. These problems are becoming more frequent as organizations accumulate a lot of data or buy it for marketing purposes. One thing is to aggregate and cross-tabulate large datasets with an OLAP engine, which is useful but comes with some loss of context; another is to target specific groups of individuals with an ML algorithm to achieve particular behaviours. This has the promise of making marketing budgets much more effective, but it also has very troublesome implications.

The move to ML/AI will not be smooth from the point of view of development teams and organizations. Big tech companies like Google, Facebook and Amazon, as well as Fintech companies, can afford the R&D, and this matches their business models well, while older tech giants like IBM might struggle for relevance in the field. Tech startups can also do well in technical terms, though profitability might not be a sure thing. Non-tech small and medium companies, and mature and conservative companies, might struggle a lot to make sense of it all, and in some cases might get gobbled up or go out of business because of it.

In many of these companies development teams live in their own microcosm, and sometimes the less that is known the better. But there are common traits; here are some examples:

  • In many companies BI and application development are separate silos.
  • Team leads are suspicious of technologies they don't understand, and they try to push for the use of tools that fit their particular tech niche (don't underestimate the need some people have to use a database for everything).
  • ML might be used as a status-project moniker to advance someone's career with the blessing of management, even where it makes no business sense or the person to be advanced doesn't have the required skills.
  • The team lacks the skills and is hostile to changes in the technology stack that might jeopardize their jobs.
  • Risk aversion on the part of middle management leads to paralysis and delays in program implementation.

This doesn’t mean that these companies are doomed, probably they can live well in their particular niche for quite a long time, till their whole development team is replaced by attrition, or people go up in the organization. Implementing ML in the context of a SME or on a mature non-tech company is not a recipe for success by itself, and for most cases it will be invisible within and outside the organization.
