Random Mandalas Generator

Introduction

This blog post proclaims the Python package “RandomMandala” and fully describes its function random_mandala that generates plots (and images) of random mandalas.

The design, implementation strategy, and unit tests closely resemble the Wolfram Repository Function (WFR) RandomMandala, [AAf1].

(Another, very similar function at WFR is RandomScribble, [AAf2].)

The Bezier mandala seeds are created using the Python package bezier, [DHp1].

For detailed descriptions of Machine Learning studies that use collections of random mandalas see the articles [AA1, AA2] and related presentation [AAv1].

Remark: This Markdown file was automatically generated from the notebook: “RandomMandala-package.ipynb”.


Installation

To install from GitHub use the shell command:

python -m pip install git+https://github.com/antononcube/Python-packages.git#egg=RandomMandala\&subdirectory=RandomMandala

To install from PyPI:

python -m pip install RandomMandala

Details and arguments

  • The mandalas made by random_mandala are generated through rotational symmetry of a “seed segment”.
  • The function random_mandala returns matplotlib figures (objects of type matplotlib.figure.Figure)
  • The function random_mandala can be given arguments of the creation function matplotlib.pyplot.figure.
  • If n_rows and n_columns are None a matplotlib figure object with one axes object is returned.
  • There are two modes of making random mandalas: (i) single-mandala mode and (ii) multi-mandala mode. The multi-mandala mode is activated by giving the radius argument a list of positive numbers.
  • If the argument radius is a list of positive reals, then a “multi-mandala” is created with the mandalas corresponding to each number in the radius list being overlain.
  • Here are brief descriptions of the arguments:
    • n_rows: Number of rows in the result figure.
    • n_columns: Number of columns in the result figure.
    • radius: Radius for the mandalas, a flot or a list of floats. If a list of floats the mandalas are overlain.
    • rotational_symmetry_order: Number of copies of the seed segment that comprise the mandala.
    • connecting_function: Connecting function, one of “line”, “fill”, “bezier”, “bezier_fill”, “random”, or None. If ‘random’ or None a random choice of the rest of values is made.
    • number_of_elements: Controls how may graphics elements are in the seed segment.
    • symmetric_seed: Specifies should the seed segment be symmetric or not. If ‘random’ of None random choice between True and False is made.
    • face_color: Face (fill) color.
    • edge_color: Edge (line) color.

Examples

Load the package RandomMandalamatplotlib, and PIL:

from RandomMandala import random_mandala, figure_to_image
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm
from PIL import Image, ImageOps
from mpl_toolkits.axes_grid1 import ImageGrid
import random

Here we generate a random mandala:

random.seed(99)
fig = random_mandala()
png

Here we generate a figure with 12 (3×4) random mandalas:

random.seed(33)
fig2 = random_mandala(n_rows=3, n_columns=4, figsize=(6,6))
fig2.tight_layout()
plt.show()
png

Arguments details

n_rows, n_columns

With the argument n_rows and n_columns are specified the number of rows and columns respectively in the figure object; n_rows * n_columns mandalas are generated:

random.seed(22)
fig=random_mandala(n_rows=1, n_columns=3)
png

connecting_function

The argument connecting_function specifies which graphics primitives to be used over the seed segment points:

fig = matplotlib.pyplot.figure(figsize=(6, 6), dpi=120)

k = 1
for cf in ['line', 'fill', 'bezier', 'bezier_fill', 'random', None]:
    random.seed(667)
    fig = random_mandala(connecting_function=cf,
                         figure=fig,
                         location=(2, 3, k))
    ax = fig.axes[-1]
    ax.set_title(str(cf))
    k = k + 1
plt.show()
plt.close(fig)
png

With values None or "random" a random choice is made from ['line', 'fill', 'bezier', 'bezier_fill'].

radius

In single-mandala mode the argument radius specifies the radius of the seed segment and the mandala:

fig = matplotlib.pyplot.figure(figsize=(8, 4), dpi=120)
k = 1
for r in [5, 10, 15, 20]:
    random.seed(2)
    fig = random_mandala(connecting_function="line", 
                         radius=r,
                         figure = fig,
                         location = (1, 4, k))
    ax = fig.axes[-1]
    ax.set_title("radius:" + str(r))
    ax.axis("on")
    k = k + 1
plt.show()
plt.close(fig)
png

If the value given to radius is a list of positive numbers then multi-mandala mode is used. If radius=[r[0],...,r[k]], then for each r[i] is made a mandala with radius r[i] and the mandalas are drawn upon each other according to their radii order:

random.seed(99)
fig3=random_mandala(radius=[8,5,3], 
                    face_color=["blue", "green", 'red'],
                    connecting_function="fill")                
png

Remark: The code above used different colors for the different radii.

rotational_symmetry_order

The argument rotational_symmetry_order specifies how many copies of the seed segment comprise the mandala:

fig = matplotlib.pyplot.figure(figsize=(6, 12), dpi=120)
k = 1
for rso in [2, 3, 4, 6]:
    random.seed(122)
    fig = random_mandala(connecting_function="fill", 
                         symmetric_seed=True,
                         rotational_symmetry_order=rso,
                         figure = fig,
                         location = (1, 4, k))
    ax = fig.axes[-1]
    ax.set_title("order:" + str(rso))
    k = k + 1
plt.show()
plt.close(fig)

png

number_of_elements

The argument number_of_elements controls how may graphics elements are in the seed segment:

fig = matplotlib.pyplot.figure(figsize=(6, 6), dpi=120)
k = 1
for ne in [2, 3, 4, 5, 6, 12]:
    random.seed(2)
    fig = random_mandala(connecting_function="line",
                         symmetric_seed=True,
                         rotationa_symmetry_order=6,
                         number_of_elements=ne,
                         figure = fig,
                         location = (2, 3, k))
    ax = fig.axes[-1]
    ax.set_title("n:" + str(ne))
    k = k + 1
plt.show()
plt.close(fig)
png
fig = matplotlib.pyplot.figure(figsize=(4, 4), dpi=120)
k = 1
for ne in [5, 10, 15, 20]:
    random.seed(26)
    fig = random_mandala(connecting_function="bezier",
                         radius=[1],
                         symmetric_seed=True,
                         rotationa_symmetry_order=6,
                         number_of_elements=ne,
                         figure = fig,
                         location = (2, 2, k))
    ax = fig.axes[-1]
    ax.set_title("n:" + str(ne))
    k = k + 1
plt.show()
plt.close(fig)
png

symmetric_seed

The argument symmetric_seed specifies should the seed segment be symmetric or not:

fig = matplotlib.pyplot.figure(figsize=(4, 4), dpi=120)
k = 1
for ssd in [True, False]:
    random.seed(2)
    fig = random_mandala(connecting_function="fill", 
                         symmetric_seed=ssd,
                         figure = fig,
                         location = (1, 2, k))
    ax = fig.axes[-1]
    ax.set_title(str(ssd))
    k = k + 1
plt.show()
plt.close(fig)
png

face_color and edge_color

The arguments face_color and edge_color take as values strings or list of strings that specify the coloring of the filled-in polygons and lines respectively:

fig = matplotlib.pyplot.figure(figsize=(6,3), dpi=120)
k = 1
for fc in [["0.8", "0.6", "0.2"], ["olive", "gold", "red"]]:
    random.seed(11)
    fig = random_mandala(radius=[10,6,4],
     					 connecting_function="bezier_fill", 
                         symmetric_seed=True,
                         face_color=fc,
                         figure = fig,
                         location = (1, 2, k))
    ax = fig.axes[-1]
    ax.set_title(str(fc))
    k = k + 1
    
plt.show()
plt.close(fig)
png

alpha

The argument alpha controls the opacity of the plots; it takes as values None and floats between 0 and 1.

fig = matplotlib.pyplot.figure(figsize=(6,3), dpi=120)
k = 1
for al in [None, 0.2, 1.0]:
    random.seed(23)
    fig = random_mandala(radius=[10,6,4],
     					 connecting_function="bezier_fill",
                         symmetric_seed=True,
                         alpha=al,
                         color_mapper=matplotlib.cm.rainbow_r,
                         figure = fig,
                         location = (1, 3, k))
    ax = fig.axes[-1]
    ax.set_title(str(al))
    k = k + 1

plt.show()
plt.close(fig)
png

color_mapper

The argument color_mapper takes as values None and matplotlib.colors.Colormap objects. See the color mappers in the reference page “color example code: colormaps_reference.py”. If color_mapper is specified then the arguments face_color and edge_color are ignored. Here is an example using two color mappers:

fig = matplotlib.pyplot.figure(figsize=(6,3), dpi=120)
cMappers=[matplotlib.cm.rainbow_r, matplotlib.cm.Accent_r]
cMappersNames=["rainbow_r", "Accent_r"]
for k in range(2): 
    random.seed(15)
    fig = random_mandala(radius=[10,6,4],
                         connecting_function="bezier_fill",
                         symmetric_seed=True,
                         color_mapper=cMappers[k],
                         figure = fig,
                         location = (1, 2, k+1))
    ax = fig.axes[-1]
    ax.set_title(cMappersNames[k])
    
plt.show()
plt.close(fig)
png

Applications

Generate a collection of images

In certain Machine Learning (ML) studies it can be useful to be able to generate large enough collections of (random) images.

In the code block below we:

  • Generate 64 random mandala plots
  • Convert them into PIL images using the package function figure_to_image
  • Invert and binarize images
  • Plot the images in an image grid
# A list to accumulate random mandala images
mandala_images = []

# Generation loop
random.seed(443)
for i in range(64):
    
    # Generate one random mandala figure
    fig2 = random_mandala(n_rows=None,
                          n_columns=None,
                          radius=[8, 6, 3],
                          rotational_symmetry_order=6,
                          symmetric_seed=True,
                          connecting_function='random',
                          face_color="0.")
    fig2.tight_layout()
    
    # Convert the figure into an image and add it to the list
    mandala_images = mandala_images + [figure_to_image(fig2)]
    
    # Close figure to save memoru
    plt.close(fig2)

# Invert image colors    
mandala_images2 = [ImageOps.invert(img) for img in mandala_images]

# Binarize images
mandala_images3 = [im.convert('1') for im in mandala_images2]

# Make a grid of images and display it
fig3 = plt.figure(figsize=(14., 14.))
grid = ImageGrid(fig3, 111,
                 nrows_ncols=(8, 8),
                 axes_pad=0.02,
                 )

for ax, img in zip(grid, mandala_images3):
    ax.imshow(img)
    ax.set(xticks=[], yticks=[])

plt.show()
png

Neat examples

A table of random mandalas

random.seed(124)
fig=random_mandala(n_rows=6, n_columns=6, figsize=(10,10), dpi=240)
png

A table of colorized mandalas

fig = matplotlib.pyplot.figure(figsize=(10, 10), dpi=120)
k = 1
random.seed(56)
for i in range(36):
    rs=list(range(1,random.choice([3,4,5,6])+1))
    rs.sort()
    rs.reverse()

    fig = random_mandala(connecting_function="bezier_fill",
                         color_mapper=matplotlib.cm.gist_earth,
   						 symmetric_seed=True,
                         radius=rs,
                         rotational_symmetry_order=random.choice([3,4,5,6,7]),
                         number_of_elements=random.choice([2,3,4]),
                         figure=fig,
                         location=(6, 6, k))
    ax = fig.axes[-1]
    ax.set_axis_off()
    k = k + 1

fig.tight_layout()
plt.show()
plt.close(fig)
png

A table of open colorized mandalas

fig = matplotlib.pyplot.figure(figsize=(10, 10), dpi=120)
k = 1
random.seed(883)
for rso in [2 * random.random() + 2 for _ in range(36)]:
    random.seed(33)
    fig = random_mandala(connecting_function="bezier_fill",
                         radius=3,
                         face_color="darkblue",
                         rotational_symmetry_order=rso,
                         number_of_elements=8,
                         figure=fig,
                         location=(6, 6, k))
    ax = fig.axes[-1]
    ax.set_axis_off()
    k = k + 1

plt.show()
plt.close(fig)
png

Acknowledgements


References

Articles

[AA1] Anton Antonov, “Comparison of dimension reduction algorithms over mandala images generation”, (2017), MathematicaForPrediction at WordPress.

[AA1] Anton Antonov, “Generation of Random Bethlehem Stars, (2020), MathematicaForPrediction at WordPress.

Functions

[AAf1] Anton Antonov, RandomMandala, (2019), Wolfram Function Repository.

[AAf2] Anton Antonov, RandomScribble, (2020), Wolfram Function Repository.

Packages

[DHp1] Daniel Hermes, bezier Python package, (2016), PyPy.org.

Videos

[AAv1] Anton Antonov, “Random Mandalas Deconstruction in R, Python, and Mathematica (Greater Boston useR Meetup, Feb 2022)” (2022), Anton Antonov’s channel at YouTube.

Random Sparse Matrix Generator

In brief

This blog post proclaims and briefly describes a Python package that implements the function random_sparse_matrix, which can be used to generate random sparse matrices.

The sparse matrices have named rows and columns — see the package “SSparseMatrix”, [AAp1].

Functions from the package “RandomDataGenerators”, [AAp2], are used to obtain row- and column names and entry values. (See the previous post of this blog.)


Installation

To install from GitHub use the shell command:

python -m pip install git+https://github.com/antononcube/Python-packages.git#egg=RandomSparseMatrix\&subdirectory=RandomSparseMatrix

To install from PyPI:

python -m pip install RandomSparseMatrix

Examples

Here is random sparse matrix (SSparseMatrix object) with 6 rows and 4 columns:

import random
from RandomSparseMatrix import *

random.seed(87)
rmat = random_sparse_matrix(6, 4,
                            column_names_generator=random_pet_name,
                            row_names_generator=random_word,
                            min_number_of_values=6,
                            max_number_of_values=None)
rmat.print_matrix(n_digits=20)

# ============================================================================================
#            |                Cleo             Diamond                 Max               Tessa
# --------------------------------------------------------------------------------------------
#  cuticular |                   .                   .                   .  12.886794438387263
#    elysian |                   .                   .                   .  13.891135469455826
# spot-check |                   .  11.465064963144142                   .                   .
#   cetacean |                   .   9.626463367706222                   .                   .
#       idem |   5.474873249244756                   .                   .                   .
#        lot |                   .                   .  10.818678723268317                   .
# ============================================================================================


References

[AAp1] Anton Antonov, SSparseMatrix Python package, (2021), PyPI.

[AAp2] Anton Antonov, RandomDataGenerators Python package, (2021), PyPI.

[AAp3] Anton Antonov, SparseMatrixRecommender Python package, (2021), PyPI.

Random Data Generators

Introduction

This blog post proclaims and briefly describes the Python package “RandomDataGenerators” that has functions for generating random strings, words, pet names, and (tabular) data frames.

The full list of features and development status can be found in the org-mode file Random-data-generators-work-plan.org.

Motivation

The primary motivation for this package is to have simple, intuitively named functions for generating random vectors (lists) and data frames of different objects.

Although, Python has support of random vector generation, it is assumed that commands like the following are easier to use:

random_string(6, chars = 4, pattern = "[\l\d]")


Installation

To install from GitHub use the shell command:

python -m pip install git+https://github.com/antononcube/Python-packages.git#egg=RandomDataGenerators\&subdirectory=RandomDataGenerators

To install from PyPi.org:

python -m pip install RandomDataGenerators


Setup

from RandomDataGenerators import *

The import command above is equivalent to the import commands:

from RandomDataGenerators.RandomDataFrameGenerator import random_data_frame
from RandomDataGenerators.RandomFunctions import random_string
from RandomDataGenerators.RandomFunctions import random_word
from RandomDataGenerators.RandomFunctions import random_pet_name
from RandomDataGenerators.RandomFunctions import random_pretentious_job_title

We are also going to use the packages randomnumpy, and pandas:

import random
import numpy
import pandas
pandas.set_option('display.max_columns', None)


Random strings

The function random_string generates random strings. (It is based on the package StringGenerator, \[PW1\].)

Here we generate a vector of random strings with length 4 and characters that belong to specified ranges:

random_string(6, chars=4, pattern = "[\d]") # digits only

## ['3749', '4572', '9812', '7395', '2388', '7625']

random_string(6, chars=4, pattern = "[\l]") # letters only

## ['FhSd', 'DNSu', 'YggC', 'ajqA', 'dIBt', 'Mjdc']

random_string(6, chars=4, pattern = "[\l\d]") # both digits and letters

## ['yp4u', '2Shk', 'pvpS', 'M43O', 'm5SX', 'It3L']


Random words

The function random_word generates random words.

Here we generate a list with 12 random words:

random_word(12)

## ['arteria', 'Sauria', 'mentation', 'elope', 'expositor', 'planetarium', 'agglutinin', 'Faunus', 'flab', 'slub', 'Chasidic', 'Jirrbal']

Here we generate a table of random words of different types (kinds):

dfWords = pandas.DataFrame({k: random_word(6, kind = k) for k in ["Any", "Common", "Known", "Stop"]})
print(dfWords.transpose().to_string())

##                0              1          2                 3            4              5
## Any     stuffing  mind-altering    angrily        Embothrium       sorbet        smoking
## Common    reason       mackerel  alignment        calculator     halfback      paranoiac
## Known     tannoy    double-date    deckled  gynandromorphous  gravitative  steganography
## Stop       about              N      noone              next         back          alone

Remark: None can be used instead of 'Any'.


Random pet names

The function random_pet_name generates random pet names.

The pet names are taken from publicly available data of pet license registrations in the years 2015–2020 in Seattle, WA, USA. See \[DG1\].

The following command generates a list of six random pet names:

random.seed(32)
random_pet_name(6)

## ['Oskar', 'Bilbo "Bobo" Waggins', 'Maximus', 'Gracie', 'Osa', 'Fabio']

The named argument species can be used to specify specie of the random pet names. (According to the specie-name relationships in \[DG1\].)

Here we generate a table of random pet names of different species:

dfPetNames = pandas.DataFrame({ wt: random_pet_name(6, species = wt) for wt in ["Any", "Cat", "Dog", "Goat", "Pig"] })
dfPetNames.transpose()

##             0                1         2        3          4         5
## Any     Lumen             Asha      Echo     Yuki    Francis   Charlie
## Cat     Ellie      Roxie Grace    Norman     Bean  Mr. Darcy  Hermione
## Dog   Brewski            Matzo      Joey    K. C.      Oscar    Gracie
## Goat     Lula  Brussels Sprout     Grace   Moppet     Frosty      Arya
## Pig    Millie         Guinness  Guinness  Atticus   Guinness    Millie

Remark: None can be used instead of 'Any'.

The named argument weighted can be used to specify random pet name choice based on known real-life number of occurrences:

random.seed(32);
random_pet_name(6, weighted=True)

## ['Zorro', 'Beeker', 'Lucy', 'Blanco', 'Winston', 'Petunia']

The weights used correspond to the counts from \[DG1\].

Remark: The implementation of random-pet-name is based on the Mathematica implementation RandomPetName, \[AAf1\].


Random pretentious job titles

The function random_pretentious_job_title generates random pretentious job titles.

The following command generates a list of six random pretentious job titles:

random_pretentious_job_title(6)

## ['Direct Identity Officer', 'District Group Synergist', 'Lead Brand Liason', 'Central Configuration Administrator', 'Senior Accountability Facilitator', 'Dynamic Web Producer']

The named argument number_of_words can be used to control the number of words in the generated job titles.

The named argument language can be used to control in which language the generated job titles are in. At this point, only Bulgarian and English are supported.

Here we generate pretentious job titles using different languages and number of words per title:

random.seed(2)
random_pretentious_job_title(12, number_of_words = None, language = None)

## ['Manager', 'Клиентов Асистент на Инфраструктурата', 'Customer Quality Strategist', 'Наследствен Анализатор по Идентичност', 'Administrator', 'Изпълнител на Фактори', 'Administrator', 'Architect', 'Investor Assurance Agent', 'Прогресивен Служител по Сигурност', 'Координатор', 'Анализатор по Оптимизация']

Remark: None can be used as values for the named arguments number_of_words and language.

Remark: The implementation uses the job title phrases in https://www.bullshitjob.com . It is, more-or-less, based on the Mathematica implementation RandomPretentiousJobTitle, \[AAf2\].


Random tabular datasets

The function random_data_frame can be used generate tabular data frames.

Remark: In this package a data frame is an object produced and manipulated by the package pandas.

Here are basic calls:

random_data_frame()
random_data_frame(None, row_names=True)
random_data_frame(None, None)
random_data_frame(12, 4)
random_data_frame(None, 4)
random_data_frame(5, None, column_names_generator = random_pet_name)
random_data_frame(15, 5, generators = [random_pet_name, random_string, random_pretentious_job_title])
random_data_frame(None, ["Col1", "Col2", "Col3"], row-names=False)

Here is example of a generated data frame with column names that are cat pet names:

random_data_frame(5, 4, column_names_generator = lambda size: random_pet_name(size, species = 'Cat'), row_names=True)

##          Meryl   Oreo  Douglas Fur Sprockett
## id.0 -1.053990  QhFlT            0     o7p5f
## id.1 -0.707621  G90kh            0     yBupF
## id.2  0.494162  eMVtF            0     Ez2Df
## id.3  0.400718  tx3HL            2     3Tz7I
## id.4 -1.345948  r3NRa            0     whfam

Remark: Both wide format and long format data frames can be generated.

Remark: The signature design and implementation are based on the Mathematica implementation RandomTabularDataset, \[AAf3\]. There are also corresponding packages written in R, \[AAp1\], and Raku, \[AAp2\].

Here is an example in which some of the columns have specified generators:

random.seed(66)
random_data_frame(10, 
                  ["alpha", "beta", "gamma", "zetta", "omega"], 
                  generators = {"alpha" : random_pet_name, 
                                "beta" :  numpy.random.normal, 
                                "gamma" : lambda size: numpy.random.poisson(lam=5, size=size) } )

##       alpha      beta  gamma  zetta             omega
## 0    Frayda  0.811681      4  1V05P             swing
## 1     Rosie  0.591327      3  tg7yn           Carolus
## 2      Jovi  0.563906      7  imaDl            sailor
## 3     Pilot  0.607250      7  WAg8u           echinus
## 4    Brodie  0.279003     12  yXEao          Ramayana
## 5  Springer -1.394703      5  JFBoz            simper
## 6       Uma -0.538088      8  7ATV1        consecrate
## 7      Diva  0.343234      4  GeJUh            blight
## 8    Fezzik  1.506241      6  yEPI5  misappropriation
## 9      Hana -1.359908      4  PG3IS          diploidy


References

Articles

[AA1] Anton Antonov, “Pets licensing data analysis”, (2020), MathematicaForPrediction at WordPress.

Functions, packages

[AAf1] Anton Antonov, RandomPetName, (2021), Wolfram Function Repository.

[AAf2] Anton Antonov, RandomPretentiousJobTitle, (2021), Wolfram Function Repository.

[AAf3] Anton Antonov, RandomTabularDataset, (2021), Wolfram Function Repository.

[AAp1] Anton Antonov, RandomDataFrameGenerator R package, (2020), R-packages at GitHub/antononcube.

[AAp2] Anton Antonov, Data::Generators Raku package, (2021), Raku Modules.

[PW1] Paul Wolf, StringGenerator Python package, (PyPi.org)(https://pypi.org).

[WRI1] Wolfram Research (2010), RandomVariate, Wolfram Language function.

Data repositories

[DG1] Data.Gov, Seattle Pet Licensescatalog.data.gov.

Latent semantic analyzer package

Introduction

This post proclaims and briefly describes the Python package, LatentSemanticAnalyzer, which has different functions for computations of Latent Semantic Analysis (LSA) workflows (using Sparse matrix Linear Algebra.) The package mirrors the Mathematica implementation [AAp1]. (There is also a corresponding implementation in R; see [AAp2].)

The package provides:

  • Class LatentSemanticAnalyzer
  • Functions for applying Latent Semantic Indexing (LSI) functions on matrix entries
  • “Data loader” function for obtaining a pandas data frame ~580 abstracts of conference presentations

Installation

To install from GitHub use the shell command:

python -m pip install git+https://github.com/antononcube/Python-packages.git#egg=LatentSemanticAnalyzer\&subdirectory=LatentSemanticAnalyzer

To install from PyPI:

python -m pip install LatentSemanticAnalyzer


LSA workflows

The scope of the package is to facilitate the creation and execution of the workflows encompassed in this flow chart:

LSAworkflows

For more details see the article “A monad for Latent Semantic Analysis workflows”, [AA1].


Usage example

Here is an example of a LSA pipeline that:

  1. Ingests a collection of texts
  2. Makes the corresponding document-term matrix using stemming and removing stop words
  3. Extracts 40 topics
  4. Shows a table with the extracted topics
  5. Shows a table with statistical thesaurus entries for selected words
import random
from LatentSemanticAnalyzer.LatentSemanticAnalyzer import *
from LatentSemanticAnalyzer.DataLoaders import *
import snowballstemmer

# Collection of texts
dfAbstracts = load_abstracts_data_frame()
docs = dict(zip(dfAbstracts.ID, dfAbstracts.Abstract))

# Stemmer object (to preprocess words in the pipeline below)
stemmerObj = snowballstemmer.stemmer("english")

# Words to show statistical thesaurus entries for
words = ["notebook", "computational", "function", "neural", "talk", "programming"]

# Reproducible results
random.seed(12)

# LSA pipeline
lsaObj = (LatentSemanticAnalyzer()
          .make_document_term_matrix(docs=docs,
                                     stop_words=True,
                                     stemming_rules=True,
                                     min_length=3)
          .apply_term_weight_functions(global_weight_func="IDF",
                                       local_weight_func="None",
                                       normalizer_func="Cosine")
          .extract_topics(number_of_topics=40, min_number_of_documents_per_term=10, method="NNMF")
          .echo_topics_interpretation(number_of_terms=12, wide_form=True)
          .echo_statistical_thesaurus(terms=stemmerObj.stemWords(words),
                                      wide_form=True,
                                      number_of_nearest_neighbors=12,
                                      method="cosine",
                                      echo_function=lambda x: print(x.to_string())))


Related Python packages

This package is based on the Python package “SSparseMatrix”, [AAp3]

The package “SparseMatrixRecommender” also uses LSI functions — this package uses LSI methods of the class SparseMatrixRecommender.


Related Mathematica and R packages

Mathematica

The Python pipeline above corresponds to the following pipeline for the Mathematica package [AAp1]:

lsaObj =
  LSAMonUnit[aAbstracts]⟹
   LSAMonMakeDocumentTermMatrix["StemmingRules" -> Automatic, "StopWords" -> Automatic]⟹
   LSAMonEchoDocumentTermMatrixStatistics["LogBase" -> 10]⟹
   LSAMonApplyTermWeightFunctions["IDF", "None", "Cosine"]⟹
   LSAMonExtractTopics["NumberOfTopics" -> 20, Method -> "NNMF", "MaxSteps" -> 16, "MinNumberOfDocumentsPerTerm" -> 20]⟹
   LSAMonEchoTopicsTable["NumberOfTerms" -> 10]⟹
   LSAMonEchoStatisticalThesaurus["Words" -> Map[WordData[#, "PorterStem"]&, {"notebook", "computational", "function", "neural", "talk", "programming"}]];

R

The package LSAMon-R, [AAp2], implements a software monad for LSA workflows.


LSA packages comparison project

The project “Random mandalas deconstruction with R, Python, and Mathematica”, [AAr1, AA2], has documents, diagrams, and (code) notebooks for comparison of LSA application to a collection of images (in multiple programming languages.)

A big part of the motivation to make the Python package “RandomMandala”, [AAp6], was to make easier the LSA package comparison. Mathematica and R have fairly streamlined connections to Python, hence it is easier to propagate (image) data generated in Python into those systems.


Code generation with natural language commands

Using grammar-based interpreters

The project “Raku for Prediction”, [AAr2, AAv2, AAp7], has a Domain Specific Language (DSL) grammar and interpreters that allow the generation of LSA code for corresponding Mathematica, Python, R packages.

Here is Command Line Interface (CLI) invocation example that generate code for this package:

> ToLatentSemanticAnalysisWorkflowCode Python 'create from aDocs; apply LSI functions IDF, None, Cosine; extract 20 topics; show topics table'
# LatentSemanticAnalyzer(aDocs).apply_term_weight_functions(global_weight_func = "IDF", local_weight_func = "None", normalizer_func = "Cosine").extract_topics(number_of_topics = 20).echo_topics_table( )

NLP Template Engine

Here is an example using the NLP Template Engine, [AAr2, AAv3]:

Concretize["create from aDocs; apply LSI functions IDF, None, Cosine; extract 20 topics; show topics table", 
  "TargetLanguage" -> "Python"]
(* 
lsaObj = (LatentSemanticAnalyzer()
          .make_document_term_matrix(docs=aDocs, stop_words=None, stemming_rules=None,min_length=3)
          .apply_term_weight_functions(global_weight_func='IDF', local_weight_func='None',normalizer_func='Cosine')
          .extract_topics(number_of_topics=20, min_number_of_documents_per_term=20, method='SVD')
          .echo_topics_interpretation(number_of_terms=10, wide_form=True)
          .echo_statistical_thesaurus(terms=stemmerObj.stemWords([\"topics table\"]), wide_form=True, number_of_nearest_neighbors=12, method='cosine', echo_function=lambda x: print(x.to_string())))
*)



References

Articles

[AA1] Anton Antonov, “A monad for Latent Semantic Analysis workflows”, (2019), MathematicaForPrediction at WordPress.

[AA2] Anton Antonov, “Random mandalas deconstruction in R, Python, and Mathematica”, (2022), MathematicaForPrediction at WordPress.

Mathematica and R Packages

[AAp1] Anton Antonov, Monadic Latent Semantic Analysis Mathematica package, (2017), MathematicaForPrediction at GitHub.

[AAp2] Anton Antonov, Latent Semantic Analysis Monad in R (2019), R-packages at GitHub/antononcube.

Python packages

[AAp3] Anton Antonov, SSparseMatrix Python package, (2021), PyPI.

[AAp4] Anton Antonov, SparseMatrixRecommender Python package, (2021), PyPI.

[AAp5] Anton Antonov, RandomDataGenerators Python package, (2021), PyPI.

[AAp6] Anton Antonov, RandomMandala Python package, (2021), PyPI.

[MZp1] Marinka Zitnik and Blaz Zupan, Nimfa: A Python Library for Nonnegative Matrix Factorization, (2013-2019), PyPI.

[SDp1] Snowball Developers, SnowballStemmer Python package, (2013-2021), PyPI.

Raku packages

[AAp7] Anton Antonov, DSL::English::LatentSemanticAnalysisWorkflows Raku package, (2018-2022), GitHub/antononcube. (At raku.land).

Repositories

[AAr1] Anton Antonov, “Random mandalas deconstruction with R, Python, and Mathematica” presentation project, (2022) SimplifiedMachineLearningWorkflows-book at GitHub/antononcube.

[AAr2] Anton Antonov, “Raku for Prediction” book project, (2021-2022), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, “TRC 2022 Implementation of ML algorithms in Raku”, (2022), Anton A. Antonov’s channel at YouTube.

[AAv2] Anton Antonov, “Raku for Prediction”, (2021), The Raku Conference (TRC) at YouTube.

[AAv3] Anton Antonov, “NLP Template Engine, Part 1”, (2021), Anton A. Antonov’s channel at YouTube.

[AAv4] Anton Antonov “Random Mandalas Deconstruction in R, Python, and Mathematica (Greater Boston useR Meetup, Feb 2022)”, (2022), Anton A. Antonov’s channel at YouTube.

Sparse matrix recommender package

Introduction

This post proclaims and briefly describes the Python package, SparseMatrixRecommender, which has different functions for computations of recommendations based on (user) profile or history using Sparse Linear Algebra (SLA). The package mirrors the Mathematica implementation [AAp1]. (There is also a corresponding implementation in R; see [AAp2]).

The package is based on a certain “standard” Information retrieval paradigm — it utilizes Latent Semantic Indexing (LSI) functions like IDF, TF-IDF, etc. Hence, the package also has document-term matrix creation functions and LSI application functions. I included them in the package since I wanted to minimize the external package dependencies.

The package includes two data-sets dfTitanic and dfMushroom in order to make easier the writing of introductory examples and unit tests.

For more theoretical description see the article “Mapping Sparse Matrix Recommender to Streams Blending Recommender” , [AA1].

For detailed examples see the files “SMR-experiments-large-data.py” and “SMR-creation-from-long-form.py”.

The list of features and its implementation status is given in the org-mode file “SparseMatrixRecommender-work-plan.org”.

Remark: “SMR” stands for “Sparse Matrix Recommender”. Most of the operations of this Python package mirror the operations of the software monads “SMRMon-WL”, “SMRMon-R”, [AAp1, AAp2].


Workflows

Here is a diagram that encompasses the workflows this package supports (or will support):

SMRworkflows

Here is narration of a certain workflow scenario:

  1. Get a dataset.
  2. Create contingency matrices for a given identifier column and a set of “tag type” columns.
  3. Examine recommender matrix statistics.
  4. If the assumptoins about the data hold apply LSI functions.
    • For example, the “usual trio” IDF, Frequency, Cosine.
  5. Do (verify) example profile recommendations.
  6. If satisfactory results are obtained use the recommender as a nearest neighbors classifier.

Monadic design

Here is a diagram of typical pipeline building using a SparseMatrixRecommender object:

SMRMonpipelinePython

Remark: The monadic design allows “pipelining” of the SMR operations — see the usage example section.


Installation

To install from GitHub use the shell command:

python -m pip install git+https://github.com/antononcube/Python-packages.git#egg=SparseMatrixRecommender\&subdirectory=SparseMatrixRecommender

To install from PyPI:

python -m pip install SparseMatrixRecommender

Related Python packages

This package is based on the Python package SSparseMatrix, [AAp5].

The package LatentSemanticAnalyzer, [AAp6], uses the cross tabulation and LSI functions of this package.


Usage example

Here is an example of an SMR pipeline for creation of a recommender over Titanic data and recommendations for the profile “passengerSex:male” and “passengerClass:1st”:

from SparseMatrixRecommender.SparseMatrixRecommender import *
from SparseMatrixRecommender.DataLoaders import *

dfTitanic = load_titanic_data_frame()

smrObj = (SparseMatrixRecommender()
          .create_from_wide_form(data = dfTitanic, 
                                 item_column_name="id", 
                                 columns=None, 
                                 add_tag_types_to_column_names=True, 
                                 tag_value_separator=":")
          .apply_term_weight_functions(global_weight_func = "IDF", 
                                       local_weight_func = "None", 
                                       normalizer_func = "Cosine")
          .recommend_by_profile(profile=["passengerSex:male", "passengerClass:1st"], 
                                nrecs=12)
          .join_across(data=dfTitanic, on="id")
          .echo_value())

Remark: More examples can be found the directory “./examples”.


Related Mathematica packages

The software monad Mathematica package “MonadicSparseMatrixRecommender.m” [AAp1], provides recommendation pipelines similar to the pipelines created with this package.

Here is a Mathematica monadic pipeline that corresponds to the Python pipeline above:

smrObj =
  SMRMonUnit[]⟹
   SMRMonCreate[dfTitanic, "id", 
                "AddTagTypesToColumnNames" -> True, 
                "TagValueSeparator" -> ":"]⟹
   SMRMonApplyTermWeightFunctions["IDF", "None", "Cosine"]⟹
   SMRMonRecommendByProfile[{"passengerSex:male", "passengerClass:1st"}, 12]⟹
   SMRMonJoinAcross[dfTitanic, "id"]⟹
   SMRMonEchoValue[];   

(Compare the pipeline diagram above with the corresponding diagram using Mathematica notation .)


Related R packages

The package SMRMon-R, [AAp2], implements a software monad for SMR workflows. Most of SMRMon-R functions delegate to SparseMatrixRecommender.

The package SparseMatrixRecommenderInterfaces, [AAp3], provides functions for interactive Shiny interfaces for the recommenders made with SparseMatrixRecommender and/or SMRMon-R.

The package LSAMon-R, [AAp4], can be used to make matrices for SparseMatrixRecommender and/or SMRMon-R.

Here is the SMRMon-R pipeline that corresponds to the Python pipeline above:

smrObj <-
  SMRMonCreate( data = dfTitanic, 
                itemColumnName = "id", 
                addTagTypesToColumnNamesQ = TRUE, 
                sep = ":") %>%
  SMRMonApplyTermWeightFunctions(globalWeightFunction = "IDF", 
                                 localWeightFunction = "None", 
                                 normalizerFunction = "Cosine") %>%
  SMRMonRecommendByProfile( profile = c("passengerSex:male", "passengerClass:1st"), 
                            nrecs = 12) %>%
  SMRMonJoinAcross( data = dfTitanic, by = "id") %>%
  SMRMonEchoValue

Recommender comparison project

The project repository “Scalable Recommender Framework”, [AAr1], has documents, diagrams, tests, and benchmarks of a recommender system implemented in multiple programming languages.

This Python recommender package is a decisive winner in the comparison — see the first 10 min of the video recording [AAv1] or the benchmarks at [AAr1].


Code generation with natural language commands

Using grammar-based interpreters

The project “Raku for Prediction”, [AAr2, AAv2, AAp6], has a Domain Specific Language (DSL) grammar and interpreters that allow the generation of SMR code for corresponding Mathematica, Python, R, and Raku packages.

Here is Command Line Interface (CLI) invocation example that generate code for this package:

> ToRecommenderWorkflowCode Python 'create with dfTitanic; apply the LSI functions IDF, None, Cosine;recommend by profile 1st and male' 

obj = SparseMatrixRecommender().create_from_wide_form(data = dfTitanic).apply_term_weight_functions(global_weight_func = "IDF", local_weight_func = "None", normalizer_func = "Cosine").recommend_by_profile( profile = ["1st", "male"])

NLP Template Engine

Here is an example using the NLP Template Engine, [AAr2, AAv3]:

Concretize["create with dfTitanic; apply the LSI functions IDF, None, Cosine;recommend by profile 1st and male", 
 "TargetLanguage" -> "Python"]

(*
"smrObj = (SparseMatrixRecommender()
 .create_from_wide_form(data = None, item_column_name=\"id\", columns=None, add_tag_types_to_column_names=True, tag_value_separator=\":\")
 .apply_term_weight_functions(\"IDF\", \"None\", \"Cosine\")
 .recommend_by_profile(profile=[\"1st\", \"male\"], nrecs=profile)
 .join_across(data=None, on=\"id\")
 .echo_value())"
*)

References

Articles

[AA1] Anton Antonov, “Mapping Sparse Matrix Recommender to Streams Blending Recommender” (2017), MathematicaForPrediction at GitHub.

Mathematica/WL and R packages

[AAp1] Anton Antonov, Monadic Sparse Matrix Recommender Mathematica package, (2018), MathematicaForPrediction at GitHub.

[AAp2] Anton Antonov, Sparse Matrix Recommender Monad in R (2019), R-packages at GitHub/antononcube.

[AAp3] Anton Antonov, Sparse Matrix Recommender framework interface functions (2019), R-packages at GitHub/antononcube.

[AAp4] Anton Antonov, Latent Semantic Analysis Monad in R (2019), R-packages at GitHub/antononcube.

Python packages

[AAp5] Anton Antonov, SSparseMatrix package in Python (2021), Python-packages at GitHub/antononcube.

[AAp6] Anton Antonov, LatentSemanticAnalyzer package in Python (2021), Python-packages at GitHub/antononcube.

Raku packages

[AAp6] Anton Antonov, DSL::English::RecommenderWorkflows Raku package, (2018-2022), GitHub/antononcube. (At raku.land).

Repositories

[AAr1] Anton Antonov, Scalable Recommender Framework project, (2022) GitHub/antononcube.

[AAr2] Anton Antonov, “Raku for Prediction” book project, (2021-2022), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, “TRC 2022 Implementation of ML algorithms in Raku”, (2022), Anton A. Antonov’s channel at YouTube.

[AAv2] Anton Antonov, “Raku for Prediction”, (2021), The Raku Conference (TRC) at YouTube.

[AAv3] Anton Antonov, “NLP Template Engine, Part 1”, (2021), Anton A. Antonov’s channel at YouTube.

Sparse matrices with named rows and columns

Introduction

This blog post introduces and describes the Python package “SSparseMatrix” that provides the class SSparseMatrix, the objects of which are sparse matrices with named rows and columns.

We can say the package attempts to cover as many as possible of the functionalities for sparse matrix objects that are provided by R’s library Matrix. (R is a implementation of S. S introduced named data structures for statistical computations, [RB1], hence the name SSparseMatrix.)

The package builds on top of the scipy sparse matrices. (The added functionalities though are general — other sparse matrix implementations could be used.)

Here is a list of functionalities provided for SSparseMatrix:

  • Sub-matrix extraction by row and column names:
    • Single element access
    • Subsets of row names and column names
  • Slices (with integers)
  • Row and column names propagation for dot products with:
    • Lists
    • Dense vectors (numpy.array)
    • scipy sparse matrices
    • SSparseMatrix objects
  • Row and column sums
    • Vector form
    • Dictionary form
  • Transposing
  • Representation:
    • Tabular, matrix form (“pretty printing”)
    • String and repr forms
  • Row and column binding of SSparseMatrix objects
  • “Export” functions
    • Triplets
    • Row-dictionaries
    • Column-dictionaries
    • Wolfram Language full form representation

The full list of features and development status can be found in the org-mode file SSparseMatrix-work-plan.org.

This package more or less follows the design of the Mathematica package SSparseMatrix.m.

The usage examples below can be also run through the file “examples.py”.

Usage in other packages

The class SSparseMatrix is foundational in the packages SparseMatrixRecommender and LatentSemanticAnalyzer. (The implementation of those packages was one of the primary motivations to develop SSparseMatrix.)

The package RandomSparseMatrix can be used to generate random sparse matrices (SSparseMatrix objects.)


Installation

Install from GitHub

pip install -e git+https://github.com/antononcube/Python-packages.git#egg=SSparseMatrix-antononcube\&subdirectory=SSparseMatrix

From PyPi

pip install SSparseMatrix


Setup

Import the package:

from SSparseMatrix import *

The import command above is equivalent to the import commands:

from SSparseMatrix.SSparseMatrix import SSparseMatrix
from SSparseMatrix.SSparseMatrix import make_s_sparse_matrix
from SSparseMatrix.SSparseMatrix import is_s_sparse_matrix
from SSparseMatrix.SSparseMatrix import column_bind

Creation

Create a sparse matrix with named rows and columns (a SSparseMatrix object):

mat = [[1, 0, 0, 3], [4, 0, 0, 5], [0, 3, 0, 5], [0, 0, 1, 0], [0, 0, 0, 5]]
smat = SSparseMatrix(mat)
smat.set_row_names(["A", "B", "C", "D", "E"])
smat.set_column_names(["a", "b", "c", "d"])
<5x4 SSparseMatrix (sparse matrix with named rows and columns) of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format, and fill-in 0.4>

Print the created sparse matrix:

smat.print_matrix()
===================================
  |       a       b       c       d
-----------------------------------
A |       1       .       .       3
B |       4       .       .       5
C |       .       3       .       5
D |       .       .       1       .
E |       .       .       .       5
===================================

Another way to create using the function make_s_sparse_matrix:

ssmat=make_s_sparse_matrix(mat)
ssmat
<5x4 SSparseMatrix (sparse matrix with named rows and columns) of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format, and fill-in 0.4>

Structure

The SSparseMatrix objects have a simple structure. Here are the attributes:

  • _sparseMatrix
  • _rowNames
  • _colNames
  • _dimNames

Here are the methods to “query” SSparseMatrix objects:

  • sparse_matrix()
  • row_names() and row_names_dict()
  • column_names() and column_names_dict()
  • shape()
  • dimension_names()

SSparseMatrix over-writes the methods of scipy.sparse.csr_matrix that might require the handling of row names and column names.

Most of the rest of the scipy.sparse.csr_matrix methods are delegated to the _sparseMatrix attribute.

For example, for a given SSparseMatrix object smat the dense version of smat‘s sparse matrix attribute can be obtained by accessing that attribute first and then using the method todense:

print(smat.sparse_matrix().todense())
[[1 0 0 3]
 [4 0 0 5]
 [0 3 0 5]
 [0 0 1 0]
 [0 0 0 5]]

Alternatively, we can use the “delegated” form and directly invoke todense on smat:

print(smat.todense())
[[1 0 0 3]
 [4 0 0 5]
 [0 3 0 5]
 [0 0 1 0]
 [0 0 0 5]]

Here is another example showing a direct application of the element-wise operation sin through the scipy.sparse.csr_matrix method sin:

smat.sin().print_matrix(n_digits=20)
>  ===================================================================================
      |                   a                   b                   c                   d
    -----------------------------------------------------------------------------------
    A |  0.8414709848078965                   .                   .  0.1411200080598672
    B | -0.7568024953079282                   .                   . -0.9589242746631385
    C |                   .  0.1411200080598672                   . -0.9589242746631385
    D |                   .                   .  0.8414709848078965                   .
    E |                   .                   .                   . -0.9589242746631385
    ===================================================================================

Representation

Here the function print uses the string representation of SSparseMatrix object:

print(smat)
  ('A', 'a')	1
  ('A', 'd')	3
  ('B', 'a')	4
  ('B', 'd')	5
  ('C', 'b')	3
  ('C', 'd')	5
  ('D', 'c')	1
  ('E', 'd')	5

Here we print the representation obtained with repr:

print(repr(smat))
<5x4 SSparseMatrix (sparse matrix with named rows and columns) of type '<class 'numpy.int64'>'
	with 8 stored elements in Compressed Sparse Row format, and fill-in 0.4>

Here is the matrix form (“pretty printing” ):

smat.print_matrix()
===================================
  |       a       b       c       d
-----------------------------------
A |       1       .       .       3
B |       4       .       .       5
C |       .       3       .       5
D |       .       .       1       .
E |       .       .       .       5
===================================

The method triplets can be used to obtain a list of (row, column, value) triplets:

smat.triplets()
[('A', 'a', 1),
 ('A', 'd', 3),
 ('B', 'a', 4),
 ('B', 'd', 5),
 ('C', 'b', 3),
 ('C', 'd', 5),
 ('D', 'c', 1),
 ('E', 'd', 5)]

The method row_dictionaries gives a dictionary with keys that are row-names and values that are column-name-to-matrix-value dictionaries:

smat.row_dictionaries()
{'A': {'a': 1, 'd': 3},
 'B': {'a': 4, 'd': 5},
 'C': {'b': 3, 'd': 5},
 'D': {'c': 1},
 'E': {'d': 5}}

Similarly, the method column_dictionaries gives a dictionary with keys that are column-names and values that are row-name-to-matrix-value dictionaries:

smat.column_dictionaries()
{'a': {'A': 1, 'B': 4},
 'b': {'C': 3},
 'c': {'D': 1},
 'd': {'A': 3, 'B': 5, 'C': 5, 'E': 5}}

Multiplication

Multiply with the transpose and print:

ssmat2 = ssmat.dot(smat.transpose())
ssmat2.print_matrix()
===========================================
  |       A       B       C       D       E
-------------------------------------------
0 |      10      19      15       .      15
1 |      19      41      25       .      25
2 |      15      25      34       .      25
3 |       .       .       .       1       .
4 |      15      25      25       .      25
===========================================

Multiply with a list-vector:

smat3 = smat.dot([1, 2, 1, 0])
smat3.print_matrix()
===========
  |       0
-----------
A |       1
B |       4
C |       6
D |       1
E |       .
===========

Remark: The type of the .dot argument can be:

  • SSparseMatrix
  • list
  • numpy.array
  • scipy.sparse.csr_matrix

Slices

Single element access:

print(smat["A", "d"])
print(smat[0, 3])
3
3

Get sub-matrix of rows using row names:

smat[["A", "D", "B"], :].print_matrix()
===================================
  |       a       b       c       d
-----------------------------------
A |       1       .       .       3
D |       .       .       1       .
B |       4       .       .       5
===================================

Get sub-matrix using row indices:

smat[[0, 3, 1], :].print_matrix()
===================================
  |       a       b       c       d
-----------------------------------
A |       1       .       .       3
D |       .       .       1       .
B |       4       .       .       5
===================================

Get sub-matrix with columns names:

smat[:, ['a', 'c']].print_matrix()
===================
  |       a       c
-------------------
A |       1       .
B |       4       .
C |       .       .
D |       .       1
E |       .       .
===================

Get sub-matrix with columns indices:

smat[:, [0, 2]].print_matrix()
===================
  |       a       c
-------------------
A |       1       .
B |       4       .
C |       .       .
D |       .       1
E |       .       .
===================

Remark: The current implementation of scipy (1.7.1) does not allow retrieval of sub-matrices by specifying both row and column ranges or slices.

Remark: “Standard” slices with integers also work.


Row and column sums

Row sums and dictionary of row sums:

print(smat.row_sums())
print(smat.row_sums_dict())
[4, 9, 8, 1, 5]
{'A': 4, 'B': 9, 'C': 8, 'D': 1, 'E': 5}

Column sums and dictionary of column sums:

print(smat.column_sums())
print(smat.column_sums_dict())
[5, 3, 1, 18]
{'a': 5, 'b': 3, 'c': 1, 'd': 18}

Column and row binding

Column binding

Here we create another SSparseMatrix object:

mat2=smat.sparse_matrix().transpose()
smat2 = SSparseMatrix(mat2, row_names=list("ABCD"), column_names="c")
smat2.print_matrix()
===========================================
  |      c0      c1      c2      c3      c4
-------------------------------------------
A |       1       4       .       .       .
B |       .       .       3       .       .
C |       .       .       .       1       .
D |       3       5       5       .       5
===========================================

Here we column-bind two SSparseMatrix objects:

smat[list("ABCD"), :].column_bind(smat2).print_matrix()
>===========================================================================
  |       a       b       c       d      c0      c1      c2      c3      c4
---------------------------------------------------------------------------
A |       1       .       .       3       1       4       .       .       .
B |       4       .       .       5       .       .       3       .       .
C |       .       3       .       5       .       .       .       1       .
D |       .       .       1       .       3       5       5       .       5
===========================================================================

Remark: If during column-binding some column names are duplicated then to the column names of both matrices are added suffixes that designate to which matrix each column belongs to.

Row binding

Here we rename the column names of smat to be the same as smat2:

smat3 = smat.copy()
smat3.set_column_names(smat2.column_names()[0:4])
smat3 = smat3.impose_column_names(smat2.column_names())
smat3.print_matrix()
===========================================
  |      c0      c1      c2      c3      c4
-------------------------------------------
A |       1       .       .       3       .
B |       4       .       .       5       .
C |       .       3       .       5       .
D |       .       .       1       .       .
E |       .       .       .       5       .
===========================================

Here we row-bind smat2 and smat3:

smat2.row_bind(smat3).print_matrix()

=============================================
    |      c0      c1      c2      c3      c4
---------------------------------------------
A.1 |       1       4       .       .       .
B.1 |       .       .       3       .       .
C.1 |       .       .       .       1       .
D.1 |       3       5       5       .       5
A.2 |       1       .       .       3       .
B.2 |       4       .       .       5       .
C.2 |       .       3       .       5       .
D.2 |       .       .       1       .       .
E.2 |       .       .       .       5       .
=============================================

Remark: If during row-binding some row names are duplicated then to the row names of both matrices are added suffixes that designate to which matrix each row belongs to.


In place computations

  • The methods for setting row- and column-names are “in place” methods — no new SSparseMatrix objects a created.
  • The dot product, arithmetic, and transposing methods have an optional argument whether to do computations in place or not.
    • The optional argument is copy, which corresponds to argument with the same name and function in scipy.sparse.
    • By default, the computations are not in place: new objects are created.
    • I.e. copy=True default.
  • The class SSparseMatrix has the method copy() that produces deep copies when invoked.

Unit tests

The unit tests (so far) are broken into functionalities; see the folder ./tests. Similar unit tests are given in [AAp2].


References

Articles

[AA1] Anton Antonov, “RSparseMatrix for sparse matrices with named rows and columns”, (2015), MathematicaForPrediction at WordPress.

[RB1] Richard Becker, “A Brief History of S”, (2004).

Packages

[AAp1] Anton Antonov, SSparseMatrix.m, (2018), MathematicaForPrediction at GitHub.

[AAp2] Anton Antonov, SSparseMatrix Mathematica unit tests, (2018), MathematicaForPrediction at GitHub.

Breakdown of Python people and projects

Introduction

This document presents an attempt to breakdown the:

  • People who use Python
  • People who program (and think) in Python
  • Projects with heavy use of Python

The attempt is mostly provocative but maybe also insightful.


Breakdown mind-map

Here is a mind map of the proposed (attempted) breakdown:

(The mind map was made with the old, better version of MindNode.)


Breakdown definitions

In this section we give definitions (or clarifications) of the nodes of mind-map above.

Types of people

Ideologists

  • Pythonistas: Really believe in Python as a way of life.
  • Pythonists: Really believe Python is a programming language of great utility.
  • Pythongelists: Enthusiastically and eagerly propagate the ideas of Pythonistas and Pythonists.
  • Pythonihilists: Do not believe in Python that much at all; think that believing in Python is both detrimental and exploitable.
    • In case it is not clear: this is me.

Users

  • Pythontourage: Managers, quality assurance engineers, or advocates found in and around Python projects.
  • Pythelons: Felons using Python. (For, say, phishing, crypto-extortion, hijacking, etc.)
  • Py-curious: Programmers or users of other programming languages that are curious about Python.

Programmers

  • Pythonizers: See every problem through the Python-lens and produce engineering solutions based on Python.
  • Reticulizers: Accommodate Python use or implementations in existing or future projects of different kind.
  • Pythoners: Long-run Python users; maintain long-run projects based on Python.
  • Пайтонджии: Труженици ползващи усилено Пайтон. На конвейeр. (Понякога принудително.)

Programming

  • Pythonitis: Unconscious transferring of Python idioms or macro or micro patterns into projects with other languages.
  • Pythonification: Attempts to have a unified user experience or user interface
  • Pythonomers: Standard computer science terms or algorithms misnamed in Python.
  • Pythonoscopy: Using Python to debug, monitor, or introspect a certain engineering solution.
  • Pythonectomy: Removing Python from projects. Happens often in sufficiently advanced projects; most frequently replaced with Java.
  • Game of pythons: Attempting to install or update Python on particular machine usually shows that there are too many Pythons, often, with competing agendas.
  • Pythonstan: Generally speaking that is PyPI.org. (Or any project with a similar scope.)
    • Some people might prefer the name “Pythonistan”.

Strategy

  • Pythonomy: Python-driven software architecture.
  • Pythonomics: School of thought that believes that using Python produces cheap and convenient solutions.
  • Pythonimists: People justifying Python on economical or accounting grounds.
  • Pythong: Using Python in project management decisions.
    • “Nobody got fired for proposing Python in a project.”

Warnings

  • Pythonimics: Python (macro- or micro-) thinking mimicry. Especially applied to projects Python is not suitable for.
  • Pythonomers: See the definition above. For some reason, Python is using different names for mainstream concepts in computer science. Maybe, because it was thought that that would make Python more “user friendly.”
  • Pythonitis: See the definition above. It happens on idiomatic level often enough.

Contexts for informed hate

In this section we give some context for some of the definitions above.

Users vs programmers

Most people utilizing Python a Python users not Python programmers:

  • Python users learn how to use a few Python libraries and all their professional lives unfold within the workflows envisioned by the designers and developers of those libraries.
    • (Regardless how good, sound, tested, or useful the corresponding implementations are.)
  • Python programmers are capable of writing complete products in Python.
    • That includes libraries.
    • Making a Python library requires language knowledge that Python users usually do not have or care to acquire.

Python is a reaction to Perl

Python’s inventor Guido van Rossum got scared from Perl, and decided to make a programming language based on the principle “There should be one– and preferably only one –obvious way to do it” (TSBEO-APOO-OWTDI), [TP1].

In other words, using a principle that is an opposite of “There’s more than one way to do it” (TIMTOWTDI).

Applying TSBEO-APOO-OWTDI is a very Dutch or Scandinavian way of doing things. For example, in those countries the following Japanese proverb broadly applies in every-day life:

A nail that sticks out gets hammered down.

Terminology by the underdeveloped

Much of the terminology of Python comes from the partially developed skills and computer science knowledge of its inventor and developers. Also, most Python users are “not brought to term” programmers, hence stupid or inadequate terminology gets “free passes” from them.

Also, see [GvR1] in which Guido van Rossum discussed his not understanding of functional programming.

Capitalistic SAFe

Scaled Agile Framework (SAFe) — also known as Shitty Agile For Enterprises, [GOTO14], [SM1] — tries to combine the Waterfall and Agile methodologies. SAFe, of course, is often combined with managers’ short attention span and desires for cheap solutions (and workforce.)

One consequence is that often Python is seen as the uniform skill and paradigm that is going to bring everything together cheaply.

Another consequence is that all data scientists and machine learning engineers in those projects are perceived to be fungible (because they all use Python.)

Both consequences above can be seen as derivatives of the TSBEO-APOO-OWTDI principle. (At least in managers’ minds.)


References

[GOTO14] Goto;, “A Retake on the Agile Manifesto • Humble, Thomas, Badiceanu, Fowler & Kirk • GOTO 2014”, (2014), “Goto Conferences” channel at YouTube. (Hear statements at 10:57.)

[GvR1] “The fate of reduce() in Python 3000”.

[SM1] SmHarter, “SAFE: A COLLECTION OF COMMENTS FROM LEADING EXPERTS – PART 2”smharter.com.

[TP1] Tim Peters, (19 August 2004). “PEP 20 – The Zen of Python”Python Enhancement Proposals. Python Software Foundation.

[Wk1] Wikipedia entry “Python (programming language)”.

[Wk3] Wikipedia entry “There’s more than one way to do it”.

[Wk4] Wikipedia entry “Scaled agile framework”.

Why this Python site?

In brief

This site is about comparison of systems for technological computations, like, Python, R, and Wolfram Language. Many of the posts are studies embarked on in order to confirm or reject my assumptions / impressions about Python.

Beware that I am not enamored by Python. I do not dislike Python, but I dislike many of the attitudes and opinions about its place and usefulness. I do prefer to keep (relatively) objective opinions about research and software developments tools. Hence, I voice opinions that paint Python in negative light.

Of course, in order to be objective, we have to compare same or similar functionalities implemented in the different programming languages (tools) and apply those functionalities to the same problems. This blog is going to show such applications using Python. Related comparisons of mine can be found in:

Previously stated opinions about Python

In different sites and meetups I stated and proclaimed my opinions about Python. Some are listed below.

(I plan to discuss my Python background in the next post…)