Topic Modeling With ML Techniques



Topic modeling is a method for identifying the themes that exist in large sets of data. It’s an unsupervised learning technique in which the model infers the underlying topics without ground-truth labels. It is helpful in a wide range of industries, including healthcare, finance, and marketing, where there’s a lot of text-based data to analyze. Using topic modeling, organizations can quickly surface the topics that matter most to their business, which helps them make better decisions and improve their products and services.

This article was published as a part of the Data Science Blogathon.

Project Description

Topic modeling is valuable for numerous industries, including but not limited to finance, healthcare, and marketing. It is especially beneficial for industries that deal with huge amounts of unstructured text data, such as customer reviews, social media posts, or medical records, because it greatly reduces the time and labor that the same analysis would take by hand.

For example, in the healthcare industry, topic modeling can identify common themes or patterns in patient records that can help improve patient outcomes, identify risk factors, and guide clinical decision-making. In finance, topic modeling can analyze news articles, financial reports, and other text data to identify trends, market sentiment, and potential investment opportunities.

In the marketing industry, topic modeling can analyze customer feedback, social media posts, and other text data to identify customer needs and preferences and develop targeted marketing campaigns. This can help companies improve customer satisfaction, increase sales, and gain a competitive market edge.

Problem Statement

The aim is to do topic modeling on the A Million Headlines news dataset, a collection of over one million news article headlines published by the ABC.

By identifying the main themes in the news headlines dataset, the project aims to provide insights into the types of news stories that the ABC covers. Journalists, editors, and media organizations can use this information to better understand their audience and to tailor their news coverage to the needs and interests of their readers.

Dataset Description

The dataset contains a large collection of news headlines published over a period of nineteen years, between February 19, 2003, and December 31, 2021. The data is sourced from the Australian Broadcasting Corporation (ABC), a reputable news organization in Australia. The dataset is provided in CSV format and contains two columns: “publish_date” and “headline_text”.

The “publish_date” column provides the date when the news article was published, in the YYYYMMDD format. The “headline_text” column contains the text of the headline, written in ASCII, English, and lowercase.
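As a small illustration of the YYYYMMDD format, the sketch below parses such date strings with the standard library. The two rows are invented placeholders in the dataset's two-column layout, not real records, and parse_publish_date is a hypothetical helper name.

```python
from datetime import datetime

# Hypothetical rows in the (publish_date, headline_text) layout; values are invented
rows = [
    ("20030219", "example headline about a council decision"),
    ("20100704", "another example headline about the weather"),
]

def parse_publish_date(raw: str) -> datetime:
    """Parse a YYYYMMDD string such as '20030219' into a datetime."""
    return datetime.strptime(raw, "%Y%m%d")

parsed = [(parse_publish_date(date), text) for date, text in rows]
for date, text in parsed:
    print(date.date(), "|", text)
```

When loading with pandas, passing parse_dates for the date column performs a similar conversion for the whole column at once.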

Project Plan

The project steps for applying topic modeling to the news headlines dataset are as follows:

1. Exploratory Data Analysis: The first step is analyzing the data to understand the distribution of headlines over time, the frequency of different words and phrases, and other patterns in the data. You can also visualize the data using charts and graphs to gain insights.

2. Data Pre-processing: The next step is cleaning and preprocessing the text to remove stop words, punctuation, etc. It also involves tokenization, stemming, and lemmatization to standardize the text data and make it suitable for analysis.

3. Topic Modeling: The core of the project is applying techniques such as LDA to identify the main topics and themes in the news headlines dataset. It requires selecting appropriate parameters for the topic modeling algorithm, for example, the number of topics, the size of the vocabulary, and the similarity measure.

4. Topic Interpretation: After identifying the main topics, the next step is interpreting the topics and assigning human-readable labels to them. It includes analyzing the top words and phrases associated with each topic and identifying the main themes and trends.

5. Evaluation: The final step involves evaluating the performance of the topic modeling algorithms and comparing them based on metrics such as coherence score and perplexity. It also includes identifying the limitations and challenges of the topic modeling approach and proposing possible solutions.

Steps for The Project

First, importing the necessary libraries.

import numpy as np
import pandas as pd
from IPython.display import display
from tqdm import tqdm
from collections import Counter

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.feature_extraction.text import CountVectorizer
from textblob import TextBlob
import scipy.stats as stats

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import TSNE
from wordcloud import WordCloud, STOPWORDS

from bokeh.plotting import figure, output_file, show
from bokeh.models import Label
from bokeh.io import output_notebook
output_notebook()

%matplotlib inline

Loading the CSV data into a DataFrame while parsing the dates into a usable format.

path = '/content/drive/MyDrive/topic_modeling/abcnews-date-text.csv'  # path of your dataset

df = pd.read_csv(path, parse_dates=[0], infer_datetime_format=True)

# Keep the headline text as a Series indexed by publication date
reindexed_data = df['headline_text']
reindexed_data.index = df['publish_date']

Seeing a glimpse of the loaded data through the first five rows.

df.head()

There are 2 columns, named publish_date and headline_text, as mentioned above in the dataset description.

df.info()  # general description of data

We can see that there are 1,244,184 rows in the dataset with no null values.

For convenience and feasibility, the LDA model is later fitted on a sample of 100,000 rows.

Exploratory Data Analysis

Starting with visualizing the top 15 words in the data without including stopwords.

def get_top_n_words(n_top_words, count_vectorizer, text_data):
    '''
    returns a tuple of the top n words in a sample and their
    accompanying counts, given a CountVectorizer object and text sample
    '''
    vectorized_headlines = count_vectorizer.fit_transform(text_data.values)
    vectorized_total = np.sum(vectorized_headlines, axis=0)
    word_indices = np.flip(np.argsort(vectorized_total)[0,:], 1)
    word_values = np.flip(np.sort(vectorized_total)[0,:], 1)

    # One-hot rows let inverse_transform map the top indices back to words
    word_vectors = np.zeros((n_top_words, vectorized_headlines.shape[1]))
    for i in range(n_top_words):
        word_vectors[i, word_indices[0,i]] = 1

    words = [word[0].encode('ascii').decode('utf-8') for word in count_vectorizer.inverse_transform(word_vectors)]

    return (words, word_values[0,:n_top_words].tolist()[0])

# CountVectorizer builds a vocabulary and counts how often each word occurs in each headline
count_vectorizer = CountVectorizer(max_df=0.8, min_df=2, stop_words='english')
words, word_values = get_top_n_words(n_top_words=15, count_vectorizer=count_vectorizer, text_data=reindexed_data)

fig, ax = plt.subplots(figsize=(16,8))
ax.bar(range(len(words)), word_values);
ax.set_xticks(range(len(words)));
ax.set_xticklabels(words, rotation='vertical');
ax.set_title('Top words in headlines dataset (excluding stop words)');
ax.set_xlabel('Word');
ax.set_ylabel('Number of occurrences');
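For intuition, the same idea (count words, skip stop words, keep the most frequent) can be sketched with only the standard library. This is a simplified stand-in, not the CountVectorizer pipeline above: the stop-word set is a tiny illustrative subset, the sample headlines are invented, and top_n_words is a hypothetical helper.

```python
from collections import Counter

# Tiny illustrative stop-word set; scikit-learn's 'english' list is much longer
STOP_WORDS = {"the", "a", "in", "of", "to", "for", "on", "and", "over"}

def top_n_words(headlines, n):
    """Return the n most frequent non-stop-word tokens with their counts."""
    counts = Counter(
        token
        for headline in headlines
        for token in headline.lower().split()
        if token not in STOP_WORDS
    )
    return counts.most_common(n)

# Invented sample headlines for demonstration only
sample = [
    "police probe crash in city",
    "police charge man over crash",
    "council plan for new park",
]
print(top_n_words(sample, 2))
```

Unlike CountVectorizer with min_df and max_df, this sketch applies no document-frequency filtering.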

Now, doing part of speech tagging for the headlines.

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# .iloc gives positional access because the Series is indexed by date
tagged_headlines = [TextBlob(reindexed_data.iloc[i]).pos_tags for i in range(reindexed_data.shape[0])]
tagged_headlines[10]  # checking the headline at position 10

tagged_headlines_df = pd.DataFrame({'tags': tagged_headlines})

word_counts = []
pos_counts = {}

for headline in tagged_headlines_df['tags']:
    word_counts.append(len(headline))
    for tag in headline:
        if tag[1] in pos_counts:
            pos_counts[tag[1]] += 1
        else:
            pos_counts[tag[1]] = 1

print('Total number of words: ', np.sum(word_counts))
print('Mean number of words per headline: ', np.mean(word_counts))


Total number of words: 8166553

Mean number of words per headline: 6.563782366595294

Checking if the distribution is normal.

# Normal density with the same mean and standard deviation as the word counts
y = stats.norm.pdf(np.linspace(0,14,50), np.mean(word_counts), np.std(word_counts))

fig, ax = plt.subplots(figsize=(8,4))
ax.hist(word_counts, bins=range(1,14), density=True);
ax.plot(np.linspace(0,14,50), y, 'r--', linewidth=1);
ax.set_title('Headline word lengths');
ax.set_xticks(range(1,14));
ax.set_xlabel('Number of words');
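The overlay above is a visual check. A rough numeric complement, sketched here under the assumption that skewness and excess kurtosis near zero indicate an approximately normal shape, could look like the following. The word_counts_demo values are invented, and the helper name is hypothetical.

```python
import statistics

def skewness_and_excess_kurtosis(values):
    """Population skewness and excess kurtosis; both are 0 for a normal distribution."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    n = len(values)
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n - 3.0
    return skew, kurt

# Invented, symmetric word counts for demonstration only
word_counts_demo = [4, 5, 5, 6, 6, 6, 7, 7, 8]
skew, kurt = skewness_and_excess_kurtosis(word_counts_demo)
print(f"skewness={skew:.3f}, excess kurtosis={kurt:.3f}")
```

On the real data, the same function could be applied to the word_counts list built during part-of-speech tagging.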

Visualizing the proportion of top 5 used parts of speech.

# importing libraries
import matplotlib.pyplot as plt
import seaborn as sns

# declaring data
pos_sorted_types = sorted(pos_counts, key=pos_counts.__getitem__, reverse=True)
pos_sorted_counts = sorted(pos_counts.values(), reverse=True)
top_five = pos_sorted_types[:5]
data = pos_sorted_counts[:5]

# declaring exploding pie
explode = [0, 0.1, 0, 0, 0]

# define Seaborn color palette to use
palette_color = sns.color_palette('dark')

# plotting data on chart
plt.pie(data, labels=top_five, colors=palette_color, explode=explode, autopct='%.0f%%')

# displaying chart
plt.show()

Here, it’s visible that nouns make up 50% of the words among the five most common parts of speech in the headlines, which sounds reasonable.


Data Pre-processing

First, sampling 100,000 headlines and converting sentences to words.

import gensim

def sent_to_words(sentences):
    for sentence in sentences:
        # deacc=True strips accent marks; tokenization itself drops punctuation
        yield(gensim.utils.simple_preprocess(str(sentence), deacc=True))

text_sample = reindexed_data.sample(n=100000, random_state=0).values
data = text_sample.tolist()
data_words = list(sent_to_words(data))
print(data_words[0])

Making bigram and trigram models.

# Build the bigram and trigram models
bigram = gensim.models.Phrases(data_words, min_count=5, threshold=100)
trigram = gensim.models.Phrases(bigram[data_words], threshold=100)  # higher threshold fewer phrases.

# Faster way to get a sentence clubbed as a trigram/bigram
bigram_mod = gensim.models.phrases.Phraser(bigram)
trigram_mod = gensim.models.phrases.Phraser(trigram)
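To see what min_count and threshold control, here is a rough standard-library sketch of the default-style phrase score: (count(a, b) - min_count) * vocabulary size / (count(a) * count(b)). A pair is kept as a phrase only when its score exceeds the threshold. This is a simplification under stated assumptions, not gensim's code: the vocabulary size here counts unigrams only, the sentences are invented, and phrase_scores is a hypothetical helper.

```python
from collections import Counter

def phrase_scores(sentences, min_count):
    """Score each adjacent word pair; higher scores suggest a genuine phrase."""
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    vocab_size = len(unigrams)  # assumption: unigram vocabulary only
    return {
        (a, b): (count_ab - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        for (a, b), count_ab in bigrams.items()
    }

# Invented tokenized headlines for demonstration only
sentences = [
    ["gold", "coast", "council", "meets"],
    ["gold", "coast", "beach", "closed"],
    ["council", "meets", "gold", "coast", "mayor"],
    ["beach", "closed", "after", "storm"],
]
scores = phrase_scores(sentences, min_count=1)
for pair, score in sorted(scores.items(), key=lambda item: -item[1])[:3]:
    print(pair, round(score, 2))
```

Raising min_count or the threshold leaves fewer pairs that qualify, which matches the "higher threshold fewer phrases" comment in the code above.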

In this step we remove stop words, form bigrams, and lemmatize the text (a trigram helper is also defined).

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from gensim.utils import simple_preprocess

stop_words = stopwords.words('english')
stop_words.extend(['from', 'subject', 're', 'edu', 'use'])

# Define functions for stopwords, bigrams, trigrams and lemmatization
def remove_stopwords(texts):
    return [[word for word in simple_preprocess(str(doc)) if word not in stop_words] for doc in texts]

def make_bigrams(texts):
    return [bigram_mod[doc] for doc in texts]

def make_trigrams(texts):
    return [trigram_mod[bigram_mod[doc]] for doc in texts]

def lemmatization(texts, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV']):
    texts_out = []
    for sent in texts:
        doc = nlp(" ".join(sent))
        texts_out.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])
    return texts_out

# !python -m spacy download en_core_web_sm
import spacy

# Remove Stop Words
data_words_nostops = remove_stopwords(text_sample)

# Form Bigrams
data_words_bigrams = make_bigrams(data_words_nostops)

# Initialize spacy 'en' model, keeping only tagger component (for efficiency)
nlp = spacy.load("en_core_web_sm", disable=['parser', 'ner'])

data_lemmatized = lemmatization(data_words_bigrams, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV'])

import gensim.corpora as corpora

# Create Dictionary
id2word = corpora.Dictionary(data_lemmatized)

# Create Corpus
texts = data_lemmatized

# Term Document Frequency
corpus = [id2word.doc2bow(text) for text in texts]

Topic Modeling

Applying the LDA model, assuming 15 themes in the whole dataset.

num_topics = 15

lda_model = gensim.models.LdaMulticore(corpus=corpus,
                                       id2word=id2word,
                                       num_topics=num_topics,
                                       random_state=100,
                                       chunksize=100,
                                       passes=10,
                                       alpha=0.01,
                                       eta=0.9)

Topic Interpretation

from pprint import pprint

# Print the keywords in the 15 topics
pprint(lda_model.print_topics())
doc_lda = lda_model[corpus]

Output:

[(0,
  '0.046*"new" + 0.034*"fire" + 0.020*"year" + 0.018*"ban" + 0.016*"open" + '
  '0.014*"set" + 0.011*"consider" + 0.009*"security" + 0.009*"name" + '
  '0.008*"melbourne"'),
 (1,
  '0.021*"urge" + 0.020*"attack" + 0.016*"government" + 0.014*"lead" + '
  '0.014*"driver" + 0.013*"public" + 0.011*"want" + 0.010*"rise" + '
  '0.010*"student" + 0.010*"funding"'),
 (2,
  '0.019*"day" + 0.015*"flood" + 0.013*"go" + 0.013*"work" + 0.011*"fine" + '
  '0.010*"launch" + 0.009*"union" + 0.009*"final" + 0.007*"run" + '
  '0.006*"game"'),
 (3,
  '0.023*"australian" + 0.023*"crash" + 0.016*"health" + 0.016*"arrest" + '
  '0.013*"fight" + 0.013*"community" + 0.013*"job" + 0.013*"indigenous" + '
  '0.012*"victim" + 0.012*"support"'),
 (4,
  '0.024*"face" + 0.022*"nsw" + 0.018*"council" + 0.018*"seek" + 0.017*"talk" '
  '+ 0.016*"home" + 0.012*"price" + 0.011*"bushfire" + 0.010*"high" + '
  '0.010*"return"'),
 (5,
  '0.068*"police" + 0.019*"car" + 0.015*"accuse" + 0.014*"change" + '
  '0.013*"road" + 0.010*"strike" + 0.008*"safety" + 0.008*"federal" + '
  '0.008*"keep" + 0.007*"problem"'),
 (6,
  '0.042*"call" + 0.029*"win" + 0.015*"first" + 0.013*"show" + 0.013*"time" + '
  '0.012*"trial" + 0.012*"cut" + 0.009*"review" + 0.009*"top" + 0.009*"look"'),
 (7,
  '0.027*"take" + 0.021*"make" + 0.014*"farmer" + 0.014*"probe" + '
  '0.011*"target" + 0.011*"rule" + 0.008*"season" + 0.008*"drought" + '
  '0.007*"confirm" + 0.006*"point"'),
 (8,
  '0.047*"say" + 0.026*"water" + 0.021*"report" + 0.020*"fear" + 0.015*"test" '
  '+ 0.015*"power" + 0.014*"hold" + 0.013*"continue" + 0.013*"search" + '
  '0.012*"election"'),
 (9,
  '0.024*"warn" + 0.020*"worker" + 0.014*"end" + 0.011*"industry" + '
  '0.011*"business" + 0.009*"speak" + 0.008*"stop" + 0.008*"regional" + '
  '0.007*"turn" + 0.007*"park"'),
 (10,
  '0.050*"man" + 0.035*"charge" + 0.017*"jail" + 0.016*"murder" + '
  '0.016*"woman" + 0.016*"miss" + 0.016*"get" + 0.014*"claim" + 0.014*"school" '
  '+ 0.011*"leave"'),
 (11,
  '0.024*"find" + 0.015*"push" + 0.015*"drug" + 0.014*"govt" + 0.010*"labor" + '
  '0.008*"state" + 0.008*"investigate" + 0.008*"threaten" + 0.008*"mp" + '
  '0.008*"world"'),
 (12,
  '0.028*"court" + 0.026*"interview" + 0.025*"kill" + 0.021*"death" + '
  '0.017*"die" + 0.015*"national" + 0.014*"hospital" + 0.010*"pay" + '
  '0.009*"announce" + 0.008*"rail"'),
 (13,
  '0.020*"help" + 0.017*"boost" + 0.016*"child" + 0.016*"hit" + 0.016*"group" '
  '+ 0.013*"case" + 0.011*"fund" + 0.011*"market" + 0.011*"appeal" + '
  '0.010*"local"'),
 (14,
  '0.036*"plan" + 0.021*"back" + 0.015*"service" + 0.012*"concern" + '
  '0.012*"move" + 0.011*"centre" + 0.010*"inquiry" + 0.010*"budget" + '
  '0.010*"law" + 0.009*"remain"')]

Evaluation

1. Calculating the coherence score, which measures how semantically similar the top words in a topic are. The c_v measure used here ranges between 0 and 1, and higher is better.

from gensim.models import CoherenceModel

# Compute Coherence Score
coherence_model_lda = CoherenceModel(model=lda_model, texts=data_lemmatized, dictionary=id2word, coherence='c_v')
coherence_lda = coherence_model_lda.get_coherence()
print('Coherence Score: ', coherence_lda)


Coherence Score: 0.38355488160129025
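Coherence measures such as c_v are built on word co-occurrence statistics. As a rough illustration only, the sketch below computes normalized pointwise mutual information (NPMI), one common building block, from document-level co-occurrence. It is not gensim's c_v implementation, which also uses sliding windows and a further similarity step. The documents and the npmi helper are hypothetical.

```python
import math

def npmi(word_a, word_b, documents):
    """Normalized PMI in [-1, 1] from document-level co-occurrence counts."""
    n_docs = len(documents)
    p_a = sum(word_a in doc for doc in documents) / n_docs
    p_b = sum(word_b in doc for doc in documents) / n_docs
    p_ab = sum(word_a in doc and word_b in doc for doc in documents) / n_docs
    if p_ab == 0:
        return -1.0  # the words never appear together
    if p_ab == 1:
        return 1.0   # the words appear together in every document
    pmi = math.log(p_ab / (p_a * p_b))
    return pmi / -math.log(p_ab)

# Invented documents, each represented as a set of lemmas
documents = [
    {"police", "charge"},
    {"police", "charge"},
    {"flood", "water"},
    {"flood", "rain"},
]
print(npmi("police", "charge", documents))
print(npmi("police", "flood", documents))
```

Words that always occur together score close to 1, while words that never share a document score -1. A coherent topic is one whose top words mostly fall in the first group.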

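2. Calculating the perplexity score, a measure of how well the model’s probability distribution predicts the sample (lower perplexity indicates a better model). Note that gensim’s log_perplexity returns a per-word likelihood bound rather than the perplexity itself, so the printed value is negative and values closer to zero are better.

perplexity = lda_model.log_perplexity(corpus)
print(perplexity)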
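The value returned by log_perplexity is a per-word likelihood bound, and gensim reports the corresponding perplexity as 2 raised to the negative of that bound. A minimal sketch of the conversion follows. The bound below is a made-up number for illustration, not the result of this run, and perplexity_from_bound is a hypothetical helper.

```python
def perplexity_from_bound(per_word_bound: float) -> float:
    """Convert a per-word likelihood bound into a perplexity value."""
    return 2.0 ** (-per_word_bound)

example_bound = -8.0  # hypothetical bound, for illustration only
print(perplexity_from_bound(example_bound))  # 256.0
```

A bound closer to zero therefore corresponds to a lower perplexity.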



We can see that the coherence score is fairly low, yet the model still surfaces relevant themes, and the score can likely be improved through hyperparameter tuning. The perplexity is also low, which may relate to the roughly normal distribution of headline lengths seen in the exploratory data analysis section.


Topic modeling is an unsupervised learning technique for identifying themes in large sets of data. It is useful in various domains such as healthcare, finance, and marketing, where there is a huge amount of text-based data to analyze. In this project, we applied topic modeling to a dataset called “A million headlines”, consisting of over one million news article headlines published by the ABC. The aim was to use the Latent Dirichlet Allocation (LDA) algorithm, a probabilistic generative model, to identify the main topics in the dataset.

The project plan involves several steps: exploratory data analysis to understand the data distribution, preprocessing the text by removing stop words, punctuation, etc., and applying techniques like tokenization, stemming, and lemmatization. The essence of the project revolves around topic modeling, leveraging LDA to identify the primary topics and themes within the news headlines. We analyze associated words and phrases to interpret the topics and assign human-readable labels to them. The evaluation of topic modeling algorithms encompasses metrics such as coherence score and perplexity, while also taking into account the limitations of the approach.

Key Takeaways

Topic Modeling is an effective way of finding broad themes from the data with Machine Learning (ML) without labels.

It has a wide range of applications from healthcare to recommender systems.

LDA is one effective way of implementing topic modeling.

Coherence score and perplexity are effective evaluation metrics for checking the performance of topic modeling through ML models.

Frequently Asked Questions

Q1. What is topic modeling in ML?

A. Topic modeling in ML refers to a technique that automatically extracts underlying themes or topics from a collection of text documents. It helps uncover latent patterns and structures, enabling tasks like document clustering, text summarization, and content recommendation in natural language processing (NLP) and machine learning.

Q2. What is topic modeling with examples?

A. Topic modeling, with an example, involves extracting topics from a set of news articles. The algorithm identifies topics such as “politics,” “sports,” and “technology” based on word co-occurrence patterns. This helps organize and categorize articles, making browsing and searching for specific topics of interest easier.

Q3. What is the best algorithm for topic modeling?

A. The best algorithm for topic modeling depends on the specific requirements and characteristics of the dataset. Popular algorithms include Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and Latent Semantic Analysis (LSA). Each algorithm has its strengths and weaknesses, so the choice should align with the task at hand.

Q4. Is topic modeling an NLP technique?

A. Yes, topic modeling is a technique commonly used in natural language processing (NLP). It leverages machine learning algorithms to identify and extract topics from text data, allowing for better understanding, organization, and analysis of textual information. It aids in various NLP tasks, including text classification, sentiment analysis, and information retrieval.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.



Data Modeling Techniques To Organize DAX Measures

As you progress in Power BI, you will gradually be working with more DAX measures and calculations, and things can easily get cluttered. For today’s tutorial, I will share a few data modeling techniques on how to organize your DAX measures better for a more efficient workflow. You can watch the full video of this tutorial at the bottom of this blog.

Typically a lot of people will take their measures and put that on their Fact or Sales table. Instead, we’re going to organize the measures into a new dedicated DAX table or folder that is going to sit at the top of our Fields list.

To do that, we’re going to go to Enter Data under the Home ribbon. We’re going to leave the table completely empty; the only thing we’re going to do is enter DAX as the table name and select Load.

And now, I’m going to go to my Sales table and select all my measures here. I’m using the shift key and highlighting these to select them. Then, I’m going to drag them into the DAX folder that we just created.

And now, you can see all the Sales measures inside the DAX folder.

Moreover, we can create more folders to organize this better, so let’s go ahead and do that. As you can see, all my Base amounts include my Budget, my Forecast, and my Sales. Those are like the core calculations. We can have them in a Display folder. I’m going to call this folder, Base for Base calculations, and select Enter. Notice that that created a folder for Base that is in this hierarchy structure as well.

So we now have our DAX folder with our measures, which has a sub-folder with other measures. Likewise, I’m going to do the same to my Variances. I’ll select all these Variances, and I’m going to add this to a Display folder of Variances. And now, we have a Base and a Variances folder within our DAX folder.

Another trick for this as well is if you have measures in the same level inside of this measure folder, you want to ensure that these folders are always kept at the top. To do that, you can add a punctuation mark, such as a period at the very beginning of the folder title. This will ensure that the folder remains at the top of the list because the folders are sorted alphanumerically.

Now, if you want to do a sub folder within the sub folder, just add a backslash to the folder title, and then let’s name the sub folder. In this case, let’s call it Sub Folder. You can now see that inside of Variances, there’s a sub folder. You can also drag and drop them to any of the folders created after you’ve done that.

Finally, there’s one other really cool trick that I want to share with you. If I wanted to put this Actual Amount Variance to Budget (VTB) measure in one or multiple folders, I’ll just add a semicolon to the folder title. Notice that it is now in two different folders.

Now, observe what happens with a new visual on the page. If I had a single value card visual here, as an example, I’m going to drag the measure from Budget. Notice that it added it from both folders. So, technically there is only one measure, but you can see the changes added for both locations.

In this blog tutorial, I’ve shared with you some of my data modeling techniques to organize your DAX measures better. You can add a period at the beginning of the folder title if you want a folder to be at the top of the list. An additional backslash allows you to put your measures into a sub folder, while a semi-colon allows you to put them into multiple locations.
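The three naming rules can be modeled in a few lines of Python. This is not Power BI code and does not call any Power BI API; it is only a sketch of how a display-folder string is read, as described above. parse_display_folder is a hypothetical helper, and the folder names are taken from the tutorial.

```python
def parse_display_folder(value: str):
    """Split a display-folder string into folder paths (lists of folder names).

    A semicolon separates multiple locations; a backslash separates nesting levels.
    """
    return [path.split("\\") for path in value.split(";") if path]

# A backslash nests a sub folder inside Variances
print(parse_display_folder("Variances\\Sub Folder"))

# A semicolon places the same measure in two folders
print(parse_display_folder("Base;Variances"))

# A leading period sorts ahead of letters, so the folder stays at the top
print(sorted(["Variances", ".Base", "Actual Amount"]))
```

The last line shows why the period trick works: in an alphanumeric sort, punctuation such as a period comes before letters.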

Hopefully, you’ve found these tips useful, and you’ve now seen how to create a root DAX folder that stays at the top of your Fields list as it grows. You can also put the DAX measures inside that folder into sub folders for better organization.

All the best!


Topic Modeling: Predicting Multiple Tags Of Research Articles Using OneVsRest Strategy

This article was published as a part of the Data Science Blogathon

Recently I participated in an NLP hackathon — “Topic Modeling for Research Articles 2.0”. This hackathon was hosted by the Analytics Vidhya platform as a part of their HackLive initiative. The participants were guided by experts in a 2-hour live session and later on were given a week to compete and climb the leaderboard.

Problem Statement

Given the abstracts for a set of research articles, the task is to predict the tags for each article included in the test set.

The research article abstracts are sourced from the following 4 topics: Computer Science, Mathematics, Physics, and Statistics. Each article can have multiple tags among 25 tags such as Number Theory, Applications, Artificial Intelligence, Astrophysics of Galaxies, Information Theory, Materials Science, Machine Learning, etc. Submissions are evaluated on the micro F1 score between the predicted and observed tags for each article in the test set.
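To make the metric concrete, here is a minimal sketch of micro F1 for multi-label predictions: true positives, false positives, and false negatives are pooled over every (article, tag) pair before precision and recall are computed. The label matrices are invented, and micro_f1 is a hypothetical helper. In practice, sklearn's f1_score with average='micro' does this job.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over binary label matrices given as lists of rows."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented example: 3 articles, 3 possible tags
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
print(round(micro_f1(y_true, y_pred), 4))
```

Because counts are pooled across tags, frequent tags influence the score more than rare ones.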

Complete Problem Statement and the dataset is available here.

Without further ado let’s get started with the code.

Loading and Exploring data

Importing necessary libraries —

%matplotlib inline

import re  # used by the text-cleaning helpers below
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn import metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

Load train and test data from .csv files into Pandas DataFrame —

train_data = pd.read_csv('Train.csv')
test_data = pd.read_csv('Test.csv')

Train and Test data shape —

print("Train size:", train_data.shape)
print("Test size:", test_data.shape)


There are ~ 14k datapoints in the Train dataset and ~6k datapoints in the Test set. Overview of train and test datasets —



As we can see from the train data info, there are 31 columns — 1 column for id, 1 column for Abstract text, 4 columns for topics, these all form our feature variables, and the next 25 columns are class-labels that we have to ‘learn’ for the prediction task.

topic_cols = ['Computer Science', 'Mathematics', 'Physics', 'Statistics']

target_cols = ['Analysis of PDEs', 'Applications', 'Artificial Intelligence',
               'Astrophysics of Galaxies', 'Computation and Language',
               'Computer Vision and Pattern Recognition',
               'Cosmology and Nongalactic Astrophysics',
               'Data Structures and Algorithms', 'Differential Geometry',
               'Earth and Planetary Astrophysics', 'Fluid Dynamics',
               'Information Theory', 'Instrumentation and Methods for Astrophysics',
               'Machine Learning', 'Materials Science', 'Methodology', 'Number Theory',
               'Optimization and Control', 'Representation Theory', 'Robotics',
               'Social and Information Networks', 'Statistics Theory',
               'Strongly Correlated Electrons', 'Superconductivity',
               'Systems and Control']

How many datapoints have more than 1 tag?

# Columns from position 6 onward are the 25 tag columns
my_list = []
for i in range(train_data.shape[0]):
    my_list.append(sum(train_data.iloc[i, 6:]))

pd.Series(my_list).value_counts()


So, most of our research articles have either 1 or 2 tags.

Data cleaning and preprocessing for OneVsRest Classifier

Before proceeding with data cleaning and pre-processing, it’s a good idea to first print and observe some random samples from training data in order to get an overview. Based on my observation I built the below pipeline for cleaning and pre-processing the text data:

De-contraction → Removing special chars → Removing stopwords →Stemming

First, we define some helper functions needed for text processing.

De-contracting the English phrases —

def decontracted(phrase):
    # specific
    phrase = re.sub(r"won't", "will not", phrase)
    phrase = re.sub(r"can't", "cannot", phrase)

    # general
    phrase = re.sub(r"n't", " not", phrase)
    phrase = re.sub(r"'re", " are", phrase)
    phrase = re.sub(r"'s", " is", phrase)
    phrase = re.sub(r"'d", " would", phrase)
    phrase = re.sub(r"'ll", " will", phrase)
    phrase = re.sub(r"'t", " not", phrase)
    phrase = re.sub(r"'ve", " have", phrase)
    phrase = re.sub(r"'m", " am", phrase)
    phrase = re.sub(r"'em", " them", phrase)
    return phrase

(I prefer my own custom set of stopwords to the in-built ones. It helps me readily modify the stopwords set depending on the problem.)

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd",
             'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers',
             'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which',
             'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been',
             'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if',
             'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between',
             'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out',
             'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why',
             'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'only', 'own', 'same',
             'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now',
             'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't",
             'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn',
             "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn',
             "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]

Alternatively, you can directly import stopwords from word cloud API —

from wordcloud import WordCloud, STOPWORDS
stopwords = set(STOPWORDS)

Stemming using Porter stemmer —

def stemming(sentence):
    stemmer = PorterStemmer()  # create the stemmer once, not once per word
    token_words = word_tokenize(sentence)
    stem_sentence = []
    for word in token_words:
        stem_sentence.append(stemmer.stem(word))
        stem_sentence.append(" ")
    return "".join(stem_sentence)

Now that we’ve defined all the functions, let’s write a text pre-processing pipeline —

def text_preprocessing(text):
    preprocessed_abstract = []
    for sentence in text:
        sent = decontracted(sentence)
        # Replace every run of non-alphanumeric characters with a single space
        sent = re.sub('[^A-Za-z0-9]+', ' ', sent)
        sent = ' '.join(e.lower() for e in sent.split() if e.lower() not in stopwords)
        sent = stemming(sent)
        preprocessed_abstract.append(sent.strip())
    return preprocessed_abstract

Preprocessing the train data abstract text—

train_data['preprocessed_abstract'] = text_preprocessing(train_data['ABSTRACT'].values)
train_data[['ABSTRACT', 'preprocessed_abstract']].head()


Likewise, preprocessing the test dataset –

test_data['preprocessed_abstract'] = text_preprocessing(test_data['ABSTRACT'].values)
test_data[['ABSTRACT', 'preprocessed_abstract']].head()


Now we no longer need the original ‘ABSTRACT’ column. You may drop this column from the datasets.

Text data encoding

Splitting train data into train and validation datasets —

X = train_data[['Computer Science', 'Mathematics', 'Physics', 'Statistics', 'preprocessed_abstract']]
y = train_data[target_cols]

from sklearn.model_selection import train_test_split
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size = 0.25, random_state = 21)

print(X_train.shape, y_train.shape)
print(X_cv.shape, y_cv.shape)


As we can see, we have got ~ 10500 datapoints in our training set and ~3500 datapoints in the validation set.

TF-IDF vectorization of text data

Building vocabulary —

combined_vocab = list(train_data['preprocessed_abstract']) + list(test_data['preprocessed_abstract'])

Yes, here I’ve knowingly committed a sin! I have used the complete train and test data for building vocabulary to train a model on it. Ideally, your model shouldn’t be seeing the test data.

vectorizer = TfidfVectorizer(min_df = 5, max_df = 0.5, sublinear_tf = True, ngram_range = (1, 1))
vectorizer.fit(combined_vocab)  # learn the vocabulary and IDF weights before transforming

X_train_tfidf = vectorizer.transform(X_train['preprocessed_abstract'])
X_cv_tfidf = vectorizer.transform(X_cv['preprocessed_abstract'])

print(X_train_tfidf.shape, y_train.shape)
print(X_cv_tfidf.shape, y_cv.shape)


After TF-IDF encoding we obtain 9136 features, each of them corresponding to a distinct word in the vocabulary.
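For readers who want to see what the encoding computes, below is a standard-library sketch of TF-IDF with sublinear term frequency, assuming the formulas scikit-learn documents for its defaults: tf = 1 + ln(count), idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row. It omits the min_df and max_df filtering and uses simple whitespace tokenization. The documents are invented, and tfidf_vectors is a hypothetical helper.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return (vocabulary, rows) with sublinear-tf, smoothed-idf, L2-normalized weights."""
    tokenized = [doc.split() for doc in documents]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears
    df = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        weights = [
            (1.0 + math.log(counts[tok])) * idf[tok] if counts[tok] else 0.0
            for tok in vocab
        ]
        norm = math.sqrt(sum(w * w for w in weights))
        vectors.append([w / norm if norm else 0.0 for w in weights])
    return vocab, vectors

# Invented, already-preprocessed abstracts for demonstration only
docs = [
    "graph neural network",
    "neural network network training",
    "galaxy cluster survey",
]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
print([round(w, 3) for w in vectors[1]])
```

With sublinear_tf, a word that appears twice in an abstract gets a weight of 1 + ln(2) rather than 2, which dampens the effect of repeated terms.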

Some important things you should know here —

I didn’t directly jump to a conclusion that I should go with TF-IDF vectorization. I tried different methods like BOW, W2V using a pre-trained GloVe model, etc. Among them, TF-IDF turned out to be the best performing so here I’m demonstrating only this.

It didn’t magically appear to me that I should be going with uni-grams. I tried bi-grams, tri-grams, and even four-grams; the model employing the unigrams gave the best performance among all.

Text data encoding is a tricky thing. Especially in competitions where even a difference of 0.001 in the performance metric can push you several places behind on the leaderboard. So, one should be open to trying different permutations & combinations at a rudimentary stage.

Before we proceed with modeling, we stack all the features (topic features + TF-IDF encoded text features) together for the training and validation sets.

from scipy.sparse import hstack

X_train_data_tfidf = hstack((X_train[topic_cols], X_train_tfidf))
X_cv_data_tfidf = hstack((X_cv[topic_cols], X_cv_tfidf))

Multi-label classification using OneVsRest Classifier

Until now we were only dealing with refining and vectorizing the feature variables. As we know, this is a multi-label classification problem and each document may have one or more predefined tags simultaneously. We already saw that several datapoints have 2 or 3 tags.

Most traditional machine learning algorithms are developed for single-label classification problems. Therefore a lot of approaches in the literature transform the multi-label problem into multiple single-label problems so that the existing single-label algorithms can be used.
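The transformation described above can be sketched by hand: train one independent binary classifier per label, then compare the result with scikit-learn's `OneVsRestClassifier`, which wraps the same idea. The data here is synthetic and the two-label setup is an assumption for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
# Two synthetic labels, each driven by a different feature (made-up data)
Y = np.column_stack([(X[:, 0] > 0).astype(int), (X[:, 1] > 0).astype(int)])

# Manual version: one independent binary classifier per label column
manual_pred = np.column_stack([
    LogisticRegression().fit(X, Y[:, j]).predict(X) for j in range(Y.shape[1])
])

# OneVsRestClassifier fits one clone of the base estimator per label
ovr_pred = OneVsRestClassifier(LogisticRegression()).fit(X, Y).predict(X)

# Fraction of label predictions on which the two approaches agree
agreement = float((manual_pred == ovr_pred).mean())
print(agreement)
```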

(‘C’ denotes inverse of regularization strength. Smaller values specify stronger regularization).
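A quick way to see the effect of C is to watch the size of the learned coefficients. The sketch below uses synthetic data (an assumption, not the article's dataset) and the default L2 penalty: the smaller the C, the more the coefficients are shrunk toward zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(300, 10)
# Synthetic, noisy binary target driven by the first feature
y = (X[:, 0] + 0.5 * rng.randn(300) > 0).astype(int)

# Smaller C = stronger L2 penalty = smaller coefficient norm
coef_norms = {}
for C in [0.01, 1, 100]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(C, round(coef_norms[C], 3))
```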

from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

C_range = [0.01, 0.1, 1, 10, 100]
for i in C_range:
    # one binary logistic regression per label
    clf = OneVsRestClassifier(LogisticRegression(C = i, solver = 'sag'))
    clf.fit(X_train_data_tfidf, y_train)
    y_pred_train = clf.predict(X_train_data_tfidf)
    y_pred_cv = clf.predict(X_cv_data_tfidf)
    f1_score_train = f1_score(y_train, y_pred_train, average = 'micro')
    f1_score_cv = f1_score(y_cv, y_pred_cv, average = 'micro')
    print("C:", i, "Train Score:", f1_score_train, "CV Score:", f1_score_cv)
    print("- " * 50)


We can see that the highest validation score is obtained at C = 10. But the training score here is also very high, which was kind of expected.

Let’s tune the hyper-parameter even further —

from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

C_range = [10, 20, 40, 70, 100]
for i in C_range:
    # one binary logistic regression per label
    clf = OneVsRestClassifier(LogisticRegression(C = i, solver = 'sag'))
    clf.fit(X_train_data_tfidf, y_train)
    y_pred_train = clf.predict(X_train_data_tfidf)
    y_pred_cv = clf.predict(X_cv_data_tfidf)
    f1_score_train = f1_score(y_train, y_pred_train, average = 'micro')
    f1_score_cv = f1_score(y_cv, y_pred_cv, average = 'micro')
    print("C:", i, "Train Score:", f1_score_train, "CV Score:", f1_score_cv)
    print("- " * 50)


The model with C = 20 gives the best score on the validation set. So, going further, we take C = 20.

If you notice, here we have used the default L2 penalty for regularization as the model with L2 gave me the best result among L1, L2, and elastic-net mixing.

Determining the right thresholds for OneVsRest Classifier

The default threshold in binary classification algorithms is 0.5. But this may not be the best threshold given the data and the performance metric that we intend to maximize. As we know, the F1 score is the harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall).

A good threshold(for each distinct label) would be the one that maximizes the F1 score.
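As a sanity check on the metric being maximized, the F1 score (the harmonic mean of precision and recall) can be computed by hand and compared with scikit-learn's `f1_score`. The labels below are made up for illustration.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy binary labels and predictions (hypothetical)
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1_manual = 2 * precision * recall / (precision + recall)

print(f1_manual, f1_score(y_true, y_pred))
```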

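def get_best_thresholds(true, pred):
    # candidate thresholds 0.00, 0.01, ..., 0.99
    thresholds = [i/100 for i in range(100)]
    best_thresholds = []
    for idx in range(25):
        # F1 score of label idx at every candidate threshold
        f1_scores = [f1_score(true[:, idx], (pred[:, idx] > thresh) * 1) for thresh in thresholds]
        best_thresh = thresholds[np.argmax(f1_scores)]
        best_thresholds.append(best_thresh)
    return best_thresholds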

In a nutshell, what the above function does is, for each of the 25 class labels, it computes the F1 scores corresponding to each of the hundred thresholds and then selects that threshold which returns the maximum F1 score for the given class label.

If the individual F1 score is high, the micro-average F1 will also be high. Let’s get the thresholds —
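The link between per-label and micro-averaged F1 is easier to see in code. Micro-averaging pools the true positives, false positives and false negatives of all labels before computing a single F1. The sketch below uses a made-up 4 × 3 label matrix and checks the hand computation against `f1_score`.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label targets and predictions: 4 documents, 3 labels (hypothetical)
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]])

# Pool the counts over every (document, label) cell
tp = np.sum((y_true == 1) & (y_pred == 1))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
micro_manual = 2 * tp / (2 * tp + fp + fn)

per_label = f1_score(y_true, y_pred, average=None)
micro_sklearn = f1_score(y_true, y_pred, average="micro")
print(per_label, micro_manual, micro_sklearn)
```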

clf = OneVsRestClassifier(LogisticRegression(C = 20, solver = 'sag'))
clf.fit(X_train_data_tfidf, y_train)

y_pred_train_proba = clf.predict_proba(X_train_data_tfidf)
y_pred_cv_proba = clf.predict_proba(X_cv_data_tfidf)

best_thresholds = get_best_thresholds(y_cv.values, y_pred_cv_proba)
print(best_thresholds)


[0.45, 0.28, 0.19, 0.46, 0.24, 0.24, 0.24, 0.28, 0.22, 0.2, 0.22, 0.24, 0.24, 0.41, 0.32, 0.15, 0.21, 0.33, 0.33, 0.29, 0.16, 0.66, 0.33, 0.36, 0.4]

As you can see we have obtained a distinct threshold value for each class label. We’re going to use these same values in our final OneVsRest Classifier model. Making predictions using the above thresholds —

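y_pred_cv = np.empty_like(y_pred_cv_proba)
for i, thresh in enumerate(best_thresholds):
    # label i is predicted when its probability exceeds its own threshold
    y_pred_cv[:, i] = (y_pred_cv_proba[:, i] > thresh) * 1

print(f1_score(y_cv, y_pred_cv, average = 'micro'))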



Thus, we have managed to obtain a significantly better score using the variable thresholds.
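The per-label loop can also be written as a single NumPy comparison, since broadcasting matches each column of the probability matrix with its own threshold. This is a sketch with made-up probabilities and four labels rather than the 25 used in the article.

```python
import numpy as np

# Hypothetical predicted probabilities: 3 documents, 4 labels
proba = np.array([[0.50, 0.10, 0.30, 0.70],
                  [0.20, 0.35, 0.25, 0.10],
                  [0.40, 0.05, 0.15, 0.90]])
thresholds = np.array([0.45, 0.28, 0.19, 0.66])  # one threshold per label

# Broadcasting compares column j of `proba` against thresholds[j]
pred = (proba > thresholds).astype(int)
print(pred)
```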

So far we have performed hyper-parameter tuning on the validation set and managed to obtain the optimal hyperparameter (C = 20). Also, we tweaked the thresholds and obtained the right set of thresholds for which the F1 score is maximum.

Making a prediction on the test data using OneVsRest Classifier

Using the above parameters, let’s move on to train a full-fledged model on the entire training data and make predictions on the test data.

# train and test data
X_tr = train_data[['Computer Science', 'Mathematics', 'Physics', 'Statistics', 'preprocessed_abstract']]
y_tr = train_data[target_cols]
X_te = test_data[['Computer Science', 'Mathematics', 'Physics', 'Statistics', 'preprocessed_abstract']]

# text data encoding
X_tr_tfidf = vectorizer.transform(X_tr['preprocessed_abstract'])
X_te_tfidf = vectorizer.transform(X_te['preprocessed_abstract'])

# stacking
X_tr_data_tfidf = hstack((X_tr[topic_cols], X_tr_tfidf))
X_te_data_tfidf = hstack((X_te[topic_cols], X_te_tfidf))

# modeling and making prediction with best thresholds
clf = OneVsRestClassifier(LogisticRegression(C = 20))
clf.fit(X_tr_data_tfidf, y_tr)

y_pred_tr_proba = clf.predict_proba(X_tr_data_tfidf)
y_pred_te_proba = clf.predict_proba(X_te_data_tfidf)

y_pred_te = np.empty_like(y_pred_te_proba)
for i, thresh in enumerate(best_thresholds):
    y_pred_te[:, i] = (y_pred_te_proba[:, i] > thresh) * 1

Once we obtain our test predictions, we attach them to the respective ids (as in the sample submission file) and make a submission in the designated format.

ss = pd.read_csv('SampleSubmission.csv')
ss[target_cols] = y_pred_te
ss.to_csv('LR_tfidf10k_L2_C20.csv', index = False)

The best thing about participating in hackathons is that you get to experiment with different techniques, so when you encounter a similar kind of problem in the future you have a fair understanding of what works and what doesn’t. You also get to learn a lot from other participants by actively participating in the discussions.

You can find the complete code here on my GitHub profile.



Data Modeling With Dax

Data Modeling with DAX – Concepts

Business Intelligence (BI) is gaining importance in several fields and organizations. Decision making and forecasting based on historical data have become crucial in the ever-growing competitive world. There is a huge amount of data available both internally and externally from diversified sources for any type of data analysis.

However, the challenge is to extract the relevant data from the available big data as per the current requirements, and to store it in a way that is amenable to deriving different insights from the data. A data model thus obtained with the usage of key business terms is a valuable communication tool. The data model also needs to provide a quick way of generating reports on an as-needed basis.

Data modeling for BI systems enables you to meet many of the data challenges.

Prerequisites for a Data Model for BI

A data model for BI should meet the requirements of the business for which data analysis is being done. Following are the minimum basics that any data model has to meet −

The data model needs to be Business Specific

A data model that is suitable for one line of business might not be suitable for a different line of business. Hence, the data model must be developed based on the specific business, the business terms used, the data types, and their relationships. It should be based on the objectives and the type of decisions made in the organization.

The data model needs to have built-in Intelligence

The data model should include built-in intelligence through metadata, hierarchies, and inheritances that facilitate an efficient and effective Business Intelligence process. With this, you will be able to provide a common platform for different users, eliminating repetition of the process.

The data model needs to be Robust

The data model should precisely present the data specific to the business. It should enable effective disk and memory storage so as to facilitate quick processing and reporting.

The data model needs to be Scalable

The data model should be able to accommodate the changing business scenarios in a quick and efficient way. New data or new data types might have to be included. Data refreshes might have to be handled effectively.

Data Modeling for BI

Data modeling for BI consists of the following steps −

Shaping the data

Loading the data

Defining the relationships between the tables

Defining data types

Creating new data insights

Shaping the Data

The data required to build a data model can be from various sources and can be in different formats. You need to determine which portion of the data from each of these data sources is required for specific data analysis. This is called Shaping the Data.

For example, if you are retrieving the data of all the employees in an organization, you need to decide what details of each employee are relevant to the current context. In other words, you need to determine which columns of the employee table are required to be imported. This is because the fewer the columns in a table in the data model, the faster the calculations on the table.
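The same idea can be illustrated outside a BI tool. The sketch below uses pandas as a stand-in (an assumption; the article itself does not use Python for data modeling) and a made-up employee export, importing only the columns needed for the analysis.

```python
import io
import pandas as pd

# Hypothetical employee export; only two of the four columns matter here
raw_csv = io.StringIO(
    "EmployeeID,Name,Department,Salary\n"
    "1,Asha,Finance,52000\n"
    "2,Ben,Marketing,48000\n"
)

# Import just the required columns instead of the whole table
employees = pd.read_csv(raw_csv, usecols=["Department", "Salary"])
print(employees.columns.tolist())
```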

Loading the Data

You need to load the identified data – the data tables with the chosen columns in each of the tables.

Defining the Relationships Between Tables

Next, you need to define the logical relationships between the various tables that facilitate combining data from those tables. For example, if you have a table, Products, containing data about the products and a table, Sales, with the various sales transactions of the products, then by defining a relationship between the two tables you can summarize the sales product-wise.
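The Products and Sales example can be sketched in pandas, used here only as an illustrative stand-in for a BI data model. The table contents and the `ProductID` key are hypothetical.

```python
import pandas as pd

# Hypothetical Products and Sales tables sharing a ProductID key
products = pd.DataFrame({"ProductID": [1, 2], "Product": ["Pen", "Notebook"]})
sales = pd.DataFrame({"ProductID": [1, 2, 1, 2, 1],
                      "Amount": [10.0, 25.0, 5.0, 30.0, 15.0]})

# The relationship: every sale points to exactly one product (many-to-one)
joined = sales.merge(products, on="ProductID", how="left", validate="many_to_one")

# Product-wise sales summary
summary = joined.groupby("Product")["Amount"].sum()
print(summary)
```

The `validate="many_to_one"` argument makes pandas raise an error if a product key is duplicated in the Products table, which mirrors the check a relationship definition performs.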

Defining Data Types

Identifying the appropriate data types for the data in the data model is crucial for the accuracy of calculations. For each column in each table that you have imported, you need to define the data type. For example, text data type, real number data type, integer data type, etc.
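To see why data types matter for calculation accuracy, consider a table whose numeric columns were imported as text. The pandas sketch below (a stand-in for the BI tool, with made-up values) assigns explicit types before doing arithmetic.

```python
import pandas as pd

# Hypothetical import where every column arrived as text
sales = pd.DataFrame({"Product": ["Pen", "Notebook"],
                      "Units": ["3", "12"],
                      "Price": ["1.50", "4.25"]})

# Declare an explicit type per numeric column so arithmetic behaves numerically
sales = sales.astype({"Units": "int64", "Price": "float64"})
sales["Revenue"] = sales["Units"] * sales["Price"]

print(sales.dtypes)
print(sales["Revenue"].tolist())
```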

Creating New Data Insights

This is a crucial step in data modeling for BI. The data model that is built might have to be shared with several people who need to understand data trends and make the required decisions in a very short time. Hence, creating new data insights from the source data will be effective, avoiding rework on the analysis.

The new data insights can be in the form of metadata that can be easily understood and used by specific business people.

Data Analysis

Once the data model is ready, the data can be analyzed as per the requirement. Presenting the analysis results is also an important step because the decisions will be made based on the reports.


Predictive Modeling With Python Course (8 Courses, Online Certification)

About Predictive Modeling with Python Course

Course Name: Online Predictive Modeling with Python Course

Deal: You get lifetime access to all videos

Hours: 31+ Video Hours

Core Coverage: Learn how to analyze and visualize data using Python libraries.

Course Validity: Lifetime Access

Eligibility: Anyone who is serious about learning predictive modeling and wants to make a career in the data analytics field

Pre-Requisites: Basic statistical concepts and predictive modeling knowledge

What do you get? Certificate of Completion for the course

Certification Type: Course Completion Certificates

Verifiable Certificates? Yes, you get verifiable certificates for each of the 8 courses and projects, each with a unique link. These links can be included in your resume/LinkedIn profile to showcase your enhanced skills.

Type of Training: Video Course – Self Paced Learning

Software Required: None

System Requirement: 1 GB RAM or higher

Other Requirement: Speaker / Headphone

Predictive Modeling with Python Course Curriculum

Serial No Course Name Duration Description

1 Predictive Modeling with Python 9h 44m In this module, you will get an introduction to Predictive Modelling with Python. You will be guided through the installation of the required software. The module covers data pre-processing, including data frames, splitting the dataset, feature scaling, etc. You will gain an edge on Linear Regression, Salary Prediction, and Logistic Regression. You will get to work on various datasets dealing with Credit Risk and Diabetes.

2 Machine Learning with Python Project – Predict Diabetes on Diagnostic Measures 1h 07m In this section, you will work on Pima Indians Diabetes using Machine Learning. You will be guided through the installation and will have practical lessons on Pima Classification, Splitting Dataset, Checking the ROC.

3 Project – Linear Regression in Python 2h 15m You will be introduced to Linear Regression in Python in depth in this module. You will learn about the use case and libraries, and also about graphical univariate analysis. Along with that, you will be taught Boxplot, Bivariate Analysis, etc.

4 Project on Python Data Science – Predicting the Survival of Passenger in Titanic 2h 11m Here you will learn about Import Libraries, Decision Tree Classifiers, Logistic Regression, Load libraries, bar plot, modeling, training set, etc.

5 Financial Analytics with Python 2h 11m This section emphasizes the use of Python libraries and the working of data frames. It offers an in-depth study of analytics and financial time series analysis along with data visualization, financial plots, and 3D charts.

6 Project – Credit Default using Logistic Regression 3h 9m This module explains the project in detail, including the files that need to be imported, data pre-processing, splitting data, and the confusion matrix. It also covers topics like Hyper Parameter Tuning, Decision Tree Theory, installation of Graphviz and Pydotplus, etc.

7 Project – House Price Prediction using Linear Regression 2h 8m This project helps you to focus on coding feature engineering, handling missing values, exploratory data analysis, calculating the variance inflation factor, etc.

8 Forecasting the Sales using Time Series Analysis in Python 2h 29m This project emphasizes and will give you more insight into data processing and feature engineering along with graph visualization components.

Predictive Modeling with Python Course – Certificate of Completion

What is Predictive Modeling with Python?

Python is used for predictive modeling because Python-based frameworks give us results faster and also help in the planning of the next steps based on the results.

Which Skills will you learn in this Training


Our course ensures that you will be able to think with a predictive mindset and understand well the basics of the techniques used in prediction. Critical thinking is very important to validate models and interpret the results. Hence, our course material emphasizes building this kind of thinking ability.

You will gain a good knowledge of predictive modeling in Python, linear regression, logistic regression, fitting models with the scikit-learn library, fitting models with the statsmodels library, ROC curves, the backward elimination approach, etc.


To get started with Predictive Modelling with Python, a solid foundation in statistics is highly recommended. It takes a good understanding to interpret the numbers and judge whether they add up.

Along with the above-mentioned knowledge, one must know how to code in Python.

Knowing SQL also acts as a complementary skillset.

Even if someone is not well equipped with the above-mentioned skill, it should not act as a hindrance as everything is possible with an honest effort and strong will.

Target Audience

This Predictive Modeling with Python Course can be taken up by anyone who has a decent amount of interest in this field. The earlier someone starts, the further they can reach. For students who are pursuing a course in statistics, or for computer science graduates, it is a very good opportunity to direct their career in that direction. As this is a much in-demand skill, many IT professionals are looking to switch into the domain of predictive analysis.

After successfully getting hands-on with Predictive Analysis, you open up career opportunities in job roles like Data Analyst, Data Scientist, Business Analyst, Market Research Analyst, Quality Engineer, Solution Architect, Programmer Analyst, Statistical Analyst, Statistician, etc.

Predictive Modeling with Python Course – FAQ’s

How is predictive modeling different from forecasting?

In Predictive modeling, we use data mining and probability to forecast the outcomes. There are several predictors which are variables that influence future results. Once the data is fetched for relevant predictors, a statistical model is formulated.

Do I receive a certificate at the end of completing this Predictive Modeling with the Python Course?

Yes, a certificate is handed out on completing the online training bundle. You can request the certificate once you have completed 70% of the course content for this particular course.

I am a working professional, is this Predictive Modeling with Python Course for me?

This is a self-paced course and this can be taken up by anyone who has interests in this subject and can be completed even from the comfort of your home.


Career Benefits

Predictive modeling is a field that is set for immense growth in the years to come because of the explosion of data that we are seeing. In 2023, IBM forecast that the demand for data scientists and analytical professionals would grow by 15% that year.

Many companies have realized the importance of using predictive modeling for their business, but currently there is a shortage of skilled professionals. Substantial salaries are offered to people with this skillset because of the nature of the job.

The demand for qualified candidates is increasing at a significant rate.

It is the right time to invest in learning such a niche skill, as the market for predictive analytics is not coming down any time soon. EDUCBA is the right platform to help you achieve your goals, as we understand the needs of the industry and update our course and course content accordingly.



How To Build Grit With Meditation: 3 Simple Techniques

Listening to Angela Lee Duckworth’s wonderful TED talk on “Grit: The power of passion and perseverance”, I was reminded of my own workshops on enhancing creativity and determination.

I have discovered that the best way to cultivate and sustain a growth mindset over the longer term lies in daily acts of creativity and resilience.

Creativity enhances our love for learning new things, while resilience boosts personal power and the will to carry on through challenges.

Awakening the Creativity Center

The Creativity Center is found three inches below our diaphragm. We hold our passions here, our dreams, fantasies, and latent ambitions.

It governs our sense of self-worth, as well as our ability to be open and friendly toward others and try new things.

When the Creativity Center is in balance, we exhibit tolerance, a positive outlook, and refined behavior.

We use creative energy when we cook, bake, or paint. We awaken creative energy when we visit an art museum or read an inspiring biography.

We use creative energy every time we try something new – even when we take a different route on our way back home.

We are all born as creative beings. As children, we all color, paint, or make wonderful shapes.

We used to break into song or dance at will.

But somehow along the way, some of us transform into less creative beings due to societal or economic pressures. We become so rigid that we find it hard to learn or invent something new.

The good news is that we can train our mind with simple techniques to become more open and receptive to change.

Meditative Technique 1: Alternate Nostril Breathing 

“Just as the activities of the mind influence the breath, so does the breath influence our state of mind.” ~T.K.V. Desikachar

“Nadi Shodhana” refers to an alternate nostril breathing technique. Whenever I find myself losing focus, worrying about a new project, or learning a new area, I practice this technique to broaden my state of mind.

To practice this technique, I follow these three steps:

I close my right nostril with the first two fingers of my right hand.

I inhale and exhale through the left nostril, for 8 to 10 breaths.

Then, closing my left nostril with the first two fingers of my left hand, I inhale and exhale through the right nostril.

Usually, I repeat this 2-4 times with symmetry on both sides.

Practiced weekly or daily, “Nadi Shodhana” often starts a flywheel of creativity and opens the mind to new possibilities.

Meditative Technique 2: The Moon-Energy Meditation  

The moon is a powerful symbol that reminds us of constant change. I find this symbolism especially powerful when I need to let go of past failure and embrace transition.

This meditation is most effective when practiced every night from the start of a lunar cycle.

I find a comfortable seat near a window, with a view of the moon. I sit tall with a straight spine and gaze at the moon.

Gently closing the eyes and deepening the breath, I go inwards. I imagine the subconscious combining creative energy with a sense of purpose and personal power. I often visualize this as a spinning, orange-colored circle of light traveling from the base of the spine, to the area behind the navel, to the diaphragm.

Continuing to breathe deeply for a few more minutes, I meditate on the following words before transitioning back to my surroundings.

“I trust myself to follow my dreams. I can adapt with grace to any situation. I release ideas that are no longer useful.”

This is a great meditation that reminds me of the constant flow of time through the universe. I even mix things up by playing soothing background music, lying down vs. sitting, or sitting near a water fountain.

I also weave in yoga poses like Triangle, Dancer, or Gate when I need to strengthen the effects.

It’s important to keep in mind that adaptability is not always the solution.

Sometimes, I find that I need to tap into my personal power to channel my energies away from an existing situation to a new environment that fulfills my true potential. For that I use the Breath of Fire meditation below.

Finding Your Personal Power

The energy center for personal power is located near the solar plexus. It governs self-esteem and determination, and enables transformation.

When the Power Center is in balance, we feel self-confident, have a strong sense of purpose, and are self-motivated.

When imbalanced, we suffer from low self-esteem, have difficulty making decisions, and may have anger or control issues. These behavioral traits can keep us from focusing on the long-term or achieving our full potential.

Boat and Warrior poses are great for boosting personal power. I also use the Breath of Fire meditation at the end of a tough workday to recharge and transition to family time, or at the start of a busy day to refocus.

Meditative Technique 3: The Breath of Fire 

Before getting started with this meditation, I usually need to strengthen my core.

When returning to this meditation after a while, it often takes me a few weeks to build up to the full practice (4-5 min meditation with several forceful breaths per second).

I start by lighting a candle of my choice. Basking in the warmth of the candle, I sit up tall, lengthening the space between my tailbone and my heart:

Breathing in through the nose, I expand my lung cavity and imagine the abdominal cavity filling with air.

On exhale, I forcefully draw the abdominal muscles toward my spine and push the air out through my lungs and nose. The exhales are loud and quick, and sound like waves in a stormy sea.

Starting with an interval of 30 seconds between each breath, I slowly pick up the pace to repeat about 10 times. I try to equalize the duration of inhale and exhale.

After the meditation, I often pause and remain seated and think of a current experience that is not going well. Thinking of my intentions for the situation and visions of the future, I meditate on the following phrases:

“I claim my power and accept responsibility for every part of my life. My enthusiasm empowers me to achieve my goals. My personal power equips me to overcome all challenges and excel.”

Alternatively, I simply repeat the sound “ram” which activates the Power Center.

The Breath Of Fire is an ancient Vedic technique with immense benefits. It is frequently used in modern day yoga as a cleansing ritual, to kick start a feeling of empowerment and transformation.


When I find myself in a new role or a project that stretches out of my comfort zone, I weave in these techniques in the morning or at night.

More importantly, I make it a habit to indulge myself in daily acts of fun and creativity.

In immersing myself in what I love, I rediscover my passions over and over again, and focus my personal power on what matters most in the long-term.
