Tutorial 3: Stereo-seq

In this tutorial, we demonstrate how to apply GraphST to Stereo-seq data for spatial domain identification. We use the mouse embryo E9.5 data as an example and set the number of clusters to 22. The mouse embryo Stereo-seq data were downloaded from https://db.cngb.org/stomics/mosta/ and are also provided at https://drive.google.com/drive/folders/1QWHFMzhQ7WorVNLwx88xT-rbojf4nh9T.

Before running the model, please download the input data from the link above.

[1]:
import os
import torch
import pandas as pd
import scanpy as sc
from sklearn import metrics
import multiprocessing as mp
[2]:
from GraphST import GraphST
[3]:
dataset = 'Mouse_Embryo'
[4]:
# Device to run on. The package defaults to 'cpu'; we recommend using a GPU.
# 'cuda:3' selects the GPU with index 3; change the index to match your machine.
device = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')

# Location of the R installation, which the mclust algorithm requires. Replace it with your local R installation path.
os.environ['R_HOME'] = '/scbio4/tools/R/R-4.0.3_openblas/R-4.0.3'
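
Because the mclust step depends on R through the `R_HOME` environment variable, checking the path before training can save a failed run later. The following is a minimal sketch using only the standard library; the helper name `resolve_r_home` is hypothetical and not part of GraphST, and the fallback assumes an `R` executable on the PATH that supports `R RHOME`.

```python
import os
import shutil
import subprocess


def resolve_r_home(candidate):
    """Return a usable R home directory, or None if none can be found.

    `candidate` is the path you intend to assign to R_HOME. If it is not an
    existing directory, fall back to asking an `R` executable on the PATH
    (`R RHOME` prints the R home directory).
    """
    if candidate and os.path.isdir(candidate):
        return candidate
    r_binary = shutil.which("R")
    if r_binary is None:
        return None
    try:
        result = subprocess.run(
            [r_binary, "RHOME"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    home = result.stdout.strip()
    return home if os.path.isdir(home) else None


# The path below is the one used in this tutorial; replace it with your own.
r_home = resolve_r_home('/scbio4/tools/R/R-4.0.3_openblas/R-4.0.3')
if r_home is None:
    print("No R installation found; set R_HOME manually before using mclust.")
else:
    os.environ['R_HOME'] = r_home
```

If no R installation is found, the leiden and louvain options described later in this tutorial remain available as alternatives to mclust.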
[5]:
# the number of clusters
n_clusters = 22

Reading data

[6]:
# read data
file_path = '/home/yahui/anaconda3/work/CellCluster_DEC/data/Mouse_Embryo/'  # replace 'file_path' with your download path
adata = sc.read_h5ad(file_path + 'E9.5_E1S1.MOSTA.h5ad')
adata.var_names_make_unique()
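
Concatenating `file_path` and the file name as strings only works when `file_path` ends with a path separator. As a minimal sketch, the path can instead be built with `pathlib`, which also lets the script fail early with a clear message when the data have not been downloaded yet. The helper name `locate_input` and the directory in the usage comment are placeholders, not part of GraphST.

```python
from pathlib import Path


def locate_input(data_dir, file_name='E9.5_E1S1.MOSTA.h5ad'):
    """Build the path to the input file and verify that it exists."""
    path = Path(data_dir).expanduser() / file_name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download the data first and point data_dir at it"
        )
    return path


# Usage (placeholder directory; replace it with your download path):
# adata = sc.read_h5ad(locate_input('~/data/Mouse_Embryo'))
```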

Implementing GraphST for spatial clustering

[7]:
# define model
model = GraphST.GraphST(adata, datatype='Stereo', device=device)

# run model
adata = model.train()
/home/yahui/anaconda3/envs/STGAT/lib/python3.8/site-packages/scanpy/preprocessing/_highly_variable_genes.py:62: UserWarning: `flavor='seurat_v3'` expects raw count data, but non-integers were found.
  warnings.warn(
Graph constructed!
Building sparse matrix ...
Begin to train ST data...
100%|███████████████████████████████████████████████████████████████████████████████| 600/600 [00:14<00:00, 42.39it/s]
Optimization finished for ST data!

Spatial clustering

After model training, the learned representations of the spots are used as input to a clustering tool for spatial clustering. We provide three clustering tools: mclust, leiden, and louvain. In our experiments, mclust performs better than leiden and louvain on spatial data in most cases, so we recommend using mclust.

[8]:
# clustering
from GraphST.utils import clustering

tool = 'mclust'  # options: 'mclust', 'leiden', or 'louvain'

if tool == 'mclust':
    clustering(adata, n_clusters, method=tool)
elif tool in ['leiden', 'louvain']:
    clustering(adata, n_clusters, method=tool, start=0.1, end=2.0, increment=0.01)
R[write to console]:     __  ___________    __  _____________
   /  |/  / ____/ /   / / / / ___/_  __/
  / /|_/ / /   / /   / / / /\__ \ / /
 / /  / / /___/ /___/ /_/ /___/ // /
/_/  /_/\____/_____/\____//____//_/    version 5.4.9
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%
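
For leiden and louvain, the number of clusters is not set directly but follows from a resolution parameter, which is why the call above passes `start`, `end`, and `increment`. The sketch below illustrates one way such a search could work: scan the resolutions in the given range and stop at the first one that yields the requested number of clusters. This is an assumption made for illustration; GraphST's own search may differ in order and detail. The names `search_resolution`, `count_clusters`, and `toy_counts` are hypothetical.

```python
def search_resolution(count_clusters, n_clusters, start=0.1, end=2.0, increment=0.01):
    """Scan resolutions and return the first one giving n_clusters clusters.

    count_clusters: callable mapping a resolution (float) to the number of
    clusters obtained at that resolution. In practice it would wrap a
    clustering call and count the unique labels (hypothetical here).
    """
    steps = int(round((end - start) / increment))
    for i in range(steps + 1):
        # Rounding avoids accumulating floating-point error across steps.
        resolution = round(start + i * increment, 10)
        if count_clusters(resolution) == n_clusters:
            return resolution
    raise ValueError(
        f"no resolution in [{start}, {end}] yields {n_clusters} clusters; "
        "widen the range or reduce the increment"
    )


def toy_counts(resolution):
    """Toy stand-in: pretend the cluster count grows linearly with resolution."""
    return round(resolution * 20)


best = search_resolution(toy_counts, n_clusters=22)
print(best)
```

With a real clustering function, each evaluation reruns the clustering, so a coarser `increment` trades precision for speed.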

Visualization

[10]:
import matplotlib.pyplot as plt

# flip the y coordinates before plotting
adata.obsm['spatial'][:, 1] = -1*adata.obsm['spatial'][:, 1]
plt.rcParams["figure.figsize"] = (3, 4)

# one color per cluster (22 in total)
plot_color = ["#F56867", "#556B2F", "#C798EE", "#59BE86", "#006400", "#8470FF",
              "#CD69C9", "#EE7621", "#B22222", "#FFD700", "#CD5555", "#DB4C6C",
              "#8B658B", "#1E90FF", "#AF5F3C", "#CAFF70", "#F9BD3F", "#DAB370",
              "#877F6C", "#268785", "#82EF2D", "#B4EEB4"]

ax = sc.pl.embedding(adata, basis="spatial",
                     color="domain",
                     s=30,
                     show=False,
                     palette=plot_color,
                     title='GraphST')
ax.axis('off')
ax.set_title('Mouse Embryo E9.5')