Run STitch3D on the adult mouse brain dataset
In this tutorial, we show STitch3D’s analysis of the adult mouse brain dataset.
The spatial transcriptomics mouse brain data are publicly available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE147747. The mouse brain dataset profiled by 10x Genomics Chromium platform is available at https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-11115.
Import packages
[1]:
import pandas as pd
import numpy as np
import scanpy as sc
import anndata as ad
import os
import sys
import STitch3D
import warnings
warnings.filterwarnings("ignore")
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
Preprocessing
Load datasets
Load spatial transcriptomics datasets:
[2]:
count = pd.read_csv("./data/GSE147747_expr_raw_counts_table.tsv",
sep="\t", index_col=0)
meta = pd.read_csv("./data/GSE147747_meta_table.tsv",
sep="\t", index_col=0)
adata_st_all = ad.AnnData(X = count.values)
adata_st_all.obs.index = count.index
adata_st_all.obs = meta.loc[adata_st_all.obs.index]
adata_st_all.var.index = count.columns
slice_use = ['01A', '02A', '03A', '04B', '05A', '06B', '07A', '08B', '09A', '10B',
'11A', '12A', '13B', '14A', '15A', '16A', '17A', '18A', '19A', '20B',
'21A', '22A', '23A', '24A', '25A', '26A', '27A', '28A', '29A', '30A',
'31A', '32A', '33A', '34A', '35A',]
meta_slice = pd.read_csv("./data/molecular-atlas-master/data/slicestable.tsv",
index_col=0, sep="\t")
slice_AP = [float(meta_slice.loc[s]['AP']) * 1000 for s in slice_use]
slice_dist_micron = [-(slice_AP[i+1] - slice_AP[i]) for i in range(len(slice_AP)-1)]
adata_st_all = adata_st_all[[s in slice_use for s in adata_st_all.obs.section_index.values]]
sc.pp.filter_cells(adata_st_all, min_genes=1)
adata_st_all.obsm['spatial'] = np.concatenate((adata_st_all.obs['HE_X'].values.reshape(-1, 1),
adata_st_all.obs['HE_Y'].values.reshape(-1, 1)), axis=1)
adata_st_list_raw = []
for slice_id in sorted(set(adata_st_all.obs.section_index.values)):
adata_st_i = adata_st_all[adata_st_all.obs.section_index.values == slice_id]
array_col = np.array([np.float(loc.split("_")[1].split("x")[0]) for loc in adata_st_i.obs.index])
if slice_id == '01A':
array_col = array_col - 0.3
array_row = np.array([np.float(loc.split("_")[1].split("x")[1]) for loc in adata_st_i.obs.index])
adata_st_i.obs['array_col'] = np.round(array_col)
adata_st_i.obs['array_row'] = np.round(array_row)
adata_st_i.obsm['loc_use'] = np.concatenate((adata_st_i.obs['array_row'].values.reshape(-1, 1),
adata_st_i.obs['array_col'].values.reshape(-1, 1)), axis=1)
adata_st_list_raw.append(adata_st_i.copy())
for i in range(1, 6): # adjust spatial locations
adata_st_list_raw[i].obsm['loc_use'] = np.concatenate((adata_st_list_raw[i].obsm['loc_use'][:, 0].reshape(-1, 1),
2*np.mean(adata_st_list_raw[i].obsm['loc_use'][:, 1])-adata_st_list_raw[i].obsm['loc_use'][:, 1].reshape(-1, 1)), axis=1)
Load single-cell reference dataset:
[3]:
adata_ref = ad.read_h5ad("./data/MouseBrain/all_cells_20200625.h5ad")
adata_ref.var.index = adata_ref.var["SYMBOL"].astype(str)
adata_ref.var_names_make_unique()
labels = pd.read_csv("./data/MouseBrain/snRNA_annotation_astro_subtypes_refined59_20200823.csv", index_col=0)
labels = labels.reindex(index=adata_ref.obs_names)
adata_ref.obs[labels.columns] = labels
adata_ref = adata_ref[~adata_ref.obs['annotation_1'].isna(), :]
adata_ref.obs['celltype'] = adata_ref.obs['annotation_1']
Alignment of ST tissue slices
[4]:
adata_st_list = STitch3D.utils.align_spots(adata_st_list_raw,
data_type = "ST",
coor_key="loc_use",
plot=True)
Using the Iterative Closest Point algorithm for alignemnt.
Detecting edges...
Aligning edges...
Selecting highly variable genes and building 3D spatial graph
[5]:
adata_st, adata_basis = STitch3D.utils.preprocess(adata_st_list,
adata_ref,
sample_col="sample",
coor_key="loc_use",
slice_dist_micron=slice_dist_micron,
c2c_dist=200.,
n_hvg_group=500)
Finding highly variable genes...
6227 highly variable genes selected.
Calculate basis for deconvolution...
6 batches are used for computing the basis vector of cell type <Astro_AMY>.
6 batches are used for computing the basis vector of cell type <Astro_AMY_CTX>.
6 batches are used for computing the basis vector of cell type <Astro_CTX>.
6 batches are used for computing the basis vector of cell type <Astro_HPC>.
6 batches are used for computing the basis vector of cell type <Astro_HYPO>.
4 batches are used for computing the basis vector of cell type <Astro_STR>.
5 batches are used for computing the basis vector of cell type <Astro_THAL_hab>.
6 batches are used for computing the basis vector of cell type <Astro_THAL_lat>.
6 batches are used for computing the basis vector of cell type <Astro_THAL_med>.
6 batches are used for computing the basis vector of cell type <Astro_WM>.
4 batches are used for computing the basis vector of cell type <Endo>.
6 batches are used for computing the basis vector of cell type <Ext_Amy_1>.
6 batches are used for computing the basis vector of cell type <Ext_Amy_2>.
6 batches are used for computing the basis vector of cell type <Ext_ClauPyr>.
6 batches are used for computing the basis vector of cell type <Ext_Hpc_CA1>.
4 batches are used for computing the basis vector of cell type <Ext_Hpc_CA2>.
6 batches are used for computing the basis vector of cell type <Ext_Hpc_CA3>.
6 batches are used for computing the basis vector of cell type <Ext_Hpc_DG1>.
6 batches are used for computing the basis vector of cell type <Ext_Hpc_DG2>.
6 batches are used for computing the basis vector of cell type <Ext_L23>.
6 batches are used for computing the basis vector of cell type <Ext_L25>.
6 batches are used for computing the basis vector of cell type <Ext_L56>.
6 batches are used for computing the basis vector of cell type <Ext_L5_1>.
6 batches are used for computing the basis vector of cell type <Ext_L5_2>.
6 batches are used for computing the basis vector of cell type <Ext_L5_3>.
6 batches are used for computing the basis vector of cell type <Ext_L6>.
6 batches are used for computing the basis vector of cell type <Ext_L6B>.
5 batches are used for computing the basis vector of cell type <Ext_Med>.
6 batches are used for computing the basis vector of cell type <Ext_Pir>.
6 batches are used for computing the basis vector of cell type <Ext_Thal_1>.
6 batches are used for computing the basis vector of cell type <Ext_Thal_2>.
5 batches are used for computing the basis vector of cell type <Ext_Unk_1>.
3 batches are used for computing the basis vector of cell type <Ext_Unk_2>.
5 batches are used for computing the basis vector of cell type <Ext_Unk_3>.
6 batches are used for computing the basis vector of cell type <Inh_1>.
6 batches are used for computing the basis vector of cell type <Inh_2>.
6 batches are used for computing the basis vector of cell type <Inh_3>.
6 batches are used for computing the basis vector of cell type <Inh_4>.
5 batches are used for computing the basis vector of cell type <Inh_5>.
6 batches are used for computing the basis vector of cell type <Inh_6>.
6 batches are used for computing the basis vector of cell type <Inh_Lamp5>.
6 batches are used for computing the basis vector of cell type <Inh_Meis2_1>.
6 batches are used for computing the basis vector of cell type <Inh_Meis2_2>.
6 batches are used for computing the basis vector of cell type <Inh_Meis2_3>.
6 batches are used for computing the basis vector of cell type <Inh_Meis2_4>.
6 batches are used for computing the basis vector of cell type <Inh_Pvalb>.
6 batches are used for computing the basis vector of cell type <Inh_Sst>.
6 batches are used for computing the basis vector of cell type <Inh_Vip>.
6 batches are used for computing the basis vector of cell type <LowQ_1>.
6 batches are used for computing the basis vector of cell type <LowQ_2>.
6 batches are used for computing the basis vector of cell type <Micro>.
6 batches are used for computing the basis vector of cell type <Nb_1>.
6 batches are used for computing the basis vector of cell type <Nb_2>.
6 batches are used for computing the basis vector of cell type <OPC_1>.
6 batches are used for computing the basis vector of cell type <OPC_2>.
6 batches are used for computing the basis vector of cell type <Oligo_1>.
6 batches are used for computing the basis vector of cell type <Oligo_2>.
6 batches are used for computing the basis vector of cell type <Unk_1>.
6 batches are used for computing the basis vector of cell type <Unk_2>.
Preprocess ST data...
Start building a graph...
Radius for graph connection is 1.1000.
6.2781 neighbors per cell on average.
Running STitch3D model
[6]:
model = STitch3D.model.Model(adata_st, adata_basis)
model.train()
0%| | 1/20000 [00:00<2:34:30, 2.16it/s]
Step: 0, Loss: 5119.6099, d_loss: 5114.2056, f_loss: 54.0438
10%|█ | 2001/20000 [09:03<1:23:01, 3.61it/s]
Step: 2000, Loss: -1540.9728, d_loss: -1543.6028, f_loss: 26.3004
20%|██ | 4001/20000 [18:06<1:13:40, 3.62it/s]
Step: 4000, Loss: -1753.4928, d_loss: -1756.0886, f_loss: 25.9588
30%|███ | 6001/20000 [27:10<1:04:31, 3.62it/s]
Step: 6000, Loss: -1802.1528, d_loss: -1804.7277, f_loss: 25.7478
40%|████ | 8001/20000 [36:13<55:30, 3.60it/s]
Step: 8000, Loss: -1811.4734, d_loss: -1814.0398, f_loss: 25.6645
50%|█████ | 10001/20000 [45:17<46:09, 3.61it/s]
Step: 10000, Loss: -1815.8132, d_loss: -1818.3693, f_loss: 25.5607
60%|██████ | 12001/20000 [54:21<36:59, 3.60it/s]
Step: 12000, Loss: -1818.3048, d_loss: -1820.8562, f_loss: 25.5137
70%|███████ | 14001/20000 [1:03:25<27:45, 3.60it/s]
Step: 14000, Loss: -1818.5482, d_loss: -1821.1080, f_loss: 25.5986
80%|████████ | 16001/20000 [1:12:28<18:25, 3.62it/s]
Step: 16000, Loss: -1817.8567, d_loss: -1820.4270, f_loss: 25.7029
90%|█████████ | 18001/20000 [1:21:31<09:14, 3.61it/s]
Step: 18000, Loss: -1824.0372, d_loss: -1826.5833, f_loss: 25.4606
100%|██████████| 20000/20000 [1:30:34<00:00, 3.68it/s]
[7]:
save_path = "./results_mouse_brain"
result = model.eval(adata_st_list_raw, save=True, output_path=save_path)