Appendix A: Environment Setup Guide

This appendix explains how to set up an environment for running the code examples in this book.

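A.1 Requirements

Recommended specifications:

OS: Ubuntu 22.04 LTS or later, or macOS 12 or later
Python: 3.10 (matches the Docker image in A.2; the upper bounds tensorflow<2.12 and torch<2.0 in A.3 do not support Python 3.11)
Memory: 16 GB or more (32 GB recommended)
Storage: at least 100 GB of free space
GPU: CUDA-capable GPU (optional, for deep learning; choose the CUDA version according to the official compatibility table for the torch version you install)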
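The minimum requirements above can be checked from Python before installing anything. The following is a minimal sketch using only the standard library; the function names `total_memory_gb` and `check_environment` and their default thresholds are illustrative assumptions taken from the list above. The memory query relies on `os.sysconf`, which is available on Linux and macOS but not on Windows, in which case the check reports `None`.

```python
import os
import shutil
import sys


def total_memory_gb():
    """Physical memory in GB via POSIX sysconf, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows (no os.sysconf) or an unsupported name
    if pages <= 0 or page_size <= 0:
        return None  # sysconf returns -1 when the value is undefined
    return pages * page_size / 1024**3


def check_environment(min_python=(3, 10), min_mem_gb=16, min_disk_gb=100, path="."):
    """Return {check: True/False/None}; None means 'could not be determined'."""
    mem = total_memory_gb()
    free_disk_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python": sys.version_info[:2] >= tuple(min_python),
        "memory": None if mem is None else mem >= min_mem_gb,
        "disk": free_disk_gb >= min_disk_gb,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        status = "unknown" if ok is None else ("OK" if ok else "below recommendation")
        print(f"{name}: {status}")
```

The thresholds are parameters so that the same check can be reused with the 32 GB memory recommendation, e.g. `check_environment(min_mem_gb=32)`.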

A.2 Setting Up the Environment with Docker

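# Dockerfile
FROM ubuntu:22.04

# Avoid interactive prompts (e.g. tzdata) during apt-get install
ENV DEBIAN_FRONTEND=noninteractive

# Install basic tools
# (r-base-dev and the -dev headers are needed to compile some R/Bioconductor packages from source)
RUN apt-get update && apt-get install -y \
    python3.10 \
    python3-pip \
    python3.10-venv \
    r-base \
    r-base-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    git \
    wget \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Python packages
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# R packages (non-interactive R needs an explicit CRAN mirror)
RUN R -e "install.packages('BiocManager', repos = 'https://cloud.r-project.org')"
RUN R -e "BiocManager::install(c('Biostrings', 'GenomicRanges'), ask = FALSE, update = FALSE)"

# Bioinformatics tools
# (run apt-get update again because the package lists were removed above;
#  on Ubuntu 22.04, BLAST+ is packaged as ncbi-blast+)
RUN apt-get update && apt-get install -y \
    bwa \
    samtools \
    bcftools \
    ncbi-blast+ \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace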
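After building the image, it is worth confirming that the command-line tools actually ended up on `PATH`. A minimal sketch using `shutil.which`; the function name `missing_tools` is hypothetical, and the tool list is an assumption based on the packages installed above. The BLAST executable name depends on which BLAST package is installed (for example, `ncbi-blast+` provides `blastn`), so add the one you expect to the list.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Assumed executable names for the packages installed in the Dockerfile
    expected = ["python3", "R", "git", "wget", "bwa", "samtools", "bcftools"]
    missing = missing_tools(expected)
    print("all tools found" if not missing else f"missing: {', '.join(missing)}")
```

Run it inside the container; an empty result means every listed tool was found.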

A.3 Dependency Management

# requirements.txt
# The upper bounds below assume Python 3.10 (tensorflow<2.12 and torch<2.0 do not support 3.11)
numpy>=1.20.0,<1.24.0
pandas>=1.3.0,<2.0.0
scipy>=1.7.0,<1.10.0
scikit-learn>=1.0.0,<1.3.0
tensorflow>=2.8.0,<2.12.0
torch>=1.10.0,<2.0.0
biopython>=1.79
matplotlib>=3.4.0
seaborn>=0.11.0
jupyterlab>=3.0.0
scanpy>=1.8.0
anndata>=0.8.0
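Because pip resolves these ranges at install time, it can help to confirm afterwards which versions were actually installed. A minimal stdlib-only sketch using `importlib.metadata`: `PINS` copies a subset of the bounds above, and the helper names are hypothetical. `numeric_version` compares only numeric components (it drops local tags such as `+cu117` and ignores pre-release suffixes), so it is a rough approximation of PEP 440; the `packaging` library's version parsing is the more reliable choice when it is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Subset of requirements.txt: (inclusive lower bound, exclusive upper bound or None)
PINS = {
    "numpy": ("1.20.0", "1.24.0"),
    "pandas": ("1.3.0", "2.0.0"),
    "scipy": ("1.7.0", "1.10.0"),
    "scikit-learn": ("1.0.0", "1.3.0"),
    "tensorflow": ("2.8.0", "2.12.0"),
    "torch": ("1.10.0", "2.0.0"),
}


def numeric_version(text):
    """'1.13.1+cu117' -> (1, 13, 1): leading digits of each dot-separated part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def in_range(installed, lower, upper=None):
    """True if lower <= installed < upper (upper=None means no upper bound)."""
    v, lo = numeric_version(installed), numeric_version(lower)
    hi = numeric_version(upper) if upper else ()
    width = max(len(v), len(lo), len(hi))

    def pad(t):
        return t + (0,) * (width - len(t))  # so that 1.21 compares like 1.21.0

    return pad(v) >= pad(lo) and (upper is None or pad(v) < pad(hi))


def check_pins(pins=PINS):
    """Return {distribution: status string} for each pinned distribution."""
    report = {}
    for name, (lower, upper) in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = f"{installed} ({'ok' if in_range(installed, lower, upper) else 'outside pin'})"
    return report


if __name__ == "__main__":
    for name, status in check_pins().items():
        print(f"{name}: {status}")
```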

A.4 Troubleshooting

Common problems and solutions:

Out-of-memory errors

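# Memory-efficient reading: stream records instead of loading the whole file
from Bio import SeqIO


def batch_iterator(iterator, batch_size):
    """Split an iterator into lists of at most batch_size items."""
    batch = []
    for entry in iterator:
        batch.append(entry)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # the last batch may be shorter than batch_size
        yield batch


def read_large_fasta(filename, chunk_size=1000):
    """Read a large FASTA file chunk_size records at a time."""
    # SeqIO.parse is lazy, so only the current chunk is held in memory
    yield from batch_iterator(SeqIO.parse(filename, "fasta"), chunk_size)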
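Because records are consumed one chunk at a time, statistics can be accumulated per chunk without loading the whole file. A minimal sketch, assuming the sequences arrive as plain strings (for `SeqRecord` objects from `SeqIO.parse`, pass `str(record.seq)`). `batched` is a hypothetical `itertools.islice`-based equivalent of `batch_iterator`, and `streaming_gc` is an illustrative helper, not part of Biopython. On Python 3.12 and later, the standard library's `itertools.batched` does the same job but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(iterable: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield lists of up to n items, like batch_iterator above."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))  # take at most n items from the shared iterator
        if not batch:
            return
        yield batch


def streaming_gc(sequences: Iterable[str], chunk_size: int = 1000) -> float:
    """GC fraction over all sequences, processed chunk_size sequences at a time."""
    total = 0
    gc = 0
    for chunk in batched(sequences, chunk_size):
        for seq in chunk:
            s = seq.upper()
            total += len(s)  # denominator counts every character, including N
            gc += s.count("G") + s.count("C")
    return gc / total if total else 0.0
```

With Biopython, a call such as `streaming_gc(str(rec.seq) for rec in SeqIO.parse("large.fasta", "fasta"))` streams the file, where `large.fasta` is a placeholder file name.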

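GPU not recognized

# Check the CUDA environment (driver version and visible GPUs)
nvidia-smi
python3 -c "import torch; print(torch.cuda.is_available())"
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# Inside Docker, start the container with --gpus all (requires the NVIDIA Container Toolkit)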
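When `torch.cuda.is_available()` returns `False`, it helps to see at which layer detection fails: the driver (`nvidia-smi`), a CPU-only PyTorch build, or TensorFlow. A minimal sketch, assuming either library may be missing; `gpu_report` is a hypothetical helper name. `torch.version.cuda` is `None` for CPU-only PyTorch builds, which is a common cause of this error.

```python
import shutil


def gpu_report():
    """Collect GPU visibility from nvidia-smi, PyTorch and TensorFlow.

    Library entries are None when that library is not installed.
    """
    report = {"nvidia_smi_on_path": shutil.which("nvidia-smi") is not None}

    try:
        import torch
    except ImportError:
        report["torch_cuda"] = None
    else:
        report["torch_cuda"] = torch.cuda.is_available()
        report["torch_built_with_cuda"] = torch.version.cuda  # None for CPU-only builds

    try:
        import tensorflow as tf
    except ImportError:
        report["tensorflow_gpus"] = None
    else:
        report["tensorflow_gpus"] = len(tf.config.list_physical_devices("GPU"))

    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_on_path` is `False` inside a container, the container was most likely started without GPU access.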